I’m getting my first look at a full-fledged nHibernate database developed by consultants for our company. I thought I’d share my initial impressions. I’ll be capturing trace events from the database over the next couple of weeks, so I’ll be following up on the behavior of nHibernate within this database as well.
The first thing I saw & thought was, “Foreign key constraints. Thank the gods.” That really is good news. I was frankly concerned that they might go with the “let the code handle it” approach. There are quite a few nullable columns. I’m also seeing tons & tons of nvarchar(255), which must be the default string size. Lots of bit fields, and bigint in a lot of places too. None of this is definitively good or bad, just observations.
There are tables that lack a primary key. That raises a bit of a concern. The primary keys I’ve looked at have all been clustered, which isn’t too problematic since that’s probably the primary access path. There are a few unique constraints on some of the tables too.
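As a side note, here’s a quick way to round those up (a minimal sketch, assuming SQL Server 2005 or later):

-- List the tables that don't have a primary key
SELECT s.name AS SchemaName, t.name AS TableName
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
ORDER BY s.name, t.name;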
Overall though, I don’t see anything that, at first glance, makes me want to run screaming from the room (or pick up a big stick & start looking for developers). The devil is no doubt in the details. Time to get started looking at the trace events.
I had previously outlined a problem with nHibernate code that, in my opinion, was going to cause a lot of problems on production systems especially with memory and the procedure cache. However, it does appear that there is a possible solution. That doesn’t eliminate my concerns over nHibernate, but it sure does help.
If I’m going to have to support it, I want to understand it. So, I got going yesterday, installing nHibernate 2.0 and walking through the Quick Start Guide. My C# is a bit rusty, to say the least, but I managed to squeak by. The Guide is meant for an older version of nHibernate, and there have been a few changes made that affect the code displayed. What that means is, I had to do more than simply type up what was presented to me. That was actually good, because it forced me to do a bit more learning in order to get everything to work.
What I found was interesting. I can see why developers like this. It really does let you treat the database as just another object to program against. More than that, with pretty minimal work (some of which could be eliminated by code generation), you don’t have to think about databases at all. It’s slick, no denying it. More good news: the TSQL queries it generates are consistent, appropriately formatted, parameterized queries. This gives me a great deal of comfort that we’ll get consistent results with this product in place. I also didn’t find it terribly chatty, meaning I didn’t see lots of extraneous calls made to the database. That’s all the good news. Now for the bad. Here’s a sample of the code:
exec sp_executesql N'SELECT user0_.LogonId as LogonId0_0_, user0_.Name as Name0_0_, user0_.Password as Password0_0_, user0_.EmailAddress as EmailAdd4_0_0_, user0_.LastLogon as LastLogon0_0_ FROM users user0_ WHERE user0_.LogonId=@p0',N'@p0 nvarchar(8)',@p0=N'joe_cool'
Yes, that’s hard to read. Sorry, but that’s how it comes out, so that’s part of what you’ll be living with. First off, and this is usually a minor nit but can be a serious problem: the generated code from the Quick Start Guide did not provide a schema for the table, ‘users’. In tiny little queries like this, that’s neither a performance hit nor likely to cause expensive recompiles, but if the tool generates larger, more complex queries, it could become an issue. Second, and much more important, here’s the DDL to create the ‘users’ table.
CREATE TABLE [dbo].[users](
[LogonID] [nvarchar](20) NOT NULL DEFAULT ('0'),
[Name] [nvarchar](40) NULL DEFAULT (NULL),
[Password] [nvarchar](20) NULL DEFAULT (NULL),
[EmailAddress] [nvarchar](40) NULL DEFAULT (NULL),
[LastLogon] [datetime] NULL DEFAULT (NULL),
PRIMARY KEY CLUSTERED ([LogonID])
) ON [PRIMARY]
Nothing fancy. But can you spot the discrepancy? The clustered primary key is on an NVARCHAR(20) column. The code that calls it sizes each parameter not to the column, but to the amount of data supplied, NVARCHAR(8) in this case. Both sides are nvarchar here, so there’s no implicit conversion in this particular query, but every distinct value length produces a different parameter declaration, and each declaration gets its own entry in the procedure cache, which is exactly the memory pressure I was worried about earlier. And if the column types ever differ (say, a varchar column against these nvarchar parameters), implicit data conversions could keep the indexes from being used. That isn’t a problem here, but it’s something I’m going to be keeping a close eye on.
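If you want to watch for that yourself, here’s a rough sketch of a plan cache check (this assumes the SQL Server 2005 dynamic management views, and the LIKE filter is just my example, not anything nHibernate produces):

-- Each distinct parameter declaration (@p0 nvarchar(8), @p0 nvarchar(3), and so on)
-- shows up as a separate cached plan for what is logically the same query.
SELECT st.text, cp.usecounts, cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%FROM users user0_%';

For what it’s worth, nHibernate’s mapping files let you declare a length on each property, so it may be worth testing whether that produces consistently sized parameter declarations.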
Ultimately, it looks like this is something I can support. I’m going to spend more time with it to see if I can work out these minor kinks and to introduce joins and more complex data structures to see how well it deals with things. This should make things interesting for a while. I still wonder how we’re going to deal with this in production.
Steve Jones’ editorial today questioned the use of LINQ. His focus was on the security aspects, but the discussion went past that. This is a bit of a circular reference, since I posted over there and linked back to my ORM Concerns post below. There are a lot of interesting points being made. Some of it, from a DBA standpoint, is quite repetitive, although I’d prefer to think of it as reinforcing. Steve’s editorial is worth a read and the discussion is excellent.
Object Relational Mapping (ORM) software is a great idea. You can’t deny that the mismatch between objects and relational data has to be dealt with. Instead of all the time, money and effort being spent here, why not get a tool that does most of the work for you? But… one direction this can lead is towards dumb databases. After all, if putting a piece of software between the object & the db makes things easier, how much easier if the db and the object look exactly the same? Ta-da! Even less code to write & maintain. Unfortunately, TANSTAAFL (There Ain’t No Such Thing As A Free Lunch) still applies. What you save in initial coding you will pay for in reporting, data cleanup, integrity issues, data integration issues…

Anyway, I’ve been researching this since, as I mentioned before, my company is looking to implement ORM, and the architects in charge of the project are really excited by the idea of making the database a reflection of the object. Here’s the list of concerns and potential issues I’ve come up with regarding ORM. Any comments or suggestions would be useful. In no particular order:
- I/O increase due to “chattier” applications:
Most documentation indicates many more, smaller transactions, not to mention the possibility of frequent requests to verify structure (the app checking to see if the database has changed) prior to generating & running queries.
Mitigation is to ensure appropriate configuration & use of nHibernate. Monitoring can be done with Profiler.
- I/O increase due to loading larger data sets more frequently
Ensure the use of “lazy” collections to reduce data moved
No other mitigation possible.
- General performance issues due to “generic” procedures using less efficient access methods
Generic data types used in queries can lead to indexes not being used
- Zero possibility to tune queries in a production setting
Any and all changes require re-coding and re-deployment. There are no methods available for database-only tuning except applying an index or forcing parameterization (see the sketch after this list).
- Reporting
This includes transactional-level reporting as well as moving data between a model-driven design and a more flexible design (normalized or star schema or whatever) that better supports reporting. Coding time reduced on the front end is, to some degree, tacked onto the back end.
A data cleansing mechanism may be required.
Redundancy of data and authoritative sources for data may require some refinement.
- Data integrity
Generated structures are dependent on discovery to determine the proper constraints required on the data, or all constraints are assumed to be in the code.
Without integrity maintained with the data, the possibility of “dirty” data increases (“USA”, “U.S.A.”, “US” all entered through the app; see the constraint sketch after this list).
We will have to give over full read/write privileges at the table level to the application. I think it’s possible we’d have to give it ‘dbo’ in production. Based on a few statements in some of the research, it’s even possible we’d have to give it ‘sa’ (although that is completely unproven currently).
- Integration with other systems at the data level
Depending on the application, this may not be required. But if it is required at any point, it will entail a larger than normal effort to convert the data to a more normalized structure.
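On the tuning point above, the database-side options really are that thin. Here’s a minimal sketch of the two I mentioned (the database name and index name are placeholders of mine, purely for illustration):

-- Forcing parameterization is a database-level setting:
ALTER DATABASE MyAppDb SET PARAMETERIZATION FORCED;
-- And indexes can still be applied without touching the application:
CREATE NONCLUSTERED INDEX IX_users_EmailAddress
ON dbo.users (EmailAddress);

Note that forced parameterization mostly helps with ad hoc, unparameterized SQL; the sp_executesql calls shown earlier are already parameterized.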
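And on the dirty data point, this is the kind of rule the database would normally enforce, and exactly what goes missing if all the constraints are assumed to be in the code. The table and column names here are hypothetical, just to illustrate:

-- A lookup table plus a foreign key stops the "US"/"U.S.A." variants cold,
-- no matter what the application does or doesn't check.
CREATE TABLE dbo.CountryCode (
CountryCode nvarchar(10) NOT NULL PRIMARY KEY
);
INSERT INTO dbo.CountryCode (CountryCode) VALUES (N'USA');

CREATE TABLE dbo.Customer (
CustomerId int NOT NULL PRIMARY KEY,
CountryCode nvarchar(10) NOT NULL
REFERENCES dbo.CountryCode (CountryCode)
);
-- INSERT INTO dbo.Customer VALUES (1, N'U.S.A.'); -- fails with a foreign key violation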
I posted this question over at SQL Server Central and, just like my last post, I’m also posting it here. I need some help. I’ve been trying to research this and I can’t find good, hard facts. Any help would be deeply appreciated.
It looks like we might be facing a large project shifting over to using ORM methods through nHibernate. I’m trying to get a read from the database community on what exactly I should expect in terms of issues, challenges and headaches during the development process. I’m also interested in any long term maintenance issues, troubleshooting problems, etc. If your developers implemented ORM all the way down to storing object data on the database in a non-normalized/object oriented fashion, how did that affect you? Did it muck up reporting? What benefits did you realize on the database side of the house? I’m really as interested in benefits as I am costs.
I’m really looking for real-world, hands-on information. Complaints or speculation about how stupid a lowest-common-denominator set of dynamic queries might be… well, I’ve got that complaint well in hand. I need as much hard data as I can collect so that I can communicate enough information to my boss, his boss and his boss in order for them to make informed decisions about this and go into it with their eyes fully pinned open.