I’ve blogged in the past about the nHibernate project that has been going on at my company for the last eighteen months. Prior to today, I have only seen the database generated by nHibernate. But today I finally started seeing some TSQL code. My first impressions… oy!
There are two levels of concern coming out of the gate. First, it appears that some of the programming decisions, completely independent of nHibernate, are going to cause problems. Second, it appears we’re going to be hitting issues with nHibernate.
First, the programming decision. I’ve been using Confio to monitor this server for a few days now (look for upcoming posts on my eval of Confio). Only one day has captured any real activity from the nHibernate team (and yes, I’m basically spying on these guys because they are absolutely insistent that my team stay out of their way). The longest running query was two calls to this (names have been changed to protect my job, there are no innocents):
FROM Books WITH (updlock,rowlock)
WHERE BookID = 42
What they’re doing is locking the row so that no other user can get at it while it’s being edited. Total lock time for the first day was 38 seconds for two calls. I’ve posted about this previously, so I won’t go into it again, but this is potentially a disaster.
On to nHibernate. The next query was pretty benign:
WHERE OfficeID = 42
Unfortunately this table, as created out of nHibernate, has no primary key, no clustered index or any other type of index, so this is just a table scan. But it’s only on 109 rows… in dev, not production, oh, and most of the 109 rows have a null value for OfficeID, but it’s all good I suppose… until I notice that this query also had 38 seconds of wait time, but it was called 394,652 times… oy (and no, I’m not even one of the chosen, but that’s a wonderful expression for this situation). Looking at the data in cache, this query has been called, since it was created in cache on the 2nd, 598351 times with a total elapsed time on the server of 5429689ms. The average then is 9ms, but the max was 29296 or 29 seconds.
The next query up looked like this:
INNER JOIN Table2
ON Table1.ChildID = Table2.OtherID
WHERE tabl2.ParentID IN (@p1,@p1,@p2…@p99)
Yep. 99 parameters passed and used against an IN statement. I can’t post the execution plans on this stuff without talking it over with the boss, but suffice to say, it’s two table scans and a merge join to get the data back. Oh, and this was called 1,138 times with a minimum elapsed time of 976ms. Yeah, minimum time to call this proc is 1 second and it’s been called over a thousand times.
It went down hill from there. I’m looking at other queries, one that hits seven tables, but instead of using JOINs uses a bunch of sub-selects in the WHERE clause. It has a minimum run time of 9 seconds. Luckily it’s only been called 64 times.
This is a horror show.
A lot of benefit can be reaped if we’re able to go in and create some indexes on these tables, but that’s going to be an issue anyway because I’m seeing lots of implicit data conversions on the parameters being passed in, as I noted a couple of years ago. At this point in time, I would say, at best, I’m very underwhelmed by what nHibernate is producing. To be perfectly honest, I’m concerned for the health of the project.
I realize I’m prejudiced, being one of those evil DBA’s & all, but I can’t help but agree with him. It’s a short post, but worth the read.
I’m getting my first look at a full-fledged nHibernate database developed by consultants for our company. I thought I’d share my initial impressions. I’ll be capturing trace events from the database over the next couple of weeks, so I’ll be following up on the behavior of nHibernate within this database as well.
The first thing I saw & thought was, “Foreign key constraints. Thank the gods.” That really is good news. I was frankly concerned that they might go with the “let the code handle it” approach. There are quite a few null columns. I’m also seeing tons & tons of nvarchar(255) which must the default string size. Lots of bit fields too. They also used bigint in a lot of places too. None of this is definitively good or bad, just observations.
There are tables that lack a primary key. That raises a bit of a concern. The primary keys I’ve looked at have all been clustered, which isn’t too problematic since that’s probably the primary access path. There are a few unique constraints on some of the tables too.
Overall though, I don’t see anything that, at first glance, makes me want to run screaming from the room (or pick up a big stick & start looking for developers). The devil is no doubt in the details. Time to get started looking at the trace events.
I had no idea, but evidently I started a bit of a donny-brook. Cool!
I had previously outlined a problem with nHibernate code that, in my opinion, was going to cause a lot of problems on production systems especially with memory and the procedure cache. However, it does appear that there is a possible solution. That doesn’t eliminate my concerns over nHibernate, but it sure does help.
SQL Server 2008 has been released, but the full capabilities of the product are still be discovered. This isn’t a case of discovery so much as Microsoft, in the person of Bart Duncan, has shown us something new. Read the article. This is one of the most exciting things I’ve seen in a long time. It’s especially exciting as we see databases becoming less and less manageable through the implementation of tools like nHibernate.
As I noted in a previous post, nHibernate will create a lot of execution plans. With the capabilities here, we’ll be able to easily and quickly aggregate some of those plans to identify the costly queries coming out of nHibernate without having to resort to 24 hour Profiler monitoring.
I did a little bit, and I mean a little bit, of looking through the documentation on nHibernate and located a spot for the schema, actually a couple of spots. It can be added to the Hibernate Mapping definition, which will make it a default for all classes within the definition, and by extension all the tables in the database you connect to. You can also add it to the class definition, specifying a particular schema for a given table. So now the query looks like this:
exec sp_executesql N’INSERT INTO dbo.users (Name, Password, EmailAddress, LastLogon, LogonId) VALUES (@p0, @p1, @p2, @p3, @p4)’,N’@p0 nvarchar(9),@p1 nvarchar(6),@p2 nvarchar(13),@p3 datetime,@p4 nvarchar(9)’,@p0=N’Jane Cool’,@p1=N’abc123′,@p2=N’firstname.lastname@example.org’,@p3=’2008-04-25 11:11:48:000′,@p4=N’jane_cool’
On to the data length problem.
If I’m going to have to support it, I want to understand it. So, I got going yesterday, installing nHibernate 2.0 and walking through the Quick Start Guide. My C# is a bit rusty, to say the least, but I managed to sqeak by. The Guide is meant for an older version of nHibernate and there have been a few changes made that affect the code displayed. What that means is, I had to do more than simply type up what was presented to me. Which, was actually good because it forced me to do a bit more learning in order to get everything to work.
What I found was interesting. I can see why developers like this. It really does let you treat the database as just another object to program against. More than that, with pretty minimal work (some of which could be eliminated by code generation), you don’t have to think about databases at all. It’s slick, no denying it. More good news, the TSQL queries it generates are very consistent and appropriately formatted parameratized queries. This gives me a great deal of comfort that we’ll get consistent results with this product in place. I also didn’t find it terribly chatty, meaning I didn’t see lots of extraneous calls made to the database. That’s all the good news. Now for the bad. Here’s a sample of the code:
exec sp_executesql N’SELECT user0_.LogonId as LogonId0_0_, user0_.Name as Name0_0_, user0_.Password as Password0_0_, user0_.EmailAddress as EmailAdd4_0_0_, user0_.LastLogon as LastLogon0_0_ FROM users user0_ WHERE user0_.LogonId=@p0′,N’@p0 nvarchar(8)’,@p0=N’joe_cool’
Yes, that’s hard to read. Sorry, but that’s how it comes out, so that’s part of what you’ll be living with. First off, and this is usually a minor nit, but can be a serious problem, the generated code from the quick start guide did not provide a schema for the table, ‘users’. In tiny little procs like this, that’s neither a performance hit nor is it likely to cause expensive recompiles. If the tool generates larger more complex queries, that is potentially an issue. Second, and much more important, here’s the DDL to create the ‘users’ table.
CREATE TABLE [dbo].[users](
[LogonID] [nvarchar](20) NOT NULL DEFAULT (‘0’),
[Name] [nvarchar](40) NULL DEFAULT (NULL),
[Password] [nvarchar](20) NULL DEFAULT (NULL),
[EmailAddress] [nvarchar](40) NULL DEFAULT (NULL),
[LastLogon] [datetime] NULL DEFAULT (NULL),
PRIMARY KEY CLUSTERED
) ON [PRIMARY]
) ON [PRIMARY]
Nothing fancy. But can you spot the discrepancy? The clustered primary key is on a NVARCHAR(20) column. The code that calls it is defaulting each parameter, not to the size of the column, but to the amount of data supplied, NVARCHAR( 8 ) in this case. That very simple query could use implicit data conversions and cause the indexes to not be used. That isn’t a problem here, but it’s something I’m going to be keeping a close eye on.
Ultimately, it looks like this is something I can support. I’m going to spend more time with it to see if I can work out these minor kinks and to introduce joins and more complex data structures to see how well it deals with things. This should make things interesting for a while. I still wonder how we’re going to deal with this in production.
Object Relational Mapping (ORM) software is a great idea. You can’t deny that the mismatch between objects and relational data has to be dealt with. Instead of all the time, money and effort being spent here, why not get a tool that does most of the work for you? But… One direction that this can lead is towards dumb databases. After all, if putting a piece of software between the object & the db makes things easier, how much easier if the db and the object look exactly the same. Ta-da! Even less code to write & maintain. Unfortunately, TANSTAAFL (There Ain’t No Such Thing As A Free Lunch) still applies. What you save in initial coding you will pay for in reporting, data cleanup, integrity issues, data integration issues… Anyway, I’ve been researching this, since, as I mentioned before, my company is looking to implement ORM and the architects in charge of the project are really excited by the idea of making the database a reflection of the object. Here are the list of concerns and potential issues that I’ve come up regarding ORM. Any comments or suggestions around this would be useful. In no particular order:
- I/O increase due to “chattier” applications:
Most documentation indicates a lot more, smaller transactions, not to mention the possibility of frequent requests to verify structure (the app checking to see if the database has changed) prior to generating & running queries.
Mitigation is to ensure appropriate configuration & use of nHibernate. Monitoring can be done with Profiler
- I/O increase due to loading larger data sets more frequently
Ensure the use of “lazy” collections to reduce data moved
No other mitigation possible.
- General performance issues due to “generic” procedures using less efficient access methods
Generic data types used in queries can lead to indexes not being used
- Zero possibility to tune queries in a production setting
Any, all, changes require re-coding and re-deployment. There are no methods available for database only tuning except applying an index or forcing parameterization.
This includes transactional level reporting as well as moving data between a model driven design and a more flexible design (normalized or star schema or whatever) that better supports reporting. Coding time reduced on the front end is, to some degree, tacked on to the back-end.
A data cleansing mechanism may be required.
Redundancy of data an authoritative sources for data may require some refinement.
- Data integrity
Generated structures are dependent on discovery to determine the proper constraints required on the data, or all constraints are assumed to be in the code.
Without integrity maintained with data the possibility of “dirty” data is increased (“USA”, “U.S.A.”,”US” all values entered through app).
We will have to give over full read/write privileges at the table level to the application. I think, it’s possible, we’d have to give it ‘dbo’ in production. Based on a few statements in some of the research, it’s even possible we’d have to give it ‘sa’ (although that is completely unproven currently).
- Integration with other systems at the data level
Depending on the application, this may not be required. But if it is required at any point, it will entail a larger than normal effort to convert the data to a more normalized structure.