I just got word that Red Gate has printed more copies of the book that they’ll be distributing at the Microsoft Professional Developers Conference that’s taking place in LA. I just wish we had a 2008 version of the book now because, while most of it is still applicable, there’s more that can be done with execution plans now.
Have I mentioned I think the missing index information that’s displayed with the statement text in the graphical execution plan in SQL Server 2008 is pretty slick? Well it is.
I hope they’re going to distribute it at the PASS Summit this year too. I think they are, but I don’t know that for a fact.
I’ve seen this question come up in web searches multiple times. The nested loop join is also called an iterative join because of how it works: for each row from the outer input, it searches the inner input for matching rows, one row at a time. So, if the optimizer has correctly chosen this operation for your query, you should see FEWER rows in the top, or outer, part of the join and MORE rows in the bottom, or inner, part of the join.
Take this query as an example (run against AdventureWorks2008):
SELECT soh.[SalesOrderID], sod.[SalesOrderDetailID] -- illustrative select list
FROM [Sales].[SalesOrderHeader] soh
JOIN [Sales].[SalesOrderDetail] sod
ON soh.[SalesOrderID] = sod.[SalesOrderID]
WHERE soh.[SalesOrderID] = 47716
Here we have a single row from the SalesOrderHeader table and 55 rows from the SalesOrderDetail table. Here’s the execution plan:
This is a classic loop join. Now, if you see a loop join that’s reversed, with more rows on top, or one where the same number of rows appears on both sides of the join, that’s worth checking out as a problem. It could be missing or incorrect indexes or, more likely, bad or out-of-date statistics. Do a statistics update, maybe with a full scan, to see if that fixes the problem.
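As a sketch, that kind of statistics update looks like this (the table name is just an example):

```sql
-- Update all statistics on one table; FULLSCAN reads every row instead of
-- sampling, which is slower but more accurate.
UPDATE STATISTICS [Sales].[SalesOrderDetail] WITH FULLSCAN;
```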
Update #2: As was pointed out in the comments below, I had the outer & inner labels swapped. I’ve updated the text to reflect the correction.
The other day a developer showed up at my desk. They were getting time-outs in production on a query that didn’t normally give them trouble. With the parameters they provided, I ran the query. It ran for over 30 seconds, past the application-side timeout, before it returned its data. So I ran it again with an execution plan. It had a bunch of index scans with loop joins across thousands of rows and even created a table spool with 700 million rows as part of the process. Clearly not good.
Next I looked at the query itself. It wasn’t too bad, as these things go. It was probably moving too many columns, and apparently the business wanted a pivot on the data, since they were using an aggregate method to pivot some of the columns. The only thing that initially jumped out at me was the use of MAX(1) instead of TOP 1 for some of the versioned queries (look for an article in the next, and last, SQL Standard magazine on the topic of MAX vs. TOP).
I let the developer know that it was going to take a while to clean everything up, and I got to work. A couple of other DBAs saw what I was looking at and came over to offer suggestions (it sounds worse than it is; these guys are smart and helpful). There was a lot of discussion about the table spool. I said that changing the MAX queries to use TOP would fix it. There was a bit of argument, but the proof is always in the pudding, so I made the change (in a non-production environment, of course) and sure enough the table spool went away. We still had a bunch of index scans, and the loop joins looked funny since 15,000 rows were being filtered against 15,000 rows. One of the DBAs with me, Scott, suggested that the statistics were out of date. So I updated the stats and reran the query. It didn’t make a bit of difference. Another DBA, Det, suggested updating the statistics with a FULL SCAN. Sure enough, not only did the index scans go away, but the loop joins were replaced with HASH joins that worked much better with the data we were returning.
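To sketch the kind of rewrite involved (the table and column names here are made up for illustration, not the production schema):

```sql
-- Aggregate form: the engine has to consider every qualifying row.
SELECT MAX(v.VersionId)
FROM dbo.DocumentVersion AS v
WHERE v.DocumentId = @DocumentId;

-- TOP form: with an index on (DocumentId, VersionId DESC),
-- this can stop after reading a single row.
SELECT TOP (1) v.VersionId
FROM dbo.DocumentVersion AS v
WHERE v.DocumentId = @DocumentId
ORDER BY v.VersionId DESC;
```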
A little more investigation found that 5,000 rows had been added to the table for the organization we were working with, but somehow the random sampling of rows used by the normal statistics update wasn’t reflecting this change accurately. It needed the FULL SCAN. So, while scheduling the full scan is easy enough, we wanted to know how to tell whether it was really needed. Determining the age of the statistics is easy:
SELECT STATS_DATE(ind.id, ind.indid) AS stats_last_updated -- last time each index's statistics were updated
FROM sysindexes ind
WHERE OBJECT_NAME(ind.id) = 'tablename'
But figuring out whether the FULL SCAN is needed is still a bit elusive. Det’s suggestion was to query the statistics, run a FULL SCAN update, and then requery the statistics, comparing the two. While that might show that the FULL SCAN you just did was needed, I’m still trying to find a more efficient mechanism.
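Det’s before-and-after comparison could be sketched like this (the table and index names are just examples):

```sql
-- Look at the statistics header (Rows, Rows Sampled, Updated) before...
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID')
    WITH STAT_HEADER;

UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;

-- ...and after. A big gap between Rows and Rows Sampled in the "before"
-- output is a hint that the sampled update may have missed the change.
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID')
    WITH STAT_HEADER;
```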
All the videos I recorded on execution plans for JumpStartTV are now available. Please, go and check them out. Then hang around JumpStartTV and check out all the videos that Brian Knight did on SSIS. Also, if you’re interested, I think Andy and the team would like other people to submit videos as well. So if you’re watching my stumbling attempts and are convinced you can do better, do it and submit it.
I just finished setting up a new automated build process including TFS! I’m loving this new version. There’s so much more for me to learn and I’ve got to get it all in before PASS in November.
I was reading how they’ve changed the differential mechanism. It sounds like a huge improvement. Time for more testing.
I’ve been crazy busy and I didn’t mention this before, but last Wednesday I was elected (unopposed) to the post of President of the Southern New England SQL Server Users Group (SNESSUG).
A while back, I wrote a book, Dissecting SQL Server Execution Plans. Because of it, I had some conversations with Steve Jones & Andy Warren. For a SQL Server geek, that’s heady company. Anyway, they asked me what my plans were for the book. Plans? I wrote it. I thought that was the plan. But they meant lessons, licensing and all that kind of stuff. I didn’t have a clue, but they did. A few weeks ago I flew down to Florida and recorded a bunch of short video lessons derived from the book and from discussions with Andy Warren. It was a blast.
Here’s a picture of me in their high-tech batcave/recording studio… OK. Kidding. It was a supply closet, but they’re running a seriously high-end operation down there, they’re just a bit pressed for space.
CTP 15 utterly hosed the virtual machine I had it on. I tried uninstalling, but it just wouldn’t come off cleanly. I finally rebuilt the virtual machine and reinstalled everything except the VSTSDB GDR. Having learned my lesson (the hard way, as usual), I started a differential on my virtual machine before the install of CTP 16 so that I can roll back and install the release candidate and the release as they come out without having to go through that whole rebuild again.
While Gert & crew are still making some changes, the fundamentals are still there so my presentation at PASS should work fine as currently defined. The only problem I ran into was that I couldn’t get the database to deploy by simply clicking on the deploy menu choice. It built the scripts for the deployment but it wouldn’t run them on the server specified. I tested the script. It worked great.
Some experimentation showed the problem. Even though the configuration I was on had the deployment event set to “Build Deployment Script and Deploy Database” it somehow hadn’t registered and it was only building the script. I simply clicked on the right drop down choice, tried again and everything went through, no issues.
So next, I tried running from a command line. It worked flawlessly. I can now deploy and only pass the configuration that I want and I’ll get everything built into the database. Unfortunately, when I was running the command line builds, it was opening a new instance of Visual Studio each time I ran it… I’ll do more research on Monday.
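For what it’s worth, the command line looked roughly like this (the project name and configuration are placeholders, not the real ones):

```shell
msbuild MyDatabase.dbproj /t:Deploy /p:Configuration=Release
```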
If you’re interested in VSTSDB, I’m presenting at the PASS Summit this year on that topic. Please swing by.
It looks like this year is going to be pretty good. They’ve expanded the program. If you’re not going, you should consider it.
Personally, I’m pretty excited. I’m presenting our build and deployment process using Visual Studio Team System Database Edition. I’ve got the first half of the presentation down cold because it’s how we’ve been doing our deployments for a couple of years now. The second half concerns me a bit. I’m showing the new release that works with SQL Server 2008. Unfortunately that new release isn’t done and I keep having to update my code and slides as the new versions come out with different functionality. I’m more than a little bit nervous about that part of the presentation. I wish there was a way to bounce it off Gert Drapers before I gave it. Maybe I’ll fire off an email to him after they finish coding. I’d do it sooner, but I’m not about to distract the man that’s making a tool I use so much.
Speaking of which, the new version, CTP 16, has a great addition. There’s an executable, which I assume is distributable without a license, that allows VSTSDB projects to be deployed without a full Visual Studio install. This tool is going to revolutionize, again, database development and deployment. You have to love it!
SQL Server 2008 has been released, but the full capabilities of the product are still being discovered. Actually, this isn’t so much a case of discovery as of Microsoft, in the person of Bart Duncan, showing us something new. Read the article. This is one of the most exciting things I’ve seen in a long time. It’s especially exciting as databases become less and less manageable through the implementation of tools like nHibernate.
As I noted in a previous post, nHibernate will create a lot of execution plans. With the capabilities here, we’ll be able to quickly and easily aggregate some of those plans to identify the costly queries coming out of nHibernate without having to resort to 24-hour Profiler monitoring.
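A minimal sketch of that kind of aggregation, using the query_hash column that SQL Server 2008 added to sys.dm_exec_query_stats:

```sql
-- Roll the many near-identical statements an ORM generates up by query_hash,
-- then surface the most expensive groups by total CPU.
SELECT TOP (10)
       qs.query_hash,
       COUNT(*)                  AS cached_plans,
       SUM(qs.execution_count)   AS total_executions,
       SUM(qs.total_worker_time) AS total_cpu_time
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
ORDER BY SUM(qs.total_worker_time) DESC;
```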