I like to look at the searches that pop up as links into the blog just for ideas on posts. What the heck do I do with this one: “nested loop joins” “sql server 2008”
Yes, SQL Server 2008 has Nested Loop Joins. They do the same thing in 2008 that they did in 2005 and 2000.
The short answer: just like eating an elephant, one bite at a time. The longer answer…
First you have to define what you mean by a complex execution plan. For example, I've seen 16-page stored procedures that consist of 50 or more different statements executed in series (and I've heard of worse). No single statement was very big, but digging through all of them to identify which might be causing slow execution was a problem. This is the easiest type of complex execution plan to solve. In SQL Server 2005 and 2008, when you turn on a graphical plan for a query with multiple statements, each statement gets a separate execution plan. At the top of each plan is the query definition and a very important statement: Query Cost (relative to batch): x%.
This is telling you just how much the statement in question costs compared to the rest of the statements in the query. This is how you figure out which of the 50 or 500 statements you need to spend your time with. Easy.
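As a minimal sketch of what that looks like in practice (the table and column names here are hypothetical, not from the procedures I mentioned):

```sql
-- Run this batch in SSMS with "Include Actual Execution Plan" enabled.
-- Each statement gets its own graphical plan, headed by
-- "Query Cost (relative to batch): x%".
SELECT c.CustomerID,
       c.CustomerName
FROM dbo.Customer AS c
WHERE c.CustomerID = 42;

SELECT o.OrderID,
       o.OrderDate,
       c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customer AS c
    ON c.CustomerID = o.CustomerID
ORDER BY o.OrderDate DESC;
-- If the second statement shows, say, 95% of the batch cost,
-- that's where to spend your tuning time.
```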
More difficult is the second kind of query. This one statement contains hundreds of operations (the icons in a graphical execution plan). Now what? First, these queries, just like the small ones, have a cost for each operation. You just have to know how to navigate around the enormous execution plan. You can browse around using the scroll bars. Better still, there's a little plus sign (+) in the lower right-hand corner of the execution plan window.
Clicking on this opens an image of your entire execution plan, all umpty-gazillion operators, that allows you to scroll around the plan in the regular viewer window. You can simply browse until you find the operation that cost the most.
The image above is from a query written by one of our development teams. I'm not sure how many operators there are in the query. The query itself has six separate CTEs that are joining against a series of tables and recursing back to themselves. We're working on tuning it. It's a good example of the kind of query where you'll need to browse through a lot of operators to find the costly one. In this case, one of the CTEs results in 2.4 billion (yes, that's a 'b') rows getting cross-joined and then filtered down to the 50 rows that ultimately return. We found this by scrolling through the UI, where we saw an incredibly fat pipe connecting one of the operators to a join.
That should be enough to get you started. There’s more, including some tricks you can do with XML, but I think I’ll put it together in an article.
I just found my first fundamental difference between execution plans in SQL Server 2005 and SQL Server 2008. In 2005, when you run a query and have it display an XML execution plan by issuing the statement "SET STATISTICS XML ON", you get a second result set that includes a URL. The same thing happens in 2008. When you click on that URL in 2005, it opens the XML file in a new window. If you want to look at the graphical plan, you have to save the file with a "*.SQLPLAN" extension. In 2008, if you click on the URL, it opens a new window, but it shows the graphical plan. What happened to the XML? The developers at Microsoft are smart, that's what happened. If you right-click on that graphical plan, a menu choice "Show Execution Plan XML…" is available. Clicking on that opens the XML plan.
So I went one step further. I knew that the graphical plan in 2005 was just an XML plan, if not so readily accessible (do a “Save As” and force an XML extension on it). So I right clicked on a graphical plan that I generated the old fashioned way. Sure enough, the same menu choice to show the XML was there. Nice functionality.
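For reference, the basic pattern looks like this (the query itself is just a placeholder):

```sql
-- Capture the XML plan along with runtime statistics.
SET STATISTICS XML ON;

SELECT TOP (10) p.ProductID,
       p.ProductName
FROM dbo.Product AS p  -- hypothetical table
ORDER BY p.ProductName;

SET STATISTICS XML OFF;
-- The batch returns a second result set containing a link to the plan.
-- In 2005 the link opens the raw XML (save it with a .sqlplan extension
-- to view it graphically); in 2008 it opens the graphical plan, with
-- "Show Execution Plan XML..." on the right-click menu.
```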
Sort of like Where’s Waldo but with geeks. I assure you, I’m somewhere in this photo.
I mean me, not you. I’ve been accepted to present at PASS. I’m jazzed and totally freaked at the same time. I put in two abstracts, one based on my book, “Dissecting SQL Server Execution Plans” and the other based on an article I wrote published at SQL Server Central, “Deploying with DBPro to Multiple Environments.” Why then am I freaked? I’ve spent a year delving into execution plans. I won’t say I’m an expert, but I’m comfortable. I’ve been using DBPro for two years now, but I’m hardly eating and breathing it on a daily basis. Well, I wasn’t. From this point forward I’ll be neck deep in it daily. Watch for posts on this topic.
I just spent two days learning about project management and the Feature Driven Development methodology from Jeff De Luca. He’s a fascinating and informative guy. He’s actually going to be running a project and mentoring a bunch of people where I work. It’s going to be interesting times. I expect to learn a lot.
Why buggy whips? What the heck do they have to do with FDD? Nothing, directly. A big part of FDD is the development of business models. These models can, and usually do, correlate directly to objects/classes in code. Because of this, object oriented methods, while not an inherent part of FDD, are certainly easily automated and used by those designing and developing systems in FDD. Buggy whips? I'm getting to it. Mr. De Luca has spent a lot of time working in object oriented languages, primarily Java, and working with lots of object oriented development tools. Identifying automation methodologies to assist in development, with or without these objects, is an inherent part of FDD and of any intelligent developer's approach. Buggy whips? Hang on. One major area of automation is what the object oriented developer thinks of as the persistence layer. Others might refer to it as a database. Mr. De Luca very clearly stressed that writing TSQL and designing data storage were, for him and for true adherents to object oriented approaches, a thing of the past. Sure, data warehouses and operational data stores are still necessary for historical storage and ad hoc reporting, but the days of designing a database alongside the application were over.
Buggy whips. A long time ago, when I was just getting started in IT, I was working with desktop publishing software. I knew a bunch of people that did hot & cold typesetting and other types of traditional publishing. They were all convinced that desktop publishing was a niche or a flash in the pan. But someone I knew back then pointed out that these guys were manufacturing buggy whips as the Model T drove by. In other words, they were about to be out of a job and had better wake up and smell the coffee.
Buggy whips. I’ve worked as a developer and a DBA and finally landed in the somewhat odd position of being a full time development DBA. That means I spend a healthy chunk of every day thinking about database design, writing better TSQL code, and trying to train developers in the same. I’m sort of wondering if I just saw a Model T drive by?
I was thinking about this and composing this post when I read Steve Jones’ editorial this morning. DBAs are an increasingly in-demand skill set. Of course, that’s the generic DBA. It doesn’t specify whether that’s someone to design a BI system or a warehouse, run your backups, set up your DR plan, or help tune queries and design tables that aren’t coming out of an ORM tool.
Buggy whips. I think I agree with Steve that the constant growing of data means more and more demand for people to manage it, but now I’m wondering if that’s just the management side and not the development and design side? I’m wondering if I need to look into moving back into development if I want to do more than simply manage systems? I’m thinking maybe I need to spend more time learning BI. I’m wondering if anyone else saw that Model T drive by?
Just because a doctor says take two aspirin, taking an entire case won’t make you better. If one is good and two is better, 2 million might not be best. What the heck am I talking about? We have an application that, for one reason or another, has some overly complex procedures and queries. A lot of time and effort has been spent getting these queries to work well. Some of that time & effort came from one Microsoft consultant, Bill Sulcius. He’s a great guy, very smart, very helpful, and I learned a lot from him. On a few of the queries that Bill helped tune, he added some query hints, FAST 1 and KEEPFIXED PLAN. Where he added these hints, he could demonstrate, with good sets of data and thorough tests, a real benefit to their use. Excellent. Bill left the project and moved on. Unfortunately, because the project had gone to production and we had already put our very best efforts into it, my team sort of took our eye off it for a while. Oops.
After Bill left, leaving behind two aspirin in the form of those query hints on a few queries in a few stored procedures, the developers got hold of them. These hints had been introduced because the optimizer was a bit overwhelmed by the complexity of some of these queries and had used HASH or MERGE joins in places where a LOOP join would have been better. The FAST 1 hint pretty much forces a LOOP join, everywhere. The intent of the hint is to sacrifice time spent in the optimizer trying to find an optimal plan and instead get a plan that returns the first row as fast as possible, letting the rest of the processing and the rest of the rows come when they’re available. This can be a very useful hint. Unfortunately, our developers looked at it, saw it was useful, knew that some joins (in their minds, all joins) work better as LOOPs, and decided to put it everywhere. I mean, EVERYWHERE. It was on every single SELECT statement in every single procedure in the entire project. Be afraid.
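For anyone who hasn't run into it, the hint in question is applied like this (the query is a hypothetical stand-in, not one of ours):

```sql
-- OPTION (FAST 1) tells the optimizer to produce a plan that returns
-- the first row as quickly as possible, which usually favors Nested
-- Loop joins, at the possible expense of total execution time.
SELECT o.OrderID,
       o.OrderDate,
       c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customer AS c
    ON c.CustomerID = o.CustomerID
OPTION (FAST 1);
-- Test this against the unhinted version with real data volumes before
-- trusting it; the plan it shows you is not the whole story.
```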
Lo and behold, over the last few weeks, we’ve been getting all kinds of time-outs on queries that have been working “fine” for a couple of years. Now, understand, “fine” is a very relative word. In this case, it means, the business users weren’t complaining. These queries have had cruddy performance for a long time. I can’t post all the code because it’s way too long and it’s private. I’ll just show some parts of the execution plan.
Here’s part of the plan that is using FAST 1:
If you click on it to see it larger, you’ll see a series of Index Seeks with Nested Loop Joins. The cost relative to the batch is 0%. Nothing could be better. However, this is the FAST 1 execution plan. Behind this plan, you have no idea what is being done to retrieve the rest of the data. In fact, according to Brad McGehee (scroll down to the bottom), there’s a whole second plan masked behind this one, retrieving the rest of the data after the FAST 1 is complete.
If I remove the FAST 1 hint, two things happen. First, the execution plan changes:
Looking at this new plan, it appears to be more costly. The percentage of the batch went to 3%, and instead of nice nested loops, we’re seeing a merge. However, the number of scans was cut in half, from 4822 to 2142, and the reads dropped by about the same proportion, from 15151 to 7249. Overall execution time was reduced as well. This is a faster query.
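Scan and read counts like those come from the runtime statistics, not the graphical plan. A minimal sketch of how to gather them (the query is a placeholder):

```sql
-- Turn on I/O and timing statistics before running the query under test.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT o.OrderID,
       o.OrderDate
FROM dbo.Orders AS o;  -- hypothetical table

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- The Messages tab reports, per table, "Scan count n, logical reads m",
-- plus CPU and elapsed time. Comparing those numbers with and without
-- the hint shows which plan actually does less work, regardless of what
-- the estimated cost percentages suggest.
```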
It is possible, in similar circumstances, using the types of queries outlined above, to achieve performance increases. The Query Optimization Team at Microsoft explains how in their blog. You should carefully test this in your own code and remember: the execution plan you see isn’t the total picture.
I’ve spent five days struggling with this, but I finally have a custom performance collection up and running. The documentation available for Operations Manager 2007 (SCOM), especially for customization, and especially for databases, is weak, and that’s being kind. However, there are some guys working hard out there to make a difference. If you need help, the System Center Forum is the place to go. In particular, this little ditty on using property bag objects finally cracked the code for me. The good news: I’m going to write this up in an article for Steve over at SQL Server Central. The better news: I can finally report progress in my daily stand-ups. Sheesh!