Re-evaluating Execution Plans (again)

June 26, 2009 at 7:23 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

I keep thinking I’ve got a handle on the way execution plans are dealt with in SQL Server. I had a pretty simplistic view of things: there’s the estimated plan, which comes out of the optimizer, and there’s the actual plan, which comes out of the data engine. The one shows the best guess for what will happen based on the query and the underlying statistics. The other shows what actually happened based on the query and the real data, the load on the server, the available memory, you get the idea. This model is easy to explain and understand. Too bad it’s not quite the way things work.

If you query the dynamic management function sys.dm_exec_query_plan, you can see a query plan. Once you drill down on the XML, or browse through the GUI, you’ll see that this is an estimated plan, with no execution statistics at all. The safe assumption here is that SQL Server takes the execution plan created by the optimizer and stores it. Ah, but is that exactly how it works? Can you generate an estimated execution plan for this batch statement:

CREATE TABLE dbo.TempTable (id INT IDENTITY(1, 1)
    ,val VARCHAR(50)) ;
INSERT INTO dbo.TempTable
    (val)
VALUES
    ('dude') ;
SELECT tt.*
FROM dbo.TempTable AS tt ;
DROP TABLE dbo.TempTable ;

No. You can’t. If you try to generate an estimated plan, you’ll get the error “Invalid object name ‘dbo.TempTable’.” This is because the algebrizer, part of the optimization process that generates the estimated plan, can’t resolve the table dbo.TempTable because it doesn’t exist yet. However, if you execute the batch above and then run this query:

SELECT deqp.query_plan
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE dest.text LIKE 'create table%' ;

You can see the estimated plan for executing this batch. So… if the optimizer creates estimated plans, but it can’t create an estimated plan for this query since the algebrizer can’t resolve the object name… where did this estimated plan come from? Good question.
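As an aside, if you want to reproduce that estimated-plan failure yourself from code, here’s a minimal sketch using SET SHOWPLAN_XML, which asks SQL Server to return estimated plans instead of executing the batches that follow (the “display estimated execution plan” button in Management Studio does roughly the same thing):

SET SHOWPLAN_XML ON ;
GO
-- Compiling this batch fails with "Invalid object name 'dbo.TempTable'."
-- because the table doesn't exist at compile time.
CREATE TABLE dbo.TempTable (id INT IDENTITY(1, 1)
    ,val VARCHAR(50)) ;
INSERT INTO dbo.TempTable (val) VALUES ('dude') ;
SELECT tt.* FROM dbo.TempTable AS tt ;
DROP TABLE dbo.TempTable ;
GO
SET SHOWPLAN_XML OFF ;
GO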

Here’s some more interesting info. You know that you can capture both estimated & actual execution plans through trace events, right? Watch what happens if you run the batch statement above and capture both the “Showplan XML” and “Showplan XML Statistics Profile” events, estimated and actual, respectively. You’ll get the estimated plan that is being stored in sys.dm_exec_query_plan and you’ll get an actual execution plan including actual rows vs. estimated, etc. Where did that execution plan come from? Another good question.

A better man than I, Tibor Karaszi, first noticed that there was, other than the execution statistics, no difference between “Showplan XML” and “Showplan XML Statistics Profile.” In a forum post, he asked whether or not we should ever expect to see a difference. I jumped in with both feet: of course you’ll see a difference; after all, one is the estimated plan and one is the actual plan. Mr. Karaszi corrected a couple of my assumptions and suddenly, there I was, not understanding execution plans, again. I didn’t know where those estimated plans were coming from, not for sure. I assumed the optimizer, but then how were they getting past the algebrizer?

I had a theory and a mechanism for testing it. If I set up a trace that captures the recompile events using SQL:StmtRecompile and returns the EventSubClass column, I can see what caused a recompile. When I run this query and check the trace information, I find that the EventSubClass resolves to 3 – Deferred Compile. I’m not 100% certain that I’m correct here, but the following is what I think occurs.
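If you want to run the same test, here’s a rough sketch of the kind of server-side trace I have in mind; the output file path is a placeholder, and the magic numbers come from the trace event documentation (event 166 is SQL:StmtRecompile, column 1 is TextData, column 21 is EventSubClass):

DECLARE @TraceID INT, @maxfilesize BIGINT, @on BIT ;
SET @maxfilesize = 5 ;
SET @on = 1 ;
-- Create a stopped trace writing to a placeholder file path.
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\Recompiles', @maxfilesize, NULL ;
-- Capture TextData (1) and EventSubClass (21) for SQL:StmtRecompile (166).
EXEC sp_trace_setevent @TraceID, 166, 1, @on ;
EXEC sp_trace_setevent @TraceID, 166, 21, @on ;
-- Start the trace; EventSubClass 3 indicates a deferred compile.
EXEC sp_trace_setstatus @TraceID, 1 ;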

Because of the CREATE statement, the batch is going to fail in the optimizer. But, because it’s a DDL statement, it goes on to execute anyway. Then a recompile puts the batch back through the optimizer. There, a table is now available where none was previously. An estimated execution plan is created. This plan, previously non-existent, gets stored and can be accessed by us.

Within the facts available, this seems to explain the behavior. In fact, it’s basically supported by Conor Cunningham, one of the MS SQL Server developers, in some blog posts over at Conor vs. SQL.


Visual Studio Team System Article

June 24, 2009 at 8:06 am (Visual Studio)

I wrote up an article on how we’re configuring and deploying databases to disparate systems, using a combination of database projects, server projects, and compound projects in conjunction with configurations. It has been published over at SQL Server Central. Please click over and read it.


Learning Spatial Data in SQL Server 2008

June 23, 2009 at 6:39 am (PASS, spatial data)

PASS just published a new Top 10 list of mine over at their web site. This one is the Top 10 Articles on the web if you’re trying to learn about spatial data. I’m not trying to say that I know what I’m doing with spatial data. I’m still feeling my way forward. These articles have proved to be the most useful in the learning I’ve done so far. I wanted to publish a little more information than we put into our Top 10 lists over at PASS. Having the list is good, but I thought it needed a bit of commentary to be complete.

All these articles are good and all the web sites hosting the articles have good stuff on them. A few of the web sites stand out. Paul Randal‘s site covers quite a lot more than spatial data, but he writes with such clarity that his posts are a must-read. When you’re ready for more, you need to read the stuff at Bob Beauchemin’s blog. Again, there’s more than just spatial data to be had there, but Mr. Beauchemin has really done the research and he’s good enough to compile it for the rest of us. Those are, sort of, the introductory sites. When you’re ready to really and truly just go with all things spatial, the two sites that are going to prove most useful (or, at least, that I’ve found most useful) are Spatial Ed (Ed Katibah) and Isaac Kunen. These are two of the people responsible for creating the spatial engine inside SQL Server. We can thank them for that, but better still, these guys are good about communicating what they’ve done, what it means, how it works, applications, ideas… You get the drift. If you’re really pursuing spatial data as an important component of your enterprise data, you need to read their stuff.

There are a few links that I couldn’t easily fit into the top 10 since they’re not discrete articles. I’ve already blogged about and reviewed Alastair Aitchison’s excellent book, Beginning Spatial with SQL Server 2008, but it’s worth another plug. You will also want to browse through the functionality being posted at the SQL Server Spatial Tools site over at CodePlex. Stuff there is pretty useful for getting your own functionality… functional. Finally, when you get stuck, if you get stuck, one of the best places to get unstuck is on the MSDN spatial forum, where a lot of the people already mentioned are answering questions and posting.

These are the resources I’ve found most useful in the little bit of spatial I’ve learned so far. I hope the top 10 list and this explanation of it prove useful.

UPDATE: Fixed link problem.


Object Database Editorial

June 17, 2009 at 7:57 am (nHibernate, Object Relational Mapping, SQL Server 2005, SQL Server 2008)

I never used to read editorials. Not in emails, magazines, newspapers, whatever. Now, I make it a point of always reading them. You can learn as much from an editorial as you can from the technical articles within, sometimes more. Tony Davis has just posted a guest editorial over at SQL Server Central. Tony is normally the editor at Simple-Talk, where he also writes interesting editorials. This one is not to be missed. It makes a very clear and concise case for why object databases have a fundamental flaw for most business needs (not all, not always, but a pretty hefty majority). It’s worth a read.


The Best Thing I Learned at PASS

June 17, 2009 at 7:35 am (PASS, SQLServerPedia Syndication)

And if you think keeping this below 250 words was easy… You try it.

I’ve received a lot from the PASS conference. I’ve attended technical sessions that blew my socks off and made me twitch in my seat, fighting the urge to start writing TSQL code immediately. I don’t think any of that is the best thing I learned at PASS. No, the best thing I learned at PASS is summed up in a single phrase: You Can.

My first summit, I ventured out of the hotel room and met people volunteering with PASS. They were having fun. I asked if I could join in. You Can, just volunteer. I volunteered and began to meet people online. The next summit, I met more people, names, people I had learned from. I asked them, can I hang out with you? You Can, pull up a chair. I started learning more, directly from the source. I began building a network of people who helped me and who I could help.

I began to use You Can myself. I was asked if I wanted to write a book. You Can. I tried submitting abstracts to speak at PASS. You Can’t was the first answer, but You Can came the following year.

Go to the PASS Summit. You Can meet the people you’ve learned from. You Can build a network of fellow database professionals. You Can learn from technical sessions. You Can volunteer to make PASS an even better community. You Can submit an abstract to share knowledge that’s unique to you. You Can.


PowerShell White Paper

June 14, 2009 at 7:14 am (PowerShell, SQLServerPedia Syndication)

Allen White, who introduced me to the wonders of PowerShell at last year’s PASS Summit, has published a white paper with Microsoft on using PowerShell within SQL Server 2008. This is great news. If you’re, as Brent Ozar so nicely put it, a noob like me when it comes to working with PowerShell, this is going to be a great resource.

Congratulations Allen, and thanks.


More Spatial Headaches

June 11, 2009 at 2:38 pm (spatial data, SQL Server 2008, SQLServerPedia Syndication, TSQL)

I keep thinking I’ve got a handle on spatial data and I keep finding out I’m wrong. I rewrote the cursor that my co-worker used into a query that joined the list of test criteria onto our list of locations. I used an OUTER APPLY to more or less run a check for each of the test criteria since, except for the proximity to the locations, there’s no actual relationship between the test criteria and the location data for me to join on.

The query looked something like this:

SELECT a.CriteriaDesc
    ,a.CriteriaLoc
    ,l.[Location].Lat AS LocationLatitude
    ,l.[Location].Long AS LocationLongitude
    ,l.LocationDesc
FROM dbo.Alert AS a
OUTER APPLY (SELECT x.[Location]
             FROM dbo.MyLocs AS x
             WHERE x.OrgID = 42
             AND x.[Location].STDistance(a.AlertLocation) < 50000) AS l

The cursor was taking almost a full minute to return just under 400 rows worth of data. This query took about 25 seconds. SCORE!

Unfortunately, when I looked at the execution plan, the optimizer was not able to recognize that my spatial index would improve performance. So I modified the query to provide a hint:

FROM dbo.MyLocs x WITH (INDEX(xsHigh))

Now the query returned in 5 seconds. BIG SCORE! Then I remembered my co-worker and his low density index. So I quickly rebuilt it and changed the INDEX hint to use the low density index. It ran in just over 2 seconds. Yeah, it’s a score, but why? I thought the high density index was the way to go with point locations. I decided I just wanted to see the middle behavior, to check whether a medium density index would take about 3.5 seconds, mid-way between the low and high density indexes… It ran in just over 1 second. WHAT THE F…

I then ran a whole series of experiments with my query. I kept changing the density and the cells per object value for the three indexes and recording the times. In every test, the medium density index outperformed the low and high density indexes. It never outperformed the low density index by a whole lot, usually running about 50% faster, as opposed to running five times faster than the high density index. But it really did run better.
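For anyone who wants to run similar experiments, this is roughly the shape of the index definitions I was cycling through; the index name is made up for illustration, and the GRIDS and CELLS_PER_OBJECT values shown are just one combination among the ones you’d want to vary:

-- Illustrative only: a medium grid density index on the same table and column.
CREATE SPATIAL INDEX xsMedium
ON dbo.MyLocs ([Location])
USING GEOGRAPHY_GRID
WITH (GRIDS = (LEVEL_1 = MEDIUM
    ,LEVEL_2 = MEDIUM
    ,LEVEL_3 = MEDIUM
    ,LEVEL_4 = MEDIUM)
    ,CELLS_PER_OBJECT = 16) ;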

My current working theory, and I’m trying to figure out how I can validate this, is that the low and medium density indexes were more efficient at eliminating the negative values, the places where the criteria weren’t in proximity to my locations, and that the medium density index was then just better at retrieving the data than the low density index. I base this on the fact that the primary filter efficiency of the medium density index is better than that of the low (although not as good as the high).


PASS Summit Session Preview

June 10, 2009 at 12:35 pm (PASS, SQLServerPedia Syndication)

The Pre/Post Conference Sessions and the Spotlight Sessions for the PASS Summit 2009 have been announced. Go check it out. This is going to be a fantastic conference this year based on the topics and the people presenting (and I’m still humbled to be included with that group).

I wasn’t aware that they had shifted the structure of the conference so that it takes place from Tuesday-Thursday instead of Wednesday-Friday. I’m not sure I like that. I’m not sure I dislike that.

In addition to the Pre/Post conference sessions, Andy Warren has put together a special new session, and program, on networking. Since networking is one of the very best reasons to attend the PASS Summit (although pure geek joy can be had in the sessions), this should be very useful.

Things are starting to ramp up around the Summit. If you’ve gone in years past, I think this one may be better than before. If you’ve never been, I think this may be the one to get started on.


Spatial Indexes and a Lack of Data

June 9, 2009 at 1:44 pm (spatial data, SQL Server 2008, SQLServerPedia Syndication, TSQL)

I was feeling quite confident about my new-found abilities with spatial indexes, so I did a presentation for my team to share what I had learned. I had also been sharing with one co-worker as I developed my knowledge of spatial indexes. While I was preparing my presentation, he was preparing his. I had focused on finding a set of data that showed its proximity to a test location and then showing how retrieving that set of data was faster because of the spatial index. He took a different approach. He took the idea of saying: here’s a list of different test locations, let’s see which of our internal locations meet the proximity test. At the same time, he tried three different spatial indexes: one with high granularity, one with medium, and a final one with low granularity.

The day came for presentations. I showed how the spatial index improved performance and how you could read the output of sp_help_spatial_geography_index to see how efficient the index was for the sample data. It all went without a hitch. Then my co-worker showed his tests, wherein the low density index outperformed the high and medium density indexes, despite having a primary_filter at about 5% efficiency. I was floored. I couldn’t explain it, and neither could my co-worker. But as far as he was concerned, this whole spatial index thing was for the birds.
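In case you haven’t run that procedure, here’s a minimal sketch of the call; the sample point is a placeholder, and you’d substitute your own table and index names:

-- Placeholder sample shape; among other things, the procedure reports
-- the primary filter efficiency of the index against this sample.
DECLARE @qs geography ;
SET @qs = geography::STGeomFromText('POINT(-71.06 42.36)', 4326) ;
EXEC sp_help_spatial_geography_index
    @tabname = 'dbo.MyLocs'
    ,@indexname = 'xsHigh'
    ,@verboseoutput = 1
    ,@query_sample = @qs ;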

I went away and I started testing. Finally, after feeling like I had a handle on things, I asked my co-worker for his scripts. He had a cursor (which he acknowledged was a lousy way to write the query) that walked through the test locations, one at a time, and ran my query against our data. Sure enough, when I carefully built out the indexes and ran the test, the low density indexes worked better than the high density indexes. I set up a trace event and captured all the statement completion events so I could compare the high & low density runs. That’s when I spotted the issue.

The majority of the test locations didn’t find any matches in our location data. When the query ran against the low density index, it would usually have a few hundred reads and a duration around 20ms to return no data. On the other hand, the same query, returning no data, on the high density index had about 5,000 reads and took about 34ms. With each one of these cumulative queries running faster against the low density index, it really appeared to be better. But when you got a query where data was returned, the low density index had more reads and a much longer duration, about 500ms for the low density index compared to a consistent 34ms for the high density index.

There was my explanation. The cursor masked the fact that over and over again, the query was running but no data was being returned. I’m in the process now of rewriting the query to use the same set of test locations, but JOIN it against our location data to see which of the indexes is faster.


Microsoft SQL Server Premier Field Engineers

June 8, 2009 at 8:12 am (SQL Server 2005, SQL Server 2008, TSQL)

Joe Sack has started a new team blog for the Microsoft SQL Server Premier Field Engineers. If you don’t know who they are, you should. The first post is just introductory, but this blog is likely to become a great resource. These are the guys that MS zip-lines into tough situations with the expectation that they’ll improve them. I’d strongly suspect these are fellows worth listening to.

