Re-evaluating Execution Plans (again)

June 26, 2009 at 7:23 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, , , , )

I keep thinking I’ve got a handle on the way execution plans are dealt with in SQL Server. I had a pretty simplistic view of things, there’s the estimated plan which comes out of the optimizer and there’s the actual plan which comes out of the data engine. The one shows the best guess for what will happen based on the query and the underlying statistics. The other shows what actually happened based on the query and the real data, load on the server, available memory, you get the idea. This model is easy to explain and understand. Too bad it’s not quite the way things work.

If you query the dynamic management function sys.dm_exec_query_plan, you can see a query plan. Once you drill down on the XML, or browse through the GUI, you’ll see that this is an estimated plan, with no execution statistics at all. The safe assumption here is that SQL Server takes the execution plan created by the optimizer and stores it. Ah, but is that exactly how it works? Can you generate an estimated execution plan for this batch statement:

CREATE TABLE dbo.TempTable (id INT IDENTITY(1, 1)
,val VARCHAR(50) ) ;
INSERT INTO dbo.TempTable
(val)
VALUES
(‘dude’) ;
SELECT tt.*
FROM dbo.TempTable AS tt ;
DROP TABLE dbo.TempTable ;

No. You can’t. If you try to generate an estimated plan you’ll get the error “Invalid object name ‘dbo.TempTable’.” This is because the algebrizer, a part of the process of the optimizer, which is what generates the estimated plan, can’t resolve the table dbo.TempTable because it doesn’t exist yet. However, if you execute the batch above and then run this query:

SELECT deqp.query_plan
FROM
sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE
dest.text LIKE ‘create table%’

You can see the estimated plan for executing this batch. So… if the optimizer creates estimated plans, but it can’t create an estimated plan for this query since the algebrizer can’t resolve the object name… where did this estimated plan come from? Good question.

Here’s some more interesting info. You know that you can capture both estimated & actual execution plans through trace events, right? Watch what happens if you run the batch statement above and capture both the “Showplan XML” and “Showplan XML Statistics Profile” events, estimated and actual, respectively. You’ll get the estimated plan that is being stored in sys.dm_exec_query_plan and you’ll get an actual execution plan including actual rows vs. estimated, etc. Where did that execution plan come from? Another good question.

A better man than I, Tibor Karaszi, first noticed that there was, other than the execution statistics, no difference between “XML Showplan” and “XML Showplan Statistics Profile.” He asked whether or not we should ever expect to see a difference in a forum post. I jumped in with both feet, of course you’ll see a difference. After all, one is the estimated plan and one is the actual plan. Mr. Karaszi corrected a couple of my assumptions and suddenly, there I was, not understanding execution plans, again. I didn’t know where those estimated plans were coming from, not for sure. I assumed the optimizer, but then how were they getting past the algebrizer?

I had a theory and a mechanism for testing it. If I set up a trace and capture the recompile events using SQL:StmtRecompile and return the EventSubClass column, I can see what caused a recompile. When I run this query and check the trace information, I find that the EventSubClass resolves to 3-Deferred Compile. I’m not 100% certain that I’m correct here, but the following is what I think occurrs.

Because of the CREATE statement, the batch is going to fail in the optimizer. But, because it’s a DDL statement, it goes on to execute anyway. Then, a recompile puts the batch back through the optimizer. There, a table is now available where none was previously. An estimated execution plan is created. This plan, previously non-existent gets stored and can be accessed by us.

Within the facts available, this seems to explain the behavior. In fact, it’s basically supported by Connor Cunningham, one of the MS SQL Server developers, in some blog posts over at Conner vs. SQL.

Permalink Leave a Comment

More Spatial Headaches

June 11, 2009 at 2:38 pm (spatial data, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, , , , )

I keep thinking I’ve got a handle on spatial data and I keep finding out I’m wrong. I rewrote the cursor that my co-worker used into a query that joined the list of test criteria onto our list of locations. I used an OUTER APPLY to more or less run a check for each of the test criteria since, except for the proximity to the locations, there’s no actual relationship between the test criteria and the location data for me to join on.

The query looked something like this:

SELECT a
.CriteriaDesc
,a.CriteriaLoc
,l.[Location].Lat AS LocationLatitude
,l.[Location].Long AS LocationLongitude
,l.LocationDesc
FROM dbo.Alert AS a
OUTER APPLY (SELECT x.[Location]
		FROM dbo.MyLocs x
		WHERE x.OrgID = 42
		AND x.[Location].STDistance(a.AlertLocation) < 50000) AS l

The cursor was taking almost a full minute to return just under 400 rows worth of data. This query took about 25 seconds. SCORE!

Unfortunately when I looked at the execution plan, the optimizer was not able to recognize that my spatial index would increase the performance. So I modified the query to provide a hint:


FROM dbo.MyLocs x WITH (INDEX(xsHigh))

Now the query returned in 5 seconds. BIG SCORE! Then I remembered my co-worker and his low density index. So I quickly rebuilt it and changed the INDEX hint to use the low density index. It ran in just over 2 seconds. Yeah, it’s a score, but why? I thought the high density index was the way to go with point locations. I decided I just wanted to see the middle behavior and see if a medium density index took 3.5 seconds about mid-way between the low and high density indexes… It ran in just over 1 second. WHAT THE F…

I then ran a whole series of experiments with my query. I kept changing the density and the cells per object value for the three indexes and recording times. In every test, the medium density index outperformed the low and high density indexes. It never outperformed the low density index by a whole lot, usually running ~50% faster as opposed to running 5 times faster than the high density index. But it really did run better.

My current working theory, and I’m trying to figure out how I can validate this, is that the low and medium density indexes were more efficient at eliminating the negative values, the places where the criteria wasn’t in proximity to my locations. Then, the medium density index was just better at retrieving the data than the low density index. I base this on the fact that the primary filter efficiency on the medium density index is  better than the low (although not as good as the high).

Permalink 2 Comments

Spatial Indexes and a Lack of Data

June 9, 2009 at 1:44 pm (spatial data, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, , , , , )

I was feeling quite confident about my new-found abilities with spatial indexes so I did a presentation for my team, to share what I had learned. I had also been sharing with one co-worker as I developed the knowledge of spatial indexes. While I was preparing my presentation, he was preparing his. I had focused on finding a set of data that showed it’s proximity to a test location and then showing how retrieving that set of data was faster because of the spatial index. He took a different approach. He took the idea of saying, here’s a list of different test locations, let’s see which one of our internal locations meet the proximity test. At the same time, he tried three different spatial indexes, one with high granularity, one with medium and a final one with low granularity.

The day came for presentations. I showed how the spatial index improved performance and how you could read from sp_help_spatial_geography_index to see how efficient the index was for the sample data. It all went without a hitch. Then my co-worker showed his tests. Wherein the low density index outperformed high or medium density indexes, despite having a primary_filter at about 5% efficiency. I was floored. I couldn’t explain and nor could my co-worker. But as far he was concerned, this whole spatial index thing was for the birds.

I went away and I started testing. Finally, after feeling like I had a handle on things, I asked my co-worker for his scripts. He had a cursor (which he acknowledged was a lousy way to write the query) that walked through the test locations, one at a time, and ran my query against our data. Sure enough, when I carefully built out the indexes and ran the test, the low density indexes worked better than the high density indexes. I set up a trace event and captured all the statement completion events so I could compare the high & low density runs. That’s when I spotted the issue.

The majority of the test locations didn’t find any matches in our location data. When the query ran against the low density index it would usually have a few hundred reads and a duration around 20ms to return no data. On the other hand, the same query, returning no data, on the high density index had about 5000 reads and took about 34ms. With each one of these cumulative queries running faster against the low density index, it really appeared to be better. But, when you got a query where data was returned, the low density index had more reads and a lot longer duration, about 500ms for the low density index compared to a consistant 34ms for the high density index.

There was my explanation. The cursor masked the fact that over and over again, the query was running but no data was being returned. I’m in the process now of rewriting the query to use the same set of test locations, but JOIN it against our location data to see which of the indexes is faster.

Permalink 3 Comments

ORDER BY Speed

June 3, 2009 at 10:26 am (SQL Server 2008, SQLServerPedia Syndication, TSQL) (, , )

I answered a question on SSC with the comment that while an INT would perform better than a DATETIME in an ORDER BY query, assuming each has a viable index, that the difference wouldn’t be all that terribly substantial. Then I realized, maybe that’s not true. So I ran up a quick test, just to see.

First I created a little test table with the right indexes and loaded it with data:

CREATE TABLE dbo.IntDate
(IntCol INT NOT NULL,
DateCol DATETIME NOT NULL);

CREATE INDEX ixInt
ON dbo.IntDate(IntCol);
CREATE INDEX ixDate
ON dbo.IntDate(DateCol);

SELECT TOP 10000 IDENTITY( INT,1,1 ) AS n
INTO #Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2;

INSERT INTO dbo.IntDate (
IntCol
,DateCol)
SELECT t.n,
DATEADD(dd,- t.n,GETDATE() )
FROM #Tally AS t;

DROP TABLE #Tally;

Then I ran these two queries. Each one correctly accessed the index, doing a scan, to retrieve the data:

SELECT id.DateCol 
FROM dbo.IntDate AS id
ORDER BY BY DateCol DESC

SELECT  id.IntCol
FROM dbo.IntDate AS id
ORDER BY IntCol DESC

Here are the execution times:

Table 'IntDate'. Scan count 1, logical reads 26, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 70 ms.
(10000 row(s) affected)
Table 'IntDate'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 59 ms.

But what happens if I reverse the order?

SELECT id.DateCol 
FROM dbo.IntDate AS id
ORDER BY DateCol ASC 

SELECT id.IntCol
FROM dbo.IntDate AS id
ORDER BY IntCol ASC

Then results are:

Table 'IntDate'. Scan count 1, logical reads 26, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 52 ms.
(10000 row(s) affected)
Table 'IntDate'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 5 ms.

So, there can be a somewhat significant difference, although at least part of that is explained by the differences in the number of reads. Not exactly earth shattering, but good to know.

Permalink 2 Comments

Spatial Index & Performance & Brain Pain

May 28, 2009 at 10:36 am (spatial data, SQL Server 2008, TSQL) (, , , , , )

In my previous post, Spatial Data Hurts My Brain, I showed how a query used the spatial index, but hurt performance. Several people jumped in to try to help out. Thanks to Bob Beauchamin, Isaac Kunin, Gail Shaw, and Valerie Yakich (via Twitter).

I’ve definately learned a bit more than I knew previously about spatial data and spatial indexes. I’ve posted about it before, but it bears repeating, if you’re just starting out, I can’t recommend Alistair Aitchison’s book, Beginning Spatial with SQL Server 2008, enough. Here’s where I’m at. I think I was confusing/confused/confounded/something about what to expect from a spatial index. I’m going to present two queries and two indexes and try to explain why each works well, or not together, mainly as a means for enabling my understanding, not so much teaching anyone else. The first query is simply a retrieval of the data from a database where the distance between a supplied point and the points stored in the geography column are less than a certain value:

SELECT *
FROM dbo.ebe
WHERE ebe.[Location].STDistance(@location) < 15000

When this is run against the table with only a clustered index on the PK, this is the execution plan:

 spatialindex_1

About what you would expect. The query returns almost 5000 rows of the 500000 in the table and runs for 14 seconds on my machine. Now I’m going to create this index and run the query again.

CREATE SPATIAL INDEX [ebe_High] ON [dbo].[ebe] 
(
[Location]
)
USING GEOGRAPHY_GRID 
WITH
(GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH)
, CELLS_PER_OBJECT= 1024) ON [PRIMARY]
GO

When the query is run a second time, here’s the execution plan.

 spatialindex_2

This time the query ran in under a second to return the same rows. Clearly, spatial indexes do work to speed up execution. The execution plan is hard to see here I realize. I can’t upload zipped files or .sqlplan files, so if you really want a copy of the plan, I’ve pasted both into a Word doc: spatial_plans

I’m not going to try to explain everything happening with the plan, because I’m not entirely sure what all is. I’m still figuring that out. Suffice to say that the table valued function and the constant scan operator feed data into a seek against the clustered index (which is missing statistics for some reason) where it’s fed to a compute scalar and then finally it’s loop joined to the clustered index in what I assume is the equivalent of a key lookup. Again, I’m still figuring this part out. I skipped some steps.

Now, I want to modify the query because the goal is to not simply get all locations near this point of interest, but the locations of a specified organization:

SELECT *
FROM dbo.ebe
WHERE ebe.[Location].STDistance(@location) < 15000
AND ebe.[OrgId] = 4464

When I run this query, I get the following execution plan:

 spatialindex_3

The query ran in 96ms and returned 148 rows. It’s all you could ask for. Or is it? What if I add this index:

CREATE NONCLUSTERED INDEX [ebe_nc] ON [dbo].[EBE] 
(
[OrgId]
ASC
) INCLUDE (InterestId,Location)
GO

Now when I run the query, I get the following execution plan:

 spatialindex_4

Much, much less complex and quite easy to explain. I have a covering index that does a simple seek to retrieve the initial set of data and then it’s filtered to remove the locations that are too far from the point of interest. But, here’s where it gets interesting. The execution time was 126ms. The spatial index shaved about 30ms, or about 25%, off the time of what would normally be the fastest access path, a covering non-clustered index. That is a substantial savings.

I’ve done a lot of testing over the last few days and I didn’t always see this kind of performance benefit. Several times I’ve seen worse performance. Also, I’m not sure why a brand new index has missing statistics, as seen in the execution plans when the where the spatial index was used. Further, the optimizer is choosing the nonclustered index over the spatial index, clearly because of the estimated costs, when the spatial index does, in fact, perform better in this instance. I’m not sure what to do about that, apart from dropping the nonclustered index. That’s OK. It just means more to learn. Starting with, explaining exactly what’s happening in the spatial index execution plan. That may be the next post.

UPDATE:
I forgot to mention. I ran the sp_help_spatial_geography_index DMV to see what information it could give, but when I supplied the parameter value listed above, I received a filter efficiency of zero. I’m clearly not understanding how to apply this DMV.

Permalink 3 Comments

Odd TSQL Behavior

May 20, 2009 at 2:18 pm (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, )

Before I describe this, let me thank Lynn Pettis over at SQL Server Central for coming up with the answer.

This morning a developer walked up and asked me what would happen if he ran a query that looked like this:

SELECT 1.SomeString

I said that he’d get an error. No, he says, try it. So I try it and I get this:

Somestring
1

Try it yourself. It works just fine. I’d never seen that before and didn’t have a clue what it was. Thinking that Microsoft had supplied some new short hand to define aliases I ran this:

SELECT 'dude'.dudette

Which resulted in the error:

 

Msg 258, Level 15, State 1, Line 1
Cannot call methods on varchar.

 

Which is what I would have expected. I tried a couple of functions and some other bits & pieces of data, but only integers seemed to work… Or was it? What would be the purpose of a function that defines aliases for an integer?

I finally posted a question up on SQL Server Central. One of the local brain trust over there, Lynn Pettis, chewed on it for a bit and identified the issue. SQL Server is “helping” you by assuming that you meant to put a space between the period (dot or . ) and the string, Somestring, as an alias. I then tried this:

SELECT 'dude'dudette

And was rewarded with the same behavior.

Personally, in this case, I don’t see this as TSQL “helping” me nearly so much as it confused the heck out of me. Thanks again for figuring it out Lynn.

Permalink 2 Comments

Database Screening Questions

April 30, 2009 at 1:58 pm (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, )

With all the cool kids posting about beginners and interview questions, I thought I’d toss my favorites out there, from the brief-case gang point of view. These are the technical phone-screening questions I use after I look at a resume. There are only 10. They’re simple. Stupid simple. Silly even. Yet, I can count on eliminating 4 out 5 people who have a resume that looks like a qualified DBA. I’ve seen people with 10 years experience fail on these questions.

I’m only going to provide the questions. If you can’t find the answers on your own, you’re already disqualified:

  1. What is the difference between a clustered and non-clustered index?
    No, don’t tell me that one is clustered and the other is not. I don’t need specific low-level information on this, just a demonstration of knowledge that the difference is understood.
  2.  What is the difference between a block (b – l – o – c – k) and a deadlock (d – e – a – d – l – o – c – k)?
    And yes, I spell the words. I don’t want any chance of misunderstanding. And yet, most people carefully explain what a block is and then carefully explain what a block is again.
  3. Can you tell me two of the three recovery models in SQL Server and what the difference between them is?
    Again, I don’t need to know what happens differently inside of the checkpoint operation, just tell me what’s different. Yeah, I only ask for two since almost everyone only uses one of the two.
  4. Can you tell me a few things that might cause a query stored in cache to recompile?
    Let me tell you that, yes, they do. I’ve had several people argue with me on that question.
  5. What do you think the query hint NO_LOCK might do?
    This should be a give away. I’m not asking for specifics. I’m assuming you don’t know. Why would you say “I have no idea” to a question like this?
  6. Can you tell me some of the various types of backups that are available in SQL Server?
    If you give me three, I’ll be overjoyed. I need at least two.
  7. How did error handling change in SQL Server 2005?
    Note, not how do you write error handling based on the change in 2005, just, what was the change. I need to know you’re aware there was one.
  8. Do you have any experience working with [latest hot topic] inside SQL Server?
    Our latest is Microsoft Dynamics CRM. We’ve also asked the question about XML and other stuff. It’s just an attempt to understand you. Talk about what you know or don’t know.
  9. Do you have experience with Version X of SQL Server?
    Now I ask about 2008, but before I asked about 2005. “No” is a perfectly acceptable answer. “I’ve never heard of it” or “That’s not out yet” or “No, but I have lots of experience with 2009” are pretty much disqualifiers. Broke my rule about no answers there, but I hate seeing people get this one wrong.
  10. You’re the DBA. The phone rings. One of the users is on the line. They say “The database is slow.” Then they hang up. What do you do?
    My favorite was the guy who wanted to track down the user in order to get his name and his managers name and to fill out a series of forms before he’d even consider the technical aspects of the question. This is the only open-ended question I ask for screening.

Preparing this I went back through my notes. I keep notes on every interview. It’s creepy. Page after page of people who can’t answer even four of these questions. We only want you to correctly answer six before we bring you in for an interview.

So, if you’ve got five or ten years experience as a DBA and you think this was a tough quiz… time to evaluate what you’ve been doing. If you’re just starting out, here are some of the basics that it might be nice to know.

Permalink 32 Comments

Tim Ford’s Top 5 Indexing Best Practices

April 27, 2009 at 8:31 am (SQL Server 2005, SQL Server 2008, TSQL) (, , )

All I can really add to this is, yeah, me too. If you want some absolutely great advice on indexes, read this post. It’s a must.

And might I add, I’ve been the bad guy in Tim’s example. Once, many, many years ago, I was reading from the SQL Server 7.0 documentation. It suggested that compound indexes were no longer needed since the optimizer could build them on the fly using index intersection. I had a performance problem and a consultant was telling me to use a compound index. I swore up and down it wouldn’t work because Microsoft said so. He kept pushing and I kept pushing back. Finally, after a rather heated discussion in which I was convinced I had the upper hand, I got off the phone resolved to show this “ID 10 T” he didn’t know what he was talking about… Let me just say that after running some tests I did NOT enjoy the next phone call. Crow really tastes nasty.

Great post Tim. I’m looking forward to the book.

Permalink 1 Comment

Spatial Data

April 21, 2009 at 7:33 am (spatial data, SQL Server 2008, TSQL) (, )

I work for an insurance company. If you think that maybe, we might be interested in the physical location of the things we insure, you’d be right. Actually, we’re an insurance company predicated on the idea that risk can be managed. That means that not only do we know where your factory is located. We know the wind zone, earthquake zone, flood zone, rain zone and temperature zone it’s in. We send engineers out to the site to inspect it and recommend upgrades. We track the upgrades and the condition of your facility.

With all that location specific information, just how important do you think it is that with SQL Server 2008 we’re finally getting a spatial data type? Yeah, exactly.

We’re in the process of launching our first full implementation of the spatial data type and, quite frankly, I was not ready. I had read several of the BOL entries and a few articles online, but nothing had given me enough information for me to say I understood how spatial data worked. Now I at least feel like I’ve got a basis for understanding. That’s because of Alistair Aitchison‘s book Beginning Spatial.

This was not an easy read for me. The first few chapters are frankly difficult. The concepts behind spatial data are not exactly simple. However, Mr. Aitchison did a great job of presenting the information in well explained, digestible chunks. I was able to get through. Coming out the other side, I’ve still got a lot of questions and concerns (especially around spatial indexes, he could have done another three or four chapters on just that topic), but I’m ready to support our project going forward now.

If you’re dealing with Spatial data and you’re not already a guru on the topic, I’d strongly suggest you pick this up. If you’re an expert, this book isn’t for you. There are plenty of examples and Mr. Aitchison walks you through some important concepts, such as importing spatial data. Did you realize that Microsoft didn’t include any mechanisms for dealing with in SSIS? Instead you either need to get creative with TSQL or, better still, use a third party product. I got a copy of Shape2SQL and I’ve tried it out. It seems to do what we need. We still might end up buying a commercial product (several were listed in the book). But it was the information in the book that told me what I needed to know to get started working with our spatial data.

I need to track down more info on spatial indexes now.

Permalink Leave a Comment

SQL Server Standard Article Available

April 7, 2009 at 1:21 pm (PASS, SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, , , )

Unfortunately PASS decided to put the SQL Server Standard to sleep right after I got an article published in it (and no, it wasn’t my fault). However, PASS, being the service oriented organization that they are, has decided to make back issues of the magazine available online. So, to read the article I wrote comparing various methods of retrieving versioned data using different TSQL constructs, click this link. Go to page 14. Oh yeah, and you can see other peoples articles here too.

There are other things coming out of the editorial committee soon (although I need to get one of them done myself… yikes).

Permalink 3 Comments

« Previous page · Next page »