Mr. Kunen just submitted this to the comments on the first Brain Pain post, but it deserves a bit more obvious, top of the bill placement. I’m just reading through the posts now, but I think it’s safe to say, these are must reads for everyone.
In my previous post, Spatial Data Hurts My Brain, I showed how a query used the spatial index, but hurt performance. Several people jumped in to try to help out. Thanks to Bob Beauchamin, Isaac Kunin, Gail Shaw, and Valerie Yakich (via Twitter).
I’ve definately learned a bit more than I knew previously about spatial data and spatial indexes. I’ve posted about it before, but it bears repeating, if you’re just starting out, I can’t recommend Alistair Aitchison’s book, Beginning Spatial with SQL Server 2008, enough. Here’s where I’m at. I think I was confusing/confused/confounded/something about what to expect from a spatial index. I’m going to present two queries and two indexes and try to explain why each works well, or not together, mainly as a means for enabling my understanding, not so much teaching anyone else. The first query is simply a retrieval of the data from a database where the distance between a supplied point and the points stored in the geography column are less than a certain value:
SELECT * FROM dbo.ebe WHERE ebe.[Location].STDistance(@location) < 15000
When this is run against the table with only a clustered index on the PK, this is the execution plan:
About what you would expect. The query returns almost 5000 rows of the 500000 in the table and runs for 14 seconds on my machine. Now I’m going to create this index and run the query again.
CREATE SPATIAL INDEX [ebe_High] ON [dbo].[ebe] ( [Location] ) USING GEOGRAPHY_GRID WITH (GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH) , CELLS_PER_OBJECT= 1024) ON [PRIMARY] GO
When the query is run a second time, here’s the execution plan.
This time the query ran in under a second to return the same rows. Clearly, spatial indexes do work to speed up execution. The execution plan is hard to see here I realize. I can’t upload zipped files or .sqlplan files, so if you really want a copy of the plan, I’ve pasted both into a Word doc: spatial_plans
I’m not going to try to explain everything happening with the plan, because I’m not entirely sure what all is. I’m still figuring that out. Suffice to say that the table valued function and the constant scan operator feed data into a seek against the clustered index (which is missing statistics for some reason) where it’s fed to a compute scalar and then finally it’s loop joined to the clustered index in what I assume is the equivalent of a key lookup. Again, I’m still figuring this part out. I skipped some steps.
Now, I want to modify the query because the goal is to not simply get all locations near this point of interest, but the locations of a specified organization:
SELECT * FROM dbo.ebe WHERE ebe.[Location].STDistance(@location) < 15000 AND ebe.[OrgId] = 4464
When I run this query, I get the following execution plan:
The query ran in 96ms and returned 148 rows. It’s all you could ask for. Or is it? What if I add this index:
CREATE NONCLUSTERED INDEX [ebe_nc] ON [dbo].[EBE] ( [OrgId] ASC ) INCLUDE (InterestId,Location) GO
Now when I run the query, I get the following execution plan:
Much, much less complex and quite easy to explain. I have a covering index that does a simple seek to retrieve the initial set of data and then it’s filtered to remove the locations that are too far from the point of interest. But, here’s where it gets interesting. The execution time was 126ms. The spatial index shaved about 30ms, or about 25%, off the time of what would normally be the fastest access path, a covering non-clustered index. That is a substantial savings.
I’ve done a lot of testing over the last few days and I didn’t always see this kind of performance benefit. Several times I’ve seen worse performance. Also, I’m not sure why a brand new index has missing statistics, as seen in the execution plans when the where the spatial index was used. Further, the optimizer is choosing the nonclustered index over the spatial index, clearly because of the estimated costs, when the spatial index does, in fact, perform better in this instance. I’m not sure what to do about that, apart from dropping the nonclustered index. That’s OK. It just means more to learn. Starting with, explaining exactly what’s happening in the spatial index execution plan. That may be the next post.
I forgot to mention. I ran the sp_help_spatial_geography_index DMV to see what information it could give, but when I supplied the parameter value listed above, I received a filter efficiency of zero. I’m clearly not understanding how to apply this DMV.
I just got the good word, I submitted an abstract for a spotlight session at the PASS Summit and it was accepted. Jazzed doesn’t begin to describe it. If you’re interested in hearing what I think about Best Practices for working with Execution Plans, please swing by.
I don’t usually make a big deal about my blog roll. I usually add either people I read regularly or people who ask to be added. I don’t have a hierarchy of belonging & all that (yet). But I want to point out one that I just recently put up. Chad Miller, creator of SQL Server PowerShell Extension, blogs about SQL Server & PowerShell. If you’re not already reading his stuff, you should. Unlike me, he really knows what he’s talking about.