PASS just published a new Top 10 list of mine over at the their web site. This one is the Top 10 Articles on the web if you’re trying to learn about spatial data. I’m not trying to say that I know what I’m doing with spatial data. I’m still feeling my way forward. These articles have proved to be the most useful in the learning I’ve done so far. I wanted to publish a little more information than we put into our Top 10 lists over at PASS. Having the list is good, but I thought it needed a bit of commentary to be complete.
All these articles are good and all the web sites hosting the articles have good stuff on them. A few of the web sites stand out. Paul Randal‘s site covers quite a lot more than spatial data, but he writes with such clarity that his posts are a must read. When you’re ready for more, you need to read the stuff at Bob Beauchemin’s blog. Again, there’s more than just spatial data to be had there, but Mr. Beauchemin has really done the research and he’s good enough to compile it for the rest of us. Those are, sort of, the introductory sites. When you’re ready to really and truly just go with all things spatial, the two sites that are going to prove most useful (or, at least that I’ve found most useful) are Spatial Ed(Ed Katibah) and Isaac Kunen. These are two of the people responsible for creating the spatial engine inside SQL Server. We can thank them for that, but better still, these guys are good about communicating what they’ve done, what it means, how it works, applications, ideas… You get the drift. If you’re really pursuing spatial data as an important component of your enterprise data, you need to read their stuff.
There are a few links that I couldn’t easily fit into the top 10 since they’re not discrete articles. I’ve already blogged about and reviewed Alastair Aitchison’s excellent book, Beginng Spatial with SQL Server 2008, but it’s worth another plug. You will also want to browse through the functionality being posted at the SQL Server Spatial Tools site over at CodePlex. Stuff there is pretty useful for getting your own functionality… functional. Finally, when you get stuck, if you get stuck, one of the best places to get unstuck is on the MSDN spatial forum, where a lot of the people already mentioned are answering questions and posting.
These are the resources I’ve found most useful in the little bit of spatial I’ve learned so far. I hope the top 10 list and this explanation of it prove useful.
UPDATE: Fixed link problem.
In my previous post, Spatial Data Hurts My Brain, I showed how a query used the spatial index, but hurt performance. Several people jumped in to try to help out. Thanks to Bob Beauchamin, Isaac Kunin, Gail Shaw, and Valerie Yakich (via Twitter).
I’ve definately learned a bit more than I knew previously about spatial data and spatial indexes. I’ve posted about it before, but it bears repeating, if you’re just starting out, I can’t recommend Alistair Aitchison’s book, Beginning Spatial with SQL Server 2008, enough. Here’s where I’m at. I think I was confusing/confused/confounded/something about what to expect from a spatial index. I’m going to present two queries and two indexes and try to explain why each works well, or not together, mainly as a means for enabling my understanding, not so much teaching anyone else. The first query is simply a retrieval of the data from a database where the distance between a supplied point and the points stored in the geography column are less than a certain value:
SELECT * FROM dbo.ebe WHERE ebe.[Location].STDistance(@location) < 15000
When this is run against the table with only a clustered index on the PK, this is the execution plan:
About what you would expect. The query returns almost 5000 rows of the 500000 in the table and runs for 14 seconds on my machine. Now I’m going to create this index and run the query again.
CREATE SPATIAL INDEX [ebe_High] ON [dbo].[ebe] ( [Location] ) USING GEOGRAPHY_GRID WITH (GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH) , CELLS_PER_OBJECT= 1024) ON [PRIMARY] GO
When the query is run a second time, here’s the execution plan.
This time the query ran in under a second to return the same rows. Clearly, spatial indexes do work to speed up execution. The execution plan is hard to see here I realize. I can’t upload zipped files or .sqlplan files, so if you really want a copy of the plan, I’ve pasted both into a Word doc: spatial_plans
I’m not going to try to explain everything happening with the plan, because I’m not entirely sure what all is. I’m still figuring that out. Suffice to say that the table valued function and the constant scan operator feed data into a seek against the clustered index (which is missing statistics for some reason) where it’s fed to a compute scalar and then finally it’s loop joined to the clustered index in what I assume is the equivalent of a key lookup. Again, I’m still figuring this part out. I skipped some steps.
Now, I want to modify the query because the goal is to not simply get all locations near this point of interest, but the locations of a specified organization:
SELECT * FROM dbo.ebe WHERE ebe.[Location].STDistance(@location) < 15000 AND ebe.[OrgId] = 4464
When I run this query, I get the following execution plan:
The query ran in 96ms and returned 148 rows. It’s all you could ask for. Or is it? What if I add this index:
CREATE NONCLUSTERED INDEX [ebe_nc] ON [dbo].[EBE] ( [OrgId] ASC ) INCLUDE (InterestId,Location) GO
Now when I run the query, I get the following execution plan:
Much, much less complex and quite easy to explain. I have a covering index that does a simple seek to retrieve the initial set of data and then it’s filtered to remove the locations that are too far from the point of interest. But, here’s where it gets interesting. The execution time was 126ms. The spatial index shaved about 30ms, or about 25%, off the time of what would normally be the fastest access path, a covering non-clustered index. That is a substantial savings.
I’ve done a lot of testing over the last few days and I didn’t always see this kind of performance benefit. Several times I’ve seen worse performance. Also, I’m not sure why a brand new index has missing statistics, as seen in the execution plans when the where the spatial index was used. Further, the optimizer is choosing the nonclustered index over the spatial index, clearly because of the estimated costs, when the spatial index does, in fact, perform better in this instance. I’m not sure what to do about that, apart from dropping the nonclustered index. That’s OK. It just means more to learn. Starting with, explaining exactly what’s happening in the spatial index execution plan. That may be the next post.
I forgot to mention. I ran the sp_help_spatial_geography_index DMV to see what information it could give, but when I supplied the parameter value listed above, I received a filter efficiency of zero. I’m clearly not understanding how to apply this DMV.
I’m pretty excited about this new data type and the functionality associated with it. I’m sure our company will have some application for this. However, I saw this interesting post over at SQL Server Central. If anyone can help shed some light on this question, I’m sure a lot of people would be interested in the answers.
I made an incorrect link above in the original posting. It was just an oopsie, not an intentional misdirection. My apologies.