Spatial Index & Performance & Brain Pain

May 28, 2009 at 10:36 am (spatial data, SQL Server 2008, TSQL) (, , , , , )


In my previous post, Spatial Data Hurts My Brain, I showed how a query used the spatial index, but hurt performance. Several people jumped in to try to help out. Thanks to Bob Beauchamin, Isaac Kunin, Gail Shaw, and Valerie Yakich (via Twitter).

I’ve definately learned a bit more than I knew previously about spatial data and spatial indexes. I’ve posted about it before, but it bears repeating, if you’re just starting out, I can’t recommend Alistair Aitchison’s book, Beginning Spatial with SQL Server 2008, enough. Here’s where I’m at. I think I was confusing/confused/confounded/something about what to expect from a spatial index. I’m going to present two queries and two indexes and try to explain why each works well, or not together, mainly as a means for enabling my understanding, not so much teaching anyone else. The first query is simply a retrieval of the data from a database where the distance between a supplied point and the points stored in the geography column are less than a certain value:

SELECT *
FROM dbo.ebe
WHERE ebe.[Location].STDistance(@location) < 15000

When this is run against the table with only a clustered index on the PK, this is the execution plan:

 spatialindex_1

About what you would expect. The query returns almost 5000 rows of the 500000 in the table and runs for 14 seconds on my machine. Now I’m going to create this index and run the query again.

CREATE SPATIAL INDEX [ebe_High] ON [dbo].[ebe] 
(
[Location]
)
USING GEOGRAPHY_GRID 
WITH
(GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH)
, CELLS_PER_OBJECT= 1024) ON [PRIMARY]
GO

When the query is run a second time, here’s the execution plan.

 spatialindex_2

This time the query ran in under a second to return the same rows. Clearly, spatial indexes do work to speed up execution. The execution plan is hard to see here I realize. I can’t upload zipped files or .sqlplan files, so if you really want a copy of the plan, I’ve pasted both into a Word doc: spatial_plans

I’m not going to try to explain everything happening with the plan, because I’m not entirely sure what all is. I’m still figuring that out. Suffice to say that the table valued function and the constant scan operator feed data into a seek against the clustered index (which is missing statistics for some reason) where it’s fed to a compute scalar and then finally it’s loop joined to the clustered index in what I assume is the equivalent of a key lookup. Again, I’m still figuring this part out. I skipped some steps.

Now, I want to modify the query because the goal is to not simply get all locations near this point of interest, but the locations of a specified organization:

SELECT *
FROM dbo.ebe
WHERE ebe.[Location].STDistance(@location) < 15000
AND ebe.[OrgId] = 4464

When I run this query, I get the following execution plan:

 spatialindex_3

The query ran in 96ms and returned 148 rows. It’s all you could ask for. Or is it? What if I add this index:

CREATE NONCLUSTERED INDEX [ebe_nc] ON [dbo].[EBE] 
(
[OrgId]
ASC
) INCLUDE (InterestId,Location)
GO

Now when I run the query, I get the following execution plan:

 spatialindex_4

Much, much less complex and quite easy to explain. I have a covering index that does a simple seek to retrieve the initial set of data and then it’s filtered to remove the locations that are too far from the point of interest. But, here’s where it gets interesting. The execution time was 126ms. The spatial index shaved about 30ms, or about 25%, off the time of what would normally be the fastest access path, a covering non-clustered index. That is a substantial savings.

I’ve done a lot of testing over the last few days and I didn’t always see this kind of performance benefit. Several times I’ve seen worse performance. Also, I’m not sure why a brand new index has missing statistics, as seen in the execution plans when the where the spatial index was used. Further, the optimizer is choosing the nonclustered index over the spatial index, clearly because of the estimated costs, when the spatial index does, in fact, perform better in this instance. I’m not sure what to do about that, apart from dropping the nonclustered index. That’s OK. It just means more to learn. Starting with, explaining exactly what’s happening in the spatial index execution plan. That may be the next post.

UPDATE:
I forgot to mention. I ran the sp_help_spatial_geography_index DMV to see what information it could give, but when I supplied the parameter value listed above, I received a filter efficiency of zero. I’m clearly not understanding how to apply this DMV.

3 Comments

  1. Bob Beauchemin said,

    Looking at your previous post, you need to change your query sample for the spatial stored proc. The query sample is not a single point, but a buffer of 15km around a point that you choose as input, as that’s what your looking at in your query.

    Change query sample to be:
    SET @Location = GEOGRAPHY::STPointFromText(‘POINT (-86.674582 36.148843)’,4326).STBuffer(15000);

    If you still don’t get reasonable answers from the proc, send me the results directly in email, and I’ll post a blog entry analyzing it.

    Cheers,
    Bob

  2. scarydba said,

    Yow!

    What a difference. 93.94% efficiency on the Primary_Filter.

    I knew I was missing something obvious. Thank you, very much.

  3. Spatial Indexes and a Lack of Data « Home of the Scary DBA said,

    […] Indexes and a Lack of Data I was feeling quite confident about my new-found abilities with spatial indexes so I did a presentation for my team, to share what I had learned. I had also […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: