Spatial Data Hurts My Brain

May 27, 2009 at 1:03 pm (spatial data, SQLServerPedia Syndication, TSQL) (index, indexing, spatial data, sp_help_spatial_geography_index)

I’m still barely scratching the surface working with spatial data in SQL Server 2008. We’ve ported some of the data into a table where we built a geography spatial data column and we’re begginning to work with point data. The requirements from the developers are, so far, very simple. They’ll feed me a point and I find all the locations “close” to it. We had to go round & round on what defines “close” but finally settled on, I think, 15km.

The query to answer a question like this is ridiculously simple (a few object names have been changed):

SELECT ebe.[Location].STDistance(@Location) AS Distance,
ebe.[InterestId],
ebe.[Location].Lat AS Latitude,
ebe.[Location].Long AS Longitude,
ebe.[OrgId]
FROM dbo.[ebe] AS ebe
WHERE ebe[OrgId] = @OrgId
AND ebe.[Location].STDistance(@Location) < @CloseDistance

I’m not even hard-coding the “close” value so it can change when they change their minds. It retrieves exactly what’s needed. But… Well, look at the STATISTICS IO & TIME:

Table 'ebe'. Scan count 3, logical reads 40179
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 376 ms, elapsed time = 373 ms.

Not exactly snappy, is it? So, the obvious answer is to provide an index it can use so that it doesn’t have to scan the clustered index. Spatial indexes only support certain functions with the spatial column, STIntersects, STEquals, STDistance. Since I’m dealing with STDistance, I have a shot at this working. Here’s the script:

CREATE SPATIAL INDEX [ebe_spatial] ON [dbo].[ebe]
(
[Location]
)
USING GEOGRAPHY_GRID
WITH
( GRIDS =( LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT= 128,
PAD_INDEX = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]

When I then run the query, well, here’s the STATISTICS IO & TIME again:

Table 'Worktable'. Scan count 0, logical reads 0
Table 'extended_index_469576711_384000'. Scan count 2076
Table 'ebe'. Scan count 0, logical reads 448064
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 1155 ms, elapsed time = 1393 ms.

You read that correctly, it used the spatial index which caused the performance to decrease. There are adjustments you can make to spatial indexes. You can change the number of cells per object or set the detail level on the grid, but it’s hard to understand what this does to the index.

After fighting with it for a bit, I sent a tweet out on twitter, just whining about the index causing a slow-down. Surprised as anything, I get a response from Paul Randall, uh, wow, suggesting an approach. This is responded to by Isaac Kunen, you know the Microsoft PM & spatial expert. You just have to love Twitter. I took the information they gave me and did a few searches on the web. This lead to Bob Beauchemin’s blog and my introduction to some new Dynamic Management Views, sp_help_spatial_geometry_index and sp_help_spatial_geography_index. Now we’re talking. A mechanism to identify how useful the spatial data index is, similar to looking at sys.dm_db_index_physical_stats to see information about a clustered index. This lead to the following query and output:

DECLARE  @LocationGEOGRAPHY 
SET @Location = GEOGRAPHY::STPointFromText('POINT (-86.674582 36.148843)',4326)
EXEC sp_help_spatial_geography_index @tabname = 'EWH_BIM_Extract2'
,@indexname='ebe_spatial'
,@verboseoutput = 1
,@query_sample = @Location

The question is, how efficient is the index? In my case, the Primary_Filter_Efficiency was listed as 3.33333. Yes, that’s 3% efficient. No wonder it killed the performance. What do I do to make the index more efficient? I haven’t a clue. That may be the next blog post. But at least now I know how to evaluate the usefulness of a spatial index.

13 Comments

Manuel A. Rodriguez said,

May 27, 2009 at 1:34 pm

Great post. Spatial data is seriously challenging in my opinion. I love the title of the post. Reminds me of a Monthy pythos skit….brain hurts! I’m just getting into spatial data myself. I’ll keep post good articles on twitter on this if i find anything.
Love the jumpstarttv videos

ManuelRdgz
RBarryyYoung said,

May 27, 2009 at 2:08 pm

You know I’ve always wondered what SQL server was doing with the spatial indexes. Real 2d (let alone 3d!) indexing is a truly difficult thing, and not a solved problem AFAIK. Let us know if you find anything on the internals of spatial indexes, Grant.
Isaac Kunen said,

May 27, 2009 at 2:08 pm

Posting the plans you’re getting would help debug this—something is definitely
Isaac Kunen said,

May 27, 2009 at 2:09 pm

Posting the plans you’re getting would help debug this—something is definitely going wrong.

Cheers,
-Isaac
scarydba said,

May 27, 2009 at 2:16 pm

I’ll put up some more information on another post, thanks Isaac. I’m still not convinced my query or my data are flawed. Partly because I didn’t import the data, so I’m not convinced everything is correct. Partly because I didn’t write the original query and I’ve already had to rewrite it. Let me get it a bit more settled and I’ll see if it changes the output. If it does, that’ll be interesting too.
Bob Beauchemin said,

May 27, 2009 at 5:55 pm

If you’re working with point data, you want your spatial index grids to be all HIGH (you have them all MEDIUM now). You make the spatial index more effective by twiddling with the grids and cells-per-object.

My spatial index diagnostic blog entries entitled “Filter” and “Filter Output” describe what the filter info means. You’re trying to get the data type to use the primary filter or internal filter (both implemented by the index) rather than do the long computation.

If this is your real query on the blog, you want to look at this blog post: http://www.sqlskills.com/BLOGS/BOBB/post/How-to-ensure-your-spatial-index-is-being-used.aspx. You likely need to rewrite your query not to use variables, and make sure you install SP1. A spatial index is just like an ordinary index in that if you use variables the coster what know what the value is at plan creation time.

HTH,
Bob
scarydba said,

May 27, 2009 at 6:17 pm

Excellent. Thanks. Yeah, that’s not exactly the real query. I was planning on using parameters rather than hard-coding the “close” value just because the business can’t decide what “close” means. I’ll be doing a bunch more testing on it tomorrow.

Thanks again.

Grant
Gail said,

May 28, 2009 at 4:28 am

If you’re still on RTM, apply SP1. There were changes made around the costing for spatial queries and in SP1 the optimiser may actually select a spatial query without needing hints. It may still pick the wrong one (if you have multiple spatial indexes with different detail level), but it should select one.

RTM the optimiser almost never selected a spatial index without a hint.
Gail said,

May 28, 2009 at 4:29 am

The structure of spatial indexes is nothing revolutionary Barry. Briefly (and as I understand it) the space it tessellated and contains/doesn’t contain info is stored in a regular b-tree.
scarydba said,

May 28, 2009 at 7:12 am

Hey Gail,

Thanks for the pointers. We are on SP1. The problem in this case wasn’t so much getting the index to work, it worked. It was getting it to work correctly. I’ll be clear to do some more experiments with it this morning. I’ll post more info as I find it.
RBarryyYoung said,

May 28, 2009 at 8:40 am

Thanks Gail. Here I was hoping that someone had figured out how to effectively reverse quad-trees and use them as lookup indexes. Hmm, I guess tessellation is similar to that.
Gail said,

May 28, 2009 at 9:29 am

No, SQL is not using quad-indexes. I’ve seen a fair bit of commentary about that. Positive and negative.
Isaac Kunen said,

May 28, 2009 at 10:03 am

Hi Folks,

If you’re interested, I’ve rolled up a number of my spatial posts here: http://blogs.msdn.com/isaac/archive/2009/05/28/sql-server-spatial-indexing.aspx

Hope this helps.

-Isaac

Home of the Scary DBA