It’s a major award

May 29, 2009 at 8:26 am (Misc, SQLServerPedia Syndication)

I try to shy away from humor in any of my posts because what I think is funny, others think is irritating or stupid (examples: I LOATHE Adam Sandler and I thought Hudson Hawk was funny. So there. I’m completely out of the main stream of America where comedy is concerned). I wanted to post something funny about getting an upgraded listing over at SQLRockStar’s Blog Rankings. I’ve started it several times, looking to the Oscars and trying to  imitate Sally Fields or maybe that little Italian guy, toss in a Native American being really sombre & sad, a few streakers, an old guy doing push-ups (ohe-handed, even though he cheated a bit on the form, I still can’t do those, even cheating)… None of it worked. I do appreciate wit, I just don’t have any.

Instead, I’ll leave the humor to Tom & Tim, they’re good at it.

Thanks Tom… and I want to thank the Academy, and my makeup artist, my mom, all the little people that made this possible. It’s about time too. I should have received it for other work and everyone knows it, but now I’m getting it for more mediocre work but as a consolation for not getting it for the work that I should have received it for and now some other poor shleb isn’t getting theirs this time but instead will be booting out another worthy at some later award period because then it’ll be their turn to get the sympathy award and the entire circle repeats because after all this is a totally political awards body and it’s about who you blow or know or know to blow and not about the strict artistry of our art and work but now they’re playing that stinking music and the microphone is starting to sink into the floor so you can watch some interpretative dance of the DBCC process which I’m informed involves Paul Randal in a kilt and for some reason a bunch of shirtless firemen. GOOD NIGHT TULSA!

Permalink 3 Comments

Isaac Kunen Spatial Indexes Roundup

May 28, 2009 at 10:48 am (Uncategorized)

Mr. Kunen just submitted this to the comments on the first Brain Pain post, but it deserves a bit more obvious, top  of the bill placement. I’m just reading through the posts now, but I think it’s safe to say, these are must reads for everyone.

Permalink 3 Comments

Spatial Index & Performance & Brain Pain

May 28, 2009 at 10:36 am (spatial data, SQL Server 2008, TSQL) (, , , , , )

In my previous post, Spatial Data Hurts My Brain, I showed how a query used the spatial index, but hurt performance. Several people jumped in to try to help out. Thanks to Bob Beauchamin, Isaac Kunin, Gail Shaw, and Valerie Yakich (via Twitter).

I’ve definately learned a bit more than I knew previously about spatial data and spatial indexes. I’ve posted about it before, but it bears repeating, if you’re just starting out, I can’t recommend Alistair Aitchison’s book, Beginning Spatial with SQL Server 2008, enough. Here’s where I’m at. I think I was confusing/confused/confounded/something about what to expect from a spatial index. I’m going to present two queries and two indexes and try to explain why each works well, or not together, mainly as a means for enabling my understanding, not so much teaching anyone else. The first query is simply a retrieval of the data from a database where the distance between a supplied point and the points stored in the geography column are less than a certain value:

SELECT *
FROM dbo.ebe
WHERE ebe.[Location].STDistance(@location) < 15000

When this is run against the table with only a clustered index on the PK, this is the execution plan:

 spatialindex_1

About what you would expect. The query returns almost 5000 rows of the 500000 in the table and runs for 14 seconds on my machine. Now I’m going to create this index and run the query again.

CREATE SPATIAL INDEX [ebe_High] ON [dbo].[ebe] 
(
[Location]
)
USING GEOGRAPHY_GRID 
WITH
(GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH)
, CELLS_PER_OBJECT= 1024) ON [PRIMARY]
GO

When the query is run a second time, here’s the execution plan.

 spatialindex_2

This time the query ran in under a second to return the same rows. Clearly, spatial indexes do work to speed up execution. The execution plan is hard to see here I realize. I can’t upload zipped files or .sqlplan files, so if you really want a copy of the plan, I’ve pasted both into a Word doc: spatial_plans

I’m not going to try to explain everything happening with the plan, because I’m not entirely sure what all is. I’m still figuring that out. Suffice to say that the table valued function and the constant scan operator feed data into a seek against the clustered index (which is missing statistics for some reason) where it’s fed to a compute scalar and then finally it’s loop joined to the clustered index in what I assume is the equivalent of a key lookup. Again, I’m still figuring this part out. I skipped some steps.

Now, I want to modify the query because the goal is to not simply get all locations near this point of interest, but the locations of a specified organization:

SELECT *
FROM dbo.ebe
WHERE ebe.[Location].STDistance(@location) < 15000
AND ebe.[OrgId] = 4464

When I run this query, I get the following execution plan:

 spatialindex_3

The query ran in 96ms and returned 148 rows. It’s all you could ask for. Or is it? What if I add this index:

CREATE NONCLUSTERED INDEX [ebe_nc] ON [dbo].[EBE] 
(
[OrgId]
ASC
) INCLUDE (InterestId,Location)
GO

Now when I run the query, I get the following execution plan:

 spatialindex_4

Much, much less complex and quite easy to explain. I have a covering index that does a simple seek to retrieve the initial set of data and then it’s filtered to remove the locations that are too far from the point of interest. But, here’s where it gets interesting. The execution time was 126ms. The spatial index shaved about 30ms, or about 25%, off the time of what would normally be the fastest access path, a covering non-clustered index. That is a substantial savings.

I’ve done a lot of testing over the last few days and I didn’t always see this kind of performance benefit. Several times I’ve seen worse performance. Also, I’m not sure why a brand new index has missing statistics, as seen in the execution plans when the where the spatial index was used. Further, the optimizer is choosing the nonclustered index over the spatial index, clearly because of the estimated costs, when the spatial index does, in fact, perform better in this instance. I’m not sure what to do about that, apart from dropping the nonclustered index. That’s OK. It just means more to learn. Starting with, explaining exactly what’s happening in the spatial index execution plan. That may be the next post.

UPDATE:
I forgot to mention. I ran the sp_help_spatial_geography_index DMV to see what information it could give, but when I supplied the parameter value listed above, I received a filter efficiency of zero. I’m clearly not understanding how to apply this DMV.

Permalink 3 Comments

Spatial Data Hurts My Brain

May 27, 2009 at 1:03 pm (spatial data, SQLServerPedia Syndication, TSQL) (, , , )

I’m still barely scratching the surface working with spatial data in SQL Server 2008. We’ve ported some of the data into a table where we built a geography spatial data column and we’re begginning to work with point data. The requirements from the developers are, so far, very simple. They’ll feed me a point and I find all the locations “close” to it. We had to go round & round on what defines “close” but finally settled on, I think, 15km.

The query to answer a question like this is ridiculously simple (a few object names have been changed):

SELECT ebe.[Location].STDistance(@Location) AS Distance,
ebe.[InterestId],
ebe.[Location].Lat AS Latitude,
ebe.[Location].Long AS Longitude,
ebe.[OrgId]
FROM dbo.[ebe] AS ebe
WHERE ebe[OrgId] = @OrgId
AND ebe.[Location].STDistance(@Location) < @CloseDistance

I’m not even hard-coding the “close” value so it can change when they change their minds. It retrieves exactly what’s needed. But… Well, look at the STATISTICS IO & TIME:

Table 'ebe'. Scan count 3, logical reads 40179
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 376 ms, elapsed time = 373 ms.

Not exactly snappy, is it? So, the obvious answer is to provide an index it can use so that it doesn’t have to scan the clustered index. Spatial indexes only support certain functions with the spatial column, STIntersects, STEquals, STDistance. Since I’m dealing with STDistance, I have a shot at this working. Here’s the script:

CREATE SPATIAL INDEX [ebe_spatial] ON [dbo].[ebe]
(
[Location]
)
USING GEOGRAPHY_GRID
WITH
( GRIDS =( LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT= 128,
PAD_INDEX = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]

When I then run the query, well, here’s the STATISTICS IO & TIME again:

Table 'Worktable'. Scan count 0, logical reads 0
Table 'extended_index_469576711_384000'. Scan count 2076
Table 'ebe'. Scan count 0, logical reads 448064
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 1155 ms, elapsed time = 1393 ms.

You read that correctly, it used the spatial index which caused the performance to decrease. There are adjustments you can make to spatial indexes. You can change the number of cells per object or set the detail level on the grid, but it’s hard to understand what this does to the index.

After fighting with it for a bit, I sent a tweet out on twitter, just whining about the index causing a slow-down. Surprised as anything, I get a response from Paul Randall, uh, wow, suggesting an approach. This is responded to by Isaac Kunen, you know the Microsoft PM & spatial expert. You just have to love Twitter. I took the information they gave me and did a few searches on the web. This lead to Bob Beauchemin’s blog and my introduction to some new Dynamic Management Views, sp_help_spatial_geometry_index and sp_help_spatial_geography_index. Now we’re talking. A mechanism to identify how useful the spatial data index is, similar to looking at sys.dm_db_index_physical_stats to see information about a clustered index. This lead to the following query and output:

DECLARE  @LocationGEOGRAPHY 
SET @Location = GEOGRAPHY::STPointFromText('POINT (-86.674582 36.148843)',4326)
EXEC sp_help_spatial_geography_index @tabname = 'EWH_BIM_Extract2'
,@indexname='ebe_spatial'
,@verboseoutput = 1
,@query_sample = @Location

The question is, how efficient is the index? In my case, the Primary_Filter_Efficiency was listed as 3.33333. Yes, that’s 3% efficient. No wonder it killed the performance. What do I do to make the index more efficient? I haven’t a clue. That may be the next blog post. But at least now I know how to evaluate the usefulness of a spatial index.

Permalink 13 Comments

How Can PowerShell Help You as a DBA

May 27, 2009 at 9:02 am (PowerShell)

Buck Woody has an article up on the Microsoft Scripting web site (which I didn’t know about before today) on how PowerShell will help you as a DBA. That’s a must read for every DBA.

Permalink 2 Comments

WHOOOP!

May 27, 2009 at 8:59 am (PASS, SQL Server 2005, SQL Server 2008, TSQL) (, , , , )

I just got the good word, I submitted an abstract for a spotlight session at the PASS Summit and it was accepted. Jazzed doesn’t begin to describe it. If you’re interested in hearing what I think about Best Practices for working with Execution Plans, please swing by.

Permalink 16 Comments

Odd TSQL Behavior

May 20, 2009 at 2:18 pm (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, )

Before I describe this, let me thank Lynn Pettis over at SQL Server Central for coming up with the answer.

This morning a developer walked up and asked me what would happen if he ran a query that looked like this:

SELECT 1.SomeString

I said that he’d get an error. No, he says, try it. So I try it and I get this:

Somestring
1

Try it yourself. It works just fine. I’d never seen that before and didn’t have a clue what it was. Thinking that Microsoft had supplied some new short hand to define aliases I ran this:

SELECT 'dude'.dudette

Which resulted in the error:

 

Msg 258, Level 15, State 1, Line 1
Cannot call methods on varchar.

 

Which is what I would have expected. I tried a couple of functions and some other bits & pieces of data, but only integers seemed to work… Or was it? What would be the purpose of a function that defines aliases for an integer?

I finally posted a question up on SQL Server Central. One of the local brain trust over there, Lynn Pettis, chewed on it for a bit and identified the issue. SQL Server is “helping” you by assuming that you meant to put a space between the period (dot or . ) and the string, Somestring, as an alias. I then tried this:

SELECT 'dude'dudette

And was rewarded with the same behavior.

Personally, in this case, I don’t see this as TSQL “helping” me nearly so much as it confused the heck out of me. Thanks again for figuring it out Lynn.

Permalink 2 Comments

Powershell Blog Worth Reading

May 14, 2009 at 2:55 pm (PowerShell) (, , )

I don’t usually make a big deal about my blog roll. I usually add either people I read regularly or people who ask to be added. I don’t have a hierarchy of belonging & all that (yet). But I want to point out one that I just recently put up. Chad Miller, creator of SQL Server PowerShell Extension, blogs about SQL Server & PowerShell. If you’re not already reading his stuff, you should. Unlike me, he really knows what he’s talking about.

Permalink Leave a Comment

More PowerShell Basics

May 14, 2009 at 9:22 am (PowerShell, SQLServerPedia Syndication) (, , , , )

Because of the data center move, we have a number of different functions that we’re running on totally different sets of servers on different days. None of this is part of our normal maintenance routines, so I’ve been using it as a great opportunity to stretch a little with PowerShell. I’m still learning, a lot.

The latest task was to get the databases of a list of servers backed up. I initially tried it using SMO. It works great. But it’s serial across all the servers as well as across the databases. I’m fine with serial backups on the databases (very, very fine, I saw a parallel backup of all databases once, pretty sparks, like the Fourth of July) but I really wanted all the servers to get backed up in parallel. So I tossed the idea of having PowerShell run the backups, but I did have PowerShell generate the backup scripts for me.

Again, this is simplistic, I know, but I’m really just getting started. I went to a switch instead of the if conditional for more of an excercise than anything. The last script I posted I’ve updated to a switch, but this one should still be an if, but hey, sue me.

[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | out-null
foreach ($svr in Get-Content "c:\scripts\serverlist.txt")
{
 $srv=New-Object "Microsoft.SqlServer.Management.Smo.Server" "$svr"
 
 $query = "SELECT bmf.physical_device_name ,bs.database_name FROM msdb.dbo.backupmediafamily AS bmf JOIN msdb.dbo.backupset AS bs ON bmf.media_set_id = bs.media_set_id where bs.backup_start_date > GETDATE() - 1 AND bs.[type] = 'D'" 

 $table = Invoke-Sqlcmd -Query $query -ServerInstance $svr 

 foreach($rows in $table)
{
  $path = "c:\scripts\bu_" + $svr.Replace("\","_") + ".sql"

  switch ($srv.Information.VersionMajor) 
  {
   10 $val="backup database [" + $rows["database_name"] + "] to disk = '" + $rows["physical_device_name"] + "' with compression"}
   default{$val = "master.dbo.xp_backup_database @database ='[" + $rows["database_name"] + "]', @filename ='" + $rows["physical_device_name"] + "', @threads = 1"}Add-Content -path $path -value $val
  }
 Add-Content -path $path -value $val
}
}

Permalink 4 Comments

PowerShell Basics

May 12, 2009 at 6:57 am (PowerShell, SQLServerPedia Syndication) (, , )

I’m just learning how to use PowerShell. I’ve been trying to spend time with it over the last year, ever since I saw Allen White‘s presentations at the PASS Summit last year. I just haven’t had the time. But recently, my company has been performing a multi-phased data center move. Because of it, I’ve needed to run tests & updates, disable jobs, all kinds of things on different sets of servers on different days. To paraphrase the old Superman cartoons “This looks like a job <voice gets deeper> for PowerShell.”

I’m not even remotely qualified to begin teaching anyone PowerShell. I had problems recently getting an IF conditional statement to work correctly. However, in the interest of sharing, in case you’re thinking about picking it up (do) or you’re actively working on trying to apply it to your SQL Server maintenance routines, let me give you a small piece of advice. There are millions of things you can do with this. Maybe more. But one of the most common that I see is simply getting a list of servers and then looping through the list performing an action on the server. You can do this using some of the built in sql server gadgets, or, because PowerShell is a .NET language, you can take advantage of constructs such as SMO (SQL Server Management Objects). I’ve done both, depending on whether I was trying to manipulate data, running scripts through the PS equivalent of sqlcmd, or manipulating the servers & objects, SMO. This really simple script can help you get going (at least, I’m using it for tons of stuff):

[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | out-null
foreach ($server in Get-Content "c:\Scripts\ServerList.txt")
{
$srv=New-Object "Microsoft.SqlServer.Management.Smo.Server" "$server"
if($srv.Information.VersionMajor -eq "10")
{
#Perform a command on SQL Server 2008
}
elseif($srv.Information.VersionMajor -eq "9")
{
#Perform a command on SQL Server 2005
}
else
{
#Perform a command on SQL Server 2000
}
}

Assuming I typed all that correctly (a little tighter syntax checking would be nice), this will cycle through all the servers included in the text file ServerList.txt.

This isn’t rocket science. You could have figured this out for yourself. Hopefully this saves you five or ten minutes. As I figure out more interesting things (assuming I do), I’ll post them up here.

Permalink 2 Comments

Next page »