SQL Rally: Performance Tuning Abstract

October 28, 2010 at 8:00 am (PASS)

I get the call, you get the call, everyone gets the call. “Hey, my app/procedure/query/report is running slow.” Now what do you do? You go to my full-day session at SQL Rally, that’s what. Assuming you vote for it.

I didn’t post the abstract I submitted for the SQL Rally before because I thought that it would be redundant. However, since it’s not available right off the voting page (unless they’ve updated it since I voted), if you’re interested, here’s what I thought I would do for a day. If it sounds good to you, please go here and vote for it.

One of the most common problems encountered in SQL Server is the slow-running query. Once a query is identified as performing poorly, DBAs and developers frequently don’t understand how to diagnose it and often struggle to fix it. This one-day seminar focuses exclusively on those two topics. Attendees will learn how to identify the queries that are performing badly and how to fix them. We will start by learning how to gather performance metrics, both server and query metrics, using tools available directly from Microsoft such as Performance Monitor, DMVs and Profiler. From there we’ll move into learning how the optimizer works and how it uses statistics to determine which indexes and other database objects can assist the performance of a query. The session takes considerable time to show exactly how to generate and read execution plans, the single best mechanism for observing how the optimizer works. We’ll then look at other DMVs that can assist you when performance tuning queries. With all this knowledge gathered, we’ll move into looking at common performance problems, how they manifest in the metrics and execution plans, and how to address them. Finally, we’ll explore advanced methods for solving some of the more difficult query performance problems, introducing concepts such as query hints, plan guides and plan forcing. Throughout, best practices and common techniques will be reviewed. Attendees will go home with a working knowledge of query performance tuning, a set of methods for identifying poorly performing queries, scripts to assist in these processes and the knowledge of how to fix performance problems in their own systems.

To see the other sessions go here:
BI
DBA
Developer
Misc

Although I would prefer that you vote for me, it’s more important that you vote at all (same thing as in real life). Please go here and select the sessions that you want to see.

24 Hours of PASS: Summit Preview

August 11, 2010 at 8:58 am (PASS, SQL Server 2008, SQLServerPedia Syndication, TSQL)

Registration is open for the second 24 Hours of PASS this year. This one is going to be a preview of the Summit itself, so all the sessions are tied, in some manner, to sessions being given at the Summit. Here’s a link to go and register.

I’m very excited to be able to say that I’ll be presenting in this 24HOP. One of my presentations at the Summit this year is Identifying and Fixing Performance Problems Using Execution Plans. It covers pretty much what it says, methods for fixing performance problems by exploring the information available within execution plans. But, how do you know you have a performance problem? That’s where my preview session comes in. Identifying Costly Queries will show you several ways to gather metrics on your system so that you can understand which queries are causing you the most pain. Once you know which queries need tuning, you can use execution plans to tune them. Whether you’ll be attending the PASS Summit or not, and whether or not you’ll go to my session once you’re there, I think this 24HOP session will be useful to help you understand where the pain points are within your own systems. I hope you’ll attend.

More importantly though, check out all the other great sessions. This is an excellent collection of presenters and presentations. For anyone who has ever said “PASS doesn’t do anything for me,” I want you especially to take a look at the amazing training opportunities being offered by PASS, for free. The volunteers that run PASS do amazing things and this is just one of them. Take advantage of this opportunity and, hopefully, recognize that PASS is doing things for you. This just barely scratches the surface of all that PASS offers.

I’m a Pain

June 10, 2010 at 10:22 am (SQL Server 2008, SQLServerPedia Syndication)

Or maybe I’m misreading that I’m supposed to be helping to alleviate pain. Either way, next Thursday, June 17, I’ll be taking part in the Quest Pain of the Week webcast: Easy Ways to Fix Hard SQL Server Problems – Without a Guru. It should be a lot of fun. Please stop by and ask hard questions.

T-SQL Tuesday #1: Date/Time Tricks

December 8, 2009 at 10:44 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

I’m going to try out Adam Machanic’s idea for a blog party. The topic this month is Date/Time tricks.

Instead of supplying a trick for Date/Time, I’m going to caution you about the tricks that you use. Let’s take a simple issue. You want to pull back data from a table, let’s use the Production.TransactionHistoryArchive in AdventureWorks2008, for a given month of data. Before we run the query, let’s create an index on the table:

CREATE INDEX ixTest ON Production.TransactionHistoryArchive
(TransactionDate)

The query itself is pretty simple. This is one mechanism that will retrieve the data for the month of July in 2003:

SELECT tha.TransactionID
FROM Production.TransactionHistoryArchive AS tha
WHERE DATEPART(yy,tha.TransactionDate) = 2003
AND DATEPART(mm,tha.TransactionDate) = 7

In theory you should be able to use the index that was created earlier, but instead, you’ll see this execution plan:

The problem is occurring because there is a function running against the column. This is going to force a scan, even though you have a good index. Rewriting the query so that it looks like this:

SELECT tha.TransactionID
FROM Production.TransactionHistoryArchive AS tha
WHERE tha.TransactionDate
BETWEEN '2003/7/1'
AND '2003/7/31'

Eliminates the function on the column so that the execution plan is now a nice clean index seek:

Whatever tricks you begin to apply to date/time, be careful of how you apply them. And, if you try a fix like the one I supplied above, be sure it returns the data you think it returns; testing is the key to applying anything you read on the internet.
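For what it’s worth, one pattern I’d test as an alternative is a half-open date range with unambiguous date literals, since BETWEEN’s endpoints and regional date formats are both easy to get wrong. This is only a sketch of that approach, against the same table and index:

SELECT tha.TransactionID
FROM Production.TransactionHistoryArchive AS tha
-- Half-open range: everything on or after July 1 and strictly before August 1,
-- so rows with time components on July 31 aren't silently excluded
WHERE tha.TransactionDate >= '20030701'
AND tha.TransactionDate < '20030801';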

SQL Server XQuery For Idiots

November 30, 2009 at 9:17 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

I’m still struggling with learning XQuery. My latest little revelation was small, but vital.

The difference between:

@inXML.nodes('/rss/channel/item')

and

@inXML.nodes('/rss[1]/channel[1]/item')

is the difference between a 10-minute query and a 4-second query. Kind of shocking, really. My understanding is that XQuery assumes there are multiple possible paths that look like “/rss/channel”, so it searches over and over again through the XML to see if it can find them. But by adding the [1] ordinals, showing that only one node is possible at each step, it stops stumbling around in the dark and simply reads the data.
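For context, here’s a minimal, self-contained sketch of the faster form; the XML document and element values are made up for illustration:

DECLARE @inXML XML;
SET @inXML = '<rss><channel><item><title>First</title></item><item><title>Second</title></item></channel></rss>';

-- The [1] ordinals tell the engine that each step up the path is a single node
SELECT itm.value('(title/text())[1]', 'VARCHAR(50)') AS Title
FROM @inXML.nodes('/rss[1]/channel[1]/item') AS nd(itm);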

It’s a little thing, but it made an enormous difference. I’ve still got a long way to go in learning how to use XPath within XQuery.

UPDATED: I modified the title so that it’s clear I’m talking about SQL Server here.

Gila Monster on Estimated vs. Actual Row Counts

September 22, 2009 at 6:47 am (SQL Server 2005, SQL Server 2008, TSQL)

I don’t generally do lots of blog aggregation and cross-post linking & stuff. It’s just not something I’m that into. However, this time is an exception. Gail Shaw, Gila Monster to those who hang out on SQL Server Central, has posted an excellent explanation of times when Estimated and Actual row counts vary for a reason. I’m one of those who emphasizes that a difference between estimated & actual is an indication of… something. It could be out-of-date or missing statistics, or it could be caused by query constructs like multi-statement table valued functions, but it’s usually an indication of a problem. Except when it’s not. Read Gail’s explanation for more.

Identifying Frequently Running Queries

September 21, 2009 at 9:13 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

It’s not enough to look at the longest running query. What if you have two queries, one that runs for 5 minutes and one that runs for 5 seconds? Which do you tune? The first one, of course. But, let’s add a little information to our thought experiment. The one that runs for 5 minutes is called at 2AM, your local time, when there are almost no users on the system, and the one that runs for 5 seconds is called 40 times a minute, all day long. Now, which one do you tune? That’s right, suddenly that 5-second query takes on a HUGE significance based on how often it’s called.

But how do you know how often a query is called? The easiest way to gather that information, though not necessarily the best as a long-term methodology, is a server-side trace that captures RPC and SQL batch completion events. The beauty of the server-side trace is how easy it is to set up and how much information you can gather, so quickly. You can immediately start getting every query run on the server along with how long each took to run, how much CPU it used and various other metrics. You can run this for short periods or long, then gather the data together into a table in a database and run aggregation queries against it to generate reports showing the most frequently called queries. Simple, quick, easy. But there are a couple of problems with it. First, while you can run it 24×7, you’re going to see quite a bit of data generated from the events, depending on the load on your system, of course. Second, the queries that come through will likely have all kinds of stuff associated with them, various parameter values and whatnot, that will make aggregation hard. You’ll have to put together a means of cleaning and maintaining the data, or purchase one of the many third-party tools for doing this. You’ll have to store and manage large amounts of data as it comes into the system. It’s going to be a lot of work.
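To make the setup concrete, here’s a rough sketch of such a trace; the output file path is a placeholder and the column list is deliberately minimal:

DECLARE @TraceID INT;
DECLARE @maxfilesize BIGINT;
DECLARE @on BIT;
SET @maxfilesize = 50;
SET @on = 1;

-- Create the trace, writing to a placeholder file path
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\QueryCapture', @maxfilesize;

-- Capture TextData (1), Duration (13) and CPU (18) for
-- RPC:Completed (event 10) and SQL:BatchCompleted (event 12)
EXEC sp_trace_setevent @TraceID, 10, 1, @on;
EXEC sp_trace_setevent @TraceID, 10, 13, @on;
EXEC sp_trace_setevent @TraceID, 10, 18, @on;
EXEC sp_trace_setevent @TraceID, 12, 1, @on;
EXEC sp_trace_setevent @TraceID, 12, 13, @on;
EXEC sp_trace_setevent @TraceID, 12, 18, @on;

-- Set the trace running
EXEC sp_trace_setstatus @TraceID, 1;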

Another option, one that requires a degree of setup, is a little less accurate: it’s unlikely to capture every single execution of every single query, but it will get the majority. That option is sys.dm_exec_query_stats. This handy little dynamic management view carries aggregate data about all the queries currently in the cache. It will show how much accumulated processing time a query has taken, the last time it ran, the longest time, the shortest time and lots of other information, but the value we’re most interested in here is the usage count. If you just run a SELECT statement against sys.dm_exec_query_stats, you’ll get interesting data back, but you won’t see your query text or your execution plan. To make that information available, you’ll need to combine it with other DMVs. Here’s a sample query that pulls back information that could be useful for identifying badly performing queries on your system, including how often they’re called.

SELECT dest.text
,deqp.query_plan
,deqs.execution_count
,deqs.creation_time
,deqs.query_hash
,deqs.query_plan_hash
,deqs.total_elapsed_time
,deqs.total_clr_time
,deqs.total_physical_reads
,deqs.total_worker_time
FROM sys.dm_exec_query_stats deqs
CROSS APPLY ( SELECT *
           FROM sys.dm_exec_sql_text(deqs.sql_handle)) dest
CROSS APPLY ( SELECT *
           FROM sys.dm_exec_query_plan(deqs.plan_handle)) deqp

There’s a lot you could do with this query, but the real power is going to come from gathering this data on a regular basis. It’s going to depend on how volatile the cache is on your system. If most of your queries sit in the cache for weeks (and that’s possible), then you could run a query once a week to gather the information. If your cache is getting flushed clean once an hour, you’d need to run this query more often. Regardless of how often you run the query, understand that, because you’re dependent on the cache, it is possible to miss data. Once you get the query running on your system, you can use the information to determine which of the queries needs to be tuned, not just on how long it runs, but also how often it runs, or even using other combinations of the information returned.
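If you do decide to gather it on a schedule, one simple approach is to snapshot the DMV into a history table from a job; this is just a sketch, and the table name and column list are hypothetical:

-- Hypothetical table for periodic snapshots of the cache
CREATE TABLE dbo.QueryStatsHistory (
    CaptureTime DATETIME NOT NULL,
    QueryHash BINARY(8) NOT NULL,
    QueryPlanHash BINARY(8) NOT NULL,
    ExecutionCount BIGINT NOT NULL,
    TotalElapsedTime BIGINT NOT NULL,
    TotalWorkerTime BIGINT NOT NULL
);

-- Run this from a scheduled job at whatever interval suits your cache volatility
INSERT INTO dbo.QueryStatsHistory
SELECT GETDATE(),
    deqs.query_hash,
    deqs.query_plan_hash,
    deqs.execution_count,
    deqs.total_elapsed_time,
    deqs.total_worker_time
FROM sys.dm_exec_query_stats AS deqs;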

One other point, you may see ad hoc queries on your system. Here’s a simple example:

SELECT TOP (5) sod.*
FROM Sales.SalesOrderDetail sod
WHERE sod.ProductID = 772
ORDER BY sod.SalesOrderID ASC ;

If the value passed to sod.ProductID is changed to 773, or any other value, you will get a different entry in cache for this query. That’s where you can put the query_hash or the query_plan_hash values to work. These values are a hash that SQL Server creates for each query that it stores in cache. You can group by these values to aggregate ad hoc queries that would, if they were stored procedures, be identical executions, and arrive at an aggregated count, as in the sketch below.
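Here’s a rough sketch of that aggregation; the columns chosen to sum are just examples:

SELECT deqs.query_hash,
    SUM(deqs.execution_count) AS TotalExecutions,
    SUM(deqs.total_worker_time) AS TotalWorkerTime
FROM sys.dm_exec_query_stats AS deqs
GROUP BY deqs.query_hash
ORDER BY TotalExecutions DESC;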

So, if you need precise information, you can use trace events, but you’ll pay for that with a lot of data that you have to manage.

EDIT: Changed deqs.total_elapsed_date to deqs.total_elapsed_time as per comments below.

No Join Predicate

September 15, 2009 at 8:00 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

You could be looking at an execution plan on a query and see this message: Warning, No Join Predicate. With a very few exceptions (very few), that’s not a warning, like “Mind the gap” or “Your Mileage May Vary” or “Never fight a land war in Asia.” No. It’s a screaming shout saying “BY ALL THE GODS IN VALHALLA, YOU DON’T HAVE ANY JOIN PREDICATES IN THIS QUERY!” Seriously, that’s exactly what it says.

But, you might be asking yourself, that’s what it says, but what does it mean? Please allow me to explain. The tables in your system, whether it’s a classic BI star schema, or a highly normalized OLTP system, or even (shudder) ORM objects, are related to one another. Usually they’re related through the use of primary and foreign keys. The primary key uniquely identifies a row, like a particular value in a dimension table. This is then related to some number of rows in another table, like a fact table, through the foreign key. The foreign key acts to protect the integrity of the data, ensuring that if you insert something that’s supposed to relate to another row, that other row really exists, and if you try to delete that other row, the foreign key will remind you that, hey, that row has things that depend on it existing in another table. All of this can be quite complicated or quite simple. Heck, you may be using an ORM system and not have a single foreign key in sight.
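To make that concrete, here’s a minimal sketch of the dimension/fact relationship just described; the table and column names are invented for illustration:

CREATE TABLE dbo.DimProduct (
    ProductID INT NOT NULL PRIMARY KEY,
    ProductName VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.FactSales (
    SalesID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    -- The foreign key guarantees every sale points at a product that exists
    ProductID INT NOT NULL
        CONSTRAINT FK_FactSales_DimProduct
        FOREIGN KEY REFERENCES dbo.DimProduct (ProductID),
    SalesAmount MONEY NOT NULL
);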

But, when you go to retrieve data from two related tables in a TSQL query, you need to define that relationship in the TSQL itself. SQL Server 2008 relies on what was defined in the ANSI standard back in 1992. But, if you learned your SQL coding skills back in the day, or on Oracle, you may be used to simply listing tables and then defining their relationships through the WHERE clause. Just a side note: unless you learned your SQL skills on Oracle, if you were born AFTER the 1992 standard was enacted, you don’t have an excuse. Anyway, the old way the code looked was this:

SELECT…
FROM TableA AS a, TableB as b, TableC as c
WHERE a.MyID = b.MyID
AND b.YourID *= c.YourID
AND….

What you have there is an inner join, or in the old parlance an equi-join because of the use of the equals sign, between TableA and TableB, and a left outer join, defined through the use of the ‘*=’, between TableB and TableC. Any other criteria that define the WHERE clause then follow.

Flat out, in SQL Server 2008, you can’t do this any more. It will not support that style of outer join. The newfangled method, remember, new as of 1992, is this:

SELECT …
FROM TableA as a
INNER JOIN TableB as b
ON a.MyID = b.MyID
LEFT JOIN TableC as c
ON b.YourID = c.YourID
WHERE….

This forces you to separate the relations between the tables, as defined by the JOINs, from the filtering criteria supplied through the WHERE clause. And yes, it’s a lot more complicated than I’m making it out to be, but I’m trying to establish the ground rules here. Where was I? Oh yeah, missing join criteria.

What happens is, someone who’s used to writing things in the ANSI ’89 syntax can either forget to define the JOIN criteria or accidentally remove it while editing or something. Regardless of how it happened, you’ve ended up with what is known as a Cartesian product: shortest possible explanation, everything joined on everything else. For example, here’s a real query against the AdventureWorks2008 database:

SELECT *
FROM Sales.SalesOrderHeader AS soh
,Sales.SalesOrderDetail AS sod
,Production.Product AS p
WHERE soh.SalesOrderID = 43659

That query does not define join criteria between any of the tables involved. This results in the following estimated execution plan (estimated because I got sick of waiting for the query to finish):

MissingJoin

If you look at the final Nested Loops operator right before the SELECT operator, you’ll notice there is a little exclamation point inside a yellow triangle. This is an indication of a warning in your query. You can hover over the operator to get the tool tip, which looks like this:

NoJoinPredicate

You can see at the bottom the warning that no join predicate is in use. If you constructed the query appropriately it would look more like this:

SELECT *
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID
JOIN Production.Product AS p
ON sod.ProductId = p.ProductID
WHERE soh.SalesOrderID = 43659;

Which, instead of returning more than 800,000 rows before I interrupted it, returned only 12 rows and had a very clean execution plan:

Join

I would never consider “No Join Predicate” as a warning. It’s a blaring klaxon of a problem and you need to do something about it, right away.

LaRock on Monitoring

September 11, 2009 at 7:34 am (SQL Server 2005, SQL Server 2008, TSQL)

Since I just spent a bit more than half of my 24 Hours of PASS presentation on tuning queries talking about monitoring performance, you could probably tell that I think that the two are inextricably linked. I’m not alone. Tom LaRock has put a post on why it’s important to understand the extent of your problem prior to attempting to fix it. It’s absolutely spot-on and a must read.

Remember, if someone says performance on a system you manage is slow, you have to ask: as compared to what? You need to understand what “normal” performance is on your system in order to identify a particular process as performing abnormally. That’s not to say that you couldn’t tune a query in complete isolation from a baseline of performance. Of course you could. The question that Tom is answering is: are you tuning the right query for the right reasons?

Oh, and for those that are interested, some of the presentations made during 24 Hours of PASS are now available for download. The rest of the sessions will be available after the Summit. If you haven’t made plans to go, I’d get on over there now and register.

Debuting: Beginning SQL Server 2008 Administration

September 3, 2009 at 9:21 pm (SQL Server 2008)

The new book is up on Amazon. I only worked on three chapters of Rob Walters’ new book, and that was after begging to only work on two, so I can’t take much credit for the effort that went into this book. However, thanks to our editor Jonathan Gennick, I was privileged to work with Rob & Carmen, if pretty indirectly. I know I mentioned the book before when it was put up on the Apress web site, but this is Amazon. Once it’s up on Amazon, it’s real.
