24 Hours of PASS: Summit Preview

August 11, 2010 at 8:58 am (PASS, SQL Server 2008, SQLServerPedia Syndication, TSQL)

Registration is open for the second 24 Hours of PASS this year. This one is going to be a preview of the Summit itself, so all the sessions are tied, in some manner, to sessions being given at the Summit. Here’s a link to go and register.

I’m very excited to be able to say that I’ll be presenting in this 24HOP. One of my presentations at the Summit this year is Identifying and Fixing Performance Problems Using Execution Plans. It covers pretty much what it says: methods for fixing performance problems by exploring the information available within execution plans. But how do you know you have a performance problem? That’s where my preview session comes in. Identifying Costly Queries will show you several ways to gather metrics on your system so that you can understand which queries are causing you the most pain. Once you know which queries need tuning, you can use execution plans to tune them. Whether you’ll be attending the PASS Summit or not, and whether or not you’ll go to my session once you’re there, I think this 24HOP session will be useful in helping you understand where the pain points are within your own systems. I hope you’ll attend.

More importantly though, check out all the other great sessions. This is an excellent collection of presenters and presentations. For anyone who has ever said “PASS doesn’t do anything for me,” I want you especially to take a look at the amazing training opportunities being offered by PASS, for free. The volunteers that run PASS do amazing things and this is just one of them. Take advantage of this opportunity and, hopefully, recognize that PASS is doing things for you. This just barely scratches the surface of all that PASS offers.


Confio Ignite: Part II

April 16, 2010 at 8:00 am (Tools)

I’m continuing to evaluate Confio’s Ignite database monitoring tool. I’ve had it collecting data on a couple of production servers for about a week now. Based on what I’ve seen so far, it’s looking like a pretty good piece of software.

Breaking with the usual tradition, I’m going to talk about the things I’m not crazy about in the software before I start singing its praises. The first thing I thought was missing, but that turns out to be just hard to find, is the ability to look at the query information Ignite collects broken down by database. It looks like you should be able to get to it from the Databases tab, but instead you have to first drill down into a time period, then select specific databases within that time period, which will show you the queries by database. I know that in my environment, this database listing of queries is probably going to get used a lot. Tracking it down required help from the Confio team, which they quickly provided. They also showed me a way to run queries to get this information from their database so that I could create a report.

Speaking of reports: because they support Oracle and DB2 as well as SQL Server through the same interface, there are no Reporting Services reports. OK, I recognize that I’m more than a bit Microsoft-centric when I say this, but I’d like to see SSRS reports so that I can manipulate things the way I want to. Again, not the end of the world, but it’s just something I don’t like. Because the data store is available, though, I can get in there and create my own reports to my heart’s content, and if I like one of their reports, I can always run a trace to capture the query and see how they built it so that I can build one of my own that mirrors it. I think they provide a mechanism for customizing reports by building some XML that will add them to the interface, which is a bit of a pain, but shows they’re on top of things.

One other thing that bothered me as I worked with Ignite is that, in the real time monitoring section, it was hard to find my way to the list of locks. The list is there, it just wasn’t obvious how to get to it, and this is something I’m used to looking at when I’m worrying about real time performance issues.

Right, enough talking about things I don’t like. See this? I love this:

That’s a breakdown, by database, by day, of the cumulative waits on the system. Yeah, that little pink database is causing all sorts of problems on one of my production systems. I actually already knew this was a problematic database, but I wasn’t precisely aware of when and how it had issues. You can drill down to see the same thing for a given day:

And it’s not just the pretty pictures, either, showing that most of our production load is in the morning but that there are some odd spikes at midnight, 4AM, and 7PM; there’s data available behind the graph. If you drill down and hover over the graphs, pop-ups like this one appear:

(Names have been blacked out to protect my job)

And it’s the focus on wait times and types provided by the trending views and reports that makes this a very strong tool. Ignite also collects pretty standard performance metrics, buffer cache hit ratio, memory paging, etc. And you can customize which of those metrics it collects and which ones you display, all on your own. But almost any decent monitoring tool does that. I use a tool that does that, and might even do it a bit better. No, what separates this tool from the pack is the ability to identify the wait states and the query that is causing them. That’s what will get you excited when you’re using Ignite.
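
If you want to see the raw numbers behind that kind of wait-focused view yourself, without any tool at all, the accumulated server-level waits are sitting in sys.dm_os_wait_stats. This is just a rough sketch, nothing to do with how Ignite gathers its data, and the handful of benign wait types I’m filtering out is nowhere near a complete list:

SELECT TOP (10) wait_type
,waiting_tasks_count
,wait_time_ms
,signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('LAZYWRITER_SLEEP','SQLTRACE_BUFFER_FLUSH','SLEEP_TASK','WAITFOR')
ORDER BY wait_time_ms DESC;

The catch, and the reason a tool like this earns its keep, is that the DMV is aggregated since the last restart and server-wide, so it can’t tie the waits back to a database or a query the way the trending views do.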

It also has a little ping mechanism that shows response time to the server, a helpful and friendly little reminder that not all performance is about what’s happening on SQL Server; it’s about what’s happening across the enterprise.

Big chunks of the interface are customizable and you can add or drop things using the little icons on the right side of the picture above.

I can keep going with all the stuff that’s good, because it’s a long list, but I’ll summarize what I like so far. I like the focus on wait states, a lot. I like the focus on query performance. Between those two I feel like I’m getting the majority of what I need from the tool, and more than I get from other, similar monitoring tools. The fact that it does all the other standard data collection on top feels like gravy. The problems I have with the software so far are either personal preferences, like the lack of Reporting Services, or fairly minor. I have another monitoring product running against the same servers as Ignite and I haven’t seen any impact from Ignite’s monitoring showing up there, so it looks like this is a benign piece of monitoring software.

For my final installment, I’m going to look up the skirt of Ignite and see what the underlying data structure looks like. The ability to gather and present data is all well and good, but more important is the ability to take that data and make serious use of it through reports or by building other data structures out of it for long term storage and reporting.


Confio Ignite: First Impressions

April 6, 2010 at 8:28 am (SQL Server 2005, SQL Server 2008)

One of the national sponsors for SQL Saturday is Confio Software. Their sponsorship of SQL Saturday alone makes their product worth checking out, but they also host the speakers’ dinner at these events. So, having eaten off their dime, I felt obligated to take a look. I’m glad I did.

Confio Ignite is monitoring software that keeps real-time and historical track of the performance of your SQL Server (and Oracle and DB2) database servers. Its focus is on wait states and queues, a very common method for troubleshooting performance.

You can get a trial download from their web site to run for a couple of weeks yourself. Everyone’s first impression of a piece of software, after the web site of course, comes when they go to do the install. The install routine for Ignite was very easy. It’s also a bit of a shock: it actually uses web pages for configuration. This makes sense when you consider that the software originally comes from the Oracle and Unix world. Anyway, despite the slight weirdness of using a web page to configure software, it’s a very easy install. Once the install is complete, it’s also really simple to add servers for the software to monitor.

I put it to work on a number of development servers to start (I’m not plugging some unknown software into Production without a shake-down. I may be stupid, but I’m not crazy). It’s agentless and it began running queries against the systems immediately. It’s storing the data on my local desktop. That data is available in a real-time view, a historical view, or a trending view that shows how performance is changing over time. It’s all through a web site, not dissimilar to Quest’s Foglight. You can see a list of servers, their immediate state, action choices, and some other general info.

Clicking a server, after it’s had time to gather data, puts you into the Trend screen. It’s laid out very nicely. You get to break down performance by SQL, Waits, Programs, Databases, Machines, or Users. Then it shows information by Total Wait, Average Wait, or Typical Day. The chart gives you a visual representation of queries running against the system, by default numbered and assigned by an internal hashing system that Ignite uses (no, I don’t know what the internals look like; I’m inferring that from the information presented). Down at the bottom of the screen you can see Top Query Problems, which shows you queries that are running slow or using too many resources according to configurable settings; Resources, your standard breakdown of CPU, Memory Paging, Disk Queue and the rest; or SQL Text, literally the queries that have been running on your system.

Almost everything has a drill down that takes you to a lot more information, and there are pop-ups that show you bits of detail.

I’ve only been running it for a few days, but I can already see how it could be very useful. I’m very happy with the focus on waits and on queries. So often when you’re looking at these tools, queries are either an afterthought or ignored completely. You frequently have to spend time monitoring your queries on your own while that very expensive monitoring software is tracking page life expectancy and buffer cache hit ratio, both useful measures, but you still need to know which query is causing the problems. Ignite gets to that information right away and still collects the rest as well.
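
As an aside, those two counters that every monitoring tool loves to chart are also available straight out of a DMV if you just want a quick look without a tool. A minimal sketch (note that the hit ratio has to be calculated against its “base” counter):

SELECT object_name
,counter_name
,cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
AND counter_name IN ('Page life expectancy','Buffer cache hit ratio','Buffer cache hit ratio base');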

So far, I’m pretty pleased with the software. I’m going to switch this over to a production system in the next few days and see what I can see with Ignite running against a real system. More to come.


nHibernate, First Look at TSQL

April 5, 2010 at 10:19 am (nHibernate)

I’ve blogged in the past about the nHibernate project that has been going on at my company for the last eighteen months. Prior to today, I had only seen the database generated by nHibernate. But today I finally started seeing some TSQL code. My first impressions… oy!

There are two levels of concern coming out of the gate. First, it appears that some of the programming decisions, completely independent of nHibernate, are going to cause problems. Second, it appears we’re going to be hitting issues with nHibernate.

First, the programming decision. I’ve been using Confio to monitor this server for a few days now (look for upcoming posts on my eval of Confio). Only one day has captured any real activity from the nHibernate team (and yes, I’m basically spying on these guys because they are absolutely insistent that my team stay out of their way). The longest running query was two calls to this (names have been changed to protect my job, there are no innocents):

SELECT BookID
FROM Books WITH (UPDLOCK, ROWLOCK)
WHERE BookID = 42

What they’re doing is locking the row so that no other user can get at it while it’s being edited. Total lock time for the first day was 38 seconds across two calls. I’ve posted about this previously, so I won’t go into it again, but this is potentially a disaster.
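
If you want to catch this sort of thing happening on your own system, the update locks being held while the application sits on a row are visible through sys.dm_tran_locks. A rough sketch, nothing more:

SELECT dtl.request_session_id
,DB_NAME(dtl.resource_database_id) AS database_name
,dtl.resource_type
,dtl.request_mode
,dtl.request_status
FROM sys.dm_tran_locks AS dtl
WHERE dtl.request_mode = 'U';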

On to nHibernate. The next query was pretty benign:

SELECT OrgID
,OfficeID
,StartDate
,EndDate
,OtherID
FROM SpecialList
WHERE OfficeID = 42

Unfortunately this table, as created by nHibernate, has no primary key, no clustered index, nor any other type of index, so this is just a table scan. But it’s only on 109 rows… in dev, not production. Oh, and most of the 109 rows have a null value for OfficeID, but it’s all good I suppose… until I notice that this query also had 38 seconds of wait time, but was called 394,652 times… oy (and no, I’m not even one of the chosen, but that’s a wonderful expression for this situation). Looking at the data in cache, this query has been called 598,351 times since it entered the cache on the 2nd, with a total elapsed time on the server of 5,429,689ms. The average, then, is 9ms, but the max was 29,296ms, or about 29 seconds.
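
For the query shown, even something as simple as a covering nonclustered index should turn that scan into a seek. A sketch only, using the changed names from above; the right primary key and clustered index would have to come from the actual data model, which I don’t have:

CREATE NONCLUSTERED INDEX IX_SpecialList_OfficeID
ON dbo.SpecialList (OfficeID)
INCLUDE (OrgID, StartDate, EndDate, OtherID);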

The next query up looked like this:

SELECT col1
,col2
,col3
FROM Table1
INNER JOIN Table2
ON Table1.ChildID = Table2.OtherID
WHERE Table2.ParentID IN (@p1,@p2,…@p99)

Yep: 99 parameters passed and used against an IN clause. I can’t post the execution plans on this stuff without talking it over with the boss, but suffice it to say, it’s two table scans and a merge join to get the data back. Oh, and this was called 1,138 times with a minimum elapsed time of 976ms. Yeah, the minimum time to run this query is about 1 second, and it’s been called over a thousand times.

It went downhill from there. I’m looking at other queries, one of which hits seven tables, but instead of using JOINs it uses a bunch of sub-selects in the WHERE clause. It has a minimum run time of 9 seconds. Luckily it’s only been called 64 times.

This is a horror show.

A lot of benefit could be reaped if we’re able to go in and create some indexes on these tables, but that’s going to be an issue anyway because I’m seeing lots of implicit data conversions on the parameters being passed in, as I noted a couple of years ago. At this point, I would say I’m, at best, very underwhelmed by what nHibernate is producing. To be perfectly honest, I’m concerned for the health of the project.
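
To show what I mean by the implicit conversions, here’s a purely hypothetical example standing in for the parameterized statements nHibernate sends (the Title column is my invention). The column is VARCHAR but the parameter arrives as NVARCHAR, so the conversion generally lands on the column side and an index on that column can’t be used for a clean seek:

-- Hypothetical table and column; the pattern is what matters
DECLARE @p0 NVARCHAR(4000);
SET @p0 = N'SomeTitle';

SELECT BookID
FROM dbo.Books
WHERE Title = @p0; -- Title is VARCHAR: CONVERT_IMPLICIT shows up in the plan against the column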


Identifying Frequently Running Queries

September 21, 2009 at 9:13 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

It’s not enough to look at the longest-running query. What if you have two queries, one that runs for 5 minutes and one that runs for 5 seconds? Which do you tune? The first one, of course. But let’s add a little information to our thought experiment. The one that runs for 5 minutes is called at 2AM, your local time, when there are almost no users on the system, and the one that runs for 5 seconds is called 40 times a minute all day long. Now which one do you tune? That’s right, suddenly that 5 second query takes on a HUGE significance based on how often it’s called.

But how do you know how often a query is called? The easiest way to gather that information, though not necessarily the best in terms of a long-term methodology, is a server-side trace that captures the RPC:Completed and SQL:BatchCompleted events. The beauty of the server-side trace is how easy it is to set up and how much information you can gather, so quickly. You can immediately start getting every query run on the server along with how long it took to run, how much CPU it used, and various other metrics. You can run this for short periods or long, gather the data together into a table in a database, and run aggregation queries against it to generate reports showing the most frequently called queries. Simple, quick, easy. But there are a couple of problems with it. First, while you can run it 24×7, you’re going to see quite a bit of data generated from the events, depending on the load on your system, of course. Second, the queries that come through will likely have all kinds of stuff associated with them, various parameter values and whatnot, that will make aggregation hard. You’ll have to put together a means of cleaning and maintaining the data, or purchase one of the many third-party tools for doing this. You’ll have to store and manage large amounts of data as it comes into the system. It’s going to be a lot of work.
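
To give a sense of what that “table in a database” step might look like, here’s a rough sketch that loads a trace file and aggregates it. The file path is made up, and the parameter-value problem I just described means grouping on the raw text is cruder than you’d want in practice:

-- Load the completed events from the trace file into a working table
SELECT CAST(TextData AS NVARCHAR(MAX)) AS QueryText
,Duration
,CPU
,Reads
,Writes
,StartTime
INTO dbo.TraceResults
FROM sys.fn_trace_gettable('C:\traces\QueryFrequency.trc', DEFAULT)
WHERE EventClass IN (10, 12); -- RPC:Completed and SQL:BatchCompleted

-- Aggregate to find the most frequently called statements
SELECT QueryText
,COUNT(*) AS ExecutionCount
,SUM(Duration) AS TotalDuration
,SUM(CPU) AS TotalCPU
FROM dbo.TraceResults
GROUP BY QueryText
ORDER BY ExecutionCount DESC;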

Another option, one that requires a degree of setup and is a little less accurate (it’s unlikely to capture every single execution of every single query, but it will get the majority), is sys.dm_exec_query_stats. This handy little dynamic management view carries aggregate data about all the queries currently in the cache. It shows how much accumulated processing time a query has taken, the last time, the longest time, the shortest time, and lots of other information, but the value we’re most interested in here is the execution count. If you just run a SELECT statement against sys.dm_exec_query_stats, you’ll get interesting data back, but you won’t see your query text or your execution plan. To make that information available, you’ll need to combine the DMV with other dynamic management objects. Here’s a sample query that pulls back information that could be useful for identifying badly performing queries on your system, including how often they’re called.

SELECT dest.text
,deqp.query_plan
,deqs.execution_count
,deqs.creation_time
,deqs.query_hash
,deqs.query_plan_hash
,deqs.total_elapsed_time
,deqs.total_clr_time
,deqs.total_physical_reads
,deqs.total_worker_time
FROM sys.dm_exec_query_stats deqs
CROSS APPLY ( SELECT *
           FROM sys.dm_exec_sql_text(deqs.sql_handle)) dest
CROSS APPLY ( SELECT *
           FROM sys.dm_exec_query_plan(deqs.plan_handle)) deqp

There’s a lot you could do with this query, but the real power is going to come from gathering this data on a regular basis. How often is going to depend on how volatile the cache is on your system. If most of your queries sit in the cache for weeks (and that’s possible), then you could run a query once a week to gather the information. If your cache is getting flushed clean once an hour, you’d need to run this query more often. Regardless of how often you run the query, understand that, because you’re dependent on the cache, it is possible to miss data. Once you get the query running on your system, you can use the information to determine which queries need to be tuned, based not just on how long they run, but also on how often they run, or even on other combinations of the information returned.
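
Here’s a rough sketch of what that regular collection might look like. The history table is entirely my own invention, and in practice the insert would be wrapped in a SQL Agent job running on whatever schedule suits your cache:

IF OBJECT_ID('dbo.QueryStatsHistory') IS NULL
CREATE TABLE dbo.QueryStatsHistory
(CaptureTime DATETIME NOT NULL
,QueryText NVARCHAR(MAX) NULL
,QueryHash BINARY(8) NULL
,QueryPlanHash BINARY(8) NULL
,ExecutionCount BIGINT NULL
,TotalElapsedTime BIGINT NULL
,TotalWorkerTime BIGINT NULL);

INSERT INTO dbo.QueryStatsHistory
(CaptureTime, QueryText, QueryHash, QueryPlanHash, ExecutionCount, TotalElapsedTime, TotalWorkerTime)
SELECT GETDATE()
,dest.text
,deqs.query_hash
,deqs.query_plan_hash
,deqs.execution_count
,deqs.total_elapsed_time
,deqs.total_worker_time
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest;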

One other point, you may see ad hoc queries on your system. Here’s a simple example:

SELECT TOP (5) sod.*
FROM Sales.SalesOrderDetail sod
WHERE sod.ProductID = 772
ORDER BY sod.SalesOrderID ASC ;

If the value compared to sod.ProductID is changed to 773, or any other value, you will get a different entry in cache for this query. That’s where you can put the query_hash or query_plan_hash values to work. These are hash values that SQL Server creates for each query it stores in cache. You can group by them to aggregate ad hoc queries that would, if they were stored procedures, be identical executions, and arrive at an aggregated count.
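
A quick sketch of that aggregation, rolling the cache up by query_hash so the ad hoc variations get counted as one query:

SELECT deqs.query_hash
,SUM(deqs.execution_count) AS ExecutionCount
,SUM(deqs.total_elapsed_time) AS TotalElapsedTime
FROM sys.dm_exec_query_stats AS deqs
GROUP BY deqs.query_hash
ORDER BY ExecutionCount DESC;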

So, if you need precise information, you can use trace events, but you’ll pay for that with a lot of data you have to manage.

EDIT: Changed deqs.total_elapsed_date to deqs.total_elapsed_time as per comments below.


Editorial on SQL Server Central

September 14, 2009 at 6:46 am (Misc, Tools)

My first one over there. It discusses whether or not you should do two things: build your own monitoring tool, and come out in particular favor of one tool or suite of tools from a single vendor. Please read it and watch the video. And, even more importantly, leave a comment in the discussion.


LaRock on Monitoring

September 11, 2009 at 7:34 am (SQL Server 2005, SQL Server 2008, TSQL)

Since I just spent a bit more than half of my 24 Hours of PASS presentation on tuning queries talking about monitoring performance, you can probably tell that I think the two are inextricably linked. I’m not alone. Tom LaRock has put up a post on why it’s important to understand the extent of your problem prior to attempting to fix it. It’s absolutely spot-on and a must-read.

Remember, if someone says performance on a system you manage is slow, you have to ask: as compared to what? You need to understand what “normal” performance is on your system in order to identify a particular process as performing abnormally. That’s not to say that you couldn’t tune a query in complete isolation from a baseline of performance. Of course you could. The question Tom is answering is: are you tuning the right query for the right reasons?

Oh, and for those that are interested, some of the presentations made during 24 Hours of PASS are now available for download. The rest of the sessions will be available after the Summit. If you haven’t made plans to go, I’d get on over there now and register.


Debuting: Beginning SQL Server 2008 Administration

September 3, 2009 at 9:21 pm (SQL Server 2008)

The new book is up on Amazon. I only worked on three chapters of Rob Walters’ new book, and that was after begging to only work on two, so I can’t take much credit for the effort that went into it. However, thanks to our editor, Jonathan Gennick, I was privileged to work with Rob & Carmen, if pretty indirectly. I know I mentioned the book before when it was put up on the Apress web site, but this is Amazon. Once it’s up on Amazon, it’s real.


No really. When did this statement start?

July 31, 2009 at 8:07 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication)

I thought I had an interesting answer to the question. Unfortunately Adam Machanic, who has been working in this specific area quite a bit longer than I have, and, let’s face it, is a bit smarter, pointed out (in the comments) the possibility that I didn’t have this quite right.

I ran all his tests and, sure enough, it wasn’t working the way I had seen it work. First, I tried modifying his query so that it ran the SELECT statements from the DMVs between the operations, but this didn’t change the results: start_time and last_request_start_time were the same. From a scientific standpoint, if no one can replicate your results, the experiment failed.

So I went back and tried again. To be sure I got things right and wasn’t somehow querying the wrong server or looking at the wrong spid, I added a couple of checks; specifically, I queried the spid I was working with from my first query: SELECT @@SPID.

Unlike Adam’s tests, I continued running mine with two connections. I wanted to see multiple entries while the test was running. The results from sys.dm_exec_requests showed what I had seen before. There were 25 rows, all of them showing the exact same start_time. The command, sql_handle, wait_time, cpu_time and all the other values that you would have expected to change did change. Elapsed time and logical_reads, for example, showed a steadily increasing value, as you would expect.

Turning to sys.dm_exec_sessions, it showed more data, 40 rows instead of 25, simply because it continued to record a session for the whole time I was querying the view, since the spid wasn’t dropped. Then I finally saw the problem: it was the way I tested. I started the query against the DMVs before I started the query that was generating the load. Because of this, the initial queries against sys.dm_exec_sessions showed a different start and end time, unrelated to the test. Then the test ran, and a second set of values was generated when the test completed. What I had attributed to individual statements completing and being registered was nothing of the sort. It was just the different batches finishing, not the statements.

Nuts. 100% wrong. That’ll make Tim Ford happy though.

Unfortunately I’m back to where I was. Is there a way to tell when a statement started? I don’t know of a dynamic method for doing so. Obviously you could run a trace to capture the statement start events, but you’ll be collecting a lot of data, depending on how you set up that trace, so it wouldn’t be a good approach for everyday troubleshooting. I’ll keep looking.


When did this statement start?

July 28, 2009 at 10:19 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

UPDATE: This post is incorrect. Adam nailed it in the comments. I explain my mistake here.

A question came up over at SQL Server Central where someone was wondering if it was possible to know when a given statement within a batch started. Immediately I thought, oh yeah, that’s easy, use the sys.dm_exec_requests dynamic management view (DMV). Done.

Wrong. The original poster pointed out that I had assumed the values present in the DMV represented statement-level values, but they show the batch. While the DMV shows a start_time, that time is the start of the current batch, not the statement within the batch. Now the question was, where else might I get this data? I next tried sys.dm_exec_sessions, because it has the last_request_start_time value. Sure enough, this worked.

Don’t believe me? This time I tested it. First, I set up a batch that ran a few longish queries:

BACKUP DATABASE AdventureWorks
TO DISK = 'c:\bu\ad.bak'
WITH INIT;
DBCC CHECKALLOC(AdventureWorks);
DBCC CHECKDB(AdventureWorks);
DBCC CHECKCATALOG(AdventureWorks);
BACKUP DATABASE AdventureWorks
TO DISK = 'c:\bu\ad.bak'
WITH INIT; 

Then I wrote a little loop that would insert data from both DMVs, once every second, for a minute:

SET NOCOUNT ON;
TRUNCATE TABLE dbo.dmR;
TRUNCATE TABLE dbo.dmS;
DECLARE @CheckTime DATETIME = DATEADD(ss,60,GETDATE())
 ,@CurrTime DATETIME = GETDATE();
WHILE @CurrTime < @CheckTime
BEGIN
INSERT INTO ... SELECT ... FROM sys.dm_exec_sessions
INSERT INTO ... SELECT ... FROM sys.dm_exec_requests
WAITFOR DELAY '00:00:01';
SET @CurrTime = GETDATE();
END
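
Since I trimmed the column lists out of that loop, here’s a fully runnable version for anyone who wants to reproduce the test. It’s only a sketch: the dbo.dmS and dbo.dmR snapshot tables are created on the fly with SELECT INTO and a capture time column added, which is my own embellishment:

-- Create empty snapshot tables matching the DMVs, plus a capture time column (my addition)
IF OBJECT_ID('dbo.dmS') IS NULL
SELECT GETDATE() AS CaptureTime, * INTO dbo.dmS FROM sys.dm_exec_sessions WHERE 1 = 0;
IF OBJECT_ID('dbo.dmR') IS NULL
SELECT GETDATE() AS CaptureTime, * INTO dbo.dmR FROM sys.dm_exec_requests WHERE 1 = 0;

SET NOCOUNT ON;
TRUNCATE TABLE dbo.dmR;
TRUNCATE TABLE dbo.dmS;
DECLARE @CheckTime DATETIME = DATEADD(ss,60,GETDATE())
,@CurrTime DATETIME = GETDATE();
WHILE @CurrTime < @CheckTime
BEGIN
INSERT INTO dbo.dmS SELECT GETDATE(), * FROM sys.dm_exec_sessions;
INSERT INTO dbo.dmR SELECT GETDATE(), * FROM sys.dm_exec_requests;
WAITFOR DELAY '00:00:01';
SET @CurrTime = GETDATE();
END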

Then it was a question of hitting the go button. The results are pretty clear. If you look at the sys.dm_exec_requests data, you’ll see that the start_time doesn’t change, but the work being performed, visible in the command and plan_handle columns, does. You can also see the statement_start_offset and statement_end_offset changing. These values will allow you to pull the statement text in combination with the sys.dm_exec_sql_text() dynamic management function.

 start_time             command         statement_start_offset statement_end_offset plan_handle
2009-07-28 10:42:09.447 BACKUP DATABASE 0                      138                  0x060001007D931719B8C0BF0D000000000000000000000000
2009-07-28 10:42:09.447 DBCC            140                    206                  0x060001007D931719B8C0BF0D000000000000000000000000
2009-07-28 10:42:09.447 DBCC            140                    206                  0x06000F00AF16F10FB8C04D0D000000000000000000000000
2009-07-28 10:42:09.447 DBCC            208                    268                  0x060001007D931719B8C0BF0D000000000000000000000000

But, as you can see, the start_time stays the same. Looking at the last_request_start_time and last_request_end_time in the sys.dm_exec_sessions DMV, you can see these values changing in conjunction with the changing values in sys.dm_exec_requests.

last_request_start_time last_request_end_time
2009-07-28 10:35:21.500 2009-07-28 10:35:51.170
2009-07-28 10:42:09.447 2009-07-28 10:35:51.170 

I’ll leave it as an exercise for the reader (or for another post of my own) to put together the DMVs and DMFs that would allow you to combine this data and show exactly which statement was being executed at which moment. The point is, it is possible to determine exactly which statement is currently being executed within a batch process, and when that statement started.
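
For anyone who doesn’t want to wait for that post, here’s one possible way to put it together, a sketch using the usual offset arithmetic against sys.dm_exec_sql_text; it’s not necessarily how I’d finally present it:

SELECT der.session_id
,der.start_time
,SUBSTRING(dest.text, (der.statement_start_offset / 2) + 1,
((CASE der.statement_end_offset
WHEN -1 THEN DATALENGTH(dest.text)
ELSE der.statement_end_offset
END - der.statement_start_offset) / 2) + 1) AS current_statement
FROM sys.dm_exec_requests AS der
CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS dest
WHERE der.session_id > 50;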

