Or, another way to put it, in most cases, shooting yourself in the foot.
I was not aware that the cumulative update for SQL Server 2008 back in June included a switch that allows you to turn parameter sniffing off within SQL Server. Thanks to Kendra Little (blog|twitter) for letting me know about it (although she let me know by “stumping the chump” during my lightening talk at the Summit, thanks Kendra!).
When I first saw the switch, I thought about the places where turning off parameter sniffing could be helpful. But, as I thought about it, the more I realized that removing parameter sniffing was an extremely dangerous switch. Why? Because, most people only ever hear about parameter sniffing when they run into a problem. Someone says “Parameter sniffing” and you see people cringe. Too many people will take this information in and go, “Hey, I can just switch parameter sniffing off and I’ll have a much faster system, all the time.” But… even when you’re not hitting a problem with parameter sniffing, you’re still getting parameter sniffing. Here is where I see a problem. Let’s discuss what parameter sniffing is.
Parameter sniffing is applicable to stored procedures and parameterized queries. What happens is, when a value is passed to a parameter, the optimizer has the ability to read, or “sniff,” the value of that parameter. It can do this because it knows exactly what the value is when the proc/query is called. This is not applicable to local variables, because the optimizer can’t really know what those values might be, where as it knows exactly what the values of parameters are going in. Why does it do this? One word: statistics. Statistics are what the optimizer uses to determine how queries will be executed. If the optimizer is given a specific value, it can then compare that value to the statistics on the index or table in question and get as good an answer as is possible from those statistics as to how selective this value may be. That information determines how the optimizer will run the query and because it is using specific values, it’s looking at specific information within the stats. If the parameters are not sniffed, the statistics are sampled and a generic value is assumed, which can result in a different execution plan.
The problem with parameter sniffing occurs when you have out of date statistics or data skew (certain values which return a wildly different set of results compared to the rest of the data within the table). The bad statistics or skew can result in an execution plan that is not consistent with most of the data that the stats represent. However, most of the time, in most situations, this is an edge case. Notice that hedging though. When parameter sniffing goes bad, it hurts.
Most of the time we’re going to gain huge benefits from parameter sniffing because the use of specific values leads to more accurate, not less accurate, execution plans. Sampled data, basically an average of the data in the statistics, can lead to a more stable execution plan, but a much less accurate one. Switching parameter sniffing off means that all queries will use sampled data, which creates a serious negative impact on performance. Most of the time, most of us are benefitting wildly from the strengths of parameter sniffing and only occasionally are we seeing the problems.
Unless you know, and I mean know, not suspect, that your system has major and systematic issues with parameter sniffing, leave this switch alone and let the optimizer make these choices for you. If you don’t, it’s very likely that you’ll see a performance hit on your system.
Today is Dr. Dewitt.
The ballroom, where the keynotes are held, is filled with extra chairs. The Summit organizers expect extra attendance today, and well they should. Dr. Dewitt was amazing last year. I suspect this year will be more of the same.
Rick Heiges is introducing the day (waiting for Dr. Dewitt). Lynda Rab is leaving the board. Sad. I started volunteering for the PASS organization working for Lynda. She’s great. The new board members are Douglas McDowell, Andy Warren and Allen Kinsel.
The spring SQL Rally event was announced. I’ll be presenting a full day session on query performance, Query Performance Tuning, Start to Finish. Look for (a lot) more blog posts on this. The Summit next year has been moved to mid-October. WHOOP! This is great because I was going to miss it next year. Oct 11-15 will be the dates in 2011. Of course, it’ll be at Seattle.
Dr. Dewitt is finally on stage. From this point forward, I’ll be just posting his words & some comments. This is my best attempt to capture the information. There will be typos.
Query optimization is a really hard problem. Dr. Dewitt, says “I’m running out of ideas.” Yeah, right. His “Impress Index” is basically an arrow going down. He’s cracking jokes about his delivery, asking, How Can I Possibly Impress You. He’s showing this strange picture that has 240 seperate colors that each represent an exec plan in the optimizer. We’ll be back to that. This session was voted on. I’m glad optimization won. They live in fear of regression, talking about the optmizer developers.
The 100,000 foot view, magic happens. He’s working off of TPC-H benchmark, query 8. There are 22 million ways of executing this query. The optimizer has to spend a few seconds to pick the correct plan from this full set. It’s still possible to pick bad plans. Cost Based optimization came from System R & a lady named Pat Selinger at IBM. Optimization is the hardest part of building a DBMS, after 30 years. Situation is fruther complicated by advances in hardware and functionality within the DBMS.
The goal of the optimizer is to transform sQL queries into an efficient execution plan. The parser turns out a logical operator gtree, which then goes to the optmizer and a physical operator tree is sent to the execution engine. He’s showing a simple table, based on movie reviews. The query is a SELECT with AVG. Two possible plans. A scan occurs first, then a filter is applied to pull out the right movie and then an aggregate occurs. With this you’ll get a scan, meaning I/O corresponds to the number of pages on the table. Plan 2 uses an index to pull pages from the non-clustered index. This means random disk access that will look up the movies and then pass that on to the aggregate. The optmizer then has to figure out which is faster. The optimizer estimates the cost based on the statistics it has in hand. It has to estimate how many movies there are. So it estimates the selectivity of the predicate, then it calculates the cost of the plans in terms of CPU and I/O time.
So there are equivalence rules, such as select & join operators. Join operators are associative, meaning that the results from multiple tables are associated. Select operator distributes over joins and there are multiple ways of getting back the same information, all evaluated by the optimizer.
With a more complicated query, it could start with seelction of customers, then a selection of reviews, join them together, then join to the movies table and then project out the select out the columns wanted. But with equivalence rules, you can get other plans. Selects distribute over joins rule gets a different plan, or selects commute rule can change the plan. He showed five different plans, then four more plans & said he could have done another 20. For this simple query, he came up with 9 logically equivalent plans. All nine will produce the same data. For each of the 9 plans there is a large number of alternate physical plans that the optimizer can choose.
Assuming the optimizer has three joing strategies, nested loops, sort-merge & hash. He’s also assuming two selection strategies, sequential scan or index scan. Obviously, this is simplified.So, using these three joins & two select methods, there are 36 possible physical alternatives, for one logical plan. So with 9 logical plans there are 9*36 = 324 possible physical plans. And that’s for a VERY simple query.
Selectivity estimation, is the task of estimating how many rows can satisfy a predicate like MoviesId = 932. Plan quality is highly dependent on quality of the estimates that the optimizer makes.
I just sent in a question.
So the Histogram is the distribution of the data within the table. So there isn’t enough space within the db to store detailed statistical info. The solution is histograms. You can different kinds. The equi-widthy histogram divides the rows into equal sized buckets and then figures out how many values match each range of values. So, for an actual value, it might be .059 selectivity, but the estimated value is actually .050. That’s extremely close. But, another value he shows has .011 actual but in the histogram is .082, which is a HUGE error. Hello bad execution plan.
Another approach is equi-height histograms. These divide the ranges so that all buckets contain roughly the same number of rows, as opposed to an equal distribution of values. In equi-height, the second example is .033 instead of .082. Which is pretty good, but still skewed. He’s basically showing that errors can be introduced all over the place. The first example is .167.
Histograms are the critical tool for estimating selectiviy factors for selection predicates. But errors still occur. The deal is, there’s just a limited amount of space for these. other statistics are rows, pages, etc.
Estimating costs the optimzer considers I/O time and CPU time. Actual values are highly dependent on CPU and I/O subsystem on which the query will be run. For a parallel database system, such as PDW, plug, the problem focuses also on network traffic. So back to the two alternative physical plans… You have to determine which plan is cheaper. Assuming that the optimizer gets is right, we know that there are 100 rows out of 100k pages. These are sorted on date, but we’re going for MovieID, random reads. The optimizer doesn’t know system it’s on, but it makes a guess that a scan will take 8 seconds. The Filter will work on .1 microsecond/row & aggregate will be .1micrsec/row, for .00001 seconds, for a total of 9 seconds. Plan two will use the index. Since the rows are sorted on date, random seeks are going to occur. .003 seconds / seek, then total time .3 seconds and same time for the aggregate. This means plan two is the winner.
But, what if the estimates are wrong. On a log plot, you start to see how, as the number of rows returned, each plan will perform better, based on the rows returned. More will make plan 1 better, but less will make plan 2 better.
That was just to get the data out of a table. To add in JOIN costs, things get worse. First example is to take a sort-merge join. This sorts each data set being returned, and then merges the results through a simple scan. Cost is 5r + 5m for I/O. A nested loop works on scanning one table and row-by-row, scanning the other table. The cost is R + R * M. R is rows M is pages.
With the example, you can see that with an indexin place, highly selective, loop joins can be cheap. But it’s the cardinalities that affect things. So, getting the histogram right is the key trick. With a log plot, again, you see how the various operations vary over time. So for a sort merge, it’s very expensive at a low number of rows, but at a large number of rows, it still returns in about the same amount of time. So as large sets of data are accessed, merge gets good. But at lower numbers of rows, the nested loop works better. So if the cardinality estimate is off, you could get a huge error in performance, especially at the larger sets of data. The optimizer has to pick the right join method. This is based on the number of rows in each set of data being joined.
He then moves on talk about how much space these things take up. The space depends on the “shape” of the query. He shows a type called a “start” join and a type called a “chain” join. Whoa! as you increase tables, the likely numbers of plans increases a lot. I knew this, but I haven’t seen it written down like this. But these shapes are extremes.
Every query optimizer starts off with a left deep plan, first, instead of bushy plans. For the example, a bushy tree would have 645k equivalents for the Star Join as opposed to 10k for left deep plans. With 3 joins methods and n number of joins in a query, there will be 3 to the power of n possible physical plans. Uh… wow. Instead, the optimizer uses dynamic programming. Sometimes heuristics will cause the best plan to be missed.
One method of optimization is Bottom Up. Optimiztion is performed in N passes (if N relations are joined). First pass, find the best 1-relation plan for each relation. Pass 2, find the best way to join the result of each 1-relation plan to another relation to generate all 2-relation plans. Pass n, find the best join result… can’t see it. Gets the lowest cost plans & interesting order rows. In spite of pruning plan space, this approach is still exponential in the # of tables. Costs are done, then pruning occurs. I’ve stopped taking notes on this part. You’ll have to see how this works in the slide deck (I’ll post the location at the end).
So that’s the theory. But the problem is, bad plans can be picked. If the statics are missing or out of date, cardinaltiy estimates are against skewed data, attribute values are correlated, and regression, hardware changes mess stuff up.
Opportunities to improve. Jayan Haritsa, has the Picasso Project. Bing this: Picasso Haritsa. There are actually software there that helps improve values. He’s back to TPC-H Query 8, and using the tool, it will show the plan space for the query, this is the painting of the cool picture at the start of the talk. With this, you can see how sensitive input parameters are to plan generation. So the cardinalities estmates are the key.
This animation shows how the estimated costs for a query start low, peak, and then, instead of continuing up, goes back down. And the optimizer team doesn’t know why. This is his example of how QO is indeed, harder than rocket science.
What can you do better? Well, Indexed Nested Loops looks good, but they’re not stable across the range of selectivity factors. If they went conservative and always picked sort-merge, it would be more stable. So, picking slower operations could make things more stable, just slower. Robustness is tied to the number of plans. And he says the QO team doesn’t understand.
At QO time, have the QO annotate compiled queryu plans with statistics and check operators. Then, you can see how this stuff works. They use this in two ways, a learning optimizer and dynamic reoptimization. The optimizer observed stats go back to a statistics tracker and then, feed that back through to the catalog, and the next query will be better. The dynamic reoptimization takes the idea that actual stats note the estimated stats and when there are differences, truncate the operation, pause the execution, output the query back to tempdb, stores that, and then uses that with the rest of the query to re-optimize using real values. Cool!
Key points: Query optimization is harder than rocket science. Three phases of QO: Enumeration of the logial plan space, enumeration of alternate physical plans selectivity estimates. The QO team of every DB vendor lives in fear of regressions, but it’s going to happen, so cut the optimizer some slack.
“Microsoft Jim Gray Systems Lab” on FaceBook is the source for the slides. Available here.
This is just another reminder to please vote for my blog post on using PowerShell Remoting with SQL Server. It’s a post I’m proud of.
Also, I think that Gail Shaw (blog|twitter) is one heck of a great blogger. She has multiple posts in several categories. She’s extremely deserving of your vote. She might even get the most votes overall, another thing I think she deserves. I’ve learned tons and tons from all the information she puts out there. The least she deserves is a little chunk of plexiglass for all that hard work. So vote for all her posts.
A week from now will be Kilt Day at the PASS Summit. It’s probably way too late to order a kilt at this point. But, don’t despair. You can still take part. Just a short walk from the Summit is the headquarters of Utilikilt. These are not classic tartan wraps with sporans and socks. They’re the modern equivalent, come in fun fabrics & colors and are actually pretty practical. So if you still want to participate in Kilt Day, and we’d love to have you, plan a trip to Utilikilt.
And no, they’re not sponsoring me or anything (more’s the pity). I just like them.
Quote: “Database stuff, all this programming stuff, is easy. Anyone can do it. That’s why everyone in the company has sa privileges.”
For nine months, I worked in an environment where everyone, from developers to QA to the sales people to the receptionist, had SA privileges. You know what? DBA skills are necessary.
I speak from the point of view of someone that has had to recover a server after a salse person helpfully “cleaned up the temporary stuff on the server” by dropping tempdb, causing a late deployment for a client. I speak from the point of view of the guy who kept a window open on his desk with the database restore script ready to run, all day long, because of “accidents” that stopped development until I could get the database restored. I speak from the point of view of someone who had to explain, multiple times, “No, you can’t hit the undo button to get that data back” in client systems. And yes, I could go on and on about this nine month gig, but I won’t. You might consider it an extreme case, mainly because, it was.
Instead, I’ll talk about another job. It was a free-wheeling dot com. We were doing some amazing stuff. We were running well over 300gb of data into SQL Server 7.0 in a 24/7 environment… until the day one of the managers of the organization said “You database guys are spending too much time working on maintenance of the server. SQL Server manages itself. You will stop all maintenance, immediately, and don’t work on it any more.” It only took two weeks for the server to crash. It took us three days (72 straight hours) to rebuild the system and recover the data, with no web site available for our thousands of clients during that entire time. Funny enough, a memo came down from management, not thanking us for putting in all the effort, no, but instructing us that we should spend adequate time performaning maintenance routines on the server.
And before you ask, yes, I can keep going. I have tons of examples. How about the application development team (different company) that built the entire database out of multi-statement table valued UDFs that called other UDFs that called other UDFs that called other UDFs… which led to a delayed production roll-out.
What I’m saying is, these are not isolated or extreme examples (OK, the first one was extreme, but only a little). This is real world stuff occurring on regular basis, every day, all over the world that negatively impact the businesses that we support. It’s not that people are stupid. They’re absolutely not. It’s just that databases and database servers, and data for that matter, are still pernicious and difficult. It’s not rocket science, not by any stretch of the imagination. Why do I say that? Because I can do it. In fact, I’ve been quite successful as a DBA and Database Developer. If I can do it, it’s not that hard. But it is a specialized set of knowledge, and one that is quite extensive. There’s just so much to learn about how data is stored, retrieved, and managed within SQL Server, it’s crazy to assume you wouldn’t need a specialist as you move deeper and deeper into managing more and more information. The lack of that knowledge, can, and does, have an impact on the bottom line.
If you’re a business person or a developer (and by developers, I’m not talking about the exceptional Brainiac that can really do it all, I’m talking a normal human being), ask yourself, do you know how to restore a database to a point in time? Just in case you don’t know, this means recover the database from backups and log backups to a point, usually just before a failure or a bad update or some other problem, again, that will cost your company money. Ask yourself, do you know how to set up backups so you can do that point in time recovery? Do you know how to tell why a query is running really slow and keeping customers from buying your stuff? Do you know how to tell why you’re getting deadlocks and losing transactions? Do you know how to tell if a process is blocked or just taking a long time? I’m clearly belaboring the point, and this is still fairly basic information. I haven’t even started delving into configuring systems for mirroring, or gathering metrics for a performance tuning effort, or any other of hundreds of tasks that need to be performed to build large scale data management systems that keep your business in business. If you don’t know what all this stuff is, or you don’t know why it’s needed, you might just need someone in your organization who can help out.
And yeah, there are tools out there now, like Object Relational Mapping (ORM) tools, that take away vast amounts of the labor that used to go into building databases. But, having worked with teams building tools with an ORM, while they’ve successfully removed the database from their thoughts & designs by using the ORM tool, they’re still storing data on a database system. Guess what? The problems, such as deadlocks, index scans, blocking, still exist. Sometimes, these problems are even exacerbated by the ORM tool. Now what? You need a specialist.
Fine, what if you toss all the mess, get rid of relational storage, go for one of the NOSQL database systems. Yeah, that might work, a bit. But guess what. You’re still storing data. It’s still going to, in most circumstances, outlive the application that built it. It’s still going to need to be reported against, backed up, recovered, protected, performance tuned… In other words, you’re going to run into situations where, despite the fact that you’ve eliminated the things that irritate you, like SQL, like constraints, you’re still dealing with pernicious data and you’re going to need someone who has been working out how to deal with that.
Does every single company with any installation of SQLExpress need to hire a DBA? No, of course not. But as soon as your business starts to rely on having data around, all the time, you just might need the skills of someone who has been working with this stuff for a while.
As if the US Election and SQL Rally voting were not enough (and have you voted yet for the SQL Rally Pre cons? Polls close tomorrow at 8PM), I just found out that SQLServerPedia is holding an awards vote. You only have a week, so get on over there and vote. There are a ton of categories and some really excellent blog posts to choose from.
Yes, I have a post in there, but in this instance, I am not asking for your vote and won’t provide a link. All of those bloggers, many of whom I know personally, put in extra work to get that information out to you. They all deserve your vote. That is a collection of extremely hard working, smart people, that I learn a lot from. I’m not going to try to compete with them. So please, follow the link and take care of business.
During the PASS Summit I have again been given the opportunity to keep my laptop plugged in… as long as I blog about the Key Notes. So, I’m going to do it, power is hard to come by in that place. Once again I can regale you, near real time, what’s occurring in the key note addresses at the PASS Summit. Once more I’ll have the opportunity to jump on to the table while wearing a kilt.
But, this year, you may not want to read me. Instead, you might want to tune into the key notes yourself. PASS is going to transmit them live. You can go to the this link to watch them. Now, I can hear you, literally, thinking to yourself, “Right, just what I need in my life, to listen to some sales hack tell me about some semi-functional bit of software.” Most of the time, you’d be right. But this is PASS. We don’t just listen to sales hacks stumbling through presentations. We’re getting to learn from Dr. DeWitt again this year. I’m jazzed and you should be too. Dr. DeWitt’s presentation last year was simply amazing. In terms of sheer geek fun, it’s hard to beat. This year should be as good, or better.
I’ll also be tweeting all week. Follow hash tag #sqlpass to find out what’s happening from me and all the other Twitterati.
Sigh… I just spent almost a full work day trying to come up with, what turned out to be a single line script. The requirements were simple. After patching of servers is completed, and everything is rebooted, sometimes, not often, but sometimes, SQL Server Agent doesn’t restart correctly. I was asked to put together a script that will check a list of servers to see if Agent is running. Any PowerShell guru’s reading this are either laughing or crying, already.
So, I started exploring how to connect to the server using using some type of PS-Drive, but I quickly figured out that I’d have to loop through the servers that way. Plus, I don’t think there’s a way to check Agent status through the SQL drive. So that went out the window. Then, it was to use Invoke-SQLCmd. After all, I can call multiple servers through the -ServerInstance property, so I just need to find a good T-SQL statement to check Agent status… Googling & Binging didn’t turn up a good T-SQL statement to check Agent status (not saying there isn’t a way, but I couldn’t find one easily, plus, this is a story about PowerShell). On to SMO. All I had to do was figure out how to connect to the server and the SQL Agent object through SMO. I’d have to loop through the servers, but for such a simple script, that shouldn’t be hard. Ever tried to connect directly to the Agent through SMO? Me neither. It’s a pain.
Finally, I referred back to Don Jones (blog|twitter) book, Windows Powershell 2.0, TFM (and a more appropriately named book, there isn’t), and noticed a statement, Get-Service… Hey! Isn’t SQL Agent basically a service? Yes. Yes it is. Here’s the final script, in all it’s simplistic glory:
Get-Service -DisplayName "SQL Server Agent*" -ComputerName (Get-Content "servers.txt") | Select-Object MachineName,Status
Yep, it was that easy. Kids, don’t try this at home. Take the time to learn what you’re doing. Don’t assume you know.
The absolute biggest part of the PASS Summit is the one thing that most people don’t take advantage of, networking. And no, I don’t mean glad-handing everyone you meet, remembering all their names (although that is a good skill to have) and saying over & over again “Rush Chairman, damn glad to meet you.” I mean taking advantage of the fact that you can talk to people that have already solved the problem that you’re facing at work, or just might have some insight into that issue, or maybe you can give them insight into a problem they’re facing. I’m mean, talking to people.
Yeah, I know, we’re all geeks, and worse than that, data geeks. That means we like to sit in dark little caves & grumble about our fellow man having WAY too much access to the data we’ve been sworn to protect. I’m with you. But, you’ve made the decision to go the PASS Summit. You’re there. All over the place are your fellow data geeks. And look at that, some of them are talking to each other. You can too. In fact, you should.
So, how do you break the ice? Here’s a suggestion. When you go to lunch on Tuesday, look for the Birds of a Feather tables. Each one will have a different topic, hosted by someone who knows at least a little about that topic, or is just really excited about discussing that topic. Sit down (you don’t need permission, it’s implied), introduce yourself and dive into the topic. Ask questions. Answer questions. At least say hi before you sit there and listen. You’re in. You’ve just made the leap. Welcome to networking. Now, find out where the party is for Tuesday night and you can do some more.
I’ll be hosting a table on the topic “T-SQL Tuning & Optimization.” If you’re interested in that topic, please, sit down & talk. Oh, I might be a minute or two late. I’m presenting right before lunch. Save me a chair, just in case.
The real world can get in the way of important things like SQL Saturday. In the case of SQL Saturday New England, the original date was the same time as the Boston Marathon. Unfortunately, that means that Boston’s somewhat limited hotel space will be tweaked. Rather than try to compete with that, we moved the date. Please mark it as April 2nd, 2011.