I saw an odd statement the other day, “The size of the name of the parameter does not affect performance.” My first thought was, “Well, duh!” But then, I had one of those, “Ah, but are you sure” thoughts. And you know what, I wasn’t sure.
If size of the parameter name did affect performance, I figured, the one sure place where that would be evident is in the size of the execution plan. Right? I mean, if there was an impact on memory, and hence on performance, that’s probably where you’d see evidence of it. I wrote two queries:
DECLARE @ThisIsAVeryVeryLongParameterNameThatIsTrulyRidiculousButItIllustratesThePointThatParameterLengthDoesNotAffectPerformance int SET @ThisIsAVeryVeryLongParameterNameThatIsTrulyRidiculousButItIllustratesThePointThatParameterLengthDoesNotAffectPerformance = 572 SELECT soh.SalesOrderID ,sod.SalesOrderDetailID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID WHERE soh.SalesOrderID = @ThisIsAVeryVeryLongParameterNameThatIsTrulyRidiculousButItIllustratesThePointThatParameterLengthDoesNotAffectPerformance DECLARE @v int SET @v = 572 SELECT soh.SalesOrderID ,sod.SalesOrderDetailID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID WHERE soh.SalesOrderID = @v
If you run this against AdventureWorks2008R2 you’ll get two distinct, but identical, execution plans:
You can see that they look identical, but how do I know they’re distinct? If you run this query:
SELECT deqs.creation_time, deqs.query_hash, deqs.query_plan_hash FROM sys.dm_exec_query_stats AS deqs CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest WHERE dest.text LIKE '%SELECT soh.SalesOrderID%'
You’ll get this back as a result:
creation_time query_hash query_plan_hash
2010-09-23 18:18:09.347 0x8D0FB9D524B8DD4D 0x13707445560737BA
2010-09-23 18:18:16.223 0x8D0FB9D524B8DD4D 0x13707445560737BA
Two distinct queries, but with identical hash values, so the plans generated are nearly the same, but clearly different, due to the fact that they were built with different parameters, including the monster name. So, how to see if there is a difference in the plan generated that could affect performance? How about the execution plan properties. First, the property sheet for the SELECT operator for the query with the long parameter name:
Of particular note is the Cache Plan Size. Let’s compare it to the same property sheet for the small parameter name:
If you compare the two, you’ll see that they’re the same. In fact, if you look at almost all the values, you’ll see that the Compile CPU, Compile Memory, and Compile Time are all identical. Based on all this information, I have to conclude that no, the size of the name of the parameter doesn’t affect performance, positively or negatively. But why?
I’m actually not 100% sure, but based on some things I know, here’s what I think. The Algebrizer within the Query Optimizer breaks down all the objects referred to within a query plan. It assigns them all values and identifiers for that plan, part of getting the information together to feed the plan into the mathematical part of the Optimizer. I’ll bet it just assigns values to parameters that are the same type of value, if not the same exact value, and that value is the same size from one execution plan to the next.
This means that you don’t save memory by assigning parameters @a, @b, @c when in fact you mean @ReferenceCount, @MaxRetries, @BeginDate. Do I think you should put in parameters of the silly length I put in before? No, of course, not, because it makes the TSQL code less clear. But, so does putting in equally silly, short, parameter names.
Don’t make your TSQL code hard to read. It doesn’t help performance.
I’ve wracked my brain for some bit of puzzle that I could present as part of TSQL Tuesday #2 and I finally came up with a bit of something.
When you’re looking at an execution plan for a query, you know that this represents SQL Servers best attempt at a good execution plan. But, it may not represent the best possible plan. Or, it could be the only possible plan. The puzzle is, how do you know what you’re looking at? Is this a trivial plan, meaning it’s the only possible execution method for the query? Is this plan fully optimized, or did the optimizer go through it’s prescribed cycles and simply take the best plan it had generated up to that point?
These questions can be answered directly from information available to you in the execution plan. Let’s take two queries, one incredibly simple and one mildly complex:
SELECT a.StateProvinceID FROM Person.Address AS a WHERE a.StateProvinceID = 42 SELECT soh.AccountNumber ,sod.LineTotal FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID WHERE sod.SalesOrderID = 43933
When you run the first query the execution plan looks like this:
It’s clearly a simple plan for a simple query, but what did the optimizer do?
Let’s take a quick look at the next query and it’s execution plan:
A slightly more complicated plan, but again, the question is, what did the optimizer do? Was it able to do a complete and full optimization or did it timeout? How can you tell?
Tune in next week…
Kidding. It’s pretty easy. Take a look at the SELECT operator in both queries. Right click on that operator and open up the Properties window. Here you’ll get a lot of interesting information about the execution plan itself, how long it took, the plan hash & query hash, how much memory it used and, most interesting for this little mystery, the Optimization Level and, depending on that level, the Reason for Early Termination. Take a look at the properties for the first query:
As you can see, the Optimization Level for the first query is TRIVIAL. This means it’s a Trivial plan, no other possible plan could be created for it. Notice also that there is a Parameter List, consisting of one parameter, @1, despite the fact that no parameters were used. This is evidence of simple parameterization. Yet more information that can be found in the Properties.
If we then look at the second query’s properties:
The Optimization Level in this case is FULL, meaning that it went through a complete cycle within the query optimizer. Looking down, below the Query Hash & Query Plan Hash you can see the Reason For Early Termination Of Statement Optimization, slightly truncated. This shows that a “Good Enough Plan Found.” Which means that the optimizer feels that it’s done everything for this that it thinks it can. Generally, this is a good sign. The other thing you’ll usually see here is Timeout, meaning that the optimizer gave up and used the best plan it had at the moment. That’s generally a bad sign and an indication that you may have tuning opportunities in the query. However, neither of these are hard and fast rules. You can get a full optimization with a good enough plan and still have a pretty stinky query plan. This is just another bread crumb on the trail, another piece of information that you can use to figure out if you’ve got a good execution plan or not.
Another frequently heard story is that stored procedures get and reuse execution plans, but ad hoc queries do not. A lot of people believe this and say as much online and in the real world. Unlike my last myth, this time, I’m going to give you the DBA answer to this question. It depends.
There are ad hoc queries and there are ad hoc queries. The classic ad hoc query looks like this:
DECLARE @sql NVARCHAR(MAX), @value int; SET @value = 43668; SET @sql = 'SELECT * FROM Sales.SalesOrderHeader AS soh '; SET @sql = @sql + 'JOIN Sales.SalesOrderDetail AS sod '; SET @sql = @sql + 'ON soh.SalesOrderID = sod.SalesOrderID '; SET @sql = @sql + 'WHERE soh.SalesOrderID = ' + CAST(@value AS NVARCHAR); EXEC (@sql);
And as ad hoc TSQL goes, that one is actually some what clean. They get a lot worse. But, in this case, each and every time the @value variable is changed, you’re going to get a different execution plan. That’s because the full text of the query is used to determine if the existing plan will work for the new query and changing the @value variable will result in a change to that text.
However, there is another way to build and execute ad hoc TSQL. Done correctly, it will work exactly the same way as stored procedures, including plan reuse. This method is known as a parameterized query. There ways to do this through OLEDB & ODBC and there’s a way to do it through TSQL, sp_executesql. Instead of the query above, let’s examine this query:
DECLARE @sql NVARCHAR(MAX), @value INT, @parm NVARCHAR(MAX); SET @value = 43668 SET @sql = 'SELECT * FROM Sales.SalesOrderHeader AS soh '; SET @sql = @sql + 'JOIN Sales.SalesOrderDetail AS sod '; SET @sql = @sql + 'ON soh.SalesOrderID = sod.SalesOrderID '; SET @sql = @sql + 'WHERE soh.SalesOrderID = @internalvalue' SET @parm = '@internalvalue int' EXEC sp_executesql @sql, @parm, @internalvalue = @value;
Using sp_executesql you can define parameters and put them into the query. Now, when this query is called again, you’ll see that the execution plan gets reused. As part of the comparison, I’ll also create this procedure:
CREATE PROCEDURE dbo.AdHocTest (@value INT) AS SELECT * FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID WHERE soh.SalesOrderID = @value;
To test these queries, we’ll need to capture some trace events. Here are the ones I’ll capture:
When all three queries are run, changing the parameter passed to @value, here is the resulting trace event output:
From the top, I ran each query in order as listed here in the article, changing the value passed once each time I called query. You first see a cache miss for the pure ad hoc query. It inserts to the cache and finishes the batch. Then, the next ad hoc query has a cache miss, a cache insert, and completes it’s batch. Proving that ad hoc queries with only a small change will restult in multiple execution plans.
Then we move on to the sp_executesql queries. They start out the same way, with a cache miss and an insert before the batch is completed. Note, the cache insert is for a different set of code than is immediately visible based on what we’ve provided. The TextData column starts with “(@internalvalue int)…” and procedes to show the rest of the query. This is the use of sp_executesql at work. It’s creating a parameterized query, effectively the same thing as a stored procedure. Then, with the second call to sp_executesql, we see a cache miss followed by a cache hit. This is the classic pattern for a successful call to the plan cache.
If you don’t trust that, the last two batches are the stored procedure. You can see that the first call to the procedure results in the same cache miss followed by a cache insert that every other first call had. The second call to the stored procedure had a cache miss followed by a cache hit. The same as the second, ad hoc sql, call to sp_executesql.
That, my friends, is plan reuse in action from an ad hoc query. Myth 1/2 Busted. Or maybe: Myth “It Depends” Busted.
I keep thinking I’ve got a handle on the way execution plans are dealt with in SQL Server. I had a pretty simplistic view of things, there’s the estimated plan which comes out of the optimizer and there’s the actual plan which comes out of the data engine. The one shows the best guess for what will happen based on the query and the underlying statistics. The other shows what actually happened based on the query and the real data, load on the server, available memory, you get the idea. This model is easy to explain and understand. Too bad it’s not quite the way things work.
If you query the dynamic management function sys.dm_exec_query_plan, you can see a query plan. Once you drill down on the XML, or browse through the GUI, you’ll see that this is an estimated plan, with no execution statistics at all. The safe assumption here is that SQL Server takes the execution plan created by the optimizer and stores it. Ah, but is that exactly how it works? Can you generate an estimated execution plan for this batch statement:
CREATE TABLE dbo.TempTable (id INT IDENTITY(1, 1)
,val VARCHAR(50) ) ;
INSERT INTO dbo.TempTable
FROM dbo.TempTable AS tt ;
DROP TABLE dbo.TempTable ;
No. You can’t. If you try to generate an estimated plan you’ll get the error “Invalid object name ‘dbo.TempTable’.” This is because the algebrizer, a part of the process of the optimizer, which is what generates the estimated plan, can’t resolve the table dbo.TempTable because it doesn’t exist yet. However, if you execute the batch above and then run this query:
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE dest.text LIKE ‘create table%’
You can see the estimated plan for executing this batch. So… if the optimizer creates estimated plans, but it can’t create an estimated plan for this query since the algebrizer can’t resolve the object name… where did this estimated plan come from? Good question.
Here’s some more interesting info. You know that you can capture both estimated & actual execution plans through trace events, right? Watch what happens if you run the batch statement above and capture both the “Showplan XML” and “Showplan XML Statistics Profile” events, estimated and actual, respectively. You’ll get the estimated plan that is being stored in sys.dm_exec_query_plan and you’ll get an actual execution plan including actual rows vs. estimated, etc. Where did that execution plan come from? Another good question.
A better man than I, Tibor Karaszi, first noticed that there was, other than the execution statistics, no difference between “XML Showplan” and “XML Showplan Statistics Profile.” He asked whether or not we should ever expect to see a difference in a forum post. I jumped in with both feet, of course you’ll see a difference. After all, one is the estimated plan and one is the actual plan. Mr. Karaszi corrected a couple of my assumptions and suddenly, there I was, not understanding execution plans, again. I didn’t know where those estimated plans were coming from, not for sure. I assumed the optimizer, but then how were they getting past the algebrizer?
I had a theory and a mechanism for testing it. If I set up a trace and capture the recompile events using SQL:StmtRecompile and return the EventSubClass column, I can see what caused a recompile. When I run this query and check the trace information, I find that the EventSubClass resolves to 3-Deferred Compile. I’m not 100% certain that I’m correct here, but the following is what I think occurrs.
Because of the CREATE statement, the batch is going to fail in the optimizer. But, because it’s a DDL statement, it goes on to execute anyway. Then, a recompile puts the batch back through the optimizer. There, a table is now available where none was previously. An estimated execution plan is created. This plan, previously non-existent gets stored and can be accessed by us.
Within the facts available, this seems to explain the behavior. In fact, it’s basically supported by Connor Cunningham, one of the MS SQL Server developers, in some blog posts over at Conner vs. SQL.
A view is simply a query that behaves something like a table. Most people know this. Most people also know that a view is simply a mask on top of what might be a very complex query. It all seems really simple. You call the view inside a simple query, the view runs the underlying complex query. Most people might not know that when a view is called and it gets sent to the optimizer, the optimizer unpacks the view and binds the component parts of the query necessary to create an execution plan that will return the data requested. What I didn’t know until recently was that the optimizer is VERY smart. Not only does it unpack the query of the view, but it will change the query that the view uses. Let’s take an example (using AdventureWorks2008):
CREATE VIEW dbo.vTest AS SELECT soh.AccountNumber ,sod.LineTotal ,(SELECT COUNT(pcc.BusinessEntityId) FROM Sales.PersonCreditCard AS pcc WHERE pcc.CreditCardID = soh.CreditCardID ) AS PersonCreditCard FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID ; GO
Whether or not this query is a good one (I’d say it isn’t) and whatever you might think about it, the view definition here is pretty straight forward. If you run a SELECT against this view and take a look at the execution plan, it looks like this:
As you can see, the behavior is completely as defined above. The view has been resolved out to it’s component parts in order to build the query and arrive at the data requested. What happens if, instead of selecting all the columns, only a couple are selected? Well, the query that defines the view is not changing, so the optimizer will arrive at the same execution plan as that shown in Figure 1, right? Run this query and get the execution plan:
SELECT vt.AccountNumber ,vt.LineTotal FROM dbo.vTest AS vt;
What the heck just happened?
I told you, the optimizer is smart. Compare that execution plan with the one generated from this query:
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID ;
You’ll find that they’re identical. As stated before, views are just a query stored on the side. The optimizer is smart enough to figure out that it only needs to generate a plan as if the query requested was this. But what happens when you start nesting views? Yes, yes, Yeah… I… Ok… But… YES! That’s not a good idea. So? When has that slowed people down for a New York second? I’ve seen views nested three and four layers deep. I’ve seen table valued user defined functions nested six and seven layers deep. The question is, how well does the optimizer handle something like this? This script sets up a new situation:
CREATE VIEW dbo.vSalesInfo
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID ;
CREATE VIEW dbo.vTest3
FROM Sales.PersonCreditCard AS pcc
WHERE pcc.CreditCardID = vsi.CreditCardID) AS PersonCreditCard
FROM Purchasing.Vendor v) AS VendorCount
FROM Person.Address AS a) AS AddressCount
FROM Person.BusinessEntity be) AS BusinessEntityCount
FROM dbo.vSalesInfo AS vsi
From the script you can see that I’ve nested one view inside another and complicated the outer a view a bit. What happens when you run the query that would eliminate tables from the view this time?
FROM dbo.vTest3 AS vt
See. The optimizer is very smart. Ah, but before you get all comfy and crack a bottle of some frothy beverage, we need to look at the compile times. Compare the STATISTICS TIME output from a query that pulls from the full view, the limited pull from the view and the pull from the tables:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
SQL Server parse and compile time:
CPU time = 8 ms, elapsed time = 8 ms.
These are small views and easy queries, so the compile times are not large. But with one consistently 30% larger than the others in terms of execution time and actually requiring CPU cycles when the others require less, you begin to see that, despite the extreme intelligence of the optimizer, there is a cost. TANSTAAFL still applies. Keep all this in mind if you think that nested views are a nifty way to write your code or you’re trying to convince someone that nested views are not a nifty way to write their code.
Peter Ward, the editor at SQL Server Performance, has published an article of mine on Breaking Down Complex Execution Plans. I go way beyond the blog entry below and show how the estimated costs in execution plans can mess you up, how to use the XML in execution plans to search through them for costly operations or operations that have mismatched estimated rows & actual rows and some other tips and tricks. Hopefully it’s worth a read.
One little potential for performance problems that we’ve seen comes out of how nHibernate generates it’s parameratized queries. It limits the length of any parameter to the length of the column, but if the length of that parameter is less than the column, it uses tha smaller length when declaring the variable. This results in a query that looks like this:
exec sp_executesql N'INSERT INTO dbo.users (Name, Password, EmailAddress, LastLogon, LogonId) VALUES (@p0, @p1, @p2, @p3, @p4)',N'@p0 nvarchar(8),@p1 nvarchar(6),@p2 nvarchar(12),@p3 datetime,@p4 nvarchar(8)',@p0=N'Ted Cool',@p1=N'abc123',@p2=Nfirstname.lastname@example.org',@p3='2008-04-29 14:10:44:000',@p4=N'ted_cool'
Note the parameter @p4 which is the parameter mapping to the primary key for the little sample table. In this query it’s declared as nvarchar( 8 ) because ‘ted_cool’ is composed of eight characters. But if we changed it to ‘ted_coolish’:
exec sp_executesql N'INSERT INTO dbo.users (Name, Password, EmailAddress, LastLogon, LogonId) VALUES (@p0, @p1, @p2, @p3, @p4)',N'@p0 nvarchar(8),@p1 nvarchar(6),@p2 nvarchar(12),@p3 datetime,@p4 nvarchar(11)',@p0=N'Ted Cool',@p1=N'abc123',@p2=Nemail@example.com',@p3='2008-04-29 14:13:30:000',@p4=N'ted_coolish'
Now that same parameter is declared as nvarchar(11). So if we look at the procedure cache to see what’s inside for this query:
OUTER APPLY sys.dm_exec_sql_text(p.plan_handle)
‘%INSERT INTO dbo.users%’
We get the following results (I ran the test a few times, so I have more than two rows):
Prepared 40960 (@p0 nvarchar(8),@p1 nvarchar(6),@p2 nvarchar(12),@p3 datetime,@p4 nvarchar(11))INSERT INTO dbo.users (Name, Password, EmailAddress, LastLogon, LogonId) VALUES (@p0, @p1, @p2, @p3, @p4)
Prepared 40960 (@p0 nvarchar(8),@p1 nvarchar(6),@p2 nvarchar(12),@p3 datetime,@p4 nvarchar(10))INSERT INTO dbo.users (Name, Password, EmailAddress, LastLogon, LogonId) VALUES (@p0, @p1, @p2, @p3, @p4)
Prepared 40960 (@p0 nvarchar(8),@p1 nvarchar(6),@p2 nvarchar(12),@p3 datetime,@p4 nvarchar(12))INSERT INTO dbo.users (Name, Password, EmailAddress, LastLogon, LogonId) VALUES (@p0, @p1, @p2, @p3, @p4)
Prepared 40960 (@p0 nvarchar(8),@p1 nvarchar(6),@p2 nvarchar(12),@p3 datetime,@p4 nvarchar(8))INSERT INTO dbo.users (Name, Password, EmailAddress, LastLogon, LogonId) VALUES (@p0, @p1, @p2, @p3, @p4)
For what should have been a single plan, I have four (or more) clogging up the cache and causing unecessary compiles, etc. I’m not completely enamored with what this tool is going to do tofor me as a dba.