Unpacking the View

April 24, 2009 at 9:16 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, , , , )

A view is simply a query that behaves something like a table. Most people know this. Most people also know that a view is simply a mask on top of what might be a very complex query. It all seems really simple. You call the view inside a simple query, the view runs the underlying complex query. Most people might not know that when a view is called and it gets sent to the optimizer, the optimizer unpacks the view and binds the component parts of the query necessary to create an execution plan that will return the data requested. What I didn’t know until recently was that the optimizer is VERY smart. Not only does it unpack the query of the view, but it will change the query that the view uses. Let’s take an example (using AdventureWorks2008):


CREATE VIEW dbo.vTest
AS
 SELECT soh.AccountNumber
,sod.LineTotal
,(SELECT COUNT(pcc.BusinessEntityId)
FROM Sales.PersonCreditCard AS pcc
WHERE pcc.CreditCardID = soh.CreditCardID
) AS PersonCreditCard
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID ;
GO

Whether or not this query is a good one (I’d say it isn’t) and whatever you might think about it, the view definition here is pretty straight forward. If you run a SELECT against this view and take a look at the execution plan, it looks like this:

vtestexecutionplan4

As you can see, the behavior is completely as defined above. The view has been resolved out to it’s component parts in order to build the query and arrive at the data requested. What happens if, instead of selecting all the columns, only a couple are selected? Well, the query that defines the view is not changing, so the optimizer will arrive at the same execution plan as that shown in Figure 1, right? Run this query and get the execution plan:

SELECT
 vt.AccountNumber
,vt.LineTotal
FROM
 dbo.vTest AS vt;

2columexecutionplan

What the heck just happened?

I told you, the optimizer is smart. Compare that execution plan with the one generated from this query:

 SELECT soh.AccountNumber
,sod.LineTotal
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID ;

You’ll find that they’re identical. As stated before, views are just a query stored on the side. The optimizer is smart enough to figure out that it only needs to generate a plan as if the query requested was this. But what happens when you start nesting views? Yes, yes, Yeah… I… Ok… But… YES! That’s not a good idea. So? When has that slowed people down for a New York second? I’ve seen views nested three and four layers deep. I’ve seen table valued user defined functions nested six and seven layers deep. The question is, how well does the optimizer handle something like this? This script sets up a new situation:

CREATE VIEW dbo.vSalesInfo
AS
SELECT soh.AccountNumber
,sod.LineTotal
,soh.CreditCardID
,sod.ProductID
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS  sod
ON soh.SalesOrderID = sod.SalesOrderID ;

CREATE VIEW dbo.vTest3
AS
SELECT vsi.AccountNumber
,vsi.LineTotal,
(
SELECT COUNT(pcc.BusinessEntityId)
FROM Sales.PersonCreditCard AS pcc
WHERE pcc.CreditCardID = vsi.CreditCardID) AS PersonCreditCard
,(SELECT COUNT(v.BusinessEntityId)
FROM Purchasing.Vendor v) AS VendorCount
,(SELECT COUNT(a.AddressId)
FROM Person.Address AS a) AS AddressCount
,(SELECT COUNT(be.BusinessEntityID)
FROM Person.BusinessEntity be) AS BusinessEntityCount
FROM dbo.vSalesInfo AS vsi

From the script you can see that I’ve nested one view inside another and complicated the outer a view a bit. What happens when you run the query that would eliminate tables from the view this time?

 SELECT vt.AccountNumber
,vt.LineTotal
FROM dbo.vTest3 AS vt

 3rdtest

See. The optimizer is very smart. Ah, but before you get all comfy and crack a bottle of some frothy beverage, we need to look at the compile times. Compare the STATISTICS TIME output from a query that pulls from the full view, the limited pull from the view and the pull from the tables:

–View 1
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.

–Table
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.

–View 2
SQL Server parse and compile time:
CPU time = 8 ms, elapsed time = 8 ms.

These are small views and easy queries, so the compile times are not large. But with one consistently 30% larger than the others in terms of execution time and actually requiring CPU cycles when the others require less, you begin to see that, despite the extreme intelligence of the optimizer, there is a cost. TANSTAAFL still applies. Keep all this in mind if you think that nested views are a nifty way to write your code or you’re trying to convince someone that nested views are not a nifty way to write their code.

Permalink 12 Comments

View vs. Table Valued Function vs. Multi-Statement Table Valued Function

August 13, 2008 at 8:36 am (TSQL) (, , , , , )

About five years ago, I was checking an app before it went to production. I hadn’t seen the app before then and a junior dba had worked with the developers designing and building the app. It didn’t use a single stored procedure or view. Instead, it was built entirely of multi-statement UDF’s. These UDF’s called other UDF’s which joined to UDF’s… It was actually a very beautiful design in terms of using the functions more or less like objects within the database. Amazing. It also would not, and could not, perform enough to function, let alone scale. It was a horror because they thought they were done and ready to go to production, but no one had ever tested more than a couple of rows of data in any of the tables. Of course, a couple of rows of data worked just fine. It was when we put in 10, 1000, a few million, that the thing came to a total and complete halt. We spent weeks arguing about the stupid thing. The developers instisted that since it was “possible” to do what they did, that, in fact, it was OK to do what they did.

Anyway, with the help of a Microsoft consultant, we finally cleaned up the app and got it on it’s feet. Ever since then, I’ve preached the dangers of the multi-statement table valued function. The thing to remember is, there are no statistics generated for these things. That means the optimizer thinks they return a single row of data. When they do only return a few rows, everything is fine. When they return even as little as a hundred rows, like the example I’m posting below, they stink.

Anyway, I boiled up this silly example because some developer accused me and several other DBA’s of spreading Fear, Undertainty, and Doubt because we suggested that the multi-statement UDF is something to avoid if possible. Actually, he pretty all but stated that we didn’t know what we were talking about. I was peeved. Hence this example. Feel free to check it out. Oh, and if you check the execution plans, note that the multi-statement UDF is marked as the least costly even though it actually performs twice as slow as the others. One more example of execution plans being wrong.

Here are the time results from one run of the view & UDF’s:

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.

And the code to test for yourself:

CREATE TABLE dbo.Parent
(ParentId int identity(1,1)
,ParentDate datetime)

CREATE TABLE dbo.Child
(ChildId int identity(1,1)
,ParentId int
,ChildDate datetime)

DECLARE @i int
DECLARE @j int
SET @i = 1
SET @j = 1
WHILE @i < 100
BEGIN
INSERT INTO dbo.Parent
(ParentDate)
SELECT GETDATE()
WHILE @j < 100
BEGIN
INSERT INTO dbo.Child
(ParentId
,ChildDate)
SELECT @i
,GETDATE()
SET @j = @j + 1
END
SET @i = @i + 1
END

CREATE VIEW dbo.vJoin
AS
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId

CREATE FUNCTION dbo.SingleUDF ()
RETURNS TABLE
AS
RETURN
(
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
)

CREATE Function dbo.MultiUDF ()
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
INSERT INTO @Multi
(ParentId
,ParentDate
,ChildId
,ChildDate)
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
RETURN
END

set statistics time on
select * from vJoin
select * from SingleUDF()
select * from MultiUDF()
set statistics time off

Permalink 6 Comments