View vs. Table Valued Function vs. Multi-Statement Table Valued Function

August 13, 2008 at 8:36 am (TSQL)

About five years ago, I was checking an app before it went to production. I hadn’t seen the app before then; a junior DBA had worked with the developers on designing and building it. It didn’t use a single stored procedure or view. Instead, it was built entirely of multi-statement UDFs. These UDFs called other UDFs, which joined to UDFs… It was actually a very beautiful design in terms of using the functions more or less like objects within the database. Amazing. It also would not, and could not, perform well enough to function, let alone scale. It was a horror because they thought they were done and ready to go to production, but no one had ever tested with more than a couple of rows of data in any of the tables. Of course, a couple of rows of data worked just fine. It was when we put in 10, then 1,000, then a few million rows that the thing came to a total and complete halt. We spent weeks arguing about the stupid thing. The developers insisted that since it was “possible” to do what they did, it was, in fact, OK to do what they did.

Anyway, with the help of a Microsoft consultant, we finally cleaned up the app and got it on its feet. Ever since then, I’ve preached the dangers of the multi-statement table valued function. The thing to remember is that no statistics are generated for the table these functions return. That means the optimizer assumes they return a single row of data. When they really do return only a few rows, everything is fine. When they return even as few as a hundred rows, like the example I’m posting below, they stink.

Anyway, I boiled up this silly example because some developer accused me and several other DBAs of spreading Fear, Uncertainty, and Doubt because we suggested that the multi-statement UDF is something to avoid if possible. Actually, he all but stated that we didn’t know what we were talking about. I was peeved. Hence this example. Feel free to check it out. Oh, and if you check the execution plans, note that the multi-statement UDF is marked as the least costly even though it actually runs the slowest of the three (3 ms against 1 ms and 2 ms in the run below). One more example of execution plan costs being wrong.

Here are the timing results from one run of the view and the two UDFs, in the order the test script runs them (view, inline UDF, multi-statement UDF):

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.

And the code to test for yourself:

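-- Parent/child test tables
CREATE TABLE dbo.Parent
(ParentId int identity(1,1)
,ParentDate datetime)

CREATE TABLE dbo.Child
(ChildId int identity(1,1)
,ParentId int
,ChildDate datetime)
GO

-- Load 99 parents. Note that @j is never reset, so the inner loop
-- only runs on the first pass: ParentId 1 gets 99 children and the
-- other parents get none. That is why each query below returns 99 rows.
DECLARE @i int
DECLARE @j int
SET @i = 1
SET @j = 1
WHILE @i < 100
BEGIN
    INSERT INTO dbo.Parent
    (ParentDate)
    SELECT GETDATE()
    WHILE @j < 100
    BEGIN
        INSERT INTO dbo.Child
        (ParentId
        ,ChildDate)
        SELECT @i
        ,GETDATE()
        SET @j = @j + 1
    END
    SET @i = @i + 1
END
GO

-- CREATE VIEW and CREATE FUNCTION must each be the first statement
-- in a batch, hence the GO separators.
CREATE VIEW dbo.vJoin
AS
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,c.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
GO

-- Inline (single-statement) table valued function
CREATE FUNCTION dbo.SingleUDF ()
RETURNS TABLE
AS
RETURN
(
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,c.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
)
GO

-- Multi-statement table valued function: same query, but the rows
-- are first loaded into the @Multi table variable
CREATE FUNCTION dbo.MultiUDF ()
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
    INSERT INTO @Multi
    (ParentId
    ,ParentDate
    ,ChildId
    ,ChildDate)
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,c.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    RETURN
END
GO

-- Run all three and compare the reported times
SET STATISTICS TIME ON
SELECT * FROM dbo.vJoin
SELECT * FROM dbo.SingleUDF()
SELECT * FROM dbo.MultiUDF()
SET STATISTICS TIME OFF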
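
If you want to see the row estimates behind those plan costs, one option is SET STATISTICS PROFILE, which returns the plan as an extra result set with the actual row count (Rows) and the optimizer's guess (EstimateRows) for each operator. This is only a minimal sketch. It assumes the objects from the script above already exist, and the estimate you see depends on the SQL Server version, so the comments describe what to look for rather than guaranteed output.

```sql
-- Sketch only: assumes dbo.SingleUDF and dbo.MultiUDF were created
-- by the test script above.
SET STATISTICS PROFILE ON;
GO

-- Inline function: the plan is expanded down to dbo.Parent and dbo.Child,
-- so EstimateRows comes from the base tables.
SELECT * FROM dbo.SingleUDF();

-- Multi-statement function: the plan only sees the returned table
-- variable. Compare Rows with EstimateRows on that operator.
SELECT * FROM dbo.MultiUDF();
GO

SET STATISTICS PROFILE OFF;
GO
```

A large gap between Rows and EstimateRows on the multi-statement function is the thing to watch for, since every join built on top of that function inherits the bad estimate.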


Did I mention that I love Red Gate’s Data Generator?

April 30, 2008 at 12:33 pm (Tools, TSQL)

Because I do. I’m working on a set of tests for an article comparing TOP, MAX & ROW_NUMBER. I have a simple data structure and I need a bunch of data in order to create my tests. I wanted that data to be distributed a certain way, to mimic some production system behavior I’ve seen in the past. Last night I got it all set by mucking about with the seed values of the various columns until the distribution was just right, then loaded up millions of rows in only a few minutes, all on my laptop. Great tool!


SQL Data Generator

April 1, 2008 at 12:29 pm (Tools)

I just received word from Rachel Hawley over at Red Gate that SQL Data Generator has been released. I’ve been using the beta over the last few months and I’ve found it incredibly useful for setting up tests and playing around with my database designs, seeing how different data loads will be distributed through the indexes, etc. It’s quick to use but fairly powerful and flexible and, frankly, pretty inexpensive. I strongly recommend it.


Top vs. Max

March 21, 2008 at 2:56 pm (TSQL)

The company I work for has a very well defined need for versioned data. In a lot of instances, we don’t do updates, we do inserts. That means that you have to have mechanisms for storing the data that enable you to pull out the latest version of all the data, or a particular version of all the data, or the data at a particular moment in time, regardless of version.

That means maintaining a version table and a series of inserts into various tables. Some tables will have pretty much a new row for each version; some tables may only have one or two versions out of a chain. With the help of a very smart Microsoft consultant, Bill Sulcius, we have a mechanism that works very well. However, questions about the ultimate tuning of the procedures remain. So we may have a query that looks like this:

SELECT *
FROM dbo.Document d
INNER JOIN dbo.Version v
    ON d.DocumentId = v.DocumentId
    AND v.VersionId = (SELECT TOP(1) v2.VersionId
                       FROM dbo.Version v2
                       WHERE v2.DocumentId = v.DocumentId
                       ORDER BY v2.DocumentId DESC, v2.VersionId DESC)

There’s a clustered index on the Version table that has DocumentId & VersionId in it. This query works great.

But you can also write the same query to get the same results using MAX or ROW_NUMBER(). What’s more, those work well too, all nice clean clustered index seeks. You can also use CROSS APPLY rather than JOINs. All of these appear to work well, but in different circumstances some work better than others. That makes it difficult to establish a nice clean “if it looks like this, do this” pattern for developers to emulate. I’m creating a series of tests to outline as many of the differences as I can. I’ll write it up and submit it all to Chuck over at SQL Server Standard first. If he doesn’t like it, it’s Steve’s. I’ll also post a few tidbits here.
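
For reference, here is a rough sketch of how the MAX, ROW_NUMBER() and CROSS APPLY variants of the query above might be written. The real table definitions are not shown in this post, so the two CREATE TABLE statements are a hypothetical minimal schema (only the key columns, with the clustered index assumed to be on DocumentId, VersionId) meant for a scratch database. The sketch makes no claim about which variant performs best.

```sql
-- Hypothetical minimal schema, for illustration only.
CREATE TABLE dbo.Document
(DocumentId int NOT NULL PRIMARY KEY);

CREATE TABLE dbo.Version
(DocumentId int NOT NULL
,VersionId int NOT NULL
,CONSTRAINT PK_Version PRIMARY KEY CLUSTERED (DocumentId, VersionId));
GO

-- MAX: latest VersionId per document via a correlated aggregate
SELECT d.DocumentId, v.VersionId
FROM dbo.Document d
INNER JOIN dbo.Version v
    ON d.DocumentId = v.DocumentId
    AND v.VersionId = (SELECT MAX(v2.VersionId)
                       FROM dbo.Version v2
                       WHERE v2.DocumentId = v.DocumentId);

-- ROW_NUMBER: number the versions of each document, newest first,
-- and keep only row 1
SELECT d.DocumentId, v.VersionId
FROM dbo.Document d
INNER JOIN (SELECT v2.DocumentId
                  ,v2.VersionId
                  ,ROW_NUMBER() OVER (PARTITION BY v2.DocumentId
                                      ORDER BY v2.VersionId DESC) AS RowNum
            FROM dbo.Version v2) v
    ON d.DocumentId = v.DocumentId
    AND v.RowNum = 1;

-- CROSS APPLY: run the TOP(1) lookup once per document
SELECT d.DocumentId, v.VersionId
FROM dbo.Document d
CROSS APPLY (SELECT TOP(1) v2.VersionId
             FROM dbo.Version v2
             WHERE v2.DocumentId = d.DocumentId
             ORDER BY v2.VersionId DESC) v;
```

All three return one row per document that has at least one version, which makes them convenient to compare side by side with SET STATISTICS IO and SET STATISTICS TIME turned on.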
