Bad Performance Tip

March 4, 2009 at 2:23 pm (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL) (, , )


I saw a performance tip that just didn’t make any sense to me:

In cases where you are using the IN clause, try to order the list of values so that the most frequently found values are placed first.

That just didn’t make any sense to me. The IN clause is not like EXISTS where the query will stop as soon as it finds a good match. So I set up a test with AdventureWorks. I found a few different ProductId values, the highest, the lowest and a few in between and ran a very simple query to test this tip:

/*
ProductID        RowCount
870                        4688 rows
877                        1327 rows
972                        380 rows
823                        148 rows
723                        52 rows
897                        2 rows*/

DBCC FReeproccache()
DBCC dropcleanbuffers()
GO
SELECT  sod.ProductID
       ,sod.SalesOrderDetailID
FROM    Sales.SalesOrderDetail AS sod
WHERE   sod.ProductID IN (870, 877, 972, 723, 897)
GO
DBCC FReeproccache()
DBCC dropcleanbuffers()
GO
SELECT  sod.ProductID
       ,sod.SalesOrderDetailID
FROM    Sales.SalesOrderDetail AS sod
WHERE   sod.ProductID IN (897, 723, 972, 877, 870)
GO
DBCC FReeproccache()
DBCC dropcleanbuffers()
GO
SELECT  sod.ProductID
       ,sod.SalesOrderDetailID
FROM    Sales.SalesOrderDetail AS sod
WHERE   sod.ProductID IN (972, 870, 877, 897, 723)

It resulted in three identical execution plans, every one of them a nonclustered index seek. In the three tests, the most frequently found values are first, last & mixed. All three queries had I/O that looked like this:

(6449 row(s) affected)
Table ‘SalesOrderDetail’. Scan count 5, logical reads 26, physical reads 7, read-ahead reads 37

And the execution times were:

Query     Compile     Run
1           45ms        47ms
2           14ms        28ms
3           4ms          30ms

I then ran all three again, in reverse order:

Query     Compile     Run
3           34ms        52ms
2           25ms        46ms
1           5ms          25ms

The times scattered around the same values, but were essentially the same. No change in the order of the data affected the query positively or negatively. So where does this tip come from?

If people post a simple list of tips but don’t explain what they mean, show some sample code or in any way attempt to educate you as to why you should do X in your queries, I would not trust it.

7 Comments

  1. Jeremiah Peschka said,

    Good catch, Grant. If I recall correctly, an IN clause will be expanded by SQL Server into a bunch of OR comparisons wrapped inside a set of parens. So, EmployeeId IN (1, 2) will expand to (EmployeeId = 1 OR EmployeeId = 2).

  2. scarydba said,

    It’s just going to search through the entire data set for all the values, no matter what the order is. There were a bunch of tips on this blog post that didn’t make much sense and a few of them were wrong. I don’t feel good posting the blog in public though.

  3. Gail said,

    Jeremiah: Yup, it is, and if you look at the indx seek/index scan operator in the exec plan, that’s exactly what’s shown (in all cases I’ve seen). So the IN is expanded to ORs before the optimiser even sees the query.

    Grant: No, but Google knows many things if you ask nicely…

  4. dd said,

    Have you tried:

    SELECT sod.ProductID
    ,sod.SalesOrderDetailID
    FROM Sales.SalesOrderDetail AS sod
    WHERE sod.ProductID IN (1, 2, 3, 4, 5, 6, 7, 8,………870, 877, 972, 723, 897)

    Maybe for a trivial number of IN parameters (5) it may not matter I think? That applies to OR also, I think.

  5. scarydba said,

    No I haven’t and it might make a difference. I suspect at that point you’d be getting a table scan and wouldn’t see any difference then either.

  6. Adam Machanic said,

    So who posted the original “tip?”

  7. scarydba said,

    That’s nda.

    Kidding.

    It was on a list of tips that had been swiped from another source. On the way the tips had been edited so that some of them didn’t say the same things any more or they were placed out of context. I alerted the original author regarding his “borrowed” material, but I don’t want to name any names. As Gail says, a little googling and you can find it even though I edited the original statement. The true author does not stand by this tip. I don’t know if the pilferer does or not.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: