Why does SQL Server use a better execution plan when I inline the variable?

Rainbolt 05/15/2018. 3 answers, 1.867 views
sql-server execution-plan

I have a SQL query that I am trying to optimize:

DECLARE @Id UNIQUEIDENTIFIER = 'cec094e5-b312-4b13-997a-c91a8c662962'

SELECT 
  Id,
  MIN(SomeTimestamp),
  MAX(SomeInt)
FROM dbo.MyTable
WHERE Id = @Id
  AND SomeBit = 1
GROUP BY Id

MyTable has two indexes:

CREATE NONCLUSTERED INDEX IX_MyTable_SomeTimestamp_Includes
ON dbo.MyTable (SomeTimestamp ASC)
INCLUDE(Id, SomeInt)

CREATE NONCLUSTERED INDEX IX_MyTable_Id_SomeBit_Includes
ON dbo.MyTable (Id, SomeBit)
INCLUDE (TotallyUnrelatedTimestamp)

When I execute the query exactly as written above, SQL Server scans the first index, resulting in 189,703 logical reads and a 2-3 second duration.

When I inline the @Id variable and execute the query again, SQL Server seeks the second index, resulting in only 104 logical reads and a 0.001 second duration (basically instant).

I need the variable, but I want SQL to use the good plan. As a temporary solution I put an index hint on the query, and the query is basically instant. However, I try to stay away from index hints when possible. I usually assume that if the query optimizer is unable to do its job, then there is something I can do (or stop doing) to help it without explicitly telling it what to do.

So, why does SQL Server come up with a better plan when I inline the variable?

3 Answers


sp_BlitzErik 05/16/2018 at 06:21.

In SQL Server, there are three common types of non-join predicates:

Literals:

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  u.Reputation = 1;

Parameters:

CREATE PROCEDURE dbo.SomeProc(@Reputation INT)
AS
BEGIN
    SELECT COUNT(*) AS records
    FROM   dbo.Users AS u
    WHERE  u.Reputation = @Reputation;
END;

Local variables:

DECLARE @Reputation INT = 1

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  u.Reputation = @Reputation;

When you use a literal value, and your plan isn't a) Trivial and b) Simple Parameterized or c) you don't have Forced Parameterization turned on, the optimizer creates a very special plan just for that value.

When you use a parameter, the optimizer will create a plan for that parameter (this is called parameter sniffing), and then reuse that plan, absent recompile hints, plan cache eviction, etc.

When you use a local variable, the optimizer makes a plan for... Something.

If you were to run this query:

DECLARE @Reputation INT = 1

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  u.Reputation = @Reputation;

The plan would look like this:

NUTS

And the estimated number of rows for that local variable would look like this:

NUTS

Even though the query returns a count of 4,744,427.

Local variables, being unknown, don't use the 'good' part of the histogram for cardinality estimation. They use a guess based on the density vector.

NUTS

SELECT 5.280389E-05 * 7250739 AS [poo]

That'll give you 382.86722457471, which is the guess the optimizer makes.

These unknown guesses are usually very bad guesses, and can often lead to bad plans and bad index choices.

Your options are:

  • Brittle index hints
  • Potentially expensive recompile hints
  • Parameterized dynamic SQL
  • A stored procedure
  • Improve the current index

Improving the current index means extending it to cover all the columns needed by the query:

CREATE NONCLUSTERED INDEX IX_MyTable_Id_SomeBit_Includes
ON dbo.MyTable (Id, SomeBit)
INCLUDE (TotallyUnrelatedTimestamp, SomeTimestamp, SomeInt)
WITH (DROP_EXISTING = ON);

Assuming that Id values are reasonably selective, this will give you a good plan, and help the optimizer by giving it an 'obvious' data access method.

You can read more about parameter embedding here:


Joe Obbish 05/15/2018.

I'm going to assume that you have skewed data, that you don't want to use query hints to force the optimizer what to do, and that you need to get good performance for all possible input values of @Id. You can get a query plan guaranteed to require just a few handfuls of logical reads for any possible input value if you're willing to create the following pair of indexes (or their equivalent):

CREATE INDEX GetMinSomeTimestamp ON dbo.MyTable (Id, SomeTimestamp) WHERE SomeBit = 1;
CREATE INDEX GetMaxSomeInt ON dbo.MyTable (Id, SomeInt) WHERE SomeBit = 1;

Below is my test data. I put 13 M rows into the table and made half of them have a value of '3A35EA17-CE7E-4637-8319-4C517B6E48CA' for the Id column.

DROP TABLE IF EXISTS dbo.MyTable;

CREATE TABLE dbo.MyTable (
    Id uniqueidentifier,
    SomeTimestamp DATETIME2,
    SomeInt INT,
    SomeBit BIT,
    FILLER VARCHAR(100)
);

INSERT INTO dbo.MyTable WITH (TABLOCK)
SELECT NEWID(), CURRENT_TIMESTAMP, 0, 1, REPLICATE('Z', 100)
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

INSERT INTO dbo.MyTable WITH (TABLOCK)
SELECT '3A35EA17-CE7E-4637-8319-4C517B6E48CA', CURRENT_TIMESTAMP, 0, 1, REPLICATE('Z', 100)
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

This query might look a little strange at first:

DECLARE @Id UNIQUEIDENTIFIER = '3A35EA17-CE7E-4637-8319-4C517B6E48CA'

SELECT
  @Id,
  st.SomeTimestamp,
  si.SomeInt
FROM (
    SELECT TOP (1) SomeInt, Id
    FROM dbo.MyTable
    WHERE Id = @Id
    AND SomeBit = 1
    ORDER BY SomeInt DESC
) si
CROSS JOIN (
    SELECT TOP (1) SomeTimestamp, Id
    FROM dbo.MyTable
    WHERE Id = @Id
    AND SomeBit = 1
    ORDER BY SomeTimestamp ASC
) st;

It's designed to take advantage of the ordering of the indexes to find the min or max value with a few logical reads. The CROSS JOIN is there to get correct results when there aren't any matching rows for the @Id value. Even if I filter on the most popular value in the table (matching 6.5 million rows) I only get 8 logical reads:

Table 'MyTable'. Scan count 2, logical reads 8

Here's the query plan:

enter image description here

Both index seeks find 0 or 1 rows. It's extremely efficient, but creating two indexes might be overkill for your scenario. You could consider the following index instead:

CREATE INDEX CoveringIndex ON dbo.MyTable (Id) INCLUDE (SomeTimestamp, SomeInt) WHERE SomeBit = 1;

Now the query plan for the original query (with an optional MAXDOP 1 hint) looks a bit different:

enter image description here

The key lookups are no longer necessary. With a better access path that should work well for all inputs you shouldn't have to worry about the optimizer picking the wrong query plan due to the density vector. However, this query and index won't be as efficient as the other one if you seek on a popular @Id value.

Table 'MyTable'. Scan count 1, logical reads 33757


Jon of All Trades 05/15/2018.

I can't answer why here, but the quick-and-dirty way to ensure that the query runs the way you want it is:

DECLARE @Id UNIQUEIDENTIFIER = 'cec094e5-b312-4b13-997a-c91a8c662962'
SELECT 
  Id,
  MIN(SomeTimestamp),
  MAX(SomeInt)
FROM dbo.MyTable WITH (INDEX(IX_MyTable_Id_SomeBit_Includes))
WHERE Id = @Id
  AND SomeBit = 1
GROUP BY Id

This incurs a risk that the table or indices may change in the future such that this optimization becomes dysfunctional, but it's available if you need it. Hopefully someone can offer you a root cause answer, as you requested, rather than this workaround.

Related questions

Hot questions

Language

Popular Tags