How Statistics are used for Query Plan generation

Statistics (data distribution statistics on columns/indexes) are used by the Query Optimizer to choose appropriate operators within a query plan.

Inaccurate or outdated statistics may lead to a poor-quality query execution plan and thereby to performance issues.

Statistics comprise two kinds of information:

  • Density vector
  • Histogram

https://msdn.microsoft.com/en-IN/library/ms174384.aspx

http://www.sqlpassion.at/archive/2014/01/28/inside-the-statistics-histogram-density-vector/

Query plans generated using histogram information are generally more efficient than plans generated from the density vector alone.
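Both parts of a statistics object can be inspected with DBCC SHOW_STATISTICS. The sketch below assumes the AdventureWorks sample database and its default index IX_SalesOrderDetail_ProductID:

```sql
-- Show only the density vector of the statistics object
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
    WITH DENSITY_VECTOR;

-- Show only the histogram (RANGE_HI_KEY, EQ_ROWS, RANGE_ROWS, ...)
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
    WITH HISTOGRAM;
```

The histogram rows describe the actual value distribution of the leading column, which is why plans based on it tend to have better cardinality estimates than plans based on the single averaged density value.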

The following aspects determine how the SQL Server optimizer uses statistics when generating a query plan.

Literals

Ex: SELECT SalesOrderDetailID, OrderQty FROM Sales.SalesOrderDetail WHERE ProductID = 897;

Constants or actual values used directly in the query as a filter provide for better cardinality estimates.

Usually this results in an efficient query plan: the optimizer knows the specific filter value before run time, so it can use the histogram part of the statistics to estimate the number of qualifying rows.
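One way to see this is to capture the actual execution plan for the literal query above and compare the estimated row count (taken from the histogram step covering ProductID = 897) with the actual count:

```sql
-- Request the actual execution plan as XML alongside the results;
-- compare EstimateRows with ActualRows in the plan output
SET STATISTICS XML ON;

SELECT SalesOrderDetailID, OrderQty
FROM Sales.SalesOrderDetail
WHERE ProductID = 897;

SET STATISTICS XML OFF;
```

With a literal value and up-to-date statistics, the two numbers are typically very close.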

Parameters

Ex: EXEC Get_OrderID_OrderQty @ProductID=897;

Using stored procedures and passing filter values as parameters has some advantages:

  • The query plan generated for the first invocation of the stored procedure is re-used for subsequent executions.
  • This reduces CPU consumption, since there is no need to compile the code and generate an execution plan for every execution of the stored procedure.
  • The histogram part of the statistics is used to derive cardinality estimates, which provides for the best possible estimates.
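The Get_OrderID_OrderQty procedure invoked in the example above could look something like this (the body is an assumed sketch; only the procedure name comes from the text):

```sql
-- Hypothetical definition of the procedure used in the example
CREATE PROCEDURE Get_OrderID_OrderQty
    @ProductID INT
AS
BEGIN
    SELECT SalesOrderDetailID, OrderQty
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID;
END;
GO

-- First execution compiles a plan using the histogram for 897 and caches it
EXEC Get_OrderID_OrderQty @ProductID = 897;

-- Subsequent executions re-use the cached plan, even for other values
EXEC Get_OrderID_OrderQty @ProductID = 870;
```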

It has a few disadvantages as well.

The query plan generated for the first invocation of the stored procedure may not be efficient for all subsequent executions, particularly if there is data-distribution skew among the column values.

Also, the histogram part of the statistics will not be re-checked for subsequent executions of the stored procedure, as the query plan is re-used.

A workaround is to use either a statement-level or a stored-procedure-level RECOMPILE hint, which results in the generation of a fresh plan that considers the parameter values passed for that specific execution.
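Both forms of the hint are shown below, applied to the Get_OrderID_OrderQty procedure from the earlier example (the procedure body is an assumed sketch):

```sql
-- Statement-level hint: only this statement gets a fresh plan each execution
ALTER PROCEDURE Get_OrderID_OrderQty
    @ProductID INT
AS
    SELECT SalesOrderDetailID, OrderQty
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID
    OPTION (RECOMPILE);
GO

-- Procedure-level hint: the whole procedure is recompiled for this execution
EXEC Get_OrderID_OrderQty @ProductID = 897 WITH RECOMPILE;
```

The statement-level form is usually preferred, since it limits the recompilation cost to the one statement that is sensitive to the parameter value.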

From the point of view of query optimization, parameter sniffing is a good thing, but it can negatively impact performance in a few scenarios.

More details on Parameter Sniffing concepts at http://www.brentozar.com/archive/2013/06/the-elephant-and-the-mouse-or-parameter-sniffing-in-sql-server/

http://www.benjaminnevarez.com/2010/06/the-parameter-sniffing-problem/

Local variables

Ex: DECLARE @ProductID INT = 897;
SELECT SalesOrderDetailID, OrderQty FROM Sales.SalesOrderDetail WHERE ProductID = @ProductID;

A traditional way to avoid the parameter sniffing problem, especially in earlier versions of SQL Server, is to use local variables. When you use local variables, SQL Server can no longer use the histogram; instead it uses the density vector of the statistics object. The OPTIMIZE FOR UNKNOWN query hint works in much the same way.

The values of local variables are not known at optimization time, so the Query Optimizer cannot use the actual value for optimization, as it can with literals or parameters.

One benefit of using local variables or the OPTIMIZE FOR UNKNOWN hint is that you always get the same execution plan.
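As a sketch, the hint applied to the same hypothetical procedure (the body is an assumption, as before):

```sql
-- The estimate comes from the density vector ("All density" x table rows)
-- rather than from the histogram step for the sniffed parameter value,
-- so every execution gets the same, value-independent plan
ALTER PROCEDURE Get_OrderID_OrderQty
    @ProductID INT
AS
    SELECT SalesOrderDetailID, OrderQty
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID
    OPTION (OPTIMIZE FOR UNKNOWN);
GO
```

This trades the best possible estimate for a stable, average-case one, which can be the right choice when the column's data distribution is heavily skewed.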

More details at http://www.brentozar.com/archive/2014/06/tuning-stored-procedures-local-variables-problems/

http://www.benjaminnevarez.com/2010/06/how-optimize-for-unknown-works/