Query Tuning Fundamentals: Density, Predicates, Selectivity, and Cardinality

Density:Density, when used to describe the data in a column, is a measure of how often duplicate values occur in that column. Another way to think of density is as a measure of the uniqueness of the data in a column: high density –> less unique data. Density values range from 0 to 1.0.  There are different (but equivalent) ways to think of density.

Density = 1/[# of distinct values in a column]

Density = Avg. number of duplicates for a given value / Total row count

Q: What is the density of a column that contains only a single value repeated in every row? 

A: 1.0 — the highest possible density.

Q: What is the density of a unique column, such as an IDENTITY or primary key column? 
A: 1/[row count]

Predicate: A predicate is an expression that evaluates to True or False. Predicates appear in a WHERE or JOIN clause in a database query. They are used to either filter out or qualify rows.

Here’s a query with two filter predicates:

SELECT *
FROM table1
WHERE column1 > 20 AND column2 IS NULL

And a query with one join predicate:

SELECT *
FROM table1
INNER JOIN table2 ON table1.column1 = table2.column1

Selectivity: The fraction of rows from the input set of the predicate that satisfy the predicate.

Selectivity is also a measure of uniqueness. High selectivity implies high uniqueness, or a low number of matching values. Low selectivity implies a low uniqueness, or a high percent of matches.

Selectivity is most commonly used to describe a filter predicate.

Selectivity for a filter predicate against a base table can be calculated as “[# rows that pass the predicate]/[# rows in the table]”.

Note that 0.000001 reflects a high selectivity even though the number is small, while 1.0 is low selectivity even though the number is higher.

Cardinality estimate: An estimate of the size of a result set. For example, if a table T has 100,000 rows and a query contains a selection predicate of the form T.a=10, and a histogram shows that the selectivity of T.a=10 is 10%, then the cardinality estimate for the fraction of rows of T that must be considered by the query is 10% * 100,000 = 10,000.

Cardinality can be thought of as the number of rows returned by a query operator.

Each operator in a query plan has an estimated cardinality (the number of rows the optimizer guessed that the operator would return) and an actual cardinality (the number of rows that the operator returned in the real world).

Sometimes the query optimizer cannot accurately predict the number of rows that a given operator will return. This can prevent SQL from estimating the cost of a query plan correctly, which can lead to the selection of a suboptimal plan. Cardinality estimation errors are one of the most common causes of slow query plans in SQL Server, so it is very important to know how to identify cardinality estimation problems in a query plan.

Indexes have density, a measure of how unique the left-based column subsets within them are;

Predicates have selectivity, a measure of what portion of the table they affect;

Operators have Cardinality, a measure of how many rows the operator processes.

References:

http://blogs.msdn.com/b/bartd/archive/2011/01/25/query_5f00_tuning_5f00_key_5f00_terms.aspx

http://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx

http://sqlinthewild.co.za/index.php/2015/10/06/index-selectivity-and-index-scans/

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s