Selectivity of a Query and how it affects indexing
Posted Under: Caching, Information, SQL Server 2005, Tuning
Query selectivity is one of the fundamental concepts in query tuning and optimization. All of us know that indexes help RDBMS engine to quickly locate required rows but it does not mean that creating an idex on a specific column and using that column in where clause means it is guaranteed that SQL Server will always use it. Why? The answer lies in the concept of “Query Selectivity” or in other words how many rows are expected to be returned as a result of the intended query.
SQL Server forecasts the expected number of rows to be returned by analyzing the where clause, joins and having clauses and then decides that will it be better to search the index first to locate the desired rows identifier and then retrieves the actual rows from table or just directly scans the table and return all matching rows? The answer of this decision is based on the table statistics which SQL Server records about distribution of data values in a particular column. If it will take more time to go to the index and then retrieving the rows because most of the rows will match the criteria then SQL Server will directly scan the table and return the matching rows.
A Query is “more selective” when it is expected to match with more rows and vice versa.
Following is an example to demonstrate the Query selectivity concept. In this example, we will first create a tes table with two columns i.e. Id, Name. We will then create a non-clustered index on Id column. We will fill the table with 100000 rows and then will run two queries using Id (i.e. indexed column) in our where criteria. Check the query plan of both of them and you will see that Quey 1 will use the index whereas Query 2 will not use it even though both of them are searching on same indexed column.
non-clustered
USE [tempdb] GO /****** Create a test table ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[testTable]( [Id] [bigint] NULL, [Name] [nchar](10) COLLATE Latin1_General_CS_AS NULL ) ON [PRIMARY] /****** Create a Non-clustered index on Id column ******/ CREATE NONCLUSTERED INDEX [IX_TestIndex] ON [dbo].[testTable] ( [Id] ASC ) /****** Fill table with 100000 rows ******/ /* Declare a variable to hold the count of rows to be generated */ Declare @p_NumberOfRows Bigint --We need 100000 rows Select @p_NumberOfRows=100000; With Base As ( Select 1 as n Union All Select n+1 From Base Where n < Ceiling(SQRT(@p_NumberOfRows)) ), Expand As ( Select 1 as C From Base as B1, Base as B2 ), Nums As ( Select Row_Number() OVER(ORDER BY C) As n From Expand ) Insert into testTable Select n,'TestName'+ CAst( (n % 100) as varchar) from Nums Where n<=@p_NumberOfRows /* Remove Maximum Recursion level constraint */ OPTION (MaxRecursion 0); /** Query No.1 Following Query will use index ***/ Select Name from testTable where Id<100 /** Query No. 2 Following Query will not use index ***/ Select Name from testTable where Id>50000





Reader Comments
SQL query is one fundamental concept in query tuning and Optimization.The logic which you have provided would be helpful for the database consultants.I hope this helps.