Microsoft Technologies: Indexes in SQL Server

What is meant by INDEX?

What are different types of INDEX in SQL Server?

How to create INDEX?

How to DROP INDEX?

How to Enable-Disable INDEX?

What is difference between clustered and non-clustered indexes?

What are the disadvantages of indexes?

What is meant by INDEX?

Index is a database object which is used to improve query performance for data retrieval but it also adds some burden on data modification queries such as INSERT, UPDATE & DELETE. So it is advised to use correct Index type. If your table gets many data modification request then it is advised not to have Clustered index on such table, because it will slow down you DML operations. Index can be created on one or more column (max 16 columns) on existing table or views. When we create the index, it reads the column’s data and forms a relevant data structure to minimize the number of data comparison. Indexes are automatically maintained for a table or view whenever the table data is modified.

Let’s understand Index with some real world examples.

Example 1

Suppose A table CUSTOMERS has around 20 Million records. When we try to retrieve records for two or three customers based on their customer id, all 20 Million records are taken and comparison is made to get a match on the supplied customer id. Just think about now how much time that will take if it is a web based application and there are 30-50 customers that want to access their data through internet.

Does the data server do 20 Million X 30 searches? The answer is NO because all modern database use the concept of Index.

Example 2

Suppose you have a book which has thousands of pages, Thousands of pages are divided into many chapters. Each chapter has many pages and each page has related topic written, if you want to read a particular topic then how do you find it in your book? Answer is you just look into the Table of Content which is always present in the beginning from Chapter 1.This Table of contents helps you search your topics in all topics and then you jump to a particular page out of those thousands pages. So this Table of Content is nothing but Index for your book. You can assume how helpful are those table of content 10-20 pages, if it was not there then it would have been very difficult to search needed topic in those thousands of pages. Table of content will always be created after completing the book.

Let’s relate this understanding with Database, now you assume your Book as a Database Table and Table of Content as Index. After creating and populating a Table or view, you create index on it. Once you create an index on a field, automatically another data structure will be created which holds the field value and pointer to the record it relates to. This index structure is then sort and stores the data rows in the database which allows Binary search on it, which means these Indexes require additional space on the disk, and get stored with a table using MYISAM engine.

Summary

1. Like an index in a book, an index in a database lets you quickly find specific information in a table or indexed view.

2. An Index contains keys built from one or more columns in a Table, or view and pointers that map to the storage location of the specified data.

3. These keys are stored in structure (B-Tree) that enables SQL Server to find the rows associated with the key values quickly and efficiently.

4. We can significantly improve the performance of database queries and applications by creating well designed indexes to support your queries.

5. Indexes can reduce the amount of data that must be read to return the query result set.

6. Indexes can also enforce uniqueness on the rows in a table, ensuring the data integrity of the table data.

What are different types of INDEX in SQL Server?

1. Clustered Index

2. Non Clustered Index

3. Unique

4. Filtered

5. Xml

6. Full Text

7. Spatial

8. Columnstore

9. Index with Included Columns

10. Index on Computed Columns

In this article we will focus only on Clustered and Non Clustered Index.

How to create INDEX?

We can create Index on an empty table or one that contains data, creating an index on an empty table has no performance implications at the time of Index is creation, however performance will be affected when data is added to the table.

Creating Index on large table should be planned carefully so that database performance is not hindered. The best way to create Index on large table is to first create Clustered Index and then create any Non Clustered Index.

When a table has no Clustered Index then its data will be stored in unsorted way and it will be called as Heap. When you insert a new record in Heap, it will be inserted in any random position in the table and position depends on the page free space. If A Clustered index is created on a heap with several existing nonclustered indexes, all the nonclustered indexes must be rebuilt so that they contain the clustering key value instead of the row identifier (RID).

Syntax to Create Index

CREATE [UNIQUE] CLUSTERED |NONCLUSTERED INDEX index_name ON table_name|view_name (Column ASC|DESC [,..n])

Uniqueness can be a property of both clustered and nonclustered indexes. If UNIQUE and CLUSTERED keywords are missing in CREATE INDEX command then it will create Non-Unique Non Clustered Index.

Clustered Index

1. When Clustered Index is created it first sorts the data rows and then stores the data rows in the table or view based on their key values.

2. Whenever new records come into the table, it gets fit to its position based on the key values that is the reason INSERT, UPDATE and DELETE are slow on INDEXED table because it will take some time to rearrange the data Pages.

3. When we create Primary Key constraint in a Table and if Clustered Index is not present on the table then it will automatically create Clustered Index on primary key column with the same name as the Primary Key constraint has.

4. We can have only 1 Clustered index on a Table.

5. Reason for having only 1 Clustered Index on a table is the data rows themselves can be sorted in only one order either ASC or DESC.

6. When a table has Clustered Index then it will be called as Clustered Table.

7. When creating the Clustered Index, SQL Server reads the Primary Key column values and forms a Binary Tree on it. This Binary Tree information is then stored separately in the disc, you can find Index created on Primary Key in database Index Folder.

8. Clustered Index can be created on Non - Primary Key columns. When we create Primary Key later in that case it will not create Clustered index because it is already available, it will create Unique Non Cluster Index.

9. Uniqueness is the property of Clustered Index and Non Clustered Index.

10. When we drop the Index, It leaves the data in sorted order and if new records come then it will be inserted at the random position in the table.

11. Clustered Index leaf nodes contains the Data Pages means row’s data.

Let’s go through with some example to understand Clustered Index

Let’s create a table name as EMP in our tutorial database TEACHMESQLSERVER

USE TEACHMESQLSERVER

IF OBJECT_ID('EMP','U') IS NOT NULL

DROP TABLE EMP

CREATE TABLE EMP(EMPID INT NOT NULL)

As you can see from the below image right now EMP table doesn’t has Index. It is because we have not created the INDEX or we have not defined any Unique or Primary Key constraint on EMPID column at the time of table creation.

Let’s insert few records in the EMP table.

INSERT INTO EMP VALUES(1),(3),(2),(0)

GO

SELECT * FROM EMP

As you can see EMPID was not sorted and stored because it doesn’t have Clustered Index on EMP Table. I have already discussed that we can also create Clustered Index on Non Primary Key columns.

Let’s Create Clustered Index on EMPID column in EMP table.

CREATE CLUSTERED INDEX EMPIDINDEX ON EMP (EMPID ASC)

As you can see now EMP table has one Clustered Index name as EMPIDINDEX.

Now let’s retrieve all the records we had inserted last time from EMP table.

GO

SELECT * FROM EMP

As you can see once after creating Clustered Index we retrieved the data, it is coming in sorted form. It is because at the time of Clustered Index creation we had given sorting Order as ASC. Data can be sort either ASC or DESC. This is the main reason we can have only one Clustered Index on a Table.

Now let’s try to create one more Clustered Index. We already know the answer that SQL Server will not allow to create another Clustered Index on EMP table.

CREATE CLUSTERED INDEX EMPIDINDEX1 ON EMP (EMPID ASC)

Here you go – got error message

Msg 1902, Level 16, State 3, Line 1

Cannot create more than one clustered index on table 'EMP'. Drop the existing clustered index 'EMPIDINDEX' before creating another.

Let’s try to insert new records

INSERT INTO EMP VALUES (8)

INSERT INTO EMP VALUES(6)

GO

SELECT * FROM EMP

As you can see from the above result output, records are coming in ASC order even after we inserted 8 before 6.

As I have already discussed if our table already has Clustered Index and Primary key constraint was not defined at the time of table creation, if we create Primary Key constraints using Alter command on EMP table then it will never create a Clustered Index, It will create a Unique Non Clustered Index.

Let’s create Primary Key constraint on EMPID column.

ALTER TABLE EMP ADD CONSTRAINT PRIMARYKEY PRIMARY KEY (EMPID)

As you can see PRIMARYKEY INDEX is a Unique Non Clustered Index.

You can find all Indexes list in SQL Server from SYS.INDEXES table.

SELECT * FROM SYS.INDEXES

Dropping index will not affect the sorted stored records. Let’s Drop the INDEX. I will discuss how to DROP Index in this article, if DROP command looks strange here then don’t worry.

DROP INDEX EMPIDINDEX ON EMP

ALTER TABLE EMP DROP CONSTRAINT PRIMARYKEY

GO

SELECT * FROM EMP

As you can see Result set is same after dropping the Index. Now let’s insert new records in EMP table.

INSERT INTO EMP VALUES (7)

GO

SELECT * FROM EMP

As you can see from the result set new records were inserted at the end not its key value position.

NonClustered Index

1. We can have many Nonclustered (max 999) index on a Table.

2. Uniqueness is the property of Clustered Index and Non Clustered Index.

3. By default a Nonclustered Index is created if Clustered or Unique options are not specified in the Create Index command.

4. A Nonclustered index never store data in sorted order if your table has more than 1 column, ASC and DESC keywords will be ignored if used at the time of Nonclustered index creation. If your table has only column then only it will store data in sorted order.

5. A Nonclustered Index is stored separately from the table, so additional space is required. The amount of space required will depend on the size of table and the number and types of columns used in index.

6. A Nonclustered Index contains the nonclustered Index key values and each key value entry has a pointer to the data row that contains the key value.

7. The pointer from an index row in a nonclustered index to a data row is called a row locator.

8. The Structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a Row locator is a pointer to the row. For a clustered Index, the row locator is the clustered index key.

9. Whenever you create constraint, nonclustered index will be created.

Let’s go through with some example to understand Nonclustered Index

Let’s create a table name as EMP in our tutorial database TEACHMESQLSERVER

USE TEACHMESQLSERVER

IF OBJECT_ID('EMP','U') IS NOT NULL

DROP TABLE EMP

CREATE TABLE EMP(EMPID INT NOT NULL, DEPID INT NOT NULL)

As you can see from the above image right now EMP table doesn’t has any Index. It is because we have not created any INDEX or we have not defined any constraint at the time of table creation.

Let’s insert few records in the EMP table.

INSERT INTO EMP VALUES(1,11),(3,5),(2,4),(0,7),(1,1)

GO

SELECT * FROM EMP

As you can see from the above result set records are not stored in sorted order. It was stored in the sequence it was written in query.

Let’s Create Nonclustered Index on EMPID column in EMP table.

CREATE NONCLUSTERED INDEX EMPIDINDEX ON EMP (EMPID ASC)

GO

SELECT * FROM EMP

As you can see from the above image Non-Uique, Non-Clustered index has been created.

Now let’s retrieve all the records we had inserted last time from EMP table.

GO

SELECT * FROM EMP

As you can see from the above result set, records were not stored in sorted ordered even after defining the ordering type (ASC) at the time of EMPIDINDEX Index creation.

Now let’s create one more Nonclustered Index on DEPID column in EMP table. We have already discussed that we can create many nonclustered index on a table.

CREATE NONCLUSTERED INDEX DEPIDINDEX ON EMP (DEPID DESC)

GO

SELECT * FROM EMP

Now once again let’s retrieve all the records we had inserted last time from EMP table.

GO

SELECT * FROM EMP

As you can see from the above result set, records were not stored in sorted ordered even after defining the ordering type (DESC) at the time of DEPIDINDEX Index creation. This clearly shows that Nonclustered Index never stores the data in any order.

Let’s insert a new record in EMP table

INSERT INTO EMP VALUES (-2,6)

GO

SELECT * FROM EMP

As you can see from the above result set, new record was inserted at the end of all records.

What is difference between clustered and non-clustered indexes?

1. A table can have multiple non-clustered indexes. But, a table can have only one clustered index.

2. A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.

3. A Clustered Index is faster than a Nonclustered Index but in some condition Non-Clustered index would perform better. You can refer to thisarticle.

4. If Clustered Index is disabled then you can’t perform DML operation, but if Nonclustered index is disabled then also you can perform DML operation.

5. Clustered Index determines the Storage Order of rows in the table, hence doesn’t require additional disk space but where as a Nonclustered Index is stored separately from the table, so additional space is required.

6. A clustered index determines the order in which the rows of the table will be stored on disk (asc or desc) – and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.

7. Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.

8. Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the clustered index actually stores the row-level data in it’s leaf nodes.

How to DROP INDEX?

Implicitly created index can’t be dropped using DROP INDEX command which means you can’t drop the Indexes if Indexes were created automatically by any of Primary Key or Unique constraint. To drop the index you have to drop the constraint it will automatically delete the Index.

If a clustered index is dropped on a table that has several nonclustered indexes, the nonclustered indexes are all rebuilt as part of the DROP operation. This may take significant time on large tables.

How to Enable-Disable INDEX?

DROP INDEX index_name ON table_name|view_name

Syntax to Disable Index

You can disable a particular Index using below query syntax

ALTER INDEX Index_name ON table_name|view_name DISABLE

You can disable all available Index using below query syntax

ALTER INDEX ALL ON table_name|view_name DISABLE

When you disable the clustered index then you can’t perform DML (INSERT, UPDATE & DELETE) operation. If you try to do then you will get below error message

Msg 8655, Level 16, State 1, Line 1

The query processor is unable to produce a plan because the index ‘Index_name’on table or view 'table_name or View_name' is disabled.

You can perform DML operation if your nonclustered index is disabled.

Syntax to Enable Index

You can enable a particular Index using below query syntax

ALTER INDEX Index_name ON table_name|view_name REBUILD

You can enable all available Index using below query syntax

ALTER INDEX ALL ON table_name|view_name REBUILD

What are the disadvantages of indexes?

Additional Disk Space – Clustered Index does not require any additional storage. Every Non-Clustered index requires additional space as it is stored separately from the table. The amount of space required will depend on the size of the table, and the number and types of columns used in index.

Insert, Update and Delete Operation can become Slow – When DML statements modify data in a table, the data in all the indexes also needs to be updated. Indexes can help, to search and locate the rows, that we want to delete, but too many indexes to update can actually hurt the performance of data modifications.

Database Indexing, Types of Index, Indexes

A database index is a copy of part of a table that is used to speed up data retrievalin a database.Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.

Types of Index:
Clustered Index
Non-Clustered Index

Clustered Index:
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. Theleaf nodes of a clustered index contain the data pages.
A clustered index stores data similar to a phone directory where all people with the same last name are grouped together. SQL Server will quickly search a table with aclustered index while the index itself determines the sequence in which rows are stored in a table. Clustered indexes are useful for columns searched frequently for ranges of values, or are accessed in sorted order.

CREATE UNIQUE CLUSTERED INDEX DummyTable2_EmpIndexON DummyTable2 (EmpID)GO

Non-Clustered Index:
A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages. Instead, the leaf nodescontain index rows.
A non-clustered index stores data comparable to the index of a text book. The index is created in a different location than the actual data. The structure creates an index with a pointer that points to the actual location of the data. Non-clustered indexes should be created on columns where the selectivity of query ranges from highly selective to unique. These indexes are useful when providing multiple ways to search data is desired.

CREATE UNIQUE NONCLUSTERED INDEX DummyTable1_empidON DummyTable1 (empid)GO

Index

DEFINITION

A database object which helps database engine to retrieve the data in a faster way without scanning an entire table

DATA STORAGE STRUCTURE

Structure

Heap

Clustered Index

Non Clustered Index

Basic Info

When there is no clustered index, data will be stored in Heap structure

Data will be stored in B-Tree structure and leaf node will be data pages.

Data will be stored in B-tree structure and leaf node contains Index Pages which points either Heap/ClusteredIndex

Sysindexes

Index_id : 0

First_iam_page : contains starting memory location

Index_id : 1

Root_page : contains starting memory location

Index_id : 2

Root_page : contains starting memory location

B-TREE STRUCTURE

B means Balanced
B-tree is always symmetrical, with the same number of pages on both the left and right halves at each level
Level
Root – Starting page. Can be only 1
Intermediate – In between pages which holds only reference
Leaf – They can refer to clustered Index / Heap depends on availability of Clustered Index
Page
Each page will have 8192 bytes, out of this, 8060 bytes can store data
If clustered index has been created on INT column then one page can store max 2015 records
Initially only a root page will be there. It can hold upto 2015 records. The moment 2016th record gets inserted, page split will occur. Root page will be moved to leaf page, one more leaf page will be added. Half of the records will be moved to new leaf page and one root page will be created which will have reference of 1st record from each leaf page.

TYPE OF INDEX
Clustered
Maximum only 1 can be created
By default it will be created with Primary Key. However non-clustered index can be created on primary key with following syntax

create table Temp(a int primary key nonclustered, bvarchar(10)
No. of Key columns per Index : 16
Index Key Record Size: 900 bytes

Non-clustered
Maximum 999 non clustered index can be created in SQL 2008, previously it was 249
By default it will be created with Unique Key.
No. of Key columns per Index : 16
Index Key Record Size: 900 bytes

XML
Maximum 249 can be created
No. of Key columns per Index : 16
Index Key Record Size: 900 bytes

Spatial
Can be defined on GEOMATRY / GEOGRAPHY datatype
Maximum 249 can be created
No. of Key columns per Index : 16
Index Key Record Size: 900 bytes

DATATYPE RELATED

a. VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), NTEXT, IMAGE, TEXT cannot be part of Index Key columns

b. Computed columns can be indexed unless they don’t violate datatype rules

SPECIAL INDEX

Covered Index

- An index that is constructed such that SQL Server can completely satisfy queries by reading only the index is called a covering index.

- Can be created using INCLUDE clause

- Those columns won’t be counted in maximum 16 column count

- Maximum 1023 columns can be part of covering index

- Those columns will be counted only at leaf level

- Performance can be improved

- Example

CREATE NONCLUSTERED INDEX IX_TABLE ON TABLE (COLUMN1) INCLUDE(COLUMN2)

Filtered Index

- An index with WHERE clause

- Must be non-clustered

- Cannot be created on computed

- WHERE condition will support only >,>=,=,<,<=, IS NOT NULL, NOT NULL and multiple where conditions separated by AND/OR

INDEX OPTIONS

Online Index Creation

When index has to be created on live production database, then it is preferable to create with ONLINE option with value OFF. By default it will be ON. When you make it off, it will keep the lock on table and prevent any kind of access.

FillFactor

Index Hint

A way to force SQL Engine to use a particular index.

SELECT * FROM TableName WITH (INDEX = IndexName)

INDEX MAINTENANCE

Reorganizing Index / Rebuilding Index

By the amount of time, more DML operations will affect the index pages and database will have unorganized data; which will impact on performance.

DBCC SHOWCONTIG ('Domain.Domainvalue') WITH FAST,TABLERESULTS, ALL_INDEXES, NO_INFOMSGS

If LogicalFragmentation column shows percentage (between 5 to 30), reorganize is required, >30, rebuild is required.

ALTER INDEX IndexName ON TableName REORGANIZE

(Same as DBCC INDEXDEFRAG (dbid, ObjectID, IndexID)

ALTER INDEX IndexName ON TableName REBUILD

(Same as DBCC DBREINDEX (‘TableName’, ‘IndexName’)

Designing good set of Index

Introduction

We all know about the different type of index but the main question is who we decide which columns need the indexing and in which condition is not an easy task. Here in this article I am trying to understand it. For this I am taking some reference from MSDN. Hope it will be helpful

Disadvantage of Indexes

We must remember that, Index improves the Performance of SELECT statement only and decrease the Performance of INSERT/UPDATE/DELETE and MERGE statement because all indexes must adjusted appropriately when the data changes in the table. So, using too many index in the table is not good.

Guideline to Design Index

1. We must try to create index on a few columns within a table not all the columns.

2. Index in a small table that have few records is not wise idea as the Query optimizer longer to traverse the index searching for data than to perform a simple table scan.

3. Indexing in a View (Materialized View) significantly improve the performance when the view contains Aggregate function or Join multiple table or in both the cases. In this situation the Query optimizer not use the internal query of the Views, the use the View as whole. For designing Indexed Views please look at the MSDN reference

https://technet.microsoft.com/en-us/library/ms187864(v=sql.105).aspx

4. Create nonclustered indexes on all columns that are frequently used in predicates and join conditions in queries.
5. Covering Index is an important factor starts from Microsoft SQL Server 2008. Covering indexes can improve query performance because all the data needed to meet the requirements of the query exists within the index itself. That is, only the index pages, and not the data pages of the table or clustered index, are required to retrieve the requested data; therefore, reducing overall disk I/O.
But we must consider the volume of data in the table as the cover index also increases the volume of index table. If the volume is to heavy, we must follow the olfaction style with composite index.

6. If we are going to Update a single rows, we must use a single query not use multiple query to update the same rows.

7. We must evolutes the columns used within the query very carefully

Exact Match to Specific Values

SELECT BusinessEntityID, JobTitle

FROM HumanResources.Employee

WHERE BusinessEntityID = 228;

Nonclustered or clustered index on the BusinessEntityIDcolumn.

Exact match to a value in an IN (x,y,z) list

SELECT BusinessEntityID, JobTitle

FROM HumanResources.Employee

WHERE BusinessEntityID IN (288,30,15);

Nonclustered or clustered index on the BusinessEntityIDcolumn.

Range of values

SELECT ProductModelID, Name

FROM Production.ProductModel

WHERE ProductModelID BETWEEN 1 and 5;

OR

WHERE ProductModelID >= 1 AND ProductModelID <= 5;

Clustered or nonclustered index on the ProductModelIDcolumn.

Join between tables

SELECT a.ProductAssemblyID, b.Name, a.PerAssemblyQty

FROM Production.BillOfMaterials AS a

JOIN Production.Product AS b

ON a.ProductAssemblyID = b.ProductID

WHERE b.ProductID = 900;

Nonclustered or clustered index on
the ProductID andProductAssemblyID columns.

LIKE comparison

SELECT CountryRegionCode, Name

FROM Person.CountryRegion

WHERE Name LIKE N'D%'

Nonclustered or clustered index on the Name column.

Sorted or aggregated

SELECT a.WorkOrderID, b.ProductID, a.OrderQty, a.DueDate

FROM Production.WorkOrder AS a

JOIN Production.WorkOrderRouting AS b

ON a.WorkOrderID = b.WorkOrderID

ORDER BY a.WorkOrderID;

Nonclustered or clustered index on the sorted or aggregated column.

For sort columns, consider specifying the ASC or DESC order of the column.

PRIMARY KEY or UNIQUE constraint

INSERT INTO Production.UnitMeasure

(UnitMeasureCode, Name, ModifiedDate)

VALUES ('OZ1', 'OuncesTest', GetDate());

Clustered or nonclustered index on the column or columns defined in the constraint.

UPDATE or DELETE operation in a PRIMARY KEY/FOREIGN KEY relationship

Nonclustered or clustered index on the foreign key column.

Column is in the select list but not in the predicate.

SELECT Title, Revision, FileName

FROM Production.Document

WHERE Title LIKE N'%Maintenance%' AND Revision >= 0;

Nonclustered index withFileName specified in the INCLUDE clause.

8. Keep the length of the index key short for clustered indexes. Additionally, clustered indexes benefit from being created on unique or nonnull columns.

9. Columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However,varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a nonclustered index as nonkey index columns.

10. In case of XML columns we can used the XML index.

11. Examine column uniqueness. A unique index instead of a nonunique index on the same combination of columns provides additional information for the query optimizer that makes the index more useful.

12. Consider a Filtered index in a column with frequently NULL values and Distinct range of values increase the performance by Filtered index.

13. Consider the order of the columns if the index will contain multiple columns. The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first. Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.

Hope you like it.

Is there any differences between Table Scan and Index Scan

Introduction

As we all know that the developer prefers the Index Seek in the case of performance of a query and we all know that what is Index Seek and who it improve the performance. In this article we are not going to discuss about Index Seek.

A table scan is performed on a table which does not have an Index upon it (a heap) – it looks at the rows in the table and an Index Scan is performed on an indexed table – the index itself.

Here we are trying discuss about a specified scenario occurs related to Table Scan and Index Scan.

Scenario Description

Suppose we have a table object without any Index on it. The name of the Table Object istbl_WithoutIndex and another Table Object called tbl_WuthIndex

-- Heap Table Defination

IF OBJECT_ID(N'dbo.tbl_WithoutIndex', N'U') IS NOTNULL

BEGIN

DROP TABLE [dbo].[tbl_WithoutIndex];

END

GO

CREATE TABLE [dbo].[tbl_WithoutIndex]

(

EMPID INT NOT NULL,

EMPNAME VARCHAR(50) NOT NULL

);

GO

-- Insert Some Records

INSERT INTO [dbo].[tbl_WithoutIndex]

(EMPID, EMPNAME)

VALUES (101, 'Joydeep Das'),

(102, 'Sukamal Jana'),

(103, 'Ranajit Shinah');

GO

-- Table with Index (Clustered Index for primary Key)

IF OBJECT_ID(N'dbo.tbl_WithIndex', N'U') IS NOT NULL

BEGIN

DROP TABLE [dbo].[tbl_WithIndex];

END

GO

CREATE TABLE [dbo].[tbl_WithIndex]

(

EMPID INT NOT NULL PRIMARY KEY,

EMPNAME VARCHAR(50) NOT NULL

);

GO

-- Finding Index Name

sp_helpindex tbl_WithIndex;

index_name

index_description

index_keys

PK__tbl_With__14CCD97D3AD6B8E2

clustered, unique, primary key located on PRIMARY

EMPID

-- Insert Some Records

INSERT INTO [dbo].[tbl_WithIndex]

(EMPID, EMPNAME)

VALUES (101, 'Joydeep Das'),

(102, 'Sukamal Jana'),

(103, 'Ranajit Shinah');

GO

Now we are going to compare the execution plan output

SELECT * FROM [dbo].[tbl_WithoutIndex];

As we can see in the Execution plan, there is Table Scan. As the table has no Index (Here I mean the clustered Index) the table is a Heap. So when we put the SELECT statement the entire table scanned.

SELECT * FROM [dbo].[tbl_WithIndex];

Here he Table has a PRIMARY KEY, so it has a Clustered Index on it. But here we are not putting the Index columns on the WHERE clause, so the Clustered Index Scan Occurs.

Close Observation of Execution Plan

Please remember that the table has small number of records.

Table Name

Estimated IO Cost

tbl_WithoutIndex

0.0032035

Tbl_WithIndex

0.003125

If we see the Estimated Operation Cost, it would be same for both the Query (0.0032853).

Question in Mind

Here we can see the Estimated Operation cost for both the Query is same. So Question is in the mind that, if a Index Scan occur we can drop the index and use the heap (in our case). So is there any other difference between them.

How they are Differences

Here we understand what the internal difference between Table Scan and Index Scan.

When the table scan occurs MS SQL server reads all the Rows and Columns into memory. When the Index Scan occurs, it's going to read all the Rows and only the Columns in the index.

Effects in Performance

In case of performance the Table Scan and Index Scan both have the same output, if we have use the single table SELECT statement. But it differs in the case of Index Scan when we are going to JOIN table.

Finding Unused Index

Introduction

To increase the performance of our Microsoft SQL Server we have to find the Unused index and Drop it.

How to find it

From MS SQL Server version 2005 introduce two DMV to see the usages status of the INDEX.

sys.dm_db_index_operational_stats

This DMV allows you to see insert, update and delete information for various aspects for an index. Basically this shows how much effort was used in maintaining the index based on data changes.

SELECT OBJECT_NAME(A.[OBJECT_ID]) AS [TABLE NAME],

I.[NAME] AS [INDEX NAME],

A.LEAF_INSERT_COUNT,

A.LEAF_UPDATE_COUNT,

A.LEAF_DELETE_COUNT

FROM SYS.DM_DB_INDEX_OPERATIONAL_STATS (db_id(),NULL,NULL,NULL ) A

INNER JOIN SYS.INDEXES AS I

ON I.[OBJECT_ID] = A.[OBJECT_ID]

AND I.INDEX_ID = A.INDEX_ID

WHERE OBJECTPROPERTY(A.[OBJECT_ID],'IsUserTable') = 1;

sys.dm_db_index_usage_stats
This DMV shows you how many times the index was used for user queries.

SELECT OBJECT_NAME(S.[OBJECT_ID]) AS [TABLE NAME],

I.[NAME] AS [INDEX NAME],

USER_SEEKS,

USER_SCANS,

USER_LOOKUPS,

USER_UPDATES

FROM SYS.DM_DB_INDEX_USAGE_STATS AS S

INNER JOIN SYS.INDEXES AS I

ON I.[OBJECT_ID] = S.[OBJECT_ID] AND I.INDEX_ID = S.INDEX_ID

WHERE OBJECTPROPERTY(S.[OBJECT_ID],'IsUserTable') = 1

AND S.database_id = DB_ID();

How to Decide

Based on the output of the two above query we have to decide.

If we see indexes where there are no seeks, scans or lookups, but there are updates this means that SQL Server has not used the index to satisfy a query but still needs to maintain the index.

The general guidelines for Index

We all know that the INDEX is a key factor of performance tuning. As proper index can increase the performance and bad implementation of index implementation can dramatically decreases the performance.

In this article I am trying to gather some facts from different sources, which can provide us a general guide line to choose appropriate columns and index for performance boost up.

The general tendency that I found for implementation of Index is mentioned bellow.

The table objects contain one clustered index and maximum number of non clustered index.

Generally we builds the no-clustered index in each and every columns that is used in WHERE clause.

Is it the correct strategy to implement the index?

What happened, when INSERT, UPDATE, DELETE occurs in the table that have high number of index are there?

So, with proper knowledge of index we must follow some general guideline to implement the index in table objects.

The general guidelines for Index

Keep indexes lean. Try to build indexes on one or few columns at most. Wide indexes take longer to scan than narrow indexes.

Create the clustered index on every table. Try to create the clustered index on the column which is used most frequently for retrieving data and the column that does not have many duplicate values.

Create the clustered index on the column that will never be updated. Every time the clustered index key is updated, SQL Server must maintain not just the clustered index but also the non-clustered indexes since non-clustered indexes contain a pointer to the clustered index.

By default, SQL Server creates the clustered index on the PRIMARY KEY. It is beneficial to have the clustered index on the primary key. But sometimes it is better to make other columns as clustered index. We need a judgment over here that we make the clustered index on primary key or choosing other columns for it.

SQL Server has to perform some maintenance when index keys are updated, so each additional index could impose a slight performance penalty on the performance of INSERT, UPDATE and DELETE statements. Nevertheless the cost of index maintenance could be far less than the performance improvement you'll reap for your SELECT statements. Therefore you can be fairly liberal with the number of non-clustered indexes.

Don't create multiple indexes on the same set of columns. Such indexes provide no additional benefit but use up extra disk space and might hinder performance of INSERT, UPDATE and DELETE statements.

Check the default fill factor level configured at the server level. If you don't specify the FILLFACTOR option with CREATE INDEX statement, every new index will be created using the default fill factor.

Use the built-in tools for analyzing and recommending indexes but do not fully depends on them.

Always look at the query plan to finding the reason of poor performance query.

Related to Clustered Index

When we think about the performance factors, the Index plays the main role. In this article I am trying to discuss some points related to CLUSTERD INDEX.

The main tropics, that I am trying to highlight is mentioned bellow.

1. Cluster index Structure

2. How the Cluster index Works

3. Efficient Cluster Index

Clustered Index Structure

In order to understand the design principles of a good clustered index, we need to discuss how SQL Server stores clustered indexes. All table data is stored in 8 KB data pages. When a table contains a clustered index, the clustered index tells SQL Server how to order the table's data pages

In the SQL Server the index are organised as B-Tree structure.

1. Each pages of the index B-Tree are called index node.

2. The top node of the B-Tree is called the root node.

3. The bottom level of nodes in the index is called the leaf node.

4. Any level between the root and the leaf node are collectively known as intermediate level.

In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.

The leaf level of the B-tree is always Level- 0 and the root level is always the highest level. In this figure there is only one intermediate Level, but in a large index will often have more than one intermediate Level, and a small index might not have an intermediate Level.

The Index pages in the root and intermediate levels contain clustering key and a page pointer down into the Next level of the B-Tree. This process will be repeated until the leaf node is reached. The leaf node of the clustered index contains the actual data in an ordered fashion based on the clustering key.

How the Clustered Index Works

To understand it, please look at the bellow figure of B-Tree structure.

It contains the 1 million records of clustered key Student ID. In this figure the Level-1 and 2 is the Index page. In Level – 1 each page contains 50000 records and clustered index values plus a pointer down into the associated page on the next level.

To understand it properly I am going to take an example.

We want to retrieve the details of student ID 500. For this SQL server has to read the three pages. The root page in Level 2, the intermediate page in Level 1, and the appropriate leaf level page in Level 0. The root level tells the SQL Server to which intermediate level to read and intermediate levels tells that which leaf level page to read to retrieve the details.

Effective clustered index

The prime characteristics of a clustered index are mentioned bellow.
Narrow
Unique
Static
Ever-increasing

I am trying to discuss it in little more details.

Narrow

The width of the Index refers to the number of bytes in the index key. Let's take an example to understand it.

CREATE TABLE student_dtl

(

studentID INT IDENTITY(1,1),

studentRefKEY UNIQUEIDENTIFIER,

studentName VARCHAR(50)

)

The table has the 15 melions of rows and only two columns are candidates for use as clustering key.

1. The "studentID" has the INT data type used the 4 bytes.

2. The "studentRefKEY" has the UNIQUEIDENTIFIER data type used the 16 bytes.

INSERT INTO student_dtl

(studentRefKEY, studentName)

VALUES (newid(), 'Joydeep Das')

GO

SELECT datalength(studentID) AS studentID,

datalength(empRefKEY) AS

FROM emp_dtl

-- Result Output

studentID studentRefKEY

----- ---------

4 16

So which one is good for clustered index. If we see the B-Tree structure of both, in case of UNIQUEIDENTIFIER the number of intermediate level is greter then the INT. The facts is the UNIQUEIDENTIFIER takes the much more space than INT data type. So when the clustered index is created few rows can be packed into each index page, and the clustered key requires an additional non-leaf level to store the Key.

Unique

First of all we have to know that the SQL Server douesnot require a cluster index to be UNIQUE.

A clustered index must be unique, If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users.

Static

We want to choose a clustering key that will never be updated. SQL Server must ensure that data exists in a logical order based upon the clustering key. Therefore, when the clustering key value is updated, the data may need to be moved elsewhere in the clustered index so that the clustering order is maintained.

There are other considerations that can affect how many pages are updated, such as whether an update to a variable-length column causes the row to exceed the amount of available space. In such a case, the data would still need to be moved, although only the data page of the clustered index is affected.

So, the updating the clustering key is more expensive. Therefore it is a best practice to avoid clustering on columns that are frequently updated.

Ever-increasing

An integer identity column is an excellent example of an ever-increasing column. The identity property continuously increments by the value defined at creation, which is typically one. This allows SQL Server, as new rows are inserted, to keep writing to the same page until the page is full, then repeating with a newly allocated page.

SQL Server can much more efficiently write data if it knows the row will always be added to the most recently allocated, or last, page

Reduction in clustered index fragmentation, this fragmentation results from data modifications and can take the form of gaps in data pages, so wasting space, and a logical ordering of the data that no longer matches the physical ordering.

Hope you like it.

Index and Disk Space

Introductions

Disk space is a very important factor to Create, Rebuild or Drop Index. Disk space causes the performance degradation of an index or even fails. So the understanding about disk space is very important factors. In this article I am trying to discuss related to it.

In this article we are discussing about

Space required in Index Data Definition Language Operations
An Example of Index Disk Space
Transaction Log Disk Space for Index Operations

Space required in Index DDL Operations

Here I am mentioning some Index operations not required any additional disk space is mentioned bellow.

Ø ALTER INDEX with REORGANIZE options not required any additional disk space as the index is reorganized within the existing pages allocated to it; no new pages are allocated. But the log space is required.

Ø When we are dropping a non-clustered index by DROP INDEX or when we are dropping a clustered index offline without specifying the MOVE TO clause and non-clustered index do not exist.

Ø When we are CREATE TABLE with PRIMARY KEY or UNIQUE constraints

The index operation that required additional disk space is mentioned bellow.

All other index DDL operations require additional temporary disk space to use during the operation, and the permanent disk space to store the new index structure or structures.

When a new index structure is created, disk space for both the old (source) and new (target) structures is required in their appropriate files and file groups. The old structure is not de-allocated until the index creation transaction commits.

The DDL operations that needs new index structures and required additional disk space is mentioned bellow.

Ø CREATE INDEX

Ø CREATE INDEX WITH DROP_EXISTING

Ø ALTER INDEX REBUILD

Ø ALTER TABLE ADD CONSTRAINT (PRIMARY KEY or UNIQUE)

Ø ALTER TABLE DROP CONSTRAINT (PRIMARY KEY or UNIQUE) when the constraint is based on a clustered index

Ø DROP INDEX MOVE TO (Applies only to clustered indexes.)

An Example of Index Disk Space

I am finding this example from MSDN.

The formula is used to measure, disk space requirements is mentioned bellow

IndexSize = NoOfRows * AvgSizeOfIndex * 2 * (1/(100-FILLFACTOR)/100

In this example, disk space requirements to create a clustered index are determined. Assume the following conditions are true before creating the clustered index:

Ø The existing table (heap) contains 1 million rows. Each row is 200 bytes long.

Ø Non-clustered index A contains 1 million rows. Each row is 50 bytes long.

Ø Non-clustered index B contains 1 million rows. Each row is 80 bytes long.

Ø The index create memory option is set to 2 MB.

A fill factor value of 80 is used for all existing and new indexes. This means the pages are 80% full.

Note: As a result of creating a clustered index, the two non-clustered indexes must be rebuilt to replace the row indicator with the new clustered index key.

Disk Space Calculations for an Offline Index Operation

In the following steps, both temporary disk space to be used during the index operation and permanent disk space to store the new indexes are calculated. The calculations shown are approximate: results are rounded up and consider only the size of index leaf level. The tilde (~) is used to indicate approximate calculations.
Determine the size of the existing (source) structures
Heap: 1 million * 200 bytes ~ 200 MB
Non-clustered index A: 1 million * 50 bytes / 80% ~ 63 MB
Non-clustered index B: 1 million * 80 bytes / 80% ~ 100 MB
Total size of existing structures: 363 MB
Determine the size of the new (target) index structures
Assume that the new clustered key is 24 bytes long including a uniqueifier. The row indicator (8 bytes long) in both non-clustered indexes will be replaced by this clustered key.

Clustered index: 1 million * 200 bytes / 80% ~ 250 MB
Non-clustered index A: 1 million * (50 – 8 + 24) bytes / 80% ~ 83 MB
Non-clustered index B: 1 million * (80 – 8 + 24) bytes / 80% ~ 120 MB
Total size of new structures: 453 MB

Total disk space required to support both the source and target structures for the duration of the index operation is 816 MB (363 + 453). The space currently allocated to the source structures will be deallocated after the index operation is committed.
Determine additional temporary disk space for sorting
Space requirements are shown for sorting in tempdb (with SORT_IN_TEMPDB set to ON) and sorting in the target location (with SORT_IN_TEMPDB set to OFF).

# When SORT_IN_TEMPDB is set to ON, tempdb must have sufficient disk space to hold the largest index (1 million * 200 bytes ~ 200 MB). Fill factor is not considered in the sorting operation. Additional disk space (in the tempdb location) equal to the index create memory Option value = 2 MB. Total size of temporary disk space with SORT_IN_TEMPDB set to ON ~ 202 MB.

# When SORT_IN_TEMPDB is set to OFF (default), the 250 MB of disk space already considered for the new index in step 2 is used for sorting. Additional disk space (in the target location) equal to the index create memory Option value = 2 MB. Total size of temporary disk space with SORT_IN_TEMPDB set to OFF = 2 MB.

Using tempdb, a total of 1018 MB (816 + 202) would be needed to create the clustered and non-clustered indexes. Although using tempdb increases the amount of temporary disk space used to create an index, it may reduce the time that is required to create an index when tempdb is on a different set of disks than the user database. For more information about using tempdb, see tempdb and Index Creation.

Without using tempdb, a total of 818 MB (816+ 2) would be needed to create the clustered and non-clustered indexes.

Disk Space Calculations for an Online Clustered Index Operation

When you create, drop, or rebuild a clustered index online, additional disk space is required to build and maintain a temporary mapping index. This temporary mapping index contains one record for each row in the table, and its contents are the union of the old and new bookmark columns.

To calculate the disk space needed for an online clustered index operation, follow the steps shown for an offline index operation and add those results to the results of the following step.

Determine space for the temporary mapping index. In this example, the old bookmark is the row ID (RID)) of the heap (8 bytes) and the new bookmark is the clustering key (24 bytes including a uniqueifier). There are no overlapping columns between the old and new bookmarks.

Temporary mapping index size = 1 million * (8 bytes + 24 bytes) / 80% ~ 40 MB. This disk space must be added to the required disk space in the target location if SORT_IN_TEMPDB is set to OFF, or to tempdb if SORT_IN_TEMPDB is set to ON.

Transaction Log Disk Space for Index Operations

To make sure that the Index operations can be rollback the transaction log cannot be truncated until the index operations has completed. Therefore, the transaction log must have sufficient room to store both the index operation transactions and any concurrent user transactions for the duration of the index operation. This is for both offline online index operations.

When we are using large scale index operation we must remember
The Transaction log should be backed up and truncated before index operation online. The log has sufficient space to store the project index and user transactions.
SORT_IN_TEMPDB options should be ON. This separates the index transactions from concurrent user transactions.
Use database recovery model that allows minimal logging of the index operation. This may reduce the size of the log and prevent the log from filling the log space.

Don't run the ONLINE index operations in explicit transactions. The log will not be truncated until the explicit transaction ends.

Hope you like it.

Filtered Non Clustered Index

Index is a most important factor for boosting up database performance. Choosing the right index for right columns in the table objects is one of the challenge factors, where performance is the key factors.

Here in this article I am trying to discuss related to FILTER Index.

Introduction

In Microsoft SQL Server 2008 introduce a special type of index called FILTERED index. Here the filter means the WHERE cluse. So the Filtered index is the Index with WHERE clause.

A filtered index is an optimized no clustered index, especially suited to cover queries that select

from a well-defined subset of data.

A well define filtered index can increase the performance of the query. Before going to depth, we are trying to enlarge the feature of the filtered index

Feature of the Filtered Index

The features are mentioned bellow.

1. A well defined filter index is smaller than the full-table non clustered index. Filtered statistics are more accurate than the full-table statistics as they cover the rows within the filtered index and hence improve the Query performance and plan quality.

2. The index is maintained when the data manipulation language (DML) operation affects the data in the Index. A filtered index reduces the index maintenance cost as the size is smaller than the full-table non clustered index and only maintained when the data of the filtered index are affected.

3. Filtered index can reduce disk storage for non clustered indexes when a full-table index is not necessary. You can replace a full-table non clustered index with multiple filtered indexes without significantly increasing the storage requirements.

Where it is most suitable

We must understand where the filter index is most suitable before implement it to get the higher degree of performance from the query.

There is a table that has huge amount of data and we often select only a subset of that data. For an example, we have a lot of NULL values in a column and we want to retrieve records with only non-NULL values. In MS SQL Server 2008 this type of columns is called Sparse Column.

In another scenario we have several categories of data in a particular column, but we often retrieve data only for a particular category value.

By this I mean to say that in the Query of Indexed column the WHERE statement is more or less fixed and the same WHERE statement is used in non clustered Filtered Index.

Implementation

-- The Base Table

CREATE TABLE Employee

(

EmpID INT PRIMARY KEY CLUSTERED,

EmpName VARCHAR(100) NOT NULL,

HireDate DATE NULL,

DOJ DATE NULL,

)

GO

-- Creating a non clustered index on HireDate column

CREATE NONCLUSTERED INDEX NC_Employee_HireDate

ON Employee(HireDate)

GO

-- Creating a non clustered Filetered index on DOJ column

CREATE NONCLUSTERED INDEX FI_Employee_DOJ

ON Employee(DOJ)

WHERE DOJ IS NOT NULL -- >>the filter criteria for the index

GO

-- Inserting Rrecords

DECLARE @intCounter INT = 1

WHILE @intCounter <= 10000

BEGIN

IF (@intCounter % 100) = 0

BEGIN

INSERT INTO Employee(EmpID, EmpName, HireDate, DOJ)

VALUES (@intCounter, 'ABC' + CAST(@intCounter AS VARCHAR),

CAST((GETDATE() - @intCounter) AS DATE)

, CAST((GETDATE() - @intCounter) AS DATE))

END

ELSE

BEGIN

INSERT INTO Employee(EmpID, EmpName, HireDate, DOJ)

VALUES (@intCounter, 'ABC' + CAST(@intCounter AS VARCHAR),

NULL, NULL)

END

SET @intCounter = @intCounter + 1

END

Execution Plan Analysis

Note

If the rows returned from the query are beyond the filtered index criteria, the optimizer will not use the filtered index. In this scenario, if we use a query hint to use the filtered index then in that case it will generate Error 8622.

Msg 8622, Level 16, State 1, Line 1

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Hope you like it.

Index and Sort Order

Introduction

When we write the SQL Query we frequently use the ORDER BY clause to get the output in definite order (Ascending or descending).

But ORDER BY clause has deep impacts in the performance of the query executions. In this article I am trying to discuss related to it.

How it Works

When designing the index we should consider whether the data for the index key column should be stored in ascending or descending order. Ascending is the default order and maintains compatibility with earlier versions of SQL Server. It is most logical one, that the smallest values in the top and the biggest one is the bottoms.

Bust everything depends on the situations or the way we represent the records or how we constructs the SQL statements.

To understand it properly, I am taking an simple example where I am trying to demonstrate the performance issue with index data ordering.

Step-1 [ First create a Base Table ]

Here in this example I am taking my favorite employee table objects with attributes employee id, employee name and the department.

-- Base Table

CREATE TABLE my_Emp

(EmpID INT NOT NULL IDENTITY(1,1),

EmpName VARCHAR(50) NOT NULL,

EmpDept VARCHAR(1) NOT NULL)

Step-2 [ Now Insert some records on my base table ]

-- Record Insertion

INSERT INTO my_Emp

(EmpName, EmpDept)

VALUES ('Joydeep Das', 'A'),

('Tuhin Shinah', 'B'),

('Sangram Jit', 'C'),

('Sukamal Jana', 'A'),

('Sudip Das', 'B'),

('Manishankar Bhowmik', 'C'),

('Ratan Das', 'A'),

('Sidhu Jetha', 'B'),

('Subrata Kar', 'C')

GO

Step-3 [ Now I am creating a non clustered index on table ]

-- Index Created

DROP INDEX ix_nonclus_my_Emp ON my_Emp

CREATE NONCLUSTERED INDEX ix_nonclus_my_Emp

ON my_Emp (EmpID, EmpName)

Step-4 [ Now execute the following Query and find the Actual Execution Plan ]

SELECT EmpID, EmpName

FROM my_Emp WITH(INDEX(ix_nonclus_my_Emp))

ORDER BY EmpID ASC, EmpName DESC

Step-5 [ Now Drop the Index and Recreate it again by using ASC and DESC clause]

-- Drop Index

DROP INDEX ix_nonclus_my_Emp ON my_Emp

GO

-- Recreate Index

CREATE NONCLUSTERED INDEX ix_nonclus_my_Emp

ON my_Emp (EmpID ASC, EmpName DESC)

GO

Step-6 [ Now execute the following Query and find the Actual Execution Plan ]

SELECT EmpID, EmpName

FROM my_Emp WITH(INDEX(ix_nonclus_my_Emp))

ORDER BY EmpID ASC, EmpName DESC

Step-7 [ Now Compare Both the Execution Plan ]

In this case the SORT operator is found in the first execution plan but not in second hence increase the performance.

Summary

Sort order can be specified only for key columns. The sys.index_columns catalog view and the INDEXKEY_PROPERTY function report whether an index column is stored in ascending or descending order.

Hope you like it.

Scenario based Index type selection

Introduction

When we are thinking about index we have different options in mid to use them, such as Clustered / Non Clustered etc.

It is very important to understand in which scenario what type of Index gives us the performance boost up. As we all know that the too many indexes are not good for performance.

In this article I am trying to demonstrate the Scenario wise index selection process.

Scenario with Index Type

Here I am trying to create a table type structure, in one side I use the type of index and in opposite side I am using the scenario related to it.

Type

Scenario

Clustered Index

Clustered Index used in columns in the Query

1. The query that returns the range of values by using operators such as BETWEEN clause, >, >=, < and <= etc.

2. When the Query returns the larger result set.

3. When using JOIN clauses with FOREIGN key columns.

4. When ORDER BY or GROUP BY clauses in the query.

Use the Clustered Index on the following columns of Table

1. If the columns have the UNIQUE values.

2. The columns that are frequently used to retrieve the range of values.

3. The IDENTITY columns that are guaranteed to be UNIQUE and auto increasing numbers.

4. Columns that are used to sort data by using ORDER BY clause in SELECT statements.

Clustered Index is NOT good for the following columns of Table

1. The Columns those are frequently modified or changed.

2. Columns that use the composite key.

Non Clustered Index

Non Clustered Index used in columns in the Query

1. Query that use the JOIN or GROUP BY Clause.

2. Query that does not return large result set or data.

3. The columns of the query where we frequently use the WHERE clauses.

Use the non Clustered Index on the following columns of Table

1. Non clustered index is used in the non key columns that are frequently used in the query set.

2. In the columns where data is not distinct such as combination of columns where the clustered index is used in the others columns.

Covering Non Clustered Index

If we have to use a non-clustered index and if we know that your application will be performing the same query over and over on the same table, consider creating a covering index on the table for the query

Guideline to create Covered Index

· If the query or queries we run using the covering index are

seldom run, then the overhead of the covering index may

outweigh the benefits it provides.

· The covering index should not add significantly to the size of

the key. If it does, then it its use may outweigh the benefits it

provides.

· The covering index must include all columns found in the

SELECT list, the JOIN clause, and the WHERE clause.

Filtered Non Clustered Index

Use the filtered index in the following scenario

1. There is a table that has huge amount of data and we often select only a subset of that data. For an example, we have a lot of NULL values in a column and we want to retrieve records with only non-NULL values. In MS SQL Server 2008 this type of columns is called Sparse Column.

2. In another scenario we have several categories of data in a particular column, but we often retrieve data only for a particular category value.

By this I mean to say that in the Query of Indexed column the WHERE statement is more or less fixed and the same WHERE statement is used in non clustered Filtered Index.

Related tropics

1. Related to Clustered Index
http://sqlknowledgebank.blogspot.in/2012/07/related-to-clustered-index.html

2. Index Fragmentation
http://sqlknowledgebank.blogspot.in/2012/07/index-fragmentation.html

3. Index defragmentation
http://sqlknowledgebank.blogspot.in/2012/07/index-defragmentation.html

4. Index and Disk space
http://sqlknowledgebank.blogspot.in/2012/07/index-and-disk-space.html

5. Filtered non Clustered Index
http://sqlknowledgebank.blogspot.in/2012/07/filtered-non-clustered-index.html

6. Index and Sort Order
http://sqlknowledgebank.blogspot.in/2012/07/index-and-sort-order.html

Hope you like it.

Index and Statistics

Introductions

Microsoft SQL Server collects statistical information related to indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data.

If we understand the index statistics in better way, it helps us to optimize and fine tune our queries.

As the Statistics play a very important role in the performance of MS SQL Server query.

In this article I am trying discussing related to it. Before writing this article, I read several article and MSDN to gathering facts related to it.

Point to Focus

In this article I am trying to focus related to the following things of the SQL Server Index statistics.

1. What we mean by MS SQL server Index statistics

2. When the Index Statistics are Updated

3. Rules used by AUTO_UPDATE_STATISTICS

4. Determine that the Index statistics are out of date.

5. Updating the statistics

6. How to see the content of Index statistics

7. New feature for statistics in MS SQL Server 2008

What we mean by MS SQL server Index statistics

Before going to the actual definition of Index statistics, I am going to take an example so that we can understand it easily.

"Suppose you are a football coach and u have 20 players in your team and for paying final tournament you need 11 players out of 20. Now, you have to know (or maintain a list) all the statistics of each payer before making your final team."

Index statistics contains information related to the distribution of index key values. It means that the number of rows associated with each key value. To determine what kind of execution plan to be used when processing a query MS SQL Server query optimizer uses this information.

After creating the table Index when we perform the insert, update, or delete operations in the table, the statistics of the table become out of date. The MS SQL Server doesn't update the statistics information every time we made some changes (Insert/Update/Delete Operations) on the table. Statistics track the distribution of values within an index or within a particular column.

Note that, if a column is not indexed but can benefit from an index, SQL Server will automatically create statistics for that column.

When the Index Statistics are updated

It is very important to understand, when the MS SQL Server update the statistics. The database settings AUTO_UPDATE_STATISTICS controls when the statistics are automatically updated. By default the AUTO_UPDATE_STATISTICS is true that means that the statistics are automatically updated.

MS SQL Server determines that when to update the statistics after creating it. It is based on how old the statics is. It determines the outdated statistics based on number of Insert, Update and Delete from the date when the statistics is last updated and then recreate the statistics based on a threshold.

This threshold is relative to the number of records in the table. It means that the when the DML operation performs the index statistics slowly get older, until SQL Server determines to update the statistics.

Limitations:

When we have very large table and we are doing bulk Insert/Update/Delete operation the AUTO_UPDATE_STATISTICS can be overhead on our system. Because the AUTO_ UPDATE_STATISTICS busy to update the statistics and causes the system overhead. In such type of situation we must turn off the AUTO_UPDATE_STATISTICS and later we must manually update the statistics.

Rules used by AUTO_UPDATE_STATISTICS

"rowmodctr" column of the sysindexes table is used to determine the number of changes made since the last update of the statistics.

When the MS SQL Server updates the statistics it will follow the following rules.

Ø If our table have 6 or fewer rows, the statistics will be updated after 6 changes.

Ø If a table has 500 or fewer rows, statistics will be updated after 500 changes

Ø If a table has more than 500 rows, statistics will be updated after 20% of the total rows plus 500 rows are changed.

Syntax related to AUTO_UPDATE_STATISTICS

The syntax is mentioned bellow:

sp_helpdb DBName

GO

ALTER DATABASE DBName SET AUTO_UPDATE_STATISTICS ON

GO

Determine that the Index statistics are out of date

MS SQL Server uses the sampling methods to keep track of the last time when the statistics were updated. This information is used to determine how old your statistics is.

The function STATS_DATE is used to determine when the statistics was last updated. The sample script is uses this function to display index statistic date for all user defined indexes.

SELECT schema_name(o.schema_id)AS SchemaName,

OBJECT_NAME(si.object_id)AS TableName,

si.nameAS IndexName,

STATS_DATE(i.object_id, i.index_id)AS StatDate

FROM sys.indexes si

INNER JOIN sys.objects o

ON si.object_id= o.object_id

INNER JOIN sys.indexes i

ON i.object_id= si.object_id

AND i.index_id = si.index_id

WHERE o.type<>'S'

AND STATS_DATE(i.object_id, i.index_id)IS NOT NULL;

Updating the statistics

As we see that the problem related to MS SQL Server updating statistics automatically, to get the optimal output we need to manually update it when needed.

To update an Index statistics we can drop the Index and then recreate the Index, it will automatically update the statistics information. It works but it is not the good way to update statistics information manually.

The system stored procedure named "sp_updatestats" helps us to update the statistical information.

The syntax is mentioned below:

sp_updatestats[ [ @resample = ] 'resample']

Parameters

[ @resample =] 'resample'

Specifies that sp_updatestats will use the RESAMPLE option of the UPDATE STATISTICS statement. If 'resample' is not specified, sp_updatestats updates statistics by using the default sampling. Resample is varchar(8) with a default value of NO.

Example:

USE AdventureWorks2012;

GO

EXECsp_updatestats;

The UPDATE STATISTICS is another option to update the statistical information.

The syntax is mentioned bellow:

UPDATE STATISTICS table_or_indexed_view_name

[

{

{ index_or_statistics__name }

| ( { index_or_statistics_name } [ ,...n ] )

}

]

[ WITH

[

FULLSCAN

| SAMPLE number { PERCENT | ROWS }

| RESAMPLE

| <update_stats_stream_option> [ ,...n ]

]

[ [ , ] [ ALL | COLUMNS | INDEX ]

[ [ , ] NORECOMPUTE ]

] ;

<update_stats_stream_option> ::=

[ STATS_STREAM = stats_stream ]

[ ROWCOUNT = numeric_constant ]

[ PAGECOUNT = numeric_contant ]

For more information about syntax, follow the MSDN

http://msdn.microsoft.com/en-us/library/ms187348.aspx

Example:

USE AdventureWorks2012;

GO

UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;

GO

How to see the content of Index statistics

To see the actual contents of the statistics we use the DBCC SHOW_STATISTICS.

The syntax is mentioned bellow.

DBCC SHOW_STATISTICS (Table_Name, Index_Name)

For more information about SHOW_STATISTICS, follow the MSDN

http://msdn.microsoft.com/en-us/library/ms174384.aspx

Parameters:

Table_Name : The name of the Table.

Index_Name : The Index name of the Mentioned Table.

New feature for statistics in MS SQL Server 2008

The new feature of statistics in SQL 2008 are mentioned as per MSDN

SQL Server 2008 provides the option to create filtered statistics, which are restricted to a subset of the rows in the table defined by a predicate, also called a filter. Filtered statistics are either created explicitly, or more commonly, through creating a filtered index.

· Automatic creation: As mentioned, filtered statistics are automatically created as a by-product of filtered indexes. Unlike regular statistics, filtered statistics are never automatically created by the query optimizer as a result of query and update statements.

· Automatic update: To determine when statistics need to be automatically updated, the query optimizer uses column modification counters. For filtered statistics, the counter is multiplied by the selectivity of the filter, which ranges from 0 to 1, to compensate for the fact that the column counter tracks changes to all rows in the table, not the changes to the rows in the filtered statistics object.

· Sampling: To build the statistics, SQL Server 2008 reads every nth data page, according to the sampling rate, and then it applies the filter, if any.

· If you don't specify sampling rate, the default behaviour is to sample based on the number of rows in the table and, for filtered statistics, based on the selectivity of the filter. The more selective the filter, the more rows need to be sampled to gather enough data to build the histogram.

· Cardinality estimate: As with regular statistics, the optimizer attempts to pick the most relevant statistics when performing cardinality estimate. There is an additional check that the optimizer must make for filtered statistics: whether the statistics predicate contains the query predicate (that is, whether the query predicate must be true for a row if the statistics predicate is true for that row). Out of the possible filtered statistics, the query optimizer favours those that are as close to the query predicate as possible.

For more information about it, follow the MSDN

http://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx

Related tropics

1. When we use sp_updatestatshttp://sqlknowledgebank.blogspot.in/2012/04/when-we-use-spupdatestats.html

Referential Sources

http://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx

http://msdn.microsoft.com/en-us/library/ms187348.aspx

http://msdn.microsoft.com/en-us/library/ms174384.aspx

Fill Factor

Introduction

In my previous article, I am trying to discuss about the MS SQL Server Index and how the Index Boost up the performance of SQL Server Query.

Fill Factors is directly related to Index and Index is related to performance. So without the proper knowledge of Fill Factor, understanding of Index is not completed. So I in this article I am trying to discuss related to Fill Factors.

Point in Focus

1. What is Fill Factor

2. How it Effects

3. Measurement of Fill Factors % Value

4. How to Set Fill Factor

5. Is There anything New in MS SQL 2008/2012

6. Related tropics

What is Fill Factor

Before going to Fill Factor we have to understand the architecture of Index. In the MS SQL Server the Indexes are organised by B-Tree structure.

Please go to my previous article "Related to Cluster Index" to find the details of it.

http://sqlknowledgebank.blogspot.in/2012/07/related-to-clustered-index.html

In the MS SQL Server the smallest unit is the Page (8K) where one on more rows stores depends on the size of the rows.

Fill Factors is the value that determines the percentage of the spaces in each leaf-level pages to be field by data.

The range of the Fill Factor starts from 0% to 100% and the default value is 100%. Here the 0% and 100% means the same

How it Effects

I think the above example of Fill Factor is quite complex to understand to make it simple, let's take an example.

Index is fragmented due to DML operation on the table objects. Defragmented index is causes to bad performance of query in MS SQL Server.

Stuation-1

In this example we are taking an Index with Fill Factor of 0 or 100 % and the Leaf-level data pages are completely full. Now we are trying to update a row which is located in the same pages. The size of the new value that is updated is bigger than the previous one.

So In this situation the page size cannot support the new value as the page is full. So it create a new leaf-level pages and the 50% of the data moves to the new page and in the previous pages 50% data exists so the page filling of the old page is only 50% and the Index is fragmented. To find the desired rows SQL Server have to move multi pages as the Index is fragmented by Page splitting.

Situation-2

Now take above example with FILL FACTOR vale is between 0 to 100%. For example we are taking the FILL FACTOR Value = 70%.

Now for the above example (Situation-1) for the new updated value, MS SQL Server try to adjust the new data within 30% of the leaf-level pages as the 70% of the pages is full and the rest 40% is blank pages and the page is not split.

Measurement of Fill Factors % Value

It is hard to say that what percentage of fill factor that we use. It depends on situations to situations or we can say it how often the DML operation is done on the table objects.

Here I am trying to explain some situations and there fill factor percentage

Situation for Fill Factor

Fill Factor %

Static Table – The value of the table never changed

100

Tables Updated Less Often

95

Frequently Updated Table

70 to 90

If a table containing clustered Index with Primary key and use IDENTITY columns than we can use the Higher percentage vale of the Fill factor.

If the table containing clustered Index with Primary key and the value of the Primary key is NOT sequential like Auto incremental value than, we can use the Lower Percentage Value of the Fill Factor.

In case of OLTP systems we can take the lower percentage value of Fill Factor. But in case of OLAP system we can take the higher percentage value of Fill Factor.

Don't set the Fill Factor Value to 50% others your index is fragmented.

How to Set Fill Factor

To Set the Default Fill Factor

[A] MS SQL Server Enterprise Manager.

1. In the MS SQL Server console tree, right-click on your server.

2. From the shortcut menu, choose Properties.

3. Select the Database Settings tab.

4. Change the fill factor by choosing Fixed below Settings, and then specify the fill factor percentage by using the slider bar.

[B] By T-SQL

To set the Fill Factor 90%

sp_configure 'show advanced options', 1
GO

--Here is the message:

Configuration option 'show advanced options' changed from 0 to 1.

Run the RECONFIGURE command to install.

RECONFIGURE

GO

sp_configure 'fill factor', 90

GO

RECONFIGURE

GO

Specify Fill Factor of an Existing Index

USE AdventureWorks2012;

GO

-- Rebuilds the IX_Employee_OrganizationLevel_OrganizationNode index

-- with a fill factor of 80 on the HumanResources.Employee table.

ALTER INDEX IX_Employee_OrganizationLevel_OrganizationNode

ON HumanResources.Employee

REBUILD WITH (FILLFACTOR = 80);

GO

Another way to specify a fill factor in an index

USE AdventureWorks2012;

GO

/*

Drops and re-creates the IX_Employee_OrganizationLevel_OrganizationNode index on

the HumanResources.Employee table with a fill factor of 80.

*/

CREATE INDEX IX_Employee_OrganizationLevel_OrganizationNode ONHumanResources.Employee

(OrganizationLevel, OrganizationNode)

WITH (DROP_EXISTING = ON, FILLFACTOR = 80);

GO

Is There anything New in MS SQL 2008/2012

I don't find anything new on MS SQL Server 2005 and MS SQL Server 2012. If you have anything please give me some notes related to it.

Related tropics

1. Index defragmentation

http://sqlknowledgebank.blogspot.in/2012/07/index-defragmentation.html

Hope you like it.

Index of Materialized View Not Working

Introduction

I got a mail from one of my closed friends related to "Materialized view Index not working". I am little bit confused after getting this mail. When I am getting closer look at the problem that I understand what he mean by "Materialized view Index not working" This article is related to Why in Materialized view Index is not working.

Point to focus

1. Understanding the case scenario

2. Finding the rood of the problem

3. How we fix it.

4. Related Tropics

In this article I am not going to describe related to Materialize View as we all know that how important it is in case of performance boost up. If have you any confusion visit my previous article. You can find it in related tropic options of this article.

Understanding the case scenario

Here in this article, I cannot provide the exact scenario due to some restrictions but I can create a scenario related to it. So let's see what the scenario is.

Step-1 [ Creating the Base Table ]

IF OBJECT_ID('my_BaseTable') IS NOT NULL

BEGIN

DROP TABLE my_BaseTable

END

GO

CREATE TABLE my_BaseTable

(ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,

EMPNAME VARCHAR(50) NOT NULL,

GRADE VARCHAR(1) NOT NULL)

GO

Step-2 [ Now Insert Some records on our Base Table ]

INSERT INTO my_BaseTable

(EMPNAME, GRADE)

VALUES ('Mr. Sudip Das', 'A'),

('Mr. Joydeep Das', 'B'),

('Mr. Sukamal Jana', 'B'),

('Mr. Tuhin Kr. Shinah', 'C'),

('Mr. Sangram Jit', 'C'),

('Mr. Sourav Sarkar', 'D');

GO

Step-3 [ Display the Index Information ]

sp_helpindex my_BaseTable

index_name

index_description

index_keys

PK__my_BaseT__3214EC2766979D65

clustered, unique,primary key located onPRIMARY

ID

Here the UNIQUE CLUSTERED index is created.

Step-4 [ Optional -- Display the Records of the Base Table ]

ID EMPNAME GRADE

1 Mr. Sudip Das A

2 Mr. Joydeep Das B

3 Mr. Sukamal Jana B

4 Mr. Tuhin Kr. Shinah C

5 Mr. Sangram Jit C

6 Mr. Sourav Sarkar D

Step-5 [ Create the Materialize View ]

IF OBJECT_ID('View_my_BaseTable') IS NOT NULL

BEGIN

DROP VIEW View_my_BaseTable

END

GO

CREATE VIEW View_my_BaseTable

WITH SCHEMABINDING

AS

SELECT ID, EMPNAME, GRADE

FROM dbo.my_BaseTable;

Step-6 [ Create the Clustered index on the View ]

CREATE UNIQUE CLUSTERED INDEX IX_View_my_BaseTable

ON View_my_BaseTable (ID);

Step-7 [ Display the Index Information ]

sp_helpindex View_my_BaseTable

index_name

index_description

index_keys

IX_View_my_BaseTable

clustered, uniquelocated on PRIMARY

ID

Here the UNIQUE CLUSTERED index is created.

Step-7 [ Display the Actual Execution Plan ]

In this execution plan if we see it carefully we find that the Index named "PK__my_BaseT__3214EC2766979D65" is used which is the Base Table Index (The Index of the Primary Key). The Index that is created on the View is not used.

"And this is the Problem. Why the Index of the View is not used by the MS SQL Query optimizer? "

Finding the rood of the problem

The cost of using the indexed view may exceed the cost of getting the data from the base tables, or the query is so simple that a query against the base tables is fast and easy to find. This often happens when the indexed view is defined on small tables.

How we fix it

We can use the NOEXPAND hint if we want to force the query processor to use the indexed view.

SELECT * FROM View_my_BaseTable(NOEXPAND)

Related Tropics

1. Materialize view and the performance issue

http://sqlknowledgebank.blogspot.in/2012/03/materialize-view-and-performance-issue.html

Hope you like it.

Foreign Key and Index Binding

Introductions

In this article I am trying to discuss about the foreign key Index binding features. Hope it will be interesting and we can learn something new from it.

Point in focus

1. Foreign Key Definitions

2. Why Index Binding is necessary

3. What we find in our observations

Let's start it.

Foreign Key Definitions

Here I am just providing a definition of foreign key to understand it properly. If we look at the MSDN

"A foreign key (FK) is a column or combination of columns used to establish and enforce a link between the data in two tables. A link is created between two tables by adding the column or columns that hold one table's primary key values to the other table. This column becomes a foreign key in the second table."

In ideal conditions a foreign key must be liked with the Primary Key constraint in another table. But it is not necessary; it can be defined to reference the columns of a UNIQUE constraint in another table.

Why Index Binding is necessary

A foreign key referencing the primary key will always be bound to the primary key index. However, we might have additional unique indexes on the primary key column or column(s) for performance reasons.

This Index binding is necessary for two reasons.

Performance Factors

If the parent table is not queried often but rows are inserted into the child table frequently and heavily, a unique non-clustered index that "covers" the referential integrity check may be more desirable than the clustered index.

In SQL server 2008 and later versions preferred unique non-clustered index rather than the clustered index.

Control the index bound

If we try to drop an index bound to the foreign key it generate an error message. To do so we must drop the foreign key first and then drop the index.

What we find in our observations

Now we see a simple example of foreign key Index binding and observe the result very carefully.

Step-1 [ Creating Parent Table ]

IF OBJECT_ID('ChildTable') IS NOT NULL

BEGIN

DROP TABLE ChildTable;

END

GO

IF OBJECT_ID('ParentTable') IS NOT NULL

BEGIN

DROP TABLE ParentTable;

END

GO

CREATE TABLE dbo.ParentTable

(PtblID INT NOT NULL IDENTITY

CONSTRAINT PK_ParentTable

PRIMARY KEY CLUSTERED,

Col1 INT NOT NULL,

Col2 VARCHAR(50) NOT NULL

);

GO

Step-2 [ Creating non Clustered Covered Index ]

CREATE UNIQUE NONCLUSTERED INDEX idx_PTable_PtblID

ON dbo.ParentTable(PtblID) INCLUDE(Col1);

Step-3 [ Inserting some records in Parent Table ]

INSERT INTO dbo.ParentTable VALUES(1, 'Joydeep Das');

INSERT INTO dbo.ParentTable VALUES(2, 'Sukamal Jana');

INSERT INTO dbo.ParentTable VALUES(3, 'Tuhin Shinah');

GO

Step-4 [ Creating Child table with Foreign Key ]

CREATE TABLE dbo.ChildTable

(CtblID INT NOT NULL IDENTITY

CONSTRAINT PK_ChildTable PRIMARY KEY CLUSTERED,

PtblID INT NOT NULL

CONSTRAINT FK_ChildTable_ParentTable

FOREIGN KEY REFERENCES dbo.ParentTable(PtblID)

);

GO

Step-5 [ Inserting Records to Child Table ]

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

GO

Step-6 [ Update the Statistics ]

UPDATE STATISTICS dbo.ParentTable;

UPDATE STATISTICS dbo.ChildTable;

GO

Step-7 [ Finding the Foreign key Binding ]

SELECT a.name As [FK Binding]

FROM sys.foreign_keys f

INNER JOIN sys.indexes a

ON a.object_id = f.referenced_object_id

AND a.index_id = f.key_index_id

WHERE f.object_id = OBJECT_ID(N'dbo.FK_ChildTable_ParentTable');

GO

Now lets execute the above scrips in different version of the MS SQL Server.

MS SQL Server 2005

The output of the step-7 is

FK Binding

PK_ParentTable

SQL Server 2005 chooses the clustered index when possible and, if no suitable clustered index exists, then the unique non-clustered index on the referenced column is used.

SQL 2008, SQL 2008R2 and SQL 2012

FK Binding

idx_PTable_PtblID

The foreign key is bound to the unique non-clustered index on the referenced column.

Hope you like it.

Missing Index

All of we know how important the index is. In this article I am trying to illustrate about the missing index and how we find the missing index.

Index is to improve the performance of the query. For the causes of missing index it takes long delay of performance of the query. I personally fall such conditions where a query takes more than 1 and ½ hour to complete the executions. So we understand that what the impacts of a missing index.

How we find the missing index

We can see the missing index by using the DMVs. Use these SQL Statements to find the missing index.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT ROUND(s.avg_total_user_cost * s.avg_user_impact

* (s.user_seeks + s.user_scans),0) AS [Total Cost],

d.[statement] AS [Table Name], equality_columns,

inequality_columns, included_columns

FROM sys.dm_db_missing_index_groups g

INNER JOIN sys.dm_db_missing_index_group_stats s

ON s.group_handle = g.index_group_handle

INNER JOIN sys.dm_db_missing_index_details d

ON d.index_handle = g.index_handle

ORDER BY [Total Cost] DESC

Here we use the TRANSACTION ISOLATION LEVEL READ UNCOMMITED. The purpose of this is to ensure the SQL that follows does not take out any locks and does not honour any locks.

The details of the related DMV are mentioned bellow.

DMV

Description

sys.dm_db_
missing_index_details

Contains details of the
database/schema/table
the missing index relates to, together with how the index usage has been identified in queries (such as equality/inequality).

sys.dm_db_missing_
index_group_stats

Contains details of how often the index would have been used, how it would be used (seek or scan), and a measure of its effectiveness.

sys.dm_db
_missing_index_groups

This is a linking DMV,
linking the previous two DMVs together.

Here the missing index gives the calculated columns name called "Total Cost". This uses a combination of the impact of the index, its cost, together with a measure of the number and type of index accesses to calculate its importance.

Finding Important Missing Index

It is not a good idea to blindly implement the suggested missing indexes, since indexes have a cost associated with them.

When data is inserted into a table, data may also need to be added to any associated indexes, this can increase the query duration. That said, some indexes may improve the performance of updates/deletes since these queries often have a WHERE clause. We can use a DMVs or the Statistics Time (or IO) command to measure the impact of adding an index.

Moving Index in Separate Filegroup

By default the index are stored in a same file group as the base table on which the index is created. It is important that if we have a very large table with index. To improve the performance of the query we have to move the index in a separate file group (Better in different Drive/Hard disk) to improve I/O performance.

But we can do the following:

1. 1. Create nonclustered indexes on a filegroup other than the filegroup of the base table.

2. Partition clustered and nonclustered indexes to span multiple filegroups.

3 Move a table from one filegroup to another by dropping the clustered index and specifying a new filegroup or partition scheme in the MOVE TO clause of the DROP INDEX statement or by using the CREATE INDEX statement with the DROP_EXISTING clause.

Some points should be taken care of:

1. If a table has a clustered index, moving the clustered index to a new filegroup moves the table to that filegroup.

2. You cannot move indexes created using a UNIQUE or PRIMARY KEY constraint using Management Studio. To move these indexes use the CREATE INDEX statement with the (DROP_EXISTING=ON) option in Transact-SQL.

The scripts gives us an idea, how to move the index in different file group.

USE my_db;

GO

/*

Creates the TranFG filegroup on the my_db database

*/

ALTER DATABASE my_db

ADD FILEGROUP TranFG;

GO

/*

Adds the TranFGData file to the TranFG filegroup. Please note that you

will have to change the

filename parameter in this statement to execute it without errors.

*/

ALTER DATABASE my_db

ADD FILE

(

NAME = TranFGData,

FILENAME = 'C:\Program Files\Microsoft SQL

Server\MSSQL11\MSSQL\DATA\TranFGData.ndf',

SIZE = 5MB,

MAXSIZE = 100MB,

FILEGROWTH = 5MB

)

TO FILEGROUP TranFG;

GO

/*

Creates the IX_Employee_OrganizationLevel_OrganizationNode index

on the TransactionsPS1 filegroup and drops the original

IX_Employee_OrganizationLevel_OrganizationNode index.

*/

CREATE NONCLUSTERED INDEX IX_Cust

ON My_Cust (CustId)

WITH (DROP_EXISTING = ON)

ON TranFG;

GO

Hope you like it.

Covered Index

In this article we have to learn the covered index. As the name suggest the cover index "COVER" all the columns in the select statements. The indexes are best when they are very small. I think integer data type is a very good data type for index key.

From SQL Server 2005 this functionality is added, nonclustered index is extended by adding non-key columns to the leaf level of the non key columns. In addition, SQL Server can use these entries in the index's leaf level to perform aggregate calculations.

The Database engine not considers the non key columns when calculating number of index key or the size of the index.

Generally the index takes the 16 key columns and size about 900 bytes. But by adding non key columns we can add more than that and also can add the data types that are not supported by index key.

But don't be so happy that you must include all of your columns in covered index. Avoid including unnecessary columns. While covering index boost the retrieval of data they can slow down INSERT, UPDATE and DELETE operations. As because the extra work is needed to maintain the cover index.

The example of covered index is mentioned bellow:

USE my_DB

GO

CREATE INDEX IX_Cust_ID

ON mst_Cust (custID)

INCLUDE (CustName, CustAdd)

GO

Everything has some good and bad effects. In this case I recommended not using the covered index if necessary. Not especially any transaction table where we do the frequent data modification and insertion.

Hope you like it.

INDEX Scan/Seek

One of my friends ask me about index scan/seek and when it happened. This article give an idea related to index scan and index seek.

Index scan

Index scan means the SQL server reads all rows on the table and returns those rows that satisfy the search criteria. All the rows of the left leaf of the index are scanned. That means all the rows of the index is searched instead of table directly. Please don't confuse with table scan. The time this takes is proportional to the size of the index.Generally the index scan is occurred when the query not finding a suitable index for a particular columns of the table.

When it used?

It is preferable when the table is very small and using index is over head

When a large percentage of the records match with searched criteria (10-15%).

Index seek

The SQL server uses the B-Tree structure of index to seek directly to the matching records. Time taken is only proportional to the number of matching records.

When it used?

Index seek is preferred when the number of matching records is proportionately much lower than the total number of records (greeter then the 50%).

Here is a general example where Index scan is used.

The Architecture

Table-A Contains 5 columns (Col-A, Col-B, Col-C, Col-D, Col-E)

The Index named Index-1 is activated on Table-A Col-A and Col-E

The Query

A SQL Query is fired with

WHEN Col-A='xxxx' AND Col-C='yyyyy'

The Output

As because the Index named Index-1 is on COL-A and COL-C and in the SQL statement we used COL-A and COL-C, it generate the Index Scan not Index Seek

SQL Server 2008 introduces a new hint, the FORCESEEK hint that allows to "kindly"

suggest Query Optimizer to use a seek operation instead of a scan.

.

Example:

SELECT *

FROM Table-A AS h

INNER JOIN Table-B AS d WITH (FORCESEEK)

ON h.ID = d.ID

WHERE h.DueAmt > 100

AND (d.OrderQty > 5 OR d.LineTotal < 1000);

Hope the article is quite informative and thanking you to provide your valuable time on it.

INCLUDE clause WITH NON CLUSTERED INDEX

This article is dedicated to one of my colleague, who want to know about the INCLUDE clause of NON CLUSTERED Index. I try to cover all the important points related to it in short.

In SQL Server 2005, the functionality of non-clustered indexes is extended by adding non-key columns to the leaf level of the non-clustered index.

By including non key columns we create non clustered index that covers more query and increase performance.

The features of Adding non key columns are mentioned bellow:

1. Index has the limitation of 16 key columns and maximum index key size of 900 bytes. But if we include non key columns the index not care about the index key size and 123 includes non key columns is allowed.
So a non clustered index contains:16 key Columns + 123 Includes non Key columns.

2. Columns cannot be repeated in the include list and Dropping of non-key columns is possible after dropping the non-key index first.

To DROP Non-KEY Columns à DROP the NON-KEY INDEX First

3. The database Engine not considers any non-key columns to calculate the number of index key columns or the size of the index.

Size of Index Not Related to Non-Key Columns

4. The data types of non-key columns can be anything evens the data type not supported by the non clustered Index Key-columns.

5. You should keep those columns in "Include" clause which generally comes under "SELECT" clause and not being used much in "WHERE", "GROUPBY" or "ON" clause of "JOIN".

The syntax of creating NON CLUSTERED INDEX with INCLUDE clause is mentioned bellow.

CREATE NONCLUSTERED INDEX [Ind-1] ON [dbo].[tab_exmaple]

(

[sroll] ASC

)

INCLUDE ( [sSec],[sName])

Architecture of INCLUDE on NON CLUSTERED Index:

The architecture of "Included Column" concerns, whatever columns you have defined under "Include" clause under the index, those will be stored on the Leaf pages, and it won't get stored on the Root page or Intermediate page of the index.

Hope the article is quite interesting and thanking you to provide your valuable time on it.

What type of Index You Choose for Primary Key?

In one of my article, someone wants to know more about where we used the clustered index and where to non clustered index. First of all I am thanking you "Anonymous" for giving your valuable time to it.

In this article I am trying to explain where to use clustered and where not. I am trying to mention some facts that MS told us to focus on.

Clustered Indexes

Point-1

Clustered indexes are ideal for queries that select by a range of values
or where you need sorted results. This is because the data is already presorted
in the index for you. Examples of this include when you are using BETWEEN,
<, >, GROUP BY, ORDER BY, and aggregates such as MAX, MIN, and COUNT in your
queries.

Point-2

Clustered indexes are good for queries that look up a record with a unique
value (such as an employee number) and when you need to retrieve most or
all of the data in the record. This is because the query is covered by the
index.

Point-3

Clustered indexes are good for queries that use the JOIN or GROUP BY clauses.

Point-4

Clustered indexes are good for queries where you want to return a lot of
columns. This is because the data is in the index and does
not have to be looked up elsewhere.

NON Clustered indexes

Avoid putting a clustered index on columns that increment, such as an identity,
date, or similarly incrementing columns, if your table is subject to a high
level of INSERTS. Since clustered indexes force the data to be physically
ordered, a clustered index on an incrementing column forces new data to be
inserted at the same page in the table, creating a table hot spot, which
can create disk I/O bottlenecks. Ideally, find another column or columns
to become your clustered index.

Hope that the information is informative and thanking you to provide your valuable time on it.

PRIMARY KEY and NON CLUSTERD INDEX

When we are thinking about the PRIMARY KEY, side by side we are thinking about a clustered index. As the PRIMARY KEY by default create a clustered index. It is good idea that primary key should be CLUSTERED INDEXED.

In this article, I am telling something in opposite direction. We have to identify that the columns or columns we are choosing for primary key is good for CLUTERED INDEX or NOT.

If you choose a COMPOSITE primary key then selection of CLUSTERED INDEX is a bad idea as it fragmented your database quickly. SQL Server provides a wonderful facility known as the Index Tuning Wizard which greatly enhances the index selection process.

So please understand that the primary key and the clustered index are different concepts. One thing is common that the primary key takes a clustered index when it created by default.

We can create a primary key without a clustered index and put the clustered index in different columns if needed.

Here is the example of NONCLUSTERED Primary Key

CREATE TABLE my_tab2

(ROLL INTEGER NOT NULL PRIMARY KEY NONCLUSTERED,

SNAME VARCHAR(50))

sp_helpindex my_tab2

--Output---

index_name: PK__my_tab2__44C28DB76831A297

index_description: nonclustered, unique, primary key located on PRIMARY

index_keys: ROLL

Hope the article is informative and thanking you to prove your valuable time on it

Rebuilding all indexes on a table and specifying options

If you are planning rebuild index to use this statement below , you must run update statistics statement after rebuild index. Statistics are always updated when you rebuild index. But STATISTICS_NORECOMPUTE=ON disable the auto update statistics from updating the specific statistics for an index (or column-level statistics)

-- Try to avoid this options

USE AdventureWorks2012;
GO
ALTER INDEX ALL ON Production.Product
REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON,
STATISTICS_NORECOMPUTE = ON);GO

UPDATE STATISTICS (Production.Product)

GO

(OR)

Alternatively you can use below options

-- By default STATISTICS_NORECOMPUTE = OFF

USE AdventureWorks2012;
GO
ALTER INDEX ALL ON Production.Product
REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON,
STATISTICS_NORECOMPUTE = OFF);

-- For single index options

ALTER INDEX[IDX_INDEX_NAME] ON [dbo].[Product] REBUILD WITH (STATISTICS_NORECOMPUTE=OFF)

Source: Microsoft needs to correct this page.

http://technet.microsoft.com/en-us/library/ms188388.aspx

How to find index usage against table

If you want to find out index usage for user table then sys.indexes and sys.dm_db_index_usage_stats that give you more details. You should look for user seeks counts on this result

Select OBJECT_NAME(i.object_id) AS table_name,

i.name AS index_name,i.type_desc,

ius.user_seeks,ius.last_user_seek

From sys.indexes i

Inner Join sys.dm_db_index_usage_stats ius

on i.object_id = ius.object_id

and i.index_id = ius.index_id

and ius.database_id = DB_ID()

where ius.object_id = OBJECT_ID('dbo.Customers')

go

Microsoft Technologies

Pages

Friday, July 26, 2013

Indexes in SQL Server

About Me