Checking Index Fragmentation

October 18, 2011

If your not monitoring your indexes on a regular basis shame on you, you should be. Depending on how many databases you are supporting this could end up being extremely tiedious. So here is what I do and you can tailor the process to your env.

1) Each production server I support has a DBA database on it.

2) Create the table IndexStats in each of the DBA databases (scripts below)

3) Create a job with the script below to capture the index statisics . (I do it once a week)

4) Now here it’s up to you on how you want to do this. I have a huge number of servers that I support so I transfer the statistics from each individual server to a central repository. I use an SSIS package which connects to the table in each database and dumps it into a table of the same format on my repository. The SSIS package uses Simple Data flows to do this and is pretty straight forward. I’m not going to go into the package in this post. If you need assistance on creating data flows in SSIS there are plenty of resources out there.

5) Create the report stored procedure to email you indexes which need to be rebuilt and schedule it based on your needs. I run it once a week then rebuild the indexes that weekend on the server in questions.

Script 1 Create Table to hold your statistics in your DBA database.

CREATE TABLE [dbo].[IndexStats](
[SQLInstance] [varchar](160) NULL,
[DatabaseName] [varchar](160) NULL,
[TableName] [varchar](160) NULL,
[IndexName] [varchar](160) NULL,
[IndexType] [varchar](160) NULL,
[Databaseid] [smallint] NULL,
[object_id] [int] NULL,
[index_id] [int] NULL,
[user_seeks] [bigint] NULL,
[user_scans] [bigint] NULL,
[user_lookups] [bigint] NULL,
[user_updates] [bigint] NULL,
[last_user_seek] [datetime] NULL,
[last_user_scan] [datetime] NULL,
[last_user_lookup] [datetime] NULL,
[last_user_update] [datetime] NULL,
[system_seeks] [bigint] NULL,
[system_scans] [bigint] NULL,
[system_lookups] [bigint] NULL,
[system_updates] [bigint] NULL,
[last_system_seek] [datetime] NULL,
[last_system_scan] [datetime] NULL,
[last_system_lookup] [datetime] NULL,
[last_system_update] [datetime] NULL,
[rows] [int] NULL,
[StatisticsCaptured] [date] NULL,
[avg_fragmentation_in_percent] [float] NULL,
[page_count] [int] NULL,
[Index_Depth] [int] NULL,
[Index_level] [int] NULL,
[Avg_record_Size_in_bytes] [int] NULL
) ON [PRIMARY]GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[IndexStats] ADD DEFAULT (getdate()) FOR [StatisticsCaptured]
GO

Script 2 Schedule this script from SQL Agent to populate your IndexStats Table. I delete data from this table since my data is moved into a central repository but if you are not doing that then you can remove the delete part of the script

insert into MyDBADB..IndexStats
exec
sp_msforeachdb
‘use ?
DECLARE @dbid int
select @dbid = db_id(”?”)
select
DISTINCT
@@SERVERNAME as SQLInstance,
”?” as DB,
OBJECT_NAME(a.object_id) as [TableName],
B.name as [Index],
B.type_desc as [IndexType],
A.*,
p.rows,
getdate(),
OS.avg_fragmentation_in_percent,
OS.page_count,
OS.Index_Depth,
OS.Index_level,
OS.Avg_record_Size_in_bytes
FROM ?.sys.dm_db_index_usage_stats a
INNER JOIN ?.sys.indexes B on a.object_id = B.object_id and a.index_id = B.index_id
inner join ?.sys.partitions p on B.object_id =p.object_id and B.index_id =p.index_id
INNER JOIN ?.sys.dm_db_index_physical_stats(@dbid,NULL,NULL,NULL,”SAMPLED”) OS on
A.object_id = OS.Object_id and A.index_id =OS.Index_id
WHERE
OBJECT_NAME(a.object_id) not like ”sys%” and OBJECT_NAME(a.object_id) not like ”spt%”
and OBJECT_NAME(a.object_id) not like ”queue%”
and a.database_id = DB_ID(”?”)
order by B.namedelete from MyDBADB..IndexStats
where StatisticsCaptured > GETDATE() -14

Script 3 Create the following stored procedure on your repository and schedule it to run as you see fit. This will alert you on the indexes with Fragmentation percent greater then 50.

/******************************************************************************************************
* Created By: MotherofAllGeeks.Com
* Created on: 3/2011
* Purpose gather space information for trending or possibly alerting
* Create it in master and it will gather stats for all databases
* Sample Call proc_getdatabasespace_r
* Prereq is that you must have a DBADB and you must have created FileStats table in MyDBADb.
* exec [proc_GetFragmentationReport_r] 7,’Mom@motherofallGeeks.com’, ‘MyMailProfile’
******************************************************************************************************/
CREATE PROCEDURE [dbo].[proc_GetFragmentationReport_r]
@days int ,
@recipients varchar(255),
@profile varchar(255)

AS

BEGIN
SET NOCOUNT ON

BEGIN TRY

BEGIN
DECLARE @tableHTML NVARCHAR(MAX), @currentDate date, @usagepercentage int

SET @tableHTML =
N’

‘+ N’

‘+N’Database’+N’CountofFragmenedIndexes’+N’

‘ +CAST ((SELECTtd = convert(varchar(30),sqlinstance) ,” ,td = convert(varchar(60),databasename),” ,td = convert(varchar(24),counttables) ,”FROM (select sqlinstance,databasename,count(tablename) counttablesfrom indexstatswhere avg_fragmentation_in_percent > 50and StatisticsCaptured > GETDATE() – @daysgroup by sqlinstance, databasename) Aorder by counttables descFOR XML PATH(‘tr’), TYPE) as NVARCHAR(max) ) + N’

SQLInstance

IF (@tableHTML is not null)
BEGIN

declare @subject varchar(55)
SELECT @subject = ‘Fragmented Indexes ‘
exec msdb.dbo.sp_send_dbmail
@profile_name = @profile,
@recipients = @recipients,
@subject = @subject,
@body = @tableHTML,
@body_format = ‘HTML’
END
END

END TRY

BEGIN CATCH

DECLARE @ErrorMessage varchar(max)

SELECT @ErrorMessage =
‘ErrorNumber:’ + convert(varchar(7),ERROR_NUMBER()) + ‘
‘ +
‘ErrorSeverity:’ + convert(varchar(7),ERROR_SEVERITY()) +’
‘+
‘ErrorState:’ + convert(varchar(7),ERROR_STATE()) +’
‘+
‘Calling Procedure:’ + CONVERT(VARCHAR(25), OBJECT_NAME(@@PROCID))+ ‘
‘+
‘ErrorProcedure:’+ ERROR_PROCEDURE() +’
‘+
‘ErrorLine:’ + convert(varchar(7),ERROR_LINE()) +’
‘+
‘ErrorMessage:’ + ERROR_MESSAGE() +’


RAISERROR(@ErrorMessage,16,1)

END CATCH

END

GO


Finding Potential Bottlenecks with DMVs.

July 28, 2010

The following query is a great way to get a look into how your stored procedure driven application is performing. I wouldn’t recommend running this during your peak hours. What I have done is scheduled it to run in the off hours and store into a table in my dummy DB. I then query that table as it is less intensive then querying the DMVs. What this query shows me is area’s a I need to tune within my application. If a query is executed many times and has a high number of logical reads perhaps my index needs to be tuned. Same goes if I look into area’s where physical data processing is happening… If your data isn’t being found in the cache you may have a bottleneck or perhaps you are just pulling to much data. It’s also great to see which queries are executed most frequently as there is always room for improvement and it’s important to understand how your data is being accessed.

SELECT
     SUBSTRING(text,PATINDEX(‘%create %’,text),50),
     SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
          ((CASE statement_end_offset
          WHEN -1 THEN DATALENGTH(st.text)
          ELSE qs.statement_end_offset
          END
          – qs.statement_start_offset)/2) + 1) as statement_text,
     db_name(dbid) DB,
     creation_time,
     last_execution_time,
     execution_count,
     (total_elapsed_time/execution_count)/1000000 as time
     ,total_physical_reads
     ,last_physical_reads
     ,min_physical_reads
     ,max_physical_reads
     ,total_logical_writes
     ,last_logical_writes
     ,min_logical_writes
     ,max_logical_writes
     ,total_logical_reads
     ,last_logical_reads
     ,min_logical_reads
     ,max_logical_reads
     ,getdate()
          FROM sys.dm_exec_query_stats as qs
          CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
     where db_name(dbid) =’YourDB’


Table Scan

June 24, 2010

When reviewing execution plans I often look for table scans as an indicator of an area which performance can be improved. A table scan or a clustered index scan is when the query optimizer has to search the data entirely to find the row you are requesting.  The reason this can be costly is when you have a scan on a large table  SQL has to load all the data into the buffer this will slow performance and tax your resources. On smaller tables SQL Server may determine that a table scan is faster then an index seek, it can keep the entire data set in the buffer without issue.  A table scan would be caused by a query with no where clause. SELECT COLUMN1 from TABLE1. It can also be caused by a query with a WHERE clause which is not on the clustered index. SELECT COLUMN1 from TABLE1 where NOTPKEY=5.


What does a query plan show you?

June 18, 2010

I like to describe a query plan as a road map to your data. Your query plan shows you the steps the SQL Server engine is taking to fetch your data. Typical things you want to avoid in query plans are table scans, hash matching, bookmark lookups, excessive looping, spooling and excessive compute scalar operators.
To view a query plan you can use several set statements, for example: SET SHOWPLAN_TEXT or XML on, SET STATISTICS IO ON.
You can also see the plan in graphical format by using the SSMS from the query drop down menu. You can see even more information by selecting properties when you have a plan step highlighted.

When you write TSQL you want to get your data in the most effecient way possible. It is good practive to become familar with what the plan looks like and how even small things can have a large impact on performance.

Lets start with compute scalar for todays post. When you see a compute scalar step in your query plan SQL Server is doing an implicit conversion on a variable or column. Imagine if you are selecting 500,000 rows and for each row SQL Server is doing a conversion. How much over head do you think that will add? Just today I was tuning a piece of code. This code populated a table variable from a string of comma seperated values which were seperated by calling a function. The string passed to the function was a varchar, however the function was expecting a nvarchar and thus SQL was performing an implicit conversion on the string variable.  I changed the function to expect a varchar and the execution time of my stored procedure was reduced by 15 seconds. The number of rows I was processing was 32000.  A simple yet costly mistake.  Always make sure your variable declaration have the same data type as the data they will be holding, the same should be applied to parameters for stored procedures and functions.

In our next post we will discuss table scans.


Follow

Get every new post delivered to your Inbox.