How often have we not encountered this situation as a DBA.
Troubleshooting performance problems is often the critical challenge a DBA has to face.
When you see High CPU condition in your environment, you would need a to-do action checklist to bring everything to normal as soon as possible. If the high CPU condition is passed and everything is normal then you would need to investigate what caused it and how to avoid in future.
In this post I will try to cover the troubleshooting approach for a high CPU condition in SQL Server environment.
Below is a simple flowchart i would like to follow in such a situation:
Step-1: Identify SQL Instance
Determine if high CPU condition is caused by SQL or Windows. If it is SQL and your server is hosting multiple SQL Instances identify the instance responsible for high CPU.
Open task manager –> goto Performance tab –> click on Open Resource Monitor –> click on CPU tab and click on CPU column to arrange in descending order.
Now check the box against sqlservr.exe process and click on Services below to identify the SQL instance
Step-2: Identify SQL Kernel or User
Open Performance Monitor and add the below counters:
- Processor: % Privileged Time – Percentage of time processor spends on execution of Microsoft Windows kernel commands such as OS activity. (If more than 30% involve Windows Admins)
- Process (sqlservr): % Privileged Time – the sum of processor time on each processor for all threads of the process (SQL Kernel)
- Processor: % User Time – percentage of time the processor spends on executing user processes such as SQL Server. This includes I/O requests from SQL Server
Step-3: Caused by SQL Kernel
Divide SQL Server %Privileged Time by number of logical processors to understand if the usage is due to SQL Kernal. If this value is more than 30% involve the Windows Admins. (Process (sqlservr): % Privileged Time / No of logical cpus)
Potential causes could be antivirus faulty drivers or encryption services.
Step-4: Caused by User
SQL Background Threads:
- Run the below query to find if SQL threads are taking up CPU
SELECT * FROM sys.sysprocesses
WHERE cmd like 'LAZY WRITER' or cmd like '%Ghost%' or cmd like 'RESOURCE MONITOR'
Ghost clean up thread – Check if the user deleted large number of rows
Lazy Writer thread – Check if any memory pressure on the server
Resource Monitor thread – Check if any memory pressure on the server
- Use SP_WHO3 ( from Brenozar’s tool kit) identify the SPID and check with user executing the query to KILL the process. (You can achieve the same using Adam Mechanic’s SP_WHOISACTIVE)
- If the CPU condition is stable now and you need to find the root cause then run the below query to find the top CPU intensive queries.
SELECT GETDATE() AS "RunTime", st.text AS batch, SUBSTRING(st.text,statement_start_offset / 2+1
,((CASE WHEN a.statement_end_offset = -1
THEN (LEN(CONVERT(nvarchar(max),st.text)) * 2)
ELSE a.statement_end_offset END) - a.statement_start_offset) / 2+1) AS current_statement
, qp.query_plan, a.*
FROM sys.dm_exec_requests a CROSS APPLY sys.dm_exec_sql_text(a.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(a.plan_handle) AS qp
ORDER BY CPU_time DESC
- Open execution plan and check for operators with high cost
- Check indexes being used and number of rows estimated
- Index Scans / multiple scans
- Operators with high very high rowcount
- Make sure statistics are updated on the referenced tables.
- Check for any Index recommendations in the execution plan
- Check for Implicit Conversions in the execution plan tool tips as below. Implicit Conversion can cause HIGH CPU condition
Check the following configurations are according to Microsoft Best Practice recommendations:
- Affinity Mask
- Cost Threshold for Parallelism
- Max Degree of Parallelism (https://support.microsoft.com/en-gb/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-confi )
- Max Worker Threads
- Optimize for ad-hoc workloads
- Priority boost
High Compilations and re-compilations:
These can be monitored using performance monitor and needs to be measured related to batch requests/sec counter.
Usually Compiles should be less than 10% of batch request and re-compiles should be less than 10% of compilations. If re-compilations are high check for :
- Any Schema changes
- Statistics change
- SET options changes in batch
- Temporary table changes
- Use of Query Hints in procedures like RECOMPILE or OPTION (RECOMPILE)