WHITE PAPER

“Tuna Helper” Tuning SQL Statements: A Proven Process for SQL Server Environments

November 2009

Confio Software

www.confio.com

+1-303-938-8282

1. Introduction

Many who undertake SQL tuning projects look for a magic bullet in the form of a software tool. While those tools can help tune simple SQL statements, complex queries that contain sub-queries, outer joins, and the like present challenges for them. I have used a variety of tools and have been less than impressed. If tools are less than adequate, how do we tune SQL statements and applications effectively and efficiently? Instead of relying on tools that do not work well, I suggest learning how to perform SQL tuning the old-fashioned way and doing it yourself. Too many people I cross paths with do not understand even the basics of SQL tuning, so the fundamental goal of this paper is to provide a process that will help you get started down the tuning path. Many people fail when it comes to tuning, so if you can become good at it, your image and worth to your company will undoubtedly rise.

The reason most tools do not work, and the reason people fail at SQL tuning projects, is quite simple: it is difficult. Many people make their living training others to perform SQL tuning, and those classes are usually 3-5 days long. This paper is by no means going to make you an expert, but hopefully it presents a framework you can use to become a better SQL tuner and from which you can build your own process. Becoming an expert SQL tuner takes time and practice, so if you are intimidated by the thought of undertaking tuning projects, roll up your sleeves and jump in. You will never learn by sitting on the sidelines and doing the same old things.

However, when beginning the project, make sure you are not standing on an island by yourself. If you are a DBA, you most likely did not write the application or the SQL statements that are performing poorly. If you are a developer, you may not have a view into the production instances to see exactly which SQL statement is performing poorly. One of the biggest mistakes I see is DBAs or developers working in isolation when tuning. It is their nature: "I'm smart and I can do this by myself," or possibly "I'm a recluse and I don't want to deal with anyone else." It is usually better to include other technical people who are familiar with the application. If the application was custom built by your company, work with the people who develop, support, or design it. Work with the business people to understand what they do in the application when the performance problem occurs. All of these groups are typically more than willing to help because they will benefit from a well-tuned application.

2. Challenges

There are many challenges you will face when undertaking a tuning project. I already said this, but it is worth saying again: SQL tuning is difficult. To do it correctly, you and your team need to be very familiar with many aspects of the application. From a technical standpoint, you need to understand execution plans, the way the database instance is executing the SQL, and how the data is being accessed. You also need to be familiar with SQL design concepts, because sometimes tuning the SQL means rewriting it. When you are working with the end users, understand how the application is used and why they do things. Why do you fill in those fields? Why do you use this screen? Understanding the purpose of the SQL and the application will help you make better decisions down the road.

• Large Number of SQL Statements – Typical applications contain thousands of SQL statements, so how do you know for sure you are working on the right one? I will talk about this in more detail, but this is also where the end users can help you focus on the correct things. Instead of worrying about hundreds or thousands of SQL statements, worry about the ones that affect the user, the screen, and the application they are complaining about.
• All Statements are Different – Just because you solved the last problem in 30 minutes by tuning a certain way does not mean the next project is that easy or can be tuned the same way.
• Lack of Priority – Some companies simply do not care about performance. As long as the application gives the correct results, they do not seem to care how long it takes to get them. In this case, some companies will throw hardware at a problem instead of tuning the handful of SQL statements that may be causing a significant part of it.
• Indifference – Some users get used to the way things work: "I always press this button first thing in the morning and then go get coffee, because I know it takes an hour." Bad performance becomes a way of life.
• Never Ending Task – There always seems to be a next problem. Once you tune something and become good at it, other groups in your company will want their applications tuned as well, but this is a good thing for you.

3. SQL Tuning Process Summary

Working with many customers and our Ignite for SQL Server product, I have developed a process that works very well for me. That does not mean it will work for you as is, but I believe it is a good starting point for anyone. The process centers around four main steps:

1. Identify – pick the correct SQL statement to tune and avoid wasting your time.
2. Gather – gather the proper information that will help you make the best tuning decisions.
3. Tune – tune the SQL statement based on the gathered information.
4. Monitor – ensure the SQL statement is tuned and stays tuned. Monitoring also helps you understand the exact benefits achieved, and it helps you identify the next tuning project, which starts the process over again.

4. Identify

The Identify phase consists of the following sub-phases, discussed below:
4.1 – Tune the Correct SQL Statement
4.2 – End to End View
4.3 – Wait Time View
4.4 – Simplification

4.1 Identify – Tune the Correct SQL Statement

When you get back to the office and want to tune something, where do you start? Avoid the mistake of picking a SQL statement that merely looks interesting, although that can be valid for practice. Have a method for choosing the SQL. The statement can come from a variety of sources:

• A discussion with users and an understanding of their complaints.
• A batch job that runs longer and longer.
• The number one SQL statement in the database from a logical I/O (LIO), physical I/O (PIO), or CPU-used perspective.
• An application that performs many table or index scans, from which you determine the top SQL statements affected by the problem.
• A known poorly performing statement that comes from someone else asking for your help. Be careful: the top SQL statement for someone else may not be the one affecting your end users the most, and you could end up tuning the wrong statement.
• Statements with the highest wait times or response times.

Ranking SQL statements by LIO, PIO, or CPU leaves out statements suffering from locking or latching problems. Measuring performance via wait time techniques gives you a list of the SQL statements affecting your end users the most, no matter what problem they suffer from.
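As a sketch of the ranking approaches described above, the cached-plan statistics DMV (available in SQL Server 2005 and later) can rank statements by total elapsed time; the DMV and column names below are real, but the TOP count and the choice to rank by elapsed time rather than LIO, PIO, or CPU are arbitrary choices for illustration:

```sql
-- Illustrative only: rank cached statements by total elapsed time.
-- Swap the ORDER BY column for total_logical_reads (LIO),
-- total_physical_reads (PIO), or total_worker_time (CPU) as needed.
SELECT TOP 10
       qs.total_elapsed_time / 1000 AS total_elapsed_ms,
       qs.execution_count,
       qs.total_logical_reads,      -- LIO
       qs.total_physical_reads,     -- PIO
       qs.total_worker_time / 1000  AS total_cpu_ms,
       qs.sql_handle
FROM   sys.dm_exec_query_stats AS qs
ORDER BY qs.total_elapsed_time DESC;
```

Note that this view only covers plans still in cache, which is one reason the wait-time techniques discussed next are a useful complement.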


4.2 Identify – End to End View

When I mention an end-to-end view, you may think of the performance of the application from the web browser or client application, through the application server, and down to the database. That is technically correct, but from a business perspective you should also know the SQL and the application end to end. I encourage you to understand detailed information about the SQL statement:

• What is the business requirement for this SQL statement, i.e., what is its purpose?
• Why does the business need to know this information? What is done with it?
• How often is this data needed?
• Who consumes or uses the information?

From a technical perspective, where does the poorly performing process spend its time? Understand this in detail, because it helps everyone understand who can have the most impact on performance. In the example chart below, I list the processes that need to be tuned and create bar charts of the time spent at each layer of the application. If most of the time is spent at the application tier, the application code is most likely responsible, not the database. Even if you tuned the worst performing SQL statement for that process, you might not make much of a perceived difference if only 10% of the total time was spent in the database. However, if most of the time is spent at the database tier, SQL statement tuning (or instance tuning, which is outside the scope of this paper) would provide the most benefit. To get this type of information you will probably need a tool of some sort, but you can also get it via debugging, logging execution times to files, and instrumenting code, among other methods.

Since this paper is about SQL statement tuning, let's assume the problem is in the database tier. The next step is to determine the specific SQL statements that execute as part of this process. If there are three SQL statements, measure the time for each one of them to understand the specific statement to tune. If one SQL statement takes 90% of the time at the database tier, tune that one. If there are several statements that are very similar, you will need to tune more than one to make an impact on performance. Tracing or using Confio Ignite for SQL Server is a great way to get this information.
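If you do not have a tracing tool at hand, one rough way to time an individual statement is the session-level statistics settings in SSMS; keep in mind the timings reflect the SSMS environment, which (as discussed later) can differ from the application's:

```sql
-- Rough per-statement timing from an SSMS session (illustrative only).
SET STATISTICS TIME ON;   -- reports parse/compile and execution CPU/elapsed time
SET STATISTICS IO ON;     -- reports logical/physical reads per table

-- ... run the suspect statement here ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```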

4.3 Identify – Database Wait Time View

I have already mentioned that wait time information is critical to successfully tuning SQL statements. SQL Server is instrumented to report where executing statements spend their time. If a statement runs for 3 minutes, where did it spend those 3 minutes? This is where the sysprocesses table (and the DMVs in SQL 2005 and above) gives you clues. You can also get this information from tracing, which is a great way to get detailed information about a problem. A detailed explanation of wait types and tracing is outside the scope of this paper, but I encourage you to understand those topics very well. For more information about wait types in SQL Server, here are good references:

• SQL 2000 – http://support.microsoft.com/kb/822101
• SQL 2005 and above – http://msdn.microsoft.com/en-us/library/ms188754.aspx

Speaking of wait time, here is a quick test. Which of these scenarios is worse?

• SQL Statement 1
  o executed 1,000 times
  o made the end users wait for 10 minutes
  o waited 99% of the time on PAGEIOLATCH_SH (a wait type indicating PIO)
• SQL Statement 2
  o executed 1 time
  o made the end user wait for 10 minutes
  o waited 99% of its time on a locking problem

This question is answered differently every time I give this presentation. Some think locking problems are more severe because they tend to escalate very quickly. Others think a SQL statement executing 1,000 times and doing a lot of disk I/O is just as bad. I think the answer is that both are equal, because each made the end users wait for 10 minutes. End users do not care what they wait for, only that it took 10 minutes to get their job done. SQL statement 2 may be harder to tune, because locking problems are typically application design issues, but both are equal in the eyes of the user.
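For a point-in-time look at what active sessions are waiting on, a sketch against the requests DMV (SQL Server 2005 and later) might look like this; the filter on session_id is a common rule of thumb for excluding system sessions, not a hard guarantee:

```sql
-- What is each active request waiting on right now?
SELECT r.session_id,
       r.status,
       r.wait_type,            -- e.g. PAGEIOLATCH_SH, LCK_M_X
       r.wait_time,            -- ms spent in the current wait
       r.blocking_session_id   -- nonzero when blocked by another session
FROM   sys.dm_exec_requests AS r
WHERE  r.session_id > 50;      -- rule of thumb: skip system sessions
```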

4.4 Identify – Simplification

So far we have identified something to work on. The SQL statements involved are typically not of the form "select * from table1 where col1 = X"; they are usually much more complicated. They may be pages long and include sub-queries, outer joins, and all sorts of other constructs. I like to simplify the SQL and break it down into manageable pieces that can be more easily comprehended. A complex SQL statement may become five smaller, simpler SQL components. Tune each of these separately. If views are used, get their definitions and tune the views separately as well. I will talk more about this when we dive into execution plans: execution plans help you determine which part of the SQL is performing poorly. In other words, you typically do not need to worry about the entire SQL statement, only the portion that is performing poorly.

4.5 Gather – SQL Statement Metrics

The next phase is to gather critical information and metrics about the SQL statement. These metrics should include the following:

• How long does the statement take now?
• What is acceptable to the end users? If they want the query to return in 10 seconds, and the query is complex and reads a lot of data, you may have to stop the tuning project immediately, because you will not satisfy expectations.
• Collect wait time information, because all performance problems are not created equal.
  o If you find a query waiting on locking/blocking wait events, you know that reducing the end user wait time is not necessarily about tuning the SQL statement itself. It could mean tuning other SQL statements in the transaction that cause resources to be locked longer than needed. You should become very good friends with the developers and the people who designed the application.
  o If you find a statement waiting on physical I/O, e.g., any wait type that starts with PAGEIOLATCH, this also helps you get to the next step. Significant waits on these wait types are often related to table or index scans. Tuning in this case is often accomplished by adding or modifying indexes.
  o If you have latch contention, you have to determine the exact latch causing the bottleneck and understand why it is so popular. Typically, SQL tuning is the answer.
  o If you see network waits, it is rarely a network problem. Is the statement returning a lot of data, either as a lot of rows or as large columns like LOBs? On rare occasions there is real network latency between the database and the client; in that case, make friends with your network administrator.
  o Quite likely there will be multiple problems. There could be a table scan on Table1 and an inefficient index used to access Table2 that is also causing latching problems.

If a query executes for 3 minutes, understand what makes up those 3 minutes from a wait time view, e.g., it waits 2:30 on PAGEIOLATCH_SH, 15 seconds on LATCH_SH, and spends 15 seconds on CPU (service time).
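To see which wait types dominate at the instance level, accumulated since the last restart (or since the counters were cleared), a sketch might be:

```sql
-- Top accumulated waits instance-wide. Note this list includes benign
-- background waits that you would normally filter out before drawing conclusions.
SELECT TOP 10
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms   -- time spent waiting for CPU after the resource freed up
FROM   sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```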

5. Gather

The Gather phase consists of the following sub-phases, discussed below:
5.1 – SQL Execution Plan
5.2 – Table and Index Information
5.3 – Entity Relationship Diagram (ERD)

5.1 Gather – SQL Execution Plan

Once you have broken the query down, you need to understand how each component behaves. This is where execution plans help, supplying costing information, data access paths, join operations, and many other things. However, not all plans are the same. Did you know the "Estimated Plan" or "Actual Plan" from SQL Server Management Studio (SSMS) could be wrong and not match how SQL Server is really executing the statement? How can the plan from SSMS be wrong? There are several reasons:

1. The "estimated plan" can be wrong because SSMS is only providing an estimate of how SQL Server would execute the SQL statement.
2. The "actual plan" can also be wrong because it is typically generated in a completely different environment in SSMS, not from the application code and environment. The application may use specific session settings that the SSMS environment does not, and those settings can drastically affect SQL execution.


Since the plan could differ for many reasons, what should we do? There are ways to gather an execution plan that will be correct:

• DM_EXEC_QUERY_PLAN – contains the plans of executed SQL statements. It provides the exact plan SQL Server used, so why not go straight to the source?
• Tracing – gives all sorts of great information, as well as execution plans for SQL statements.
• Historical Data – if you can, collect and save execution plan information so you can go back a week and understand why a SQL statement started performing poorly. Plan changes are very commonly associated with SQL statements that suddenly start performing poorly.

An example query to gather information from DM_EXEC_QUERY_PLAN is shown below. It finds the top five SQL statements based on average execution time (aka average response time) and retrieves the execution plan for each. If run from SSMS, the query_plan column will contain a link to a graphical plan.

SELECT TOP 5
       total_elapsed_time / execution_count AS [Avg Response Time],
       sql_handle,
       statement_start_offset,
       statement_end_offset,
       plan_handle,
       query_plan
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)
ORDER BY total_elapsed_time / execution_count DESC;
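Because sys.dm_exec_query_stats tracks individual statements within a batch, the statement_start_offset and statement_end_offset columns returned above can be combined with sys.dm_exec_sql_text to pull out the exact statement text. The SUBSTRING arithmetic below is the standard pattern: the offsets are in bytes against Unicode text (hence the division by 2), and -1 means "to the end of the batch":

```sql
-- Extract the individual statement text for each cached plan entry.
SELECT SUBSTRING(st.text,
                 (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text,
       qs.execution_count
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st;
```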

Once you have an execution plan, determine how each of the SQL components is being executed. Based on the wait time data, you should already have a feel for whether your problem is table or index scans (waits on PAGEIOLATCH_*), locking (LCK_M_* waits), or something else. Reading execution plans is outside the scope of this document, but a few tips are:

• Look for expensive steps.
• Look for large arrows that represent large intermediate result sets.
• Look for table scans or index scans, which mean the entire object is being read.

5.2 Gather – Table and Index Statistics

The next step is to gather information about each table being accessed inefficiently. These tables come from a review of the execution plan, where they appear in the most expensive steps. There is no use gathering data for objects that are already being accessed efficiently. Confio has a script on its support site (http://support.confio.com/kb/1534) that helps with this process and gathers information such as:

1. Table sizes – a full table scan on a 20-row table is better than using an index even if one existed, so understand whether the tables involved are small, medium, large, or very large.
2. Existing indexes – get a list of all indexes that already exist on these tables.
3. The selectivity or cardinality of the columns contained in the WHERE clause. I will discuss this in much more detail in the SQL Diagramming section below.
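I will not reproduce the Confio script here, but the catalog views can supply the same kind of index inventory. Here is a sketch for the paper's sample REGISTRATION table; the schema-qualified table name is the only assumption:

```sql
-- List existing indexes, their key order, and included columns for one table.
SELECT i.name               AS index_name,
       i.type_desc,                       -- CLUSTERED / NONCLUSTERED
       ic.key_ordinal,                    -- 0 for included columns
       c.name               AS column_name,
       ic.is_included_column
FROM   sys.indexes       AS i
JOIN   sys.index_columns AS ic ON ic.object_id = i.object_id
                              AND ic.index_id  = i.index_id
JOIN   sys.columns       AS c  ON c.object_id  = ic.object_id
                              AND c.column_id  = ic.column_id
WHERE  i.object_id = OBJECT_ID('dbo.registration')   -- assumed schema/table
ORDER BY i.name, ic.key_ordinal;
```

For table sizes, sp_spaceused 'registration' gives a quick row count and space summary.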

5.3 Gather – Entity Relationship Diagram

Understand the relationships of the tables involved in the statement. The sample SQL statement we will tune in this paper is shown below and answers the question: "Who registered for the SQL Tuning class within the last day?"

SELECT s.fname, s.lname, r.signup_date
FROM   student s
INNER JOIN registration r ON s.student_id = r.student_id
INNER JOIN class c        ON r.class_id   = c.class_id
WHERE  c.name = 'SQL TUNING'
AND    r.signup_date BETWEEN DATEADD(day, -1, current_timestamp) AND current_timestamp
AND    r.cancelled = 'N'

The ERD for this query is very simple and restricts the universe to the objects the query accesses. If the statement only accesses three tables, do not review a full ERD containing hundreds or thousands of tables; only review the relationships of the three tables. You will be surprised how often you can find mistakes in the SQL statement by reviewing the ERD.

CLASS: class_id, name, class_level, …
REGISTRATION: class_id, student_id, signup_date, cancelled, …
STUDENT: student_id, fname, lname, …
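If foreign key constraints were actually declared in the sample schema (an assumption; the paper does not say), the catalog can confirm the relationships the ERD shows:

```sql
-- Declared relationships among the three sample tables.
SELECT fk.name                              AS fk_name,
       OBJECT_NAME(fk.parent_object_id)     AS child_table,
       OBJECT_NAME(fk.referenced_object_id) AS parent_table
FROM   sys.foreign_keys AS fk
WHERE  OBJECT_NAME(fk.parent_object_id)     IN ('registration', 'student', 'class')
   OR  OBJECT_NAME(fk.referenced_object_id) IN ('registration', 'student', 'class');
```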

6. Tune

In this section we will tune the sample SQL statement. The Tune phase consists of the following sub-phases:
6.1 – Review Execution Plan
6.2 – Create a SQL Diagram
6.3 – Analyze the SQL Diagram

6.1 Tune – Review Execution Plan

The execution plan retrieved from SSMS for our sample SQL statement is shown below. Notice that SSMS does a nice thing and recommends an index. I will come back to this at the end of the tuning section and explain why it is not such a nice thing after all.

When reviewing the plan, read it from right to left: the steps on the right are the first to execute. In the example above, the following occurs:


• A full index scan is done against the CLASS table using the CL_PK index.
• A full index scan is done against the REGISTRATION table using the REG_PK index. Since the bar between here and the Hash Match step is thicker than the one from CLASS, this tells you more rows are involved. Also, the relative cost of this step is 62% of the entire process.
• A hash join is done to merge the results from CLASS and REGISTRATION, which is 28% of the total. This is the second most expensive step, probably because it has to join a lot of rows from the REGISTRATION table.
• An Index Seek is done to retrieve rows from the STUDENT table via the STUD_PK index.
• The result from the STUDENT table and the Hash Match step are merged using Nested Loops.
• The final result is returned to the SELECT statement.

As mentioned, focus on the expensive steps toward the right side of the diagram and try to understand what is happening. Mousing over the graphical plan provides more information about each step, and mousing over the arrows provides information about the number of rows shipped between steps. In this case, why is the query performing an inefficient index scan on the REGISTRATION and CLASS tables?

6.2 Tune – Create the SQL Diagram

I would like to introduce a technique I use to tune SQL statements correctly, without the trial and error I was often faced with before. The concept of SQL diagramming was introduced to me through Dan Tow's book SQL Tuning. For a complete understanding of this topic I encourage you to buy the book and get more information from http://www.singingsql.com. For our purposes, I will explain the basics using the following SQL diagram:

          registration   .04
           /         \
          v           v
      student       class   .002

The diagram looks a lot like an ERD. To build it:

• Start with any table in the FROM clause and put it on paper (the easiest way to draw these diagrams is by hand). In my case I started with the first table in the FROM clause, STUDENT.
• Take the next table, in my case REGISTRATION, and place it above or below the existing tables based on the relationship. Since REGISTRATION relates to STUDENT uniquely, i.e., one row in REGISTRATION points to one row in STUDENT, I put it above and draw an arrow pointing downward.
• Take the next table, CLASS. One row in REGISTRATION points to one row in CLASS, so I put CLASS below REGISTRATION with another downward-pointing arrow.

The next step is to further understand the criteria used in the query to limit the number of rows from each table. The first criteria I will explore is anything in the WHERE clause that limits the rows from REGISTRATION:



r.signup_date BETWEEN DATEADD(day, -1, current_timestamp) AND current_timestamp AND r.cancelled = 'N'

To understand the selectivity of these criteria, run a query against the table using the same predicates:

SELECT COUNT(*)
FROM   registration r
WHERE  r.signup_date BETWEEN DATEADD(day, -1, current_timestamp) AND current_timestamp
AND    r.cancelled = 'N'

Results: 3,562 / 80,000 (total rows in REGISTRATION) = 0.0445 = 4.45% selectivity. Write this number on the diagram next to the REGISTRATION table and underline it (underlining makes it easier to spot in the next step). It represents the selectivity of the criteria against that table. The next table to review in the same manner is CLASS. The query becomes:

SELECT COUNT(1)
FROM   class
WHERE  name = 'SQL TUNING'

Results: 2 / 1,000 (total rows in CLASS) = 0.002 = 0.2% selectivity. Add this to the diagram next to the CLASS table. The next table to explore is STUDENT, but there are no direct criteria against it, so it gets no underlined number. The diagram is now complete for our example, although Dan Tow goes into more detail than I will include here.
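The count query and the total-row count can also be collapsed into a single pass per table. For REGISTRATION, a sketch using the same predicate as the sample query is:

```sql
-- Selectivity of the REGISTRATION predicate in one scan.
-- CAST to float avoids integer division truncating the ratio to 0.
SELECT CAST(SUM(CASE WHEN r.signup_date BETWEEN DATEADD(day, -1, current_timestamp)
                                            AND current_timestamp
                      AND r.cancelled = 'N'
                     THEN 1 ELSE 0 END) AS float)
       / COUNT(*) AS selectivity       -- about 0.04 with the paper's data
FROM   registration AS r;
```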

6.3 Tune – Analyze the SQL Diagram

Analyzing the SQL diagram begins by looking for the smallest underlined number, in our case the 0.002 next to the CLASS table. To limit the number of rows the query needs to process, starting there trims our result sets the soonest. In other words, if we can make SQL Server start with this table, the query can be executed optimally. I will leave it to Dan Tow to prove this. We know we want SQL Server to hit the CLASS table first, but how do we do that? The easiest way is to ensure an index exists on our criteria, in this case the NAME column of the CLASS table. No such index currently exists, so let's create it and see what the new execution plan looks like:

CREATE INDEX cl_name ON class (name)


The query now executes in 20 seconds (a 90% improvement from 200 seconds) and the plan has changed. The CL_NAME index is now used to find the two rows in the CLASS table with NAME = 'SQL TUNING', which is far better than performing an index scan. We achieved partial results, but an index scan is still done against the REGISTRATION table. Why would that be? When we review the indexes that exist on REGISTRATION, we find one index, REG_PK, that contains the STUDENT_ID and CLASS_ID columns in that order. Since our query hits the CLASS table first, it cannot use an index whose leading column is STUDENT_ID, so it must perform a full index scan. This index could be modified to switch the columns around, but that could affect many other queries and would require significant testing. In this case, since I know I need an index with a leading edge of CLASS_ID, I created an index on just that column:

CREATE INDEX reg_alt ON registration (class_id)

The query now executes in 3 seconds, so we are making significant progress. An Index Seek is done against both the CLASS and REGISTRATION tables, which is good. However, something new has shown up: a Key Lookup against REGISTRATION that is responsible for 93% of the total execution time. Remember that our criteria against the REGISTRATION table also include the CANCELLED and SIGNUP_DATE columns, so SQL Server has to go back to the table to do further filtering. An easy way to avoid this step is to "include" those columns in the index:


CREATE INDEX reg_alt ON registration (class_id) INCLUDE (signup_date, cancelled)

The query now takes 1.8 seconds to execute and the plan looks good. There may be ways to squeeze out a little more performance, but an execution time of 1.8 seconds is good enough at this point. Remember the index that SSMS suggested? Let's review its performance:

CREATE INDEX reg_can ON registration (cancelled, signup_date) INCLUDE (class_id, student_id)

The plan looks good in the sense that Index Seeks are used against all tables. However, did you notice the thick arrow from REGISTRATION to the Hash Match step? Based on our earlier queries, we know that the CANCELLED and SIGNUP_DATE criteria retrieve 3,562 rows, instead of just a handful when we hit the CLASS table first. This is a great example of why you should not let tools perform SQL tuning for you. While the suggested index did reduce the overall query execution time from 1:30 to 8 seconds, there was a better option, and SQL diagramming found it for us very quickly.


7. Monitor

The final step is to monitor the results. Even with very good testing, you will sometimes find that what worked in the test database does not work in production. Be sure to monitor the results when your end users begin using the newly tuned statement. Collect all the metrics again and understand how quickly the statement executes, what it waits on, and so on. Document the improvements to show the success of the tuning project. Monitoring is also important because it is the start of the next tuning opportunity. Tuning is iterative, so now go after the next project.

About the Author

Dean Richards, Senior DBA, Confio Software. Dean Richards has over 20 years of performance tuning, implementation, and strategic database architecting experience. Before coming to Confio, Dean was a technical director for Oracle Corporation, managing technical aspects of key accounts, including short- and long-term technical planning and strategic alliances. Dean has focused his entire career on performance tuning of Oracle and SQL Server environments. As a highly successful liaison between management and technical staff, Dean has proven to be an effective collaborator implementing cutting-edge solutions.

About Confio Software

Confio Software develops performance management solutions for SQL Server, Oracle, DB2, and Sybase database environments. Confio Ignite PI, which applies business intelligence analysis to IT operations, improves service levels and reduces costs for database and application infrastructure. The Confio Igniter Suite PI is an open, multi-vendor, agentless monitoring solution that gives DBAs and management the ability to detect problems, analyze trends, and resolve bottlenecks impacting database response time. Built on an industry best-practice Wait-Time methodology, Confio's Igniter™ Suite improves service levels for IT end-users and reduces the total cost of operating IT infrastructure. Confio Software products are used today by customers in North America, Europe, South America, Africa, and Asia whose mission includes getting the most value out of their business-critical IT systems. For more detailed information about Confio, e-mail us at [email protected], telephone us at 1.303.938.8282, or see us on the web at http://www.confio.com.

Confio Software Boulder, Colorado, USA (303) 938-8282 [email protected] www.confio.com
