• As a common practice, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases, such as an identity column or some other column where the value is unique. In many cases, the primary key is the ideal column for a clustered index.
• Indexes should be measured on all columns that are frequently used in WHERE, ORDER BY, GROUP BY, TOP and DISTINCT clauses.
• Do not automatically add indexes on a table because it seems like the right thing to do. Only add indexes if you know that they will be used by the queries run against the table.
• For historical (static) tables, create the indexes with a FILLFACTOR and a PAD_INDEX of 100 to ensure there is no wasted space. This reduces disk I/O, helping to boost overall performance.
• Queries that return a single row are just as fast using a non-clustered index as a clustered index.
• Queries that return a range of rows are just as fast using a clustered index as a non-clustered index.
• Do not add more indexes on your OLTP tables to minimize the overhead that occurs with indexes during data modifications.
• Do not add the same index more than once on a table with different names.
• Drop all those indexes that are not used by the Query Optimizer, generally. You probably won't want to add an index to a table under the following conditions:
If the index is not used by the query optimizer. Use the Query Analyzer's "Show Execution Plan" option to see if your queries against a particular table use an index or not.
If the table is small, most likely indexes will not be used.
If the column(s) to be indexed are very wide.
If the column(s) are defined as TEXT, NTEXT or IMAGE data types.
If the table is rarely queried but insertion, updating is frequent.
• To provide up-to-date statistics, the query optimizer needs to make smart query optimization decisions. You will generally want to leave the "Auto Update Statistics" database option on. This helps to ensure that the optimizer statistics are valid, ensuring that queries are properly optimized when they are run.
• Keep the "width" of your indexes as narrow as possible. This reduces the size of the index and reduces the number of disk I/O reads required to read the index.
• If possible, try to create indexes on columns that have integer values instead of characters. Integer values use less overhead than character values.
• If you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index. If the columns used for the joins are not naturally compact, then consider adding surrogate keys to the tables that are compact in order to reduce the size of the keys. This will decrease I/O during the join process, which increases overall performance.
• When creating indexes, try to make them unique indexes if at all possible. SQL Server can often search through a unique index faster than a non-unique index. This is because, in a unique index, each row is unique and once the needed record is found, SQL Server doesn't have to look any further.
• If a particular query against a table is run infrequently and the addition of an index greatly speeds the performance of the query, but the performance of INSERTS, UPDATES and DELETES is negatively affected by the addition of the index, consider creating the index for the table for the duration of when the query is run and then dropping the index. An example of this is when monthly reports are run at the end of the month on an OLTP application.
• Avoid using FLOAT or REAL data types as primary keys, as they add unnecessary overhead that can hurt performance.
• If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:
Of the search criteria in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
If at least one of the search criteria in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.
If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.
• The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and if any of the referenced columns in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has an index.
• If you have a query that uses ORs and it is not making the best use of indexes, consider rewriting it as a UNION and then testing performance. Only through testing can you be sure that one version of your query will be faster than another.
• If you use the SOUNDEX function against a table column in a WHERE clause, the Query Optimizer will ignore any available indexes and perform a table scan.
• Queries that include either the DISTINCT or the GROUP BY clauses can be optimized by including appropriate indexes. Any of the following indexing strategies can be used:
Include a covering, non-clustered index (covering the appropriate columns) of the DISTINCT or the GROUP BY clauses.
Include a clustered index on the columns in the GROUP BY clause.
Include a clustered index on the columns found in the SELECT clause.
Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for those queries that run often.
• Avoid clustered indexes on columns that are already "covered" by non-clustered indexes. A clustered index on a column that is already "covered" is redundant. Use the clustered index for columns that can better make use of it.
• Ideally a clustered index should be based on a single column (not multiple columns) that are as narrow as possible. This not only reduces the clustered index's physical size, it also reduces the physical size of non-clustered indexes and boosts SQL Server's overall performance.
• When you create a clustered index, try to create it as a unique clustered index, not a non-unique clustered index.
• SET NOCOUNT ON at the beginning of each stored procedure you write. This statement should be included in every stored procedure, trigger, etc. that you write.
• Keep Transact-SQL transactions as short as possible within a stored procedure. This helps to reduce the number of locks, helping to speed up the overall performance of your SQL Server application.
• If you are creating a stored procedure to run in a database other than the Master database, don't use the prefix sp_ in its name. This special prefix is reserved for system stored procedures. Although using this prefix will not prevent a user defined stored procedure from working, what it can do is to slow down its execution ever so slightly.
• Before you are done with your stored procedure code, review it for any unused code, parameters or variables that you may have forgotten to remove while you were making changes and remove them. Unused code just adds unnecessary bloat to your stored procedures, although it will not necessarily negatively affect performance of the stored procedure.
• For best performance, all objects that are called within the same stored procedure should be owned by the same object owner or schema, preferably dbo, and should also be referred to in the format of object_owner.object_name or schema_owner.object_name.
• When you need to execute a string of Transact-SQL, you should use the sp_executesql stored procedure instead of the EXECUTE statement.
• If you use input parameters in your stored procedures, you should validate all of them at the beginning of your stored procedure. This way, if there is a validation problem and the client application needs to be notified of the problem, it happens before any stored procedure processing takes place, preventing wasted effort and boosting performance.
• When calling a stored procedure from your application, it is important that you call it using its qualified name, for example:
exec dbo.myProc
...instead of:
exec myProc
• If you think a stored procedure will return only a single value and not a record set, consider returning the single value as an output parameter.
• Use stored procedures instead of views. They offer better performance.
• Don't include code, variable or parameters that don't do anything.
• Don't be afraid to make broad-minded use of in-line and block comments in your Transact-SQL code. They will not affect the performance of your application and they will enhance your productivity when you have to come back to the code and try to modify it.
• If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications.
• If you have the choice of using a join or a sub-query to perform the same task within a query, generally the join is faster. This is not always the case, however, and you may want to test the query using both methods to determine which is faster for your particular application.
• If your application requires you to create temporary tables for use on a global or per connection use, consider the possibility of creating indexes for these temporary tables. While most temporary tables probably won't need -- or even use -- an index, some larger temporary tables can benefit from them. A properly designed index on a temporary table can be as great a benefit as a properly designed index on a standard database table.
• Instead of using temporary tables, consider using a derived table instead. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and often boost your application's performance.
• For better performance, if you need a temporary table in your Transact-SQL code, consider using a table variable instead of creating a conventional temporary table.
• Don't repeatedly reuse the same function to calculate the same result over and over within your Transact-SQL code.
• If you use BULK INSERT to import data into SQL Server, then use the TABLOCK hint along with it. This will prevent SQL Server from running out of locks during very large imports and will also boost performance due to the reduction of lock contention.
• Always specify the narrowest columns you can. The narrower the column, the less amount of data SQL Server has to store and the faster SQL Server is able to read and write data. In addition, if any sorts need to be performed on the column, the narrower the column, the faster the sort will be.
• If you need to store large strings of data and they are less than 8000 characters, use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drag down performance.
• Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting unnecessary space in your buffer cache.
• If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. The amount of space saved by using VARCHAR over CHAR on variable length columns can greatly reduce the I/O reads that the cache memory uses to hold data, improving overall SQL Server performance.
• If a column's data does not vary widely in length, consider using a fixed-length CHAR field instead of a VARCHAR. While it may take up a little more space to store the data, processing fixed-length columns is faster in SQL Server than processing variable-length columns.
• If you have a column that is designed to hold only numbers, use a numeric data type such as INTEGER instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value than does a character data type. This helps to reduce the size of the columns and can boost performance when the columns are searched (WHERE clause), joined to another column or sorted.
• If you use the CONVERT function to convert a value to a variable length data type such as VARCHAR, always specify the length of the variable data type. If you do not, SQL Server assumes a default length of 30. Ideally, you should specify the shortest length to accomplish the required task. This helps to reduce memory use and SQL Server resources.
• Avoid using the new BIGINT data type unless you really need its additional storage capacity. The BIGINT data type uses 8 bytes of memory, versus 4 bytes for the INT data type.
• Don't use the DATETIME data type as a primary key. From a performance perspective, it is more efficient to use a data type that uses less space. For example, the DATETIME data type uses 8 bytes of space, while the INT data type only takes up 4 bytes. The less space used, the smaller the table and index, and the less I/O overhead that is required to access the primary key.
• If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data much faster than character data.
• Carefully evaluate whether your SELECT query needs the DISTINCT clause or not. Some developers automatically add this clause to every one of their SELECT statements, even when it is not necessary. This is a bad habit that should be stopped.
• When you need to use SELECT INTO option, keep in mind that it can lock system tables, preventing other users from accessing the data they need while the data is being inserted. In order to prevent or minimize the problems caused by locked tables, try to schedule the use of SELECT INTO when your SQL Server is less busy. In addition, try to keep the amount of data inserted to a minimum. In some cases, it may be better to perform several smaller SELECT INTOs instead of performing one large SELECT INTO.
• If you need to verify the existence of a record in a table, don't use SELECT COUNT (*) in your Transact-SQL code to identify it. This is very inefficient and wastes server resources. Instead, use the Transact-SQL IF EXISTS to determine if the record in question exists, which is much more efficient.
• By default, some developers -- especially those who have not worked with SQL Server before -- routinely include code similar to this in their WHERE clauses when they make string comparisons:
SELECT column_name FROM table_name
WHERE LOWER (column_name) = 'name'
In other words, these developers are making the assumption that the data in SQL Server is case-sensitive, which it generally is not. If your SQL Server database is not configured to be case sensitive, you don't need to use LOWER or UPPER to force the case of text to be equal for a comparison to be performed. Just leave these functions out of your code. This will speed up the performance of your query, as any use of text functions in a WHERE clause hurts performance.
However, what if your database has been configured to be case-sensitive? Should you then use the LOWER and UPPER functions to ensure that comparisons are properly compared? No. The above example is still poor coding. If you have to deal with ensuring case is consistent for proper comparisons, use the technique described below, along with appropriate indexes on the column in question:
SELECT column_name FROM table_name
WHERE column_name = 'NAME' or column_name = 'name'
This code will run much faster than the first example.
• If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, instead try to use one of the following options, all of which offer better performance:
Use EXISTS or NOT EXISTS
Use IN
Perform a LEFT OUTER JOIN and check for a NULL condition
• When you have a choice of using the IN or the EXISTS clause in your Transact-SQL, you will generally want to use the EXISTS clause, as it is usually more efficient and performs faster.
• If you find that SQL Server uses a TABLE SCAN instead of an INDEX SEEK when you use an IN/OR clause as part of your WHERE clause, even when those columns are covered by an index, consider using an index hint to force the Query Optimizer to use the index.
• If you use LIKE in your WHERE clause, try to use one or more leading characters in the clause, if possible. For example, use:
LIKE 'm%' instead of LIKE '%m'
• If your application needs to retrieve summary data often, but you don't want to have the overhead of calculating it on the fly every time it is needed, consider using a trigger that updates summary values after each transaction into a summary table.
• When you have a choice of using the IN or the BETWEEN clauses in your Transact-SQL, you will generally want to use the BETWEEN clause, as it is much more efficient. For example...
SELECT task_id, task_name
FROM tasks
WHERE task_id in (1000, 1001, 1002, 1003, 1004)
...is much less efficient than this:
SELECT task_id, task_name
FROM tasks
WHERE task_id BETWEEN 1000 and 1004
• If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on how it is constructed, using the SUBSTRING function can force a table scan instead of allowing the optimizer to use an index (assuming there is one). If the substring you are searching for does not include the first character of the column you are searching for, then a table scan is performed.
• If possible, you should avoid using the SUBSTRING function and use the LIKE condition instead for better performance. Instead of doing this:
WHERE SUBSTRING(task_name,1,1) = 'b'
Try using this instead:
WHERE task_name LIKE 'b%'
• Avoid using optimizer hints in your WHERE clauses. This is because it is generally very hard to out-guess the Query Optimizer. Optimizer hints are special keywords that you include with your query to force how the Query Optimizer runs. If you decide to include a hint in a query, this forces the Query Optimizer to become static, preventing the Query Optimizer from dynamically adapting to the current environment for the given query. More often than not, this hurts -- not helps -- performance.
• If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written. This assumes that no parentheses have been used to change the order of execution. Because of this, you may want to consider one of the following when using AND:
Locate the least likely true AND expression first.
If both parts of an AND expression are equally likely of being false, put the least complex AND expression first.
You may want to consider using Query Analyzer or Management Studio to look at the execution plans of your queries to see which is best for your situation
• Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, perhaps it may be more efficient to sort the data at the client than at the server.
• Whenever SQL Server has to perform a sorting operation, additional resources have to be used to perform this task. Sorting often occurs when any of the following Transact-SQL statements are executed:
ORDER BY
GROUP BY
SELECT DISTINCT
UNION
• If you have to sort by a particular column often, consider making that column a clustered index. This is because the data is already presorted for you and SQL Server is smart enough not to resort the data.
• If your WHERE clause includes an IN operator along with a list of values to be tested in the query, order the list of values so that the most frequently found ones are placed at the start of the list and the less frequently found ones are placed at the end of the list. This can speed up performance because the IN option returns true as soon as any of the values in the list produce a match. The sooner the match is made, the faster the query completes.
• If your application performs many wildcard (LIKE %) text searches on CHAR or VARCHAR columns, consider using SQL Server's full-text search option. The Search Service can significantly speed up wildcard searches of text stored in a database.
• The GROUP BY clause can be used with or without an aggregate function. However, if you want optimum performance, don't use the GROUP BY clause without an aggregate function. This is because you can accomplish the same end result by using the DISTINCT option instead, and it is faster. For example, you could write your query two different ways:
SELECT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20
GROUP BY OrderID
...or:
SELECT DISTINCT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20
• It is important to design applications that keep transactions as short as possible. This reduces locking and increases application concurrently, which helps to boost performance.
• In order to reduce network traffic between the client or middle-tier and SQL Server -- and also to boost your SQL Server-based application's performance -- only the data needed by the client or middle-tier should be returned by SQL Server. In other words, don't return more data (both rows and columns) from SQL Server than you need to the client or middle-tier and then further reduce the data to the data you really need at the client or middle-tier. This wastes SQL Server resources and network bandwidth.
• To make complex queries easier to analyze, consider breaking them down into their smaller constituent parts. One way to do this is to simply create lists of the key components of the query, such as:
List all of the columns that are to be returned
List all of the columns that are used in the WHERE clause
List all of the columns used in the JOINs (if applicable)
List all the tables used in JOINs (if applicable)
Once you have the above information organized into this easy-to-comprehend form, it is much easier to identify those columns that could potentially make use of indexes when executed.
No comments:
Post a Comment