The Path to Improving Database Performance
Databases are the guts of an application; without them, you’re left with just skins and skeletons, which aren’t as useful on their own. The overall performance of any app therefore depends largely on database performance. Dozens of factors affect that performance, including how indexes are used, how queries are structured, and how data is modeled.
Consequently, making minor adjustments to any of these elements can have a large impact. This database performance tuning guide will cover the basics of building and sustaining databases that maximize your system resources for optimal efficiency. The examples will assume you’re using SQL Server, but the general advice is applicable to any database management system.
Improve your indexing strategies
Indexing is too often overlooked during the development process, but a good indexing strategy can be your best tool for tuning your database. Indexes are data structures that allow you to quickly select and sort the rows in a database table: they speed up data retrieval by supporting fast random lookups and ordered access to records. Setting up indexes usually doesn’t require much coding, but it does take a bit of thought.
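As a minimal sketch of what this looks like in SQL Server (the Orders table and its columns are hypothetical), a nonclustered index can support a common lookup directly:

```sql
-- Hypothetical Orders table; the index supports lookups and sorts by CustomerID.
-- INCLUDE adds OrderDate to the index leaf level so the query below can be
-- answered from the index alone, without a key lookup into the table.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate);

-- This query can now be satisfied by a seek on the index alone.
SELECT CustomerID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;
```

The `INCLUDE` clause is a design choice worth noting: it covers the query without bloating the index key, which keeps inserts and updates cheaper than adding every column to the key itself.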
Using indexes can sometimes be counterproductive. For example, if your tables are frequently hit by INSERT, UPDATE, or DELETE statements, then your overall performance could drop because every index must be modified after those operations.
When you need to perform a single, large batch insert, temporarily dropping or disabling your indexes can accelerate the process; however, keep in mind that doing so will affect all queries running against the affected tables, so you should only do this when you have more than a million rows of data to insert. Don’t forget to recreate your indexes when you’re done.
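In SQL Server, one way to sketch this (index and table names are hypothetical) is to disable the nonclustered index before the load and rebuild it afterward, rather than dropping and recreating it by hand:

```sql
-- Disable the nonclustered index before a large batch insert.
-- (Never disable the clustered index: that makes the table unreadable.)
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders DISABLE;

-- ... perform the bulk insert here, e.g. BULK INSERT or INSERT ... SELECT ...

-- REBUILD re-enables the index and recreates it from scratch in one step,
-- preserving the original index definition.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;
```

Disabling keeps the index definition in the catalog, so there is no risk of recreating it with the wrong columns afterward.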
Periodically check for missing or duplicated indexes. Duplicates are generally safe to delete, but be extra careful when removing indexes you merely believe are unused: make sure you have a thorough understanding of the system and its workloads before making any changes.
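SQL Server’s index usage statistics can help with that review. A sketch of such a check (remember that these counters reset on every server restart, so the readings must cover a representative stretch of workload before you act on them):

```sql
-- Indexes with many writes but few or no reads since the last restart
-- are candidates for review -- not for automatic deletion.
SELECT OBJECT_NAME(s.object_id) AS TableName,
       i.name                   AS IndexName,
       s.user_seeks + s.user_scans + s.user_lookups AS Reads,
       s.user_updates           AS Writes
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY Reads ASC, Writes DESC;
```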
Steer clear of correlated subqueries and coding loops
Correlated subqueries depend on values from a parent query. Newer developers love using them because subqueries make coding more convenient. Unfortunately, structuring queries in this fashion can drag down performance. Take the following example:
SELECT c.Name, c.Age, (SELECT StudentName FROM Student WHERE ID = c.StudentID) AS StudentName FROM Teacher c
In the above code, the inner query is being run for every row processed by the outer query, which is unnecessary. You could restructure this example using a join:
SELECT c.Name, c.Age, co.StudentName FROM Teacher c LEFT JOIN Student co ON c.StudentID = co.StudentID
Likewise, you should avoid coding loops wherever you can. In SQL Server, looping is done with cursors. Cursors may make your code look nice and tidy, but they can also bombard your database with thousands of unnecessary requests. Instead of adding cursors, use well-written set-based SQL statements when possible. If you must add cursors, only use them for jobs that are scheduled to run during off-peak hours.
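To make the cursor-versus-set-based contrast concrete, here is a sketch using the article’s Student table (the Credits column is hypothetical):

```sql
-- Cursor version: one UPDATE round trip per row.
DECLARE @ID int;
DECLARE student_cursor CURSOR FOR SELECT ID FROM dbo.Student;
OPEN student_cursor;
FETCH NEXT FROM student_cursor INTO @ID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Student SET Credits = Credits + 1 WHERE ID = @ID;
    FETCH NEXT FROM student_cursor INTO @ID;
END
CLOSE student_cursor;
DEALLOCATE student_cursor;

-- Set-based version: a single statement does the same work.
UPDATE dbo.Student SET Credits = Credits + 1;
```

The set-based form lets the engine pick one efficient plan instead of executing thousands of tiny statements.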
Keep certain files on separate disks
You may want to consider dedicating a set of drives exclusively to tempdb if you haven’t already. It’s also good practice to put your data, log, and backup files on separate disks whenever you set up a database server.
If you didn’t do this from the start, the performance boost and other benefits of making those changes now are well worth the effort. Keeping everything on a single disk not only drags down database performance, it can also turn disaster recovery into a crisis. Partitioning a single disk is not a substitute for separate physical drives.
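Relocating tempdb after the fact is straightforward in SQL Server. A sketch (the drive letters and paths are illustrative; `tempdev` and `templog` are the default logical file names):

```sql
-- Point tempdb's data and log files at dedicated drives.
-- The move takes effect after the next SQL Server restart.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\tempdb\tempdb.mdf');
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'L:\tempdb\templog.ldf');
```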
Upgrade your hardware
Like most tech-related issues, you can always boost your database performance by throwing money at it. Since every query must run through memory, adding capacity to your server should speed things up; however, if you wish to take full advantage of extra memory, you need to properly configure your server.
You should particularly keep an eye on your disk latency. It will increase along with the hard drive load, which will cause a decrease in your overall database performance. Take advantage of any available caching mechanisms to alleviate this issue.
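SQL Server exposes per-file latency figures, so you can measure the problem before spending on hardware. A sketch of that check:

```sql
-- Average read/write stall per I/O for each database file.
-- NULLIF avoids divide-by-zero on files that have never been touched.
SELECT DB_NAME(vfs.database_id) AS DatabaseName,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS AvgReadLatencyMs,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS AvgWriteLatencyMs
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY AvgReadLatencyMs DESC;
```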
If you have more money to spare, you could consider upgrading to solid-state drives. They are becoming more affordable, so if you work for a company, now may be a good time to talk them into making the investment. Nonetheless, you should still place all of your data files, logs and database backups on a drive separate from everything else.
Know who has access
Taking stock of the applications and services that have access to your database can help you pinpoint performance bottlenecks. Poor performance from a single service may be slowing your whole operation down. If you notice an individual client that doesn’t measure up to the others, delve into its metrics to figure out what the holdup is. If all of your database’s clients appear to be suffering from poor performance, then you should check the health of your host.
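In SQL Server, the session DMVs give a quick per-client breakdown. A sketch:

```sql
-- Resource usage grouped by connecting host and application
-- (excludes system sessions).
SELECT host_name,
       program_name,
       COUNT(*)      AS Sessions,
       SUM(cpu_time) AS TotalCpuMs,
       SUM(reads)    AS TotalReads,
       SUM(writes)   AS TotalWrites
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
GROUP BY host_name, program_name
ORDER BY TotalCpuMs DESC;
```

A client that dominates the CPU or I/O columns here is a good first suspect for deeper investigation.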
Check your connection capacity
If a large chunk of your database’s response time is consumed by connection acquisition, then you may need to re-configure your connection pool. You must know the exact number of connections your database can handle; don’t set the default to maximum because this can cause problems for other clients connected to the database.
A connection pool’s size should remain constant, so always set the minimum and maximum to the same value. To determine your connection capacity, gradually increase load and the number of connections while monitoring your database server’s metrics. When your CPU or memory is maxed out, you’ll know your limits.
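While you ramp up the load, it helps to watch how close you are to the server’s connection ceiling. A sketch of that check in SQL Server:

```sql
-- Current open connections versus the server's configured ceiling.
-- A value_in_use of 0 means SQL Server's default cap of 32,767.
SELECT (SELECT COUNT(*) FROM sys.dm_exec_connections) AS CurrentConnections,
       (SELECT value_in_use FROM sys.configurations
        WHERE name = 'user connections')              AS ConfiguredLimit;
```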
Get smart about caching
Be careful not to set your MySQL query cache size too large. It’s reasonable to assume that a bigger cache would reduce the number of cache prunes, but the opposite is actually true. Anything over 200 MB is too much, and even 100 MB could be excessive for your needs. To find the sweet spot, start with 10 MB and go up in small increments.
Your goal should be to keep the query cache hit rate percentage as close to 100 percent as possible. Once the hit rate drops below 50 percent, then you may be better off just disabling query caching. Medium has a detailed tutorial on how to optimize the caching capabilities of Redis to improve database performance.
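The relevant MySQL knobs and counters look roughly like this (note that the query cache exists only in MySQL 5.7 and earlier; it was removed in 8.0):

```sql
-- Start small and measure before growing the cache.
SET GLOBAL query_cache_size = 10 * 1024 * 1024;  -- 10 MB

-- Hit rate can be estimated as Qcache_hits / (Qcache_hits + Com_select).
SHOW GLOBAL STATUS LIKE 'Qcache_hits';
SHOW GLOBAL STATUS LIKE 'Com_select';

-- A steadily climbing prune count means the cache is thrashing.
SHOW GLOBAL STATUS LIKE 'Qcache_lowmem_prunes';
```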
More quick tips for boosting database performance
Now that we’ve covered the larger structural issues that affect database performance, here is a list of quick alterations you can make to see some small yet instantaneous improvements:
- Create a primary key as a clustered index for every table. Fortunately, Enterprise Manager clusters primary keys by default. You should also set up an index for any column that is a foreign key.
- When you reference objects in T-SQL, make it a habit to always owner qualify them. For example, you would use dbo.sysdatabases as opposed to just sysdatabases.
- SET NOCOUNT ON at the top of every procedure and SET NOCOUNT OFF at the bottom of every procedure.
- Unless you’re writing banking software, then don’t bother with locking. Using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the top and reverting to READ COMMITTED at the bottom of your stored procedures is more effective than using the NOLOCK hint.
- Feel free to use transactions when needed, but don’t allow any user interaction while they are in progress. It’s best to keep all of your transactions inside a stored procedure.
- Don’t use temp tables unless you absolutely need them.
- Scan your code for uses of the NOT IN command and replace them with a left outer join.
- Speaking of which, make it a habit to review your code before and after each change you make.
- Find ways to lower the amount of required round trips to the server. For example, try returning multiple ResultSets.
- Don’t bother using index hints or join hints.
- Instead of using SELECT * statements, individually specify each column you require, even if you need every column in the table.
- When checking whether a record exists, use EXISTS() rather than COUNT(). Both get the job done, but EXISTS() stops running as soon as it finds the requested record, which results in better performance and cleaner code.
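Two of the tips above can be sketched with the article’s Teacher and Student tables:

```sql
-- NOT IN rewritten as a left outer join that keeps only unmatched rows:
-- teachers with no corresponding student record.
SELECT t.Name
FROM Teacher t
LEFT JOIN Student s ON s.StudentID = t.StudentID
WHERE s.StudentID IS NULL;

-- Existence check: EXISTS stops at the first matching row,
-- whereas COUNT(*) would keep counting every match.
IF EXISTS (SELECT 1 FROM Student WHERE StudentID = 42)
    PRINT 'Record found';
ELSE
    PRINT 'No such record';
```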
The team-oriented approach to improving database performance
If you’re lucky enough to be working for a company with a database administrator, they should theoretically be in charge of tuning databases; however, even in such ideal circumstances, developers often end up doing much of their own troubleshooting. It’s not unusual for database administrators and developers to clash in their approaches, so here are some pointers to improve cooperation:
- If possible, developers and database administrators should work together on creating relational diagrams.
- Database administrators shouldn’t be asked to make changes in a production environment; developers should be responsible for their own changes.
- Administrators should consider using a real-time status panel to keep everyone informed. That way, developers don’t constantly have to ask for database updates. You’d be surprised by how much time this little addition can save everyone.
- Administrators should provide a testing environment for developers that allows them to simulate a production server and conduct tests on real data.
- Developers and database administrators should both prepare for the day when all of the data must be migrated to a new software version.
Dedicate yourself to database upkeep
The more you know about your overall database architecture, the better prepared you’ll be to make improvements. Optimizing database performance isn’t a one-and-done deal. It requires diligence and ongoing communication between administrators and developers. If you’re still looking for more ways to keep your database performance in peak condition, there are tools such as Vivid Cortex to improve your database visibility and performance. Most importantly, your databases should be securely hosted and adequately resourced.