As a database administrator, one of the key things I look at is the database design. Over the past two decades, I have seen the DBMS evolve. Evolution in performance is obvious, but more so, the efficiency in content retrieval is spectacular. Query Optimization in DBMS is something that happens under the hood unknown to the user. Today, we will look into what it is why it matters.
Query Optimization in DBMS
We run a query usually to extract content from the database. Though for us, it seems trivial, the DBMS has to evaluate a vast number of possible execution paths before the query can be executed. First, the parsing happens, and as a result, we get the query parse tree.
Fig 1 – Image Credit: Learning SQL for Database Intrusion
From the image above, it becomes quite clear that there are multiple paths that the system can take to execute the entire query. Not everything has a sequence. Then, the question arises on which to use. Some paths might take longer than the others. To answer this very question, the query optimization in DBMS arose.
The Query Optimizer is an inbuilt component of the DBMS which is responsible for finding out the best execution path for a query. Its job is to find the path that costs the least in a fixed amount of time. Over the years, DBMS developers have made it highly efficient by making use of many statistical and heuristics-based approaches.
The Optimization Engine
There are multiple ways to perform query optimization in DBMS. We do not have one method that works the best. There’s always some tradeoff based on either computational power or time of execution.
One of the most widely used methods is to use cost-based optimization. This technique involves statistical analysis built using the data accessed from the database. Given the various available paths ( also called the execution plans), the optimizer evaluates the cost of each path it finds important.
It includes what resources to use like accessing the table directly versus the index. It also includes the time of execution or computational power necessary. With all this information it finds the path with the least cost. Hence the name. Oracle Database uses such a technique to perform database tuning or more specifically query tuning. Sometimes, the optimizer might decide when and where to perform optimization. For Example, MySQL optimize all tables while others may not.
Optimizer Decision Process
The execution of the query from start to end cannot take a long time. That hampers the database performance. So, query optimization must happen within a small amount of time. To ensure this, the optimizer has to key some key decisions.
1. Handling nested queries
A lot of nested queries can be flattened in order to be processed more quickly. However, flattening can take some time which might not be desirable. Using cost reduction techniques here might not work and newer databases use a rule-based approach to quickly decide the query plan for nested queries.
2. Ordering joins the best way possible
Table joins can be done in multiple ways. If there are more than two tables to be joined, then the time taken to join a set of two tables might be much more than another set of two tables. So, the order in which the tables need to be joined needs to be decided. IBM’s System R proposed a dynamic programming-based approach that helps to make this decision.
Optimization is necessary
The Database and SQL queries both need tuning, optimization, and performance improvement. While we Database Administrators perform a lot of manual Database Optimization , the DB software implicitly performs query optimization in DBMS. It’s a cycle that repeats throughout the life of the database. With time, we will be able to witness newer mechanisms and intelligence to push efficiency even further.