Deep dive into the Storage Engine of a DBMS, Episode II (Query Planning/Optimization, Execution, Transaction Manager, Lock Manager, Concurrency Control).
Introduction:
Have you ever wondered, in a database setup with heavy reads and writes, how the data actually gets written to disk without corrupting the underlying information, how one client can read from the database while another is writing to it, and what happens when concurrent read/write requests are fired at a single data point? Well, if you haven't asked these questions yet, do so now, and once you have given them some thought, continue with the article.
Part 1: Query Planning
Query planning is a crucial step in the execution of a database query, where the database management system determines the most efficient strategy to retrieve the requested data based on the query and the available database resources. It involves analyzing the query, understanding the data access patterns, and generating an optimized execution plan that minimizes the overall cost of executing the query.
During query planning, the DBMS performs the following tasks:
1. Query Parsing: The DBMS parses the SQL or query language statement to understand its structure and syntax. It verifies the statement’s validity and identifies the tables, columns, and conditions involved.
2. Query Optimization: The DBMS evaluates various strategies and algorithms to determine the most efficient way to execute the query. The goal is to minimize the time and resources required to retrieve the requested data.
3. Query Plan Generation: The plan defines the sequence of operations and the access methods/paths to retrieve the required data efficiently. The DBMS first prepares a logical plan and then maps that to the physical layout of the data, considering factors such as indexes (which will be covered in later parts), storage structures, and available hardware resources. The physical plan outlines the specific data access methods, join algorithms, and other operations necessary to execute the query.
4. Plan Cost Estimation: The DBMS estimates the execution cost of the generated query plan, considering factors like disk I/O, CPU usage, memory requirements, and network communication. This estimation helps the DBMS compare and select the most efficient plan among the available alternatives.
Once the query planning phase is complete, the DBMS proceeds to the query execution phase, where it executes the generated query plan to retrieve and process the requested data from the database. Efficient query planning is crucial for optimizing query performance and ensuring the timely retrieval of accurate results from the database.
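For a concrete feel of what the planner produces, here is a minimal sketch that asks PostgreSQL for the plan it chose. It assumes a reachable PostgreSQL instance, the psycopg2 driver, and a hypothetical orders table; running EXPLAIN ANALYZE instead would also execute the query and report actual row counts and timings.

```python
# Minimal sketch: inspect the plan the PostgreSQL planner chose for a query.
# Assumes a reachable PostgreSQL instance and the psycopg2 driver; the
# connection parameters and the "orders" table are hypothetical.
import psycopg2

conn = psycopg2.connect(dbname="shop", user="postgres")
cur = conn.cursor()

# EXPLAIN returns the estimated plan, one text line per row.
cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = 42")
for (line,) in cur.fetchall():
    print(line)

cur.close()
conn.close()
```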
This is all just a high-level overview of what the query planner does. Let's now dive deeper into the internals, where we will explore the query lifecycle: how a query is run and how its data is gathered. Let's go then…
Part 2: Query Execution
Well, there's not much to define about query execution: as the name says, once a plan has been chosen for a query, the query execution process begins.
While executing a query, the DBMS employs different strategies to perform the operations, known as processing models. There are three of them:
- Iterator Model
- Materialization model (suitable for in-memory DBs)
- Vectorized model (suitable for analytical workloads)
Basically, in the iterator model, whenever a query is executed, a series of for loops is created under the hood in a tree-based structure, one per operator. A parent node calls its child node and asks it to access some part of the data from disk; when the operation is completed, the child returns the data to the parent node with an emit. This way the required data is moved up the tree, and the finalised data ends up at the very top, the root of the plan.
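Here is a minimal, purely in-memory sketch of that idea (the table contents and operator names are hypothetical): each operator is a Python generator that pulls tuples from its child and emits them upward, so the root drives the whole tree one tuple at a time.

```python
# Minimal sketch of the iterator (Volcano) model: each operator is a
# generator that pulls tuples from its child and emits them to its parent.
# The in-memory "users" list stands in for pages read from disk.

def seq_scan(table):
    # Leaf operator: emit every tuple of the table, one at a time.
    for row in table:
        yield row

def filter_op(child, predicate):
    # Pull tuples from the child and emit only those matching the predicate.
    for row in child:
        if predicate(row):
            yield row

def project(child, columns):
    # Emit only the requested columns of each tuple.
    for row in child:
        yield {c: row[c] for c in columns}

# Roughly: SELECT name FROM users WHERE age > 30
users = [
    {"id": 1, "name": "alice", "age": 25},
    {"id": 2, "name": "bob", "age": 40},
]
plan = project(filter_op(seq_scan(users), lambda r: r["age"] > 30), ["name"])

for row in plan:   # the root pulls each tuple through the whole operator tree
    print(row)     # -> {'name': 'bob'}
```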
In the materialization model, instead of emitting one tuple at a time, each operator processes its entire input and emits ("materializes") its entire output at once. More generally, materialization is a technique used to improve query performance by precomputing and storing the results of complex or expensive operations.
From a broader view there are three types of materialization, used according to the query plan: eager materialization, lazy materialization, and on-demand materialization. If you want to know more, please refer to this.
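To contrast with the iterator sketch above, here is a minimal in-memory sketch of the materialization processing model, in which each operator builds its entire output before the parent starts consuming it (the data and helper names are hypothetical):

```python
# Minimal sketch of the materialization model: every operator materializes
# its whole output as a list before handing it to its parent, instead of
# emitting one tuple at a time.

def seq_scan_all(table):
    return list(table)                          # whole table at once

def filter_all(rows, predicate):
    return [r for r in rows if predicate(r)]    # whole filtered result at once

def project_all(rows, columns):
    return [{c: r[c] for c in columns} for r in rows]

# Roughly: SELECT name FROM users WHERE age > 30
users = [
    {"id": 1, "name": "alice", "age": 25},
    {"id": 2, "name": "bob", "age": 40},
]
result = project_all(filter_all(seq_scan_all(users), lambda r: r["age"] > 30), ["name"])
print(result)   # -> [{'name': 'bob'}]
```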
One of the ways in which a DBMS optimizes reading data from cached pages is paging. Since paging is a broader concept that will be covered in upcoming articles, for now we'll discuss another method, closely related to indexing, by which this can be achieved: the zone map. Zone maps are pre-computed aggregates over the attribute values in a page. The storage manager first checks the zone map to decide whether the data it is looking for can even be present in that particular page. A zone map maintains statistical data about the page, such as the minimum and maximum values of a column within it. Let's say this is the query being executed: SELECT * FROM table WHERE val > 600;
Since the val column is sorted and indexed, and the sorted values are stored in the pages, using a zone map really speeds up data access: the executor can skip every page whose maximum val is not greater than 600.
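As a minimal sketch (with hypothetical page boundaries), this is the kind of check a zone map enables: only pages whose maximum value can satisfy the predicate are scanned.

```python
# Minimal sketch of a zone map for the query SELECT * FROM table WHERE val > 600:
# per-page min/max statistics let the executor skip pages that cannot
# possibly contain qualifying rows. The page boundaries are hypothetical.

pages = [
    {"page_id": 0, "min_val": 100, "max_val": 550},   # max <= 600 -> skip
    {"page_id": 1, "min_val": 560, "max_val": 900},   # may match  -> scan
]

def pages_to_scan(zone_map, lower_bound):
    # Only pages whose maximum value exceeds the bound can hold matching rows.
    return [p["page_id"] for p in zone_map if p["max_val"] > lower_bound]

print(pages_to_scan(pages, 600))   # -> [1]
```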
Part 3: Transaction Manager and Lock Manager and Concurrency Control
A transaction manager is a piece of software that takes care of a single transaction from start to end, i.e., from the very first operation of the query until the very end (committing) of the query. The lock manager, on the other hand, helps the transaction manager write or read data from disk/cache without worrying about changes to the data that might happen while it is reading or updating it. Concurrency control is the technique, configured through the database's isolation level, that determines whether readers get the latest uncommitted data or only committed data.
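To make the lock manager's role a bit more concrete, here is a minimal sketch of how exclusive and shared row-level locks surface at the SQL level in PostgreSQL (assuming the psycopg2 driver and a hypothetical accounts table):

```python
# Minimal sketch: explicit row-level locks in PostgreSQL.
# SELECT ... FOR UPDATE takes an exclusive row lock, SELECT ... FOR SHARE a
# shared one. Assumes psycopg2 and a hypothetical "accounts" table.
import psycopg2

conn = psycopg2.connect(dbname="bank", user="postgres")
cur = conn.cursor()

# Exclusive lock: other transactions cannot update or lock this row
# until we commit or roll back.
cur.execute("SELECT balance FROM accounts WHERE id = 1 FOR UPDATE")
cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
conn.commit()

# A shared lock, which still allows other readers to lock the same row,
# would instead be: SELECT balance FROM accounts WHERE id = 1 FOR SHARE

cur.close()
conn.close()
```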
A transaction is a single block of operations bound by rules. A transaction is performed atomically, and once it commits its effects are durable. If you are using a distributed architecture in which there are multiple read/write replicas, then consistency becomes a major concern alongside durability; more about maintaining consistency will be explained further in the article. The diagram below shows the life cycle of a transaction and its stages in a DBMS.
When a transaction (or a query) comes in, the transaction manager puts it into the active state, irrespective of whether it is a read or a write operation.

If the query is a write operation, it will bring a change to the underlying data, so an EXCLUSIVE lock is taken on the data that needs to be modified. Conflicting concurrent operations on the same dataspace are restricted, so that the consistency and durability of the data are maintained across the system; exactly what gets blocked also depends on the database's isolation level (more on isolation levels further down in the article). Whenever a transaction fails to update the data, at whatever point it fails, the whole transaction needs to be cancelled and the partially applied changes need to be brought back to the state they were in when the transaction started. This mechanism of reverting the state of the database is called a rollback. If the update goes through perfectly, the transaction is considered successful and is terminated accordingly, but only after committing the changes.

While reading data from the source, the DBMS takes a SHARED lock on the dataspace, so that other threads that want to read the same dataspace can read it as well.

A database like PostgreSQL has a concurrency control mechanism widely known as Multi-Version Concurrency Control (MVCC). Let's say a thread is writing/modifying some data and, simultaneously, a reader comes along and tries to read the same data. Under a simple locking scheme the reader would be blocked while the writer works on the same dataspace, but with MVCC every reader sees a snapshot of the data even while writers hold their locks, so readers are never blocked. This is widely summarized as: readers don't block writers and writers don't block readers. With this approach comes a headache: how do we clean up the old row versions (dead tuples) left behind once transactions commit? The answer in PostgreSQL is a pair of special maintenance commands, VACUUM and VACUUM FULL. VACUUM is the process by which dead tuples are removed so that their space can be reused; it is usually run periodically as a background process (autovacuum).
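To see the "readers don't block writers" behaviour in action, here is a minimal sketch using two separate connections (assuming the psycopg2 driver, PostgreSQL's default Read Committed level, and a hypothetical accounts table):

```python
# Minimal sketch of MVCC in PostgreSQL: an uncommitted writer does not block
# a concurrent reader, which simply sees the last committed row version.
# Assumes psycopg2, the default Read Committed level, and a hypothetical
# "accounts" table.
import psycopg2

writer = psycopg2.connect(dbname="bank", user="postgres")
reader = psycopg2.connect(dbname="bank", user="postgres")

wcur = writer.cursor()
wcur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 1")
# The writer's transaction is still open, so the new row version is uncommitted.

rcur = reader.cursor()
rcur.execute("SELECT balance FROM accounts WHERE id = 1")
print(rcur.fetchone())   # not blocked: still shows the old committed value

writer.commit()          # the new version now becomes visible to later snapshots

reader.close()
writer.close()
```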
Some of the isolation levels are:
- Serializable Isolation: Each query (within the transaction) can see only data committed before the transaction started, plus changes made by the transaction itself, and the database additionally guarantees that the overall outcome is equivalent to running the transactions one after another. So there are no phantoms here.
- Read Committed: Each query (within the transaction) can see only data committed before that query started. So, if you run the same query twice in the same transaction, you can see different results and phantoms.
- Read Uncommitted: Each query (within the transaction) can see uncommitted changes made by other transactions on the same data, i.e., dirty reads are possible. (PostgreSQL accepts this level but treats it the same as Read Committed.)
- Repeatable Read: Each query (within the transaction) sees a consistent snapshot of the data taken at the start of the transaction, so re-reading the same rows gives the same results, and dirty or non-repeatable reads are not possible. It is stricter than Read Committed but less strict than Serializable, which is the strictest level.
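Here is a minimal sketch of picking one of these levels from an application and observing its effect (assuming the psycopg2 driver and a hypothetical accounts table; the same thing can be done in plain SQL with SET TRANSACTION ISOLATION LEVEL):

```python
# Minimal sketch: selecting an isolation level per session with psycopg2.
# Under REPEATABLE READ, every query in the transaction reads from the same
# snapshot, so the two reads below must agree even if another session
# commits a change in between. The "accounts" table is hypothetical.
import psycopg2

conn = psycopg2.connect(dbname="bank", user="postgres")
conn.set_session(isolation_level="REPEATABLE READ")

cur = conn.cursor()
cur.execute("SELECT balance FROM accounts WHERE id = 1")
first = cur.fetchone()

# ... another session could commit an update to this row here ...

cur.execute("SELECT balance FROM accounts WHERE id = 1")
second = cur.fetchone()
assert first == second   # same snapshot, same answer

conn.commit()
conn.close()
```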
Garbage collection (GC) in PostgreSQL uses both VACUUM and VACUUM FULL, and the two are different in nature. A plain VACUUM reclaims the space held by dead tuples so it can be reused, but it won't return that space to the operating system, and it doesn't take an exclusive lock on anything. VACUUM FULL, on the other hand, takes an exclusive lock on the whole table it is garbage-collecting, rewrites the table compactly, and releases the regained space back to the operating system.
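As a minimal sketch (psycopg2, hypothetical accounts table), this is how the two commands are invoked from an application; note that VACUUM cannot run inside a transaction block, so autocommit has to be enabled first:

```python
# Minimal sketch: triggering VACUUM from an application. VACUUM cannot run
# inside a transaction block, so autocommit must be enabled on the connection.
# Assumes psycopg2 and a hypothetical "accounts" table.
import psycopg2

conn = psycopg2.connect(dbname="bank", user="postgres")
conn.autocommit = True

cur = conn.cursor()
cur.execute("VACUUM accounts")          # reclaims dead tuples, no exclusive lock
# cur.execute("VACUUM FULL accounts")   # rewrites the table under an exclusive lock
cur.close()
conn.close()
```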
Conclusion:
In this article we learnt about the life cycle of a query, how reads and writes take place in a database system, how transactions guarantee atomic reads/writes, locks in a database system, the different isolation levels in a DB, and, in depth, PostgreSQL's implementation of MVCC, which is a game changer in some applications. We also took a look at PostgreSQL's garbage collection process. In the next article I'll be explaining indexes and their importance. Stay tuned !!!!!