How Indexing Works (Part 2): Understanding the Execution Plan

Designveloper
5 min readDec 17, 2020

--

Maybe you want to check the first part of this article here.

Understanding database’s execution plan can result in writing better queries, better indexing, better application performance and ultimately it could help us to become a better developer when it comes to working with databases.

Question: What is the execution plan?

Answer: It’s the steps that our database needs to perform/execute our query.

Question: How do we get an execution plan?

Answer: Almost for all database vendors, we just have to prepend an EXPLAIN to our query.

For example: Instead of writing SELECT * FROM users WHERE id = 1, we use EXPLAIN SELECT * FROM users WHERE id = 1. On mysql, we will get a result that looks something like this:

We’re not going to look through all of them, but we will look at the most important one which is the Type column. In the picture, it said “const”. On the Mysql documentation, it will be called the ‘join types’. It’s a bit confusing but we can simply think of it as access types for simplicity’s sake. Because what it really tells us is how the database is going to access our data or how exactly it is going to use an index or not use an index to execute our query.

Let’s go to some possible value we might encounter in the explain output in that column and go through some characteristics of each one of them

Const / eq_ref

These two values are actually distinct values. They might appear on their own in the EXPLAIN output, but because they behave in the same way so we could group them together and treat them as one.

The database will perform a B-Tree traversal to find a single value in the index tree. Basically we’re doing binary search. This can only get used if we can somehow guarantee uniqueness of our result set (Fun Fact: LIMIT does not guarantee uniqueness, it just fetches the data and discards the afterwards). That means there can be at most one result. Two ways we can do this:

  • Have a primary on the column
  • Have a unique constraint on the column

Basically we’re starting at the top of the tree and then we go left or right depending on if the value is less than or greater. Until we either find the value or find out that there is no value. It is super fast. If we see Const or Eq_ref in the EXPLAIN output, then we can stop optimizing because it will not get any faster.

Ref / Range

Again they are two distinct values that get grouped together because of their similar behavior. They are known as an index range scan. They also perform a B-tree traversal, but instead of finding a single value, they find the starting point of a range and then they scan from that point on. Let’s say we have a query “WHERE id > 15 AND id < 30”.

The database will do a traversal to find the first value (which is greater than 15). From that point on, it will start scanning through the leaf nodes (doubly linked list) until it hits the first value that is greater than or equal to 30. These rows are the only rows the database has to look at.

So the ref / range limits the total number of rows the database has to inspect to perform our query. It’s a good thing because of less work for the database. It stops if it finds a value that does not meet the criteria.

Index

Also known as a Full index scan. We’re still using the index but we’re not using it to limit the number of rows. We literally start at the very first leaf node and scan through all of them until we hit the very last leaf node. There is no filtering of anything going on, we simply traverse through the entire index (but we’re still using the index).

All

Also known as a Full table scan. It’s as bad as it sounds, we should avoid it at all costs, it’s never what we want. It doesn’t use index at all. It loads every column of every row from the table into memory, go through them one by one and then omit or discard them depending on whether or not they fulfill the filter criteria. In other words, it scans through all of them and emits or discards accordingly.

Recap:

Access types:

  • Const /eq_ref: Basically binary search to find a single value, stops if we got that. It’s super fast, almost perfect.
  • Ref/range: We use the index to limit number of rows to a certain subset of rows that we even have to look at
  • Index: Use the index but not for limiting or filtering, we’re just scanning through every value in the index
  • All: Avoid it at all costs, we just load up everything and go through it.

Common Pitfalls:

Functions

If we have a query like this, it doesn’t matter what goes into to function because the column will not be seen by the database at all. This is because we can not guarantee that the output of the function has anything to do with the index values.

Inequality operators

If we run that query, we can see that our database still has to look through a lot of rows in our database even though we have already put an index on these two columns (created_at and user_id).

Why is that? The reason is that we have an inequality operation (BETWEEN) on the created_at column (the created_at column is our first index). It seems like the index just stops there.

In this situation, we can just simply put the user_id into the first position because we have an equality operation on user_id.

--

--

Designveloper
Designveloper

Written by Designveloper

Designveloper is leading software development company in Vietnam. We offer Web development, mobile application, UI/UX Design, VOIP. www.designveloper.com

No responses yet