Basic principles of database indexes

Are you using relational databases but the indexes are closed book for you? You absolutely don't know what the indexes are and how they work, or you are confused by various types of indexes (B-Tree, Hash, Bitmap) and you're not sure about their advantages or disadvantages? Or you just can't find out why a particular index is not used when evaluating a query? Maybe this article will help you to answer these questions ...

The following text definitely is not meant to be an exhausting description of index implementation in any relational database system - it's rather an overview of basic principles common to all index implementations. I'm convinced - and several years of quite intensive work with various databasse (MySQL, PostgreSQL, Oracle, Sybase) acknowledges this experience - that basic principles of indexes are common to all database systems. Sure, the individual systems have different features related to principles - for example supported index types, behavior under stress, etc. but the underlying principles are identical. Source code examples introduced in this article are for demonstration purposes only, they should outline "how it works" and that's they're only goal. After all, they're written in a Java-like language ;-)

If you're interested in implementation details in a particular database system, search them in the documentation (but in case of commercial system's you won't be lucky I guess).

Indexes are without question one of the most important topics when designing the schema and optimizing the database, as they are invaluable tool when effectively sorting, joining and reading tables and searching records within them. A lot of developers who work with database systems (and not always beginners) regrettably believes the indexes are somehow magic, and do not understand what the indexes really are and how it works, not mentioning various types of indexes.

Before discussing the indexes we should quickly describe the database tables - what they really are, and so on.

What is a database table?

There are many definitions of database tables, but in sum it's just a data file (or a set of data files) containing two types of data:

  • metadata ie. description of the table (names and types of columns, statistics of the table, etc.)
  • data ie. rows of the table (n-tuples of the data), stored one after another

Of course - this is a really simplified model - in modern database systems the metadata are usually separated from the data, stored in a separate set of files and available through system catalogs etc. - but for the purpose of this article this description is absolutely sufficient.

As an illustration let's consider the following structure:

CREATE TABLE Persons (
    id           INTEGER,
    first_name   VARCHAR(64),
    last_name    VARCHAR(64),
    age          INTEGER,
    email        VARCHAR(64),
    height       INTEGER,
    weight       INTEGER
);

Data stored in this table may be for example

id first_name last_name age email height weight
1 John Holmes 29 j.holmes@example.com 185 78
17 George Blue 43 g.blue@example.com 171 65
65 Jane West 19 j.west@example.com 175 60
44 Roger Smith 25 r.smith@example.com 165 65
... ... ... ... ... ... ...

If we choose pipe "|" as a column separator and "\n" then the table might look like this after storing in a file (let's dismis the fact that the data are usually stored in binary form, and store them in ASCII):

1|John|Holmes|29|john.holmes@example.com|185|78\n
17|George|Bush|43|george.bush@example.com|171|65\n
65|Jane|West|19|jane.west@example.com|175|60\n
44|Roger|Smith|25|roger.smith@example.com|165|65\n

That seems quite fine, but the problem occurs if there is a lot of records and it's necessary to search the table for rows with a given value in a given column. That means a necessity to search the whole file row by row (and if it's not in a memory this means reading it from a disk, which may cause reading gigabytes of data), and for each row check the value ni a given column.

Principle of indexes

Basic principle of the indexes is quite simple - using a "suitable data structures" (tree, hast table, bitmap and so on) accelerate access to the rows matching a given condition (most usually with a given value in a given column) compared to plain sequential read of the whole table.

Types of indexes

Besides the three basic index types (b-tree, bitmap and hash indexes) described in this article, there are many other specialized types (GIST, GIN, fulltext, R-Tree, etc.). These specialized types are not described here as they're database specific or very specialized, but I definitely recommend checking all index types supported by your database system.

B-tree indexes

As the name suggests, this type of indexes is based on trees (namely B-trees). If you don't know features of this data structure, especially those related to searching and sorting, check the articles on wikipedia about trees as a data structure, binary trees and B-trees.

Let's mention at least two most important features of trees:

  • The average depth of a binary tree containing N values is log2N, in case of B-tree of order m the average depth is logm/2N in the worst case. Due to this feature, trees are extremely suitable for searching.
  • Given the data stored in a tree, it's very simple to sort them as the tree keeps the data sorted. All you need to do is pre-order or post-order processing of the tree and you get values in ascending or descending order.

So how do the trees fit into the topic of database indexes? If we want to create an index on a table column, we can create a set of row "addresses" (index or offset of the row) for each unique value of the column, and then store these pairs [value, set of rows with the value] in a tree.

In Java, a primitive index on a text column might be defined like this:

Map<String, List<Long>> index = new TreeMap<String, List<Long>>();

Algorithm used to create the index might be roughly this:

BufferedReader reader = new BufferedReader(new FileReader(filename));

Map<String, List<Long>> index = new TreeMap<String, List<Long>>();

long offset = 0;

while ((String line = reader.readLine()) != null) {

    // split the rows into columns
    String[] fields = String.split("|");

    // read the value in 3rd column
    String field = fields[3];

    // check if the value is in the index already
    List<Long> element = index.get(field);

    // if it isn'n, create a new tree node
    if (element == null) {
        element = new ArrayList<Long>();
        index.put(field, element);
    }

    // add the row into the node
    element.add(new Long(offset));

    // update the offset (add length of the row plus end-of-line)
    offset += line.length() + 1;
}

Thanks to the tree structure we're able to seek for rows with a particular value very fast, as we're able to look up the whole set of matching rows without reading the whole table row-by-row.

B-Tree indexes and LIKE conditions

An interesting remark is related to the possibility of using B-tree indexes when evaluating certain LIKE conditions, often used to implement simple text search. In case of general LIKE conditions

column LIKE '%string%'

the B-tree indexes are unusable, but it's possible to use them in case of prefix or postfix conditions, i.e. conditions in the form

column LIKE 'prefix%'

or in the form

column LIKE '%postfix'

In case of prefix conditions the utilization of B-tree indexes is quite straightforward - the condition is actually equivalent to the following condition

column >= 'prefix' AND sloupec < 'prefiy'

(as the first "greater" column not beginning with "prefix" is "prefiy"). This condition may be applied to keys of the tree while using their natural ordering.

Application for postfix conditions is like a trick, as it actually transforms the condition to a prefix one (just described) using indexes over expressions. Instead of creating an index on column directly, the index is created on expression

revert(column)

where the function "revert" returns the string parameter backwards. The condition then is modified to

revert(column) LIKE revert('%postfix')

which is a prefix condition, whose evaluation using B-tree indexes has been described above.

In general the LIKE conditions are problematic - I've seen several applications suffering serious performance problems due to searching implemented using LIKE conditions, as it used most of the I/O subsystem capacity (the average read rate was over 80 MB/s) and slowed down all other applications. The situation considerably improved after deploying fulltext search engine able to use indexes. So consider using a specialized tool for searching, e.g. Lucene.

When the B-Tree indexes do not work well

Of course, there are other cases when B-trees do not work well (besides the general LIKE conditions), i.e. when usage of the B-tree index does not improve (or even worsens) performance or it may not be used at all. One of such cases may be indexing of column with a low variability, i.e. with a low number of different values, eventually a case when the value used in a condition is very frequent. In such case (some of) the tree nodes contain a lot of columns and random access to many rows (whereas the random access is a direct consequence of index usage) is considerably slower than reading the whole table and filtering the rows on the fly.

Not only that reading a large portion of the table may cause performance problems, but in case of more complicated queries (e.g. when there are multiple conditions connected by logical connectives AND / OR) it comes into question to combine multiple indexes (for each condition), but while possible, it's not trivial with B-tree indexes. As we will see in the following section, bitmap indexes are much more suitable for this.

Bitmap indexes

Bitmap indexes are designed to work well even in case when the B-tree index fails, i.e. in case of low variability columns. In contrast to B-tree indexes (bases on a tree data structure), bitmap indexes are based on bit arrays or bitmaps.

All distinct values are loaded for the column being indexed, and for each of these values a bit array is created with length equal to all rows of the table. Let's put aside technical details of the bit array implementation, as for example allocation, preallocation and so on, and set the i-th field of the bitmap to "1" iff the i-th row of the table contains value of the bitmap in the given column.

By extending the table to contain "sex" column with three possible values (m - male, f - female, u - unknown) we get a typical low variability column.

id first_name last_name age email height weight sex
1 John Holmes 29 j.holmes@example.com 185 78 m
17 George Blue 43 g.blue@example.com 171 65 m
65 Jane West 19 j.west@example.com 175 60 f
44 Roger Smith 25 r.smith@example.com 165 65 u

For each of the three values (m, f, u) a bitmap will be created:

m => '1100...'
f => '0010...'
u => '0001...'

So for an index on "sex" column there would be three bitmaps allocated, one for each value. Compared to the B-tree indexes, a problem arises as the bitmap itself contains just information about row index, not the offset - so there should be some "mapping" of indexes to offsets, e.g. using an array of offsets. Creating a bitmap index may be implemented like this, for example:

BufferedReader reader = new BufferedReader(new FileReader(filename));

// values in the column
Set<String> values = new HashSet<String>();

// number of rows
long rowCount = 0;

// in the first walk, we have to find out values and number of rows
while ((String line = reader.readLine()) != null) {

    // split the rows into columns
    String[] fields = String.split("|");

    // get the value in the column
    String field = fields[3];

    // add the value into list
    values.add(field);

    // update number of rows
    ++rowCount;

}

// skip to the beginning of the file
reader.reset();

// turn the list into array
String[] valuesArray = values.toArray();

// index - map of bitmaps (value -> bitmap)
Map<String, boolean[]> index = new HashMap<String, boolean[]>();

// create bitmaps of the length, one for each value
for (int i = 0; i < valuesArray.length; i++) {
    index.put(valuesArray[i], new boolean[pocetRadek]);
}

// index and offset of the current row
int rowIndex = 0;
long rowOffset = 0;

// array of row offsets (mapping index -> offset)
long rowOffsets = new long[rowCount];

// create the bitmaps
while ((String line = reader.readLine()) != null) {

    // store offset of the row
    rowOffsets[rowIndex] = rowOffset;

    // split the row into columns
    String[] fields = String.split("|");

    // load value of the given column
    String field = fields[3];

    // set bitmaps for the current row
    for (int i = 0; i < valuesArray.length; i++) {
        boolean[] bitmap = index.get(valuesArray[i]);
        bitmap[rowIndex++] = valuesArray[i].equals(field);
    }

    // update the offset
    rowOffset += line.length() + 1;

}

As noted at the end of B-tree section, some operations are much easier with bitmap indexes - an example may be evaluation of logical connectives (AND, OR, XOR, NOT) as the evaluation may be performed directly on the bitmaps. For example to evaluate this condition

(sex = 'm' OR sex = 'u')

you need just to apply the OR connective to the bitmaps (element by element). That's naturally much faster than walking the whole table, as the bitmaps are dramatically smaller compared to the table, and applying the logical connectives to single bits (or boolean variables) is very fast.

Of course, multiple bitmap indexes may be used in a similar way. Even if the database supports utilizing multiple indexes of various types (not only bitmap ones) to evaluate a single query, it's usually implemented by deriving temporary bitmaps for separate parts of the query (e.g. evaluated using b-tree index) and these bitmaps are then combined just as described in the previous section.

Disadvantages of bitmap indexes

One of the disadvantages of bitmap indexes is obviously the need to iterate through all rows (indexes of the bitmap) and check if there is "1" or not, but in most cases this is much more effective than walking the table itself. But even bitmap indexes bear some overhead, and in some cases it may be faster to work with the table directly (especially if the number of matching rows is high).

Regarding other possibilities to use bitmap indexes - while the bitmap indexes are not very suitable to enforce uniqueness of the values (as it would mean allocating as much bitmaps as rows), they may be used for sorting quite effectively. All you need to do is to sort the values associated with the bitmaps (and stored in the index) and then load the bitmaps in the proper order.

Hash indexes

The third very common type of index are hash indexes. As the name suggests, they have something in common with hashes or hash tables. If you're not sure what a hash table is, it's high time to fix this fundamental deficiency of your education!

The general principle of hash tables is that you compute hash for each value you want to store - let's suppose the result is an integer between 1 and 1024. This actually distributes the values into 1024 "bins" while

  • same values are always in the same bin (as the hash is always the same)
  • there may be multiple various values in the same bin (but with the same hash)
  • the values should be distributed uniformly across all bins

A simple hash index creation may be implemented like this:

BufferedReader reader = new BufferedReader(new FileReader(filename));

// index: hash -> list of tuples [value, offset]
Map<Long, List<Object[]>> index = new HashMap<Long, List<Object[]>>();

long offset = 0;

while ((String line = reader.readLine()) != null) {

    // split the rows into columns
    String[] fields = String.split("|");

    // get the value in the column
    String field = fields[3];

    // check if the value is in the index already
    List<Object[]> bin = index.get(field);

    // if it isn't, create it
    if (bin == null) {
        bin = new ArrayList<Object[]>();
        index.put(field.hashCode(), bin);
    }

    // add the current offset into the index
    bin.add(new Object[] {field, new Long(offset)});

    // update the offset
    offset += line.length() + 1;
}

OK, so we have the values sorted into bins, but what does that gives us? Well, simply said, when looking up a value or joinin a table, we don't have to do so many comparisons (and as usual with indexes, we don't have to read the whole table).

Consider a table with 1.000.000 rows, and let's try to find out rows by a value in a given column (with a hash index on it). Without the index the database system would have to read all 1.000.000 rows (from a drive), with hash index all you need to do is

  1. compute hash of the value you're looking for
  2. "jump" into the proper bin
  3. look-up the value in the given bin (in an ideal case there should be only 1.000 rows)

So in the end you have 1.000 comparisons instead of 1.000.000, not the mention the significantly smaller amount of data you have to read from a disk.

When joining two tables, the effect is similar - when joining two tables, each one containing 1.000.000 rows, that means 1.000.000.000.000 comparisons without an index (well, maybe a little bit stupid implementation). Using a hash index on both sides, you don't need to compare all the values - you just have to compare the values in the same bin. There are 1000 bins, each of them containing 1000 values in average. That gives 1.000.000 comparisons per bin, thus 1.000.000.000 comparisons in total.

Disadvantages of hash indexes

However, compared to B-tree indexes, hash indexes have some disadvantages too. The most serious one is that hash indexes are applicable only for equality conditions ("=" operator), and may not be used for other conditions (e.g. ">", "<" atc.). Due to this hash indexes may not be used for sorting.

When the indexes may be used

Enough theory - let's look at some basic condition types that may be efficiently evaluated using indexes. Individual database systems have different features regarding usage of indexes - when one system uses an index effectively, another system may not be able to use the index at all. Keep this in mind when reading the following section.

WHERE conditions

Basic and most frequent case of index utilization are WHERE conditions. As the basic types of WHERE conditions (and the possibility to use individual types of indexes) were analyzed in the preceding sections about index types, let's look at some related topics.

FOREIGN KEY and JOIN

Among such topics fall queries used to check "link" between two tables - foreign keys. Such queries are executed by the database system internally, but otherwise they are regular queries just like any other SELECT.

Problems related to foreign keys emerge when the dependent table (the one containing the foreign key) is "too large." Consider the following two tables:

CREATE TABLE A (
    id INT PRIMARY KEY,
    ...
);

CREATE TABLE B (
    ...
    a_id INT REFERENCES A(id),
    ...
);

When deleting the records from table "A" the database system has to check if there are not any rows in table "B" containing the deleted foreign key values - that means a query similar to the following one has to be executed:

SELECT COUNT(*) FROM B WHERE a_id = #id#

where #id# is ID of the deleted row from table "A." If there is no suitable index on column "B.a_id", the whole table "B" will be sequentially scanned, which is an obvious performance problem if the table is too large.

Another problem related to foreign keys is joining of tables - in most cases the join condition involves the foreign key columns (equality between referencing and referenced table). If there is no index on column "B.a_id" it means exactly the same performance problem as described above.

So there are two recommendations:

  1. Index columns that are often used for joining, preferably on both sides of the join - referencing as well as referenced table (on the referenced table this usually is not a problem, as there is a PRIMARY KEY or UNIQUE INDEX).
  2. Index all foreign keys (especially) on large tables.

ORDER BY, min / max, first / last

Another quite traditional utilization of indexes is sorting - some index types either keep the rows in a sorted structure (e.g. the B-tree indexes) or it's possible to derive the ordering (e.g. in case of the bitmap indexes). Sure, there are index types that are not suitable for sorting (e.g. hash indexes).

In general the following rules hold

  • If you often sort by a certain column, the column should be indexed.
  • When often sorting by multiple columns, a multicolumn index should be created on them (and the order of columns in the index should be the same as in the ORDER BY clause).

Two other topics are related to sorting - searching for extremes (minimums and maximums) in a given column, and fetching the first / last rows of the result.

GROUP BY

Evaluating the GROUP BY may be faster if the columns in the GROUP BY clause are indexed (in the multicolumn index), but you have to try it.

Uniqueness

In databases it's often needed to enforce uniqueness in a given column - this is called a "UNIQUE constraint," a it's usually implemented using an index.

Why an index is not used ...

Quite often the develepers ask me "I have an index on a table, but it's not used. Why?" Answering this question is not simple, but in general there are several basic reasons why an index is not used:

It would be slower

Many developers live in an illusion that using an index automatically improves the performance, and that's not true. It's possible the database system actually did a correct decision when it choose not to use the index. Besides that some database systems allow to enforce index usage (e.g. by hints in Oracle, or using "SET enable_seqscan = off" in PostgreSQL).

Database believes it would be slower

When the database prepares an execution plan, i.e. when it decides in what order and how to join tables, what indexes to use, etc. a "cost based" estimates are used to compare which of the possible plans is the best one. But when computing the estimates, more or less imprecise values are used, especially statistics of tables and columns (number of rows, histograms of values in the columns, most frequent values, etc.), estimastes of random / sequential reading cost, and last but not least information about the system (amount of disk cache, etc.).

If the values are too inaccurate, the resulting execution plan cost estimates may be so skewed that the database erroreously decides for a harsh execution plan, There may be several reasons for such imprecise values, for example:

  • statistics of tables / columns are inaccurate - The statistics may be outdated or not detailed enough, so the database mistakenly believes the table contains dramatically more or less rows than it actually does.

  • random / sequential read cost estimates are not matching realiry - This depends especially on the HW used (type of hard drives, disk controller, etc.) - for example on SSD drives both types of reading are about equally time-consuming, while on old IDE drives the random access may be substantially slower than sequential read.

  • database does not have necessary information - There are cases when some of the necessary information simply are not available when preparing the execution plan - e.g. then processing a prepared statement the particular values of the parameters are unknown, so the database may not estimate how many rows of the table will match a condition (the more rows have to be read, the less profitable is index usage). Databases in such case usually choose a "safe plan" instead of "risky" plan that may be a little bit faster in some cases, but considerably slower in other cases.

  • the query is inconveniently formulated - Often the problem is in the query itself, which is formulated so unfortunately that the database may not use the index, and the correction is usually just a question of a small change of the query. There are many ways to screw up a query, but one of the most popular ways is using functions or expressions in WHERE conditions, when the index is created on the column directly.

    For example when the index is created on column "a", then it may not be used to evaluate the condition "WHERE a*a >= 12492" as it contains square power of the column. This mistake may be fixed either by creating an index on the expression (a function applied to the column, in this case "a*a") or by a small fix of the query so that the index may be used - for example by moving the expression to the other side of the condition, in this case "WHERE a >= sqrt(12492) OR a <= -sqrt(1249)".

What to avoid

Indexes are not for free and there are potential problems connected to them. You may prevent some of them by a careful schema design, so try to avoid:

  • too many indexes - Updating an index yields an overhead, and slows down the operations modifying data in the indexed columns (INSERT, UPDATE, DELETE). This holds especially for tables with a lot of write operations, as for example logging tables etc.
  • needlessly redundant indexes - This recommendation is related to the previous one, as redundant indexes unnecessarily increase the number of indexes on a table.
  • indexes on a column with a low variability - low variability usually implies low selectivity of conditions on this column (i.e. a large portion of the table matches the condition) which means ineffectivity of the index compared to the plain sequential reading of the whole table. This problem may be (to a certain degree) circumvented by using multicolumn indexes.
  • indexes on small tables - If the table occupies just a few disk pages, it's much cheaper to read the whole table instead of using an index. This does not hold for indexes defined to enforce uniqeness etc.
  • for generic LIKE conditions and regular expressions - Generic conditions of the form "LIKE %text%" and "~* '%text%'" may not be accelerated using indexes (see above).

The whole problem is further complicated by the fact that removing an index during an application lifecycle is quite problematic, as it's very difficult to identify unused indexes. Some database systems provide statistics of index usage, but although it may be a useful hint, what if an index is used only once a year or something like that - for example when preparing statement of balances? Removal of such index might cause failure of such important job ...

Examples for download

In the preceding section devoted to individual index types a few examples of (naive) implementation were cited. If you wish, you may download a little bit more complete and elaborated version. The package contains source code written in Java, an API documentation (javadoc) and besides that a sample of data used for the benchmark.

As an overview, I present here a list of basic package elements (packages, classes, etc.):

  • cz.fuzzy.examples.db.indexes.Index - an interface common all index types
  • cz.fuzzy.examples.db.indexes.BitmapIndex - bitmap index implementation
  • cz.fuzzy.examples.db.indexes.HashIndex - hash index implementation
  • cz.fuzzy.examples.db.indexes.TreeIndex - tree index implementation (although it's not a proper B-Tree index, it's sufficient for the demonstration purposes)
  • cz.fuzzy.examples.db.indexes.IndexException - error accessing an index (reading, ...)
  • cz.fuzzy.examples.db.tables.Table - interface common to all tables (in disregard to the way the data are stored - in a file, in a memory, ...)
  • cz.fuzzy.examples.db.tables.FileTable - table with data stored in a file
  • cz.fuzzy.examples.db.tables.TableException - error accessing a table (reading, ...)
  • cz.fuzzy.examples.db.IndexBenchmark - a class comparing multiple type of data access (no index, with various kinds of indexes, ...)
  • cz.fuzzy.examples.db.tests - JUnit tests for individual classes
  • build.xml - ANT build script, with targets for compilation and building JAR archive
  • example.data - sample data, used by the benchmark (but you may use yourr own data)

I warn you that these examples are somehow primitive, their sole purpose is to demonstrate the usage of indexes when evaluating queries, and the production database systems have to solve many other important issues. For example modification of the data (the tables do allow just reading of the data from an unchangeable file), multiuser access or multi threading, or transactions.

You may download the whole package here - all you need to do is extract it, compile into a JAR, and then run the benchmark:

$ unzip examples-indexes.zip
$ cd examples-indexes
$ ant dist
$ java -jar dist/dbindexes-20090425.jar

An example of benchmark results (searching for 200 values using different way) may be for example:

$ java -jar dist/dbindexes-20090425.jar
==== Preparing table ====
==== Benchmarking plain table (no index) ====
200 searches: 165 ms in average

==== Benchmarking tree index ====
index built: 205 ms
200 searches: 0 ms in average

==== Benchmarking hash index ====
index built: 249 ms
200 searches: 0 ms in average

==== Benchmarking bitmap index ====
index built: 286 ms
200 searches: 0 ms in average

If you wish, you may download a larger version of example data (about 20MB instead of the 1MB file included in the package) - all you need to do is to rewrite the examples.data file.

Links

When writing this article, the most important sources were:

 

Comments

There are no comments for this article (or are awaiting acceptance).

New comment

All the comments have to be accepted, so there may be some delay between submitting and accepting (or rejecting) the comment. If you enter the e-mail address, you will be informed about acceptance or rejection.

Subject or body may not contain HTML tags - they will be automatically removed. Paragraphs may be separated using a newline (ENTER).

(optional)