Prof. Dr. Tilmann Rabl

The Log-Structured Merge-Tree

(Patrick O’Neil et al., The log-structured merge-tree, Acta Informatica 33, 351-385 (1996))

A summary by Jana Trenti.

It is often said that Information Technology is a rapidly developing field. That major breakthroughs might seem old, out-dated, just a few years later. But some ideas manage to linger.

One of those ideas is the one of the log-structured merge-tree (or just LSM-Tree for short). LSM-trees are commonly used by modern database management systems.[1] And while some changes may have been applied to them over the years, LSM-Trees are not a new invention. In fact, they were first presented in a paper published in 1996. Why is it that LSM-Trees have managed to persist where so many other ideas have been lost to the ever new developments of the industry? It may be because they present a solution for a problem that is still as viable as it was in 1996.

Why were LSM-Trees developed?

Though not necessarily simple to solve, the core problem is not complicated. It is not surprising news that indexing may greatly increase the performance of many given databases. But what if the indexing becomes a slowing factor? Without LSM-Trees, this situation actually occurred for high-performance systems having to do many record inserts. The reason for this was simple: I/O costs. Classical indexing systems like B-Trees have to perform many disk interactions. It is also not surprising news that many disk interactions will slow a system down significantly. LSM-Trees now proposed a way of reducing those disk interactions, thereby increasing performance.

What does an LSM-Tree do?

To do this, LSM-Trees utilize a rather simple core idea: they split the index into two or more components, the default case being two. The important part is that only one of those two components is disk-resident, while the other one, usually significantly smaller, is held in main memory. To talk about the components more easily, they are usually numbered. C0 is the component held in main memory. C1 is the component on disk. C0 may be an arbitrary tree-like structure. C1 follows the structure of a B-Tree, albeit one that is optimized for sequential disk access: Nodes are completely filled and neighboring nodes are stored in contiguous multi-page disk blocks.

But how does this actually help mitigate I/O costs?

When an entry is now inserted into the database, it is first indexed within C0. In fact, the main purpose of C0 is to take new entries. For a while, all new entries are simply added to C0. The next step only occurs when C0 reaches a certain size. As main memory is often expensive and thus limited, C0 is always kept under a threshold size defined beforehand. When enough entries have been added and that size is reached, the merge-tree starts merging. In this step, takes some contiguous entries form C0 and merges them into C1, deleting them from C0 in the process and thereby getting it under the threshold size again.

In doing so, a number of new entries is not written to disk one at a time, but rather in a more efficient batch manner. Thanks to the structure of C1, it is also often possible to load just one contiguous block of pages from disk. By grouping new entries together like this, multiple disk accesses can be gathered into one, thus reducing I/O costs. Deletes can be handled in a similar way. If an entry is deleted while still in C0, this operation does not pose a problem. It can either be deleted from main memory instantly and cost-efficiently, or it is simply marked and dropped during the next merge. Should an entry to be deleted already be disk-resident, a delete node entry can be inserted in C0 where a normal entry of this value would be. During the next merge, the delete node entry causes the corresponding entry in C1 to be dropped. Thus, deletes are not executed instantly, but efficiently. One has to keep in mind the added filtering cost during finds, however, needed to check whether an entry is still valid or due to be deleted.

When are LSM-Trees used?

Finds generally are not the strongest aspect of LSM-Trees. They have been developed for scenarios where many entries are added over time, with only occasional retrieves of data being performed. An often-used example are logging scenarios, where most entries are stored and seldomly accessed afterwards. In general, finds can be performed on an LSM-Tree as one would on regular index structures like a B-Tree, and the performance is similar in this case. Some additional costs may come from reading multiple components and possibly needed filtering of the results, for example for deleted entries. Thus, the LSM-Tree does not provide any advantages for scenarios where frequent finds are needed.

As it is highly beneficial to mentioned scenarios with mostly inserts, LSM-Trees have been improved on ever since. Additional possibilities have already been explored in the paper first proposing them, and many of them are common practice today. One of the most important aspects is the possibility to build a LSM-Tree with more than two components. The core idea remains the same, and only one component is held in main memory, the remaining ones on disk. All components but one are then configured with threshold sizes and entries are migrated through them by asynchronous merge processes between them. Used correctly, multiple component may provide increased performance.

This improvement is only one of many made on the original idea of LSM-Trees. Maybe this constant improvement is another reason for why LSM-Trees are still popular today. We refer the interested reader to the original paper as well as newer ones detailing such improvements to be the judge to that [2].

The fact remains, though, that even more than 20 years after their first mention, LSM-Trees are still popular, proving that even in a field as fast-developing as Information Technologies, a good idea may prevail.

[1] https://en.wikipedia.org/wiki/Log-structured_merge-tree
[2] https://dl.acm.org/doi/10.1145/3299869.3300097