Our recent paper compares and evaluates eight index selection algorithms along different dimensions, such as solution quality, runtime, and complexity. Afterward, we assess strengths and weaknesses before we give recommendations on when to use which approach. Our extensible evaluation platform allows reproducing all results. Please contact Jan Kossmann, Stefan Halfpap, or Rainer Schlosser for questions.
Abstract:
Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and chal- lenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main chal- lenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations re- sulting from workloads containing many queries and massive schemata with possibly thousands of attributes.
In this work, we describe and analyze eight index selec- tion algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granular- ity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach.