Bachelor-Project ProCSIA: Column Store Benchmarks
Fabian Tschirschnitz, January 6, 2011
Authors: Philipp Langer, Florian Westphal, Marian Gawron, Andreas Henning, Fabian Tschirschnitz, Patrick Schulze, Gary Yao, Michael Wolowyk
Introduction
Roughly one week ago the bachelor project team “ProCSIA” (Profiling Column Stores with IBM’s Information Analyzer) evaluated six mainly column-oriented DBMSs in order to find out which one of them works fastest on a given OLAP workload and on simple profiling operations.
These were mostly open source projects (in the case of Sybase IQ 15.2 a free evaluation copy was used):
This test was conducted using a TPC-H benchmark (size: 1 GiB) as well as our own simple benchmark for data profiling-like queries. The second benchmark contains column-based operations, such as calculating a minimum (select min attr from table) or finding out about a column’s frequency distribution (select attr, count(*) from table group by attr). The TPC-H SQL queries as well as the queries used in our own benchmark are attached at the end of this blog post.
Please be aware of the fact that the trial versions of the DBMSs listed above are not comprehensive enough to make solid comparisons of actual performance. However, these trials provided us a hint on which column stores are suited best for our profiling tasks.
We ran the benchmark on a Dell Optiplex 745 with an Intel Core 2 CPU 6600 @ 2,40 GHz and 4 GB of RAM of which 3,25 GB could actually be used on a 32-bit operating system.
DISCLAIMER: The bachelor project team performed these tests to the best of their knowledge. However, they are no many-years database performance tweaking engineers. Also, they aimed to find out how fast the systems are in a none-tweaked mode.
Results
Benchmark 1
We did not execute the TPC-H benchmark on MetaKit as it does not provide an SQL interface. In the figures below you can see how each DBMS performed (for detailed information about the queries, see the end of this blog post). It is interesting to see how different DBMSs perform on this small amount of data. We plan to test the DBMSs with a 100 GiB TPC-H database to find out about scaling issues.
Figure 1: TPC-H 1G load times