Our group includes PostDocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact Felix Naumann.

For bachelor students we offer German lectures on database systems in addition to paper- or project-oriented seminars. Within a one-year bachelor project, students finalize their studies in cooperation with external partners. For master students we offer courses on information integration, data profiling, and information retrieval enhanced by specialized seminars, master projects and we advise master theses.

Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make available most of our datasets and source code.

Please do not hesitate to reach out directly to us, if you cannot find a paper, slides, or other research artifacts.

Repeatability: Cardinality Estimation: An Experimental Survey

This is a repeatability page for the paper:

Harmouch, H., Naumann, F.: Cardinality Estimation: An Experimental Survey. Proceedings of the VLDB Endowment (PVLDB). pp. 499 - 512 (2017).

The algorithms are provided in the state their results have been published, but they may not represent the most recent version of their implementations.

Metanome: How To`s

What is Metanome?
How to use Metanome GUI?
How to compiling Metanome and the algorithms from source?
- GitHub repositories (Metanome and Profiling Algorithms).
How to repeate the experiments in the paper?
- Scripts to automate the preprocesssing of realworld datasets can be downloaded from here.
- The code used to generation of synthetic datasets and run the experiments is available here (MetanomeTestRunner).

Materials

Results.
VLDB 2018 talk.
The tool we used to plot the figures (MagicPlot Student).

Algorithms

The following twelve algorithms are the most popular and well-know cardinality estimation algorithms:

Flajolet and Martin (FM)	1985	jar	code
Probabilistic counting with stochastic averaging(PCSA)	1985	jar	code
Linear Counting (LC)	1990	jar	code
Alon, Martias and Szegedy (AMS)	1996	jar	code
Baryossef, Jayram, Kumar, Sivakumar and Trevisan(BJKST)	2002	jar	code
LogLog	2003	jar	code
SuperLogLog	2003	jar	code
MinCount	2005	jar	code
AKMV	2007	jar	code
HyperLogLog	2008	jar	code
Bloom Filters	2010	jar	code
HyperLogLog++	2013	jar	code
Baseline used a hash table		jar	code
GEE Sampling-Based		jar	code

Datasets

All algorithms have been exhaustively tested on the following datasets:

Dataset	#Attributes	#Tuples
NCVoter	25 (of 71)	7,560,886
Openadresses-Europe	11	93,849,474

As well as 90 synthetic datasets were generated by the Mersenne Twister random number generator.

Contact

For any question, please contact Hazar Harmouch.

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
E-Mail: office-naumann(at)hpi.de

To visit us, please see these directions.

Project highlights

Metanome: Big Data Profiling

Data Preparation

Janus: Change exploration

KITQAR: AI and Data Quality

Repeatability: Cardinality Estimation: An Experimental Survey

Metanome: How To`s

Materials

Algorithms

Datasets

Contact

Chair

News

06.10.2024 | Paper accepted at EDBT 2025

06.09.2024 | Congratulations Dr. Phillip Wenig

06.09.2024 | Congratulations Dr. Mazhar Hameed!

16.07.2024 | Congratulations Dr. Leon Bornemann-Paulus!

23.05.2024 | Paper accepted at NLDB 2024

Project highlights

People and open positions