Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Repeatability: Cardinality Estimation: An Experimental Survey

This is a repeatability page for the paper:

 Harmouch, H., Naumann, F.: Cardinality Estimation: An Experimental Survey. Proceedings of the VLDB Endowment (PVLDB). pp. 499 - 512 (2017).

The algorithms are provided in the state their results have been published, but they may not represent the most recent version of their implementations.

Metanome: How To`s

  • What is Metanome?
  • How to use Metanome GUI?
  • How to  compiling Metanome and the algorithms from source? 
  • How to repeate the experiments in the paper?
    • Scripts to automate the preprocesssing of realworld datasets can be downloaded from here
    • The code used to generation of synthetic datasets and run the experiments is available here (MetanomeTestRunner).

Materials

Algorithms

The following twelve algorithms are the most popular and well-know cardinality estimation algorithms:

Flajolet and Martin (FM)  

1985          jar          code

Probabilistic counting with stochastic averaging(PCSA)

1985jarcode

Linear Counting (LC

1990jarcode

Alon, Martias and Szegedy (AMS)

1996jarcode

Baryossef, Jayram, Kumar, Sivakumar and Trevisan(BJKST)           

2002jarcode

LogLog

2003jarcode

SuperLogLog

2003jarcode

MinCount 

2005jarcode

AKMV

2007jarcode

HyperLogLog 

2008jarcode

Bloom Filters

2010jarcode

HyperLogLog++

2013jarcode

Baseline used a hash table

 jar     

code  

GEE Sampling-Based       

  jar 

 code

 

 

Datasets

All algorithms have been exhaustively tested on the following datasets:

Dataset#Attributes           #Tuples

NCVoter

25 (of 71) 7,560,886

Openadresses-Europe            

1193,849,474


As well as 90 synthetic datasets were generated by the Mersenne Twister random number generator.

 

 

Contact

For any question, please contact Hazar Harmouch.