Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Repeatability: Cardinality Estimation: An Experimental Survey

This is a repeatability page for the paper:

 Harmouch, H., Naumann, F.: Cardinality Estimation: An Experimental Survey. Proceedings of the VLDB Endowment (PVLDB). pp. 499 - 512 (2017).

The algorithms are provided in the state their results have been published, but they may not represent the most recent version of their implementations.

Metanome: How To`s

Materials

Algorithms

The following twelve algorithms are the most popular and well-know cardinality estimation algorithms:

Flajolet and Martin (FM)  

1985           jar           code

Probabilistic counting with stochastic averaging(PCSA)

1985 jar code

Linear Counting (LC

1990 jar code

Alon, Martias and Szegedy (AMS)

1996 jar code

Baryossef, Jayram, Kumar, Sivakumar and Trevisan(BJKST)           

2002 jar code

LogLog

2003 jar code

SuperLogLog

2003 jar code

MinCount 

2005 jar code

AKMV

2007 jar code

HyperLogLog 

2008 jar code

Bloom Filters

2010 jar code

HyperLogLog++

2013 jar code

Baseline used a hash table

  jar     

code  

GEE Sampling-Based       

   jar 

 code

 

 

Datasets

All algorithms have been exhaustively tested on the following datasets:

Dataset #Attributes            #Tuples

NCVoter

25 (of 71)  7,560,886

Openadresses-Europe            

11 93,849,474


As well as 90 synthetic datasets were generated by the Mersenne Twister random number generator.

 

 

Contact

For any question, please contact Hazar Harmouch.