Prof. Dr. Felix Naumann

Aggregation Detection in Verbose CSV Files


Here we list the datasets and annotations used in our project. Due to licensing restrictions, only publicly distributable datasets are listed. Each link points to a compressed JSON file that contains both the verbose CSV files and their annotations.
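As a minimal sketch of how such a file might be read, the snippet below loads a compressed JSON file and reports only its top-level shape. It assumes gzip compression, which the page does not state. The file name in the usage note and the helper names load_compressed_json and summarize are hypothetical. No assumption is made about the internal layout of the annotations.

```python
import gzip
import json


def load_compressed_json(path):
    """Read a gzip-compressed JSON file and return the parsed object.

    Assumption: the archive is gzip-compressed; other formats need another opener.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(data):
    """Describe the top-level shape of the parsed JSON without assuming a schema."""
    if isinstance(data, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "dict", "size": len(data), "keys": sorted(data)[:5]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data)}
    return {"type": type(data).__name__}
```

For example, summarize(load_compressed_json("validation.json.gz")) would show whether the file is a list of entries or a keyed object, which is a reasonable first check before relying on any particular field.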

The validation dataset comprises files from the Troy and the EUSES datasets, while the unseen dataset comprises files from the SAUS and the CIUS datasets.

Dataset | # files | # aggregations | Description
Validation | 385 | 20,280 | The Statistical Abstract of the United States (SAUS) from 2010.
Unseen | 81 | 5,854 | 1000 tables collected from international statistical websites by DocLab graduate students in 2009-2010. This dataset includes a sample of 200 of these files.


Annotation Tool

Coming soon.