Prof. Dr. Felix Naumann

Structure Detection in Verbose CSV Files


Here we list the datasets used in our project, together with their annotations. Because of licensing restrictions, only publicly distributable datasets are listed. Each link points to a compressed JSON file that includes both the verbose CSV files and their annotations.

Dataset | # files | # lines | # non-empty cells | Description
SAUS | 223 | 11,598 | 157,767 | The Statistical Abstract of the United States (SAUS) from 2010.
CIUS | 269 | 34,556 | 367,172 | The Crime In the US Census Bureau (CIUS) from 2007 and 2017.
DeEx | 444 | 77,852 | 784,229 | A mixture of files from ENRON, FUSE, and EUSES datasets.
Troy | 200 | 4,348 | 23,077 | 1000 tables collected from international statistical websites by DocLab graduate students in 2009-2010. This dataset includes a sample of 200 of these files.
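
As a minimal sketch of how one of the downloaded files might be opened, the Python snippet below assumes the archive is gzip-compressed and holds a single JSON document. Neither the compression format nor the JSON schema is specified on this page, so the helper names (load_compressed_json, summarize) and the file name in the usage comment are hypothetical, and the summary inspects only the top-level shape without assuming any field names.

```python
import gzip
import json


def load_compressed_json(path):
    """Parse a gzip-compressed JSON file and return the resulting object.

    Assumption: the archive is gzip and contains one JSON document.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(obj):
    """Describe the top-level shape of parsed JSON without assuming a schema."""
    if isinstance(obj, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "object", "size": len(obj), "keys": sorted(obj)[:5]}
    if isinstance(obj, list):
        return {"type": "array", "size": len(obj)}
    return {"type": type(obj).__name__}


# Hypothetical usage, with a placeholder file name:
# data = load_compressed_json("SAUS.json.gz")
# print(summarize(data))
```

Inspecting the top-level structure first is a cautious way to find out how the CSV content and the annotations are organized before writing any code that depends on particular keys.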


Annotation Tool

Coming soon.