This page lists the datasets used in our project, together with their annotations. Because of licensing restrictions, only publicly distributable datasets are included. Each link points to a compressed JSON file that contains both the verbose CSV files and their annotations.
| Dataset | # files | # lines | # non-empty cells | Description |
|---|---|---|---|---|
| SAUS | 223 | 11,598 | 157,767 | The Statistical Abstract of the United States (SAUS) from 2010. |
| CIUS | 269 | 34,556 | 367,172 | The Crime In the US Census Bureau (CIUS) from 2007 and 2017. |
| DeEx | 444 | 77,852 | 784,229 | A mixture of files from the ENRON, FUSE, and EUSES datasets. |
| Troy | 200 | 4,348 | 23,077 | 1000 tables collected from international statistical websites by DocLab graduate students in 2009-2010. This dataset includes 200 sample files from that collection. |
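
The downloads are described as compressed JSON files, and the table reports line and non-empty-cell counts per dataset. The following is a minimal sketch of how such a file might be read and how those counts could be computed for a single CSV file. It assumes gzip compression, and the function names `load_compressed_json` and `table_stats` are hypothetical. The internal layout of the JSON is not specified here, so the sketch does not assume any particular keys, and the counting rule may differ from the one used to produce the table.

```python
import csv
import gzip
import io
import json


def load_compressed_json(path):
    """Read a JSON document from a gzip-compressed file.

    Assumption: the archive is gzip-compressed; adjust if it is zip or another format.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def table_stats(csv_text):
    """Return (number of rows, number of non-empty cells) for one CSV string.

    A cell counts as non-empty if it contains anything other than whitespace.
    Rows are counted as parsed by the csv module, so a quoted cell spanning
    several physical lines counts as one row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    non_empty = sum(1 for row in rows for cell in row if cell.strip())
    return len(rows), non_empty


# Small inline example: three rows, four cells with content.
print(table_stats("a,b,\n,,\n1,,2\n"))  # → (3, 4)
```

Summing `table_stats` over every CSV file in a dataset would give totals comparable to the "# lines" and "# non-empty cells" columns, provided the same definition of an empty cell is used.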