Prof. Dr. Felix Naumann

Schema Change Recommendation for User-curated Tables

This is the reproducibility page for our paper on "Schema Change Recommendation for User-curated Tables".


The web hosts huge corpora of tables, some of which contain millions of tables, as in the case of Wikipedia. Maintaining them can be time-consuming and, when many authors and editors are involved, also requires a great deal of coordination to ensure high-quality, complete, consistent, and readable schemata. In this work, we investigate how to provide automatic suggestions that improve the schema of web tables, namely how to recommend schema changes. For this purpose, we derive rules from past schema changes via a lattice-based approach and then rank these rules to provide the best-fitting suggestions for each web table.

Making use of the entire edit history of Wikipedia tables, we are able to compare our suggestions with the changes that were actually performed by editors. We show that for 75.13% of the changes in the test set, we make a correct recommendation, namely a change that was also observed on Wikipedia. In 58.66% of the cases, our recommendation even covers the entire observed change. Finally, we rank the recommendations with a mean reciprocal rank (MRR) of 0.73 and 0.69 for matches and full matches, respectively.
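For readers unfamiliar with the metric, the following is a minimal sketch of how a mean reciprocal rank can be computed from ranked recommendation lists. It is not taken from our code base: the function name, the input format (the 1-based rank of the first correct recommendation per test case, or None if there is none), and the choice to count cases without a correct recommendation as 0 are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all test cases.

    Each entry is the 1-based rank of the first correct recommendation
    for one test case, or None if no recommendation was correct.
    Assumption for this sketch: cases without a correct recommendation
    contribute a reciprocal rank of 0.
    """
    if not first_correct_ranks:
        raise ValueError("at least one test case is required")

    total = 0.0
    for rank in first_correct_ranks:
        if rank is None:
            continue  # contributes 0 to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based and must be >= 1")
        total += 1.0 / rank
    return total / len(first_correct_ranks)


# Hypothetical example: four test cases, one without any correct recommendation.
example_ranks = [1, 2, None, 4]
example_mrr = mean_reciprocal_rank(example_ranks)  # (1 + 0.5 + 0 + 0.25) / 4
```

Under this definition, an MRR close to 1 means the first correct recommendation usually appears at the top of the ranked list.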


Extracted schema changes: https://my.hidrive.com/share/59srisau3i


Our source code can be found on GitHub: https://github.com/tbsblfs/schemachangerec