Hasso-Plattner-Institut
Prof. Dr. Felix Naumann

Schema Change Recommendation for User-curated Tables

This is the reproducibility page for our paper "Schema Change Recommendation for User-curated Tables".

Abstract

The web hosts huge corpora of tables, some of which comprise millions of tables, as in the case of Wikipedia. Maintaining them can be time-consuming and, when many authors and editors are involved, also requires a great deal of coordination to ensure high-quality, complete, consistent, and readable schemata. In this work, we investigate how to provide automatic suggestions that improve the schema of web tables, that is, how to recommend schema changes. For this purpose, we derive rules from past schema changes via a lattice-based approach and then rank these rules to provide the best-fitting suggestions for each web table.


Using the entire edit history of Wikipedia tables, we compare our suggestions with the changes that editors actually performed. For 75.13% of the changes in the test set, we make a correct recommendation, namely a change that was also observed on Wikipedia. In 58.66% of the cases, our recommendation even covers the entire observed change. Finally, our ranking of the recommendations achieves a mean reciprocal rank (MRR) of 0.73 for matches and 0.69 for full matches.
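For readers unfamiliar with the metric, the following is a minimal sketch of how a mean reciprocal rank can be computed. It is not taken from our repository: the function name `mean_reciprocal_rank`, the input format (one 1-based rank of the first correct recommendation per test case, or `None` if no recommendation matched), and the choice to count unmatched cases as 0 are assumptions made for illustration. The evaluation in the paper may handle unmatched cases differently.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_hit_ranks: Sequence[Optional[int]]) -> float:
    """Return the mean of 1/rank over all test cases.

    Each entry is the 1-based rank of the first correct recommendation
    for one test case, or None if no recommendation was correct.
    Assumption for this sketch: a missing hit contributes 0 to the mean.
    """
    if not first_hit_ranks:
        raise ValueError("at least one test case is required")
    for rank in first_hit_ranks:
        if rank is not None and rank < 1:
            raise ValueError("ranks are 1-based and must be >= 1")
    total = sum(1.0 / rank for rank in first_hit_ranks if rank is not None)
    return total / len(first_hit_ranks)


# Hypothetical example: four test cases, one without any correct recommendation.
example_ranks = [1, 2, None, 1]
print(mean_reciprocal_rank(example_ranks))  # (1 + 0.5 + 0 + 1) / 4 = 0.625
```

Under this convention, an MRR close to 1 means that the first correct recommendation tends to appear at the top of the ranked list.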

Data

Extracted schema changes: https://my.hidrive.com/share/59srisau3i

Code

Our source code can be found on GitHub: https://github.com/tbsblfs/schemachangerec