Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Tracking Data Evolution through Wikipedia Page Revisions

A considerable amount of useful information on Wikipedia is encoded as tables. An extensive corpus of prior work addresses the problem of making these tables interpretable by algorithms even though they are designed to be human-readable. Most of these works focus only on a snapshot of these tables at a certain point in time. However, the evolution of these tables over time represents valuable information that has barely been tapped, enabling various applications including visual change exploration and trust assignment. To realize the full potential of this information, it is critical to match tables and cells within tables across page revisions.

Datasets

Here we provide the gold standard as well as the output of our matching algorithm for both table and cell level matching.

Table matching:

  • Identity graph

  • Lineage graph silver standard (Version 0 ZIP)

Cell matching:

  • Gold standard (Version 0 ZIP)
  • Output

    • Structured (Version 0 ZIP)
    • Individual (Version 0 ZIP)

Version 1.0 will be available at the time of publication of our corresponding paper.