Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Data changes all the time. In this project, we want to explore and understand those changes. We call this activity change exploration: for a given dynamic dataset, we want to efficiently capture and summarize changes at both the instance and the schema level, and enable users to explore these changes effectively in an interactive and graphical fashion.

Change-cube

We choose a generic model to represent changes to a dataset. It includes the following four dimensions to represent what changed where, when, and how:

  1. Time
  2. Entity (ID)
  3. Property
  4. Value

A change c is a quadruple of the form

<Time, ID, Property, Value> or in brief <t, id, p, v>.

Its semantics are as follows: at time t, the property p of the entity identified by id was created as, or changed to, v.

A change-cube is a set of such changes.
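The model above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the class name `Change`, the helper `value_at`, the use of integer timestamps, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Change:
    """One change <t, id, p, v> (field names follow the brief form)."""
    t: int   # time; a plain integer timestamp is an assumption
    id: str  # entity identifier
    p: str   # property
    v: str   # value the property was created as or changed to


def value_at(cube: Set[Change], entity_id: str, prop: str, t: int) -> Optional[str]:
    """Return the value of `prop` for `entity_id` as of time `t`.

    Returns None if the property has no change at or before `t`.
    If several changes share the latest timestamp, one of them is picked
    arbitrarily; the model text does not say how to break such ties.
    """
    relevant = [c for c in cube if c.id == entity_id and c.p == prop and c.t <= t]
    if not relevant:
        return None
    return max(relevant, key=lambda c: c.t).v


# A change-cube is simply a set of changes (invented sample data).
cube = {
    Change(1, "e1", "name", "Alice"),
    Change(5, "e1", "name", "Alicia"),
    Change(3, "e2", "name", "Bob"),
}

print(value_at(cube, "e1", "name", 4))
```

Because each change records the new value, the state of any property at any point in time can be recovered by taking the most recent change at or before that time.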

For more details on our data model see our vision paper published at PVLDB:

  • Bleifuß, Tobias, Leon Bornemann, Theodore Johnson, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. Exploring Change - A New Dimension of Data Analytics. Proceedings of the VLDB Endowment (PVLDB), 12(2):85–98, 2018.

Exploration Tool

For initial exploration, we devised a set of query primitives on change-cubes. We use these primitives to understand which types of changes can be observed, so that we can develop tools that help users find the most interesting changes among them.
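The primitives themselves are not listed here. To give a rough idea of what a query primitive on a change-cube could look like, the sketch below defines two hypothetical ones: `slice_cube`, which keeps only the changes matching a predicate, and `split_by`, which partitions a cube along one of its four dimensions. Both names, their signatures, and the sample data are assumptions for illustration and are not taken from the project.

```python
from collections import defaultdict
from typing import Callable, Dict, NamedTuple, Set


class Change(NamedTuple):
    t: int   # time
    id: str  # entity identifier
    p: str   # property
    v: str   # value


def slice_cube(cube: Set[Change], keep: Callable[[Change], bool]) -> Set[Change]:
    """Hypothetical primitive: the sub-cube of changes for which `keep` is true."""
    return {c for c in cube if keep(c)}


def split_by(cube: Set[Change], dimension: str) -> Dict[object, Set[Change]]:
    """Hypothetical primitive: partition a cube by one dimension ('t', 'id', 'p' or 'v')."""
    groups = defaultdict(set)
    for c in cube:
        groups[getattr(c, dimension)].add(c)
    return dict(groups)


# Invented sample data.
cube = {
    Change(1, "e1", "name", "A"),
    Change(2, "e1", "size", "10"),
    Change(3, "e2", "size", "20"),
}

size_changes = slice_cube(cube, lambda c: c.p == "size")
per_entity = split_by(size_changes, "id")
```

Since both functions take and return plain sets of changes, they compose freely: the result of one can be fed directly into the other.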

We also developed a tool that implements these query primitives. The two demo videos below show how this tool can lead to interesting findings. In the first video, we detect disagreements in the settlement infobox entries of the English Wikipedia. In particular, users do not seem to agree on whether the leader_name of the city 'Chicago' should be updated after the election or after the inauguration. In the second video, we follow the traces of Chicago again, but this time we look at changes that contain 'Chicago' in the value dimension. Here we find many changes on a single day that update the 'subdivision_name3' of various locations in Chicago. Further investigation reveals that two infobox templates (community area and settlement) were merged on that day.
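One simple way to surface the kind of disagreement seen in the first video is to look for properties whose value keeps returning to something it held before. The sketch below is a hypothetical heuristic for that, not the method used by the tool: the function `count_reverts`, its definition of a revert, and the sample data (including the placeholder values "mayor A" and "mayor B") are all invented for this example.

```python
from collections import defaultdict
from typing import Dict, NamedTuple, Set, Tuple


class Change(NamedTuple):
    t: int   # time
    id: str  # entity identifier
    p: str   # property
    v: str   # value


def count_reverts(cube: Set[Change]) -> Dict[Tuple[str, str], int]:
    """Count, per (id, property), the changes that restore an earlier value.

    A change counts as a revert if its value was already held at some
    earlier time and differs from the immediately preceding value.
    Only fields with at least one revert appear in the result.
    """
    history = defaultdict(list)
    for c in sorted(cube, key=lambda c: c.t):
        history[(c.id, c.p)].append(c.v)

    reverts = {}
    for key, values in history.items():
        seen, prev, n = set(), None, 0
        for v in values:
            if v in seen and v != prev:
                n += 1
            seen.add(v)
            prev = v
        if n:
            reverts[key] = n
    return reverts


# Invented sample data: one field flips back and forth, one evolves normally.
cube = {
    Change(1, "Chicago", "leader_name", "mayor A"),
    Change(2, "Chicago", "leader_name", "mayor B"),
    Change(3, "Chicago", "leader_name", "mayor A"),
    Change(4, "Chicago", "leader_name", "mayor B"),
    Change(1, "Chicago", "population", "1"),
    Change(5, "Chicago", "population", "2"),
}

reverts = count_reverts(cube)
```

Fields with a high revert count are candidates for closer inspection; whether they reflect a genuine disagreement still has to be judged by looking at the individual changes.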

We demonstrated the tool, DBChEx, at CIDR 2019:

  • Bleifuß, Tobias, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. DBChEx: Interactive Exploration of Data and Schema Change. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2019.

Demo Video 1

Demo Video 2

Contact

For more information on this project, please contact Prof. Felix Naumann, Tobias Bleifuß, or Leon Bornemann.