Hasso-Plattner-Institut
Prof. Dr. Felix Naumann

Data changes all the time. In this project, we want to explore and understand those changes. We call this activity change exploration: for a given dynamic dataset, we want to efficiently capture and summarize changes at both the instance and the schema level, and enable users to explore these changes effectively in an interactive, graphical fashion.

Change Cube

We choose a generic model to represent changes to a dataset. It includes the following four dimensions to represent what changed where, when, and how:

  1. Time
  2. Entity (ID)
  3. Property
  4. Value

A change c is a quadruple of the form

<Time, ID, Property, Value> or in brief <t, id, p, v>.

Its semantics is: at time t, the property p of the entity identified by id was created as, or changed to, the value v.

A change cube is a set of such changes.
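To make the model concrete, here is a minimal Python sketch (Python 3.9+), not the project's implementation. It represents a change as an immutable named tuple and a change cube as a plain set of them. It also shows how the stated semantics let you recover the value of a property at any point in time. The names `Change`, `ChangeCube`, and `value_at`, the integer timestamps, and the "at most one change per (t, id, p)" rule are all illustrative assumptions.

```python
from typing import Any, NamedTuple, Optional


class Change(NamedTuple):
    """A change <t, id, p, v>: at time t, property p of entity id was created as or changed to v."""
    t: int   # timestamp; an integer such as seconds since the epoch is assumed here
    id: str  # entity identifier
    p: str   # property name
    v: Any   # new value (must be hashable so that changes can live in a set)


# A change cube is a set of such quadruples.
ChangeCube = set[Change]


def value_at(cube: ChangeCube, entity: str, prop: str, time: int) -> Optional[Any]:
    """Return the value of `prop` for `entity` as of `time`.

    This is the value of the most recent change with t <= time, or None if the
    property had not been created yet. Assumes at most one change per (t, id, p).
    """
    relevant = [c for c in cube if c.id == entity and c.p == prop and c.t <= time]
    if not relevant:
        return None
    return max(relevant, key=lambda c: c.t).v


# Usage with toy data (hypothetical values):
cube: ChangeCube = {
    Change(1, "e1", "name", "a"),
    Change(5, "e1", "name", "b"),
}
print(value_at(cube, "e1", "name", 3))  # a
print(value_at(cube, "e1", "name", 7))  # b
print(value_at(cube, "e1", "name", 0))  # None
```

Because a change cube is just a set, it has no fixed schema. A change to an infobox attribute and a change to a table column fit into the same structure.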

For more details on our data model, see our vision paper at ExploreDB'17:

Enabling Change Exploration
Tobias Bleifuß, Theodore Johnson, Dmitri V. Kalashnikov, Felix Naumann, Vladislav Shkapenyuk, and Divesh Srivastava
In Proceedings of the Fourth International Workshop on Exploratory Search in Databases and the Web (ExploreDB), 2017 (accepted)

Exploration Tool

For initial exploration, we devised a set of query primitives on change cubes. We use them to understand which types of changes occur in practice, so that we can develop tools that help users find the most interesting changes among all of them.
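The primitives themselves are not listed on this page. As a rough illustration of what such operations on a change cube might look like, the sketch below defines two hypothetical ones. `select` filters changes by a predicate on any of the four dimensions. `count_by` groups the remaining changes along one or more dimensions and counts them. Both names and signatures are assumptions for illustration only, not the tool's API.

```python
from collections import Counter
from typing import Any, Callable, NamedTuple, Optional


class Change(NamedTuple):
    t: int   # timestamp (assumed integer)
    id: str  # entity identifier
    p: str   # property
    v: Any   # value


Predicate = Optional[Callable[[Any], bool]]


def select(cube, time: Predicate = None, entity: Predicate = None,
           prop: Predicate = None, value: Predicate = None) -> set:
    """Hypothetical primitive: keep changes whose dimensions satisfy the given
    predicates; a dimension whose predicate is None is not filtered."""
    preds = {"t": time, "id": entity, "p": prop, "v": value}
    return {c for c in cube
            if all(f is None or f(getattr(c, dim)) for dim, f in preds.items())}


def count_by(cube, *dims: str) -> Counter:
    """Hypothetical primitive: count changes grouped by the named dimensions
    ("t", "id", "p", "v"); keys are tuples with one entry per dimension."""
    return Counter(tuple(getattr(c, d) for d in dims) for c in cube)
```

Composing the two, for example `count_by(select(cube, prop=lambda p: p == "leader_name"), "id")`, would rank entities by how often a single property changed. That kind of aggregate can point a user to a small, interesting part of a large cube.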

We have also developed a tool that implements these query primitives. The two videos below show how this tool can lead to interesting findings. In the first video, we detect disagreements in the (settlement) infobox entries of the English Wikipedia. In particular, editors do not seem to agree on whether the leader_name of the city 'Chicago' should be updated after the election or after the inauguration. In the second video, we follow the traces of Chicago again, but this time we look at changes that contain 'Chicago' in the value dimension. Here we find many changes on the same day that update the 'subdivision_name3' of various locations in Chicago. Further investigation reveals that on that day two infobox templates (community area and settlement) were merged.
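The two explorations can be approximated with simple queries over a change cube. The following is a hedged sketch, not the procedure used in the videos. `reverts` counts, per entity and property, how often a value returns to an earlier one, which is one heuristic signal of disagreement. `value_mentions_by_day` counts changes whose value contains a search string, grouped by UTC day and property, so that bursts such as a template merge stand out. The function names, the Unix-timestamp convention, and the revert heuristic are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from typing import Any, NamedTuple


class Change(NamedTuple):
    t: int   # Unix timestamp in seconds (assumption)
    id: str  # entity identifier
    p: str   # property
    v: Any   # value


def reverts(cube) -> Counter:
    """Per (entity, property), count changes that restore a previously seen value.
    Repeated back-and-forth edits hint at disagreement (a heuristic)."""
    history = defaultdict(list)
    for ch in sorted(cube, key=lambda ch: ch.t):  # replay changes in time order
        history[(ch.id, ch.p)].append(ch.v)
    counts = Counter()
    for key, values in history.items():
        seen, prev = set(), None
        for v in values:
            if v in seen and v != prev:  # back to an older value, not a no-op repeat
                counts[key] += 1
            seen.add(v)
            prev = v
    return counts


def value_mentions_by_day(cube, needle: str) -> Counter:
    """Count changes whose (string) value contains `needle`, grouped by (UTC day, property)."""
    counts = Counter()
    for ch in cube:
        if isinstance(ch.v, str) and needle in ch.v:
            day = datetime.fromtimestamp(ch.t, tz=timezone.utc).date()
            counts[(day, ch.p)] += 1
    return counts
```

On a real cube, `reverts(cube).most_common(10)` would list the most contested entity/property pairs. `value_mentions_by_day(cube, "Chicago").most_common(10)` would surface days on which many changes with the same property mention the search string.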

Demo Video 1

Demo Video 2

Contact

For further information on this project please contact Tobias Bleifuß.