The Janus (IANVS) Project

Data change, all the time. In this project we want to explore and understand those changes. We call this activity change exploration: For a given, dynamic dataset, we want to efficiently capture and summarize changes at instance-, and schema-level, enable users to effectively explore this change in an interactive and graphical fashion and analyze patterns in the changing data.

The art of exploration is to preserve order amid change and to preserve change amid order. (adapted from Alfred North Whitehead)

Change-cube

We choose a generic model to represent changes to a dataset. It includes the following four dimensions to represent what changed where, when, and how:

Time
Entity (ID)
Property
Value

A change c is a quadruple of the form

<Time, ID, Property, Value> or in brief <t, id, p, v>.

Its semantics is: At time t the property p of the entity identified with id was created as or changed to v. A change-cube is a set of such changes. For more details on our data model see our vision paper at VLDB 2019 (see below).

Sources

Code Repositories:
- Change Clustering Framework: Framework to cluster changes represented in a change cube
- IMDB Parser: Parser and Scraper for the data in the IMDB semi-structured text format (pre 2018)
- Natural Key Discovery in Wikipedia Tables: Supervised Learning Approach for the discovery of natural keys (entity identifiers) in relational Wikipedia tables.
- Matching Roles from Temporal Data: The complete CBRM framework to discover role matchings in temporal fact data is linked on the project page.
Datasets:
- Matching Roles from Temporal Data: All datasets relevant for this project can be found the project page.
- Structured Object Matching Across Web Page Revisions:
  - Matching of Infoboxes, Lists and Tables (Download)
  - Source-Code (Download)
- Natural Keys in Wikipedia Table Histories: 1000 Wikipedia Table Histories with annotated natural keys
  - All histories of relational Wikipedia tables with programatically annotated natural keys: Due to its large size we provide this dataset only upon request.
- Other relevant datasets
  - Wikipedia Revisions
  - IMDB pre 2018
Tools:
- DBChex

Team

Project lead: Prof. Felix Naumann
Doctoral researchers: Tobias Bleifuß and Leon Bornemann
In collaboration with: Dmitri V. Kalashnikov, and Divesh Srivastava – AT&T Labs - Research

Former members

Student assistant: Joana Bergsiek, Kshitij Kumar, Hung Nguyen
Collaborators: Theodore Johnson – AT&T Labs - Research

Publications

[1]Bornemann, Leon, Tobias Bleifuß, Dmitri V. Kalashnikov, Fatemeh Nargesian, Felix Naumann, and Divesh Srivastava. Matching Roles from Temporal Data: Why Joe Biden is Not Only President, but Also Commander-in-Chief. Proceedings of the ACM on Management of Data (PACMMOD). 1(1):1–26, 2023. DOI:https://doi.org/10.1145/3588919.

[ Abstract ] [ BibTeX ] [ URL ] [ Download ]

[2]Barth, Malte, Tibor Bleidt, Martin Büßemeyer, Fabian Heseding, Niklas Köhnecke, Tobias Bleifuß, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. Detecting Stale Data in Wikipedia Infoboxes. In Proceedings of the International Conference on Extending Database Technology (EDBT), 2023.

[ Abstract ] [ BibTeX ] [ URL ] [ Download ]

[3]Bleifuss, Tobias, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. The Secret Life of Wikipedia Tables. In Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEAData), co-located with VLDB, 2021.

[ Abstract ] [ BibTeX ] [ Download ]

[4]Bleifuß, Tobias, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. Structured Object Matching across Web Page Revisions. In IEEE International Conference on Data Engineering (ICDE), pages 1284–1295, 2021.

[ Abstract ] [ BibTeX ] [ URL ] [ Download ]

[5]Bornemann, Leon, Tobias Bleifuß, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. Natural Key Discovery in Wikipedia Tables. In Proceedings of The World Wide Web Conference (WWW), pages 2789–2795, 2020.

[ Abstract ] [ BibTeX ] [ URL ] [ Download ]

[6]Bleifuß, Tobias, Leon Bornemann, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. DBChEx: Interactive Exploration of Data and Schema Change. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2019.

[ Abstract ] [ BibTeX ] [ URL ] [ Download ]

[7]Bleifuß, Tobias, Leon Bornemann, Theodore Johnson, Dmitri V. Kalashnikov, Felix Naumann, and Divesh Srivastava. Exploring Change - A New Dimension of Data Analytics. Proceedings of the VLDB Endowment (PVLDB). 12(2):85–98, 2018.

[ Abstract ] [ BibTeX ] [ URL ] [ Download ]

[8]Bornemann, Leon, Tobias Bleifuß, Dmitri Kalashnikov, Felix Naumann, and Divesh Srivastava. Data Change Exploration using Time Series Clustering. Datenbank-Spektrum. 18(2):1–9, 2018. DOI:https://doi.org/10.1007/s13222-018-0285-x.

[ Abstract ] [ BibTeX ] [ URL ]

[9]Bleifuß, Tobias, Theodore Johnson, Dmitri V. Kalashnikov, Felix Naumann, Vladislav Shkapenyuk, and Divesh Srivastava. Enabling Change Exploration (Vision). In Proceedings of the Fourth International Workshop on Exploratory Search in Databases and the Web (ExploreDB), pages 1–3, 2017.

[ Abstract ] [ BibTeX ] [ URL ] [ Download ]

Student projects

Master project: Vandalism Detection in Wikipedia Table Revisions
Bachelor project: Unit Testing Data for Machine Learning (with Amazon Research Berlin)
Master project: Discovering Change Dependencies

The Janus (IANVS) Project

Change-cube

Sources

Team

Publications

Student projects

Chair

News

06.10.2024 | Paper accepted at EDBT 2025

06.09.2024 | Congratulations Dr. Phillip Wenig

06.09.2024 | Congratulations Dr. Mazhar Hameed!

16.07.2024 | Congratulations Dr. Leon Bornemann-Paulus!

23.05.2024 | Paper accepted at NLDB 2024

Project highlights

People and open positions