22.02.2022

Demo Paper Accepted at SIGMOD 2022

Gerardo Vitagliano, Lucas Reisener, Lan Jiang, Mazhar Hameed, Felix Naumann

We are excited to announce that our demo paper titled "Mondrian: Spreadsheet Layout Detection" has been accepted at the Annual ACM SIGMOD/PODS Conference 2022.

Abstract

Spreadsheet datasets are valuable sources of data, but often ill-suited for machine consumption. Their unstructured nature allows users to arrange data and metadata freely in a human-readable format, often in canvas-like layouts. To extract their content, data practitioners need to resort to manual inspection and run cumbersome preparation pipelines. The Mondrian system is designed to assist users in identifying and handling multiregion layout templates: spreadsheet layouts composed of independent regions that appear repeatedly across different files. Mondrian comprises an automated approach to detect multiple regions within a single file and an algorithm that maps region layouts to graphs to compute layout similarity and identify templates. Users interact with Mondrian through a web-based visual interface that serves as a practical toolkit to handle collections of multiregion spreadsheets and enables their automated preparation.

Authors

Gerardo Vitagliano (Hasso Plattner Institute)

Lucas Reisener (Hasso Plattner Institute)

Lan Jiang (Hasso Plattner Institute)

Mazhar Hameed (Hasso Plattner Institute)

Felix Naumann (Hasso Plattner Institute)