Hasso-Plattner-Institut: 25 Years of HPI

Table Recognition (Summer Semester 2021)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), Gerardo Vitagliano (Information Systems)
Course website: https://hpi.de/naumann/teaching/current-courses/ss-21/table-recognition.html

General Information

  • Contact hours per week (SWS): 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 18.03.2021 - 09.04.2021
  • Teaching format: Seminar
  • Module type: Compulsory elective module
  • Language of instruction: English
  • Maximum number of participants: 6

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
    • HPI-OSIS-S Specialization
    • HPI-OSIS-T Techniques and Tools
Data Engineering MA


Structured files, such as spreadsheets, are valuable sources of data, but they are often ill-suited for machine consumption. Although spreadsheets organize cells in a grid, the data they contain is often arranged in a free layout with no clearly defined tabular structure. Worse, a single spreadsheet may contain several independent table regions that end users interested in their content must ultimately recognize and merge by hand. For automated data preparation, extraction, and integration, it is therefore highly valuable to recognize the presence and layout of regions, especially tables, within a spreadsheet.
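To make the notion of "regions" concrete, here is a minimal sketch of a naive baseline, assuming a sheet has already been loaded into a 2D list of cell values (`None` or blank strings for empty cells). It is not one of the seminar's state-of-the-art methods: it simply groups non-empty cells that touch (including diagonally) into connected regions and reports their bounding boxes. The function name `find_regions` and the example sheet are hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical bounding box: (top_row, left_col, bottom_row, right_col), 0-based, inclusive.
Box = Tuple[int, int, int, int]


def find_regions(grid: List[List[object]]) -> List[Box]:
    """Naive baseline (illustrative only): bounding boxes of connected non-empty cells.

    Cells count as connected if they touch horizontally, vertically, or
    diagonally (8-connectivity). None and whitespace-only strings are empty.
    """
    rows = len(grid)
    cols = max((len(row) for row in grid), default=0)

    def filled(r: int, c: int) -> bool:
        # Rows may be ragged; cells beyond a row's end are treated as empty.
        if c >= len(grid[r]):
            return False
        value = grid[r][c]
        return value is not None and str(value).strip() != ""

    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not filled(r, c):
                continue
            # Breadth-first flood fill starting from an unvisited filled cell.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr][nc] and filled(nr, nc)):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical sheet: a title, a small table, and a separate column of notes.
sheet = [
    ["Sales 2020", None, None, None, None],
    [None, None, None, None, None],
    ["Region", "Q1", "Q2", None, None],
    ["North", 10, 12, None, "Notes"],
    ["South", 8, 9, None, "draft"],
]
print(find_regions(sheet))  # [(0, 0, 0, 0), (2, 0, 4, 2), (3, 4, 4, 4)]
```

This baseline finds the title, the table, and the notes as three regions only because blank rows and columns happen to separate them; if the title sat directly above the header row, it would be merged into the table. That fragility is one reason layout alone is not enough to recognize tables reliably.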

Table recognition is a well-known problem, tackled by different researchers in various domains and under different assumptions. In this seminar, we will introduce you to the research area of table recognition in spreadsheet files. Each team, ideally consisting of two students, will explore, implement, and potentially improve on a different solution for detecting and extracting tables from spreadsheet files.

We will provide you with state-of-the-art papers that propose solutions to this problem. You will implement these solutions in a scalable way and try to improve upon them with your own ideas. We will provide thousands of spreadsheet files for testing and evaluation.

Please see the course website for more details.