Table Representation Learning

Francesco Pugnaloni, Lukas Laskowski, Christoph Hönes, Prof. Dr. Felix Naumann

Introduction

Representation learning (RL) aims to find meaningful representations of given objects to make them easier to process or understand. It finds application in various areas, e.g., cybersecurity, healthcare, time-series analysis, natural language processing, audio processing, and table understanding, and can be used to process data in different modalities, e.g., images, text, audio, or tabular data.

Since the rise of foundation models, finding compact and uniform representations of different data modalities has become more important than ever. While text and images have strong, well-established representation methods, tabular data was overlooked until recently. The research area that aims to fill this gap is called table representation learning (TRL); it extracts meaningful information from tabular data to create expressive vector representations.
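To give a first impression of what a vector representation of tabular data can look like, here is a minimal Python sketch. It is not one of the models covered in the seminar: the hashing-based embedding, the function names (embed_column, cosine, match_columns), the dimension DIM, and the toy tables are all assumptions made purely for illustration. Each column is mapped to a fixed-size vector by hashing the character trigrams of its cell values, and columns of two tables are then matched by cosine similarity, which is a very simple form of schema matching.

```python
import math
import zlib

DIM = 256  # embedding size; an arbitrary choice for this sketch


def embed_column(values, dim=DIM):
    """Hash the character trigrams of all cell values into an L2-normalized vector."""
    vec = [0.0] * dim
    for value in values:
        text = f"  {str(value).lower()}  "  # pad so that short values still yield trigrams
        for i in range(len(text) - 2):
            # crc32 is deterministic across runs, unlike Python's built-in hash()
            bucket = zlib.crc32(text[i:i + 3].encode("utf-8")) % dim
            vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def match_columns(source, target):
    """For each source column, return the name of the most similar target column."""
    target_vecs = {name: embed_column(vals) for name, vals in target.items()}
    matches = {}
    for name, vals in source.items():
        vec = embed_column(vals)
        matches[name] = max(target_vecs, key=lambda t: cosine(vec, target_vecs[t]))
    return matches


# Hypothetical toy tables: same content, different column names and order.
customers = {"city": ["Berlin", "Potsdam", "Hamburg"],
             "zip": ["10115", "14482", "20095"]}
clients = {"postal_code": ["10115", "14482", "20095"],
           "town": ["Berlin", "Potsdam", "Hamburg"]}

print(match_columns(customers, clients))
```

The sketch only looks at surface-level character overlap between cell values. Learned approaches such as the graph-based, LLM-based, or word-embedding-based archetypes mentioned below aim to capture semantics that such a hand-crafted encoding misses.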

If you are interested in participating, please attend the initial session and contact lukas.laskowski(at)hpi.de, francesco.pugnaloni(at)hpi.de, and Christoph.Hoenes(at)hpi.de by October 19 (EOD). You can withdraw from this seminar without consequences until November 9, when we assign the team topics.

Goals

In this seminar, we will introduce you to the field of table representation learning, and explore together how different approaches perform in classic table-related tasks. To achieve that, we have the following plan:

Team activities: Each team ideally consists of 2 students and will be assigned a specific TRL archetype, e.g., graph-based, LLM-based, or word-embedding-based. Your task is to choose one or more representative models from those proposed, implement them, and use them to solve classic table-related tasks, e.g., entity resolution or schema matching.
Deliverable: The outcome of the seminar is a paper-style technical report that the teams will write collaboratively to present the results of the conducted analysis, in addition to the code, models, and datasets that have been produced.
Bonus: You will learn how to read/write a research paper and how to conduct scientific experiments and present the results in a paper.

Slides

15.10. Initial Meeting

Organization

Prerequisites

Python
Basic knowledge of machine learning and deep learning

Organization

The organizational details for this seminar are as follows:

Project seminar for master's students
Language of instruction: English
6 credit points, 4 SWS
At most 6 participants (ideally, 3 teams of 2 students each)

Grading

In the seminar, each team will develop an approach and write a short report. The final grade consists of the following four parts:

Approach (35%)
Written report (35%)
Midterm presentation (10%)
Final presentation (20%)

Modules

IT-Systems Engineering MA

HPI-OSIS-(K/T/S)

Data Engineering MA

HPI-DANA-(K/T/S)
HPI-CODS-(K/T/S)

Software Systems Engineering MA

HPI-DSYS-(C/T/S)