Processing Web Tables (Summer Semester 2019)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), Leon Bornemann (Information Systems), Dr. Hazar Harmouch (Information Systems)
Course website: https://hpi.de/naumann/teaching/teaching/ss-19/processing-web-tables.html

General Information

Semester hours per week: 4
ECTS: 6
Graded: Yes
Enrollment deadline: 26.04.2019
Teaching format: Lecture / Seminar
Enrollment type: Compulsory elective module
Language of instruction: English
Maximum number of participants: 6

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA

OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Concepts and Methods
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Specialization
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniques and Tools

Data Engineering MA

CODS: Complex Data Systems
- HPI-CODS-K Concepts and Methods
CODS: Complex Data Systems
- HPI-CODS-T Techniques and Tools
CODS: Complex Data Systems
- HPI-CODS-S Specialization
PREP: Data Preparation
- HPI-PREP-K Concepts and Methods
PREP: Data Preparation
- HPI-PREP-T Techniques and Tools
PREP: Data Preparation
- HPI-PREP-S Specialization

Description

Tables on the web are a significant source of structured information. In a large-scale crawling effort in 2008, Cafarella et al. extracted 14.1 billion tables from billions of HTML webpages. While many webtables are used for layout purposes only, there are still many tables that contain high-quality, structured information. Cafarella et al. estimate that 154 million of the 14.1 billion tables contain relational data, i.e., database-like tables. Even the English version of Wikipedia alone contained more than 1 million tables as of November 2017.

However, making use of webtables automatically is challenging: the tables usually contain few records and are designed to be read by humans, not machines. The main use cases of webtables are:

Knowledge base augmentation or creation (extracting structured information in RDF form)
Searching or querying large table corpora (finding relevant tables in response to a given textual or table query)
Table join (finding the set of tables joinable with a query table)
...
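
To make the table join use case more concrete, the following is a minimal sketch, not a method prescribed by the course. It assumes that tables are already parsed into column-wise lists of strings and that "joinable" means that a candidate column contains a large share of the distinct values of the query column. The names `Table`, `containment`, and `joinable_tables`, as well as the threshold of 0.5, are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed representation: a table maps each column name to its cell values.
Table = Dict[str, List[str]]


def containment(query_values: List[str], candidate_values: List[str]) -> float:
    """Fraction of distinct query values that also occur in the candidate column."""
    query_set = {v.strip().lower() for v in query_values}
    candidate_set = {v.strip().lower() for v in candidate_values}
    if not query_set:
        return 0.0
    return len(query_set & candidate_set) / len(query_set)


def joinable_tables(
    query_column: List[str],
    corpus: Dict[str, Table],
    threshold: float = 0.5,  # hypothetical cut-off, to be tuned on real data
) -> List[Tuple[str, str, float]]:
    """Return (table name, column name, score) for columns covering the query column."""
    matches = []
    for table_name, table in corpus.items():
        for column_name, values in table.items():
            score = containment(query_column, values)
            if score >= threshold:
                matches.append((table_name, column_name, score))
    # Best-covering columns first.
    return sorted(matches, key=lambda m: m[2], reverse=True)


corpus = {
    "capitals": {
        "country": ["Germany", "France", "Italy"],
        "capital": ["Berlin", "Paris", "Rome"],
    },
    "rivers": {
        "river": ["Rhine", "Seine"],
        "length_km": ["1230", "777"],
    },
}
result = joinable_tables(["germany", "France ", "Spain"], corpus)
```

This brute-force comparison touches every column in the corpus, so it only illustrates the problem definition; a scalable solution would need an index or sketching technique instead.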

The above use cases demand solutions for many different problems, which include, but are not limited to:

Detection of genuine (relational) tables
Header detection (one or more header rows or header columns)
Schema Normalization
...
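
As an illustration of the first problem, the sketch below applies a simple structural heuristic to decide whether an HTML table might be relational. It is only an assumption-laden example, not one of the approaches from the seminar papers: it assumes one top-level table per HTML snippet and treats a table as a candidate if it has no nested tables, at least two rows, at least two columns, and the same number of cells in every row. The names `TableShapeParser` and `looks_relational` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TableShapeParser(HTMLParser):
    """Counts the cells per row of the top-level table in an HTML snippet."""

    def __init__(self) -> None:
        super().__init__()
        self.cells_per_row: List[int] = []
        self.table_depth = 0
        self.has_nested_table = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
            if self.table_depth > 1:
                # Nested tables are a common sign of layout tables.
                self.has_nested_table = True
        elif tag == "tr" and self.table_depth == 1:
            self.cells_per_row.append(0)
        elif tag in ("td", "th") and self.table_depth == 1 and self.cells_per_row:
            self.cells_per_row[-1] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def looks_relational(table_html: str, min_rows: int = 2, min_cols: int = 2) -> bool:
    """Heuristic check: regular grid, no nesting, at least min_rows x min_cols."""
    parser = TableShapeParser()
    parser.feed(table_html)
    parser.close()
    rows = [n for n in parser.cells_per_row if n > 0]
    if parser.has_nested_table or len(rows) < min_rows:
        return False
    return min(rows) >= min_cols and len(set(rows)) == 1


data_table = (
    "<table><tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Berlin</td><td>3600000</td></tr>"
    "<tr><td>Potsdam</td><td>180000</td></tr></table>"
)
layout_table = (
    "<table><tr><td><table><tr><td>menu</td></tr></table></td></tr></table>"
)
```

Here `looks_relational(data_table)` returns `True` and `looks_relational(layout_table)` returns `False`. Such hand-written rules ignore merged cells (`colspan`, `rowspan`) and cell content, so a real classifier would need features beyond table shape.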

In this seminar, we will introduce you to the research area of webtables. Each team, ideally consisting of two people, will implement a solution for one of the above-mentioned problems (or any other relevant problem in the research area of webtables that they find interesting). We will provide you with state-of-the-art papers that suggest solutions to these problems. You will implement these solutions in a scalable way and can try to improve upon them with your own ideas.

Dates

See chair webpage
