Hasso-Plattner-Institut25 Jahre HPI

Processing Web Tables (Summer Semester 2019)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), Leon Bornemann (Information Systems), Dr. Hazar Harmouch (Information Systems)
Course website: https://hpi.de/naumann/teaching/teaching/ss-19/processing-web-tables.html

General Information

  • Semester hours per week: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment deadline: 26.04.2019
  • Teaching form: Lecture / Seminar
  • Enrollment type: Compulsory elective module
  • Course language: English
  • Maximum number of participants: 6

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools
Data Engineering MA

Description

Tables on the web are a significant source of structured information. In a large-scale crawling effort in 2008, Cafarella et al. extracted 14.1 billion tables from billions of HTML webpages. While many webtables are used for layout purposes only, many others contain high-quality, structured information. Cafarella et al. estimate that 154 million of the 14.1 billion tables contain relational data, i.e., they resemble database tables. Even the English version of Wikipedia alone contained more than 1 million tables as of November 2017.

However, making use of webtables automatically is challenging: the tables usually contain few records and are designed to be read by humans, not machines. The main use cases of webtables are:

  • Knowledge base augmentation or creation (extracting structured information in RDF form)

  • Searching or querying large table corpora (finding relevant tables in response to a given textual or table query)

  • Table join (finding the set of tables joinable with a query table)

  • ...
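
To make the table join use case more concrete, the following is a minimal sketch of a joinable-table search. It assumes that a candidate column counts as joinable when it contains a sufficiently large fraction of the distinct values of the query column (a containment score). The names `containment` and `joinable_tables`, the in-memory corpus layout, and the threshold of 0.5 are illustrative assumptions and are not taken from the seminar material.

```python
from typing import Dict, List, Tuple

# Assumed toy representation: a table maps column names to lists of cell values.
Table = Dict[str, List[str]]


def containment(query_values: List[str], candidate_values: List[str]) -> float:
    """Fraction of distinct query values that also occur in the candidate column."""
    query_set = {v.strip().lower() for v in query_values if v.strip()}
    if not query_set:
        return 0.0
    candidate_set = {v.strip().lower() for v in candidate_values if v.strip()}
    return len(query_set & candidate_set) / len(query_set)


def joinable_tables(
    query_column: List[str],
    corpus: Dict[str, Table],
    threshold: float = 0.5,  # hypothetical cut-off, chosen for illustration only
) -> List[Tuple[str, str, float]]:
    """Return (table name, column name, score) for columns above the threshold."""
    results = []
    for table_name, table in corpus.items():
        for column_name, values in table.items():
            score = containment(query_column, values)
            if score >= threshold:
                results.append((table_name, column_name, score))
    # Best matches first.
    return sorted(results, key=lambda r: r[2], reverse=True)


corpus = {
    "capitals": {
        "country": ["Germany", "France", "Spain"],
        "capital": ["Berlin", "Paris", "Madrid"],
    },
    "cities": {"city": ["Berlin", "Potsdam"]},
}
query = ["germany", "France", "Italy", "Spain"]
matches = joinable_tables(query, corpus)
```

This brute-force scan compares the query column against every column in the corpus, which would likely be too slow for a corpus of web scale; a scalable solution would need some form of indexing.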

These use cases require solutions to many different problems, including, but not limited to:

  • Detection of genuine (relational) tables

  • Header Detection (Row/Rows/Column/Columns)

  • Schema Normalization

  • ...
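
As a rough illustration of the first problem, the sketch below separates genuine (relational) tables from layout tables with three hand-picked heuristics: a minimum size, a consistent number of cells per row, and short cell contents. The function name `looks_relational`, the list-of-rows input format, and the length limit of 50 characters are assumptions made for this example; published approaches typically rely on richer features and learned classifiers.

```python
from typing import List


def looks_relational(rows: List[List[str]], max_avg_cell_length: int = 50) -> bool:
    """Heuristic guess whether an extracted table holds relational data."""
    # Layout tables are often tiny: require at least 2 rows and 2 columns.
    if len(rows) < 2 or len(rows[0]) < 2:
        return False
    # A relational table has the same number of cells in every row.
    if any(len(row) != len(rows[0]) for row in rows):
        return False
    # Long text blocks in cells hint at layout usage rather than records.
    cells = [cell for row in rows for cell in row]
    avg_length = sum(len(cell) for cell in cells) / len(cells)
    return avg_length <= max_avg_cell_length


relational = [["Country", "Capital"], ["Germany", "Berlin"], ["France", "Paris"]]
layout = [["Navigation menu " * 20], ["Main page content " * 40]]
ragged = [["a", "b"], ["c"]]
```

Here `looks_relational(relational)` evaluates to `True`, while the single-column `layout` table and the `ragged` table are both rejected.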

In this seminar, we will introduce you to the research area of webtables. Each team, ideally consisting of two people, will implement a solution for one of the above-mentioned problems (or any other relevant problem in the research area of webtables that the team finds interesting). We will provide you with state-of-the-art papers that propose solutions to these problems. You will implement these solutions in a scalable way and can try to improve upon them with your own ideas.

Dates

See chair webpage
