Processing Web Tables (Summer Semester 2019)
Lecturers: Prof. Dr. Felix Naumann (Information Systems), Leon Bornemann (Information Systems), Dr. Hazar Harmouch (Information Systems)
Course website:
https://hpi.de/naumann/teaching/teaching/ss-19/processing-web-tables.html
General Information
- Weekly hours per semester: 4
- ECTS: 6
- Graded: Yes
- Enrollment deadline: 26.04.2019
- Course format: Lecture / Seminar
- Enrollment type: Compulsory elective module
- Language of instruction: English
- Maximum number of participants: 6
Degree Programs, Module Groups & Modules
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Concepts and Methods
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Specialization
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniques and Tools
- CODS: Complex Data Systems
- HPI-CODS-K Concepts and Methods
- CODS: Complex Data Systems
- HPI-CODS-T Techniques and Tools
- CODS: Complex Data Systems
- HPI-CODS-S Specialization
- PREP: Data Preparation
- HPI-PREP-K Concepts and Methods
- PREP: Data Preparation
- HPI-PREP-T Techniques and Tools
- PREP: Data Preparation
- HPI-PREP-S Specialization
Description
Tables on the web are a significant source of structured information. In a large-scale crawling effort in 2008, Cafarella et al. extracted 14.1 billion tables from billions of HTML webpages. While many webtables are used for layout purposes only, a large number still contain high-quality, structured information. Cafarella et al. estimate that 154 million of the 14.1 billion tables contain relational data, i.e., database-like tables. Even the English version of Wikipedia alone contained more than 1 million tables as of November 2017.
However, making use of webtables automatically is challenging: the tables usually contain few records and are designed to be read by humans, not machines. The main use cases of webtables are:
- Knowledge base augmentation or creation (i.e., the extraction of structured information in RDF form)
- Searching or querying large table corpora (finding relevant tables in response to a given textual or table query)
- Table join (finding the set of tables joinable with a query table)
- ...
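To make the table join use case more concrete, the following is a minimal sketch, not a method taken from the seminar material. It assumes that "joinable" means that a sufficient fraction of the distinct values in a query column also occur in some column of a candidate table (set containment). The corpus layout, the function names `normalize`, `containment`, and `find_joinable`, and the `threshold` parameter are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical corpus layout: table name -> column name -> list of cell values.
Corpus = Dict[str, Dict[str, List[str]]]


def normalize(values: List[str]) -> Set[str]:
    """Lower-case and strip cell values; drop empty cells."""
    return {v.strip().lower() for v in values if v.strip()}


def containment(query: Set[str], candidate: Set[str]) -> float:
    """Fraction of distinct query values that also occur in the candidate."""
    if not query:
        return 0.0
    return len(query & candidate) / len(query)


def find_joinable(query_column: List[str], corpus: Corpus,
                  threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (table, column, score) triples with score >= threshold, best first."""
    query = normalize(query_column)
    hits = []
    for table, columns in corpus.items():
        for column, values in columns.items():
            score = containment(query, normalize(values))
            if score >= threshold:
                hits.append((table, column, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy corpus (assumed data, for illustration only).
corpus = {
    "capitals": {"country": ["Germany", "France", "Italy"],
                 "capital": ["Berlin", "Paris", "Rome"]},
    "rivers": {"river": ["Rhine", "Seine"],
               "country": ["Germany", "France"]},
}
result = find_joinable(["germany", "France ", "Spain", "Italy"], corpus)
print(result)
```

This sketch compares the query column against every column in the corpus, which is only feasible for toy inputs. A corpus with millions of tables would likely need an index or a sketch-based approximation of the overlap instead of an exhaustive scan.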
The above use-cases demand solutions for many different problems, which include, but are not limited to:
In this seminar, we will introduce you to the research area of webtables. Each team, ideally consisting of two people, will implement a solution for one of the above-mentioned problems (or any other relevant problem in the research area of webtables that they find interesting). We will provide you with state-of-the-art papers that suggest solutions to these problems. You will implement these solutions and can try to improve upon them with your own ideas in a scalable way.
Dates
See chair webpage