Tables on the web are a significant source of structured information. In a large-scale crawling effort in 2008, Cafarella et al. extracted 14.1 billion tables from billions of HTML webpages. While many webtables are used for layout purposes only, a large number still contain high-quality, structured information. Cafarella et al. estimate that 154 million of the 14.1 billion tables contain relational data, i.e., database-like tables. Even the English version of Wikipedia alone contained more than 1 million tables as of November 2017. However, making use of webtables automatically is challenging: the tables usually contain few records and are designed to be read by humans, not machines. The main use cases of webtables include knowledge base augmentation, searching or querying large table corpora, and finding the set of tables joinable with a query table.
The above use cases demand solutions for many different tasks, which include, but are not limited to:
In this seminar, we will introduce you to the research area of webtables. Each team, ideally consisting of two people, will implement a solution for one of the above-mentioned tasks (or any other relevant problem in the research area of webtables that they find interesting). We will provide you with state-of-the-art papers that propose solutions to these problems; you will implement these solutions and try to improve upon them with your own ideas in a scalable way.