For bachelor students, we offer German-language lectures on database systems, complemented by paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, search engines, and information retrieval, complemented by specialized seminars, master projects, and supervised master theses.
The Web Science group focuses on various topics related to the Web, such as Information Retrieval, Natural Language Processing, Data Mining, Knowledge Discovery, Social Network Analysis, Entity Linking, and Recommender Systems. The group is particularly interested in Text Mining to deal with the vast amount of unstructured and semi-structured information available on the Web.
Most of our research is conducted in the context of larger research projects, in collaboration among students, groups, and universities. We strive to make most of our data sets and source code available.
Integrating Open Government Data with Stratosphere for more Transparency
Heise, Arvid; Naumann, Felix
Web Semantics: Science, Services and Agents on the World Wide Web
Governments are increasingly publishing their data to enable organizations and citizens to browse and analyze the data. However, the heterogeneity of this Open Government Data hinders meaningful search, analysis, and integration and thus limits the desired transparency. In this article, we present the newly developed data integration operators of the Stratosphere parallel data analysis framework to overcome the heterogeneity. With declaratively specified queries, we demonstrate the integration of well-known government data sources and other large open data sets at technical, structural, and semantic levels. Furthermore, we publish the integrated data on the Web in a form that enables users to discover relationships between persons, government agencies, funds, and companies. The evaluation shows that linking person entities of different data sets results in a good precision of 98.3% and a recall of 95.2%. Moreover, the integration of large data sets scales well on up to eight machines.