For bachelor students, we offer lectures in German on database systems, in addition to paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, search engines, and information retrieval, complemented by specialized seminars, master projects, and supervised master theses.
The Web Science group focuses on various topics related to the Web, such as Information Retrieval, Natural Language Processing, Data Mining, Knowledge Discovery, Social Network Analysis, Entity Linking, and Recommender Systems. The group is particularly interested in Text Mining to deal with the vast amount of unstructured and semi-structured information available on the Web.
Most of our research is conducted within larger research projects, in collaboration with students, other groups, and other universities. We strive to make most of our data sets and source code publicly available.
Johannes Lorey, former PhD student of the HPI Research School at the University of Potsdam
Large-Scale Data Analysis and Processing
Linked Data Management
Identifying and Determining SPARQL Endpoint Characteristics
International Journal of Web Information Systems
Publicly accessible SPARQL endpoints contain vast amounts of knowledge from a large variety of domains. Using the SPARQL query language, users can consume, integrate, and present data from such Linked Data sources in different application scenarios. However, these endpoints are often not configured to process specific workloads as efficiently as possible. Implemented restrictions further impede data consumption, e.g., by limiting the number of results returned per request. Helping users leverage SPARQL endpoints requires insight into the functional and non-functional properties of these knowledge bases. In this work, we introduce several metrics that enable universal and fine-grained characterization of arbitrary Linked Data repositories. We present comprehensive approaches for deriving these metrics and validate them through an extensive evaluation on real-world SPARQL endpoints. Finally, we discuss possible implications of our findings for data consumers.
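The abstract mentions endpoints that cap the number of results returned per request. As a hedged illustration only (this is not the method described in the paper), a data consumer could probe for such a cap by requesting more rows than expected and then checking whether a follow-up page at the returned offset still yields rows. The query executor `run`, the probe size, and the mock endpoint below are all hypothetical stand-ins for a real HTTP client talking to a SPARQL endpoint.

```python
import re
from typing import Callable, List, Optional

# Hypothetical executor: takes a SPARQL query string and returns the result rows.
QueryFn = Callable[[str], List[dict]]


def build_query(limit: int, offset: int = 0) -> str:
    # Generic triple pattern; ORDER BY keeps paging deterministic,
    # but it can be expensive (or time out) on large real endpoints.
    return (
        "SELECT ?s ?p ?o WHERE { ?s ?p ?o } "
        f"ORDER BY ?s ?p ?o LIMIT {limit} OFFSET {offset}"
    )


def estimate_result_cap(run: QueryFn, probe_limit: int = 100_000) -> Optional[int]:
    """Return the suspected per-request row cap, or None if none was observed."""
    first = run(build_query(probe_limit))
    n = len(first)
    if n == probe_limit:
        return None  # got everything we asked for: no cap below probe_limit
    # Fewer rows than requested: either a server-side cap or the data ran out.
    # If the next page still has rows, the shortfall was caused by a cap.
    nxt = run(build_query(probe_limit, offset=n))
    return n if nxt else None


def make_mock_endpoint(total_rows: int, cap: Optional[int]) -> QueryFn:
    """Hypothetical in-memory endpoint that honors LIMIT/OFFSET and an optional cap."""
    rows = [{"s": i} for i in range(total_rows)]

    def run(query: str) -> List[dict]:
        limit = int(re.search(r"LIMIT (\d+)", query).group(1))
        offset = int(re.search(r"OFFSET (\d+)", query).group(1))
        if cap is not None:
            limit = min(limit, cap)  # server silently truncates the result
        return rows[offset:offset + limit]

    return run


if __name__ == "__main__":
    print(estimate_result_cap(make_mock_endpoint(50_000, cap=10_000)))  # 10000
    print(estimate_result_cap(make_mock_endpoint(500, cap=None)))       # None
```

Note that a cap can only be detected when the endpoint holds more matching data than the cap; on the mock with 500 rows and a cap of 10,000, the probe correctly reports no observable cap.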