For bachelor students we offer German lectures on database systems in addition with paper- or project-oriented seminars. Within a one-year bachelor project students finalize their studies in cooperation with external partners. For master students we offer courses on information integration, data profiling, search engines and information retrieval enhanced by specialized seminars, master projects and advised master theses.
The Web Science group focuses on various topics related to the Web, such as Information Retrieval, Natural Language Processing, Data Mining, Knowledge Discovery, Social Network Analysis, Entity Linking, and Recommender Systems. The group is particularly interested in Text Mining to deal with the vast amount of unstructured and semi-structured information available on the Web.
Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make available most of our data sets and source code.
I provide supervision for Master's theses in the area of News Comment Analysis, e.g., Toxic Comment Classification, User Engagement Prediction, Comment Recommendation, and Discussion Summarization/Visualization. Feel free to schedule an informal meeting with me to discuss details of these topics and/or your own ideas.
Real or Fake? Large-Scale Validation of Identity Leaks
Maschler, Fabian; Niephaus, Fabio; Risch, Julian
47. Jahrestagung der Gesellschaft für Informatik (INFORMATIK)
On the Internet, criminal hackers frequently leak identity data on a massive scale. Subsequent criminal activities, such as identity theft and misuse, put Internet users at risk. Leak checker services enable users to check whether their personal data has been made public. However, automatic crawling and identification of leak data is error-prone for different reasons. Based on a dataset of more than 180 million leaked identity records, we propose a software system that identifies and validates identity leaks to improve leak checker services. Furthermore, we present a proficient assessment of leak data quality and typical characteristics that distinguish valid and invalid leaks.