For bachelor students, we offer lectures in German on database systems, complemented by paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, search engines, and information retrieval, complemented by specialized seminars, master projects, and supervised master theses.
The Web Science group focuses on various topics related to the Web, such as Information Retrieval, Natural Language Processing, Data Mining, Knowledge Discovery, Social Network Analysis, Entity Linking, and Recommender Systems. The group is particularly interested in Text Mining to deal with the vast amount of unstructured and semi-structured information available on the Web.
Most of our research is conducted in the context of larger research projects, in collaboration among students, groups, and universities. We strive to make most of our data sets and source code available.
I provide supervision for Master's theses in the area of News Comment Analysis, e.g., Toxic Comment Classification, User Engagement Prediction, Comment Recommendation, and Discussion Summarization/Visualization. Feel free to schedule an informal meeting with me to discuss details of these topics and/or your own ideas.
Challenges for Toxic Comment Classification: An In-Depth Error Analysis
van Aken, Betty; Risch, Julian; Krestel, Ralf; Löser, Alexander
Proceedings of the 2nd Workshop on Abusive Language Online (co-located with EMNLP)
Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task’s challenges, others still remain unsolved and directions for further research are needed. To this end, we compare different approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to perform an extensive error analysis, which reveals open challenges for state-of-the-art methods and directions towards pending future research. These challenges include missing paradigmatic context and inconsistent dataset labels.