For bachelor students, we offer lectures on database systems in German, complemented by paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, search engines, and information retrieval, complemented by specialized seminars, master projects, and supervised master theses.
The Web Science group focuses on various topics related to the Web, such as Information Retrieval, Natural Language Processing, Data Mining, Knowledge Discovery, Social Network Analysis, Entity Linking, and Recommender Systems. The group is particularly interested in Text Mining to deal with the vast amount of unstructured and semi-structured information available on the Web.
Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make available most of our data sets and source code.
User participation has become an integral part of news, journalism, and political communication. Social networks are not merely a representation of users' real-world relations but function as information distributors, protest platforms, and political campaign tools. News is no longer dominated by news corporations; instead, everybody can report news, discuss topics in public, and share their opinions. Democracy 2.0 puts participation in political processes at everyone's fingertips, and information is available to everybody at any time. The downside of this development - fake news, hate speech, online stalking - poses a threat to the open, participatory discussion culture. In this project, we focus on the analysis of news in traditional media, e.g., by analyzing political bias in major news outlets and by analyzing comments on news articles to identify, for example, hate speech. We also investigate social networks and the information that can be extracted from them.
In today's social media, news often spreads faster than in mainstream media, along with additional context and aspects of current affairs. Consequently, users in social networks are up to date with the details of real-world events and the individuals involved. Examples include crime scenes and descriptions of potential perpetrators, public gatherings with rumors about celebrities among the guests, rallies by prominent politicians, and concerts by musicians. We are interested in the problem of tracking persons mentioned in social media, namely detecting the locations of individuals by leveraging the online discussions about them. Existing literature focuses on the well-known and more convenient problem of user location detection in social media, mainly the discovery of the locations of user profiles and their messages. In contrast, we track individuals with text mining techniques, regardless of whether they hold a social network account. We observe what the community shares about them and estimate their locations. Our approach consists of two steps. First, we introduce a noise filter that prunes irrelevant posts using a recursive partitioning technique. Second, we build a model that reasons over the set of messages about an individual and determines his or her locations. In our experiments, we successfully trace the candidates of the 2016 U.S. presidential election through millions of tweets published from November 2015 until January 2017. Our results outperform previously introduced techniques and various baselines.
Krestel, R., Risch, J.: How Do Search Engines Work? A Massive Open Online Course with 4000 Participants. Proceedings of the Conference Lernen, Wissen, Daten, Analysen. pp. 259-271 (2017).
Massive Open Online Courses (MOOCs) have introduced a new form of education. With thousands of participants per course, lecturers are confronted with new challenges in the teaching process. In this paper, we describe how we conducted an introductory information retrieval course for participants of all ages and educational backgrounds. We analyze different course phases and compare our experiences with regular on-site information retrieval courses at university.
Gruetze, T., Krestel, R., Lazaridou, K., Naumann, F.: What was Hillary Clinton doing in Katy, Texas? Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 3-7 April, 2017. ACM (2017).
During the last presidential election in the United States of America, Twitter drew a lot of attention. This is because many leading persons and organizations, such as U.S. president Donald J. Trump, showed a strong affection for this medium. In this work, we disregard the political content and opinions shared on Twitter and focus on the question: Can we determine and track the physical location of the presidential candidates based on posts in the Twittersphere?
Jenders, M., Krestel, R., Naumann, F.: Which Answer is Best? Predicting Accepted Answers in MOOC Forums. Proceedings of the 25th International Conference Companion on World Wide Web. pp. 679-684. International World Wide Web Conferences Steering Committee (2016).
Massive Open Online Courses (MOOCs) have grown in reach and importance over the last few years, enabling a vast user base to enroll in online courses. Besides watching videos, users participate in discussion forums to further their understanding of the course material. As in other community-based question-answering communities, in many MOOC forums a user posting a question can mark the answer they are most satisfied with. In this paper, we present a machine learning model that predicts this accepted answer to a forum question using historical forum data.
Gruetze, T., Krestel, R., Naumann, F.: Topic Shifts in StackOverflow: Ask it like Socrates. Lecture Notes in Computer Science. pp. 213-221. Springer (2016).
Community-based question-and-answer (Q&A) sites rely on well-posed and appropriately tagged questions. However, most platforms have only limited capabilities to support their users in finding the right tags. In this paper, we propose a temporal recommendation model to support users in tagging new questions and thus improve their acceptance in the community. To underline the necessity of temporal awareness of such a model, we first investigate the changes in tag usage and show different types of collective attention in StackOverflow, a community-driven Q&A website for computer programming topics. Furthermore, we examine the changes over time in the correlation between question terms and topics. Our results show that temporal awareness is indeed important for recommending tags in Q&A communities.
Gruetze, T., Yao, G., Krestel, R.: Learning Temporal Tagging Behaviour. Proceedings of the 24th International Conference on World Wide Web Companion (WWW). pp. 1333-1338. ACM (2015).
Social networking services, such as Facebook, Google+, and Twitter, are commonly used to share relevant Web documents with a peer group. By sharing a document with her peers, a user recommends the content to others and annotates it with a short description text. This short description yields many opportunities for text summarization and categorization. Because today’s social networking platforms are real-time media, the sharing behaviour is subject to many temporal effects, i.e., current events, breaking news, and trending topics. In this paper, we focus on time-dependent hashtag usage of the Twitter community to annotate shared Web-text documents. We introduce a framework for time-dependent hashtag recommendation models and present two content-based models. Finally, we evaluate these models with respect to recommendation quality on a Twitter dataset consisting of links to Web documents that were aligned with hashtags.
Krestel, R., Werkmeister, T., Wiradarma, T.P., Kasneci, G.: Tweet-Recommender: Finding Relevant Tweets for News Articles. Proceedings of the 24th International World Wide Web Conference (WWW). ACM (2015).
Twitter has become a prime source for disseminating news and opinions. However, the limited length of tweets prohibits detailed descriptions; instead, tweets sometimes contain URLs that link to detailed news articles. In this paper, we devise generic techniques for recommending tweets for any given news article. To evaluate and compare the different techniques, we collected tens of thousands of tweets and news articles and conducted a user study on the relevance of recommendations.
Roick, M., Jenders, M., Krestel, R.: How to Stay Up-to-date on Twitter with General Keywords. Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. CEUR-WS.org (2015).
Microblogging platforms make it easy for users to share information through the publication of short personal messages. However, users are not only interested in sharing, but even more so in consuming information. As a result, they are confronted with new challenges when it comes to retrieving information on microblogging platforms. In this paper, we present a query expansion method based on latent topics to support users interested in topical information. Similar to news aggregator sites, our approach identifies subtopics of a given query and provides the user with a quick overview of the topics discussed on the microblogging platform. Using a document collection of microblog posts from Twitter as an exemplary microblogging platform, we compare the quality of search results returned by our algorithm with a baseline approach and a state-of-the-art microblog-specific query expansion method. To this end, we introduce a novel semi-supervised evaluation strategy based on expert Twitter users. In contrast to existing query expansion methods, our approach can be used to aggregate and visualize topical query results based on the calculated topic models, while achieving competitive results for traditional keyword-based search with regard to mean average precision.