For bachelor students we offer German-language lectures on database systems, complemented by paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students we offer courses on information integration, data profiling, search engines, and information retrieval, complemented by specialized seminars, master projects, and supervised master theses.
Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make available most of our data sets and source code.
Mapping and Understanding the Evolution of Language
Natural language is constantly evolving. Not only do the words we use change over time, but so do their meanings and the contexts in which we use them. A single word can also have several meanings: for example, “apple” can refer to a fruit or to a company. In the research area of Natural Language Processing (NLP), there are already models that analyze evolving language and models that automatically identify words with multiple senses. However, doing both at the same time, or successfully using such models in applications for automated text processing, is only now moving into the focus of current research.
Master's Project Winter 2019/20
Team: Jan Ehmüller, Lasse Kohlmeyer, Holly McKee, Daniel Paeschke
As language evolves, a word can gain new senses. For instance, the word “cloud” was once used in newspapers mainly in weather forecasts. Nowadays, it is increasingly used in the context of “Cloud Computing”. In this master’s project, we develop a novel approach that represents the usage of words over time as a graph. We address the following research question: How can we identify words with new senses? We develop an approach that ranks words by the likelihood that they gained a new sense over the last 200 years. We applied our approach to the “Corpus of Historical American English” (COHA).
This project is a great foundation for future work towards the vision presented above. The analysis of the visualised sense graphs already provides valuable insights into the usage and context of terms over time. In collaboration with the Theodor-Fontane-Archiv and the Digital Humanities Network, we are working on applying our model to other historical corpora.
As stated in our LWDA 2020 paper, we conducted a survey in which linguists annotated a list of words. They were asked to state whether or not a word has gained an additional sense since 1800. If you use the data, please cite our work, and feel free to contact us with any questions you might have.
Schwanhold, R., Repke, T., Krestel, R. (2021) ‘Modeling the Evolution of Word Senses with Force-Directed Layouts of Co-occurrence Networks’, Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change (LChange@ACL 2021), 1–6.
Languages evolve over time and the meaning of words can shift. Furthermore, individual words can have multiple senses. However, existing language models typically only reflect one word sense per word and don't deal with semantic changes over time. While there are language models that can either model semantic change of words or multiple word senses, none of them cover both aspects simultaneously. We propose a novel force-directed graph layout algorithm to draw a network of frequently co-occurring words. In this way, we are able to use the drawn graph to visualize the evolution of word senses. In addition, we hope that jointly modeling semantic change and multiple senses of words results in improvements for the individual tasks.
Ehmüller, J., Kohlmeyer, L., McKee, H., Paeschke, D., Repke, T., Krestel, R., Naumann, F. (2020) ‘Sense Tree: Discovery of New Word Senses with Graph-based Scoring’, in Proceedings of the Conference on "Lernen, Wissen, Daten, Analysen" (LWDA), 1–12.
Language is dynamic and constantly evolving: both the usage context and the meaning of words change over time. Identifying words that acquired new meanings and the point in time at which new word senses emerged is elementary for word sense disambiguation and entity linking in historical texts. For example, cloud once stood mostly for the weather phenomenon and only recently gained the new sense of cloud computing. We propose a clustering-based approach that computes sense trees, showing how meanings of words change over time. The produced results are easy to interpret and explain using a drill-down mechanism. We evaluate our approach qualitatively on the Corpus of Historical American English (COHA), which spans two hundred years.
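To illustrate the general idea of tracking a word's usage context over time, the following is a minimal sketch and not the method from either paper. It assumes a toy corpus of tokenized sentences labelled with a time period, builds one co-occurrence graph per period, and scores each word by the share of its recent neighbours that were unseen in the earlier period. The function names (`cooccurrence_by_period`, `novelty`, `rank_by_novelty`), the window size, and the scoring rule are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def cooccurrence_by_period(docs, window=2):
    """Build {period: {word: Counter(neighbour -> count)}} from (period, tokens) pairs.

    Two words co-occur if they are at most `window` positions apart.
    """
    graphs = defaultdict(lambda: defaultdict(Counter))
    for period, tokens in docs:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    graphs[period][word][tokens[j]] += 1
    return graphs


def novelty(graphs, word, early, late):
    """Fraction of the word's neighbours in `late` that were absent in `early`."""
    old = set(graphs[early][word])
    new = set(graphs[late][word])
    if not new:
        return 0.0
    return len(new - old) / len(new)


def rank_by_novelty(graphs, early, late):
    """Rank words present in both periods, most changed context first."""
    words = set(graphs[early]) & set(graphs[late])
    return sorted(words, key=lambda w: novelty(graphs, w, early, late), reverse=True)


# Toy corpus (invented for illustration, not taken from COHA).
docs = [
    (1900, ["dark", "cloud", "brings", "rain"]),
    (1900, ["rain", "falls", "today"]),
    (2000, ["cloud", "computing", "stores", "data"]),
    (2000, ["rain", "falls", "today"]),
]

graphs = cooccurrence_by_period(docs)
ranking = rank_by_novelty(graphs, early=1900, late=2000)
print(ranking[0], novelty(graphs, "cloud", 1900, 2000))
```

On this toy input, “cloud” is ranked first because all of its recent neighbours are new, whereas “rain” keeps a subset of its earlier context. A realistic setup would need frequency thresholds, stop-word filtering, and clustering of neighbours into senses, which this sketch omits.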