Master's Project WS 2019/20
In this master’s project, we will develop a novel approach that represents the usage of words over time as a graph. By placing carefully crafted restrictions on the two-dimensional layout used to draw that graph, we gain fine-grained control over how the resulting language models can be read and understood. Although this is mainly a computational problem, we will focus on visualizing subsets of the model to explore the feasibility and quality of such an approach.
The visualization will show the graph structure of selected words and their context as used in the underlying text corpus. In a large timeline, where each line represents a word, it will be possible to see how words appear, split into different meanings, or disappear again. The closeness of lines in the resulting chart conveys semantic similarity. The figure above shows a sketch of the interactive visualization that we aim to develop. It will help us investigate the feasibility and quality of this approach and provide a better basis for the qualitative evaluation of dynamic language models.
Students will train word embeddings on documents from different time slices. In the visualization, each time slice is a vertical axis on which words are placed based on their similarity in the embedding. After connecting the words across time slices, we will use the weighted edges as force constraints. An iterative algorithm will then try to optimize the network layout by reordering nodes (i.e., words on the axes) or splitting them. For the visualization, we will draw only selected words and their neighbors, and use additional metadata to make the timeline more informative, for example topic zones, word frequency mapped to size, or color for highlights.
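To make the reordering step more concrete, here is a minimal sketch in Python. It is not the project's algorithm, which is still to be developed. It swaps in the barycenter heuristic known from layered graph drawing as a stand-in: each word on an axis is pulled toward the weighted average position of the words on the previous axis, with cosine similarity inside the current slice as the edge weight. The function names (`initial_order`, `barycenter_order`, `layout`), the toy two-dimensional vectors, and the choice to sort the first axis by the first embedding coordinate are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def initial_order(embedding):
    """Assumed placeholder for a 1-D placement on the first axis:
    sort words by their first embedding coordinate, descending."""
    return sorted(embedding, key=lambda word: -embedding[word][0])


def barycenter_order(prev_order, embedding):
    """Order the words of one slice by the weighted mean rank of the words
    they are similar to on the previous axis (barycenter heuristic)."""
    prev_rank = {word: rank for rank, word in enumerate(prev_order)}

    def target(word):
        num = den = 0.0
        for other, rank in prev_rank.items():
            if other not in embedding:
                continue  # the word disappeared in this slice
            # Edge weight: similarity inside the current slice, negatives dropped.
            weight = max(0.0, cosine(embedding[word], embedding[other]))
            num += weight * rank
            den += weight
        # Words with no usable edge are placed at the end of the axis.
        return num / den if den else float(len(prev_order))

    return sorted(embedding, key=target)


def layout(slices):
    """Single left-to-right sweep over all time slices."""
    orders = [initial_order(slices[0])]
    for embedding in slices[1:]:
        orders.append(barycenter_order(orders[-1], embedding))
    return orders


# Toy data (hypothetical): "bank" drifts toward "river", "loan" appears.
slices = [
    {"bank": (1.0, 0.0), "money": (0.9, 0.1), "river": (0.0, 1.0)},
    {"bank": (0.7, 0.7), "money": (1.0, 0.0), "river": (0.0, 1.0),
     "loan": (0.95, 0.05)},
]
orders = layout(slices)
for t, order in enumerate(orders):
    print(t, order)
```

On this toy input, the new word "loan" lands next to "money" on the second axis, and "bank" moves to a position between "money" and "river". A fuller version would repeat the sweep in both directions until the ordering stabilizes and would also cover the node-splitting case, which this sketch leaves out.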