Robust Visualisation of Dynamic Text Collections: Measuring and Comparing Dimensionality Reduction Algorithms


Visualisations are intended to provide an intuitive way to explore large document collections. State-of-the-art approaches typically use dimensionality reduction algorithms to transform high-dimensional document representations into 2-dimensional vectors. These vectors are then placed in a landscape that, ideally, retains the semantic similarity information of the high-dimensional representation. Traditionally, dimensionality reduction algorithms are developed with static collections in mind. However, many "real-world" document collections, such as news articles, scientific literature, patents, Wikipedia, or tweets, grow and evolve over time. Visualising the temporal change of these collections poses various challenges for out-of-the-box dimensionality reduction algorithms. In this paper, we propose strategies for adapting existing dimensionality reduction algorithms to incorporate change. These strategies ensure that landscapes at different intervals of the collection are robust with regard to spatio-temporal coherence. Furthermore, we propose metrics to measure stability over time and use them to compare several popular dimensionality reduction algorithms.
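
To make the idea of measuring stability over time more concrete, the following is a minimal sketch of one possible stability score. It is not one of the metrics proposed in the paper. It assumes two 2-dimensional landscapes of the same documents, computed at consecutive intervals, with row i referring to the same document in both. The function names `normalise` and `mean_displacement` are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalise(layout):
    """Centre a 2-D layout and scale it to unit root-mean-square radius.

    This removes differences in translation and overall scale, which many
    dimensionality reduction algorithms do not keep fixed between runs.
    """
    centred = layout - layout.mean(axis=0)
    scale = np.sqrt((centred ** 2).sum(axis=1).mean())
    return centred / scale if scale > 0 else centred


def mean_displacement(prev_layout, next_layout):
    """Average distance moved by each document between two landscapes.

    Assumption: both arrays have shape (n_documents, 2) and row i refers
    to the same document in both. Lower values mean a more stable layout.
    """
    a = normalise(np.asarray(prev_layout, dtype=float))
    b = normalise(np.asarray(next_layout, dtype=float))
    return float(np.linalg.norm(a - b, axis=1).mean())


# Illustrative data only: four documents placed on a unit square.
earlier = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
shifted = earlier * 3.0 + 5.0   # same arrangement, different scale and offset
shuffled = earlier[::-1]        # every document swaps place with another

stable_score = mean_displacement(earlier, shifted)
unstable_score = mean_displacement(earlier, shuffled)
```

In this sketch a landscape that is only rescaled or shifted scores close to zero, while one in which documents trade positions scores higher. The score does not correct for rotation or reflection, so a fuller treatment would need an alignment step before comparing positions.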

CHIIR 2021
