Topic models automatically learn probabilistic representations of documents and their underlying semantic topics. In this project, we extend state-of-the-art topic models to new applications and compare and combine them with other document representations.
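As a minimal illustration of the kind of representation a topic model produces, the sketch below trains a small LDA model with gensim. The toy corpus, the number of topics, and all parameter values are illustrative assumptions, not settings used in this project.

```python
# Minimal LDA sketch with gensim; corpus and parameters are purely illustrative.
from gensim import corpora, models

texts = [
    ["topic", "model", "document", "probability"],
    ["neural", "network", "embedding", "document"],
    ["patent", "claim", "invention", "document"],
]

dictionary = corpora.Dictionary(texts)             # word <-> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]    # bag-of-words vectors

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=10, random_state=0)

# Probabilistic representation of the first document:
# a distribution over the learned topics.
print(lda.get_document_topics(corpus[0]))
```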
Combining several text collections into a joint, large dataset can reveal connections between apparently unrelated documents. However, standard text mining approaches cannot cope with differing document styles and collection-specific language use. In this project, we jointly model documents across such linguistic differences for tasks such as clustering, classification, recommendation, and retrieval. For example, our models make it possible to measure document similarity at a semantic level across patents and scientific papers, or across newspaper articles and tweets.
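One way such a cross-collection semantic similarity can be realized, sketched here under the assumption that each document is already represented as a topic distribution (e.g., from a model like the one above), is to compare those distributions with the Jensen-Shannon divergence. The example vectors below are hypothetical and serve only to show the computation.

```python
# Sketch: semantic similarity between documents from different collections,
# assuming each is represented as a distribution over a shared topic space.
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2, bounded in [0, 1])."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0  # skip zero-probability topics
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical topic distributions for a patent and a scientific paper.
patent_theta = [0.7, 0.2, 0.1]
paper_theta  = [0.6, 0.3, 0.1]

# Turn the divergence into a similarity score in [0, 1].
similarity = 1.0 - jensen_shannon(patent_theta, paper_theta)
print(f"semantic similarity: {similarity:.3f}")
```

Because the topic space is shared across collections, this comparison abstracts away from surface vocabulary, which is what lets stylistically different documents such as patents and papers be compared at all.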