Project Description

Topic models automatically learn probabilistic representations for documents and their underlying semantic topics. In this project, we extend state-of-the-art topic models for new applications and compare and combine them with other document representations.

Combining several text collections into one large, joint dataset can reveal connections between apparently unrelated documents. However, standard text mining approaches cannot cope with differing document styles and collection-specific language use. In this project, we jointly model documents despite linguistic differences for various tasks, such as clustering, classification, recommendation, and retrieval. For example, our models make it possible to measure document similarity on a semantic level across patents and scientific papers, or across newspaper articles and tweets.
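As a rough illustration of what measuring similarity on a semantic level can mean, the sketch below compares documents by their topic distributions instead of their surface vocabulary. It assumes that some topic model has already inferred a distribution over a shared set of topics for each document. The three distributions, the function names, and the choice of Jensen-Shannon divergence are illustrative assumptions, not the method used in the publications listed below.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(p, q):
    """Similarity in [0, 1]: 1 for identical topic mixtures, 0 for disjoint ones."""
    return 1.0 - js_divergence(p, q)


# Hypothetical topic distributions over three shared topics (made-up values).
patent = [0.7, 0.2, 0.1]
paper = [0.6, 0.3, 0.1]
tweet = [0.05, 0.15, 0.8]

print("patent vs. paper:", round(topic_similarity(patent, paper), 3))
print("patent vs. tweet:", round(topic_similarity(patent, tweet), 3))
```

Because the comparison happens in topic space, a patent and a paper can score as similar even when they use different collection-specific wording, provided the topic model assigns them to the same topics.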

Subprojects

Entropy-Based Topic Modeling
Combining Topic Models and Word Embeddings

Project-Related Publications

Bunk, S., Krestel, R.: WELDA: Enhancing Topic Models by Incorporating Local Word Contexts. Joint Conference on Digital Libraries (JCDL 2018). ACM, Fort Worth, Texas, USA (2018).

Risch, J., Krestel, R.: My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections. Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL). pp. 283–292 (2018).

Risch, J., Krestel, R.: What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers. Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL). pp. 40–46 (2017).

Park, J., Blume-Kohout, M., Krestel, R., Nalisnick, E., Smyth, P.: Analyzing NIH Funding Patterns over Time with Statistical Text Analysis. Scholarly Big Data: AI Perspectives, Challenges, and Ideas (SBD 2016) Workshop at AAAI 2016. AAAI (2016).