Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Project Description

Topic models automatically learn probabilistic representations for documents and their underlying semantic topics. In this project, we extend state-of-the-art topic models for new applications and compare and combine them with other document representations.

Combining several text collections into a joint, large dataset can reveal connections between apparently unrelated documents. However, usual text mining approaches cannot deal with different document styles and collection-specific language use. In this project, we jointly model documents despite linguistic differences for various tasks, such as clustering, classification, recommendation, or retrieval. For example, we allow to measure document similarity on a semantic level across patents and scientific papers or newspaper articles and tweets.

Subprojects

Project-Related Publications

  • Bunk, S., Krestel, R.: WELDA: Enhancing Topic Models by Incorporating Local Word Contexts. Joint Conference on Digital Libraries (JCDL 2018). ACM, Forth Worth, Texas, USA (2018).
     
  • Risch, J., Krestel, R.: My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections. Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL). p. 283--292 (2018).
     
  • Risch, J., Krestel, R.: What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers. Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL). pp. 40-46 (2017).
     
  • Park, J., Blume-Kohout, M., Krestel, R., Nalisnick, E., Smyth, P.: Analyzing NIH Funding Patterns over Time with Statistical Text Analysis. Scholarly Big Data: AI Perspectives, Challenges, and Ideas (SBD 2016) Workshop at AAAI 2016. AAAI (2016).