Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Project Description

Combining several text collections into one large, joint dataset enables new applications and reveals connections between apparently unrelated documents. However, conventional text mining approaches cannot cope with differing document styles and collection-specific language use. In this project, we jointly model documents despite these linguistic differences for tasks such as clustering, classification, recommendation, and retrieval. For example, our models make it possible to measure document similarity on a semantic level across patents and scientific papers, or across newspaper articles and tweets.
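To illustrate why collection-specific language is a problem, the following minimal sketch compares documents with a plain bag-of-words cosine similarity. This is only a baseline chosen for illustration, not the method used in the project, and the three example texts as well as the function names are made up. A patent-style and a paper-style sentence about the same topic share almost no vocabulary, so the baseline rates them as dissimilar, while two patent-style sentences about different devices score high.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (deliberately naive)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example texts, invented for this illustration.
patent = ("apparatus comprising a plurality of processing units "
          "configured to classify image data")
paper = ("we train a neural network on several gpus "
         "for picture recognition")
other_patent = ("apparatus comprising a plurality of sensor units "
                "configured to transmit image data")

# Same topic, different collection style: little term overlap.
sim_cross = cosine_similarity(bag_of_words(patent), bag_of_words(paper))

# Different topic, same collection style: large term overlap.
sim_within = cosine_similarity(bag_of_words(patent), bag_of_words(other_patent))

print(f"patent vs. paper (same topic):        {sim_cross:.2f}")
print(f"patent vs. patent (different topic):  {sim_within:.2f}")
```

The baseline ranks the stylistically similar pair above the topically similar pair, which is the kind of mismatch that joint modeling across collections aims to avoid.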

Subprojects

  • Jointly Modeling Patents and Scientific Papers
  • Analyzing NIH Project Proposals and Funding
  • News and Tweets

Project-Related Publications

  • Risch, J., Krestel, R.: What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers. Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL). pp. 40-46 (2017).
     
  • Park, J., Blume-Kohout, M., Krestel, R., Nalisnick, E., Smyth, P.: Analyzing NIH Funding Patterns over Time with Statistical Text Analysis. Scholarly Big Data: AI Perspectives, Challenges, and Ideas (SBD 2016) Workshop at AAAI 2016. AAAI (2016).
     
  • Krestel, R., Werkmeister, T., Wiradarma, T.P., Kasneci, G.: Tweet-Recommender: Finding Relevant Tweets for News Articles. Proceedings of the 24th International World Wide Web Conference (WWW). ACM (2015).