Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

13.06.2018

Best Paper Nomination Received at JCDL 2018

Our submission “My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections” by Julian Risch and Ralf Krestel received a best paper nomination at the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018). The conference took place June 3–7, 2018 in Fort Worth, Texas. We are very thankful for the nomination and the recognition it represents. The paper can be found here.

Abstract

Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary.

We present a cross-collection topic model combined with automatic domain term extraction and phrase segmentation. This model distinguishes collection-specific and collection-independent words based on information entropy and reveals commonalities and differences of multiple text collections. We evaluate our model on patents, scientific papers, newspaper articles, forum posts, and Wikipedia articles. In comparison to state-of-the-art cross-collection topic modeling, our model achieves up to 13% higher topic coherence, up to 4% lower perplexity, and up to 31% higher document classification accuracy. More importantly, our approach is the first topic model that ensures disjunct general and specific word distributions, resulting in clear-cut topic representations.
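
To illustrate the entropy idea mentioned in the abstract, here is a minimal sketch and not the model from the paper. It assumes that a word's relative frequency in each collection is normalized into a distribution over collections, and that the normalized Shannon entropy of that distribution indicates how evenly the word is spread. The function name `word_entropies`, the toy collections, and the threshold of 0.9 are hypothetical choices made for this illustration only.

```python
import math
from collections import Counter


def word_entropies(collections):
    """Map each word to the normalized entropy of its spread across collections.

    `collections` maps a collection name to a list of tokens.
    A value near 1 means the word is used evenly in all collections;
    a value of 0 means it occurs in only one collection.
    (Illustrative helper, not taken from the paper.)
    """
    if len(collections) < 2:
        raise ValueError("need at least two collections to compare")

    counts = {name: Counter(tokens) for name, tokens in collections.items()}
    sizes = {name: len(tokens) for name, tokens in collections.items()}
    vocabulary = set().union(*counts.values())
    max_entropy = math.log(len(collections))

    entropies = {}
    for word in vocabulary:
        # Relative frequency per collection, so large collections do not dominate.
        rel = [counts[n][word] / sizes[n] for n in collections if sizes[n] > 0]
        total = sum(rel)
        probs = [r / total for r in rel if r > 0]
        entropy = sum(p * math.log(1 / p) for p in probs)
        entropies[word] = entropy / max_entropy
    return entropies


# Toy data (assumed for illustration): two tiny "collections".
toy = {
    "patents": ["apparatus", "method", "apparatus", "system"],
    "papers": ["approach", "method", "approach", "system"],
}
entropies = word_entropies(toy)

# Hypothetical threshold: high-entropy words are treated as collection-independent.
general = sorted(w for w, h in entropies.items() if h >= 0.9)
specific = sorted(w for w, h in entropies.items() if h < 0.9)
print("collection-independent:", general)
print("collection-specific:", specific)
```

In this toy setting, words that occur equally often in both collections receive the maximum entropy, while words confined to a single collection receive zero. The paper integrates entropy into a full topic model; the hard threshold here is only a simplification.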