Topic modeling has gained considerable popularity as a means of identifying and describing the topical structure of textual documents and whole corpora. There are, however, many document collections, such as qualitative studies in the digital humanities, that cannot easily benefit from this technology: the limited size of those corpora leads to poor-quality topic models. Higher-quality topic models can be learned by incorporating additional domain-specific documents with similar topical content. This, however, means finding or even manually composing such corpora, which takes considerable effort. To solve this problem, we propose a fully automated, adaptable process of topic cropping. For learning topics, this process automatically tailors a domain-specific cropping corpus from a general corpus such as Wikipedia. The learned topic model is then mapped to the working corpus via topic inference. Evaluation with a real-world data set shows that the learned topics are of higher quality than those learned from the working corpus alone. Specifically, we analyzed the learned topics with respect to coherence, diversity, and relevance.
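The core idea can be illustrated with a minimal sketch (not the paper's actual implementation): fit a topic model on a larger domain-specific cropping corpus, then map the small working corpus onto the learned topics via inference. The toy corpora, the vectorizer settings, and the choice of scikit-learn's LDA are all illustrative assumptions.

```python
# Illustrative sketch of the topic-cropping idea, NOT the paper's code:
# learn topics from a larger "cropping" corpus, then infer per-document
# topic mixtures for a small working corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Stand-ins for a cropping corpus (e.g. tailored from Wikipedia) and a
# small working corpus; real corpora would be far larger.
cropping_corpus = [
    "church architecture gothic cathedral nave choir",
    "medieval town market trade guild merchants",
    "manuscript scribe parchment illumination monastery",
    "cathedral spire stone mason construction tower",
    "trade routes merchants goods market fair",
    "monastery monks prayer scriptorium manuscripts",
]
working_corpus = [
    "interview about the old cathedral and its tower",
    "notes on market day and the merchants",
]

# Fit the vocabulary and the topic model on the cropping corpus only.
vectorizer = CountVectorizer()
X_crop = vectorizer.fit_transform(cropping_corpus)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(X_crop)

# Topic inference: project the working corpus onto the learned topics.
X_work = vectorizer.transform(working_corpus)
doc_topics = lda.transform(X_work)  # one topic mixture per document
print(doc_topics.shape)  # (2, 3): 2 documents, 3 topics
```

The working corpus never influences the learned topics; it is only expressed in terms of them, which is what lets a small corpus borrow topical structure from a richer one.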
Watch our new German-language MOOC about hate and fake news on the Internet, "Trolle, Hass und Fake-News: Wie können wir das Internet retten?" ("Trolls, Hate, and Fake News: How Can We Save the Internet?"), on openHPI (link).
Our work on Measuring and Comparing Dimensionality Reduction Algorithms for Robust Visualisation of Dynamic Text Collections will be presented at CHIIR 2021.
I added some photos from my trip to Hildesheim.