Ralf Krestel

JCDL 20a

Hierarchical Document Classification as a Sequence Generation Task

Abstract

Hierarchical classification schemes are an effective and natural way to organize large document collections. However, with complex schemes, manual classification becomes time-consuming and requires domain experts. Current machine learning approaches for hierarchical classification do not exploit all the information contained in the hierarchical schemes. During training, they do not make full use of the inherent parent-child relations between classes. For example, they neglect to tailor document representations, such as embeddings, to each individual hierarchy level. Our model overcomes these problems by framing hierarchical classification as a sequence generation task. To this end, our neural network transforms a sequence of input words into a sequence of labels, which represents a path through a tree-structured hierarchy scheme. The evaluation uses a patent corpus that comprises millions of documents and exhibits a complex class hierarchy scheme with high-quality annotations from domain experts. We re-implemented five models from related work and show that our basic model achieves results competitive with the best of these approaches. A variation of our model that uses the recent Transformer architecture outperforms the other approaches. An error analysis reveals that the encoder of our model has the strongest influence on its classification performance.
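The core idea, treating a class label as a root-to-leaf path that a decoder emits one hierarchy level at a time, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy hierarchy `PARENT` (codes loosely styled after patent classification levels), the helper names, and the `score` callback standing in for a trained neural decoder are all hypothetical. Restricting each step to children of the previously generated label is one possible design choice, not something the abstract specifies.

```python
from typing import Callable, Dict, List

# Hypothetical toy hierarchy (child -> parent); "ROOT" is an artificial root node.
# Codes loosely mimic patent-style levels but are illustrative only.
PARENT: Dict[str, str] = {
    "H": "ROOT",
    "H04": "H",
    "H04L": "H04",
    "H04W": "H04",
    "G": "ROOT",
    "G06": "G",
    "G06F": "G06",
}


def label_to_path(leaf: str, parent: Dict[str, str] = PARENT) -> List[str]:
    """Turn a leaf label into the root-to-leaf label sequence used as a decoder target."""
    path: List[str] = []
    node = leaf
    while node != "ROOT":
        path.append(node)
        node = parent[node]
    return list(reversed(path))


def children_of(parent: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the child -> parent map into parent -> list of children."""
    kids: Dict[str, List[str]] = {}
    for child, par in parent.items():
        kids.setdefault(par, []).append(child)
    return kids


def greedy_decode(score: Callable[[List[str], str], float],
                  parent: Dict[str, str] = PARENT) -> List[str]:
    """Generate one label per level, allowing only children of the previous label."""
    kids = children_of(parent)
    path: List[str] = []
    node = "ROOT"
    while node in kids:  # a node without children is a leaf, so decoding stops
        # score(prefix, candidate) is a hypothetical stand-in for a trained
        # decoder's next-label score given the labels generated so far
        node = max(kids[node], key=lambda c: score(path, c))
        path.append(node)
    return path
```

For example, with a toy scorer that returns 1.0 for `H`, `H04`, and `H04W` and 0.0 otherwise, `greedy_decode` returns `['H', 'H04', 'H04W']`, and `label_to_path("G06F")` yields `['G', 'G06', 'G06F']`. Constraining each step to the children of the previous label ensures that every generated sequence is a valid path in the tree; an unconstrained decoder would have to learn this from the training data instead.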

Full Paper

JCDL20a.pdf

Conference Homepage

JCDL 2020

BibTex Entry

@inproceedings{krestel-jcdl2020a,
  author = {Risch, Julian and Garda, Samuele and Krestel, Ralf},
  booktitle = {Proceedings of the Joint Conference on Digital Libraries (JCDL)},
  month = {August 1--5},
  title = {Hierarchical Document Classification as a Sequence Generation Task},
  year = {2020}
}

