Ralf Krestel

DTA 19

Domain-specific word embeddings for patent classification

Abstract

Patent offices and other stakeholders in the patent domain need to classify patent applications according to a standardized classification scheme. To examine the novelty of an application, it can then be compared to previously granted patents in the same class. Automatic classification would be highly beneficial because of the large volume of patents and the domain-specific knowledge needed to accomplish this costly manual task. However, a challenge for automation is patent-specific language use, such as special vocabulary and phrases. To account for this language use, we present domain-specific pre-trained word embeddings for the patent domain. We train our model on a very large dataset of more than 5 million patents and evaluate it on the task of patent classification. To this end, we propose a deep learning approach based on gated recurrent units for automatic patent classification built on the trained word embeddings. Experiments on a standardized evaluation dataset show that our approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches. In this paper, we further investigate the model’s strengths and weaknesses. An extensive error analysis reveals that the learned embeddings indeed mirror patent-specific language use. Imbalanced training data and underrepresented classes remain the most difficult challenges.
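To make the architecture named in the abstract more concrete, the following is a minimal sketch of a classifier that feeds pre-trained word embeddings through a gated recurrent unit and a softmax output layer. It is not the authors' model: the class name `TinyGRUClassifier`, the toy embedding table, the layer sizes, the random untrained weights, and the omission of bias terms are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRUClassifier:
    """Hypothetical, untrained GRU text classifier (illustration only)."""

    def __init__(self, embeddings, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings  # token -> pre-trained vector (assumed given)
        self.dim = len(next(iter(embeddings.values())))
        self.hidden_size = hidden_size
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r, and the candidate state n. Biases are omitted.
        self.W = {g: rand_matrix(hidden_size, self.dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.out = rand_matrix(num_classes, hidden_size, rng)

    def step(self, x, h):
        # One GRU time step: gates decide how much of the old state to keep.
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden_size
        unknown = [0.0] * self.dim  # zero vector for out-of-vocabulary tokens
        for token in tokens:
            h = self.step(self.embeddings.get(token, unknown), h)
        logits = matvec(self.out, h)
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Toy embedding table standing in for domain-specific patent embeddings.
toy_embeddings = {
    "method": [0.2, -0.1, 0.4],
    "for": [0.0, 0.1, -0.2],
    "welding": [0.5, 0.3, -0.1],
}
model = TinyGRUClassifier(toy_embeddings, hidden_size=4, num_classes=3)
probs = model.predict_proba(["a", "method", "for", "welding"])
```

Here `probs` holds one probability per hypothetical class. In practice the weights would be learned from labeled patents, and a deep learning framework would typically replace the hand-written recurrence.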

Full Paper

DTA19.pdf
Download: Emerald

BibTeX Entry

@Article{krestel-dta19,
  author  = {Julian Risch and Ralf Krestel},
  title   = {Domain-specific word embeddings for patent classification},
  journal = {Data Technologies and Applications},
  volume  = {53},
  number  = {1},
  pages   = {108--122},
  year    = {2019},
  issn    = {2514-9288},
  doi     = {10.1108/DTA-01-2019-0002},
  note    = {Emerald}
}
