Prof. Dr. Felix Naumann

Julian Risch

I am a Ph.D. student at the Information Systems Group and a member of the HPI Research School. My research focuses on topic modeling and deep learning with applications in the field of comment analysis. Further, I am involved in projects on patent classification and book recommendation.

Contact Information

Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Room: F-2.08

Phone: +49 331 5509 272

Email: Julian Risch

Open Master's Theses

I provide supervision for Master's theses in the area of News Comment Analysis, e.g., Toxic Comment Classification, User Engagement Prediction, Comment Recommendation, and Discussion Summarization/Visualization. Feel free to schedule an informal meeting with me to discuss details of these topics and/or your own ideas.



Domain-specific word embeddings for patent classification

Risch, Julian; Krestel, Ralf. In: Data Technologies and Applications, 2019.

Patent offices and other stakeholders in the patent domain need to classify patent applications according to a standardized classification scheme. To examine the novelty of an application, it can then be compared to previously granted patents in the same class. Automatic classification would be highly beneficial because of the large volume of patents and the domain-specific knowledge needed to accomplish this costly manual task. However, a challenge for the automation is patent-specific language use, such as special vocabulary and phrases. To account for this language use, we present domain-specific pre-trained word embeddings for the patent domain. We train our model on a very large dataset of more than 5 million patents and evaluate it on the task of patent classification. To this end, we propose a deep learning approach based on gated recurrent units for automatic patent classification built on the trained word embeddings. Experiments on a standardized evaluation dataset show that our approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches. In this paper, we further investigate the model’s strengths and weaknesses. An extensive error analysis reveals that the learned embeddings indeed mirror patent-specific language use. The imbalanced training data and underrepresented classes are the most difficult remaining challenges.
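As a rough illustration of the kind of architecture the abstract describes, the following sketch runs a single gated recurrent unit over a sequence of word vectors and maps the final hidden state to class probabilities. It is not the implementation from the paper: the class name `TinyGRUClassifier`, the toy three-dimensional embeddings, the hidden size, and the number of classes are all assumptions made for this example, bias terms are omitted for brevity, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRUClassifier:
    """Hypothetical, untrained GRU text classifier (forward pass only)."""

    def __init__(self, embeddings, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings  # word -> vector (assumed pre-trained)
        dim = len(next(iter(embeddings.values())))
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hidden_size, dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrh"}
        self.out = rand_matrix(num_classes, hidden_size, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Interpolate between the old state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden_size
        for tok in tokens:
            if tok in self.embeddings:  # skip out-of-vocabulary words
                h = self.step(self.embeddings[tok], h)
        logits = matvec(self.out, h)
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Toy embeddings standing in for domain-specific patent embeddings.
toy_embeddings = {
    "rotor": [0.2, -0.1, 0.4],
    "blade": [0.3, 0.0, 0.5],
    "compound": [-0.4, 0.6, 0.1],
}
model = TinyGRUClassifier(toy_embeddings, hidden_size=4, num_classes=3)
probs = model.predict_proba(["rotor", "blade", "unknownword"])
```

Here `probs` holds one probability per hypothetical patent class. A practical system would learn the weights from labeled patents and would typically rely on a deep learning framework instead of hand-written list arithmetic.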