Julian Risch

I am a Ph.D. student at the Information Systems Group and a member of the HPI Research School. My research focuses on topic modeling and deep learning with applications in the field of text mining, in particular, comment analysis. Further, I am involved in projects on patent classification and book recommendation.

Source code for my publications can be found here and on GitHub.

Contact Information

Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Room: F-2.08

Phone: +49 331 5509 272

Email: Julian Risch

Open Master's Theses

I provide supervision for Master's theses in the area of News Comment Analysis, e.g., Toxic Comment Classification, User Engagement Prediction, Comment Recommendation, and Discussion Summarization/Visualization. Feel free to schedule an informal meeting with me to discuss details of these topics and/or your own ideas.


Advised Master's Theses

  • Enriching Document Embeddings With Domain Knowledge
  • Modeling News Commenters for Discussion Recommendation
  • Jointly Learning Document and Label Embeddings for Hierarchically Labeled Text
  • Context-aware Classification of News Comments
  • Quality Management for Online News Comments 


Measuring and Facilitating Data Repeatability in Web Science

Risch, Julian; Krestel, Ralf. In: Datenbank-Spektrum, 2019.

Accessible and reusable datasets are a necessity for repeatable research. This requirement poses a problem particularly for web science, since scraped data comes in various formats and can change due to the dynamic character of the web. Further, the usage of web data is typically restricted by copyright protection or privacy regulations, which hinder the publication of datasets. To alleviate these problems and reach what we define as “partial data repeatability”, we present a process that consists of multiple components. Researchers need to distribute only a scraper, not the data itself, to comply with legal limitations. If a dataset is re-scraped for repeatability after some time, the integrity of the different versions can be checked based on fingerprints. Moreover, fingerprints are sufficient to identify which parts of the data have changed and by how much. We evaluate an implementation of this process with a dataset of 250 million online comments collected from five different news discussion platforms. We re-scraped the dataset after pausing for one year and show that less than ten percent of the data actually changed. These experiments demonstrate that providing a scraper and fingerprints enables recreating a dataset and supports the repeatability of web science experiments.
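The fingerprint-based integrity check described in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the per-comment SHA-256 hashing, the `fingerprint`/`compare_versions` names, and the `{comment_id: fingerprint}` dictionary layout are assumptions made for the example.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Return a stable fingerprint for one scraped item (SHA-256 assumed here)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def compare_versions(old: dict, new: dict):
    """Compare two {comment_id: fingerprint} snapshots of a dataset.

    Returns the ids of comments that changed between scrapes,
    those that disappeared, and those that were newly added.
    """
    changed = {cid for cid in old if cid in new and old[cid] != new[cid]}
    removed = set(old) - set(new)
    added = set(new) - set(old)
    return changed, removed, added

# Hypothetical example: an initial scrape and a re-scrape one year later.
v1 = {1: fingerprint("Great article!"),
      2: fingerprint("I disagree."),
      3: fingerprint("Thanks.")}
v2 = {1: fingerprint("Great article!"),
      2: fingerprint("I disagree. [edited]"),
      4: fingerprint("New comment.")}

changed, removed, added = compare_versions(v1, v2)
# changed == {2}, removed == {3}, added == {4}
```

Because only the fingerprints need to be published alongside the scraper, a later re-scrape can be verified for integrity without ever distributing the copyright- or privacy-restricted comment texts themselves.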
Further Information
Tags: hpi, isg, myown, web_science