Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Project Description

The comment sections of online newspapers are an important space for political discussion and the exchange of opinions. These forums have to be moderated because they are misused by spammers, haters, trolls, and propagandists. Moderation is very expensive, and many online news providers have discontinued their comment sections. As more and more political campaigning, and even agitation, is distributed over the internet, serious and safe platforms for discussing political topics are increasingly important.

In this project, we therefore analyze comments, users, and articles to understand the dynamics, information flow, and interactions in comment sections. We work on detecting inappropriate comments, predicting popular news topics, identifying fake news, and recommending information. Source code and datasets are available here.

Word Embeddings

We provide 300-dimensional fastText embeddings pre-trained on more than 60 million comments from The Guardian (5.5 GB): Link
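
As a rough illustration of how such embeddings might be used, the sketch below reads vectors in fastText's text (.vec) layout and compares two words by cosine similarity. It assumes the download is in that text layout, which this page does not state; a binary fastText model would instead need a library loader such as gensim's load_facebook_vectors. The function names, the file name in the comment, and the three-dimensional toy vectors are all hypothetical and stand in for the real 300-dimensional data.

```python
import io
import math
from typing import Dict, IO, List, Optional


def load_vec(handle: IO[str], limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse the .vec text layout: a 'count dim' header, then 'word v1 ... vd' per line."""
    dim = int(handle.readline().split()[1])
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimensionality
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break  # stop early to keep memory use bounded
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for the real file (hypothetical values, 3 dimensions instead of 300).
sample = io.StringIO(
    "3 3\n"
    "politics 1.0 0.0 0.0\n"
    "election 0.8 0.6 0.0\n"
    "football 0.0 0.0 1.0\n"
)
toy = load_vec(sample)
print(cosine(toy["politics"], toy["election"]))
print(cosine(toy["politics"], toy["football"]))

# With the downloaded file (hypothetical path), only the first words are loaded:
# with open("guardian_comments.vec", encoding="utf-8") as f:
#     vectors = load_vec(f, limit=100_000)
```

The limit parameter is there because fastText text files are usually sorted by word frequency, so reading only the first entries typically covers the most common vocabulary without holding the full file in memory.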

Associated Activities

  • Master Thesis by Victor Künstler, 2019: Modeling User Behavior in Online Discussions on News Platforms
  • Master Thesis by Johannes Filter, 2019: Context-aware Classification of News Comments
  • Master Seminar, 2018: Text Mining in Practice
  • Master Thesis by Carl Ambroselli, 2018: Quality Management for Online News Comments
  • Master Project, 2017: Hate Speech Detection
  • Master Thesis by Dustin Gläser, 2017: Detection of Inappropriate Content in Online Comments
  • Master Thesis by Christian Godde, 2016: Classification of German Newspaper Comments

Project-Related Publications

  1. Risch, J., Repke, T., Kohlmeyer, L., Krestel, R.: ComEx: Comment Exploration on Online News Platforms. Joint Proceedings of the ACM IUI 2021 Workshops co-located with the 26th ACM Conference on Intelligent User Interfaces (IUI). pp. 1–7. CEUR-WS.org (2021).
  2. Risch, J., Künstler, V., Krestel, R.: HyCoNN: Hybrid Cooperative Neural Networks for Personalized News Discussion Recommendation. Proceedings of the International Joint Conferences on Web Intelligence and Intelligent Agent Technologies (WI-IAT). pp. 41–48 (2020).
  3. Risch, J., Krestel, R.: A Dataset of Journalists’ Interactions with Their Readership: When Should Article Authors Reply to Reader Comments? Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 3117–3124. ACM (2020).
  4. Risch, J., Ruff, R., Krestel, R.: Explaining Offensive Language Detection. Journal for Language Technology and Computational Linguistics (JLCL). 34, 29–47 (2020).
  5. Risch, J., Ruff, R., Krestel, R.: Offensive Language Detection Explained. Proceedings of the Workshop on Trolling, Aggression and Cyberbullying (TRAC@LREC). pp. 137–143. European Language Resources Association (ELRA) (2020).
  6. Risch, J., Krestel, R.: Bagging BERT Models for Robust Aggression Identification. Proceedings of the Workshop on Trolling, Aggression and Cyberbullying (TRAC@LREC). pp. 55–61. European Language Resources Association (ELRA) (2020).
  7. Risch, J., Krestel, R.: Top Comment or Flop Comment? Predicting and Explaining User Engagement in Online News Discussions. Proceedings of the International Conference on Web and Social Media (ICWSM). pp. 579–589. AAAI (2020).
  8. Risch, J., Krestel, R.: Toxic Comment Detection in Online Discussions. In: Agarwal, B., Nayak, R., Mittal, N., and Patnaik, S. (eds.) Deep Learning-Based Approaches for Sentiment Analysis. pp. 85–109. Springer (2020).
  9. Risch, J., Stoll, A., Ziegele, M., Krestel, R.: hpiDEDIS at GermEval 2019: Offensive Language Identification using a German BERT model. Proceedings of the 15th Conference on Natural Language Processing (KONVENS). pp. 403–408. German Society for Computational Linguistics & Language Technology, Erlangen, Germany (2019).
  10. Risch, J., Krebs, E., Löser, A., Riese, A., Krestel, R.: Fine-Grained Classification of Offensive Language. Proceedings of GermEval (co-located with KONVENS). pp. 38–44 (2018).
  11. van Aken, B., Risch, J., Krestel, R., Löser, A.: Challenges for Toxic Comment Classification: An In-Depth Error Analysis. Proceedings of the 2nd Workshop on Abusive Language Online (co-located with EMNLP). pp. 33–42 (2018).
  12. Risch, J., Krestel, R.: Aggression Identification Using Deep Learning and Data Augmentation. Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (co-located with COLING). pp. 150–158 (2018).
  13. Risch, J., Krestel, R.: Delete or not Delete? Semi-Automatic Comment Moderation for the Newsroom. Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (co-located with COLING). pp. 166–176 (2018).
  14. Ambroselli, C., Risch, J., Krestel, R., Loos, A.: Prediction for the Newsroom: Which Articles Will Get the Most Comments? Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). pp. 193–199. ACL, New Orleans, Louisiana, USA (2018).
  15. Godde, C., Lazaridou, K., Krestel, R.: Classification of German Newspaper Comments. Proceedings of the Conference Lernen, Wissen, Daten, Analysen. pp. 299–310. CEUR-WS.org (2016).