Ralf Krestel

TRAC 20a

Offensive Language Detection Explained

Abstract

Many online discussion platforms use a content moderation process, where human moderators check user comments for offensive language and other rule violations. It is the moderator's decision which comments to remove from the platform because of violations and which ones to keep. Research so far has focused on automating this decision process in the form of supervised machine learning for a classification task. However, even with machine-learned models achieving better classification accuracy than human experts in some scenarios, there is still a reason why human moderators are preferred. In contrast to black-box models, such as neural networks, humans can give explanations for their decision to remove a comment. For example, they can point out which phrase in the comment is offensive or what subtype of offensiveness applies. In this paper, we analyze and compare four attribution-based explanation methods for different offensive language classifiers: an interpretable machine learning model (naive Bayes), a model-agnostic explanation method (LIME), a model-based explanation method (LRP), and a self-explanatory model (LSTM with an attention mechanism). We evaluate these approaches with regard to their explanatory power and their ability to point out which words are most relevant for a classifier's decision. We find that the more complex models achieve better classification accuracy while also providing better explanations than the simpler models.
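To give a rough idea of what a word-level attribution looks like for the interpretable model named above (naive Bayes), here is a minimal sketch. It is not the implementation used in the paper: the toy comments, the labels, and the function names `fit` and `attribute` are all assumptions made for illustration. Each word is scored by the difference between its smoothed log-likelihood under the "offensive" class and under the "other" class, so positive scores mark words that push the classifier toward "offensive".

```python
import math
from collections import Counter

# Toy training data, invented for illustration only (not from the paper).
# Label 1 = offensive, label 0 = other.
TRAIN = [
    ("you are an idiot", 1),
    ("what an idiot move", 1),
    ("shut up you fool", 1),
    ("thank you for the article", 0),
    ("what a great article", 0),
    ("you make a good point", 0),
]


def fit(examples, alpha=1.0):
    """Estimate Laplace-smoothed log P(word | class) for both classes."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    vocab = set(counts[0]) | set(counts[1])
    log_prob = {}
    for label in (0, 1):
        total = sum(counts[label].values()) + alpha * len(vocab)
        log_prob[label] = {
            w: math.log((counts[label][w] + alpha) / total) for w in vocab
        }
    return log_prob


def attribute(text, log_prob):
    """Return (word, score) pairs; a score above 0 points toward 'offensive'.

    Words never seen in training get a score of 0.0 (no evidence either way).
    """
    return [
        (w, log_prob[1].get(w, 0.0) - log_prob[0].get(w, 0.0))
        for w in text.split()
    ]


if __name__ == "__main__":
    model = fit(TRAIN)
    for word, score in attribute("you are an idiot", model):
        print(f"{word:>6}  {score:+.3f}")
```

In this sketch, the word with the largest positive score is the one a moderator would be shown as the most relevant for the decision. Class priors are left out on purpose, since they shift the overall decision but do not change the ranking of words within a comment.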

Full Paper

TRAC20a.pdf

Workshop Homepage

TRAC 2020

BibTeX Entry

@inproceedings{krestel-trac20a,
  title = {Offensive Language Detection Explained},
  author = {Risch, Julian and Ruff, Robin and Krestel, Ralf},
  booktitle = {Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC 2020). Workshop at LREC},
  location = {Marseille, France},
  OPTmonth = {May 16th},
  pages = {137--143},
  year = {2020}
}
