Hasso Plattner Institut
Imprint   Data Privacy

Ralf Krestel

You are here:   Home > Publications > Journal Articles > JLCL 20


Explaining Offensive Language Detection


Machine learning approaches have proven to be on or even above human-level accuracy for the task of offensive language detection. In contrast to human experts, however, they often lack the capability of giving explanations for their decisions. This article compares four different approaches to make offensive language detection explainable: an interpretable machine learning model (naive Bayes), a model-agnostic explainability method (LIME), a model-based explainability method (LRP), and a self-explanatory model (LSTM with an attention mechanism). Three different classification methods: SVM, naive Bayes, and LSTM are paired with appropriate explanation methods. To this end, we investigate the trade-off between classification performance and explainability of the respective classifiers. We conclude that, with the appropriate explanation methods, the superior classification performance of more complex models is worth the initial lack of explainability.

Full Paper


BibTex Entry


Watch our new MOOC in German about hate and fake in the Internet ("Trolle, Hass und Fake-News: Wie können wir das Internet retten?") on openHPI (link).

New Photos

I added some photos from my trip to Hildesheim.

Powered by CMSimple| Template: ge-webdesign.de| Login