Julian Risch, Robin Ruff, Ralf Krestel
We are happy to announce that our article "Explaining Offensive Language Detection" was accepted for publication in the Journal for Language Technology and Computational Linguistics (JLCL). The work is based on the bachelor's thesis by Robin Ruff and is the result of a collaboration between HPI at the University of Potsdam and the University of Passau. A pre-print of the paper and the code are already available.
Abstract
Machine learning approaches have proven to be at or even above human-level accuracy for the task of offensive language detection. In contrast to human experts, however, they often lack the capability of explaining their decisions. This article compares four different approaches to making offensive language detection explainable: an interpretable machine learning model (naive Bayes), a model-agnostic explainability method (LIME), a model-based explainability method (LRP), and a self-explanatory model (an LSTM with an attention mechanism). Three different classification methods (SVM, naive Bayes, and LSTM) are paired with appropriate explanation methods. On this basis, we investigate the trade-off between the classification performance and the explainability of the respective classifiers. We conclude that, with the appropriate explanation methods, the superior classification performance of more complex models is worth the initial lack of explainability.
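To illustrate what pairing a classifier with a model-agnostic explanation method looks like in practice, the sketch below combines a naive Bayes text classifier with LIME. This is a minimal sketch, not the paper's code: the toy corpus, labels, and scikit-learn pipeline are illustrative assumptions; only the general LIME-over-probabilities pattern reflects the approach named in the abstract.

```python
# Minimal sketch: a naive Bayes offensive-language classifier explained with LIME.
# Toy data and pipeline are illustrative assumptions, not the paper's setup.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; a real setup would use an offensive-language dataset.
texts = [
    "you are an idiot",
    "have a nice day",
    "shut up you fool",
    "thanks for your help",
]
labels = [1, 0, 1, 0]  # 1 = offensive, 0 = not offensive

# Bag-of-words model whose learned per-token weights are themselves interpretable.
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(texts, labels)

# LIME is model-agnostic: it only needs a function mapping raw texts to class
# probabilities, which it probes with locally perturbed copies of the input.
explainer = LimeTextExplainer(class_names=["not offensive", "offensive"])
explanation = explainer.explain_instance(
    "you are such a fool",
    pipeline.predict_proba,
    num_features=5,
)

# Per-token weights: positive values push the prediction toward "offensive".
for token, weight in explanation.as_list():
    print(f"{token}: {weight:+.3f}")
```

Because LIME never inspects the model internals, the same pattern carries over to the SVM and LSTM classifiers compared in the article; only the probability function passed to explain_instance changes.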