Betty van Aken, Julian Risch, Ralf Krestel, Alexander Löser
Our paper "Challenges for Toxic Comment Classification: An In-Depth Error Analysis" has been accepted for presentation at the 2nd Workshop on Abusive Language Online, which is co-located with the Conference on Empirical Methods in Natural Language Processing (EMNLP). This paper fits into our comment analysis project and is a result of an ongoing collaboration with our colleagues at Beuth University of Applied Sciences. Together with Prof. Dr. Alexander Löser and the PhD student Betty van Aken, both from the Database Systems and Text-based Information Systems group (DATEXIS), we investigated different approaches for toxic comment classification. In particular, we conducted an error analysis to identify challenges and paths for future work. The abstract is below and the paper can be downloaded here.
Challenges for Toxic Comment Classification: An In-Depth Error Analysis
Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task’s challenges, others still remain unsolved and directions for further research are needed. To this end, we compare different approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to perform an extensive error analysis, which reveals open challenges for state-of-the-art methods and directions towards pending future research. These challenges include missing paradigmatic context and inconsistent dataset labels.