23.03.2020

Winning the Shared Task on Aggression Identification

We won five of the six subtasks of the Shared Task on Aggression Identification at the Workshop on Trolling, Aggression and Cyberbullying (TRAC). Our paper "Bagging BERT Models for Robust Aggression Identification" describes the winning system. A pre-print can be downloaded here. The paper is under review for publication as a workshop paper at the Language Resources and Evaluation Conference (LREC 2020).

Update: The paper has been accepted and is published here. A 15-minute video presentation of the paper is available on YouTube.

Bagging BERT Models for Robust Aggression Identification

Authors
Julian Risch, Ralf Krestel

Abstract

Modern transformer-based models with hundreds of millions of parameters, such as BERT, achieve impressive results at text classification tasks. This also holds for aggression identification and offensive language detection, where they consistently outperform less complex models, such as decision trees. While the complex models fit training data well (low bias), they also come with an unwanted high variance. Especially when fine-tuning them on small datasets, the classification performance varies significantly for slightly different training data. To overcome the high variance and provide more robust predictions, we propose an ensemble of multiple fine-tuned BERT models based on bootstrap aggregating (bagging). In this paper, we describe such an ensemble system and present our submission to the shared tasks on aggression identification 2020 (team name: Julian). Our submission is the best-performing system for five out of six subtasks. For example, we achieve a weighted F1-score of 80.3% for task A on the test dataset of English social media posts. In our experiments, we compare different model configurations and vary the number of models used in the ensemble. We find that the F1-score drastically increases when ensembling up to 15 models, but the returns diminish for more models.
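
To make the bagging idea from the abstract concrete, here is a minimal sketch in Python. It is not the code of the submitted system: the function names are hypothetical, the tiny word-count classifier only stands in for a fine-tuned BERT model so that the example runs without any dependencies, and averaging the class probabilities of the individual models is an assumed aggregation rule. The two parts that correspond to the abstract are the bootstrap sampling (each model is trained on a sample drawn with replacement from the training data) and the aggregation of the individual predictions.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]             # (text, class label)
Model = Callable[[str], List[float]]  # text -> class probabilities


def bootstrap_sample(data: Sequence[Example], rng: random.Random) -> List[Example]:
    """Draw len(data) examples with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in range(len(data))]


def train_bagged_ensemble(
    data: Sequence[Example],
    train_fn: Callable[[List[Example]], Model],
    n_models: int,
    seed: int = 0,
) -> List[Model]:
    """Train n_models models, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]


def predict_ensemble(models: Sequence[Model], text: str) -> Tuple[int, List[float]]:
    """Average the class probabilities of all models (assumed aggregation rule)."""
    all_probs = [model(text) for model in models]
    n_classes = len(all_probs[0])
    avg = [sum(p[c] for p in all_probs) / len(all_probs) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


def train_word_count_model(sample: List[Example], n_classes: int = 2) -> Model:
    """Hypothetical stand-in for fine-tuning BERT: smoothed word counts per class."""
    counts = [Counter() for _ in range(n_classes)]
    for text, label in sample:
        counts[label].update(text.lower().split())

    def model(text: str) -> List[float]:
        words = text.lower().split()
        # Add-one smoothing keeps every score positive, so the division is safe.
        scores = [1.0 + sum(counts[c][w] for w in words) for c in range(n_classes)]
        total = sum(scores)
        return [s / total for s in scores]

    return model


# Toy data for illustration only: label 1 = aggressive, label 0 = not aggressive.
toy_data = [
    ("you idiot", 1), ("stupid idiot", 1), ("you are stupid", 1), ("idiot idiot", 1),
    ("thanks a lot", 0), ("great post", 0), ("nice work thanks", 0), ("great work", 0),
]
ensemble = train_bagged_ensemble(toy_data, train_word_count_model, n_models=15)
label, probabilities = predict_ensemble(ensemble, "you idiot")
print(label, probabilities)
```

In a real setting, `train_fn` would fine-tune a separate BERT model on each bootstrap sample. Because every model sees slightly different training data, averaging their outputs reduces the variance of the final prediction, which is the motivation given in the abstract.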