Ralf Krestel, Julian Risch
Two workshop papers describing our ongoing work in the field of comment analysis have been accepted for presentation at the International Conference on Computational Linguistics (COLING). The conference will be held from August 20 through 26, 2018, in Santa Fe, New Mexico. The papers, both by Julian Risch and Ralf Krestel, will be presented at the conference’s workshop on Trolling, Aggression and Cyberbullying. They are titled “Delete or not Delete? Semi-Automatic Comment Moderation for the Newsroom” and “Aggression Identification Using Deep Learning and Data Augmentation”.
Delete or not Delete? Semi-Automatic Comment Moderation for the Newsroom
Comment sections of online news providers have enabled millions to share and discuss their opinions on news topics. Today, moderators ensure respectful and informative discussions by deleting not only insults, defamation, and hate speech, but also unverifiable facts. This process has to be transparent and comprehensive in order to keep the community engaged. Further, news providers have to take care not to give the impression of censorship or of disseminating fake news. Yet manual moderation is very expensive and becomes increasingly infeasible as the number of comments grows. Hence, we propose a semi-automatic, holistic approach that considers not only comment features but also their context, such as information about users and articles. For evaluation, we present experiments on a novel corpus of 3 million news comments annotated by a team of professional moderators.
Aggression Identification Using Deep Learning and Data Augmentation
Social media platforms allow users to share and discuss their opinions online. However, a minority of user posts are aggressive, hinder respectful discussion, and, at an extreme level, are liable to prosecution. The automatic identification of such harmful posts is important because it can support the costly manual moderation of online discussions. Further, automation allows unprecedented analyses of discussion datasets that contain millions of posts.
This system description paper presents our submission to the First Shared Task on Aggression Identification. We propose to augment the provided dataset, increasing the number of labeled comments from 15,000 to 60,000 and introducing linguistic variety into the dataset. The larger amount of training data allows us to train a deep neural net that generalizes especially well to unseen data. To further boost performance, we combine this neural net with three logistic regression classifiers trained on character n-grams, word n-grams, and hand-picked syntactic features. This ensemble is more robust than any of the individual models. Our team, named “Julian”, achieves an F1-score of 60% on both English datasets, 63% on the Hindi Facebook dataset, and 38% on the Hindi Twitter dataset.
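To make the ensemble idea more concrete, here is a minimal sketch in Python of how n-gram features can be extracted and how the class probabilities of several models might be combined. It is not the submitted system: the announcement does not say how the ensemble combines its members, so the averaging rule ("soft voting") is an assumption, and the function names, the two labels, and the example probabilities are all hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return a bag (Counter) of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Return a bag (Counter) of overlapping word n-grams (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def soft_vote(probabilities):
    """Average per-class probabilities from several models and pick the top class.

    `probabilities` is a list of dicts, one per model, mapping label -> probability.
    Averaging is an assumed combination rule, not necessarily the one used in the paper.
    """
    labels = probabilities[0].keys()
    averaged = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical outputs of four models for a single comment (made-up numbers).
model_outputs = [
    {"aggressive": 0.7, "neutral": 0.3},  # deep neural net
    {"aggressive": 0.4, "neutral": 0.6},  # logistic regression on character n-grams
    {"aggressive": 0.6, "neutral": 0.4},  # logistic regression on word n-grams
    {"aggressive": 0.5, "neutral": 0.5},  # logistic regression on syntactic features
]
label, averaged = soft_vote(model_outputs)
print(label, averaged)
```

In this toy example, one model disagrees with the others, but the averaged probabilities still favor the majority view. That is the intuition behind an ensemble being more robust than any single member.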