Ralf Krestel



WOAH 21

Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in One Unified Format

Abstract

With the rise of research on toxic comment classification, more and more annotated datasets have been released. The wide variety of task settings (different languages, different labeling processes and schemes) has led to a large number of heterogeneous datasets that can be used for training and testing in very specific settings. Despite recent efforts to create web pages that provide an overview, most publications still use only a single dataset. The datasets are not stored in one central database, they come in many different data formats, and it is difficult to interpret their class labels and to reuse these labels in other projects. To overcome these issues, we present a collection of more than forty datasets in the form of a software tool that automates downloading and processing of the data and presents them in a unified data format that also offers a mapping of compatible class labels. Another advantage of the tool is that it gives an overview of properties of the available datasets, such as languages, platforms, and class labels, which makes it easier to select suitable training and test data.
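The abstract describes the tool only at a high level. As a rough illustration of what "a unified data format with a mapping of compatible class labels" can mean, the following Python sketch converts rows from two hypothetical datasets into one shared record type. The dataset names, label names, `LABEL_MAPPINGS`, `UnifiedComment`, and `to_unified` are all assumptions made for illustration; they are not the tool's actual API or schema.

```python
from dataclasses import dataclass, field

# Hypothetical per-dataset label mappings: original label -> unified label.
# Dataset and label names are illustrative assumptions, not the tool's real schema.
LABEL_MAPPINGS = {
    "dataset_a": {"insult": "insult", "obscene": "profanity", "toxic": "toxic"},
    "dataset_b": {"OFFENSE": "toxic", "INSULT": "insult", "PROFANITY": "profanity"},
}


@dataclass
class UnifiedComment:
    """One comment in the shared format, regardless of its source dataset."""
    text: str
    source: str
    language: str
    labels: list[str] = field(default_factory=list)


def to_unified(raw_rows, source, language):
    """Convert (text, original_labels) rows of one dataset into UnifiedComment records."""
    mapping = LABEL_MAPPINGS[source]
    unified = []
    for text, original_labels in raw_rows:
        # Keep only labels with a compatible unified counterpart; drop the rest.
        labels = sorted({mapping[lab] for lab in original_labels if lab in mapping})
        unified.append(UnifiedComment(text=text, source=source, language=language, labels=labels))
    return unified


# Usage: two differently labeled (hypothetical) datasets end up in one list.
a = to_unified([("you idiot", ["insult", "toxic"])], "dataset_a", "en")
b = to_unified([("Du Idiot", ["INSULT", "OTHER"])], "dataset_b", "de")
combined = a + b
print([c.labels for c in combined])  # [['insult', 'toxic'], ['insult']]
```

Because every record carries its source and language alongside normalized labels, the combined list can be filtered by language, platform, or label to assemble training and test sets across datasets.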

Full Paper

WOAH21.pdf

Workshop Homepage

WOAH 2021

BibTex Entry

@inProceedings{krestel-woah21,
  author = {Risch, Julian and Schmidt, Philipp and Krestel, Ralf},
  booktitle = {Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH), Workshop at ACL},
  title = {Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in One Unified Format},
  location = {online},
  OPTmonth = {August 6th},
  year = {2021}
}

