Our group includes postdocs, PhD students, and student assistants and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact Felix Naumann.

For bachelor's students, we offer lectures in German on database systems, as well as paper- or project-oriented seminars. In a one-year bachelor's project, students complete their studies in cooperation with external partners. For master's students, we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master's projects. We also advise master's theses.

Most of our research is conducted within larger research projects, in collaboration among students, research groups, and universities. We strive to make most of our datasets and source code publicly available.

Please do not hesitate to contact us directly if you cannot find a paper, slides, or other research artifacts.

GermEval 2019
Our paper "hpiDEDIS at GermEval 2019: Offensive Language Identification using a German BERT model" has been accepted at the GermEval Workshop at KONVENS 2019. It is a joint research project of the Hasso Plattner Institute at the University of Potsdam (HPI) and the Research Group on Deliberative Discussions in the Social Web at Heinrich Heine University Düsseldorf (DEDIS). We have published a preprint here and our source code on GitHub here.
Data Repeatability in Web Science
Our article "Measuring and Facilitating Data Repeatability in Web Science" (preprint) has been accepted for publication in the journal Datenbank-Spektrum. The source code accompanying this article is available here.
Toxic Comment Classification
We participated in the Toxic Comment Classification Challenge, a Kaggle competition aimed at identifying and classifying toxic online comments. In collaboration with our colleagues from the DATEXIS group at Beuth Hochschule für Technik Berlin, we placed 54th out of 4,551 teams, finishing in the top 2% of the leaderboard.
Aggression Identification
We participated in the First Shared Task on Aggression Identification, part of the First Workshop on Trolling, Aggression and Cyberbullying at the 27th International Conference on Computational Linguistics (COLING 2018). Our team placed 2nd out of 30 teams in the task of classifying social media posts as ‘Overtly Aggressive’, ‘Covertly Aggressive’, or ‘Non-aggressive’ on an unseen test dataset. Our implementation and our augmented dataset are published here, and the paper is available here.
Semi-Automated Comment Moderation
The code accompanying our paper "Delete or not Delete? Semi-Automatic Comment Moderation for the Newsroom" is available here. The paper itself can be found here.

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
Fax: +49 (0)331 5509-287
E-Mail: office-naumann(at)hpi.de

To visit us, please see these directions.

Project highlights

Metanome: Big Data Profiling

Data Preparation

Janus: Change exploration

KITQAR: AI and Data Quality

News

06.10.2024 | Paper accepted at EDBT 2025

06.09.2024 | Congratulations Dr. Phillip Wenig!

06.09.2024 | Congratulations Dr. Mazhar Hameed!

16.07.2024 | Congratulations Dr. Leon Bornemann-Paulus!

23.05.2024 | Paper accepted at NLDB 2024
