Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

KITQAR: Data Quality for AI Applications

Partners 
  • The German Association for Electrical, Electronic & Information Technologies (VDE)
  • The International Center for Ethics in the Sciences and Humanities (IZEW) at the University of Tübingen
  • University of Cologne
 
Term: December 2021 until December 2023
Funding: Denkfabrik Digitale Arbeitsgesellschaft at the German Federal Ministry of Labor and Social Affairs (BMAS)
Website: https://www.kitqar.de/

Background

Artificial intelligence (AI) has long been part of our everyday lives and remains a key technology of the future: more and more applications in business and daily life rely on it. Developing an AI application requires large amounts of data: training, test, and validation data. The quality of this data is a growing concern. The data must not only be technically sound but must also ensure that the application operates in a non-discriminatory manner. Further issues include the origin of the data, transparency, data protection, and liability, among many others. Beyond developing classical data cleaning methods, it is therefore necessary to define data quality more broadly and to take ethical and legal boundaries into account. So far, there are hardly any uniform quality standards for such data, and the KITQAR research project aims to close this gap.

Goal

The KITQAR research project develops a “data quality framework” of standards for evaluating the quality of data used in AI applications. This includes quality requirements for AI training, test, and validation data. A scientific-technical consortium examines these data from computer science, ethical, and legal perspectives in order to cover the widest possible range of data quality aspects and to make them measurable and testable. To keep the work application-oriented, the project draws on both real-world data sets and synthetic data.

Broad expertise

The project is strongly practice-oriented and requires an interdisciplinary exchange between industry and science to discuss use cases for training data and to develop proposals for their future standardization. The VDE (Association for Electrical, Electronic & Information Technologies) leads the project, which involves scientists from the European University Viadrina in Frankfurt/Oder, the University of Tübingen, and the Hasso Plattner Institute at the University of Potsdam. In addition, the project involves numerous stakeholders from companies, civil society, trade unions, and regulatory bodies in order to integrate expertise from all relevant areas.