Dr. Lisa Ehrlinger
High-quality data is the basis for decision-making in enterprises, which makes data quality assessment a critical concern for any organization. A few years ago, decision makers were still able to manually assess and interpret the quality of the data at hand. However, with recent advances in digitalization and the deployment of artificial intelligence (AI) systems in practice, the amount of data being collected, stored, and consequently used for automated decision-making exceeds the human capacity to process it. Hence, there is an urgent need for automated methods to assess and improve data quality.
This lecture provides a comprehensive foundation in data quality assessment and improvement. Beginning with an overview of the field's development and the various perspectives on data quality, we will explore the key data quality dimensions in detail, including completeness, consistency, minimality, and diversity. For each dimension, you will learn assessment methods, measurement metrics, and the specific data error types associated with it. We will then examine different data quality tools, as well as error pollution techniques that deliberately inject errors into data in order to evaluate such tools. The final session focuses on methodological approaches for managing data quality within organizational contexts.
This lecture is essential for future data science professionals who will work in companies and handle test and training data for AI systems. We will go beyond simple data preprocessing to cover comprehensive methods for managing data quality enterprise-wide.