Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

About the Talk

Data profiling comprises a broad range of methods to efficiently extract metadata from a given dataset, including data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and various other data dependencies. The talk highlights the key insights behind recent state-of-the-art methods and presents use cases in data cleaning and data integration: violations of dependencies point to errors in the data; key discovery identifies the core entities of a data source; inclusion dependencies indicate candidate joins across multiple sources; and, in general, data profiling results can be used to organize data lakes.

About the Speaker

Felix Naumann is a full professor and head of the Information Systems group at the Hasso Plattner Institute in Potsdam, Germany. He is an active member of the research community and frequently serves on the organizing committees of conferences such as VLDB and SIGMOD. His group focuses on, among other fields, metadata management, information integration, data quality, data cleaning, and data profiling.