Approximate Data Profiling (Wintersemester 2022/2023)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), Tobias Bleifuß (Information Systems), Youri Kaminsky
Course Website: https://hpi.de/en/naumann/teaching/current-courses/ws-22-23/approximate-data-profiling.html

General Information

Weekly Hours: 4
Credits: 6
Graded: yes
Enrolment Deadline: 01.10.2022 - 30.10.2022
Examination time §9 (4) BAMA-O: 08.12.2022
Teaching Form: Project seminar
Enrolment Type: Compulsory Elective Module
Course Language: English
Maximum number of participants: 6

Programs, Module Groups & Modules

IT-Systems Engineering MA

OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Konzepte und Methoden (Concepts and Methods)
- HPI-OSIS-S Spezialisierung (Specialization)
- HPI-OSIS-T Techniken und Werkzeuge (Techniques and Tools)

Data Engineering MA

DANA: Data Analytics
- HPI-DANA-K Konzepte und Methoden (Concepts and Methods)
- HPI-DANA-T Techniken und Werkzeuge (Techniques and Tools)
- HPI-DANA-S Spezialisierung (Specialization)
CODS: Complex Data Systems
- HPI-CODS-K Konzepte und Methoden (Concepts and Methods)
- HPI-CODS-T Techniken und Werkzeuge (Techniques and Tools)
- HPI-CODS-S Spezialisierung (Specialization)

Software Systems Engineering MA

Description

Data profiling is the process of extracting metadata from datasets. One important aspect is the discovery of data dependencies, such as Functional Dependencies (FDs), Inclusion Dependencies (INDs), and Unique Column Combinations (UCCs). However, the increasing size of datasets challenges traditional, exact data profiling approaches: their runtime grows with the number of rows, and the number of candidate column combinations grows exponentially with the number of columns. Therefore, this seminar focuses on sampling-based methods for approximate data profiling.
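To make the three dependency types concrete, here is a minimal Python sketch that checks one candidate of each kind exactly on a small in-memory table. The row-as-dict representation, the function names `is_ucc`, `holds_fd`, and `holds_ind`, and the toy `employees`/`departments` tables are illustrative assumptions, not part of the course material. A discovery algorithm has to evaluate many such candidates, which is why exact profiling becomes expensive on large datasets.

```python
from typing import Any, Sequence

Row = dict[str, Any]  # one record: column name -> value (assumed representation)


def is_ucc(rows: Sequence[Row], columns: Sequence[str]) -> bool:
    """UCC check: no two rows share the same value combination on `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # duplicate found, so `columns` is not unique
        seen.add(key)
    return True


def holds_fd(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """FD check lhs -> rhs: rows that agree on `lhs` also agree on `rhs`."""
    rhs_for_lhs: dict[tuple, Any] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        expected = rhs_for_lhs.setdefault(key, row[rhs])
        if expected != row[rhs]:
            return False  # same lhs value, different rhs value
    return True


def holds_ind(dep_rows: Sequence[Row], dep_col: str,
              ref_rows: Sequence[Row], ref_col: str) -> bool:
    """IND check: every value of dep_col also occurs in ref_col."""
    referenced = {row[ref_col] for row in ref_rows}
    return all(row[dep_col] in referenced for row in dep_rows)


# Hypothetical toy tables for illustration only.
employees = [
    {"id": 1, "zip": "14482", "city": "Potsdam", "dept": 10},
    {"id": 2, "zip": "14482", "city": "Potsdam", "dept": 20},
    {"id": 3, "zip": "10115", "city": "Berlin", "dept": 10},
]
departments = [{"dept_id": 10}, {"dept_id": 20}, {"dept_id": 30}]

print(is_ucc(employees, ["id"]))                             # True
print(is_ucc(employees, ["zip"]))                            # False
print(holds_fd(employees, ["zip"], "city"))                  # True
print(holds_fd(employees, ["city"], "dept"))                 # False
print(holds_ind(employees, "dept", departments, "dept_id"))  # True
print(holds_ind(departments, "dept_id", employees, "dept"))  # False
```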

First, the students study related work for inspiration. Afterwards, each student team develops its own ideas. These can concern the sampling process itself, the actual discovery on the sample, or both.
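As an illustration of "discovery on the sample", the following sketch draws a uniform random row sample and runs an exact, bottom-up search for minimal UCCs on it. Uniform sampling, the size bound `max_size`, and all function names are assumptions chosen for brevity; the seminar leaves both the sampling strategy and the discovery method open. The sketch also shows the basic asymmetry of sampling: a combination with duplicates in the sample is certainly not a UCC on the full data, but a combination that is unique on the sample may still be a false positive.

```python
import random
from itertools import combinations
from typing import Any, Sequence

Row = dict[str, Any]


def is_ucc(rows: Sequence[Row], columns: Sequence[str]) -> bool:
    """True if no two rows share the same value combination on `columns`."""
    projected = [tuple(row[c] for c in columns) for row in rows]
    return len(set(projected)) == len(projected)


def minimal_uccs(rows: Sequence[Row], columns: Sequence[str],
                 max_size: int) -> list[tuple[str, ...]]:
    """Bottom-up search over column combinations, keeping only minimal UCCs."""
    found: list[tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            # Supersets of a known UCC are unique too, but not minimal.
            if any(set(ucc) <= set(combo) for ucc in found):
                continue
            if is_ucc(rows, combo):
                found.append(combo)
    return found


def sampled_uccs(rows: Sequence[Row], columns: Sequence[str], sample_size: int,
                 max_size: int = 2, seed: int = 0) -> list[tuple[str, ...]]:
    """Approximate UCC discovery: run the exact search on a uniform row sample."""
    rng = random.Random(seed)
    sample = rng.sample(list(rows), min(sample_size, len(rows)))
    return minimal_uccs(sample, columns, max_size)


# Hypothetical table: "id" is unique, each "zip" value appears twice,
# and ("zip", "city") is unique because consecutive ids differ modulo 3.
rows = [{"id": i, "zip": i // 2, "city": i % 3} for i in range(1000)]
columns = ["id", "zip", "city"]

exact = minimal_uccs(rows, columns, max_size=2)
approx = sampled_uccs(rows, columns, sample_size=50)
print("exact: ", exact)   # [('id',), ('zip', 'city')]
print("sample:", approx)  # may report ('zip',) if no duplicate zip was sampled
```

Every true UCC stays unique on any sample, so the sample result never loses a true UCC entirely: each one is either reported or covered by a smaller (possibly wrong) combination. Improving that trade-off, for example through smarter sampling or by validating candidates on more data, is the kind of idea the student teams can explore.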

The students turn their ideas into working algorithms. There are two main goals for each algorithm:
1) Find a set of dependencies that is close to the actual solution.
2) Minimize the required runtime.
Benchmark datasets are provided to the students.
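The course page does not say how closeness to the actual solution is measured. One common choice, assumed here, is to compare the discovered dependencies with the exact result on a benchmark dataset using precision, recall, and F1, and to measure runtime with a wall-clock timer. The helper names `precision_recall_f1` and `timed`, and the frozenset encoding of UCCs, are hypothetical.

```python
import time
from typing import Any, Callable, Hashable, Iterable


def precision_recall_f1(found: Iterable[Hashable],
                        truth: Iterable[Hashable]) -> tuple[float, float, float]:
    """Compare a discovered dependency set with the exact (ground-truth) set."""
    found_set, truth_set = set(found), set(truth)
    hits = len(found_set & truth_set)
    # Convention assumed here: an empty result set has precision 1.0,
    # and an empty ground truth has recall 1.0.
    precision = hits / len(found_set) if found_set else 1.0
    recall = hits / len(truth_set) if truth_set else 1.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Run fn once and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# UCCs encoded as frozensets, so column order does not affect equality.
truth = {frozenset({"id"}), frozenset({"zip", "city"})}
found = {frozenset({"id"}), frozenset({"zip"})}
print(precision_recall_f1(found, truth))  # (0.5, 0.5, 0.5)

result, seconds = timed(sorted, [3, 1, 2])
print(result, f"{seconds:.6f}s")
```

Dependencies need a canonical, hashable form (here frozensets of column names) so that equivalent results compare equal regardless of how an algorithm orders columns.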
Finally, the students present their approaches and write a short report.

Literature

Data Profiling. Ziawasch Abedjan, Lukasz Golab, Felix Naumann, and Thorsten Papenbrock, Synthesis Lectures on Data Management, Morgan & Claypool, 2019.
Sampling for Big Data Profiling: A Survey. Zhicheng Liu and Aoqian Zhang, IEEE Access, 2020.

Learning

Project seminar with weekly meetings, talks, discussions and report writing

Examination

Presentation and report

Dates

See the course website.
