Data Preparation for Science (Winter Semester 2018/2019)
Lecturers: Prof. Dr. Felix Naumann (Information Systems), Lan Jiang (Information Systems)
Course Website: https://hpi.de/naumann/teaching/teaching/ws-1819/data-preparation-for-science-ps-master.html
General Information
- Weekly Hours: 4
- Credits: 6
- Graded: yes
- Enrolment Deadline: 26.10.2018
- Teaching Form: Project seminar
- Enrolment Type: Compulsory Elective Module
- Course Language: German
Programs, Module Groups & Modules
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Concepts and Methods
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Specialization
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Technologies and Tools
- DATA: Data Analytics
- HPI-DATA-K Concepts and Methods
- DATA: Data Analytics
- HPI-DATA-T Technologies and Tools
- DATA: Data Analytics
- HPI-DATA-S Specialization
- PREP: Data Preparation
- HPI-PREP-K Concepts and Methods
- PREP: Data Preparation
- HPI-PREP-T Technologies and Tools
- PREP: Data Preparation
- HPI-PREP-S Specialization
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-C Concepts and Methods
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-T Technologies and Tools
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-S Specialization
Description
Data preparation is the process of transforming data before passing them to downstream tasks such as data analytics, data cleaning, and machine learning. Data scientists frequently conduct experiments on data from various sources and reproduce the results of previous experiments. However, the data often do not meet the requirements of these experiments, so scientists spend a lot of time on data preparation. Preparing data is reported to be labor-intensive and tedious work that accounts for 50%-80% of the time spent over the whole data lifecycle.
In this seminar, students will learn about common data preparation methods used in scientific work. Following some introductory presentations at the beginning of the seminar, students will experience the difficulty of scientific data preparation by trying to repeat experiments. They will investigate what contributes to an efficient and robust data preparation method for scientific tasks. Students will then try to extend a data preparation system by integrating their proposals for more advanced preparation tasks.
Students will work in small groups or on their own on the following tasks:
- Repeat experiments from existing literature.
- Write an investigative report on which components may contribute to better scientific data preparation.
- Extend a data preparation system by implementing one component (e.g., data provenance, preparation optimization, preparation suggestion, error handling).
Examination
Presentation (50%) and project implementation (50%)