Hasso-Plattner-Institut25 Jahre HPI

Biostatistics & Epidemiological Data Analysis using R (Winter Semester 2021/2022)

Lecturer: Dr. rer. nat. Stefan Konigorski (Digital Health - Machine Learning)
Course website: https://moodle2.uni-potsdam.de/course/index.php?categoryid=2128

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.10.2021 - 22.10.2021
  • Course format: Lecture / Exercise
  • Enrollment type: Compulsory elective module
  • Teaching language: English
  • Maximum number of participants: 60

Degree Programs, Module Groups & Modules

Digital Health MA
IT-Systems Engineering MA
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
Data Engineering MA


This course teaches (i) basic epidemiological concepts and (ii) biostatistical methods and their application to the analysis of large epidemiological datasets using the statistical software R (www.r-project.org) and the graphical interface RStudio (www.rstudio.com). To this end, the class starts with an introduction to R and RStudio. R Markdown will be used as a tool for documenting the analysis and reporting its results. Next, the class covers data processing steps and introduces epidemiological study designs as well as theoretical and practical aspects of basic and more advanced biostatistical methods. In addition to classical biostatistical approaches such as linear and linear mixed models, newer methods for dealing with missing values, performing meta-analyses, and causal inference will be discussed and applied.


  • Introduction to R, RStudio
  • Documentation and report writing using R Markdown
  • Data setup: create, import, export datasets in R
  • Format datasets in R: transform variables and manipulate datasets
  • Descriptive statistics
  • Tables and graphics to visualize data and results
  • Epidemiological study designs and study planning
  • Introduction to statistical parameter estimation and hypothesis testing
  • Statistical methods for dealing with missing values
  • Linear and logistic regression models
  • Linear mixed models for the analysis of clustered and longitudinal data
  • Meta-analysis
  • Survival analysis
  • Statistical methods for causal inference

Learning goals:

At the end of the course, students will be able to

  • understand the main concepts of basic and more advanced biostatistical methods and select appropriate methods for analyzing data from epidemiological studies
  • import and manipulate datasets in R for statistical analysis
  • perform a data analysis in R considering measurement error and missing values
  • document the analysis and report the results using R Markdown.


  • Laptop with R (recommended: version 4.1.1) and RStudio (recommended: version 1.4.1717) installed.
  • While the class is self-contained, any previous exposure to programming, data analysis, and statistics is helpful.

Learning and Teaching Formats

  • Lectures (via Zoom) with interactive practical exercises in R
  • Video snippets (provided asynchronously) with additional information on the lecture content
  • Tutorials with discussion of homework


  • This class will also be open to students from the Icahn School of Medicine at Mount Sinai in New York.
  • To give all students full access to the class, including those who are not in Germany, the class will be held fully virtually. All lectures and tutorials will take place via Zoom. They will also be recorded and made available afterwards (see Moodle for details).
  • Additionally, office hours will be offered at the HPI.
  • Condition for admission to the final exam: hand in solutions to 10 of the 12 weekly assignments
  • Final grade: open-book take-home final exam


Course Times and Dates

  • Lectures: Thursdays, 3.15pm - 6.30pm (Potsdam time)
  • Tutorials: Tuesdays, 5.00pm - 6.30pm (Potsdam time)
  • In the first tutorial on October 26, 2021, problems with installing or setting up R or RStudio, as well as other formal or technical questions, can be clarified. This is possible in person from 5pm - 6pm in room HS 2 at the HPI, or from 6pm - 7pm via Zoom. It is not a regular class, and no course content will be discussed.
  • All other lectures and tutorials will be via Zoom only.
  • The first lecture will be on October 28, 2021.
  • The final class will be on February 17, 2022.

How to get access to the course

  • To obtain the recurring Zoom link, please register for the course in Moodle, where the link will be posted, or write an email to Stefan Konigorski.

Time table