Hasso-Plattner-Institut: 25 Years of HPI

Mishaps in Statistics and ML (Summer Semester 2022)

Lecturers: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics), Dr. Katharina Baum (Data Analytics and Computational Statistics)

General Information

  • Semester hours per week: 2
  • ECTS: 3
  • Graded: Yes
  • Enrollment period: April 1, 2022 - April 30, 2022
  • Course format: Seminar
  • Enrollment type: Compulsory elective module
  • Language of instruction: English
  • Maximum number of participants: 6

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
Cybersecurity MA
Digital Health MA
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-C Concepts and Methods
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-T Technologies and Tools
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-S Specialization
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-C Concepts and Methods
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-T Technologies and Tools
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-S Specialization


While we usually discuss how to identify ever better-performing methods for data analysis, in reality many data science projects fail because seemingly small steps are overlooked. Some of these fallacies are fairly obvious when clearly presented. When well hidden in large datasets, however, they easily remain unidentified and lead to incorrect conclusions, even though every step of the data analysis may have been performed correctly from a purely technical standpoint.

In this seminar, we aim to identify these mishaps and pitfalls and discuss strategies to overcome them, both through concrete examples and from a more general perspective.

Learning objectives

  • You learn to identify common mishaps in data analysis
  • You learn strategies to circumvent these mishaps
  • You learn to identify open challenges in data analysis
  • You can present a scientific manuscript in this field and lead a discussion


You should have some mathematical background (at least Mathe 1+2 of the ITSE bachelor or comparable) and should have taken at least one class in statistics. Good knowledge of English is required to understand and discuss current literature.


Altman, Douglas G., Bland, J. Martin. Statistics notes: Absence of evidence is not evidence of absence. BMJ 1995; 311:485
Efron, Bradley & Morris, Carl. (1977). Stein's Paradox in Statistics. Scientific American, 236, 119-127. doi:10.1038/scientificamerican0577-119
Miguel A Hernán, David Clayton, Niels Keiding, The Simpson's paradox unraveled, International Journal of Epidemiology, Volume 40, Issue 3, June 2011, Pages 780–785
Pearl, J. (2014, 10 3). Lord’s paradox revisited – (Oh Lord! Kumbaya!). Technical Report.
Griffith, G.J., Morris, T.T., Tudball, M.J. et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat Commun 11, 5749 (2020)
Daniel Westreich, Noah Iliinsky, Epidemiology Visualized: The Prosecutor's Fallacy, American Journal of Epidemiology, Volume 179, Issue 9, 1 May 2014, Pages 1125–1127
Robert, C. (2014). On the Jeffreys-Lindley Paradox. Philosophy of Science, 81(2), 216-232. doi:10.1086/675729
Whalen, S., Schreiber, J., Noble, W. S., & Pollard, K. S. (2021). Navigating the pitfalls of applying machine learning in genomics. Nature Reviews Genetics, 1-13.

Learning and Teaching Formats

  • Seminar for master's students
  • Language of instruction: English
  • Maximum number of participants: 7

Topics will be presented in the first session (April 25, 2022). To be assigned a topic, participants must send an e-mail by May 2nd, 2022, stating their preferences for up to three of the presented topics. We will then assign the topics. If there are too many applicants, we will decide randomly. As the first talks will be scheduled, May 16th will be the last date to de-register from the class.
The seminar will be conducted on site (with a hybrid option if needed). Please register in the Moodle of the course (https://moodle.hpi.de/course/view.php?id=293) for further information.


In the seminar, each participant will give a presentation about a predefined topic within the research area and write a short report. The final grade consists of the following two parts:

  • Presentation and discussion (65%)
  • Written report (35%)


First session: April 25, 2022.