Hasso-Plattner-Institut25 Jahre HPI

Biases and Sources of Error in Statistics and Machine Learning (Winter Semester 2023/2024)

Lecturer: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.10.2023 - 31.10.2023
  • Teaching format: Project seminar
  • Enrollment type: Compulsory elective module
  • Language of instruction: German
  • Maximum number of participants: 15

Degree Programs, Module Groups & Modules

IT-Systems Engineering BA

Description

While we usually discuss how to identify ever better-performing methods for data analysis, in practice many data science projects fail because seemingly small steps are overlooked. Some of these fallacies are fairly obvious when clearly presented. When hidden in large datasets, however, they easily go unnoticed and lead to incorrect conclusions, even though every step of the analysis may have been carried out correctly in a purely technical sense.

In this project seminar, we aim to identify these mishaps and pitfalls and to discuss strategies to overcome them, both through concrete examples and from a more general perspective.

Learning objectives

  • You learn to identify common mishaps in data analysis
  • You learn strategies to avoid these mishaps
  • You learn to identify open challenges in data analysis
  • You can present a scientific manuscript in this field and lead a discussion
  • You practise and gain experience in presenting and writing in English

Prerequisites

You should have some mathematical/statistical background (Math 3 or similar) and good knowledge of English (at least high school level).

Literature

Altman, D. G., & Bland, J. M. (1995). Statistics notes: Absence of evidence is not evidence of absence. BMJ, 311, 485.
Efron, B., & Morris, C. (1977). Stein's Paradox in Statistics. Scientific American, 236, 119-127. doi:10.1038/scientificamerican0577-119
Hernán, M. A., Clayton, D., & Keiding, N. (2011). The Simpson's paradox unraveled. International Journal of Epidemiology, 40(3), 780-785.
Pearl, J. (2014, 10 3). Lord’s paradox revisited – (Oh Lord! Kumbaya!). Technical Report.
Griffith, G. J., Morris, T. T., Tudball, M. J., et al. (2020). Collider bias undermines our understanding of COVID-19 disease risk and severity. Nature Communications, 11, 5749.
Westreich, D., & Iliinsky, N. (2014). Epidemiology Visualized: The Prosecutor's Fallacy. American Journal of Epidemiology, 179(9), 1125-1127.
Robert, C. (2014). On the Jeffreys-Lindley Paradox. Philosophy of Science, 81(2), 216-232. doi:10.1086/675729
Whalen, S., Schreiber, J., Noble, W. S., & Pollard, K. S. (2021). Navigating the pitfalls of applying machine learning in genomics. Nature Reviews Genetics, 1-13.
https://towardsdatascience.com/be-careful-when-interpreting-predictive-models-in-search-of-causal-insights-e68626e664b6

Teaching and Learning Formats

  • Seminar for bachelor students
  • Language of instruction: English
  • Maximum number of participants: 20

Topics will be presented in the first session (October 17, 2023). To be assigned a topic, participants must take part in a Doodle poll by October 24, 2023. If there are too many applicants, we will decide randomly. Because the first talks will be scheduled for November 6, October 31 is the last date to de-register from the class.
Please register on the Moodle page of the course (XXX) for further information.

Assessment

In the seminar, each participant will give a presentation on a predefined topic and design a case study. The final grade consists of the following three parts:

  • Presentation and discussion (40%)
  • Case study and dataset (40%)
  • Discussion participation (20%)

Dates

First session: October 17, 2023.
