Hasso-Plattner-Institut25 Jahre HPI

Applied data science on real-world hospital data II (Winter Semester 2023/2024)

Lecturer: Prof. Dr. Bernhard Renard (Data Analytics and Computational Statistics)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.10.2023 - 31.10.2023
  • Course format: Seminar
  • Enrollment type: Compulsory elective module
  • Teaching language: English
  • Maximum number of participants: 15

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
Digital Health MA
  • DICR: Digitalization of Clinical and Research Processes
    • HPI-DICR-C Concepts and Methods
  • DICR: Digitalization of Clinical and Research Processes
    • HPI-DICR-T Technologies and Tools
  • DICR: Digitalization of Clinical and Research Processes
    • HPI-DICR-S Specialization
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-C Concepts and Methods
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-T Technologies and Tools
  • APAD: Acquisition, Processing and Analysis of Health Data
    • HPI-APAD-S Specialization
Cybersecurity MA
Software Systems Engineering MA


Do you want to make a real impact in the field of healthcare? We invite you to seize the opportunity to become an integral part of an innovative healthcare project in collaboration with the Ernst von Bergmann Clinic, Policlinic, and MvZ. Building upon the successful "Hospital Control Center" seminar held in the summer semester of 2023, this project harnesses the knowledge acquired through prior data analysis to enhance hospital planning and, consequently, bolster public health.

As we've all come to realize during the COVID-19 pandemic, early warnings of new infection waves are crucial for the well-being of the entire population and, in particular, for hospital capacity planning. Preventing unnecessary shutdowns of clinical operations and treatment delays, as well as avoiding the overload of hospital resources, is paramount. To avert such scenarios, we are taking action on multiple fronts.

You will be engaged in the prediction of infection waves in the regions of Brandenburg and Berlin, forecasting the expected hospitalization rates, estimating patient lengths of stay, and projecting medical staff sick leave. These efforts collectively contribute to a comprehensive bed capacity forecast spanning various hospital units.


Project Phases and Tasks:

1. Current State Analysis: Extract data from EVB databases to understand current hospital conditions, including patient numbers and scheduled procedures. Display this information on the HCC dashboard for easy access.


2. Length of Stay Prediction: Building upon previous efforts, employ methods like Random Forest and Decision Trees to forecast patient lengths of stay. Enhance predictions by incorporating additional parameters such as diagnoses, disease severity scores, and lab results. Update the clinical information system (KIS) with these predictions.


3. Emergency Department Prediction: Use historical data and results from the infection prediction, along with external factors like weather and holidays, to predict emergency department admissions.


4. Respiratory Disease Incidence Prediction: Develop a predictive model for respiratory diseases using data from public sources, RKI data, hospital records, and demographic data. Provide insights into prediction uncertainties and consider creating geographical visualizations.


5. Hospitalization due to Respiratory Diseases Prediction: Build on the incidence predictions to construct a model that projects the hospitalization rate for respiratory illnesses. Leverage MvZ physician practice data to assess illness severity before hospitalization.


6. Medical Staff Sick Leave Prediction: Predict medical staff sick leave based on incidence predictions. This essential step ensures the accuracy of bed capacity forecasts, facilitating effective patient care.


7. Bed Capacity Forecast: Combine all findings from Phases 1 to 6 to create a comprehensive bed capacity forecast for the upcoming two weeks. Verify forecast reliability using historical data. Design the forecast to update automatically every day as new data arrives, so that resource planning always reflects the latest information.


8. Geographical Disease Incidence Mapping: Enhance public awareness and support local businesses by generating user-friendly geographical visualizations depicting disease incidence and risk predictions for the Potsdam/Brandenburg region, promoting informed decision-making and community well-being.
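As a rough illustration of how Phase 7 could combine the outputs of the earlier phases, the following minimal Python sketch rolls bed occupancy forward day by day. Everything in it is an assumption made for illustration: the `DayInputs` fields, the `forecast_free_beds` function, the simple occupancy balance, and the rule that usable beds shrink in proportion to staff sick leave are not taken from the project, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class DayInputs:
    """Hypothetical per-day outputs of the upstream phases."""
    expected_admissions: float  # e.g. from Phases 3 and 5
    expected_discharges: float  # e.g. derived from length-of-stay predictions (Phase 2)
    staff_sick_fraction: float  # e.g. from Phase 6, a value between 0 and 1


def forecast_free_beds(current_occupied, total_beds, days):
    """Roll occupancy forward one day at a time; return free staffed beds per day."""
    occupied = float(current_occupied)
    free_per_day = []
    for day in days:
        # Occupancy balance: previous patients plus arrivals minus departures.
        occupied = max(0.0, occupied + day.expected_admissions - day.expected_discharges)
        # Assumption: usable beds shrink in proportion to staff on sick leave.
        staffed_beds = total_beds * (1.0 - day.staff_sick_fraction)
        free_per_day.append(staffed_beds - occupied)
    return free_per_day


# Made-up two-day horizon; a real forecast would cover the upcoming two weeks.
horizon = [DayInputs(12, 10, 0.05), DayInputs(15, 9, 0.10)]
free = forecast_free_beds(current_occupied=80, total_beds=100, days=horizon)
```

A negative entry in `free` would indicate an expected shortfall of staffed beds on that day. Rerunning such a function once per day with fresh inputs is one simple way to obtain the daily automatic update described in Phase 7.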


Why Should You Join?

  1. Cutting-Edge Research: Be at the forefront of healthcare innovation, applying advanced AI and data analysis techniques.
  2. Real-World Impact: Contribute to better healthcare planning, ensuring efficient use of resources and avoiding system overloads.
  3. Hands-On Learning: Gain practical experience in analyzing real-world data that is often far from ideal, in applying different machine learning approaches, and in addressing healthcare management and public health questions.

Learning Objectives:

  • Gain insight into clinical workflows and the intricacies of clinical data management within a healthcare setting.
  • Develop an understanding of the challenges associated with working with real-world health data, including issues related to data quality, privacy, and ethical considerations.
  • Acquire practical experience in working with large databases, learning data retrieval, manipulation, and analysis techniques.
  • Learn to integrate and harmonize diverse data sources to improve the accuracy and robustness of analytical approaches.
  • Develop the ability to critically compare and evaluate various statistical and machine learning methodologies, enabling informed decision-making in selecting the most suitable techniques for specific healthcare data analysis tasks.
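One common way to compare forecasting methods, and to check forecast reliability against historical data, is a rolling-origin backtest: each method is repeatedly trained on the past and scored on the days that follow. The sketch below is a minimal, dependency-free Python illustration under stated assumptions; the two baseline forecasters, the function names, and the synthetic series are hypothetical and do not represent the seminar's methods or data.

```python
def naive_last(history, horizon):
    """Forecast: repeat the most recent observation."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, period=7):
    """Forecast: repeat the value observed one period (e.g. one week) earlier."""
    return [history[len(history) - period + (h % period)] for h in range(horizon)]


def rolling_mae(series, forecaster, horizon, min_train):
    """Mean absolute error over all forecast origins from min_train onward."""
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]
        actual = series[origin:origin + horizon]
        predicted = forecaster(train, horizon)
        errors.extend(abs(p - a) for p, a in zip(predicted, actual))
    return sum(errors) / len(errors)


# Synthetic daily counts with a weekly pattern (made-up numbers, not hospital data).
weekly = [30, 28, 27, 29, 35, 40, 38]
series = weekly * 6

mae_naive = rolling_mae(series, naive_last, horizon=7, min_train=14)
mae_seasonal = rolling_mae(series, seasonal_naive, horizon=7, min_train=14)
```

Because the forecaster only ever sees data before each origin, the comparison avoids leaking future information. The same loop can score any other model that accepts a history and a horizon.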


Requirements:

  • Students interested in the project should have strong programming skills in at least one programming language (e.g. Python) and be familiar with SQL, Linux, and the command line.
  • Experience in data science, statistical analysis, and/or machine learning techniques is essential.
  • Previous participation in the HCC I Bachelor's seminar is beneficial but not mandatory for joining this project.
  • A genuine interest in conducting research, including the review and writing of scientific publications, is highly desirable.
  • A good knowledge of German is a prerequisite for this project, as many of our project partners' staff members may not have a strong command of English.
  • Participants must sign a confidentiality and rights-transfer agreement ("Verpflichtungserklärung zur Vertraulichkeit und Rechteübertragung"), which stipulates, among other things, that the code must be published under the MIT license.

To achieve our project objectives, a high level of motivation and a substantial time commitment will be required.


Grading:

Presentation (40%)

Final Report (60%)


Topics for projects are presented at the first meeting and students indicate their preferences by the end of the first week; assignments are made by the second meeting. Cancellation is possible until 28 October.

The first meeting will be held on Tuesday, 17 October 2023, at 09:15 in room K-1.02.