Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

Dynamic Programming and Reinforcement Learning

General Information

  • Teaching staff: Dr. Rainer Schlosser, Alexander Kastius
  • 6 ECTS (graded), 4 semester hours per week (SWS)
  • Lecture format: lecture + exercise (VL/UE)
  • Enrollment time: 01.04.2022 - 30.04.2022
  • Time: Monday 15.15-16.45 and Thursday 13.30-15.00
  • Rooms: Mon L-1.02, Thu L-1.06 (online via Zoom: https://zoom.us/j/7271364393, password: 256757)
  • Language: English/German
  • Program:
    • IT-Systems Engineering: BPET, OSIS
    • Data Engineering: DATA
    • Digital Health: SCAD, DICR (see preconditions)
  • Next lecture: Monday May 23 (in person/hybrid)

Short Description

The need for automated decision-making is steadily increasing. Hence, data-driven decision-making techniques are essential. We assume a system that follows certain dynamics and has to be tuned or controlled over time such that certain constraints are satisfied and a specified objective is optimized. Typically, the current state of the system as well as the interplay of rewards and potential future states associated with certain actions have to be taken into account. The dynamics and state transitions may have to be estimated from data using suitable ML-based techniques.

As exact solution approaches to such dynamic optimization problems generally do not scale, heuristics often have to be used (e.g., when the number of states is too large, cf. the curse of dimensionality). Besides classical approaches such as dynamic programming (DP), state-of-the-art heuristic optimization techniques such as approximate dynamic programming (ADP) or reinforcement learning (RL) are suitable alternatives.
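To make the backward-recursion idea behind classical DP concrete, the following minimal Python sketch solves a small finite-horizon Markov decision process by backward induction. The horizon, numbers of states and actions, rewards, and transition probabilities are randomly generated placeholders chosen for illustration; they are not course material.

    # Minimal sketch: backward induction for a finite-horizon MDP.
    # All problem data below are illustrative placeholders (assumptions).
    import numpy as np

    T, S, A = 5, 10, 3                          # horizon, #states, #actions (assumed)
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a, s']: transition probabilities
    R = rng.random((S, A))                      # R[s, a]: expected one-step rewards

    V = np.zeros((T + 1, S))                    # terminal values V[T, :] = 0
    policy = np.zeros((T, S), dtype=int)
    for t in range(T - 1, -1, -1):              # Bellman backward recursion
        Q = R + P @ V[t + 1]                    # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[t+1, s']
        policy[t] = Q.argmax(axis=1)            # greedy action per state and period
        V[t] = Q.max(axis=1)

    print("Optimal expected reward from state 0 at t = 0:", V[0, 0])

ADP and RL methods covered later in the course replace the exact expectation over P with sampled transitions and/or a parametric approximation of the value function.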

Goals of the Course

Understand...

  • opportunities and challenges of decision-making
  • static deterministic problems
  • stochastic dynamic problems
  • optimization models and solution techniques

Do ...

  • work in small teams
  • set up suitable models, apply optimization techniques 
  • simulate controlled processes, compare performance results

Improve/Learn ...

  • mathematical, analytical, and modelling skills
  • optimization techniques
  • dynamic programming methods
  • reinforcement learning methods

Preconditions

  • interest in quantitative methods and stochastics
  • programming skills/experience
  • the number of participants is not restricted

Teaching and Learning Process

The course is a combination of a lecture and a practical part:

  • teachers impart relevant knowledge and methods
  • students work on a self-contained topic in teams of about three people
  • students present and document their work

Grading

Portfolio assessment for ITSE, DE, and DH students, consisting of:

  • (i) final presentation of project results (July 18)
  • (ii) project documentation at the end of the module (Sep 15)

Material/Preparation

Slides and Upcoming Topics

  • 1. Week: First Introduction (online) (April 21)
  • 2. Week: Finite Time Markov Decision Processes (online) + Infinite Time MDPs (in person) (April 25/28)
  • 3. Week: Approximate Dynamic Programming (ADP) + Implementation Exercise (May 2/5)
  • 4. Week: Q-Learning (QL) (May 12, not 9)
  • 5. Week: Deep Q-Networks (DQN) + DQN Part 2 + DQN Extensions (May 16/19)
  • 6. Week: Implementations & OpenAI Gym (May 23, not 26; see the Q-learning sketch after this list)
  • 7. Week: Policy Gradient Algorithms + Policy Gradient Algorithms 2 (May 30, June 2)
  • 8. Week: Project Assignments (June 9, not 6)
  • 9. Week: Project/Feedback (June 13/16)
  • 10. Week: Project/Feedback (June 20/23)
  • 11. Week: Project/Feedback (June 27/30)
  • 12. Week: Project/Feedback (July 4/7)
  • 13. Week: Project/Feedback (July 11/14)
  • 14. Week: Final Presentations (July 18/21)
  • 15. Week: Project/Feedback (July 25)
  • Documentation: deadline September 15 (ca. 15 pages, e.g., LNCS format)
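As a taste of the Q-learning and OpenAI Gym topics listed above, here is a minimal tabular Q-learning sketch. It assumes the gymnasium package (the maintained successor of OpenAI Gym) and uses the small FrozenLake-v1 environment; the hyperparameters are illustrative choices, not values prescribed by the course.

    # Minimal sketch: tabular Q-learning on a small Gym(nasium) environment.
    # Environment choice and hyperparameters are illustrative assumptions.
    import numpy as np
    import gymnasium as gym

    env = gym.make("FrozenLake-v1")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, eps = 0.1, 0.99, 0.1          # learning rate, discount factor, exploration rate

    for episode in range(5000):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            a = env.action_space.sample() if np.random.rand() < eps else int(Q[s].argmax())
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Q-learning update towards the one-step bootstrapped target
            target = r + (0.0 if terminated else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next

    print("Greedy policy per state:", Q.argmax(axis=1))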
