Dynamic Programming and Reinforcement Learning

General Information

Teaching staff: Dr. Rainer Schlosser, Alexander Kastius
6 ECTS (graded), 4 Semesterwochenstunden (SWS)
Lecture format: SE / UE
Enrollment time: 01.04.2023 - 07.05.2023
Time: Tuesday 15.15-16.45 and Wednesday 13.30-15.00
Room: D-E.9/10 (new)
Language: English/German
Program:
- IT-Systems Engineering: BPET, OSIS
- Data Engineering: DANA
- Digital Health: SCAD, DICR (see preconditions)
- Software Systems Engineering: DSYS, MALA, MODA
Project phase: individual appointments (in person/hybrid)
Final Presentations: July 18/19
First meeting: Tue April 18
Final Presentations: July 19

Short Description

The need for automated decision-making is steadily increasing. Hence, data-driven decision-making techniques are essential. We assume a system that follows certain dynamics and has to be tuned or controlled over time such that certain constraints are satisfied and a specified objective is optimized. Typically, the current state of the system as well as the interplay of rewards and potential future states associated to certain actions have to be taken into account. The dynamics and state transitions may have to be estimated from data using suitable ML-based techniques.

As, in general, exact solution approaches of such dynamic optimization problems do not scale often heuristics have to be used (e.g., in case the number of states is too large, cf. curse of dimensionality). Besides classical approaches such as dynamic programming (DP) state-of-the-art heuristic optimization techniques such as approximate dynamic programming (ADP) or reinforcement learning (RL) are suitable alternatives.

Goals of the Course

Understand...

opportunities and challenges of decision-making
static deterministic problems
stochastic dynamic problems
optimization models and solution techniques

Do ...

work in small teams
set up suitable models, apply optimization techniques
simulate controlled processes, compare performance results

Improve/Learn ...

mathematical, analytical, and modelling skills
optimization techniques
dynamic programming methods
reinforcement learning methods

Preconditions

interest in quantitative methods and stochastics
programming skills/experience
the number of participants is not restricted

Teaching and Learning Process

The course is a combination of a lecture and a practical part:

teachers impart relevant knowledge and methods
students work on a self-containing topic in a team of ca. 3 people
students present and document their work

Grading

Portfolio assessment for ITSE, DE, DH, and SSE-students consisting of:

(i) final presentation of project results (July 18/19)
(ii) project documentation at the end of the module (Sep 15)

Material/Preparation

Slides and Upcoming Topics

Apr 18: Introduction
Apr 19: Finite Time Markov Decision Processes (MDP)
Apr 25: Infinite Time MDPs
Apr 26: Approximate Dynamic Programming (ADP)
May 2: Q-Learning (QL)
May 3: Deep Q-Networks (DQN)
May 9: Policy Gradient (PG) Algorithms
May 10: PG Algorithms extended
May 16: Project Assignments
May 17: No course
May 23: Implementations & Open AI Gym
May 24: Further RL Examples & Project/Feedback
May: Project Overview (30/31)
June: Project/Feedback (6/7, 13/14, 27/28)
June 20/21: Intermediate Presentations
July: Project/Feedback (4/5, 11/12, 18/19, 25/26)
Final Presentations (July 19)
Documentations: Deadline September 15 (ca. 15 pages, e.g., LNCS)

Exercises:

Exercise 1 (see last slide of April 19): send a mail with V_0(N) or use moodle (hand in until April 25 before the lecture)
Exercise 2 (see last slide of April 25): send a mail with V(10) or use moodle (hand in until May 2 before the lecture)
Exercise 3 (tbd) (see last slide of May 2): send a mail with max_a {Q(s,a)} -values or use moodle (hand in until May 9 before the lecture)

Material: