Meta-Reinforcement Learning for Self-Adaptive Systems (Winter Semester 2022/2023)

Lecturers: Prof. Dr. Holger Giese (System Analysis and Modeling), Christian Medeiros Adriano (System Analysis and Modeling), He Xu (System Analysis and Modeling), Sona Ghahremani (System Analysis and Modeling)

General Information

Weekly hours per semester: 4
ECTS: 6
Graded: Yes
Enrollment period: 01.10.2022 - 31.10.2022
Examination date §9 (4) BAMA-O: 01.02.2023
Teaching format: Project seminar
Enrollment type: Compulsory elective module
Language of instruction: English

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA

SAMT: Software Architecture & Modeling Technology
- HPI-SAMT-K Concepts and Methods
SAMT: Software Architecture & Modeling Technology
- HPI-SAMT-T Techniques and Tools
SAMT: Software Architecture & Modeling Technology
- HPI-SAMT-S Specialization
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Concepts and Methods
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniques and Tools
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Specialization

Data Engineering MA

DANA: Data Analytics
- HPI-DANA-K Concepts and Methods
DANA: Data Analytics
- HPI-DANA-T Techniques and Tools
DANA: Data Analytics
- HPI-DANA-S Specialization
SYSE: Systems Engineering
- HPI-SYSE-K Concepts and Methods
SYSE: Systems Engineering
- HPI-SYSE-T Techniques and Tools
SYSE: Systems Engineering
- HPI-SYSE-S Specialization

Digital Health MA

Software Systems Engineering MA

MALA: Machine Learning and Analytics
- HPI-MALA-C Concepts and Methods
MALA: Machine Learning and Analytics
- HPI-MALA-T Technologies and Tools
MODA: Models and Algorithms
- HPI-MODA-C Concepts and Methods
MODA: Models and Algorithms
- HPI-MODA-T Technologies and Tools
MODA: Models and Algorithms
- HPI-MODA-S Specialization

Description

Context

Reinforcement learning (RL) agents deployed in the real world face environments with non-stationary rewards or various types of constraints (e.g., limited actions, intrinsic rewards). The traditional approaches are to transfer learning from other environments, to specify new tasks for multi-task learning, or to train the agent to be robust to certain types of perturbations (domain adaptation). These approaches make assumptions about new tasks or environments, which might be difficult to anticipate. Meta-learning, in contrast, is restricted to a single environment that is subject to a fixed (but unknown) set of challenges (non-stationarity and constraints). This is a different assumption, but an interesting one in the sense that it allows us to explore how the agent can learn at a more abstract level about unknown environment dynamics, i.e., "learn how to learn".

Approach

Meta-RL methods enable learning to learn by adding an outer loop in which a meta-agent is responsible for exploring the hyperparameter space over a distribution of tasks within the same environment. In other words, there are no assumptions about new environments, perturbations, or out-of-distribution data. Instead, Meta-RL relies on optimization over the hyperparameter space as a means to "learn how to learn", which has been shown to be effective (convergence-wise) in non-stationary and constrained MDPs.
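The two-loop structure described above can be illustrated with a minimal sketch. It is not taken from the course materials: the bandit environment, the function names (run_inner_loop, meta_outer_loop), and all parameter values are assumptions chosen for illustration. The outer loop here is a plain grid search over a single hyperparameter (the step size), which is a deliberate simplification; the meta-gradient methods on the reading list would instead adapt the hyperparameter by gradient descent.

```python
import random


def run_inner_loop(step_size, task_seed, n_arms=3, n_steps=500,
                   epsilon=0.1, drift=0.05):
    """Inner loop: epsilon-greedy agent on a non-stationary bandit.

    Returns the mean reward per step. All settings are illustrative.
    """
    rng = random.Random(task_seed)  # each seed defines one task
    true_means = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    estimates = [0.0] * n_arms
    total = 0.0
    for _ in range(n_steps):
        # Non-stationarity: the arm means follow a random walk.
        true_means = [m + rng.gauss(0.0, drift) for m in true_means]
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.gauss(0.0, 1.0)
        # Constant step-size update; step_size is the hyperparameter
        # that the outer loop searches over.
        estimates[arm] += step_size * (reward - estimates[arm])
        total += reward
    return total / n_steps


def meta_outer_loop(candidates, task_seeds):
    """Outer loop: score each hyperparameter over the task distribution."""
    scores = {}
    for step_size in candidates:
        returns = [run_inner_loop(step_size, seed) for seed in task_seeds]
        scores[step_size] = sum(returns) / len(returns)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = meta_outer_loop([0.01, 0.1, 0.5], task_seeds=range(20))
    print("best step size:", best)
    print("mean reward per candidate:", scores)
```

All tasks share the same environment type and differ only in their random seed, which mirrors the "distribution of tasks within the same environment" assumption. Which step size wins depends on the drift and noise settings, so the printed result should be read as an experiment outcome, not as a general recommendation.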

Goal: explore Meta-RL methods in a small proof-of-concept project in the context of systems that have to learn to learn in order to adapt to changes in their environments.

Topics:

Introduction to Deep Reinforcement Learning and Gradient Descent
Meta-learning vs. transfer learning, domain adaptation, and multi-task learning
Meta-learning in stateless models (rankings)
Short introduction to Lagrange multipliers and first- and second-order methods
Meta-gradient methods (Deep Reinforcement Learning)
Intrinsic reward strategies (entropy-driven and diversity-driven)
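
As a pointer to how the Lagrange multiplier topic connects to constrained MDPs, the following is the commonly used Lagrangian relaxation. It is a standard textbook formulation, not taken from the course materials; the symbols (expected return J_r, expected cost J_c, cost budget d, dual step size \eta) are assumptions introduced here for illustration.

```latex
% Constrained MDP: maximize return subject to a cost budget d
\max_{\pi} \; J_r(\pi) \quad \text{s.t.} \quad J_c(\pi) \le d

% Lagrangian relaxation with multiplier \lambda \ge 0
\mathcal{L}(\pi, \lambda) = J_r(\pi) - \lambda \, \bigl( J_c(\pi) - d \bigr),
\qquad
\min_{\lambda \ge 0} \; \max_{\pi} \; \mathcal{L}(\pi, \lambda)

% First-order dual update: increase \lambda while the constraint is violated
\lambda \leftarrow \max\bigl( 0, \; \lambda + \eta \, ( J_c(\pi) - d ) \bigr)
```

The multiplier \lambda acts as an adaptive penalty weight: it grows while the cost exceeds the budget and shrinks toward zero once the constraint is satisfied.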

Prerequisites

No particular requirements, except acquaintance with Deep Learning and Reinforcement Learning methods. We will provide an introduction to these topics before we get into specialized topics of Meta-Reinforcement Learning.

Literature

Papers

Xu, Z., van Hasselt, H. P., Hessel, M., Oh, J., Singh, S., & Silver, D. (2020). Meta-gradient reinforcement learning with an objective discovered online. Advances in Neural Information Processing Systems, 33, 15254-15264.
Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. (2018, September). Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning. In International Conference on Learning Representations.
Mitchell, E., Rafailov, R., Peng, X. B., Levine, S., & Finn, C. (2021, July). Offline meta-reinforcement learning with advantage weighting. In International Conference on Machine Learning (pp. 7780-7791). PMLR.
Fakoor, R., Chaudhari, P., Soatto, S., & Smola, A. J. (2019, September). Meta-Q-Learning. In International Conference on Learning Representations.
Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5149-5169.
Sutton, R. S. (1992, July). Adapting bias by gradient descent: An incremental version of delta-bar-delta. In AAAI (pp. 171-176).

Books

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Kochenderfer, M. J., & Wheeler, T. A. (2019). Algorithms for optimization. MIT Press.

Learning and Teaching Formats

The course is a project seminar with an introductory phase of short lectures that present the basic concepts and the necessary foundations of the topic. After that, students work in groups on jointly identified experiments that apply specific solutions to given problems. Finally, each group prepares a presentation and writes a report on its findings from the experiments.

Lectures take place in a seminar room; interested students can also join online via Zoom (credentials)*

Assessment

We will grade the groups' reports (80%) and presentations (20%). The report includes the documentation of the experiments and the obtained results, so the grading of the report also covers the experiments. During the project phase, we require participation in meetings and in other groups' presentations, in the form of questions and feedback to peers.

Schedule

The first lecture will take place on Tuesday, October 18, 2022, from 17:00 to 18:30. Lectures take place in room A-2.1 and remotely via Zoom (credentials)*

We will follow this recurring schedule:

Tuesdays from 17:00-18:30 in room A-2.1
Wednesdays from 17:00-18:30 in room A-2.1

* If you do not have access to GitLab, please email christian.adriano [at] hpi.de
