Meta-Reinforcement Learning for Self-Adaptive Systems (Winter Semester 2022/2023)
Lecturers:
- Prof. Dr. Holger Giese (System Analysis and Modeling)
- Christian Medeiros Adriano (System Analysis and Modeling)
- He Xu (System Analysis and Modeling)
- Sona Ghahremani (System Analysis and Modeling)
General Information
- Weekly hours: 4
- ECTS: 6
- Graded: Yes
- Enrollment period: 01.10.2022 - 31.10.2022
- Examination date per §9 (4) BAMA-O: 01.02.2023
- Course type: Project seminar
- Module type: Compulsory elective module
- Language of instruction: English
Degree Programs, Module Groups & Modules
- SAMT: Software Architecture & Modeling Technology
  - HPI-SAMT-K Concepts and Methods
  - HPI-SAMT-T Techniques and Tools
  - HPI-SAMT-S Specialization
- OSIS: Operating Systems & Information Systems Technology
  - HPI-OSIS-K Concepts and Methods
  - HPI-OSIS-T Techniques and Tools
  - HPI-OSIS-S Specialization
- DANA: Data Analytics
  - HPI-DANA-K Concepts and Methods
  - HPI-DANA-T Techniques and Tools
  - HPI-DANA-S Specialization
- SYSE: Systems Engineering
  - HPI-SYSE-K Concepts and Methods
  - HPI-SYSE-T Techniques and Tools
  - HPI-SYSE-S Specialization
- APAD: Acquisition, Processing and Analysis of Health Data
  - HPI-APAD-C Concepts and Methods
  - HPI-APAD-T Technologies and Tools
  - HPI-APAD-S Specialization
- MALA: Machine Learning and Analytics
  - HPI-MALA-C Concepts and Methods
  - HPI-MALA-T Technologies and Tools
- MODA: Models and Algorithms
  - HPI-MODA-C Concepts and Methods
  - HPI-MODA-T Technologies and Tools
  - HPI-MODA-S Specialization
Description
Context
Reinforcement learning (RL) agents deployed in the real world have to face environments that are subject to non-stationary rewards or various types of constraints (e.g., limited actions, intrinsic rewards). Traditional approaches transfer learning from other environments, specify new tasks for multi-task learning, or train the agent to be robust to certain types of perturbations (domain adaptation). These approaches make assumptions about new tasks or environments, which might be difficult to anticipate. Conversely, meta-learning is restricted to a single environment that is subject to a determined (but unknown) set of challenges (non-stationarity and constraints). This is a different assumption, but an interesting one in the sense that it allows us to explore how the agent can learn at a more abstract level of unknown environment dynamics, i.e., "learn how to learn".
Approach
Meta-RL methods enable learning to learn by adding an outer loop in which a meta-agent is responsible for exploring the hyperparameter space over a distribution of tasks within the same environment. In other words, there are no assumptions about new environments, perturbations, or out-of-distribution data. Instead, Meta-RL relies on optimization over the hyperparameter space as a means to "learn how to learn", which has been shown to be effective (convergence-wise) in non-stationary and constrained MDPs.
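To make the two-loop idea concrete, here is a minimal, self-contained Python sketch. It stands in for a full MDP with a toy multi-armed bandit and meta-learns a single hyperparameter (the inner learning rate) with a finite-difference estimate of the meta-gradient over a distribution of tasks. The bandit setting, all names, and the finite-difference estimator are illustrative assumptions for this sketch, not the specific methods covered in the course.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_return(lr, task_means, steps=200):
    """Run a simple gradient-bandit learner with inner learning rate `lr`
    on one task; return the average reward per step it achieves."""
    prefs = np.zeros(len(task_means))             # action preferences
    baseline, total = 0.0, 0.0
    for t in range(1, steps + 1):
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()                      # softmax policy
        a = rng.choice(len(prefs), p=probs)
        r = rng.normal(task_means[a], 1.0)        # noisy task reward
        total += r
        baseline += (r - baseline) / t            # running-average baseline
        grad = -probs
        grad[a] += 1.0                            # one-hot(a) - probs
        prefs += lr * (r - baseline) * grad       # policy-gradient step
    return total / steps

def outer_loop(meta_steps=50, meta_lr=0.1, eps=0.05):
    """Meta-agent: adapt the inner learning rate across sampled tasks
    using a finite-difference estimate of the meta-gradient."""
    lr = 0.1                                      # hyperparameter being meta-learned
    for _ in range(meta_steps):
        task_means = rng.normal(0.0, 1.0, size=5) # sample a task from the distribution
        g = (inner_return(lr + eps, task_means)
             - inner_return(lr - eps, task_means)) / (2 * eps)
        lr = float(np.clip(lr + meta_lr * g, 1e-3, 1.0))
    return lr

print("meta-learned inner learning rate:", outer_loop())
```

The meta-gradient methods in the literature list replace the finite-difference estimate with analytic gradients through the inner update, but the two-loop structure is the same.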
Goal: explore Meta-RL methods in a small proof-of-concept project, in the context of systems that have to learn to learn in order to adapt to changes in their environments.
Topics:
- Introduction to Deep Reinforcement Learning and Gradient Descent
- Meta-learning vs. transfer learning, domain adaptation, and multi-task learning
- Meta-learning in stateless models (rankings)
- Short introduction to Lagrange multipliers, first-order, and second-order methods
- Meta-gradient methods (Deep Reinforcement Learning)
- Intrinsic reward strategies (entropy-driven and diversity-driven); see the sketch after this list
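As a small illustration of the last topic, the sketch below adds an entropy-driven intrinsic bonus to an extrinsic reward: a policy that is less deterministic (higher entropy) earns a larger bonus, which encourages exploration. The function name and the coefficient `beta` are illustrative assumptions.

```python
import numpy as np

def entropy_bonus(action_probs, beta=0.01):
    """Intrinsic reward proportional to the entropy of the policy's
    action distribution; `beta` trades off exploration vs. exploitation."""
    p = np.clip(np.asarray(action_probs, dtype=float), 1e-12, 1.0)
    return beta * float(-(p * np.log(p)).sum())

# Shaped reward = extrinsic reward + intrinsic exploration bonus.
extrinsic_r = 1.0
shaped_r = extrinsic_r + entropy_bonus([0.7, 0.2, 0.1])
print(shaped_r)  # slightly above 1.0 for this moderately uncertain policy
```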
Prerequisites
No particular requirements, except acquaintance with deep learning and reinforcement learning methods. We will provide an introduction to these topics before moving into the specialized topics of Meta-Reinforcement Learning.
Literature
Papers
- Xu, Z., van Hasselt, H. P., Hessel, M., Oh, J., Singh, S., & Silver, D. (2020). Meta-gradient reinforcement learning with an objective discovered online. Advances in Neural Information Processing Systems, 33, 15254-15264.
- Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. (2018, September). Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning. In International Conference on Learning Representations.
- Mitchell, E., Rafailov, R., Peng, X. B., Levine, S., & Finn, C. (2021, July). Offline meta-reinforcement learning with advantage weighting. In International Conference on Machine Learning (pp. 7780-7791). PMLR.
- Fakoor, R., Chaudhari, P., Soatto, S., & Smola, A. J. (2019, September). Meta-Q-Learning. In International Conference on Learning Representations.
- Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5149-5169.
- Sutton, R. S. (1992, July). Adapting bias by gradient descent: An incremental version of delta-bar-delta. In AAAI (pp. 171-176).
Books
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
- Kochenderfer, M. J., & Wheeler, T. A. (2019). Algorithms for optimization. MIT Press.
Teaching and Learning Methods
The course is a project seminar that begins with an introductory phase of short lectures presenting the basic concepts of the theme, including the necessary foundations. After that, students will work in groups on jointly identified experiments, applying specific solutions to given problems, and will finally prepare a presentation and write a report on their findings from the experiments.
Lectures will take place in a seminar room; interested students can also join online via Zoom (credentials)*.
Assessment
We will grade the groups' reports (80%) and presentations (20%). Note that the report includes documenting the experiments and the obtained results; the grading of the report therefore covers the experiments as well. During the project phase, we will also require participation in meetings and in the other groups' presentations, in the form of questions and feedback to peers.
Dates
The first lecture will take place on October 18, 2022 (Tuesday) from 17:00 to 18:30, in room A-2.1 and remotely via Zoom (credentials)*.
We will follow the recurring schedule:
- Tuesdays, 17:00-18:30, in room A-2.1
- Wednesdays, 17:00-18:30, in room A-2.1
* If you do not have access to GitLab, please email christian.adriano [at] hpi.de