Meta-Reinforcement Learning for Self-Adaptive Systems (Winter Semester 2022/2023)

Lecturers: Prof. Dr. Holger Giese (System Analysis and Modeling), Christian Medeiros Adriano (System Analysis and Modeling), He Xu (System Analysis and Modeling), Sona Ghahremani (System Analysis and Modeling)

General Information

  • Weekly hours (SWS): 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.10.2022 - 31.10.2022
  • Examination date per §9 (4) BAMA-O: 01.02.2023
  • Teaching format: Project seminar
  • Course type: Compulsory elective module
  • Language of instruction: English

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-K Concepts and Methods
    • HPI-SAMT-T Techniques and Tools
    • HPI-SAMT-S Specialization
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
    • HPI-OSIS-T Techniques and Tools
    • HPI-OSIS-S Specialization
Data Engineering MA
Digital Health MA
Software Systems Engineering MA

Description

Context

Reinforcement learning (RL) agents deployed in the real world face environments that are subject to non-stationary rewards or various types of constraints (e.g., limited actions, intrinsic rewards). Traditional approaches transfer learning from other environments, specify new tasks for multi-task learning, or train the agent to be robust to certain types of perturbations (domain adaptation). These approaches make assumptions about new tasks or environments, which may be difficult to anticipate. Conversely, meta-learning is restricted to a single environment that is subject to a fixed (but unknown) set of challenges (non-stationarity and constraints). This is a different assumption, but it is interesting in the sense that it allows us to explore how the agent can learn at a more abstract level of unknown environment dynamics, i.e., "learn how to learn".

Approach 

Meta-RL methods enable learning to learn by adding an outer loop in which a meta-agent is responsible for exploring the hyperparameter space over a distribution of tasks within the same environment. In other words, there are no assumptions about new environments, perturbations, or out-of-distribution data. Instead, Meta-RL relies on optimization over the hyperparameter space as a means to "learn how to learn", which has been shown to be effective (convergence-wise) in non-stationary and constrained MDPs.
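To make the outer/inner loop structure concrete, here is a minimal, self-contained Python sketch (illustrative only, not course material): the inner loop runs an ordinary gradient-bandit policy update on a sampled task, while the outer loop searches the hyperparameter space (here, just the inner learning rate) over a distribution of tasks. The bandit setting and all names (sample_task, inner_loop, outer_loop) are assumptions made for illustration.

# Minimal meta-RL sketch: an outer loop searches the hyperparameter
# space over a distribution of tasks; the inner loop is plain RL.
import numpy as np

rng = np.random.default_rng(0)

def sample_task(n_arms=5):
    """A 'task' is a bandit with random mean rewards (the task distribution)."""
    return rng.normal(0.0, 1.0, size=n_arms)

def inner_loop(task, lr, steps=200):
    """Inner RL loop: gradient-bandit policy updates on one task."""
    prefs = np.zeros(len(task))            # action preferences (policy params)
    rewards = []
    for _ in range(steps):
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        a = rng.choice(len(task), p=probs)
        r = task[a] + rng.normal(0.0, 0.1)
        baseline = np.mean(rewards) if rewards else 0.0
        grad = -probs                       # d/dθ log π(a) = 1{a} - π
        grad[a] += 1.0
        prefs += lr * (r - baseline) * grad
        rewards.append(r)
    return np.mean(rewards[-50:])           # performance after adaptation

def outer_loop(candidate_lrs, tasks_per_eval=10):
    """Meta loop: pick the hyperparameter that adapts best across tasks."""
    scores = {lr: np.mean([inner_loop(sample_task(), lr)
                           for _ in range(tasks_per_eval)])
              for lr in candidate_lrs}
    return max(scores, key=scores.get), scores

best_lr, _ = outer_loop([0.01, 0.1, 0.5, 1.0])
print("meta-learned inner learning rate:", best_lr)

Real meta-gradient methods differentiate through the inner updates rather than searching over a grid, but the two-level structure is the same.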

Goal: explore Meta-RL methods in a small proof-of-concept project in the context of systems that have to learn to learn in order to adapt to changes in their environments.

Topics:

  • Introduction to Deep Reinforcement Learning and Gradient Descent
  • Meta-learning vs. Transfer Learning, Domain Adaptation, and Multi-task Learning
  • Meta-learning in stateless models (rankings)
  • Short introduction to Lagrange multipliers and first- and second-order methods
  • Meta-gradient methods (Deep Reinforcement Learning)
  • Intrinsic reward strategies (entropy-driven and diversity-driven; see the sketch after this list)
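As a small illustration of the entropy-driven intrinsic reward strategy listed above, the following Python sketch augments an extrinsic reward with a policy-entropy bonus so that exploration is rewarded directly. The coefficient beta and the function names are illustrative assumptions, not course code.

# Sketch of an entropy-driven intrinsic reward: total reward is the
# extrinsic reward plus a bonus proportional to the policy's entropy.
import numpy as np

def policy_entropy(probs):
    """Shannon entropy of the action distribution."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def shaped_reward(extrinsic_reward, action_probs, beta=0.01):
    """Total reward = extrinsic + beta * intrinsic (entropy bonus)."""
    return extrinsic_reward + beta * policy_entropy(action_probs)

# A uniform policy earns the maximum entropy bonus:
print(shaped_reward(1.0, np.array([0.25, 0.25, 0.25, 0.25])))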

Prerequisites

No particular requirements beyond some acquaintance with Deep Learning and Reinforcement Learning methods. We will provide an introduction to these topics before moving into the specialized topics of Meta-Reinforcement Learning.

Literature

Papers

  • Xu, Z., van Hasselt, H. P., Hessel, M., Oh, J., Singh, S., & Silver, D. (2020). Meta-gradient reinforcement learning with an objective discovered online. Advances in Neural Information Processing Systems, 33, 15254-15264.
  • Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. (2018, September). Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning. In International Conference on Learning Representations.
  • Mitchell, E., Rafailov, R., Peng, X. B., Levine, S., & Finn, C. (2021, July). Offline meta-reinforcement learning with advantage weighting. In International Conference on Machine Learning (pp. 7780-7791). PMLR.
  • Fakoor, R., Chaudhari, P., Soatto, S., & Smola, A. J. (2019, September). Meta-Q-Learning. In International Conference on Learning Representations.
  • Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5149-5169.
  • Sutton, R. S. (1992, July). Adapting bias by gradient descent: An incremental version of delta-bar-delta. In AAAI (pp. 171-176).

Books

  • Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.
  • Kochenderfer, M. J., & Wheeler, T. A. (2019). Algorithms for optimization. MIT Press.

Teaching and Learning Methods

The course is a project seminar that begins with an introductory phase of short lectures. After that, students will work in groups on jointly identified experiments, applying specific solutions to given problems, and will finally prepare a presentation and write a report about their findings concerning the experiments.

The introductory phase will present the basic concepts of the theme, including the necessary foundations.

Lectures will take place in a seminar room; interested students can also join online via Zoom (credentials)*

Assessment

We will grade the group reports (80%) and presentations (20%). Note that the report includes documentation of the experiments and the obtained results; therefore, the grading of the report also covers the experiments. During the project phase, we will require participation in meetings and in other groups' presentations in the form of questions and feedback to peers.

Dates

The first lecture will take place on October 18, 2022 (Tuesday) from 17:00-18:30. The lectures will take place in room A-2.1 and remotely via Zoom (credentials)*

We will follow this recurring schedule:

  • Tuesdays from 17:00-18:30 in room A-2.1
  • Wednesdays from 17:00-18:30 in room A-2.1

* If you do not have access to GitLab, please email christian.adriano [at] hpi.de
