Hasso-Plattner-Institut

Multi-Agent Reinforcement Learning on Self-Adaptive Systems (Summer Semester 2022)

Lecturers: Prof. Dr. Holger Giese (System Analysis and Modeling), Christian Medeiros Adriano (System Analysis and Modeling), He Xu (System Analysis and Modeling)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.04.2022 - 30.04.2022
  • Examination date §9 (4) BAMA-O: 12.08.2022
  • Teaching form: Project seminar
  • Enrollment type: Compulsory elective module
  • Course language: English

Degree Programs, Module Groups & Modules

Data Engineering MA
IT-Systems Engineering MA
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-K Concepts and Methods
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-T Techniques and Tools
  • SAMT: Software Architecture & Modeling Technology
    • HPI-SAMT-S Specialization
Digital Health MA



Our motivation for multi-agent reinforcement learning (MARL) is that it provides a divide-and-conquer framework both during training and after deployment. This is necessary to scale self-adaptation over large and sparse state spaces. Our studies will leverage both the MARL framework and the underlying causal structure of the self-adaptive systems. In our particular case, the causal structure corresponds to (1) an architectural configuration model, (2) a system utility model, and (3) a failure event propagation model.

The main challenge in designing a MARL-based solution is how learning happens in an asynchronous, distributed, and adversarial setting. While the agents need to cooperate to maximize global system utility and mitigate the effect of disturbances, they also have to compete for certain resources, such as network bandwidth and server computing time. Therefore, we need to combine models that enable competitive decision-making with transfer learning, so that agents share knowledge in a way that maximizes scalability and robustness under various adversarial situations.



  • Multi-Agent Reinforcement Learning (MARL)
    • Model-Based Methods
    • Architectural Models
  • Transfer Learning for MARL
    • Representation Learning for MARL
    • Curriculum Learning for MARL
  • Adversarial Training (Robustness)
    • Distribution Shifts, Domain-Adaptation
    • Non-Stationary Environments
  • Game Theoretical Approaches
    • Cooperative Agents
    • Competitive Agents


The project will build on a MARL architecture that already supports a set of adversarial situations in an ad hoc manner. In this project, we will replace the ad hoc solution with a principled design that copes with adversarial settings by combining transfer learning, game-theoretical approaches, and model-based methods.


Surveys, Applications, Theory

  • Hernandez-Leal, P., et al., 2018, Is multiagent deep reinforcement learning the answer or the question? A brief survey.
  • Gronauer, S. & Diepold, K., 2021, Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review, pp.1-49.
  • Vithayathil Varghese, N. & Mahmoud, Q. H., 2020, A survey of multi-task deep reinforcement learning. Electronics 9.9.
  • Zhang, K., et al., 2021, Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control, pp.321-384.
  • Zafar, H., et al., 2021, Transfer Learning in Multi-Agent Reinforcement Learning with Double Q-Networks for Distributed Resource Sharing in V2X Communication. arXiv:2107.06195.
  • Nguyen, T. T., et al., 2020, Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE transactions on cybernetics 50.9, pp.3826-3839.
  • Rădulescu, R., et al., 2020, Multi-objective multi-agent decision making: a utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems 34.1, pp.1-52.

Transfer Learning

  • Zhu, Z., et al., 2020, Transfer learning in deep reinforcement learning: A survey. arXiv:2009.07888.
  • Cheng, Z., et al., 2021, Multi‐agent reinforcement learning via knowledge transfer with differentially private noise. International Journal of Intelligent Systems 37.1, pp.799-828.
  • Herrera, M., et al., 2020, Multi-agent systems and complex networks: Review and applications in systems engineering. Processes 8.3, 312.
  • Christianos, F., et al., 2020, Shared experience actor-critic for multi-agent reinforcement learning. arXiv:2006.07169.
  • Grimbly, S. J., et al., 2021, Causal Multi-Agent Reinforcement Learning: Review and Open Problems. arXiv:2111.06721.
  • Yang, T., et al., 2020, Transfer among Agents: An Efficient Multiagent Transfer Learning Framework. arXiv:2002.08030.
  • Silva, F. L. da & Costa, A. H. R., 2021, Transfer Learning for Multiagent Reinforcement Learning Systems. Synthesis Lectures on Artificial Intelligence and Machine Learning 15.3, pp.1-129.

Robustness, Adversarial Training, Non-Stationarity

  • van der Heiden, T., et al., 2020, Robust Multi-Agent Reinforcement Learning with Social Empowerment for Coordination and Communication. arXiv:2012.08255.
  • Papoudakis, G., et al., 2019, Dealing with non-stationarity in multi-agent deep reinforcement learning. arXiv:1906.04737.
  • Zhang, K., et al., 2020, Robust Multi-Agent Reinforcement Learning with Model Uncertainty. NeurIPS.

Learning and Teaching Methods

The course is a project seminar. It starts with an introductory phase of short lectures that present the basic concepts of the topic, including the necessary foundations. After that, the students work in groups on jointly identified experiments that apply specific solutions to given problems. Finally, each group prepares a presentation and writes a report about the findings of its experiments.

Lectures take place in the seminar room; interested students can also join online via Zoom (credentials)*


We will grade the groups' reports (80%) and presentations (20%). The report includes the documentation of the experiments and the results obtained, so the report grade also covers the experiments. During the project phase, we require participation in meetings and in other groups' presentations, in the form of questions and feedback to peers.


The first lecture will take place on Tuesday, April 26, 2022, from 17:00-18:30, both remotely and in room A-2.1. An invitation to participate via Zoom will be published in good time in the HPI GitLab (credentials)*

After that, we will follow a recurring schedule:

  • Tuesdays from 17:00-18:30 in room A-1.1
  • Wednesdays from 17:00-18:30 in room A-1.1 

*If you do not have access to GitLab, please email christian.adriano@hpi.de