Reinforcement learning (RL) is a key technology for solving hard decision-making problems, surpassing human game play, and enabling conversational AI bots such as ChatGPT. Supporting RL workloads poses new challenges: RL jobs exhibit complex execution and communication patterns, and rely on large amounts of generated training data. Current RL systems fail to support RL algorithms efficiently on GPU clusters: they either hard-code algorithm-specific strategies for parallelisation and distribution, or they accelerate only parts of the computation on GPUs (e.g. DNN policy updates).
In this talk, I will argue that current RL systems lack an abstraction that decouples the definition of an RL algorithm from its strategy for distributed execution. I will describe our work on MSRL, a distributed RL training system that uses the new abstraction of a fragmented dataflow graph (FDG) to execute RL algorithms in a flexible way. An FDG maps functions from the RL training loop to independent parallel dataflow fragments. Fragments can execute on different devices through a low-level dataflow implementation, e.g. an operator graph of a DNN engine, a CUDA GPU kernel, or a multi-threaded CPU process. Our experiments show that MSRL exposes trade-offs between different execution strategies, while surpassing the performance of existing RL systems with fixed strategies.
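To illustrate the core idea, the following is a minimal, hypothetical sketch of how an FDG-style abstraction might separate an RL training loop from its placement decisions. The names (`fragment`, `ExecutionPlan`) and the toy actor/learner logic are assumptions for illustration, not MSRL's actual API:

```python
# Hypothetical sketch of the FDG idea: the RL training loop is split into
# named fragments, and a separate execution plan decides where each
# fragment runs. All names here are illustrative, not MSRL's real API.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

FRAGMENTS: Dict[str, Callable] = {}

def fragment(name: str):
    """Register a function as an independent dataflow fragment."""
    def register(fn: Callable) -> Callable:
        FRAGMENTS[name] = fn
        return fn
    return register

@fragment("actor")
def collect_experience(policy: float) -> List[float]:
    # Stand-in for environment rollouts: derive "returns" from the policy.
    return [policy + i for i in range(3)]

@fragment("learner")
def update_policy(policy: float, batch: List[float]) -> float:
    # Stand-in for a DNN policy update: nudge policy toward the batch mean.
    return policy + 0.1 * (sum(batch) / len(batch) - policy)

@dataclass
class ExecutionPlan:
    # Maps each fragment to a device label; a real system would lower the
    # fragment onto a DNN-engine operator graph, a CUDA kernel, or a
    # multi-threaded CPU process according to this placement.
    placement: Dict[str, str] = field(default_factory=dict)

    def run(self, steps: int, policy: float) -> float:
        for _ in range(steps):
            batch = FRAGMENTS["actor"](policy)            # e.g. placed on "CPU:0"
            policy = FRAGMENTS["learner"](policy, batch)  # e.g. placed on "GPU:0"
        return policy

plan = ExecutionPlan(placement={"actor": "CPU:0", "learner": "GPU:0"})
final_policy = plan.run(steps=5, policy=0.0)
```

Because the algorithm is expressed only in terms of fragments, swapping the `placement` (or running several actor fragments in parallel) changes the execution strategy without touching the training-loop code, which is the decoupling the talk argues for.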