Talk Details - Past, Present and Future Challenges in Datacenter and Cloud Networking

Title: Past, Present and Future Challenges in Datacenter and Cloud Networking

Speaker: Marios Kogias, Imperial College London

Schedule: 21.05.2026, [11:30 - 12:30]

Zoom link: tu-darmstadt.zoom-x.de/j/68642659617

Abstract: Datacenter and cloud networks are undergoing a fundamental transition. Until recently, datacenter networking was driven by increasing link speeds, shrinking switch buffers, tight latency requirements, and in-network programmability. Such bespoke solutions remain hard to deploy in multi-tenant cloud environments and are therefore not widely accessible to cloud tenants. Meanwhile, emerging AI workloads and agentic applications are reshaping the requirements for datacenter and cloud networking by changing traffic patterns and introducing new communication protocols.
In this talk, I will first present SIRD (NSDI'25), a congestion control protocol designed for modern datacenter fabrics that addresses the past challenges. SIRD revisits receiver-driven designs and shows how explicitly distinguishing between single-owner and shared links enables precise scheduling without sacrificing stability. By combining scheduling with reactive control, SIRD achieves high utilization and near-optimal latency while keeping queuing minimal, even at 100 Gbps.
Next, I will turn to current challenges, which revolve around making this body of datacenter infrastructure research available in the public cloud. I will present KRAKENGUARD (NSDI'26), a policy-driven access control framework that enables safe, multi-tenant use of eBPF, specifically for networking hooks such as XDP. KRAKENGUARD uses exhaustive symbolic execution to enforce fine-grained constraints on eBPF programs at load time, preventing privilege abuse and unsafe interference between co-located network functions.
Finally, turning to future challenges, I will describe our ongoing effort to design a networking stack specifically for AI training workloads. I will explain how the particular characteristics of AI training communication patterns allow us to completely rethink the mechanisms required for routing, congestion control, and Quality of Service.


Biography: Marios Kogias is an Assistant Professor in the Department of Computing at Imperial College London, where he conducts research in operating systems, networking, and distributed systems, with a particular focus on tail-tolerant systems, datacenter networking, and cloud infrastructure. He received his PhD from EPFL. His PhD work was recognised with the 2021 Dennis M. Ritchie Doctoral Dissertation Award and an honourable mention for the Roger Needham PhD Award, and was supported by an IBM PhD Fellowship. His research has been published in top-tier systems conferences and has received a Best Student Paper Award at EuroSys and a Distinguished Artifact Award at ASPLOS. He is also the recipient of an ERC Starting Grant.