02.04.2025

Paper on Benchmarking NVLink-Attached GPU Memory at HCDS’25

Our paper on benchmarking NVLink-attached GPU memory was accepted and presented at the HCDS workshop, co-located with ASPLOS & EuroSys.

Title: Towards Memory Disaggregation via NVLink C2C: Benchmarking CPU-Requested GPU Memory Access

Authors: Felix Werner, Marcel Weisgut, Tilmann Rabl

Abstract:

Memory disaggregation decouples compute and memory resources, enabling efficient use of resources. Several interconnect technologies provide cache-coherent access to remote memory regions, which eases the use of disaggregated memory. Recent NVIDIA-based systems use the NVLink C2C interconnect, which provides cache-coherent memory access between CPUs and GPUs and their memory. While GPUs and NVLink are widely used to accelerate complex workloads, NVLink’s viability for connecting memory-expansion devices to a CPU remains unexplored. In this work, we quantify the characteristics of NVIDIA’s Grace CPU when accessing GPU memory via NVLink to assess NVLink’s viability for memory expansion. We benchmark throughput and latency for memory accesses on an NVIDIA Grace-Hopper system. We evaluate memory expansion when the CPU accesses both CPU and GPU memory and quantify the performance of database index operations with data stored in GPU memory. Our experiments show a throughput of up to 168 GB/s and access latencies between about 800 ns and 1000 ns.
