Exploring NVIDIA's CDMM Mode for Enhanced Memory Management
Iris Coleman Oct 14, 2025 16:42
NVIDIA introduces Coherent Driver-based Memory Management (CDMM) to improve GPU memory control on hardware-coherent platforms, addressing issues faced by developers and cluster administrators.

NVIDIA has introduced a new memory management mode, Coherent Driver-based Memory Management (CDMM), designed to enhance the control and performance of GPU memory on hardware-coherent platforms such as GH200, GB200, and GB300. This development aims to address the challenges posed by non-uniform memory access (NUMA), which can lead to inconsistent system performance when applications are not fully NUMA-aware, according to NVIDIA.
NUMA vs. CDMM
NUMA mode, the current default for NVIDIA drivers on hardware-coherent platforms, exposes both CPU and GPU memory to the operating system (OS). This setup allows memory allocation through standard Linux and CUDA APIs, facilitating dynamic memory migration between CPU and GPU. However, it also means the OS can treat GPU memory as just another generic memory pool, which can degrade application performance.
In contrast, CDMM mode prevents GPU memory from being exposed to the OS as a software NUMA node. Instead, the NVIDIA driver directly manages GPU memory, providing more precise control and potentially boosting application performance. This approach is akin to the PCIe-attached GPU model, where GPU memory remains distinct from system memory.
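To make the contrast concrete, here is a minimal CUDA sketch of the two allocation paths, assuming a hardware-coherent platform such as GH200 and a recent CUDA toolkit. The scale kernel, buffer sizes, and the omission of error checking are illustrative choices, not details taken from NVIDIA's documentation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel: scales a buffer in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Path 1: system-allocated memory (plain malloc). On a hardware-coherent
    // platform the GPU can dereference this pointer directly over NVLink-C2C.
    // In NUMA mode the OS may migrate these pages toward the GPU; in CDMM
    // mode they stay in system memory.
    float *sys_buf = static_cast<float *>(malloc(bytes));
    for (int i = 0; i < n; ++i) sys_buf[i] = 1.0f;
    scale<<<(n + 255) / 256, 256>>>(sys_buf, n, 2.0f);
    cudaDeviceSynchronize();

    // Path 2: driver-managed GPU memory (cudaMalloc). This memory is not
    // exposed to the OS as a NUMA node; the NVIDIA driver owns its placement
    // in both modes.
    float *gpu_buf = nullptr;
    cudaMalloc(&gpu_buf, bytes);
    cudaMemcpy(gpu_buf, sys_buf, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(gpu_buf, n, 2.0f);
    cudaDeviceSynchronize();

    printf("sys_buf[0] = %f\n", sys_buf[0]);
    cudaFree(gpu_buf);
    free(sys_buf);
    return 0;
}
```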
Implications for Kubernetes
The introduction of CDMM is particularly significant for Kubernetes, a widely-used platform for managing large GPU clusters. In NUMA mode, Kubernetes may encounter unexpected behaviors, such as memory over-reporting and incorrect application of pod memory limits, which can lead to performance issues and application failures. CDMM mode helps mitigate these issues by ensuring better isolation and control over GPU memory.
Impact on Developers and System Administrators
For CUDA developers, CDMM mode changes how system-allocated memory is handled. The GPU can still access system-allocated memory across the NVLink chip-to-chip (C2C) connection, but memory pages will not migrate as they might in NUMA mode. Developers therefore need to adapt their memory management strategies, placing data explicitly rather than relying on migration, to fully leverage CDMM.
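One possible adaptation is to stage data into driver-managed GPU memory explicitly, using pinned host buffers and asynchronous copies instead of counting on pages migrating on first touch. The sketch below assumes a recent CUDA toolkit; the saxpy kernel, sizes, and stream setup are illustrative and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 22;
    const size_t bytes = n * sizeof(float);

    // Pinned host buffers: fast, explicit transfers over NVLink-C2C,
    // independent of any OS page migration policy.
    float *h_x, *h_y;
    cudaMallocHost(&h_x, bytes);
    cudaMallocHost(&h_y, bytes);
    for (int i = 0; i < n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    // Driver-managed GPU buffers: placement is decided by the NVIDIA
    // driver, not the OS, in both NUMA and CDMM modes.
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Explicitly stage inputs into GPU memory, run the kernel, and copy
    // results back, instead of relying on page migration.
    cudaMemcpyAsync(d_x, h_x, bytes, cudaMemcpyHostToDevice, stream);
    cudaMemcpyAsync(d_y, h_y, bytes, cudaMemcpyHostToDevice, stream);
    saxpy<<<(n + 255) / 256, 256, 0, stream>>>(n, 3.0f, d_x, d_y);
    cudaMemcpyAsync(h_y, d_y, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("h_y[0] = %f\n", h_y[0]);  // expect 5.0

    cudaStreamDestroy(stream);
    cudaFree(d_x); cudaFree(d_y);
    cudaFreeHost(h_x); cudaFreeHost(h_y);
    return 0;
}
```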
System administrators will find that tools like numactl or mbind are ineffective for GPU memory management in CDMM mode, as GPU memory is not presented to the OS. However, these tools can still be used to manage system memory.
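As a host-side sketch of what remains possible, the following code binds a system-memory allocation to a CPU NUMA node with libnuma (link with -lnuma). The node number and buffer size are arbitrary, and under CDMM only CPU memory nodes are visible to such calls.

```cuda
#include <cstdio>
#include <numa.h>   // libnuma; link with -lnuma

int main() {
    if (numa_available() < 0) {
        printf("NUMA policy is not supported on this system\n");
        return 1;
    }

    // Under CDMM, only CPU (system) memory appears as NUMA nodes, so this
    // binds the allocation to a CPU node; GPU memory is managed by the
    // NVIDIA driver and cannot be targeted this way.
    const size_t bytes = 1 << 20;
    void *buf = numa_alloc_onnode(bytes, 0);  // node 0 is illustrative
    if (buf == nullptr) {
        printf("numa_alloc_onnode failed\n");
        return 1;
    }

    // ... use buf for CPU-side work ...

    numa_free(buf, bytes);
    return 0;
}
```

The numactl equivalent, for example numactl --membind=0 ./app, likewise constrains only the process's system-memory allocations.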
Guidelines for Choosing Between CDMM and NUMA
When deciding between CDMM and NUMA modes, consider the specific memory management needs of your applications. NUMA mode is suitable for applications that rely on the OS to manage combined CPU and GPU memory. In contrast, CDMM mode is better suited to applications that need direct, driver-level control of GPU memory, keeping it out of the OS's hands for more predictable performance.
Ultimately, CDMM mode offers developers and administrators the ability to harness the full potential of NVIDIA's hardware-coherent memory architectures, optimizing performance for GPU-accelerated workloads. For those using platforms like GH200, GB200, or GB300, enabling CDMM mode could provide significant benefits, especially in Kubernetes environments.