Nvidia has released the production version of Dynamo 1.0, an open-source operating system designed for large-scale AI inference.
Dynamo 1.0 is currently in use across a range of global cloud service providers, AI-native firms and enterprises.
The software is available immediately to developers worldwide.
Dynamo 1.0 works with the Nvidia Blackwell platform to manage GPU and memory resources for AI workloads across data centre clusters.
It divides inference tasks between GPUs and uses advanced traffic-management tools to move data efficiently between GPUs and storage systems, reducing memory bottlenecks and computational overhead.
For agentic AI applications and workloads involving lengthy prompts, the system routes requests to GPUs that already hold relevant data from earlier steps, offloading that data once it is no longer needed.
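The routing behaviour described above can be sketched roughly as follows. This is a minimal illustration of cache-aware scheduling, not Dynamo's actual API: the class, function and session names are assumptions introduced for the example.

```python
# Illustrative sketch of KV-cache-aware request routing.
# All names here are hypothetical, not Nvidia Dynamo's real interfaces.

from dataclasses import dataclass, field


@dataclass
class Worker:
    """A GPU worker that tracks which sessions it holds cached data for."""
    name: str
    cached_sessions: set = field(default_factory=set)
    load: int = 0  # in-flight requests


def route(workers, session_id):
    """Prefer a worker that already caches this session's earlier steps;
    otherwise fall back to the least-loaded worker."""
    warm = [w for w in workers if session_id in w.cached_sessions]
    target = min(warm or workers, key=lambda w: w.load)
    target.cached_sessions.add(session_id)
    target.load += 1
    return target


workers = [Worker("gpu-0"), Worker("gpu-1")]
first = route(workers, "session-A")
second = route(workers, "session-A")
assert first is second  # follow-up request reuses the warm worker's cache
```

In a real system the cache tracking, eviction and offload to storage would be far more involved; the sketch only shows the routing preference the article describes.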
Recent benchmarks indicate that Dynamo can increase the inference performance of Blackwell GPUs by as much as seven times, reducing the operational cost per token for operators running millions of GPUs.
As open-source software, Dynamo 1.0 aims to address challenges associated with scaling AI inference in data centres, where varying request sizes and unpredictable demand make resource orchestration complex.
Dynamo integrates natively with leading open-source AI frameworks such as LangChain, llm-d, LMCache, SGLang and vLLM through optimisations made possible by the Nvidia TensorRT-LLM library.
Core components of Dynamo, such as KVBM for memory management, NIXL for GPU-to-GPU data movement and Grove for scaling, are also released as standalone modules.
Nvidia has contributed TensorRT-LLM CUDA kernels to the FlashInfer project to support their integration into further open-source initiatives.
The Nvidia inference platform incorporating Dynamo is supported by major cloud providers, including Amazon Web Services (AWS), Microsoft Azure, Google Cloud and Oracle Cloud Infrastructure (OCI).
Additional adoption includes providers such as Alibaba Cloud, CoreWeave, Together AI and Nebius; as well as companies like Cursor, Perplexity, Baseten, Deep Infra, Fireworks, ByteDance, Meituan, PayPal and Pinterest.
Nvidia founder and CEO Jensen Huang said: “Inference is the engine of intelligence, powering every query, every agent and every application.
“With Nvidia Dynamo, we’ve created the first-ever ‘operating system’ for AI factories. The rapid adoption across our ecosystem shows this next wave of agentic AI is here, and Nvidia is powering it at global scale.”
In related developments, Nvidia has released the Vera Rubin DSX AI Factory reference design alongside the general availability of the Omniverse DSX Blueprint.
These resources provide guidance for building integrated AI infrastructure and digital twins for large-scale design and operations of AI factories.
The reference design is developed with contributions from industry partners, including Cadence, Dassault Systèmes, Eaton, Jacobs, Nscale, Phaidra, Procore, PTC, Schneider Electric, Siemens, Switch, Trane Technologies and Vertiv.
The Vera Rubin DSX reference outlines procedures for constructing and managing all aspects of AI factory infrastructure, spanning compute platforms, Nvidia Spectrum-X Ethernet networking and storage, to enable scalable cluster performance.
The documentation includes best practices for implementing power supply systems, cooling solutions and hardware-software integration required for deployment at scale.
The Vera Rubin DSX software stack offers an open architecture that allows operators to select necessary components according to project requirements.
