Microsoft Azure has introduced the NDv6 GB300 VM series, a new virtual machine (VM) offering built around the Nvidia GB300 NVL72 platform and Blackwell Ultra graphics processing units (GPUs), now deployed at supercomputing scale.

The latest Azure cluster incorporates more than 4,600 Blackwell Ultra GPUs in Nvidia GB300 NVL72 rack systems, which Microsoft describes as the first large-scale production deployment of its kind.


The cluster is intended for OpenAI’s most computationally intensive AI inference workloads, using Nvidia’s next-generation InfiniBand networking for cross-rack interconnects.

Each NDv6 GB300 virtual machine runs on a rack-scale Nvidia GB300 NVL72 system, comprising 72 Blackwell Ultra GPUs and 36 Grace CPUs.

This arrangement provides each VM with 37 terabytes (TB) of memory and 1.44 exaflops of FP4 Tensor Core capability. The system is intended for AI models with high computational and memory requirements, including large-scale reasoning models and generative AI workloads.
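As a back-of-the-envelope illustration (these derived figures are not quoted by Microsoft or Nvidia), dividing the rack-level totals evenly across the 72 GPUs gives roughly 20 petaflops of FP4 compute and around 514GB of pooled memory per GPU. Note that the 37TB figure pools GPU and Grace CPU memory, so the per-GPU share is not HBM capacity alone.

```python
# Back-of-the-envelope per-GPU figures for one NDv6 GB300 VM (one Nvidia
# GB300 NVL72 rack), derived from the rack-level numbers quoted above.
# The 37TB figure pools GPU and Grace CPU memory, so the per-GPU share
# below is not HBM capacity alone.

GPUS_PER_RACK = 72
RACK_MEMORY_TB = 37           # pooled memory per VM/rack
RACK_FP4_EXAFLOPS = 1.44      # FP4 Tensor Core throughput per VM/rack

fp4_pflops_per_gpu = RACK_FP4_EXAFLOPS * 1_000 / GPUS_PER_RACK  # exa -> peta
memory_gb_per_gpu = RACK_MEMORY_TB * 1_000 / GPUS_PER_RACK      # TB -> GB

print(f"FP4 compute per GPU:   ~{fp4_pflops_per_gpu:.0f} petaflops")  # ~20
print(f"Pooled memory per GPU: ~{memory_gb_per_gpu:.0f} GB")          # ~514
```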

Microsoft Azure AI infrastructure corporate vice president Nidhi Chappell said: “Delivering the industry’s first at-scale Nvidia GB300 NVL72 production cluster for frontier AI is an achievement that goes beyond powerful silicon — it reflects Microsoft Azure and Nvidia’s shared commitment to optimise all parts of the modern AI data centre.


“Our collaboration helps ensure customers like OpenAI can deploy next-generation infrastructure at unprecedented scale and speed.”

The cluster uses a two-level network design. Within each rack, fifth-generation Nvidia NVLink Switches enable 130TB per second of bandwidth between the GPUs, creating a unified memory space.

For communication across racks, the Nvidia Quantum-X800 InfiniBand platform supplies 800 gigabits per second (Gb/s) of bandwidth to each GPU.

The network also features adaptive routing and telemetry-based congestion control, and implements Nvidia’s Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) v4 to accelerate distributed operations.
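To make the gap between the two tiers concrete, the sketch below converts both quoted figures to per-GPU gigabytes per second and times a hypothetical 10GB transfer on each tier. The payload size is an arbitrary example, the even NVLink split is an assumption (the 130TB/s is an aggregate figure), and the model ignores latency, protocol overhead and congestion.

```python
# Compare per-GPU bandwidth on the two network tiers, then time a
# hypothetical 10GB transfer on each (the payload size is an arbitrary
# illustration, not a quoted figure). Bandwidth-only model: latency,
# protocol overhead and congestion are ignored, and the even NVLink
# split across 72 GPUs is an assumption.

GPUS_PER_RACK = 72
NVLINK_AGGREGATE_TB_S = 130    # intra-rack NVLink, aggregate for the rack
IB_PER_GPU_GBIT_S = 800        # inter-rack InfiniBand, quoted per GPU

nvlink_per_gpu_gb_s = NVLINK_AGGREGATE_TB_S * 1_000 / GPUS_PER_RACK  # ~1,806
ib_per_gpu_gb_s = IB_PER_GPU_GBIT_S / 8                              # 100

payload_gb = 10
print(f"NVLink tier:     {payload_gb / nvlink_per_gpu_gb_s * 1_000:6.1f} ms")  # ~5.5
print(f"InfiniBand tier: {payload_gb / ib_per_gpu_gb_s * 1_000:6.1f} ms")      # 100.0
```

The roughly order-of-magnitude gap between the tiers is why bandwidth-heavy collectives are kept inside the rack where possible, and why in-network reduction such as SHARP, which aggregates partial results inside the switches rather than shuttling them between endpoints, matters on the cross-rack tier.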

Nvidia claimed that the Blackwell Ultra platform has shown up to five times higher throughput per GPU on the 671-billion-parameter DeepSeek-R1 model compared to the previous Hopper generation, as measured in the MLPerf Inference v5.1 benchmarks.

The GB300 NVL72 also outperformed previous systems on newer benchmarks, such as Llama 3.1 405B, according to Nvidia.

Microsoft’s implementation required modifications at multiple layers of its data centre stack, including liquid cooling solutions, power distribution systems and orchestration software.

In September 2025, Microsoft integrated Anthropic’s Claude models into Copilot Studio, complementing its existing support for OpenAI’s large language models.