Nvidia accelerates Google DeepMind's DiffusionGemma

Nvidia reports up to four times faster local text generation with DiffusionGemma’s new parallel processing approach.

The model is built upon Gemma 4, in which only 3.8 billion parameters are activated per inference step. Credit: Koshiro K/Shutterstock.com.

Google DeepMind has released DiffusionGemma, an experimental open-source text-generation model now optimised for Nvidia’s GeForce RTX GPUs, the Nvidia RTX PRO platform and Nvidia DGX Spark systems.

This adaptation aims to support text generation with significantly reduced latency across a range of hardware configurations, from personal computers to cloud environments.

DiffusionGemma differs from conventional large language models, which usually generate text sequentially, producing one token at a time conditioned on the tokens that precede it. In contrast, the new model can generate up to 256 tokens in parallel during each step, creating entire blocks of text at once.
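To make the difference concrete, the sketch below counts the model passes needed for a response of a given length. It is a back-of-the-envelope illustration, not a description of DiffusionGemma's internals: the function names are invented for this example, and `passes_per_block` is a hypothetical parameter standing in for the refinement passes a diffusion model spends on each block, a figure the announcement does not give.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """One forward pass per generated token."""
    return num_tokens


def block_parallel_passes(num_tokens: int, block_size: int = 256,
                          passes_per_block: int = 1) -> int:
    """Passes needed when up to `block_size` tokens are produced together.

    `passes_per_block` is an assumption for illustration only; the article
    does not say how many refinement passes each block requires.
    """
    blocks = math.ceil(num_tokens / block_size)
    return blocks * passes_per_block


if __name__ == "__main__":
    for n in (256, 1024):
        print(n, autoregressive_passes(n), block_parallel_passes(n))
```

With the default of one pass per block this is a best case for the parallel approach; raising `passes_per_block` shows how quickly refinement narrows the gap.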

This parallel approach is positioned to benefit developers, researchers and AI practitioners who run single-user workloads, such as interactive chat applications and on-device assistants, by offering faster response times.

The model is built upon Gemma 4, a 26-billion-parameter mixture-of-experts (MoE) architecture, in which only 3.8 billion parameters are activated per inference step. This configuration enables the model to fit within the memory constraints of high-end consumer GPUs, reportedly operating within 18GB of VRAM when quantised.
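The memory figure can be sanity-checked with simple arithmetic. The sketch below estimates weight storage for the full 26 billion parameters at several bit widths. The bit widths are assumptions chosen for illustration, since the announcement does not say which quantisation scheme produces the 18GB figure, and the estimate covers weights only, not activations or runtime buffers.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9    # total parameters, as reported in the article
ACTIVE_PARAMS = 3.8e9  # parameters activated per inference step, as reported

if __name__ == "__main__":
    # Bit widths below are illustrative assumptions, not figures from the article.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
    print(f"active share per step: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

At 16 bits the weights alone would need about 52 GB, while at 4 bits they come to about 13 GB. That is consistent with the reported 18GB, assuming a quantisation in that range. The sparse activation reduces compute per step, but in a typical MoE deployment all experts still have to be held in memory.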

Nvidia has tailored DiffusionGemma to capitalise on its hardware strengths, citing compatibility with Nvidia Tensor Cores and the CUDA software environment.

As a result, the model achieves measurable speed gains: official figures indicate throughput of 1,000 tokens per second on a single Nvidia H100 Tensor Core GPU, 150 tokens per second on Nvidia DGX Spark and up to 2,000 tokens per second on Nvidia DGX Station.

The companies state these speeds are approximately four times faster than those of similar autoregressive models under single-user conditions.
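For a sense of what these figures mean for a single response, the sketch below converts the reported throughput into generation time and derives the autoregressive baseline implied by the four-times claim. The baseline values are calculated from that approximate ratio, not reported by either company, and the 500-token response length is an arbitrary example.

```python
# Throughput figures as reported in the article (tokens per second).
REPORTED_TOKENS_PER_SECOND = {
    "Nvidia H100": 1000,
    "Nvidia DGX Spark": 150,
    "Nvidia DGX Station": 2000,
}

# "Approximately four times faster" per the article; treated as exact here
# purely for illustration.
ASSUMED_SPEEDUP = 4.0


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit `num_tokens` at a steady throughput."""
    return num_tokens / tokens_per_second


def implied_baseline(tokens_per_second: float,
                     speedup: float = ASSUMED_SPEEDUP) -> float:
    """Autoregressive throughput implied by the claimed speedup (derived)."""
    return tokens_per_second / speedup


if __name__ == "__main__":
    response_length = 500  # arbitrary example length
    for system, tps in REPORTED_TOKENS_PER_SECOND.items():
        fast = seconds_to_generate(response_length, tps)
        slow = seconds_to_generate(response_length, implied_baseline(tps))
        print(f"{system}: {fast:.2f}s vs {slow:.2f}s implied baseline")
```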

DiffusionGemma uses bi-directional attention, which enables each token generated in a block to reference every other token within that same block. This approach may offer benefits in tasks that require non-linear outputs, such as code infilling or working with mathematical and amino acid sequences.
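The contrast with a conventional decoder can be illustrated with attention masks. The sketch below builds a causal mask, in which each position sees only itself and earlier positions, alongside a fully bi-directional mask for a single block. It is a generic illustration with invented function names. It does not reproduce DiffusionGemma's implementation, and the article does not describe how a block attends to text outside it.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[bool]]:
    """Every position in the block may attend to every other position."""
    return [[True] * n for _ in range(n)]


def render(mask: list[list[bool]]) -> str:
    """Draw a mask with 'x' for allowed attention and '.' for blocked."""
    return "\n".join(" ".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    print("causal:")
    print(render(causal_mask(4)))
    print("bi-directional:")
    print(render(bidirectional_mask(4)))
```

In the causal mask the first position cannot see anything that follows it. That is why infilling, where the text after a gap constrains what goes in it, is a natural fit for the bi-directional case.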

The architecture also incorporates an iterative self-correction mechanism, refining output across the entire block at each step.

Google DeepMind notes that DiffusionGemma is published under an Apache 2.0 license and has been supported since launch in tools such as Hugging Face Transformers, vLLM, and Unsloth.

However, the model remains experimental and is recommended for applications prioritising speed and iterative interaction rather than maximum text quality, for which standard Gemma 4 remains preferred.

For high-throughput, cloud-based workloads, the company notes that traditional autoregressive models may retain efficiency advantages.
