Google has unveiled Gemma 4, a new suite of open AI models designed for deployment across a range of hardware platforms, including mobile devices and personal workstations.

The models, made available under an Apache 2.0 open-source licence, are aimed at supporting advanced reasoning, agent-based workflows, and multimodal data processing.


According to a company blog post written by Google DeepMind research vice president Clement Farabet and group product manager Olivier Lacombe, Gemma 4 builds on the same foundational research and technology as Gemini 3.

Farabet and Lacombe wrote: “Built from the same world-class research and technology as Gemini 3, Gemma 4 is the most capable model family you can run on your hardware.”

The pair said that Gemma 4 aims to offer developers both open and proprietary tools across different use cases.

Gemma 4 is available in four model sizes: Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture of Experts (MoE) model, and a 31B dense model.

Google states that the 31B model ranks third and the 26B model sixth on the Arena AI text leaderboard, a widely used industry benchmark, with Gemma 4 outperforming some models 20 times their size. These models are designed for tasks involving complex logic, multi-step planning, and agentic operations.

The smaller E2B and E4B configurations are intended for deployment on mobile and edge devices, with emphasis placed on processing speed and hardware efficiency.

The larger Gemma 4 models, the 26B and 31B versions, have been optimised to run on a single NVIDIA H100 80GB graphics processing unit (GPU). Quantised versions are compatible with consumer GPUs.
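Back-of-the-envelope arithmetic illustrates why quantisation brings a 31B-parameter model within reach of consumer hardware. The sketch below counts only the weights themselves; real memory use also includes activations, the key-value cache, and runtime overhead, so the figures are lower bounds rather than exact requirements.

```python
# Rough VRAM footprint of a 31B-parameter dense model at different
# weight precisions. Weights only: activations, KV cache, and runtime
# overhead are deliberately ignored, so treat these as lower bounds.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# 16-bit weights: ~62 GB, which fits on a single 80 GB H100.
print(weight_footprint_gb(31, 16))  # 62.0

# 4-bit quantised weights: ~15.5 GB, within reach of 24 GB consumer GPUs.
print(weight_footprint_gb(31, 4))   # 15.5
```

The same arithmetic explains why the smaller E2B and E4B variants, at a few gigabytes or less, can run on mobile and edge devices.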

These models provide support for agent-driven workflows and can execute code-related tasks locally. The E2B and E4B variants are engineered for mobile and IoT environments, focusing on memory and battery efficiency.

They are capable of running offline on devices such as Android smartphones, Raspberry Pi, and NVIDIA Jetson Orin Nano modules.

Android developers can access Gemma 4 models through the AICore Developer Preview, which is designed to ensure forward compatibility with Gemini Nano 4, according to Farabet and Lacombe's blog post.

The model family is trained on data in over 140 languages and can process images, video, and audio, with context windows extending to 256,000 tokens for long-form content.

Google describes Gemma 4 as supporting function calling, structured JSON outputs, and the creation of autonomous agents capable of workflow automation. The models can interleave text and images and handle tasks ranging from code generation to speech and visual data recognition.
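In the function-calling pattern Google describes, the application advertises a tool to the model, the model replies with structured JSON naming the tool and its arguments, and the application parses that JSON and dispatches the call. The sketch below shows that general loop; the tool name, the reply text, and the exact JSON schema are illustrative assumptions, not Gemma 4's documented output format.

```python
# Generic function-calling loop: parse a model's structured JSON reply
# and dispatch it to a registered tool. The tool and the hard-coded
# reply are made up for illustration; the schema a given model emits
# may differ.
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return f"18C and cloudy in {city}"

TOOLS = {"get_weather": get_weather}

# A structured reply of the kind a function-calling model produces.
model_reply = '{"tool": "get_weather", "arguments": {"city": "London"}}'

call = json.loads(model_reply)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # 18C and cloudy in London
```

The value of structured JSON output is precisely that this dispatch step needs no free-text parsing: the application validates the tool name against its registry and passes the arguments straight through.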

Google and NVIDIA have worked together to optimise Gemma 4 for NVIDIA GPUs, allowing efficient operation from the cloud to local workstations and edge devices, including RTX-powered PCs and Jetson Orin Nano modules.

Performance data were measured on desktop systems with GeForce RTX 5090 and Apple M3 Ultra hardware, using throughput figures from the llama.cpp framework and its llama-bench benchmarking tool.

The E2B and E4B models are designed for low-latency local inference, while the 26B and 31B models cater to higher-performance, developer-focused workflows. Google indicates that the models integrate with agent platforms such as OpenClaw, which enable tasks based on local data and user workflows.

The release of Gemma 4 under the Apache 2.0 licence responds to community requests for open access and aims to provide developers with flexibility over data, infrastructure, and deployment environments.

Farabet and Lacombe wrote: “You gave us feedback, and we listened. Building the future of AI requires a collaborative approach, and we believe in empowering the developer ecosystem without restrictive barriers. That’s why Gemma 4 is released under a commercially permissive Apache 2.0 license.”

The collaboration with Nvidia focuses on making Gemma 4 accessible for a variety of computing environments, supporting a move towards AI models that function offline and leverage real-time context directly on user devices.

The intention is to remove infrastructure barriers and allow deployment for research, development, and end-user applications across different operating environments.

The introduction of Gemma 4 comes after Google released the Gemini 3.1 Pro model in February 2026.

Enterprise customers can access Gemini 3.1 Pro via Vertex AI and Gemini Enterprise, while consumers can use it through the Gemini app and NotebookLM. Higher usage limits are available to subscribers of the Google AI Pro and Ultra plans.