OpenAI, in collaboration with Broadcom, has introduced Jalapeño, a purpose-built processor designed for large language model (LLM) inference.
The chip is the first in a planned series of accelerators that the companies are developing jointly. It is intended to support advanced AI workloads and extend OpenAI’s in-house control over the key infrastructure behind its models and products.
Jalapeño was handed over to OpenAI CEO Sam Altman and president Greg Brockman by Broadcom president and CEO Hock Tan and semiconductor solutions president Charlie Kawwas.
Brockman, who is also a co-founder of OpenAI, said: “The world is moving to a compute-powered economy.
“Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses, and can be used to solve more important problems.
“By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI towards broader access.”
Jalapeño is a custom design, initiated specifically around OpenAI’s understanding of LLM operations, future model requirements, software infrastructure, and product needs.
The chip is intended to serve both current and future LLMs, including those behind OpenAI’s deployed products such as ChatGPT, Codex, and its API, as well as upcoming agent-driven products.
OpenAI, Broadcom, and Celestica worked together on Jalapeño’s design process.
Broadcom provided chip implementation, networking capabilities, and silicon manufacturing, while Celestica contributed to board and rack system integration, networking, and scalable production system development.
Early engineering samples are currently running machine learning workloads at target frequencies and power envelopes in OpenAI’s labs. This includes running internal models such as GPT-5.3-Codex-Spark.
The processor’s architecture is described as blank-slate, meaning it is not adapted from previous AI accelerators or general-purpose solutions but has been tailored to the specific needs of LLM inference.
OpenAI stated that the processor is designed for flexibility across all LLMs, guided by insights into the specific requirements of inference on present and upcoming AI models across the sector.
Jalapeño is expected to deliver performance per watt substantially above current comparable hardware. Initial testing is ongoing, with OpenAI planning to release a detailed technical performance report in the coming months.
The architecture aims to limit unnecessary data movement and balance compute, memory, and networking resources, with the goal of narrowing the gap between the processor’s theoretical peak performance and the performance it achieves in practice.
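One common way to reason about the gap between peak and achieved performance is the roofline model, in which attainable throughput is capped either by peak compute or by memory bandwidth multiplied by the workload's arithmetic intensity. The sketch below is illustrative only: the function names and the hardware figures are hypothetical assumptions, not Jalapeño specifications, and OpenAI has not said it uses this model.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound in FLOP/s.

    peak_flops: peak compute throughput (FLOP/s)
    mem_bandwidth: memory bandwidth (bytes/s)
    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def utilisation(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Fraction of peak compute the roofline bound allows."""
    return attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity) / peak_flops


if __name__ == "__main__":
    # Hypothetical accelerator figures, chosen only for illustration.
    PEAK = 1.0e15       # 1 PFLOP/s
    BANDWIDTH = 2.0e12  # 2 TB/s

    # Low intensity resembles memory-bound token-by-token decoding;
    # high intensity resembles compute-bound batched matrix multiplies.
    for intensity in (2.0, 500.0, 1000.0):
        share = utilisation(PEAK, BANDWIDTH, intensity)
        print(f"{intensity:7.1f} FLOP/byte -> {share:.3%} of peak")
```

Under these assumed figures, a workload below 500 FLOP/byte is limited by memory bandwidth rather than compute, which is why reducing data movement matters as much as adding raw compute.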
Broadcom’s Tomahawk networking silicon forms part of the infrastructure enabling the Jalapeño platform to scale for large deployments.
Jalapeño’s development cycle took nine months from initial design to manufacturing tape-out, a timeline that OpenAI and Broadcom say may be among the fastest for a high-end ASIC. The companies attribute the pace to close software-hardware co-development and to the use of OpenAI’s own AI models to assist with parts of the chip design and optimisation.
The companies describe Jalapeño as the first element of a multi-generation compute platform. It is planned for initial deployment by the end of 2026.
Broadcom president and CEO Hock Tan said: “Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI.
“This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt scale data centers with Microsoft and other partners beginning in 2026.”
The Jalapeño platform is expected to scale further by combining OpenAI-designed accelerators with Broadcom’s silicon, networking, and connectivity technologies and Celestica’s rack, system, and networking expertise.
The design goal is to match the compute power and throughput of current AI accelerators while achieving latency levels closer to specialised inference solutions, targeting scalable interactive LLM products.
OpenAI also stated that the models it uses to serve customers are being used internally to improve the hardware and systems that will run future AI workloads. The company argues that if AI can help engineers create chips more rapidly and efficiently, it could lower compute costs across the industry and broaden access to advanced AI infrastructure.
Earlier this month, OpenAI submitted a confidential filing with the US Securities and Exchange Commission (SEC) for a possible initial public offering. The ChatGPT developer has not yet finalised the number of shares, potential pricing, or timeline, and indicated that internal discussions and planning are still ongoing.
