OpenAI and Broadcom launch Jalapeño chip for LLM inference

TL;DR: OpenAI and Broadcom have launched Jalapeño, a custom chip for LLM inference that aims to cut OpenAI's inference costs and its dependence on NVIDIA. Full technical details are not yet available.

What happened?

OpenAI and Broadcom have officially unveiled Jalapeño, a custom application-specific integrated circuit (ASIC) optimized for inference of large language models (LLMs). According to OpenAI's official blog, the chip is designed to deliver better performance and energy efficiency than general-purpose GPUs, such as those from NVIDIA, which currently dominate the AI inference market. The announcement comes at a critical time: demand for AI inference has surged with the mass adoption of ChatGPT and other generative models, and the energy costs of running traditional GPUs have become a bottleneck. Detailed technical specifications have not been released, but preliminary estimates leaked by sources close to the company to outlets such as The Information indicate that the chip uses high-bandwidth memory (HBM) and a custom interconnect that reduces latency by 40% compared with current GPUs.

Why is it important?

The launch of Jalapeño matters for two reasons. First, it is a strategic move by OpenAI to reduce its dependence on external hardware suppliers, especially NVIDIA, whose GPUs are expensive and often scarce. For context, OpenAI spent approximately $700 million on cloud computing services in 2023, primarily on NVIDIA GPUs rented through Azure, according to Bernstein estimates. By developing its own chip, OpenAI can co-optimize hardware and software for its models, potentially lowering operating costs and improving scalability. Second, the collaboration with Broadcom, a semiconductor giant with experience in ASICs for networking and data centers, suggests the chip is built with volume production in mind, possibly for use beyond OpenAI's own services. Broadcom has already supplied custom chips to Google (TPU) and Apple, a track record that lends credibility to its ability to bring Jalapeño to scale.

Consequences and context

Historically, LLM inference has been dominated by NVIDIA GPUs, but growing demand and supply chain bottlenecks have led companies like Google (TPU), Amazon (Trainium/Inferentia), and now OpenAI to develop their own chips. This move is part of a broader trend of vertical integration in the AI industry, where major players seek to control the entire technology stack. Google, for example, has been using its TPUs for inference since 2016, and Amazon launched Inferentia in 2019. Unlike Google and Amazon, however, OpenAI does not operate its own cloud platform, and its entry into chip design could pressure NVIDIA to innovate faster or lower prices.

According to a Gartner report, the AI chip market is projected to grow at a compound annual rate of 25% through 2027, and custom ASICs could capture 30% of that market. If Jalapeño proves significantly more efficient, it could democratize access to LLM inference, allowing startups and mid-sized companies to deploy advanced models without prohibitive costs. Currently, the inference cost for models like GPT-4 can exceed $0.10 per query, which limits their use in high-volume applications.
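To make the growth rate and per-query cost cited above more concrete, here is a minimal Python sketch of the underlying arithmetic. The four-year compounding window and the volume of one million queries per day are hypothetical inputs chosen for illustration, not figures from Gartner, OpenAI, or any other source, and the function names are likewise illustrative.

```python
# Illustrative arithmetic for two figures quoted in the article.
# The compounding window (4 years) and the query volume (1M/day) are
# hypothetical inputs, not reported data.


def compound_growth_multiplier(annual_rate: float, years: int) -> float:
    """Factor by which a market grows after `years` at a constant annual rate (CAGR)."""
    return (1.0 + annual_rate) ** years


def daily_inference_cost(cost_per_query: float, queries_per_day: int) -> float:
    """Total daily spend, assuming every query costs the same amount."""
    return cost_per_query * queries_per_day


if __name__ == "__main__":
    # 25% CAGR compounded over an assumed 4-year window.
    multiplier = compound_growth_multiplier(0.25, 4)
    print(f"Market size multiplier after 4 years: {multiplier:.2f}x")

    # $0.10 per query at a hypothetical 1 million queries per day.
    cost = daily_inference_cost(0.10, 1_000_000)
    print(f"Daily cost: ${cost:,.0f} (about ${cost * 365:,.0f} per year)")
```

With these assumed inputs, a 25% annual growth rate multiplies the market roughly 2.44 times in four years, and $0.10 per query adds up to about $100,000 per day, which illustrates why per-query cost weighs so heavily on high-volume applications.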

What readers should know

For now, technical details of Jalapeño are limited. OpenAI has not disclosed specifications such as transistor count, manufacturing process (speculated to be 5 nm or 3 nm, given the reported agreement with TSMC), or performance on standard benchmarks like MLPerf. It is also unclear whether the chip will be available to third parties or exclusive to OpenAI's own services. What is clear is that this move reinforces OpenAI's position as a vertically integrated company capable of controlling the entire technology stack, from hardware to software and models. The path is not without challenges, however: developing ASICs requires multibillion-dollar investments and timelines of several years, as Google's experience shows, since its TPUs reportedly took more than five years to become profitable. Additionally, reliance on outside partners (Broadcom for design and TSMC for fabrication) could create bottlenecks if demand exceeds production capacity.

"Jalapeño is not just a chip; it's a statement of intent: OpenAI wants to dictate the rules of the game in AI infrastructure," notes an industry analyst.

In summary, Jalapeño could be a turning point for the AI industry, but many questions remain unanswered. The coming months will show how the chip affects the market and whether it meets its performance and efficiency expectations. Meanwhile, NVIDIA is not standing still: it has announced its new Blackwell architecture, promising a leap in inference performance, and companies like AMD and Qualcomm are also developing dedicated inference solutions. Competition is intensifying, and the eventual winner will be whoever achieves the best balance of cost, performance, and scalability.
