Google rations Meta's Gemini access amid compute shortage: a sign of AI's hunger for hardware
Lack of processing capacity forces Google to limit Meta's access to its Gemini models, affecting internal projects and revealing AI infrastructure bottlenecks.
July 1, 2026 · 4 min read
TL;DR: Google has rationed Meta's access to its Gemini models due to its inability to provide sufficient computing capacity. The move affects Meta's internal projects and reveals AI infrastructure bottlenecks, with implications for the entire industry.
What happened?
According to the Financial Times, as reported by The Next Web, Google has begun rationing access to its Gemini artificial intelligence models for several clients, with Meta the most affected. The reason: Google cannot provide the computing capacity that Meta requested. This has had a knock-on effect on internal projects at Mark Zuckerberg's company, which rely on Gemini for various applications. The exact terms of the agreement between the two companies have not been made public, but sources close to the matter indicate that Meta had contracted large-scale inference and training services for its recommendation systems and virtual assistants. The cap imposed by Google has forced Meta to redirect part of its workload to its own data centers and in-house chips.
Why is this important?
This episode highlights that the rapid scaling of generative artificial intelligence is running into physical limits: the supply of GPUs and other AI accelerators is not growing at the same pace as demand. Companies like Google, which offer AI cloud services, are forced to prioritize their most strategic clients or their own products, leaving others with fewer resources. Meta invests massively in AI (it was estimated to spend over $35 billion in 2024 on infrastructure alone), yet it still relies partly on external infrastructure while it develops its own chips (Meta Training and Inference Accelerator, MTIA). The case recalls the GPU shortage that affected OpenAI and Microsoft in 2023, when ChatGPT demand exceeded Azure server capacity, causing wait times and forcing the prioritization of enterprise clients. The difference now is that the rationing is happening between two tech giants, underscoring that even the companies with the greatest bargaining power are not exempt from these constraints.
Market consequences
- Slowdown in AI projects: Companies relying on third-party APIs may see delayed launches or degraded service quality. For example, startups using Gemini for content generation or chatbots could experience latency or stricter rate limits. Meta, for its part, has had to slow down the deployment of AI-based features on its platforms (Facebook, Instagram, WhatsApp).
- Incentive to diversify: Companies like Meta will seek to accelerate the development of their own hardware or turn to alternatives such as AWS (with its Trainium and Inferentia chips), Azure (with its Maia accelerators), or chip startups like Cerebras, Groq, or SambaNova. Meta has already announced that it is testing its own accelerators for inference and plans to reduce its dependence on external suppliers in the medium term.
- Pressure on prices: The compute shortage could translate into higher costs for AI cloud service customers. Google Cloud has already increased prices for its GPU instances by 10-15% in recent months, according to Bernstein analysts. Additionally, long-term contracts (reservations) are becoming more common, which can exclude startups with tight budgets.
- Reinforcement of vertical integration: Tech giants able to design their own chips (Apple, Google, Amazon, Microsoft) gain an advantage over those dependent on external suppliers. Google, for example, has its TPUs (Tensor Processing Units), which it uses for its own services and also offers to external clients, but internal demand (including from its search engine and YouTube) competes for those same resources. Amazon, with its Trainium and Inferentia chips, and Microsoft, with Maia, are following the same strategy. This could create a two-speed market: those with their own chips and those without.
What should readers know?
This is not an isolated case. As early as 2023, OpenAI and Microsoft faced capacity problems in meeting ChatGPT demand, leading to waitlists and the prioritization of enterprise clients. In 2024, the shortage of NVIDIA's H100 GPUs was a recurring theme, with delivery times exceeding six months. AI infrastructure has become a strategic resource as valuable as data. For startups and companies planning to integrate AI into their products, it is crucial to evaluate the availability and scalability of providers, and to consider multi-model strategies (spreading workloads across multiple providers) or lighter models (such as small language models, SLMs, or distilled versions). It is also advisable to negotiate contracts with guaranteed-capacity clauses and to explore edge AI options to reduce cloud dependency. The race for AI supremacy is not only about algorithms but also about logistics and chip manufacturing. As Jensen Huang, CEO of NVIDIA, noted, 'demand for GPUs is so high that we are building AI factories.' The question is whether supply can keep pace with demand, and this episode between Google and Meta suggests that the answer, at least in the short term, is no.
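For teams weighing the multi-provider approach mentioned above, the idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Provider` class, the `CapacityError` exception, and the two stub functions stand in for whatever real client libraries a team uses, and no actual vendor API is called. It only shows the pattern of trying providers in priority order and falling back to a lighter local model when the primary one is rationed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class CapacityError(Exception):
    """Hypothetical error a provider wrapper raises when rationed or rate-limited."""


@dataclass
class Provider:
    name: str
    generate: Callable[[str], str]  # hypothetical interface: prompt -> completion


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in priority order; return (provider_name, completion)."""
    errors: List[str] = []
    for provider in providers:
        try:
            return provider.name, provider.generate(prompt)
        except CapacityError as exc:
            # Record the failure and move on to the next provider in the list.
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers exhausted: " + "; ".join(errors))


# Stub providers for illustration only (assumptions, not real services).
def rationed_primary(prompt: str) -> str:
    raise CapacityError("quota exceeded")


def small_local_model(prompt: str) -> str:
    return f"[local-slm] {prompt[:40]}"


PROVIDERS = [
    Provider("primary-cloud", rationed_primary),
    Provider("local-slm", small_local_model),
]

if __name__ == "__main__":
    name, text = generate_with_fallback("Summarize the news", PROVIDERS)
    print(name, text)
```

In practice the ordering of the list encodes the trade-off: the preferred (usually larger) model comes first, and cheaper or self-hosted options act as a safety net when capacity is capped.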