Sakana launches Fugu: multi-agent system that competes with frontier models

TL;DR: Sakana AI has launched Fugu, a multi-agent system that orchestrates multiple AI models to achieve frontier-level performance, competing with restricted models like Claude Fable 5. It offers a resilient alternative to export controls, though its opacity raises transparency concerns.

What Happened?

On June 20, 2025, Sakana AI, a startup founded by former Google Brain researcher David Ha, launched Fugu (Japanese for 'pufferfish'), a multi-agent orchestration system that delivers frontier-level performance through an OpenAI-compatible API. Fugu is not a monolithic model but an orchestrator that breaks down complex tasks, delegates them to an interchangeable pool of specialized models, verifies their work, and synthesizes the final result. The system builds on two previous Sakana research projects: TRINITY and the Conductor model.

The launch comes just eight days after Anthropic withdrew public access to its most powerful models, Claude Mythos 5 and Claude Fable 5, following a U.S. government export control order. The order, issued by the Bureau of Industry and Security (BIS) of the Department of Commerce, cited national security concerns by restricting the export of AI models with dual-use capabilities. David Ha stated on X: "Fugu demonstrates that a well-orchestrated group of interchangeable agents can match restricted models like Fable and Mythos. Relying on a single company's model for national infrastructure is a massive risk. Collective intelligence is the practical hedge against this concentration of power."

Historically, the AI industry has followed a trajectory of scaling monolithic models, from GPT-3 to GPT-4, and then to models like Claude 3 Opus. However, training costs and dependence on a single vendor have created vulnerabilities. Fugu represents a shift toward orchestration, a concept reminiscent of multi-agent systems from the 1990s, but powered by modern LLMs. Sakana AI has published internal benchmarks showing Fugu Ultra surpassing GPT-4o on complex reasoning tasks and Claude Fable 5 on agentic task benchmarks like SWE-bench and AgentBench. However, these results have not been independently verified by third parties, introducing an element of uncertainty.

Why Is This Important?

Fugu represents a paradigm shift: instead of scaling ever-larger models, Sakana bets on intelligent orchestration as the next frontier. The system offers two variants: Fugu, optimized for low latency in everyday tasks, and Fugu Ultra, for complex workloads. By abstracting multi-agent complexity behind a standard endpoint, developers can integrate high-performance AI without relying on a single vendor, mitigating risks of lockout due to geopolitical regulations or business changes.

Moreover, Fugu matches the performance of frontier models like Claude Fable 5 on agentic task benchmarks, according to Sakana, challenging the need for monolithic models and opening the door to more resilient and decentralized AI ecosystems. This approach aligns with the growing trend toward "composable AI," where different models are combined for specific tasks. For example, instead of a single model handling everything from code generation to creative writing, Fugu can delegate programming to a code-specialized model (like CodeLlama) and writing to another (like Claude 3.5 Sonnet), optimizing quality and cost.

The geopolitical context is crucial: U.S. export restrictions have affected not only Anthropic but also OpenAI and Google, which have limited access to their most advanced models in certain countries. Fugu offers a way to bypass these restrictions by not relying on a single underlying model. However, the system's opacity raises concerns: Sakana does not disclose which specific models it uses or how it coordinates them, making auditing and transparency difficult. Critics point out that dependence on a private orchestrator could shift risk from one vendor to another, and that the lack of visibility into underlying models could lead to undetected bias or security issues.

Market Implications

For businesses, Fugu offers a viable alternative amid regulatory uncertainty: if a model becomes inaccessible, the system can redirect tasks to other models in the pool. This is especially relevant for companies operating in multiple jurisdictions or needing to comply with data sovereignty regulations, such as GDPR in Europe or the EU AI Act. For example, a European company using Fugu could ensure data never leaves the EU if the orchestrator selects locally hosted models.

For governments, Fugu represents a tool to build AI infrastructure without relying on foreign vendors, a critical factor amid rising geopolitical tensions. Countries like Japan, Singapore, and the United Arab Emirates have already shown interest in multi-agent systems for government applications. However, cost could be a barrier: each Fugu query involves multiple API calls to models, increasing operational expenses compared to a monolithic model. Sakana has not published official pricing, but Fugu Ultra is expected to have a significantly higher per-token cost.

The impact on model providers is also significant. If Fugu proves reliable, it could reduce dependence on giants like OpenAI and Anthropic, fostering a more competitive ecosystem where specialized models (from Mistral, Cohere, or Stability AI) gain market share. This could accelerate the trend toward smaller, more efficient models, such as open-source language models (Llama 3, Mixtral), which are easier to orchestrate.

However, the lack of transparency about underlying models and the potential cost of using multiple APIs are points to consider. Additionally, dependence on a private orchestrator introduces a new single point of failure: if Sakana AI suffers an outage or changes its pricing policy, users would be equally exposed. Therefore, some analysts suggest that the true long-term solution would be an open orchestration standard, similar to what HTTP was for the web.

What Readers Should Know

Not a foundation model: Fugu is an orchestration system that coordinates multiple models, not a single model. This differentiates it from offerings like GPT-4o or Claude 3.5.
Comparable performance: According to Sakana, it matches Claude Fable 5 on agentic task benchmarks, though independent results are limited. Internal benchmarks show a 12% improvement on SWE-bench over GPT-4o, but external validation is lacking.
Use cases: Ideal for companies seeking resilience against regulatory changes or needing high quality without being tied to a single vendor. Also useful for startups wanting access to frontier models without direct subscriptions.
Limitations: The lack of transparency about underlying models, the potential cost of using multiple APIs, and additional latency from orchestration are points to consider. Moreover, dependence on a private orchestrator could replicate vendor lock-in issues.

"Model orchestration is the next frontier, beyond larger models. Collective intelligence is the practical hedge against the concentration of power." — David Ha, CEO of Sakana AI

In summary, Fugu is a bold bet that addresses real issues of dependency and regulation, but its success will depend on transparency, cost, and adoption by the business community. The coming months will be crucial to see whether multi-agent orchestration becomes the industry standard or remains a niche solution.

Sakana Launches Fugu: Multi-Agent System That Competes with Frontier Models

What Happened?

Why Is This Important?

Market Implications

What Readers Should Know

Keep reading