The pufferfish is poisonous, but when handled correctly it becomes one of the most prized delicacies in Japanese cuisine. That makes it a fitting name for an AI product that does something similar with language models: it brings many of them together, orchestrates them with precision, and produces something better than any single ingredient on its own.
Sakana Fugu is the new system from Sakana AI, a startup founded in Tokyo in 2023 by former researchers from Google DeepMind and Google Brain. What they have built is conceptually different from the AI products we have seen so far.
It’s not a model. It’s a conductor.
Fugu is not a single LLM. It is a multi-agent system that, for each task it receives, dynamically decides which models to activate, in what role, and how to make them collaborate. The architecture derives from two papers published at ICLR 2026:
- TRINITY: a lightweight evolved coordinator that assigns models the roles of Thinker, Worker, or Verifier, adapting work delegation across coding, math, reasoning and knowledge tasks
- Conductor: a system trained with reinforcement learning that learns natural-language coordination strategies
The key insight is that no human designed the collaboration strategies. The system learned them on its own.
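To make the Thinker/Worker/Verifier division concrete, here is a minimal hand-written sketch. It is not Sakana AI's implementation: the agent pool, the per-role scores and the toy "models" are all hypothetical, and the scores are fixed constants standing in for the policy that TRINITY learns. Each role goes to the highest-scoring agent for the task type, the Thinker writes a plan, the Worker answers, and the Verifier either accepts the answer or sends feedback for another round.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

ROLES = ("thinker", "worker", "verifier")

# (task_type, role) -> {agent_name: score}. A learned coordinator would supply
# these preferences; here they are hypothetical hand-written constants.
Scores = Dict[Tuple[str, str], Dict[str, float]]


@dataclass
class Agent:
    name: str
    run: Callable[[str], str]  # stand-in for a call to one LLM


def make_toy_llm(offset: int) -> Callable[[str], str]:
    """Toy 'model' that reacts to the last prompt line; offset != 0 makes its sums wrong."""
    def run(prompt: str) -> str:
        last = prompt.splitlines()[-1]
        if last.startswith("Plan:"):
            return "Add the two integers."
        if last.startswith("Solve:"):  # "Solve: add X Y"
            _, _, x, y = last.split()
            return str(int(x) + int(y) + offset)
        if last.startswith("Check:"):  # "Check: add X Y -> Z"
            _, _, x, y, _, z = last.split()
            return "OK" if int(x) + int(y) == int(z) else "wrong sum"
        return ""
    return run


def assign_roles(task_type: str, pool: Dict[str, Agent], scores: Scores) -> Dict[str, Agent]:
    """Give each role to the highest-scoring agent available in the pool."""
    assignment = {}
    for role in ROLES:
        table = scores[(task_type, role)]
        best = max((name for name in pool if name in table), key=lambda n: table[n])
        assignment[role] = pool[best]
    return assignment


def solve(task: str, task_type: str, pool: Dict[str, Agent], scores: Scores,
          max_rounds: int = 3) -> str:
    """Thinker plans once; Worker answers; Verifier accepts or feeds back a critique."""
    roles = assign_roles(task_type, pool, scores)
    plan = roles["thinker"].run(f"Plan: {task}")
    answer = ""
    for _ in range(max_rounds):
        answer = roles["worker"].run(f"{plan}\nSolve: {task}")
        verdict = roles["verifier"].run(f"Check: {task} -> {answer}")
        if verdict == "OK":
            break
        plan = f"{plan}\nFeedback: {verdict}"  # carry the critique into the next round
    return answer


# Hypothetical pool: the "weak" model gets sums wrong by one.
pool = {"strong": Agent("strong", make_toy_llm(0)), "weak": Agent("weak", make_toy_llm(1))}
scores = {
    ("math", "thinker"): {"strong": 0.6, "weak": 0.7},
    ("math", "worker"): {"strong": 0.9, "weak": 0.4},
    ("math", "verifier"): {"strong": 0.8, "weak": 0.5},
}
print(solve("add 2 3", "math", pool, scores))  # prints 5
```

The point the toy illustrates is that roles are assigned per task type rather than per model: the same agent can plan for one kind of task and verify for another. Replacing the constant score table with a learned policy is precisely the part this sketch leaves out.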
The numbers that raise eyebrows
Fugu is available in two versions — Fugu and Fugu Ultra — and the benchmarks published by Sakana AI are remarkable:
| Benchmark | Fugu Ultra | Opus 4.8 | Gemini 3.1 Pro | GPT 5.5 |
|---|---|---|---|---|
| SWE Bench Pro | 73.7 | 69.2 | 54.2 | 58.6 |
| TerminalBench 2.1 | 82.1 | 74.6 | 70.3 | 78.2 |
| LiveCodeBench | 93.2 | 87.8 | 88.5 | 85.3 |
| Humanity’s Last Exam | 50.0 | 49.8 | 44.4 | 41.4 |
| GPQA-D | 95.5 | 92.0 | 94.3 | 93.6 |
And the most striking claim: Sakana AI explicitly states that Fugu stands shoulder to shoulder with Fable 5 and Mythos Preview, Anthropic’s most powerful models, even though neither is in Fugu’s agent pool because they are not publicly accessible.
The concrete use cases that impress
Benchmarks are one thing, but the published qualitative tests are even more striking:
Rubik’s Cube from scratch: Fugu Ultra wrote a solver in pure Python (no libraries) that solved 300 randomly scrambled cubes with an average of 19.72 moves. Competing models either failed completely (0/300 solved) or were marginally worse.
Reading 17th-century Japanese manuscripts: given a manuscript in scattered kana writing (chirashigaki), Fugu Ultra reconstructed the reading order with a NED score of 0.80 versus 0.24 for the best competitor, on a task that challenges even experts in Japanese palaeography.
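The article does not define the NED metric. A common way to evaluate reading order is a normalized edit distance between the predicted and reference sequences of text blocks; since higher is better in the figures quoted, this sketch assumes the reported score is 1 minus the Levenshtein distance divided by the length of the longer sequence. The block labels are hypothetical.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def ned_score(pred: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """Assumed score: 1 - edit distance / longer length (1.0 means identical order)."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


# Hypothetical reading order of five text blocks; the prediction swaps B and C.
reference = ["A", "B", "C", "D", "E"]
predicted = ["A", "C", "B", "D", "E"]
print(round(ned_score(predicted, reference), 2))  # prints 0.6
```

Under this assumption, a single swapped pair in a five-block page already costs 0.4, which gives a sense of how far apart 0.80 and 0.24 are.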
AutoResearch on an H100 GPU: over 14 hours and 123 experiments, Fugu Ultra autonomously improved the training recipe of a GPT model, achieving the best mean BPB (bits per byte, lower is better) among all competitors.
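BPB presumably means bits per byte, a standard loss metric for language-model training runs. Assuming the standard definition, it is the model's total negative log-likelihood on held-out text, converted from nats to bits and divided by the number of UTF-8 bytes. Lower is better, and normalizing by bytes rather than tokens keeps runs with different tokenizers comparable. The per-token losses below are made-up numbers for illustration.

```python
import math
from typing import Sequence


def bits_per_byte(token_nll_nats: Sequence[float], text: str) -> float:
    """Total negative log-likelihood (nats) of `text`, expressed as bits per UTF-8 byte."""
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    n_bytes = len(text.encode("utf-8"))
    return total_bits / n_bytes


# Hypothetical eval: "hello world" (11 bytes) split into 4 tokens with these losses.
losses = [2.1, 1.7, 0.9, 1.3]
print(round(bits_per_byte(losses, "hello world"), 3))  # prints 0.787
```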
50-week stock trading: starting from $10,000, Fugu Ultra achieved a mean return of +19.43% versus less than 15% for all other models.
The detail that changes everything for European companies
There is a note on the Sakana Fugu website that deserves careful reading:
“Frontier capability without the risk of export controls.”
Fugu is a Japanese system. It is not subject to US export controls that restrict access to the most advanced American models in certain regulatory contexts. For European or Asian companies operating in sensitive sectors, this is not a minor detail.
That said, there is another important caveat: Fugu is not currently available in the EU/EEA while Sakana AI works towards GDPR compliance. So for now, for our European users, it is a product to watch, not yet one to use.
What this means for the enterprise AI world
Fugu confirms a direction that is becoming increasingly clear: the future is not the biggest model, but the smartest orchestration.
Sakana AI’s approach, teaching the system to coordinate agents rather than designing workflows by hand, follows the same principle that led AIDeskPro to introduce automatic fallback between providers. It’s not about choosing the single best model. It’s about having the right system make the right choice at the right moment.
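The source does not describe how AIDeskPro's fallback works, so the following is only a generic sketch of the pattern. It assumes each provider is wrapped in a function that raises a hypothetical `ProviderError` on timeouts, rate limits or outages; providers are tried in a fixed priority order and the first successful reply wins.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider wrapper raises on timeouts, rate limits or outages."""


Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers in priority order; return (provider_name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next one
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical providers for illustration.
def flaky(prompt: str) -> str:
    raise ProviderError("rate limited")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_fallback("hi", [("primary", flaky), ("backup", healthy)]))
# prints ('backup', 'echo: hi')
```

A fallback chain like this is a static rule that only switches provider when something breaks, whereas a learned coordinator such as Fugu chooses models per task even when everything is working.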
We will be watching Sakana Fugu closely, and when it arrives in Europe there will be plenty to talk about.
Original source: Sakana Fugu — sakana.ai

