23 June 2026 · Sakana AI

Sakana Fugu: the Japanese model that orchestrates other models (and rivals Fable 5)

From Japanese startup Sakana AI comes Fugu: not a single model, but a multi-agent system that dynamically coordinates the world's best LLMs. Benchmarks put it shoulder to shoulder with Fable 5 and Mythos Preview. And it is not subject to US export controls.

The pufferfish is poisonous, but when handled correctly it becomes one of the most prized delicacies in Japanese cuisine. A perfect name for an AI product that does exactly the same thing with language models: it takes them all, orchestrates them with precision, and produces something better than any single ingredient on its own.

Sakana Fugu is the new system from Sakana AI, a startup founded in Tokyo in 2023 by former researchers from Google DeepMind and Google Brain. And what they have built is conceptually different from anything we have seen so far.

It’s not a model. It’s a conductor.

Fugu is not a single LLM. It is a multi-agent system that, for each task received, dynamically decides which models to activate, in what role, and how to make them collaborate. The architecture derives from two papers published at ICLR 2026:

TRINITY: a lightweight evolved coordinator that assigns models the roles of Thinker, Worker, or Verifier, adapting work delegation across coding, math, reasoning and knowledge tasks
Conductor: a system trained with reinforcement learning that learns natural-language coordination strategies

The key insight is that no human designed the collaboration strategies. The system learned them on its own.
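To make the three roles concrete, here is a minimal Python sketch of a single Thinker, Worker, Verifier pass. Everything in it is an assumption for illustration, not Sakana AI's code: the model names, the assign_roles lookup table, and the stub models are hypothetical. In particular, the hard-coded table only stands in for the coordinator, which in Fugu is learned rather than written by hand.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Role(Enum):
    THINKER = "thinker"    # plans the approach
    WORKER = "worker"      # produces the answer
    VERIFIER = "verifier"  # checks the answer


# Hypothetical stand-in for an LLM client: any function from prompt to text.
Model = Callable[[str], str]


@dataclass
class Step:
    role: Role
    model_name: str
    output: str


def assign_roles(task_kind: str) -> Dict[Role, str]:
    """Placeholder policy. In Fugu this decision is learned, not hand-written."""
    table = {
        "coding": {Role.THINKER: "model_a", Role.WORKER: "model_b", Role.VERIFIER: "model_c"},
        "math": {Role.THINKER: "model_c", Role.WORKER: "model_a", Role.VERIFIER: "model_b"},
    }
    return table.get(task_kind, table["coding"])


def run_task(task: str, task_kind: str, pool: Dict[str, Model]) -> List[Step]:
    """Run one Thinker -> Worker -> Verifier pass and record each step."""
    roles = assign_roles(task_kind)
    steps: List[Step] = []

    thinker = roles[Role.THINKER]
    plan = pool[thinker](f"Plan how to solve: {task}")
    steps.append(Step(Role.THINKER, thinker, plan))

    worker = roles[Role.WORKER]
    draft = pool[worker](f"Task: {task}\nPlan: {plan}\nWrite the solution.")
    steps.append(Step(Role.WORKER, worker, draft))

    verifier = roles[Role.VERIFIER]
    verdict = pool[verifier](f"Task: {task}\nSolution: {draft}\nIs it correct?")
    steps.append(Step(Role.VERIFIER, verifier, verdict))
    return steps


def make_stub(name: str) -> Model:
    """Stub model that echoes the first prompt line, so the sketch runs offline."""
    return lambda prompt: f"[{name}] {prompt.splitlines()[0]}"


pool = {name: make_stub(name) for name in ("model_a", "model_b", "model_c")}
trace = run_task("reverse a linked list", "coding", pool)
for step in trace:
    print(step.role.value, step.model_name, "->", step.output)
```

Swapping the lookup table for a trained policy is the part the two papers are about; the surrounding plumbing stays this simple.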

The numbers that raise eyebrows

Fugu is available in two versions — Fugu and Fugu Ultra — and the benchmarks published by Sakana AI are remarkable:

Benchmark	Fugu Ultra	Opus 4.8	Gemini 3.1 Pro	GPT 5.5
SWE Bench Pro	73.7	69.2	54.2	58.6
TerminalBench 2.1	82.1	74.6	70.3	78.2
LiveCodeBench	93.2	87.8	88.5	85.3
Humanity’s Last Exam	50.0	49.8	44.4	41.4
GPQA-D	95.5	92.0	94.3	93.6
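
To see how large the lead actually is, the short Python snippet below takes the figures from the table and computes, for each benchmark, Fugu Ultra's margin over the strongest rival listed. The function name margin_over_best_rival is ours, chosen for illustration; the numbers are the ones published above.

```python
# Scores copied from the table above (as published by Sakana AI).
scores = {
    "SWE Bench Pro":        {"Fugu Ultra": 73.7, "Opus 4.8": 69.2, "Gemini 3.1 Pro": 54.2, "GPT 5.5": 58.6},
    "TerminalBench 2.1":    {"Fugu Ultra": 82.1, "Opus 4.8": 74.6, "Gemini 3.1 Pro": 70.3, "GPT 5.5": 78.2},
    "LiveCodeBench":        {"Fugu Ultra": 93.2, "Opus 4.8": 87.8, "Gemini 3.1 Pro": 88.5, "GPT 5.5": 85.3},
    "Humanity's Last Exam": {"Fugu Ultra": 50.0, "Opus 4.8": 49.8, "Gemini 3.1 Pro": 44.4, "GPT 5.5": 41.4},
    "GPQA-D":               {"Fugu Ultra": 95.5, "Opus 4.8": 92.0, "Gemini 3.1 Pro": 94.3, "GPT 5.5": 93.6},
}


def margin_over_best_rival(row, system="Fugu Ultra"):
    """Return (best rival name, system score minus best rival score)."""
    rivals = {name: value for name, value in row.items() if name != system}
    best = max(rivals, key=rivals.get)
    return best, round(row[system] - rivals[best], 1)


margins = {bench: margin_over_best_rival(row) for bench, row in scores.items()}
for bench, (rival, gap) in margins.items():
    print(f"{bench}: +{gap} over {rival}")
```

The margins range from 0.2 points on Humanity's Last Exam to 4.7 points on LiveCodeBench, and the strongest rival changes from one benchmark to the next.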

And the most striking claim: Sakana AI explicitly states that Fugu stands shoulder to shoulder with Fable 5 and Mythos Preview — Anthropic’s most powerful models — even though neither is in Fugu’s agent pool (they are not publicly accessible).

The concrete use cases that impress

Benchmarks are one thing. But the qualitative tests published are even more striking:

Rubik’s Cube from scratch: Fugu Ultra wrote a solver in pure Python (no libraries) that solved 300 randomly scrambled cubes with an average of 19.72 moves. Competing models either crashed (0/300 solutions) or were marginally worse.

Reading 17th-century Japanese manuscripts: given a manuscript in scattered kana writing (chirashigaki), Fugu Ultra reconstructed the reading order with a NED score of 0.80 versus 0.24 for the best competitor — a task that challenges even experts in Japanese palaeography.
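
The source does not define NED. Assuming it refers to a normalized edit distance reported as a similarity (1 minus the edit distance divided by the longer sequence length, so higher is better), a score of this kind could be computed as in the Python sketch below. The function names are hypothetical and the exact normalization used by Sakana AI may differ.

```python
def levenshtein(a, b):
    """Classic edit distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def ned_similarity(predicted, reference):
    """ASSUMED definition: 1 - distance / longer length, in [0, 1]."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, reference) / longest


# Works on any sequences, e.g. characters in a predicted reading order.
print(ned_similarity("abxde", "abcde"))
```

Under this assumed definition, a perfect reconstruction scores 1.0 and every misplaced or wrong character pulls the score down.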

AutoResearch on an H100 GPU: over 14 hours and 123 experiments, Fugu Ultra autonomously improved the training recipe of a GPT model, achieving the best mean BPB (bits per byte, where lower is better) among all competitors.

50-week stock trading: starting from $10,000, Fugu Ultra achieved a mean return of +19.43% (a mean ending balance of about $11,943), versus less than 15% (under $11,500) for all other models.

The detail that changes everything for European companies

There is a note on the Sakana Fugu website that deserves careful reading:

“Frontier capability without the risk of export controls.”

Fugu is a Japanese system. It is not subject to US export controls that restrict access to the most advanced American models in certain regulatory contexts. For European or Asian companies operating in sensitive sectors, this is not a minor detail.

That said, there is another important note: Fugu is not currently available in the EU/EEA, while Sakana AI works towards GDPR compliance. So for now, for our European users, it is a product to watch — not yet one to use.

What this means for the enterprise AI world

Fugu confirms a direction that is becoming increasingly clear: the future is not the biggest model, but the smartest orchestration.

Sakana AI’s approach, teaching the system to coordinate agents rather than manually designing workflows, follows the same principle that led AIDeskPro to introduce automatic fallback between providers. It’s not about choosing the single best model. It’s about having the right system make the right choice at the right moment.
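
For readers unfamiliar with the pattern, automatic fallback between providers can be sketched in a few lines of Python. This is a generic illustration under our own assumptions, not AIDeskPro's production code: the provider names, the stub functions, and the AllProvidersFailed exception are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, call) pair; call takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has raised an error."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider name, answer) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves us to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stubs so the sketch runs without network access.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")


def stable_backup(prompt: str) -> str:
    return f"answer to: {prompt}"


used, answer = complete_with_fallback(
    "hello", [("primary", flaky_primary), ("backup", stable_backup)]
)
print(used, answer)
```

A fixed fallback order is the simplest form of the idea; a learned coordinator like the one Sakana AI describes goes further by choosing which model to call, and in which role, for each task.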

We will be watching Sakana Fugu closely. When it arrives in Europe, it will make for an interesting conversation.

Original source: Sakana Fugu — sakana.ai