Sakana AI

@SakanaAILabs • 72,726 subscribers

Sakana AI is an AI R&D company based in Tokyo. Try Sakana Chat → https://t.co/1m2lSgnfB2

Shorts

Introducing ASAL: Automating the Search for Artificial Life with Foundation Models
Artificial Life (ALife) research holds key insights that can transform and accelerate progress in AI. By speeding up ALife discovery with AI, we accelerate our understanding of emergence, evolution, and intelligence: core principles that can inspire the next generation of AI systems! We proudly collaborated with MIT, OpenAI, Swiss AI Lab IDSIA, and Ken Stanley on this exciting project.
Full Paper (Website): Full Paper (arXiv): Code:
In this work, we propose a new algorithm called Automated Search for Artificial Life (“ASAL”) to automate the discovery of artificial life using vision-language foundation models. Instead of tediously hand-designing every tiny rule of an ALife simulation, simply describe the space of simulations to search over, and ASAL will automatically discover the most interesting and open-ended artificial lifeforms!
Because of the generality of foundation models, ASAL can discover new lifeforms across a diverse range of seminal ALife simulations, including Boids, Particle Life, Game of Life, Lenia, and Neural Cellular Automata. ASAL even discovered novel cellular automata rules that are more open-ended and expressive than the original Conway’s Game of Life.
We believe this new paradigm may reignite ALife research by overcoming the bottleneck of manually designed simulations, thus advancing beyond the limits of human ingenuity.

750,610 views

We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency.
Blog: Code:
Like AlphaEvolve and its variants, our framework leverages LLMs to find state-of-the-art solutions to complex problems, but using orders of magnitude fewer resources! Many evolutionary AI systems are powerful but act like brute-force engines, burning thousands of samples to find good solutions. This makes discovery slow and expensive. We took inspiration from the efficiency of nature. ‘Shinka’ (進化) is Japanese for evolution, and we designed our system to be just as resourceful.
On the classic circle packing optimization problem, ShinkaEvolve discovered a new state-of-the-art solution using only 150 samples. This is a big leap in efficiency compared to previous methods that required thousands of evaluations.
We applied ShinkaEvolve to a diverse set of hard problems with real-world applications:
1/ AIME Math Reasoning: It evolved sophisticated agentic scaffolds that significantly outperform strong baselines, discovering an entire Pareto frontier of solutions trading performance for efficiency.
2/ Competitive Programming: On ALE-Bench (a benchmark for NP-Hard optimization problems), ShinkaEvolve took the best existing agent's solutions and improved them, turning a 5th place solution on one task into a 2nd place leaderboard rank in a competitive programming competition.
3/ LLM Training: We even turned ShinkaEvolve inward to improve LLMs themselves. It tackled the open challenge of designing load balancing losses for Mixture-of-Experts (MoE) models. It discovered a novel loss function that leads to better expert specialization and consistently improves model performance and perplexity.
ShinkaEvolve achieves its remarkable sample-efficiency through three key innovations that work together: (1) an adaptive parent sampling strategy to balance exploration and exploitation, (2) novelty-based rejection filtering to avoid redundant work, and (3) a bandit-based LLM ensemble that dynamically picks the best model for the job.
By making ShinkaEvolve open-source and highly sample-efficient, our goal is to democratize access to advanced, open-ended discovery tools. Our vision for ShinkaEvolve is to be an easy-to-use companion tool to help scientists and engineers with their daily work. We believe that building more efficient, nature-inspired systems is key to unlocking the future of AI-driven scientific research. We are excited to see what the community builds with it!
Learn more in our technical report:

359,537 views
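
Item (3) in the ShinkaEvolve post above, a bandit that picks which LLM to query, can be illustrated with UCB1, a standard multi-armed bandit rule. This is only a sketch: the post does not say which bandit algorithm ShinkaEvolve uses, so UCB1 is a stand-in. The model names, the reward definition (a 0/1 flag for whether a proposal improved the score), and the `UCB1Selector` class are all hypothetical.

```python
import math


class UCB1Selector:
    """UCB1 bandit over a fixed set of model names (hypothetical helper)."""

    def __init__(self, models, exploration=2.0):
        self.models = list(models)
        self.exploration = exploration
        self.pulls = {m: 0 for m in self.models}
        self.total_reward = {m: 0.0 for m in self.models}

    def select(self):
        # Try every model once before trusting any estimate.
        for m in self.models:
            if self.pulls[m] == 0:
                return m
        n = sum(self.pulls.values())

        def ucb(m):
            # Mean reward plus an exploration bonus that shrinks with use.
            mean = self.total_reward[m] / self.pulls[m]
            bonus = math.sqrt(self.exploration * math.log(n) / self.pulls[m])
            return mean + bonus

        return max(self.models, key=ucb)

    def update(self, model, reward):
        self.pulls[model] += 1
        self.total_reward[model] += reward


def run_demo(steps=200):
    # Deterministic stand-in for "did this model's proposal improve the score?":
    # model-a succeeds on 3 of every 4 calls, model-b on 1 of every 4.
    selector = UCB1Selector(["model-a", "model-b"])
    calls = {"model-a": 0, "model-b": 0}
    for _ in range(steps):
        m = selector.select()
        calls[m] += 1
        if m == "model-a":
            reward = 1.0 if calls[m] % 4 != 0 else 0.0
        else:
            reward = 1.0 if calls[m] % 4 == 0 else 0.0
        selector.update(m, reward)
    return selector.pulls
```

Calling `run_demo()` returns the number of times each model was chosen; the more successful model ends up with most of the calls while the weaker one is still sampled occasionally.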

We’re excited to introduce Text-to-LoRA: a hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025!
Paper: Code:
Biological systems are capable of rapid adaptation, given limited sensory cues. For example, our human visual system can quickly adapt and tune its light sensitivity to our surroundings. While modern LLMs exhibit a wide variety of capabilities and knowledge, they remain rigid when adding task-specific capabilities. Traditionally, customizing these models requires gathering large datasets and performing often expensive, time-consuming fine-tuning for specific applications.
To bypass these limitations, Text-to-LoRA (T2L) meta-learns a “hypernetwork” that takes in a text description of a desired task, as a prompt, and generates a task-specific LoRA that performs well on the task.
In our experiments, we show that T2L can encode hundreds of existing LoRA adapters. While the compression is lossy, T2L maintains the performance of task-specifically tuned LoRA adapters. We also show that T2L can even generalize to unseen tasks given a natural language description of the tasks.
Importantly, Text-to-LoRA is parameter-efficient. It generates LoRAs in a single, inexpensive step, based solely on a simple text description of the task. This approach is a step towards dramatically lowering the technical and computational barriers, allowing non-technical users to specialize foundation models using plain language, rather than needing deep technical expertise or large compute resources.

403,145 views

Videos

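Sakana AI is releasing Evo-Ukiyoe, an ukiyo-e-style image generation model, and Evo-Nishikie, an ukiyo-e colorization model: AI that has learned Japanese aesthetics.
Blog → Evo-Ukiyoe demo → Evo-Nishikie demo →
Ukiyo-e is popular worldwide as a representative Japanese art form, so "ukiyo-e" is a common keyword in image generation prompts. The generated images, however, tend to look like Japanese-style illustrations rather than actual ukiyo-e. To generate images closer to real ukiyo-e, Sakana AI, with the cooperation of the Art Research Center (ARC) at Ritsumeikan University, developed image generation models trained on digital images of ukiyo-e works in the ARC collection.
The models released today are Evo-Ukiyoe, which generates images from prompts, and Evo-Nishikie, which colorizes illustrations from classical Japanese books. We hope these models will be used to create new content for learning about history and culture, increase interest in ukiyo-e, and give people in Japan and around the world a reason to take an interest in ukiyo-e and Japanese culture.
All pages of the classical book 『絵本玉かつら』, colorized with Evo-Nishikie, are published on the Center for Open Data in the Humanities (CODH) page. The models are available on the following pages.
Evo-Ukiyoe model → Evo-Nishikie model →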

16,695,407 views • 2 years ago

Weeks of strategy research, in hours. Sakana Marlin works as your Virtual CSO. Hand it your first topic: 🐟

48,721 views • 5 days ago

Use Case 3: One-Shot Blindfold Chess
Can an AI hold an entire game state in memory without drifting? To test Fugu Ultra’s persona stability and sustained memory, we had it play 4 back-to-back games of blindfold chess. Every model played the same way: no board was shown, so each had to hold the full game state entirely in memory. We matched Fugu Ultra against 3 leading frontier models and a 2100-Elo Stockfish engine.
The Results: Fugu Ultra outplayed all 4 opponents. Where the other models eventually drifted or lost track of the board state, Fugu remained accurate, ending every single game in checkmate. Watch the full sequence below to see Fugu capitalize on the moment the other models slip.

125,318 views • 26 days ago

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢
Blog Paper
Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say: we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow, and they fall back to "think, then speak."
In our new paper, we propose a way to break this trade-off. We call it KAME ("turtle" in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking."
The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions.
Try the model yourself here:

292,994 views • 2 months ago

What happens when you put competing neural networks in a Petri Dish and start changing the rules while they adapt?
Last year we released Petri Dish NCA, where neural nets are the organisms that learn during simulation. Today we're releasing Digital Ecosystems: a browser-based platform for interactive artificial life research.
The setup: several small CNNs share a 2D grid, each seeing only a 3x3 neighborhood. No global plan. They compete for territory by attacking neighbours and defending against incoming attacks, learning online via gradient descent while the simulation runs.
What we didn't expect was the role of the learning itself. Gradient descent isn't just optimising each species' strategy; it also stabilizes the whole system during simulation. Species that overextend get pushed back by the loss. Species that stagnate get nudged to grow. This means you can push parameters toward edge-of-chaos regimes: a zone characterised by emergent complexity. Letting the neural networks learn holds the complex system together while you explore and interact.
The platform lets you steer all of this interactively. You can draw walls to create niches, erase parts of the system online, and tune 40+ system parameters to explore the most interesting configurations. We find it mesmerizing to watch species carve out territories and reorganise when you perturb them. Everything runs client-side in your browser, no install needed.
Blog: Code:

257,494 views • 3 months ago

Introducing The AI CUDA Engineer: An agentic AI system that automates the production of highly optimized CUDA kernels.
Its kernels reach 10-100x speedups over common machine learning operations in PyTorch, and can be much faster than existing CUDA kernels commonly used in production. We believe that fundamentally, AI systems can and should be as resource-efficient as the human brain, and that the best path to achieve this efficiency is to use AI to make AI more efficient!
We are excited to publish our paper, The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition. We also release a dataset of over 17,000 verified CUDA kernels produced by The AI CUDA Engineer.
Paper: Kernel Archive Webpage: HuggingFace Dataset:
The AI CUDA Engineer utilizes evolutionary LLM-driven code optimization to autonomously improve the runtime of machine learning operations. Our system is not only able to convert PyTorch code into CUDA kernels, but through the use of evolution, it can also optimize the runtime performance of CUDA kernels, fuse multiple operations, and even discover novel solutions for writing efficient CUDA operations by learning from past innovations!
We believe The AI CUDA Engineer opens a new era of AI-driven acceleration of AI and automated inference time optimization. We (Robert Lange, Aaditya Prasad 🇺🇸, sssss, Maxence Faldor, Yujin Tang, hardmaru) are excited to continue Sakana AI's mission of leveraging AI to improve AI.

1,159,053 views • 1 year ago

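We have released 「からまる」 (Karamaru), a chatbot that converses in Edo-period classical-style Japanese.
Blog: Demo:
Trained by Sakana AI on Edo-period texts, Karamaru takes questions in modern Japanese and answers in the classical-style language of the time, from the worldview of the Edo period. Karamaru draws on an Edo text dataset that we built ourselves, so its answers consistently reflect the Edo-period worldview.
The dataset is built from more than several thousand Edo-period books and other documents. In addition to human-created data, we applied AI kuzushiji OCR to more than 1,000 books that had never been transcribed, producing new data. Please try the demo to see what Karamaru has memorized and can answer about the world of Edo from this large body of data.
As an example of a domain-specialized large language model, this model also shows that continued training at the scale of tens of millions of characters is enough to yield useful results. This suggests the approach could extend to similar needs in other fields.
Responding naturally from an Edo-period worldview while holding modern knowledge is very difficult for humans. Karamaru achieves this and lets users experience past culture up close across time, so we expect broad use in research and education.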

663,790 views • 1 year ago

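Sakana Marlin is an autonomous research assistant for business, built around long-horizon autonomous reasoning that runs for hours. Given just a research topic, it works autonomously for up to about 8 hours and produces a structured summary slide deck and a research report dozens of pages long.
We designed it so that AI can take on the kind of strategic research that a CSO (Chief Strategy Officer) would carry out with a team of several people over several weeks. The user only sets the initial topic. Sakana Marlin then autonomously repeats hypothesis formation, information gathering, and verification, digging into the key issues within a vast amount of information.
It is built on technologies Sakana AI has researched, such as long-horizon reasoning and AB-MCTS, which improves reasoning ability by having multiple models cooperate. At the same time, Sakana Marlin is not only the result of desk research: it also grew out of business development work implementing AI agents across industries in Japan. What we have cultivated in both research and practice has come together in one product.
It is self-serve and available the same day. Plans range from Pay per use with no monthly fee to Pro, Team, and Enterprise. Starting with Sakana Marlin, Sakana AI will continue to release a variety of products.
Details here: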

47,149 views • 1 month ago

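We have released TinySwallow-1.5B, a small Japanese language model trained with our new method, TAID.
We developed TAID (Temporally Adaptive Interpolated Distillation), a new knowledge distillation method that efficiently transfers the knowledge of large language models (LLMs) to small models. The method transfers the large model's knowledge in step with the small model's learning progress, which makes the transfer effective. This work was accepted at ICLR 2025, an international conference on machine learning.
Paper: GitHub:
Using TAID, we transferred knowledge from a 32B-parameter LLM to a 1.5B-parameter small language model about 1/20 the size, and succeeded in creating TinySwallow-1.5B, a Japanese model with the best performance among models of the same scale.
Because it is small, TinySwallow-1.5B can chat entirely on your own smartphone or PC, without going through external APIs. You can try a chat app that runs in the browser from the web app link below.
Demo: GitHub: Model: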

560,194 views • 1 year ago
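
A rough sketch of the idea in the TAID post above: the student is trained toward a target that starts at its own prediction and moves toward the teacher's. Everything specific here is an assumption rather than something stated in the post: blending at the logit level, the KL direction, and the function names. The post says the transfer follows the student's learning progress; this sketch leaves the choice of `t` to the caller and does not implement that adaptive schedule.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolated_target(student_logits, teacher_logits, t):
    """Blend logits: t=0 gives the student's own distribution, t=1 the teacher's."""
    mixed = [(1.0 - t) * s + t * q for s, q in zip(student_logits, teacher_logits)]
    return softmax(mixed)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(student_logits, teacher_logits, t):
    # In real training the target would be treated as a constant
    # (no gradient through the student logits used to build it).
    target = interpolated_target(student_logits, teacher_logits, t)
    return kl_divergence(target, softmax(student_logits))
```

With `t = 0` the target equals the student's own output and the loss is zero; as `t` grows toward 1 the target approaches the teacher and the loss increases, so the student is never asked to match a distribution far from where it currently is.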

“When AI Discovers the Next Transformer”
Robert Lange (Sakana AI) joins Tim Scarfe (Machine Learning Street Talk) to discuss ShinkaEvolve, a framework that combines LLMs with evolutionary algorithms to do open-ended program search.
Full Video:

125,064 views • 4 months ago

Can LLMs invent better ways to train LLMs?
At Sakana AI, we’re pioneering AI-driven methods to automate AI research and discovery. We’re excited to release DiscoPOP: a new SOTA preference optimization algorithm that was discovered and written by an LLM!
Our method leverages LLMs to propose and implement new preference optimization algorithms. We then train models with those algorithms and evaluate their performance, providing feedback to the LLM. By repeating this process for multiple generations in an evolutionary loop, the LLM discovers many high-performing and novel preference optimization objectives!
Paper: GitHub: Model:
We proudly collaborated with the University of Oxford (Foerster Lab for AI Research (now part of BOLD)) and Cambridge University (Mihaela van der Schaar) on this groundbreaking project.
Looking ahead, we envision a future where AI-driven research reduces the need for extensive human intervention and computational resources. This will accelerate scientific discoveries and innovation, pushing the boundaries of what AI can achieve.

555,922 views • 2 years ago

Introducing Digital Red Queen (DRQ): Adversarial Program Evolution in Core War with LLMs
Blog:
Core War is a programming game where self-replicating assembly programs, called warriors, compete for control of a virtual machine. In this dynamic environment, where there is no distinction between code and data, warriors must crash opponents while defending themselves to survive.
In this work, we explore how LLMs can drive open-ended adversarial evolution of these programs within Core War. Our approach is inspired by the Red Queen Hypothesis from evolutionary biology: the principle that species must continually adapt and evolve simply to survive against ever-changing competitors.
We found that running our DRQ algorithm for longer durations produces warriors that become more generally robust. Most notably, we observed an emergent pressure towards convergent evolution. Independent runs, starting from completely different initial conditions, evolved toward similar general-purpose behaviors, mirroring how distinct species in nature often evolve similar traits to solve the same problems.
Simulating these adversarial dynamics in an isolated sandbox offers a glimpse into the future, where deployed LLM systems might eventually compete against one another for computational or physical resources in the real world.
This project is a collaboration between MIT and Sakana AI, led by Akarsh Kumar.
Full Paper (Website): Full Paper (arXiv): Code:

143,831 views • 6 months ago

Introducing Continuous Thought Machines New Blog: Modern AI is powerful, but it’s still distinct from human-like flexible intelligence. We believe neural timing is key. Our Continuous Thought Machine is built from the ground up to use neural dynamics as a powerful representation for intelligence. Thought takes time, and reasoning is a process. Biological brains inspire us with their complex neural activity, where neural timing is critical to intelligence. We’re exploring how to bring that power to AI. The Continuous Thought Machine (CTM) incorporates neuron-level temporal processing and neural synchronization, moving beyond current AI limitations. Our approach has two core innovations: (1) neuron-level temporal processing, where each neuron uses unique parameters to process a history of incoming signals for fine-grained temporal dynamics, and (2) neural synchronization, used as a direct latent representation to modulate data and produce outputs, encoding information directly in the timing of neural activity. Learn more about our approach: Interactive Report: Full Paper: GitHub :


289,680 views • 1 year ago
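To make the two ideas in the Continuous Thought Machines post more concrete, here is a minimal sketch in plain Python. It is not the CTM implementation: the single tanh unit per neuron, the fixed history window, and the names `neuron_output`, `run_ticks`, and `synchronization` are all assumptions made for illustration. It only shows (1) each neuron applying its own private weights to a short history of incoming values, and (2) a synchronization matrix built from inner products of the neurons' output histories.

```python
import math


def neuron_output(window, private_weights):
    """One neuron's output from its own recent input history.

    Assumption: a single weighted sum followed by tanh stands in for
    whatever per-neuron model the real system uses.
    """
    return math.tanh(sum(w * x for w, x in zip(private_weights, window)))


def run_ticks(pre_activations, weights, memory):
    """Run the toy neurons over several internal ticks.

    pre_activations: one list per tick, holding one incoming value per neuron.
    weights: one private weight list (length `memory`) per neuron.
    Returns the output history of every neuron, one list per neuron.
    """
    n_neurons = len(weights)
    windows = [[0.0] * memory for _ in range(n_neurons)]
    outputs = [[] for _ in range(n_neurons)]
    for tick_values in pre_activations:
        for d, value in enumerate(tick_values):
            # Slide the window: drop the oldest value, append the newest.
            windows[d] = windows[d][1:] + [value]
            outputs[d].append(neuron_output(windows[d], weights[d]))
    return outputs


def synchronization(outputs):
    """Pairwise inner products of output histories (a symmetric matrix)."""
    n = len(outputs)
    return [
        [sum(a * b for a, b in zip(outputs[i], outputs[j])) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    history = run_ticks(
        pre_activations=[[1.0, -1.0], [0.5, 0.5], [0.0, 1.0]],
        weights=[[0.2, 0.3, 0.5], [0.5, 0.3, 0.2]],
        memory=3,
    )
    for row in synchronization(history):
        print(row)
```

In this sketch the synchronization matrix, rather than a single activation vector, is what a downstream layer would read, which is one way to picture "encoding information in the timing of neural activity".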

Introducing ALE-Bench, ALE-Agent! Towards Automating Long-Horizon Algorithm Engineering for Hard Optimization Problems Blog: Paper: ALE-Bench is a coding benchmark primarily focused on hard optimization (NP-hard) problems. We developed this benchmark with AtCoder Inc., a leading coding contest platform company. What makes ALE-Bench unique is its focus on hard optimization problems that demand long-horizon and creative reasoning. It’s open-ended, in the sense that true optima are out of reach (NP-hard) and scores can continuously improve. We believe this benchmark has the potential to become one of the key benchmarks for reasoning and coding in the next generation. ALE-Agent is our end-to-end agent that we specifically designed for this challenging domain. In fact, our ALE-Agent has already built an impressive track record in the wild! In May 2025, our agent participated in a live AtCoder Heuristic Competition (AHC), alongside 1,000 other participants in real-time. AHC is considered to be one of the most challenging coding competitions in this domain. Our ALE-Agent achieved an impressive ranking of 21st out of 1,000 human participants in the competition (top 2%), marking a turning point for AI discovery of solutions to hard optimization problems with a wide spectrum of important real world applications such as logistics, routing, packing, factory production planning, power-grid balancing. We look forward to applying this technology to real industrial optimization opportunities. Building on the insights from this study, Sakana AI will continue to tackle the challenge of developing AI with even greater algorithm engineering capabilities. ALE-Bench Dataset: ALE-Bench Code: This research was conducted in collaboration with AtCoder Inc. (AtCoder). We are deeply grateful for their outstanding expertise and contributions in optimization and algorithms, which were invaluable in providing data, analyzing results, and enabling our AI agent’s participation in their contests.


237,195 views • 1 year ago

GPT-5 on Sudoku-Bench 🧩 Since releasing Sudoku-Bench in May 2025, when no LLM could solve a classic 9x9 puzzle, we've been evaluating the latest generation of models. GPT-5 now leads our leaderboard with 33% of puzzles solved, approximately 2x the previous leader, and is the first LLM we've tested to solve a 9x9 Sudoku variant. However, with 67% of the much harder puzzles remaining unsolved, Sudoku-Bench continues to present significant challenges for AI reasoning. Modern Sudoku variants require models to first understand novel rulesets through meta-reasoning, then maintain global consistency across long reasoning chains. Our experiments with GRPO fine-tuning on Qwen2.5-7B and "Thought Cloning" (training on expert human reasoning from Cracking the Cryptic) show that current approaches still struggle with the spatial reasoning and creative "break-in" points that human solvers use naturally. We believe new approaches are required to solve our benchmark. These results highlight persistent gaps between computational problem-solving and human-like reasoning, particularly in tasks requiring integrated mathematical logic, spatial awareness, and creative insight. Read more about our update here: 🔗 Blogpost →


154,645 views • 8 months ago

Introducing SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Pre-Training Corpora What lies within a trillion-scale pre-training corpus? Can you truly guarantee your benchmarks are uncontaminated simply because there are no exact string matches? Alongside several research institutions in Japan, Sakana AI is proud to have collaborated in the development of SoftMatcha 2, an ultra-fast and flexible search tool that enables search over trillion-scale natural language corpora in under 0.3 seconds, even while handling semantic variations (substitution, insertion, and deletion). No existing tool meets all these criteria, including infini-gram-mini (EMNLP’25 Best Paper) or the original SoftMatcha (ICLR’25). Our approach employs string matching based on suffix arrays that scales well with corpus size. To mitigate the combinatorial explosion induced by the semantic relaxation of queries, our method is built on two key algorithmic ideas: fast exact lookup enabled by a disk-aware design, and dynamic corpus-aware pruning. As a practical application, we demonstrate that SoftMatcha 2 identifies potential benchmark contamination in pre-training corpora that existing exact-match approaches miss. You can try searching through a 100B-scale corpus via our online demo. The system remains blazingly fast even on trillion-token corpora, so we encourage you to host it yourself for larger scales. Demo: Paper: Code: This work is a collaboration with researchers from the University of Tokyo, NII, Kyoto University, SOKENDAI, NINJAL, Tohoku University, and RIKEN.


101,614 views • 5 months ago
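As a rough illustration of the suffix-array matching mentioned in the SoftMatcha 2 post, the toy sketch below indexes a token list and looks up phrases with binary search. It is not SoftMatcha 2's code: the naive index construction, the in-memory layout, and the hand-written `similar` table (a stand-in for real semantic similarity) are assumptions for illustration, and only substitutions are handled, not insertions or deletions. The `soft_lookup` helper also shows where the combinatorial growth comes from: every relaxed token multiplies the number of exact queries.

```python
from bisect import bisect_left, bisect_right
from itertools import product


def build_suffix_array(tokens):
    """Sort suffix start positions by the suffix they begin.

    Naive construction, fine for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def exact_lookup(tokens, suffix_array, pattern):
    """Return sorted start positions where `pattern` occurs exactly."""
    m = len(pattern)
    # Materialising the truncated suffixes keeps the sketch short; a real
    # index would compare against the corpus in place during the search.
    prefixes = [tokens[i:i + m] for i in suffix_array]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(suffix_array[lo:hi])


def soft_lookup(tokens, suffix_array, pattern, similar):
    """Try every substitution variant of `pattern` as an exact query.

    `similar` is a hypothetical token -> alternatives table.
    """
    options = [[tok] + similar.get(tok, []) for tok in pattern]
    hits = {}
    for variant in product(*options):
        positions = exact_lookup(tokens, suffix_array, list(variant))
        if positions:
            hits[variant] = positions
    return hits


if __name__ == "__main__":
    corpus = "the cat sat on the mat while the kitten sat on the rug".split()
    index = build_suffix_array(corpus)
    print(exact_lookup(corpus, index, ["sat", "on", "the"]))
    print(soft_lookup(corpus, index, ["the", "cat", "sat"], {"cat": ["kitten"]}))
```

Enumerating every variant is exactly what stops scaling on a large corpus, which is why the post describes pruning as one of the key ideas; this sketch does no pruning at all.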

We are excited to share that “Continuous Thought Machines” has been accepted as a Spotlight at #NeurIPS2025! 🧠✨ The CTM is an AI that mimics biological brains by using neural dynamics & synchronization to think over time. It can solve complex mazes by building internal maps, gaze around images to classify them, and learn algorithms, all emergent from its core design. This is just the beginning. A hint of what we're exploring next… (video attached!) The team: Luke Darlow, Ciaran, Sebastian Risi, Jeffrey Seely, Llion Jones


168,620 views • 9 months ago

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: Paper: Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and constitutes a key step in making student AI systems ultimately acquire reasoning capabilities via distillation and cold-starting. Enter our RLTs—a new class of models prompted with not only a problem’s question but also its solution, and directly trained to generate clear, step-by-step “explanations” to teach their students. Remarkably, an RLT with only 7B parameters produces superior results when distilling and cold-starting students in competitive and graduate-level reasoning tasks than orders-of-magnitude larger LLMs. RLTs are as effective even when distilling 32B students, much larger than the teacher itself—unlocking a new standard for efficiency in developing reasoning language models with RL. Code:


179,276 views • 1 year ago

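GENIAC, the generative AI development support project led by Japan's Ministry of Economy, Trade and Industry (METI) and NEDO, has concluded, and Sakana AI received the "GENIAC New Model Award" from among all 10 participating organizations. Watch the recording from yesterday's results briefing to learn about our work.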


271,510 views • 1 year ago

We are honored to be featured in the latest @TwoMinutePapers video! You all can watch the full video here: Here’s a short clip from it:


38,953 views • 2 months ago