Tiny Recursive Models: A tiny 7M parameter model that recursively refines its answer beats LLMs 100x larger on hard puzzles like ARC-AGI. We independently reproduced the paper, corroborated results, and released the weights + API access for those looking to benchmark it 🔍

alphaXiv

32,147 subscribers

52,395 views • 9 months ago • via X (Twitter)

Education Science & Technology

Related Videos

STARBOY perceives its environment through a camera, microphone, temperature sensor, and accelerometer. We trained multiple tiny image models that run locally on the device, letting it recognize human faces and hand gestures.

Daniel Kuntz

115,240 views • 4 months ago

We just shipped support for tool use and JSON mode for DeepSeek R1 Distill-Llama 70B on Groq Inc. 🛠️ The coolest part is that our API now has a `reasoning_format` parameter that lets you control how the model outputs its thought process.

Hatice Ozen

43,225 views • 1 year ago

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs. Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT data with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. In general, our work demonstrates that existing long context LLM already possesses the potential for a larger output window--all you need is data with extended output during model alignment to unlock this capability.

AK

50,995 views • 1 year ago

Average US AI lab: > builds the strongest model > gets scared of its own model > spends months arguing about safety > fights other AI labs over who has the strongest model > delays the release for more safety testing > regulators ask for even more safety > switches users to a dumber model > keeps it closed source Average Chinese AI lab: > trains Kimi K3 > builds a 2.8T parameter monster > gives it a 1M context window > beats GPT-5.6 and Claude Fable 5 on some tasks > open-sources the weights > publishes the architecture > refuses to elaborate > leaves

Tornado guy

160,113 views • 12 days ago

Microsoft made 100B parameter models run on a single CPU. bitnet.cpp: The official inference framework for 1-bit LLMs. The math behind 1-bit LLMs is what makes them revolutionary. Traditional LLMs use 16-bit floating point weights. Every parameter is a number like 0.0023847 or -1.4729. When you run inference, you multiply these floats together. Billions of times. That's why you need GPUs, they're optimized for floating point matrix multiplication. BitNet b1.58 uses ternary weights: {-1, 0, 1}. That's not a simplification. That's a fundamental change in the math. When your weights are only -1, 0, or 1:
→ Multiply by 1 = keep the value
→ Multiply by -1 = flip the sign
→ Multiply by 0 = skip entirely
Matrix multiplication becomes addition and subtraction. No floating point operations. No GPU required. This is why bitnet.cpp achieves:
→ 2.37x to 6.17x speedup on x86 CPUs
→ 1.37x to 5.07x speedup on ARM CPUs
→ 71.9% to 82.2% energy reduction on x86
→ 55.4% to 70.0% energy reduction on ARM
The speedups scale with model size. Larger models see bigger gains because there are more operations to simplify. A 100B parameter model running at human reading speed (5-7 tokens/second) on a single CPU. That's not optimization. That's a different paradigm. Why 1.58 bits? Because log₂(3) ≈ 1.58. Three possible values = 1.58 bits of information per weight. The key insight: These models aren't quantized after training. They're trained from scratch with ternary weights. The model learns to work within the constraint. No precision loss. No quality tradeoff.

Tech with Mak

23,036 views • 3 months ago
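
The ternary arithmetic described in the bitnet.cpp post above can be illustrated with a small sketch. This is only a toy in plain Python, assuming a hypothetical helper named `ternary_matvec`; it does not reflect how bitnet.cpp packs weights or implements its kernels. It shows that a matrix-vector product with weights in {-1, 0, 1} needs only additions and subtractions, and it computes the log₂(3) figure behind the "1.58 bits" name.

```python
import math


def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, 1}) by a vector.

    Hypothetical illustration: no multiplications are performed, only
    additions and subtractions, as the post describes.
    """
    out = []
    for row in weights:
        acc = 0
        for w, v in zip(row, x):
            if w == 1:
                acc += v      # multiply by 1: keep the value
            elif w == -1:
                acc -= v      # multiply by -1: flip the sign
            elif w != 0:
                raise ValueError("weights must be -1, 0, or 1")
            # w == 0: skip the term entirely
        out.append(acc)
    return out


# Toy 2x3 ternary weight matrix and an input vector (made-up values).
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2, 5, 3]

y = ternary_matvec(W, x)          # [2 - 3, -2 + 5 + 3]
bits_per_weight = math.log2(3)    # information content of a 3-valued weight
print(y, round(bits_per_weight, 2))
```

The same loop structure is why zero weights cost nothing here: they are skipped rather than multiplied.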

its the way snghwa only said “that looks good” and hngjng immediately started scanning through the menu looking for the mussels and asked snghwa if he would like it….. to be loved is to be noticed and cared for quietly….. (also snghwa’s tiny nods) 🥺🥺🥺

sankitty

18,028 views • 1 year ago

Today, we are releasing Stable Video Diffusion, our first foundation model for generative AI video based on the image model, Stable Diffusion. As part of this research preview, the code, weights, and research paper are now available. Additionally, today you can sign up for our waitlist to access a new upcoming web experience featuring a Text-To-Video interface. To access the model & sign up for our waitlist, visit our website here:

Stability AI

1,024,532 views • 2 years ago

🚀 Self-speculation brings 6.75x real speedup for LLM generation with SGLang inference! Same model drafts future tokens in Diffusion mode → then verifies them in AR (causal) mode. One model and one KV cache. Just different attention masks. Thanks to perfect alignment, we get 2× longer acceptance lengths than MTP techniques (Eagle-3, MTP, dFlash). We run 2 forward passes… but the 2× higher acceptance means we break even - and with zero overhead from extra drafter, KV cache, or LM head that comes with MTP - those are not free. Last week we released Nemotron-Labs-Diffusion + Tri-mode LLMs! We did continued pre-training on Ministral-3 models by switching attention patterns (block causal bidirectional). Result: one model that runs AR mode, Diffusion mode, and Self-Speculation. Diffusion mode already shows high benchmark accuracy - excited to see what happens when someone beats left-to-right acceptance! 🔥 Github: Paper: SGLang inference: Try the models on HF:

Pavlo Molchanov

66,554 views • 2 months ago
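
The draft-then-verify idea in the self-speculation post above can be sketched with a toy acceptance rule. This assumes the greedy verification variant commonly used in speculative decoding (keep the drafted tokens up to the first disagreement, then take the verifier's token); the post does not say which acceptance rule is used, and `accept_draft` is a hypothetical helper, not part of SGLang or the released models.

```python
def accept_draft(draft, verifier_tokens):
    """Return the tokens kept after verifying a drafted block.

    Assumption: greedy verification. Drafted tokens are accepted while they
    match the verifier's own choice at the same position; at the first
    mismatch the verifier's token is kept and the rest of the draft is dropped.
    """
    accepted = []
    for drafted, verified in zip(draft, verifier_tokens):
        if drafted != verified:
            accepted.append(verified)  # correction from the verify pass
            break
        accepted.append(drafted)
    return accepted


# Toy token ids (made up): the draft diverges at the third position.
draft = [11, 42, 7, 99]
verifier_tokens = [11, 42, 8, 99]
kept = accept_draft(draft, verifier_tokens)
print(kept, "acceptance length:", len(kept))
```

A longer matching prefix means more tokens are committed per verify pass, which is the sense in which acceptance length drives the speedup.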

🇯🇵 A JAPANESE LAB SAYS IT MATCHED FABLE 5 LEVELS. While the world braced for China to crack the frontier, a Tokyo lab did something stranger. Sakana AI, co-founded by one of the authors of the original Transformer paper, launched Fugu today. It's not a bigger model. It's a conductor. One API that quietly assembles a team of other models, splits the work, verifies it, and hands you one answer. Sakana claims its top tier matches Anthropic's Fable 5 and Mythos.

CryptoGoos

27,439 views • 1 month ago

🚨VITALIK: AGI MAY MARK THE POINT OF NO RETURN FOR HUMAN DOMINANCE! Vitalik warns in a new thread that AI could soon cover nearly all human capabilities, leaving us without unique advantages. He defines AGI as systems that could independently sustain civilization if humans disappeared, and stresses we're gambling on whether LLMs and tools will get there, or if deeper integration keeps human agency alive. He favors slowdowns and open weights to navigate safely.

Crypto Banter

53,713 views • 8 days ago

I'll always root for a team that open-sources its best work, and Robbyant just did it properly. Robbyant, Ant Group's embodied-AI company, released LingBot-Vision, a vision foundation model for robots, and the part I love is the data. They trained it on 161M images, filtered down from 2B raw ones and mostly pulled straight from the open web, with no human labels, no edge detectors, no depth sensors anywhere in the loop. It learns the exact edges of objects from raw pixels. That's roughly a tenth of the data DINOv3 saw, and under a third of the training. And it shows in the results. On depth, working out how far away things are, the 1B model edges out a 7B on NYU-Depth. It also powers LingBot-Depth 2.0, which reads the surfaces cameras usually choke on, glass and mirrors, and halves indoor depth error. LingBot-Vision is fully open. Weights from the 1.1B flagship down to a tiny 21M version, code, and the paper. This is the timeline I want more of. Robbyant

Chubby♨️

48,249 views • 20 days ago

Most recent diffusion language model research (that I’ve seen) seems to be using masking as the noising process. It looks like, however, most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process, where instead of masking tokens, they replace them with different tokens (either with a random token or a semantically similar token). I wondered how they were getting such high throughput with the latter noising process, since I believed that optimizing inference with KVCache approximation would be more difficult (for various reasons). I visualized this noising process with tiny-diffusion and compared it to normal unmasking, and was very surprised to see how fast the generation “settles” into a reasonable output, and then only slightly refines afterwards, requiring much fewer steps in total. Unmasking (where tokens are never remasked, the typical implementation) is inherently limited in generation speed by the fact that an increase in tokens decoded per step leads to more errors due to the mismatch between individual and marginal token probability distributions we sample from. The token replacement noising process seems to have a much different set of characteristics. Because we sample each token per step, every token makes “progress” towards the final output each iteration (in addition to *potentially* giving other tokens more information in future steps). Generally, masking has outperformed other noising processes, which is probably why most research focused on it (using smaller models). But the paper referred to in the retweet shows that random replacement as a noising process may scale better as model size increases. Big labs might have noticed these results much earlier (due to having drastically more training resources and being able to test larger models), which may explain the discrepancy in the choice of noising process. 
I’m gonna test this with larger models, since tiny-diffusion only has 10M parameters.

nathan (in sf)

40,440 views • 6 months ago
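
The two noising processes contrasted in the post above can be sketched as follows. This is a minimal illustration, not code from tiny-diffusion or any closed-source model: the function names, the `[MASK]` string, and the toy vocabulary are assumptions, and only the uniform random-replacement variant is shown (not replacement with semantically similar tokens).

```python
import random

MASK = "[MASK]"  # assumed placeholder for the mask token


def mask_noise(tokens, rate, rng):
    """Masking noise: each token is replaced by MASK with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


def replace_noise(tokens, rate, vocab, rng):
    """Replacement noise: each token is swapped for a uniformly random
    vocabulary token with probability `rate` (it may draw the same token)."""
    return [rng.choice(vocab) if rng.random() < rate else tok for tok in tokens]


rng = random.Random(0)  # seeded so runs are repeatable
vocab = ["the", "cat", "sat", "on", "a", "mat"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]

print(mask_noise(sentence, 0.5, rng))
print(replace_noise(sentence, 0.5, vocab, rng))
```

With masking, a corrupted position is visibly marked; with replacement, every position still holds a plausible token, so a denoiser has to decide at each step which tokens to revise.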

We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -> 31.4, thus beating GPT 5.5 Pro. The physics-intern harness can wrap any model and, via a dedicated subagent, boost the performance of the vanilla reasoning models. While I think more and more of these harness capability gains will be absorbed into the models (like prompting tricks disappeared over time), there is a lot to be gained right now by building good scaffolds for those models and integrating tools well. Interestingly, the one exception we found was that GPT 5.5 Pro actually didn't benefit from the physics-intern harness! Read more about it here: PS: I think the Harness[Model] notation is kind of nice.

Leandro von Werra

97,181 views • 2 months ago

Many of you asked for code & weights for π₀, we are happy to announce that we are releasing π₀ and pre-trained checkpoints in our new openpi repository! We tested the model on a few public robots, and we include code for you to fine-tune it yourself.

Physical Intelligence

441,382 views • 1 year ago

I was looking for a cheaper way to test Seedance 2.0. Then I found ToAPIs. It gives access to AI models like gpt image 2, Seedance, Kling, and more through one API gateway, with selected models up to 80% cheaper. Seedance 2.0 starts at around $0.06/s for 720p! The best part? Pay as you go. No subscription.

MrDejie

27,346 views • 1 month ago

Moonshot AI is casually giving developers free daily access to Kimi K3 😳 no subscription no upfront payment just sign in and start using one of the largest open AI models available what you get for $0: - Kimi K3 with 2.8T parameters - 1M token context window - strong coding and reasoning performance - native vision capabilities - free daily credits that refresh automatically why this is worth checking: > access a frontier model without paying API fees > long context for large codebases and documents > works on web, desktop, mobile, and CLI getting started takes less than 2 minutes: 1. go to 2. create a free account 3. Kimi K3 is available as the default model 4. start chatting or coding with your daily free credits bonus: Moonshot Together lets you invite friends for a chance to earn 3, 7, 15, 30, or even 365 days of Kimi Membership through its rewards program benchmark highlights: > 2.8T parameter MoE model > 1M context window > strong performance across coding, browsing, and reasoning benchmarks important: free credits reset daily, rate limits apply on the free tier, and the open-weight release is expected on July 27 A simple way to try one of the latest frontier AI models without paying for API access

K2S

22,890 views • 10 days ago

Sora 2, “family guy dark humor”. My personal thoughts: This is clearly the best AI video model for replicating animated shows. This truly gives credence to the idea that within the decade AI-generated short films/shows are looking like a real possibility. My important caveats: a model trained only on video can get visually indistinguishable for most viewers most of the time. It can match texture, lighting, motion, and camera language so well that only careful inspection gives it away. However, for sustained quality at the level of a full scene or an episode, video alone hits a ceiling IMO. Pure video likelihood drives the model toward what is frequent, not toward the rare timing and payoff choices that make the best jokes land. It has a weak grasp of long-arc causality, character memory, and joke structure. It also does not see intent, off-screen context, or prosody unless you give it those signals. So you get something that looks right but drifts on the beats that matter. I don’t know what the potential solution would be other than to have an AGI just animate the show for me. Any others? Credit for the sora clip: figure

Chris

150,286 views • 9 months ago

Hey influencers! 🌻✨ Do you love puzzles, exploration and plants? Together with Streamers Connected we are looking for streamers and content creators that would like to play Botany Manor a little ahead of its release! Feel free to share and retweet, more details in the tweet below 👇

Botany Manor

27,587 views • 2 years ago

The recent Massachusetts Institute of Technology (MIT) CSAIL paper published on Recursive Language Models is a fascinating look into how AI systems reason in 2026. You can check it out here ➡️ → The paper notes that even frontier models suffer from “context rot” as inputs grow longer. More tokens don’t mean more understanding. Instead of compressing or summarising, RLMs “treat long prompts as part of an external environment” …and let the model programmatically inspect, decompose, and recursively requery itself over precise sections. This matters in Web3. Why? Smart contracts are long, stateful, and brittle. One missed assumption = unnecessary hassle. In the video below, we applied RLM principles to improve the prompt shown. The old prompt would try to force multi-step pauses, thereby breaking the AI's workflow. Now it decomposes requirements, scores confidence for each component, verifies the logic, and then synthesises. Result = Production-ready contracts with flagged risks, not blind single-pass outputs. Try it out with our Smart Contract Generator today!

ChainGPT

82,385 views • 6 months ago