正在加载视频...

视频加载失败

加载此视频时出现问题。这可能是由于临时网络问题，或视频可能不可用。

Let me explain the agent loop, simple It's the core of every agentic system, and the part most people overcomplicate It's just this: 1. Send messages to the model 2. Model responds, maybe calls a tool 3. You run the tool 4. Append the result back to messages 5.... show more

Daniel San

32,970 subscribers

12,514 次观看 • 26 天前 •via X (Twitter)

Anya Rossi• Live Now

Private livecam show

0 条评论

暂无评论

原始帖子的评论将显示在这里

相关视频

let me save you 3 hours of head scratching. if you're running local models like Qwen3.5-35B-A3B through Claude Code via llama.cpp's Anthropic endpoint, the chain will break every 3 to 5 minutes. tool call fails. flow stops. you reprompt. it recovers. 2 minutes later it stops again. the model is fine. the harness chokes on local inference latency. switch to OpenCode. same localhost endpoint. same model. same GPU. the chain doesn't break. the tradeoff: OpenCode sometimes loops. the model forgets what it already read and repeats the same tool call. but a loop you can interrupt. a broken chain kills your momentum and you start over. watch both side by side. proprietary agent vs open source agent. same 3B model. different failure modes. pick your poison.

let me save you 3 hours of head scratching. if you're running local models like Qwen3.5-35B-A3B through Claude Code via llama.cpp's Anthropic endpoint, the chain will break every 3 to 5 minutes. tool call fails. flow stops. you reprompt. it recovers. 2 minutes later it stops again. the model is fine. the harness chokes on local inference latency. switch to OpenCode. same localhost endpoint. same model. same GPU. the chain doesn't break. the tradeoff: OpenCode sometimes loops. the model forgets what it already read and repeats the same tool call. but a loop you can interrupt. a broken chain kills your momentum and you start over. watch both side by side. proprietary agent vs open source agent. same 3B model. different failure modes. pick your poison.

Sudo su

72,501 次观看 • 4 个月前

ANTHROPIC JUST DROPPED THE OFFICIAL GUIDE TO PROMPTING FABLE 5. This is the most important prompting framework I've seen. Bookmark this before you forget. Most people treat Fable 5 like a chatbot. That's the mistake. > don't over-engineer prompts — it degrades output. > use /loop for autonomous multi-step work. > give it the goal, not step-by-step commands. > add a memory file. it learns from past runs. > spin up 50+ subagents for complex tasks. Fable 5 isn't an assistant. it's a consultant that leads the work. Read it before you write another prompt. Claude → Fable 5 → Autonomous Work → Real Output → Money

ANTHROPIC JUST DROPPED THE OFFICIAL GUIDE TO PROMPTING FABLE 5. This is the most important prompting framework I've seen. Bookmark this before you forget. Most people treat Fable 5 like a chatbot. That's the mistake. > don't over-engineer prompts — it degrades output. > use /loop for autonomous multi-step work. > give it the goal, not step-by-step commands. > add a memory file. it learns from past runs. > spin up 50+ subagents for complex tasks. Fable 5 isn't an assistant. it's a consultant that leads the work. Read it before you write another prompt. Claude → Fable 5 → Autonomous Work → Real Output → Money

Kirill

392,673 次观看 • 3 天前

this is the next $100B opportunity in ai , most will miss it's harness engineering what this agentic engineer reveals is insane >The model is almost irrelevant. The harness is everything >every failure is a signal about what the environment needs. >when agent throughput far exceeds human attention, corrections are cheap and waiting is expensive most people will ignore and bookmark. be different.

this is the next $100B opportunity in ai , most will miss it's harness engineering what this agentic engineer reveals is insane >The model is almost irrelevant. The harness is everything >every failure is a signal about what the environment needs. >when agent throughput far exceeds human attention, corrections are cheap and waiting is expensive most people will ignore and bookmark. be different.

Avid

497,428 次观看 • 3 个月前

Do you use Runway Gen-2 to make AI videos? This is for you! I built a simple tool to help "extend" Runway videos beyond the 4 second clip limit. Just choose a video, and the tool will give you the final frame -- so that you can drop this back into Runway, and generate another AI video based on it. Let me know if this is useful to you! Link is at bottom of the video and it's live to try right now.

Do you use Runway Gen-2 to make AI videos? This is for you! I built a simple tool to help "extend" Runway videos beyond the 4 second clip limit. Just choose a video, and the tool will give you the final frame -- so that you can drop this back into Runway, and generate another AI video based on it. Let me know if this is useful to you! Link is at bottom of the video and it's live to try right now.

Benjamin De Kraker

68,384 次观看 • 2 年前

this is the important part to understand what is going on. you can see the front wheel lose traction then the back of the car dips due to weight transfer. it was AFTER that the ICE agent draws his weapon.

this is the important part to understand what is going on. you can see the front wheel lose traction then the back of the car dips due to weight transfer. it was AFTER that the ICE agent draws his weapon.

Phil Labonte 🇺🇸

85,919 次观看 • 5 个月前

This is how you unlock the next billion software developers. The new Replit ⠕ Agent 3 (they just launched) is the most advanced vibe-coding agent in the world. 1. Smarter than any other vibe-coding model (10x more autonomous than the previous version). 2. It thinks harder and lasts longer than any other model (up to 200 minutes running fully autonomously). 3. The agent can now use an actual browser to test and fix its own code. 4. 3x faster and 10x more cost-effective than any other "Computer Use" for testing. 5. It can build other agents and automations to take care of repetitive tasks. Seeing the agent test the application autonomously is science fiction!

This is how you unlock the next billion software developers. The new Replit ⠕ Agent 3 (they just launched) is the most advanced vibe-coding agent in the world. 1. Smarter than any other vibe-coding model (10x more autonomous than the previous version). 2. It thinks harder and lasts longer than any other model (up to 200 minutes running fully autonomously). 3. The agent can now use an actual browser to test and fix its own code. 4. 3x faster and 10x more cost-effective than any other "Computer Use" for testing. 5. It can build other agents and automations to take care of repetitive tasks. Seeing the agent test the application autonomously is science fiction!

Santiago

167,056 次观看 • 9 个月前

Probably you know them already, the tips works to me are -Simplify the shapes of the character and animate first that simple forms -Create model sheets and trace it -Use the key draws to create the inbetweens -Sculpt a simple model, pose it and copy what you see (1/2) +

Probably you know them already, the tips works to me are -Simplify the shapes of the character and animate first that simple forms -Create model sheets and trace it -Use the key draws to create the inbetweens -Sculpt a simple model, pose it and copy what you see (1/2) +

Tlauz - Open Comms

357,382 次观看 • 1 年前

Don't train the model, evolve the harness. I read a brilliant blog post from Hugging Face where they took a frozen open model scoring 0% on a hard legal agent benchmark, left its weights alone, and let an automated loop rewrite only the code around it. That code layer is the harness, the runtime wrapper that feeds the model context, runs its tool calls, and decides when a run ends. By the time the loop finished, the system had essentially matched Sonnet 4.6 on the benchmark's headline metric, at roughly 7x lower cost per task. Zero weights changed. The gain existed because of where the model was failing. The judge only grades files saved in the right place under the exact requested filename, and the model kept doing the legal analysis correctly, then saving it under the wrong name, dropping it in a scratch folder, or never writing it at all. So the 0% was never measuring legal reasoning. It was measuring the harness. Hand-tuning that layer is slow and model-specific, so they automated it. A Claude proposer adds exactly one mechanism per iteration, and an outer loop keeps it only if it clearly beats the current best, so accepted mechanisms compound. What the loop discovered says a lot about where agents actually fail. → The biggest single gain was file handling, not intelligence. An automatic step that lands the deliverable exactly where the judge expects it beat every prompt change, with zero extra model tokens. → Code fixes transferred across models, prompt playbooks did not. The same harness lifted a smaller model from the same family by 14 points, but the tuned prompts hurt a different model family on tasks it could already finish. → The harness mattered more than anything else. Same model, same judge, same tasks, and five different harnesses scored anywhere between 3.5% and 80.1%. The gains do eventually flatten, and the remaining misses look like real capability gaps. At some point the wrapper runs out of tricks and the model has to carry the work. But the lesson holds. A benchmark score measures the model and its harness together, and until the harness is fixed, it's impossible to know which one failed. I highly recommend reading this: I also wrote a deep dive on agent harness engineering a while back, covering the orchestration loop, tools, memory, context management, and everything that turns a stateless LLM into a capable agent. The article is quoted below.

Don't train the model, evolve the harness. I read a brilliant blog post from Hugging Face where they took a frozen open model scoring 0% on a hard legal agent benchmark, left its weights alone, and let an automated loop rewrite only the code around it. That code layer is the harness, the runtime wrapper that feeds the model context, runs its tool calls, and decides when a run ends. By the time the loop finished, the system had essentially matched Sonnet 4.6 on the benchmark's headline metric, at roughly 7x lower cost per task. Zero weights changed. The gain existed because of where the model was failing. The judge only grades files saved in the right place under the exact requested filename, and the model kept doing the legal analysis correctly, then saving it under the wrong name, dropping it in a scratch folder, or never writing it at all. So the 0% was never measuring legal reasoning. It was measuring the harness. Hand-tuning that layer is slow and model-specific, so they automated it. A Claude proposer adds exactly one mechanism per iteration, and an outer loop keeps it only if it clearly beats the current best, so accepted mechanisms compound. What the loop discovered says a lot about where agents actually fail. → The biggest single gain was file handling, not intelligence. An automatic step that lands the deliverable exactly where the judge expects it beat every prompt change, with zero extra model tokens. → Code fixes transferred across models, prompt playbooks did not. The same harness lifted a smaller model from the same family by 14 points, but the tuned prompts hurt a different model family on tasks it could already finish. → The harness mattered more than anything else. Same model, same judge, same tasks, and five different harnesses scored anywhere between 3.5% and 80.1%. The gains do eventually flatten, and the remaining misses look like real capability gaps. At some point the wrapper runs out of tricks and the model has to carry the work. But the lesson holds. A benchmark score measures the model and its harness together, and until the harness is fixed, it's impossible to know which one failed. I highly recommend reading this: I also wrote a deep dive on agent harness engineering a while back, covering the orchestration loop, tools, memory, context management, and everything that turns a stateless LLM into a capable agent. The article is quoted below.

Akshay 🚀

229,774 次观看 • 2 天前

Alright, now that we know *what* an agent is, how does it actually work? When you ask for help on a task, the agent plans a series of steps and executes them directly in the application on your behalf, using the tools it has access to. Say you are booking a local service or trying to organize your inbox (which typically takes multiple steps): the AI model first plans how to achieve the task using its existing knowledge and then interacts with your inbox to execute the task. The agent will continue until it is confident the task has been successfully completed.

Alright, now that we know what an agent is, how does it actually work? When you ask for help on a task, the agent plans a series of steps and executes them directly in the application on your behalf, using the tools it has access to. Say you are booking a local service or trying to organize your inbox (which typically takes multiple steps): the AI model first plans how to achieve the task using its existing knowledge and then interacts with your inbox to execute the task. The agent will continue until it is confident the task has been successfully completed.

Google AI

22,487 次观看 • 7 个月前

Visualizer of our MultiAgentRouter 🤖 The MultiAgentRouter is an all-new multi-agent structure that leverages a hierarchical pattern to select the most specialized agent for your task. Here's how it works: Step 1. You give a task. Step 2. The Boss Agent Routes your task to the most specialized Agent Step 3. The selected agent returns your response! Get started with it now below ⬇️ Thanks to WE!SS for the visualizer!!

Visualizer of our MultiAgentRouter 🤖 The MultiAgentRouter is an all-new multi-agent structure that leverages a hierarchical pattern to select the most specialized agent for your task. Here's how it works: Step 1. You give a task. Step 2. The Boss Agent Routes your task to the most specialized Agent Step 3. The selected agent returns your response! Get started with it now below ⬇️ Thanks to WE!SS for the visualizer!!

swarms

32,313 次观看 • 1 年前

seedance 2.0 + my v2 AI UGC prompting system is giving insane results i spent the last 24 hours generating over 200 seedance 2.0 videos to figure out the best prompting framework system for AI UGC this video was made with 1 prompt and 1 tool, no editing was done to the video this was just a prompt to a video this is by far the best model i've ever used and the craziest part is that it can be fully automated this is the first time we can actually automate high quality ai ugc at this level bytedance owns tiktok so this model is trained on millions of high quality ugc videos. you just need to know how to extract that and call it in your prompt. we are so early... it's insane

seedance 2.0 + my v2 AI UGC prompting system is giving insane results i spent the last 24 hours generating over 200 seedance 2.0 videos to figure out the best prompting framework system for AI UGC this video was made with 1 prompt and 1 tool, no editing was done to the video this was just a prompt to a video this is by far the best model i've ever used and the craziest part is that it can be fully automated this is the first time we can actually automate high quality ai ugc at this level bytedance owns tiktok so this model is trained on millions of high quality ugc videos. you just need to know how to extract that and call it in your prompt. we are so early... it's insane

Miko

81,141 次观看 • 4 个月前

GPT-5.5 + OpenMed Agent planning a 64-step clinical workflow. Watch plans, sub-plans, and tool calls materialize, every step visible, every finalization gated. Medical intelligence on Hugging Face. The loop is the product.

GPT-5.5 + OpenMed Agent planning a 64-step clinical workflow. Watch plans, sub-plans, and tool calls materialize, every step visible, every finalization gated. Medical intelligence on Hugging Face. The loop is the product.

Maziyar PANAHI

25,417 次观看 • 1 个月前

2.2 million TikTok Shop Views Just one of the 348+ videos my AI Agent made today The account is @seedsynergy This is a real life money printer 1. like 2. comment "AI AGENT" & 3. RT I'll send the info on the tool to you. (must be following me Ellie Jones )

2.2 million TikTok Shop Views Just one of the 348+ videos my AI Agent made today The account is @seedsynergy This is a real life money printer 1. like 2. comment "AI AGENT" & 3. RT I'll send the info on the tool to you. (must be following me Ellie Jones )

Ellie Jones

21,882 次观看 • 10 个月前

The architecture of this new world model is one of the most interesting things I've seen lately: Let me first explain how most world models work: They predict and render one frame at a time. If you are navigating in one of these worlds, and you look left, the model draws whatever looks right in the moment. Every time you change your viewpoint, the model has to imagine what should be there again, so it's very common for these models to "forget" what's in the world. For example, if you put a toy on the table, look away, then look back, the toy might not be there anymore. Tripo AI is releasing its Project Eden model, which works very differently: The model builds the world first, and then renders it based on that map. That map holds the real state of the world: the geometry, every object, where things are, what's already happened. The picture you see on screen gets generated from the map. This architecture flips the whole thing. Now, you get the following: 1. The world stops forgetting. Leave, come back, and the toy is still on the table because it lives in the map, not in the last frame you saw. 2. You can edit the world, and those changes persist for anyone who enters later. 3. Multiple people and AI agents can coexist in the world and see it from different perspectives. This is early research, but it's looking really promising. They just raised nearly $200M across two rounds to build it out. Tripo will be at SIGGRAPH 2026 (July 19–23, Los Angeles Convention Center). If you work in 3D, embodied AI, simulation, or anything spatial, go connect with them there.

The architecture of this new world model is one of the most interesting things I've seen lately: Let me first explain how most world models work: They predict and render one frame at a time. If you are navigating in one of these worlds, and you look left, the model draws whatever looks right in the moment. Every time you change your viewpoint, the model has to imagine what should be there again, so it's very common for these models to "forget" what's in the world. For example, if you put a toy on the table, look away, then look back, the toy might not be there anymore. Tripo AI is releasing its Project Eden model, which works very differently: The model builds the world first, and then renders it based on that map. That map holds the real state of the world: the geometry, every object, where things are, what's already happened. The picture you see on screen gets generated from the map. This architecture flips the whole thing. Now, you get the following: 1. The world stops forgetting. Leave, come back, and the toy is still on the table because it lives in the map, not in the last frame you saw. 2. You can edit the world, and those changes persist for anyone who enters later. 3. Multiple people and AI agents can coexist in the world and see it from different perspectives. This is early research, but it's looking really promising. They just raised nearly $200M across two rounds to build it out. Tripo will be at SIGGRAPH 2026 (July 19–23, Los Angeles Convention Center). If you work in 3D, embodied AI, simulation, or anything spatial, go connect with them there.

Santiago

30,104 次观看 • 10 天前

Grok 4.20 with the full 4-agent system is now live inside the X app. Select it right from the model dropdown.

Grok 4.20 with the full 4-agent system is now live inside the X app. Select it right from the model dropdown.

tetsuo

16,788 次观看 • 4 个月前

Good UX design is more important than ever for today’s AI. A model cannot achieve its full potential without the most fluid and intuitive interface. Here’s a first step towards the future of AI-in-the-loop artistic creation. Imagine making every tool in Photoshop feel like this.

Good UX design is more important than ever for today’s AI. A model cannot achieve its full potential without the most fluid and intuitive interface. Here’s a first step towards the future of AI-in-the-loop artistic creation. Imagine making every tool in Photoshop feel like this.

Jim Fan

181,955 次观看 • 3 年前

The hate is justified. This whole run it back gimmick is bad and brings down the whole momentum of the show. Even in Kayfabe , You have to be some kind of a dumbass to run it back instead of eliminating people. Roman knocked the yeet of your face and eliminated you from the Rumble and you are glazing him the next day ?? All that crashout and “im not just an entrance” is literally just a lie. You are the worst wrestler of all time brother.

The hate is justified. This whole run it back gimmick is bad and brings down the whole momentum of the show. Even in Kayfabe , You have to be some kind of a dumbass to run it back instead of eliminating people. Roman knocked the yeet of your face and eliminated you from the Rumble and you are glazing him the next day ?? All that crashout and “im not just an entrance” is literally just a lie. You are the worst wrestler of all time brother.

Popplayzz

735,152 次观看 • 5 个月前

Fable 5 comes back！It can now build playable game prototypes. I think it is actually a signal for where AI coding is going. Making a game is not just “write some code.” Even a small browser game needs: game loop；character movement；collision logic；scoring system；UI states；physics tuning；visual feedback；bug fixing；playtesting This is why game prototyping is a great test for AI models. A model cannot fake it with a pretty answer. Either the game runs, or it does not. What impressed me about Fable 5 is that it is useful for the messy middle: turning an idea into mechanics, turning mechanics into code, debugging broken interactions, and iterating until the prototype feels playable. But here is the practical part: I would not use the strongest model for every step. For game building, I would split the workflow: 1. Fable 5 for game design + architecture 2. a fast coding model for routine implementation 3. a vision-capable model for screenshot/UI feedback 4. a cheaper model for docs, test cases, and small fixes 5. fallback when latency, cost, or output quality becomes a problem That is the real AI coding stack. Not “one magic model does everything.” More like: the right model, for the right task, at the right cost, with fallback when things break. This is why I’ve been looking at ZenMux ZenMux. ZenMux gives developers one gateway to access multiple leading AI models, with OpenAI / Anthropic / Google Vertex compatible APIs, cost tracking, quality benchmarks, auto-routing, and compensation when output quality, latency, or throughput falls short. If AI can now make games, the next question is not just “which model is strongest?” It is:how do we manage the whole model workflow Fable 5 shows the creative ceiling. ZenMux is closer to the infrastructure layer you need when AI coding becomes a real production habit.

Fable 5 comes back！It can now build playable game prototypes. I think it is actually a signal for where AI coding is going. Making a game is not just “write some code.” Even a small browser game needs: game loop；character movement；collision logic；scoring system；UI states；physics tuning；visual feedback；bug fixing；playtesting This is why game prototyping is a great test for AI models. A model cannot fake it with a pretty answer. Either the game runs, or it does not. What impressed me about Fable 5 is that it is useful for the messy middle: turning an idea into mechanics, turning mechanics into code, debugging broken interactions, and iterating until the prototype feels playable. But here is the practical part: I would not use the strongest model for every step. For game building, I would split the workflow: 1. Fable 5 for game design + architecture 2. a fast coding model for routine implementation 3. a vision-capable model for screenshot/UI feedback 4. a cheaper model for docs, test cases, and small fixes 5. fallback when latency, cost, or output quality becomes a problem That is the real AI coding stack. Not “one magic model does everything.” More like: the right model, for the right task, at the right cost, with fallback when things break. This is why I’ve been looking at ZenMux ZenMux. ZenMux gives developers one gateway to access multiple leading AI models, with OpenAI / Anthropic / Google Vertex compatible APIs, cost tracking, quality benchmarks, auto-routing, and compensation when output quality, latency, or throughput falls short. If AI can now make games, the next question is not just “which model is strongest?” It is:how do we manage the whole model workflow Fable 5 shows the creative ceiling. ZenMux is closer to the infrastructure layer you need when AI coding becomes a real production habit.

Rachel🥥

57,766 次观看 • 2 天前

you need to tattoo this Boris Cherny quote into your brain: "coding is the easy part, it's knowing the domain that's the hard part" every week a new startup drops a launch video saying they "killed influencer marketing" or something but they don't get it. creating the thing is NOT the hard part it's understanding what thing you have to create, at what moment, and in what way and that takes years of pattern recognition from actually being in the arena and seeing what works AI can't shortcut that for you

you need to tattoo this Boris Cherny quote into your brain: "coding is the easy part, it's knowing the domain that's the hard part" every week a new startup drops a launch video saying they "killed influencer marketing" or something but they don't get it. creating the thing is NOT the hard part it's understanding what thing you have to create, at what moment, and in what way and that takes years of pattern recognition from actually being in the arena and seeing what works AI can't shortcut that for you

Ole Lehmann

47,738 次观看 • 1 个月前

$We were taught the derivative as a formula to memorise. A definition to recite. A rule to apply. Something that "gives you the slope." But nobody told us what the formula was actually saying. Every symbol is a sentence. Every fraction is a question. Every limit is a story about getting closer and closer to something you can never quite touch. The top of the fraction? That's a change. A difference. A before and after. The bottom? That's how long you waited to see it. The limit? That's you, zooming in, refusing to settle for an approximation - chasing the truth all the way down to an interval so small it almost disappears. Put it all together, and you get the most honest question in calculus: How fast is something changing - right now, in this exact instant? Not on average. Not over a minute. Not eventually. Right now. That's it. That's the derivative. It's not a trick. It's not a rule. It's a beautifully precise way of asking a very human question: what's happening, in this moment? We spent years solving these. Maybe it's time we actually understood them.$

We were taught the derivative as a formula to memorise. A definition to recite. A rule to apply. Something that "gives you the slope." But nobody told us what the formula was actually saying. Every symbol is a sentence. Every fraction is a question. Every limit is a story about getting closer and closer to something you can never quite touch. The top of the fraction? That's a change. A difference. A before and after. The bottom? That's how long you waited to see it. The limit? That's you, zooming in, refusing to settle for an approximation - chasing the truth all the way down to an interval so small it almost disappears. Put it all together, and you get the most honest question in calculus: How fast is something changing - right now, in this exact instant? Not on average. Not over a minute. Not eventually. Right now. That's it. That's the derivative. It's not a trick. It's not a rule. It's a beautifully precise way of asking a very human question: what's happening, in this moment? We spent years solving these. Maybe it's time we actually understood them.

The Math Flow

22,498 次观看 • 1 个月前