🚀 Introducing Cover-Agent 🧪 An open-source tool that includes a reimplementation of Meta's TestGen-LLM for automatically enhancing test suites. Manager: "We must improve old test suites for better code coverage. Can you handle it?" Me: "Sure, my favorite task... (Not!) 🤷‍♂️" Meta's team had the idea of using LLMs...

Itamar Friedman

6,187 subscribers

139,228 views • 2 years ago • via X (Twitter)

Science & Technology, Education

10 comments

Itamar Friedman • 2 years ago

Original TestGen-LLM paper: Cover-Agent open-source that reimplements TestGen-LLM by Meta:

Itamar Friedman • 2 years ago

Some of the excitement about TestGen-LLM: Check out our blog for more info about TestGen-LLM and Cover-Agent:

Itamar Friedman • 2 years ago

Cover-Agent in action: ‣ Go example: ‣ Python example:

Qodo (formerly Codium) • 2 years ago

Here is a short how-to and demo video:

Dennis • 2 years ago

this is dope

Eyal Cohen • 2 years ago

Will try it out!

Gil Elbaz • 2 years ago

Looks great! Thnx @itamar_mar

Itamar Friedman • 2 years ago

Thank you! Counting on you to open Issues and maybe even PRs 😃

Hvipublik • 2 years ago

Awesome tool! Can ChatGPT be swapped out for some locally running LLM (Mistral or something) for added IP safety?

Itamar Friedman • 2 years ago

Many are asking for it, we will enable it! You are, of course, welcome to contribute a solution that enables that. There are a few examples -- we will follow the solution implemented in PR-Agent

Related videos

The first open-source implementation of the paper that will change automatic test generation is now available!

In February, Meta published a paper introducing a tool to automatically increase test coverage, guaranteeing improvements over an existing code base. This is a big deal, but Meta didn't release the code.

Fortunately, we now have Cover-Agent, an open-source tool you can install that implements Meta's paper to generate unit tests automatically:

I recorded a quick video showing Cover-Agent in action. There are two things I want to mention:

1. Automatically generating unit tests is not new, but doing it right is difficult. If you ask ChatGPT to do it, you'll get duplicate, non-working, and meaningless tests that don't improve your code. Meta's solution only generates unique tests that run and increase code coverage.

2. People who write tests before writing the code (TDD) will find this less helpful. That's okay. Not everyone does TDD, but we all need to improve test coverage.

There are many good and bad applications of AI, but this is one I'm looking forward to making part of my life.

Santiago

774,488 views • 2 years ago
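
The acceptance rule described in the post above (keep only generated tests that are unique, that run, and that increase coverage) can be sketched as a simple filter loop. This is a minimal illustration, not the actual code of Cover-Agent or TestGen-LLM: `CandidateTest`, `select_tests`, and the "set of covered line numbers" coverage model are hypothetical simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    """Hypothetical generated test: its name, whether it passes, and the lines it covers."""
    name: str
    passes: bool
    covered_lines: frozenset


def select_tests(candidates, baseline_covered):
    """Keep candidates that are new, pass, and cover at least one uncovered line."""
    covered = set(baseline_covered)
    seen_names = set()
    accepted = []
    for test in candidates:
        if test.name in seen_names:   # duplicate: skip
            continue
        seen_names.add(test.name)
        if not test.passes:           # non-working: skip
            continue
        new_lines = test.covered_lines - covered
        if not new_lines:             # no coverage gain: skip
            continue
        covered |= new_lines
        accepted.append(test)
    return accepted, covered


candidates = [
    CandidateTest("test_add", True, frozenset({1, 2})),
    CandidateTest("test_add", True, frozenset({1, 2})),      # duplicate
    CandidateTest("test_broken", False, frozenset({3, 4})),  # fails when run
    CandidateTest("test_noop", True, frozenset({1})),        # covers nothing new
    CandidateTest("test_div_zero", True, frozenset({5, 6})),
]
accepted, covered = select_tests(candidates, baseline_covered={1})
print([t.name for t in accepted])  # ['test_add', 'test_div_zero']
```

In a real setup the `passes` flag and `covered_lines` would come from actually running the test suite with a coverage tool; here they are hard-coded so the filter logic stays visible.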

New short course: Evaluating AI Agents! Evals are important for driving AI system improvements, and in this course you'll learn to systematically assess and improve an AI agent’s performance. This is built in partnership with Arize AI and taught by John Gilhuly, Head of Developer Relations, and , Director of Product. I've often found evals to be a critical tool in the agent development process - they can be the difference between picking the right thing to work on vs. wasting weeks of effort. Whether you’re building a shopping assistant, coding agent, or research assistant, having a structured evaluation process helps you refine its performance systematically, rather than relying on random trial and error. This course shows you how to structure your evals to assess the performance of each component of an agent and its end-to-end performance. For each component, you select the appropriate evaluators, test examples, and performance metrics. This helps you identify areas for improvement both during development and in production. (If you're familiar with error analysis in supervised learning, think of this as adapting those ideas to agentic workflows.) In this course, you'll build an AI agent, and add observability to visualize and debug its steps. You’ll learn about code-based evals, in which you write code explicitly to test a certain step, as well as LLM-as-a-Judge evals, in which you prompt an LLM to efficiently come up with ways to evaluate more open-ended outputs. In detail, you’ll: - Understand key differences between evaluating LLM-based systems and traditional software testing. - Add observability to an agent by collecting traces of the steps taken by the agent and visualizing them - Choose the appropriate evaluator - code-based, LLM-as-a-Judge, human-annotation based - for each component. - Compute a convergence score to evaluate if your agent can respond to a query in an efficient number of steps. 
- Run structured experiments to improve the agent’s performance by exploring changes to the prompt, LLM model, or the agent’s logic. - Understand how to deploy these evaluation techniques to monitor the agent’s performance in production. By the end of this course, you’ll know how to trace AI agents, systematically evaluate them, and improve their performance. Please sign up here:

Andrew Ng

126,406 views • 1 year ago
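
The post above mentions a convergence score for checking whether an agent answers a query in an efficient number of steps, but it does not define the metric. As a rough sketch, assume it is the shortest observed run divided by the average run length over repeated runs of the same query, so that 1.0 means every run took the shortest path. Both this definition and the function name `convergence_score` are assumptions, not taken from the course.

```python
def convergence_score(step_counts):
    """Shortest observed run divided by the average run length (assumed definition)."""
    if not step_counts:
        raise ValueError("need at least one run")
    if min(step_counts) <= 0:
        raise ValueError("step counts must be positive")
    optimal = min(step_counts)
    average = sum(step_counts) / len(step_counts)
    return optimal / average


runs = [4, 4, 6, 10]  # hypothetical step counts for the same query
print(round(convergence_score(runs), 3))  # 0.667
```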

IBM dropped CUGA, open-source enterprise agent to automate boring tasks 🔥 > given workspace files, it writes and executes code to accomplish any task 🤯 > comes with a ton of tools built for enterprise tasks, supports MCPs > plug in your favorite LLM 👏 here's a small demo where it retrieves info from a file, calculates revenue by writing code, and drafts an e-mail 🤯 they release code, a blog and a demo 🙌🏻 you can run this locally

merve

32,923 views • 7 months ago

The #1 problem with coding agents right now: Ask them to solve one problem, and they will make 10 other changes you didn't want. This happens to me every day. It happens to everyone I talk to as well.

We have a solution for this now. The team at Augment Code released a "Task List" feature for their coding assistant that solves this problem. Augment Code is partnering with me on this post.

In case you haven't used them before:

• Augment Code is a fully-fledged coding assistant
• Their specialty is large projects
• Fastest code indexing I've seen
• Has a free forever community edition

Now, you can ask their coding agent to generate a Task List before doing anything. This will give you a plan you can review, edit, and augment if you need to. You can export this plan, load it in a different session, or even share it across projects.

It makes a huge difference: The task list constrains the agent so you won't get any "unintended" changes anymore. It also puts you in control of everything the agent does.

Check the video to see the agent working through a task list. You can also try this 100% free: (By the way, they also have support for remote agents. You can basically have those agents write your code while you are sleeping.)

Santiago

41,738 views • 1 year ago

🔎🤖 LangSmith Insights Agent

Really excited to launch our first in-product agent. This agent lives inside LangSmith and combs through traces, giving you insights into:

🧑‍🤝‍🧑 how users are using your agent
⁉️ how your agent may be messing up
🛃 {your custom insight here}

The problem we saw was that people were launching agents... and didn't know how their users were actually using them! You put a chat box in front of people, and they may ask it anything. The surface area for agents is often super wide.

In addition, agents would fail silently. They could give a bad response. This wouldn't show up in error logs, but it's good to know.

If you know what to look for, you can set up LLM-as-a-judge evaluators. But what if you don't? (Most people don't initially.)

The best way to figure this out, as Hamel Husain says: "look at your data". But LLMs are really good at looking at your data! So can they do it for you? This is exactly what Insights Agent attempts to do.

It's live in LangSmith today. You can read more about it here:

Harrison Chase

98,520 views • 9 months ago

"Orgs and cos building MCP servers are taking an LLM-first approach to what the API needs to expose to the agent(s)." - Nikunj Handa from OpenAI

"For example, Stripe has a bunch of APIs that can be used to create a subscription/customer/product/price. For an LLM, it can just combine that into a single function."

"Instead of returning this massive JSON object, they can return something very specific to the task being solved, so that the LLM can more easily understand what's happening."

"It's an opportunity to rewrite your APIs to be very LLM-first. Why do 2 hours of work, when you can do it in 4 lines of code under a minute?"

TBPN

11,445 views • 1 year ago
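
The "LLM-first" idea quoted above (one task-level function instead of several granular calls, returning a short task-specific result instead of a large JSON object) might look roughly like this. Everything here is hypothetical: `FakeBillingAPI` is an in-memory stand-in, not Stripe's API or any MCP SDK, and `subscribe_customer` is just an illustrative name for the combined tool.

```python
import itertools


class FakeBillingAPI:
    """Hypothetical in-memory stand-in for a granular billing API."""

    def __init__(self):
        self._ids = itertools.count(1)

    def _create(self, kind, **fields):
        # Returns a verbose record, as a typical REST response would.
        return {"id": f"{kind}_{next(self._ids)}", "object": kind, "metadata": {}, **fields}

    def create_customer(self, email):
        return self._create("customer", email=email)

    def create_product(self, name):
        return self._create("product", name=name)

    def create_price(self, product_id, amount_cents):
        return self._create("price", product=product_id, unit_amount=amount_cents)

    def create_subscription(self, customer_id, price_id):
        return self._create("subscription", customer=customer_id, price=price_id, status="active")


def subscribe_customer(api, email, plan_name, monthly_cents):
    """Single LLM-facing tool: chains four calls and returns a task-specific summary."""
    customer = api.create_customer(email)
    product = api.create_product(plan_name)
    price = api.create_price(product["id"], monthly_cents)
    subscription = api.create_subscription(customer["id"], price["id"])
    return {
        "subscription_id": subscription["id"],
        "status": subscription["status"],
        "summary": f"{email} subscribed to {plan_name} at ${monthly_cents / 100:.2f}/month",
    }


result = subscribe_customer(FakeBillingAPI(), "ada@example.com", "Pro", 1999)
print(result["summary"])  # ada@example.com subscribed to Pro at $19.99/month
```

The model sees one tool with three arguments and gets back three fields, rather than orchestrating four calls and parsing four full records.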

Introducing Slide. Your AI agent for discovering and DMing creators on TikTok and IG.

Manual outreach takes 10+ hours a week. Influencer databases cost $3,000/month and grow stale. Slide grabs top creators in your niche and automatically DMs them from your account. In fact, we used Slide to source the creator for this video!

If you retweet this post and comment "slide", I'll DM you a code for a month for free! (You must be following so we can message you.)

Andros

10,966 views • 1 year ago

You can create an AI Agent that answers your email with a few clicks.

1. Go to ChatLLM
2. Click on AI Engineer
3. Select Create an AI Agent
4. Choose the Email Answering Agent

ChatLLM will do the rest: it will code, test, and deploy the agent for you. You can also create a custom agent in English.

The Agent Economy is coming (somebody should write a book and use this title). We are going to see examples like this, times 1,000, in 2025.

Just think about how many repetitive tasks you perform every day. Some of these tasks are involved enough that we couldn't automate them with pre-AI solutions. That's where we'll see agents explode, and I'm here for it.

Santiago

79,974 views • 1 year ago

Open source/weight models are often used in regulated industries like Health Care or Financial Services, where they handle personally identifiable data that can't be sent to proprietary LLM providers.

We recently chatted with Vaibhav (VB) Srivastav about the partnership between Visual Studio Code and Hugging Face inference providers that lets you use open-weight models directly in your IDE! Something that surprised me was just how fast inference providers like Cerebras are at generating code! In this episode we made a journaling CLI tool using Qwen!

I personally love open source and hope more of these models will become small enough to run locally!

Marlene Mhangami

18,215 views • 8 months ago

Today we are introducing 2 key features to JAIGP: AI Review & Open Prompting.

AI review is part of a 5-step process where papers get feedback from & are evaluated based on their ability to address that feedback. This means they are not stuck in an endless AI review loop.

Open prompting is a newer idea. We are making all the prompts we used to create JAIGP open, and we are also opening up the journal's rules to the community for suggestions. You can suggest the prompt we should run next!

So, if you have opinions about AI-generated papers, you can share them directly with us at

César A. Hidalgo

13,453 views • 4 months ago

This is not a drill - if you’re a VC, please look at what I just created. An agent that looks at my CRM and automatically reaches out to people I have not spoken to in over 1 year. Literally network watering on autopilot with Hyperagent. Here’s how it works: - Scans your CRM for last call date - Generates reconnect emails for each person completely personalized to the last convo we had - Automatically puts in your Gmail draft folder with their email for you to send (you can have it send without review too) - Reruns this every quarter to make sure I’m staying on top of my relationships Download the agent and use it yourself (+$200 credits): And let me know what you think!! #hyperagentpartner

Nicole DeTommaso 🪄

219,898 views • 1 day ago
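
The first two steps listed in the post above (scan the CRM for the last call date, then draft a reconnect email per stale contact) can be sketched as follows. This is not Hyperagent's implementation: the record fields, `stale_contacts`, and `draft_reconnect_email` are hypothetical, and a fixed template stands in for the LLM-personalized text the post describes.

```python
from datetime import date, timedelta


def stale_contacts(contacts, today, max_age_days=365):
    """Return contacts whose last call is more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in contacts if c["last_call"] < cutoff]


def draft_reconnect_email(contact):
    """Build a draft from a template (the post's agent uses an LLM for this step)."""
    return {
        "to": contact["email"],
        "subject": f"Catching up, {contact['name']}?",
        "body": (
            f"Hi {contact['name']}, last time we talked about {contact['last_topic']}. "
            "Would love to reconnect."
        ),
    }


crm = [  # hypothetical CRM records
    {"name": "Dana", "email": "dana@example.com",
     "last_call": date(2023, 1, 10), "last_topic": "seed round"},
    {"name": "Lee", "email": "lee@example.com",
     "last_call": date(2024, 11, 2), "last_topic": "hiring"},
]
drafts = [draft_reconnect_email(c) for c in stale_contacts(crm, today=date(2025, 1, 15))]
print([d["to"] for d in drafts])  # ['dana@example.com']
```

Keeping the output as drafts rather than sending directly mirrors the review step the post mentions.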

I created a demo of a bioautomation system that uses LLMs, Opentrons, and lua to create a dynamic programming environment for cloud labs or robot/human clusters. It can reason about its own code based off of lab measurements. Most importantly, I actually fucking implemented it, and it is open source. Took about 4 days for this rough draft, and it is very much a draft. The user inputs their task, the system creates code, and then executes it. The code defines control flow from data generated in the lab. Not only can it create code, but it can reason about things that could have gone wrong, run analysis using an internal sandbox, and then create new code based off of that analysis for execution. Timestamps: 0:00 - intro and code generation 3:04 - homebrewed replacement for Opentrons API for running all this code 5:26 - dynamic control flow using data 6:10 - LLM reasoning about a biological protocol and fixing it 10:15 - rant on the future of cloud labs and bioautomation I made this as a demo for how I think we should be thinking about building and scaling biology. I believe we can encode the tacit knowledge of a laboratory into the knowledge of an LLM, that we can do reinforcement learning off of results it creates, and that we must do that by leveraging a sufficient quantity of unique, useful, verifiable protocols. That doesn't come from just doing drug screens - it comes from doing basic everyday experiments and doing them well. Through the elimination of tacit knowledge necessary to physically operate a lab + proper batching + models writing code, I think we can make building biotechnology 10x-100x cheaper and easier than it is nowadays.

Keoni Gandall

21,412 views • 1 year ago

This is a very clever idea to use an LLM with your SQL data.

SQL + AI has been tried before, but one of the best parts of this solution is that it exposes exactly the same interface as OpenAI's completion API. In other words: You are now talking to an LLM that knows everything about your database and you don't even notice it!

I recorded a 2-min video to show you how it works. Thanks to the mindshub team for the collaboration, listening to my feedback, and helping me understand what they built.

Go to to try this yourself.

Santiago

230,496 views • 1 year ago

I cant believe this guy just made a permanent solution to context bloat and open sourced it all! when we tested this tool (Context+) for solving an issue on the OpenCode repository, the agent using this tool used ~6.5k fewer tokens, found the code and fixed it in half the time! the results were surprising: 6 to 10k tokens saved per prompt, completed task in ~2 minutes while the agent running without the tool took ~4 mins for the same and got stuck in loops bro built an entire beast by using all the modern tools that we could think of: undo trees, semantic search by meaning (by haskellforall), advanced refactoring, blast radius, advanced file context trees, restore points... i can keep going on semantic code search and context trees are the future of agentic coding and this tool proves it the feature i loved the most is semantic search and how it gets things done 2x faster with least possible tokens it makes an agent that actually knows what it’s doing and not just guessing, it makes meaning from your code similar to RAG. if you aren't optimizing your context, you are just burning money the developer says this tool is still under development, it can have unexpected behavior and the docs need updates but the video shows the reality of how fast it can be github: get here:

forloop

226,054 views • 4 months ago

AG-UI makes building agentic applications dramatically easier. Here's how it works. This is a model for a simple chatbot: User → LLM → Response But interactive agents that render UI, pause for approvals, and ask users for input need a much more complex model. When building these agents, a response from the LLM will include a series of state changes as the agent runs: • Agent started a task • Agent called a tool • Agent updated its state • Agent streams these tokens • Agent is waiting on a human • Agent is resuming the task The Agent-User Interaction Protocol (AG-UI) treats the LLM response as a stream of events rather than a text endpoint. In practice, here is what you get as an agent runs: 1. Lifecycle events so your UI knows where the agent is. 2. Text messages that stream tokens. 3. Tool calls so your UI can prefill a form with any required arguments. 4. State updates that keep your UI in sync with the agent. 5. Special events for human approvals, rich media, and custom needs. All of these events travel over standard transports (SSE, WebSockets, or plain HTTP) as JSON. As a result, you can build a frontend that stays in sync with the agent's progress without having to invent a custom process to make this happen. For example, building a human-in-the-loop workflow becomes an off-the-shelf component you can integrate rather than build from scratch. CopilotKit🪁 is the creator of AG-UI, and you can use it when building frontend applications pretty much anywhere: • React • Angular • Vue • React Native • Slack • Teams • Discord • WhatsApp • Telegram Here is the link for you to check it out: Thanks to the CopilotKit team for partnering with me on this post.

Santiago

17,438 views • 23 days ago
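
The event-stream model described in the post above can be sketched as a small reducer that folds JSON events into UI state. The event type names and fields used here (`run_started`, `text_delta`, `tool_call`, and so on) are illustrative assumptions, not the identifiers defined by the AG-UI specification, and the list of strings stands in for an SSE or WebSocket stream.

```python
import json


def apply_event(ui, event):
    """Fold one decoded event into a plain-dict UI state."""
    kind = event["type"]
    if kind == "run_started":          # lifecycle
        ui["status"] = "running"
    elif kind == "text_delta":         # streamed tokens
        ui["message"] += event["delta"]
    elif kind == "tool_call":          # lets the UI prefill a form with the arguments
        ui["tool_calls"].append({"name": event["name"], "args": event["args"]})
    elif kind == "state_update":       # keeps UI state in sync with the agent
        ui["agent_state"].update(event["state"])
    elif kind == "awaiting_human":     # human-in-the-loop pause
        ui["status"] = "waiting_for_approval"
    elif kind == "run_finished":       # lifecycle
        ui["status"] = "done"
    return ui


def render(stream_lines):
    """Consume a stream of JSON lines and return the resulting UI state."""
    ui = {"status": "idle", "message": "", "tool_calls": [], "agent_state": {}}
    for line in stream_lines:
        apply_event(ui, json.loads(line))
    return ui


stream = [
    '{"type": "run_started"}',
    '{"type": "text_delta", "delta": "Booking "}',
    '{"type": "text_delta", "delta": "your flight."}',
    '{"type": "tool_call", "name": "book_flight", "args": {"to": "LIS"}}',
    '{"type": "state_update", "state": {"step": 2}}',
    '{"type": "awaiting_human"}',
]
final = render(stream)
print(final["status"], "|", final["message"])  # waiting_for_approval | Booking your flight.
```

Because every change arrives as an event, the frontend never has to poll or guess where the agent is; it only applies events in order.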

Zuckerberg says Meta's Llama race changes in the next 12-18 months: most of the code will be written by AI - not autocomplete, but agents that run tests and improve the model loop. "We're not trying to build a general developer tool. We are trying to build a coding agent and an AI research agent that basically advances Llama research specifically." "Sometime in the next 12 to 18 months, we'll reach the point where most of the code that's going towards these efforts is written by AI." "And I don't mean autocomplete." "I'm talking more like you give it a goal, it can run tests, it can improve things, it can find issues." "It writes higher quality code than the average very good person on the team already." The hidden bottleneck is not chat UX. It is closing the loop between AI research, code, tests, and self-improvement inside the lab.

Karl Mehta

36,337 views • 1 month ago