Video wird geladen...

Video konnte nicht geladen werden

Beim Laden dieses Videos ist ein Problem aufgetreten. Dies könnte an einem vorübergehenden Netzwerkproblem liegen oder das Video ist möglicherweise nicht verfügbar.

📊 Learn how to observe & evaluate agents on LangChain Academy 📊 Testing applications is essential to the development lifecycle, but LLM systems are non-deterministic – you can’t always predict how they will behave. Add multi-turn interactions and tool-calling agents, and testing agents becomes even more complex than traditional... show more

LangChain

255,442 subscribers

13,518 Aufrufe • vor 6 Monaten •via X (Twitter)

Anya Rossi• Live Now

Private livecam show

0 Kommentare

Keine Kommentare verfügbar

Kommentare vom Original-Post werden hier angezeigt

Ähnliche Videos

🚀 New LangChain Academy Course: Building Reliable Agents 🚀 Shipping agents to production is hard. Traditional software is deterministic – when something breaks, you check the logs and fix the code. But agents rely on non-deterministic models. Add multi-step reasoning, tool use, and real user traffic, and building reliable agents becomes far more complex than traditional system design. The goal of this course is to teach you how to take an agent from first run to production-ready system through iterative cycles of improvement. You’ll learn how to do this with LangSmith, our agent engineering platform for observing, evaluating, and deploying agents. Enroll for free ➡️

🚀 New LangChain Academy Course: Building Reliable Agents 🚀 Shipping agents to production is hard. Traditional software is deterministic – when something breaks, you check the logs and fix the code. But agents rely on non-deterministic models. Add multi-step reasoning, tool use, and real user traffic, and building reliable agents becomes far more complex than traditional system design. The goal of this course is to teach you how to take an agent from first run to production-ready system through iterative cycles of improvement. You’ll learn how to do this with LangSmith, our agent engineering platform for observing, evaluating, and deploying agents. Enroll for free ➡️

LangChain

30,744 Aufrufe • vor 4 Monaten

Our latest LangChain Academy course – Building Ambient Agents with LangGraph – is now available! Most agents today handle one request at a time through chat interfaces. But as models have improved, agents can now run in the background – and take on long-running, complex tasks. LangGraph is built for these “ambient agents,” with support for human-in-the-loop workflows and memory. LangGraph Platform provides the infrastructure to run these agents at scale, and LangSmith helps you observe, evaluate, and improve your agents. Together, they make it easier to build reliable, production-ready agents. In this course, you’ll: 📩 Learn the fundamentals of LangGraph and how to build your own email agent 📊 Evaluate your agent using LangSmith ✅ Add human-in-the-loop for reviews, and memory so your agent adapts over time 🚀 Deploy your agent with LangGraph Platform and connect it to Gmail

Our latest LangChain Academy course – Building Ambient Agents with LangGraph – is now available! Most agents today handle one request at a time through chat interfaces. But as models have improved, agents can now run in the background – and take on long-running, complex tasks. LangGraph is built for these “ambient agents,” with support for human-in-the-loop workflows and memory. LangGraph Platform provides the infrastructure to run these agents at scale, and LangSmith helps you observe, evaluate, and improve your agents. Together, they make it easier to build reliable, production-ready agents. In this course, you’ll: 📩 Learn the fundamentals of LangGraph and how to build your own email agent 📊 Evaluate your agent using LangSmith ✅ Add human-in-the-loop for reviews, and memory so your agent adapts over time 🚀 Deploy your agent with LangGraph Platform and connect it to Gmail

LangChain

33,192 Aufrufe • vor 1 Jahr

🔗 New LangChain Academy Course: Introduction to LangChain (Python) 🔗 Learn how to build with LangChain – our open source framework that makes it easy to start building agents with any model provider. In this course, you’ll create agents that can reason, use tools, and take action, and learn how to debug their behavior with LangSmith. Along the way, you’ll: - Build an agent with the `create_agent` abstraction - Use LangChain’s core building blocks: Models, Messages, Memory, and Tools - Customize your agent with middleware - Debug your agent with LangSmith Observability & Studio By the end of the course, you’ll have assembled a full team of personal assistants. Enroll for free ➡️

🔗 New LangChain Academy Course: Introduction to LangChain (Python) 🔗 Learn how to build with LangChain – our open source framework that makes it easy to start building agents with any model provider. In this course, you’ll create agents that can reason, use tools, and take action, and learn how to debug their behavior with LangSmith. Along the way, you’ll: - Build an agent with the `create_agent` abstraction - Use LangChain’s core building blocks: Models, Messages, Memory, and Tools - Customize your agent with middleware - Debug your agent with LangSmith Observability & Studio By the end of the course, you’ll have assembled a full team of personal assistants. Enroll for free ➡️

LangChain

41,016 Aufrufe • vor 7 Monaten

Today, we’re shipping new ways to observe, analyze, and debug agents with LangSmith: • Polly: an AI assistant for AI engineering that helps you understand traces, threads, and improve prompts • LangSmith Fetch: a CLI for pulling trace & thread data straight into your terminal or coding agent Agents are running longer and getting more complex, which demands new debugging workflows beyond simple LLM apps. We wrote a blog on the trends behind this shift— and why tools like Polly and LangSmith Fetch are needed. Shipping reliable agents requires full visibility into agent behavior, with tooling that helps you reason over that data in the UI, the terminal, or alongside coding agents. 📔Learn more about Polly: 📔Learn more about LangSmith Fetch: 📔How to observe deep agents: 📽️ Polly video tutorial: 📽️ LangSmith Fetch video tutorial:

Today, we’re shipping new ways to observe, analyze, and debug agents with LangSmith: • Polly: an AI assistant for AI engineering that helps you understand traces, threads, and improve prompts • LangSmith Fetch: a CLI for pulling trace & thread data straight into your terminal or coding agent Agents are running longer and getting more complex, which demands new debugging workflows beyond simple LLM apps. We wrote a blog on the trends behind this shift— and why tools like Polly and LangSmith Fetch are needed. Shipping reliable agents requires full visibility into agent behavior, with tooling that helps you reason over that data in the UI, the terminal, or alongside coding agents. 📔Learn more about Polly: 📔Learn more about LangSmith Fetch: 📔How to observe deep agents: 📽️ Polly video tutorial: 📽️ LangSmith Fetch video tutorial:

LangChain

31,035 Aufrufe • vor 7 Monaten

✨New LangChain Academy Course: Learn to build with LangSmith Agent Builder 🪄 Learn how to build your own agents with LangSmith Agent Builder. Anyone can now build agents for complex daily tasks, without writing code. We built Agent Builder for anyone whose day gets swallowed by routine work. The research, follow-ups, updates, scheduling, and status checks that are essential to your operations, but quickly take over your calendar. Agent Builder is different from traditional workflow automations. You don’t need to map every step, tinker with if-this-then-that branching, or babysit dependencies. You just give it feedback like you would a teammate, and the agent learns using its memory. In this quickstart course, you'll learn how to: - Build an email agent - Understand what agents can do - Improve your agent - Set up triggers and your Agent Inbox Enroll for free ➡️

✨New LangChain Academy Course: Learn to build with LangSmith Agent Builder 🪄 Learn how to build your own agents with LangSmith Agent Builder. Anyone can now build agents for complex daily tasks, without writing code. We built Agent Builder for anyone whose day gets swallowed by routine work. The research, follow-ups, updates, scheduling, and status checks that are essential to your operations, but quickly take over your calendar. Agent Builder is different from traditional workflow automations. You don’t need to map every step, tinker with if-this-then-that branching, or babysit dependencies. You just give it feedback like you would a teammate, and the agent learns using its memory. In this quickstart course, you'll learn how to: - Build an email agent - Understand what agents can do - Improve your agent - Set up triggers and your Agent Inbox Enroll for free ➡️

LangChain

26,513 Aufrufe • vor 6 Monaten

LangChain Academy is live! Our first course — Introduction to LangGraph — teaches you the in-and-outs of building a reliable AI agent. In this course, you’ll learn how to: 🛠️ Build agents with LangGraph's graph-based workflows 🔄 Use memory + human-in-the-loop for smarter, self-corrective agents 📚 Create your own AI assistant that can perform knowledge tasks Enroll now for free ➡️ Bring LangChain Academy to your company ➡️

LangChain Academy is live! Our first course — Introduction to LangGraph — teaches you the in-and-outs of building a reliable AI agent. In this course, you’ll learn how to: 🛠️ Build agents with LangGraph's graph-based workflows 🔄 Use memory + human-in-the-loop for smarter, self-corrective agents 📚 Create your own AI assistant that can perform knowledge tasks Enroll now for free ➡️ Bring LangChain Academy to your company ➡️

LangChain

105,434 Aufrufe • vor 1 Jahr

🔥 New LangChain Academy Course: LangChain Essentials (Python & TypeScript) 🔥 Learn the basics of LangChain – our open source framework that makes it easy to start building agents with any model provider. Last week, we released LangChain 1.0. We’ve completely rewritten LangChain to be opinionated, focused, and powered by LangGraph’s runtime. It includes a new `create_agent` abstraction to build agents quickly, middleware for flexibility, and standard content blocks that work across any model provider. In this quickstart course, you'll learn how to: - Build an agent with the `create_agent` abstraction - Use LangChain’s core building blocks: Models, Messages, Memory, and Tools - Customize your agent with middleware - Debug and test your agent with LangSmith Observability & Evaluation Enroll for free ➡️

🔥 New LangChain Academy Course: LangChain Essentials (Python & TypeScript) 🔥 Learn the basics of LangChain – our open source framework that makes it easy to start building agents with any model provider. Last week, we released LangChain 1.0. We’ve completely rewritten LangChain to be opinionated, focused, and powered by LangGraph’s runtime. It includes a new `create_agent` abstraction to build agents quickly, middleware for flexibility, and standard content blocks that work across any model provider. In this quickstart course, you'll learn how to: - Build an agent with the `create_agent` abstraction - Use LangChain’s core building blocks: Models, Messages, Memory, and Tools - Customize your agent with middleware - Debug and test your agent with LangSmith Observability & Evaluation Enroll for free ➡️

LangChain

50,475 Aufrufe • vor 9 Monaten

Insights Agent & Multi-turn Evals Agents run for a long period of time and have multiple interactions with users, tackling a wide variety of problems. Today we’re launching two new features in LangSmith, our agent engineering platform, so you can better understand your agent behavior: • Insights Agent is a new agent that automatically categorizes agent behavior patterns in LangSmith • Multi-turn Evals lets you evaluate agent trajectory across complete conversations, so you can see if the agent actually accomplished user goals Learn more:

Insights Agent & Multi-turn Evals Agents run for a long period of time and have multiple interactions with users, tackling a wide variety of problems. Today we’re launching two new features in LangSmith, our agent engineering platform, so you can better understand your agent behavior: • Insights Agent is a new agent that automatically categorizes agent behavior patterns in LangSmith • Multi-turn Evals lets you evaluate agent trajectory across complete conversations, so you can see if the agent actually accomplished user goals Learn more:

LangChain

13,322 Aufrufe • vor 9 Monaten

🚀 Announcing LangSmith Skills + CLI 🚀 Agent improvements are increasingly driven by coding agents themselves. We're releasing LangSmith Skills alongside the LangSmith CLI to make coding agents experts at the agent engineering lifecycle. LangSmith Skills enable agents to debug traces, create datasets, and run experiments - and thanks to the CLI, agents are able to do it all natively through the terminal, where they're most comfortable. Try out LangSmith Skills and the CLI with your own coding agents! ➡️ Skills: ➡️ CLI:

🚀 Announcing LangSmith Skills + CLI 🚀 Agent improvements are increasingly driven by coding agents themselves. We're releasing LangSmith Skills alongside the LangSmith CLI to make coding agents experts at the agent engineering lifecycle. LangSmith Skills enable agents to debug traces, create datasets, and run experiments - and thanks to the CLI, agents are able to do it all natively through the terminal, where they're most comfortable. Try out LangSmith Skills and the CLI with your own coding agents! ➡️ Skills: ➡️ CLI:

LangChain

49,572 Aufrufe • vor 4 Monaten

🔥 Hear how Cisco automated 60% of 1.8 million support cases with LangGraph, LangSmith, and LangGraph Platform 🔥 In his LangChain Interrupt talk, Carlos Pereira, Fellow & Chief Architect at Cisco, shares Cisco’s blueprint for transforming customer experience with AI agents. In this video, you’ll learn about: - How Cisco’s Customer Experience team identifies and prioritizes high-impact AI use cases - Their supervisor architecture that routes complex queries across specialized agents - How LangGraph, LangGraph Platform, and LangSmith power the development and continuous improvement of these agents Watch the video:

🔥 Hear how Cisco automated 60% of 1.8 million support cases with LangGraph, LangSmith, and LangGraph Platform 🔥 In his LangChain Interrupt talk, Carlos Pereira, Fellow & Chief Architect at Cisco, shares Cisco’s blueprint for transforming customer experience with AI agents. In this video, you’ll learn about: - How Cisco’s Customer Experience team identifies and prioritizes high-impact AI use cases - Their supervisor architecture that routes complex queries across specialized agents - How LangGraph, LangGraph Platform, and LangSmith power the development and continuous improvement of these agents Watch the video:

LangChain

45,405 Aufrufe • vor 1 Jahr

🔥 Our latest LangChain Academy course – Deep Research with LangGraph – is now live! 🔥 Deep research agents are taking off – from major AI labs to companies building their own. Research is inherently open-ended. You can't always predict whether a question needs broad exploration or deep analysis. Agents excel here because they adapt on the fly, using each finding to decide where to dig next. Building these systems ourselves and with customers, we've learned that structure matters. The best research agents scope problems with users first. Then, they coordinate multiple specialists instead of overwhelming one generalist. LangGraph is built for these types of long-running, multi-agent workflows. Its persistence layer tracks progress across agents. LangSmith gives you the observability and evaluation tools you need to track and improve performance. In this course, you'll build and evaluate: - A user scoping agent to define research parameters - A multi-agent research team with supervisor - And add tool integration via MCP Enroll for free ➡️

🔥 Our latest LangChain Academy course – Deep Research with LangGraph – is now live! 🔥 Deep research agents are taking off – from major AI labs to companies building their own. Research is inherently open-ended. You can't always predict whether a question needs broad exploration or deep analysis. Agents excel here because they adapt on the fly, using each finding to decide where to dig next. Building these systems ourselves and with customers, we've learned that structure matters. The best research agents scope problems with users first. Then, they coordinate multiple specialists instead of overwhelming one generalist. LangGraph is built for these types of long-running, multi-agent workflows. Its persistence layer tracks progress across agents. LangSmith gives you the observability and evaluation tools you need to track and improve performance. In this course, you'll build and evaluate: - A user scoping agent to define research parameters - A multi-agent research team with supervisor - And add tool integration via MCP Enroll for free ➡️

LangChain

61,205 Aufrufe • vor 11 Monaten

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents. Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results. You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production. In detail, you’ll learn: - How agentic systems have evolved, gaining greater levels of agency over time—and why code agents are a next step. - How code agents write their actions in code. - When code agents outperform function-calling agents. - How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B. - To trace, debug, and assess the code agent to optimize its behaviours for complex requests. - How to build a research multi-agent system that can find information online and organize it into an interactive report. By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents. Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results. You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production. In detail, you’ll learn: - How agentic systems have evolved, gaining greater levels of agency over time—and why code agents are a next step. - How code agents write their actions in code. - When code agents outperform function-calling agents. - How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B. - To trace, debug, and assess the code agent to optimize its behaviours for complex requests. - How to build a research multi-agent system that can find information online and organize it into an interactive report. By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

127,724 Aufrufe • vor 1 Jahr

🌟Our latest LangChain Academy course – Deep Agents with LangGraph – is now live!🌟 Many agents today follow the same simple pattern: run in a loop, call tools. That architecture works well enough, but it breaks down as tasks get more complex. Today, companies of all sizes – from startups to large enterprises – are building their own Deep Agents. These agents dive deeper. They’re able to plan complex tasks and carry them out over longer time horizons. There are four key features that set Deep Agents apart from regular agents: 1. Planning – keeps agents on track 2. File system – allows agents to offload context 3. Sub-agents – act as focused specialists 4. Prompting – provides agents with detailed instructions Our latest LangChain Academy course, Deep Agents with LangGraph, shows you how to combine these pieces with LangGraph to orchestrate long-running, multi-agent workflows. Big thanks to community member Dmitry Labazkin for helping us shape this course with his contributions! Enroll for free ➡️

🌟Our latest LangChain Academy course – Deep Agents with LangGraph – is now live!🌟 Many agents today follow the same simple pattern: run in a loop, call tools. That architecture works well enough, but it breaks down as tasks get more complex. Today, companies of all sizes – from startups to large enterprises – are building their own Deep Agents. These agents dive deeper. They’re able to plan complex tasks and carry them out over longer time horizons. There are four key features that set Deep Agents apart from regular agents: 1. Planning – keeps agents on track 2. File system – allows agents to offload context 3. Sub-agents – act as focused specialists 4. Prompting – provides agents with detailed instructions Our latest LangChain Academy course, Deep Agents with LangGraph, shows you how to combine these pieces with LangGraph to orchestrate long-running, multi-agent workflows. Big thanks to community member Dmitry Labazkin for helping us shape this course with his contributions! Enroll for free ➡️

LangChain

63,349 Aufrufe • vor 10 Monaten

⭐️ New LangChain Academy Course: LangGraph Essentials (Python & TypeScript)⭐️ Learn the basics of LangGraph – our low-level orchestration framework purpose-built for building AI agents. Last week, we launched LangGraph 1.0, marking its first major stable release. LangGraph allows you to control every step of your agent’s workflow. It supports memory, human-in-the-loop interactions, and durable execution for managing long-running tasks. In this quickstart course, you'll learn how to: - Create simple workflows and build an agent - Use LangGraph’s core building blocks: State, Nodes, and Edges - Add memory to your agent - Incorporate human-in-the-loop interactions Enroll for free ➡️

⭐️ New LangChain Academy Course: LangGraph Essentials (Python & TypeScript)⭐️ Learn the basics of LangGraph – our low-level orchestration framework purpose-built for building AI agents. Last week, we launched LangGraph 1.0, marking its first major stable release. LangGraph allows you to control every step of your agent’s workflow. It supports memory, human-in-the-loop interactions, and durable execution for managing long-running tasks. In this quickstart course, you'll learn how to: - Create simple workflows and build an agent - Use LangGraph’s core building blocks: State, Nodes, and Edges - Add memory to your agent - Incorporate human-in-the-loop interactions Enroll for free ➡️

LangChain

53,969 Aufrufe • vor 9 Monaten

🚀Announcing LangGraph Studio: The first agent IDE LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications With visual graphs and the ability to edit state, you can better understand agent workflows and iterate faster. LangGraph Studio integrates with LangSmith so you can collaborate with teammates to debug failure modes LangGraph Studio is available for free to all LangSmith users on any plan tier during its early development. Read more about it here: Watch a YouTube walkthrough: Try out LangGraph Studio for free here: Sign up for a LangSmith account:

🚀Announcing LangGraph Studio: The first agent IDE LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications With visual graphs and the ability to edit state, you can better understand agent workflows and iterate faster. LangGraph Studio integrates with LangSmith so you can collaborate with teammates to debug failure modes LangGraph Studio is available for free to all LangSmith users on any plan tier during its early development. Read more about it here: Watch a YouTube walkthrough: Try out LangGraph Studio for free here: Sign up for a LangSmith account:

LangChain

185,861 Aufrufe • vor 2 Jahren

🎨 Introducing Prompt Canvas — a novel UX for prompt engineering Building LLM applications requires new and dedicated tools for prompt engineering. With Prompt Canvas in LangSmith, you can: • Collaborate with an AI agent to draft, refine, and edit your prompts • Define custom quick actions to standardize prompting strategies used across the org ✍️ Learn more: ➡️ Try it out:

🎨 Introducing Prompt Canvas — a novel UX for prompt engineering Building LLM applications requires new and dedicated tools for prompt engineering. With Prompt Canvas in LangSmith, you can: • Collaborate with an AI agent to draft, refine, and edit your prompts • Define custom quick actions to standardize prompting strategies used across the org ✍️ Learn more: ➡️ Try it out:

LangChain

43,733 Aufrufe • vor 1 Jahr

New short course: Evaluating AI Agents! Evals are important for driving AI system improvements, and in this course you'll learn to systematically assess and improve an AI agent’s performance. This is built in partnership with Arize AI and taught by John Gilhuly, Head of Developer Relations, and , Director of Product. I've often found evals to be a critical tool in the agent development process - they can be the difference between picking the right thing to work on vs. wasting weeks of effort. Whether you’re building a shopping assistant, coding agent, or research assistant, having a structured evaluation process helps you refine its performance systematically, rather than relying on random trial and error. This course shows you how to structure your evals to assess the performance of each component of an agent and its end-to-end performance. For each component, you select the appropriate evaluators, test examples, and performance metrics. This helps you identify areas for improvement both during development and in production. (If you're familiar with error analysis in supervised learning, think of this as adapting those ideas to agentic workflows.) In this course, you'll build an AI agent, and add observability to visualize and debug its steps. You’ll learn about code-based evals, in which you write code explicitly to test a certain step, as well as LLM-as-a-Judge evals, in which you prompt an LLM to efficiently come up with ways to evaluate more open-ended outputs. In detail, you’ll: - Understand key differences between evaluating LLM-based systems and traditional software testing. - Add observability to an agent by collecting traces of the steps taken by the agent and visualizing them - Choose the appropriate evaluator - code-based, LLM-as-a-Judge, human-annotation based - for each component. - Compute a convergence score to evaluate if your agent can respond to a query in an efficient number of steps. - Run structured experiments to improve the agent’s performance by exploring changes to the prompt, LLM model, or the agent’s logic. - Understand how to deploy these evaluation techniques to monitor the agent’s performance in production. By the end of this course, you’ll know how to trace AI agents, systematically evaluate them, and improve their performance. Please sign up here:

New short course: Evaluating AI Agents! Evals are important for driving AI system improvements, and in this course you'll learn to systematically assess and improve an AI agent’s performance. This is built in partnership with Arize AI and taught by John Gilhuly, Head of Developer Relations, and , Director of Product. I've often found evals to be a critical tool in the agent development process - they can be the difference between picking the right thing to work on vs. wasting weeks of effort. Whether you’re building a shopping assistant, coding agent, or research assistant, having a structured evaluation process helps you refine its performance systematically, rather than relying on random trial and error. This course shows you how to structure your evals to assess the performance of each component of an agent and its end-to-end performance. For each component, you select the appropriate evaluators, test examples, and performance metrics. This helps you identify areas for improvement both during development and in production. (If you're familiar with error analysis in supervised learning, think of this as adapting those ideas to agentic workflows.) In this course, you'll build an AI agent, and add observability to visualize and debug its steps. You’ll learn about code-based evals, in which you write code explicitly to test a certain step, as well as LLM-as-a-Judge evals, in which you prompt an LLM to efficiently come up with ways to evaluate more open-ended outputs. In detail, you’ll: - Understand key differences between evaluating LLM-based systems and traditional software testing. - Add observability to an agent by collecting traces of the steps taken by the agent and visualizing them - Choose the appropriate evaluator - code-based, LLM-as-a-Judge, human-annotation based - for each component. - Compute a convergence score to evaluate if your agent can respond to a query in an efficient number of steps. - Run structured experiments to improve the agent’s performance by exploring changes to the prompt, LLM model, or the agent’s logic. - Understand how to deploy these evaluation techniques to monitor the agent’s performance in production. By the end of this course, you’ll know how to trace AI agents, systematically evaluate them, and improve their performance. Please sign up here:

Andrew Ng

126,450 Aufrufe • vor 1 Jahr

🚀 We just shipped a major update to LangSmith Agent Builder: • New agent chat: One always-available agent with access to all your workspace tools • Chat → Agent: Turn any conversation into a specialized agent with one click • File uploads: Attach files directly to Agent Builder • Tool registry: Add, authenticate, and manage your tools in one place Try it now: Learn more:

🚀 We just shipped a major update to LangSmith Agent Builder: • New agent chat: One always-available agent with access to all your workspace tools • Chat → Agent: Turn any conversation into a specialized agent with one click • File uploads: Attach files directly to Agent Builder • Tool registry: Add, authenticate, and manage your tools in one place Try it now: Learn more:

LangChain

24,885,978 Aufrufe • vor 5 Monaten

Sharing our latest short course: Building and Evaluating Data Agents, created in collaboration with Snowflake and taught by Anupam Datta (Anupam Datta) and Josh Reini (Josh Reini). A data agent extracts data from sources such as files or databases, analyzes it, and provides insights and visualizes its findings. But most data agents struggle with reliability or can't handle multi-step reasoning. In this course, you'll learn to build, trace, and evaluate a multi-agent workflow that plans tasks, pulls context from structured and unstructured data, performs web search, and summarizes or visualizes the final results. Learn more and enroll for free!

Sharing our latest short course: Building and Evaluating Data Agents, created in collaboration with Snowflake and taught by Anupam Datta (Anupam Datta) and Josh Reini (Josh Reini). A data agent extracts data from sources such as files or databases, analyzes it, and provides insights and visualizes its findings. But most data agents struggle with reliability or can't handle multi-step reasoning. In this course, you'll learn to build, trace, and evaluate a multi-agent workflow that plans tasks, pulls context from structured and unstructured data, performs web search, and summarizes or visualizes the final results. Learn more and enroll for free!

DeepLearning.AI

40,810 Aufrufe • vor 10 Monaten