Loading video...

Video Failed to Load

There was a problem loading this video. This could be due to a temporary network issue or the video might be unavailable.

Our first short course with Anthropic! Building Towards Computer Use with Anthropic. This teaches you to build an LLM-based agent that uses a computer interface by generating mouse clicks and keystrokes. Computer Use is an important, emerging capability for LLMs that will let AI agents do many more tasks... than were possible before, since it lets them interact with interfaces designed for humans to use, rather than only tools that provide explicit API access. I hope you will enjoy learning about it! This course is taught by Anthropic's Head of Curriculum, Colt_Steele. You'll learn to apply image reasoning and tool use to "use" a computer as follows: a model processes an image of the screen, analyzes it to understand what's going on, and navigates the computer via mouse clicks and keystrokes. This course goes through the key building blocks, and culminates in a demo of an AI assistant that uses a web browser to search for a research paper, downloads the PDF, and finally summarizes the paper for you. In detail, you’ll: - Learn about Anthropic's family of models, when to use which one, and make API requests to Claude - Use multi-modal prompts that combine text and image content blocks, and also work with streaming responses - Improve your prompting by using prompt templates, using XML to structure prompts, and providing examples - Implement prompt caching to reduce cost and latency - Apply tool-use to build a chatbot that can call different tools to respond to queries - See all these building blocks come together in Computer Use demo Please sign up here:show more

Andrew Ng

1,635,552 subscribers

170,305 views • 1 year ago •via X (Twitter)

Education Science & Technology News & Politics

Anya Rossi• Live Now

Private livecam show

11 Comments

daniel ostertag1 year ago

@AnthropicAI Colt works for anthropic? Anyone else get started learning web dev with his course? Probably one of the first courses I ever paid for.

Fast Company1 year ago

#AI agents: from tools to teammates. Software that can act, react, and interact with both humans and other agents will transform how businesses operate—and how they should invest in technology. This @EY_US article explains more. #ad

Vincent Valentine (CEO of UnOpen.ai)1 year ago

@AnthropicAI Exciting to see such innovative courses being offered.

Robert1 year ago

@AnthropicAI Thank you! Looking forward to this one and the many to follow!

Ethan_AI Marketer for 𝕏1 year ago

@AnthropicAI this sounds like a wild ride for LLMs, lol

Anthara Fairooz1 year ago

@AnthropicAI Exciting course! Building AI agents to interact with computer interfaces is a game-changer. Can't wait to see the demos!

Nitin Panwar1 year ago

@AnthropicAI Teaching AI to navigate human interfaces instead of waiting for APIs feels like the moment we taught it to walk instead of just teleport, exciting and transformative!

RyanRejoice1 year ago

@AnthropicAI Claude Bros unit! Let's go learn how to use Claude even better.

Navigate1 year ago

@AnthropicAI Fascinating! 🤔

Cheeku Tripathi1 year ago

@AnthropicAI ability to interact with human interfaces via mouse clicks and keystrokes opens up new possibilities

icecreambar1 year ago

@AnthropicAI Hi Andrew. I took your ML class in 2013 on Coursera and wanted to give thanks. Does this course cover Anthropic’s MCP?

Related Videos

I’m excited to kick off the first of our short courses focused on agents, starting with Building Agentic RAG with LlamaIndex, taught by Jerry Liu, CEO of LlamaIndex 🦙. This covers an important shift in RAG (retrieval augmented generation), in which rather than having the developer write explicit routines to retrieve information to feed into the LLM context, we instead build a RAG agent that that has access to tools for retrieving information. This lets the agent decide what information to fetch, and enables it to answer more complex questions using multi-step reasoning. In detail, you'll learn about: - Routing: Where your agent will use decision-making to route requests to multiple tools. - Tool Use: Where you'll create an interface for agents to select what tool (function call) to use as well as generate the right arguments. - Multi-step reasoning with tool use: Where you'll use an LLM to carry out multiple steps of reasoning, while retaining memory throughout the process. You’ll also learn how to step through what your agent is doing to debug and improve it iteratively. It’s an exciting time to build agents. Sign up and get started here!

I’m excited to kick off the first of our short courses focused on agents, starting with Building Agentic RAG with LlamaIndex, taught by Jerry Liu, CEO of LlamaIndex 🦙. This covers an important shift in RAG (retrieval augmented generation), in which rather than having the developer write explicit routines to retrieve information to feed into the LLM context, we instead build a RAG agent that that has access to tools for retrieving information. This lets the agent decide what information to fetch, and enables it to answer more complex questions using multi-step reasoning. In detail, you'll learn about: - Routing: Where your agent will use decision-making to route requests to multiple tools. - Tool Use: Where you'll create an interface for agents to select what tool (function call) to use as well as generate the right arguments. - Multi-step reasoning with tool use: Where you'll use an LLM to carry out multiple steps of reasoning, while retaining memory throughout the process. You’ll also learn how to step through what your agent is doing to debug and improve it iteratively. It’s an exciting time to build agents. Sign up and get started here!

Andrew Ng

297,053 views • 2 years ago

OpenAI just announced API access to o1 (advanced reasoning model) yesterday. I'm delighted to announce today a new short course, Reasoning with o1, built with OpenAI, and taught by Colin Jarvis, Head of AI Solutions at OpenAI, to show you how to use this effectively! Unlike previous language models which generate output directly, o1 “thinks before it responds,” and generates many reasoning tokens before returning a more thoughtful and accurate response. It is great at complex reasoning -- including planning for agentic workflows, coding, and domain-specific reasoning in STEM fields like law. But how you should use it is quite different from other LLMs. I think o1 will be a game changer for many AI applications; and in this course, you'll learn how to use it effectively. In detail, you’ll: - Learn to recognize what tasks o1 is suited for, and when to use a smaller model, or combine o1 with a smaller model - Understand the new principles of prompting reasoning models: Be simple and direct; no explicit chain-of-thought required; use structure; show rather than tell - Implement multi-step orchestration in which o1 plans, and hands tasks over to gpt-4o-mini to execute specific steps; this illustrates a design pattern to optimize intelligence (accuracy) and cost - Use o1 for a coding task to build a new application, edit existing code, and test performance by running a coding competition between o1-mini and GPT 4o - Use o1 for image understanding and learn how it performs better with a "hierarchy of reasoning," in which it incurs the latency and cost upfront, preprocessing the image and indexing it with rich details so it can be used for Q&A later - Learn a technique called meta-prompting, in which you use o1 to improve your prompts. Using a customer support evaluation set, you'll iteratively use o1 to modify a prompt to improve performance You'll also learn about how OpenAI used reinforcement learning to produce a model that uses "test-time compute" to improve performance. I think you'll find this course enjoyable and valuable. Please sign up for it here:

OpenAI just announced API access to o1 (advanced reasoning model) yesterday. I'm delighted to announce today a new short course, Reasoning with o1, built with OpenAI, and taught by Colin Jarvis, Head of AI Solutions at OpenAI, to show you how to use this effectively! Unlike previous language models which generate output directly, o1 “thinks before it responds,” and generates many reasoning tokens before returning a more thoughtful and accurate response. It is great at complex reasoning -- including planning for agentic workflows, coding, and domain-specific reasoning in STEM fields like law. But how you should use it is quite different from other LLMs. I think o1 will be a game changer for many AI applications; and in this course, you'll learn how to use it effectively. In detail, you’ll: - Learn to recognize what tasks o1 is suited for, and when to use a smaller model, or combine o1 with a smaller model - Understand the new principles of prompting reasoning models: Be simple and direct; no explicit chain-of-thought required; use structure; show rather than tell - Implement multi-step orchestration in which o1 plans, and hands tasks over to gpt-4o-mini to execute specific steps; this illustrates a design pattern to optimize intelligence (accuracy) and cost - Use o1 for a coding task to build a new application, edit existing code, and test performance by running a coding competition between o1-mini and GPT 4o - Use o1 for image understanding and learn how it performs better with a "hierarchy of reasoning," in which it incurs the latency and cost upfront, preprocessing the image and indexing it with rich details so it can be used for Q&A later - Learn a technique called meta-prompting, in which you use o1 to improve your prompts. Using a customer support evaluation set, you'll iteratively use o1 to modify a prompt to improve performance You'll also learn about how OpenAI used reinforcement learning to produce a model that uses "test-time compute" to improve performance. I think you'll find this course enjoyable and valuable. Please sign up for it here:

Andrew Ng

357,401 views • 1 year ago

OpenAI just released access to Computer Use Agent via an API. It combines GPT-4o vision with advanced reasoning for an AI Agent to simulate controlling computer interfaces and performing tasks just like humans. You can try it for free using this Agent Playground.

OpenAI just released access to Computer Use Agent via an API. It combines GPT-4o vision with advanced reasoning for an AI Agent to simulate controlling computer interfaces and performing tasks just like humans. You can try it for free using this Agent Playground.

Shubham Saboo

64,206 views • 1 year ago

🔗 New LangChain Academy Course: Introduction to LangChain (Python) 🔗 Learn how to build with LangChain – our open source framework that makes it easy to start building agents with any model provider. In this course, you’ll create agents that can reason, use tools, and take action, and learn how to debug their behavior with LangSmith. Along the way, you’ll: - Build an agent with the `create_agent` abstraction - Use LangChain’s core building blocks: Models, Messages, Memory, and Tools - Customize your agent with middleware - Debug your agent with LangSmith Observability & Studio By the end of the course, you’ll have assembled a full team of personal assistants. Enroll for free ➡️

🔗 New LangChain Academy Course: Introduction to LangChain (Python) 🔗 Learn how to build with LangChain – our open source framework that makes it easy to start building agents with any model provider. In this course, you’ll create agents that can reason, use tools, and take action, and learn how to debug their behavior with LangSmith. Along the way, you’ll: - Build an agent with the `create_agent` abstraction - Use LangChain’s core building blocks: Models, Messages, Memory, and Tools - Customize your agent with middleware - Debug your agent with LangSmith Observability & Studio By the end of the course, you’ll have assembled a full team of personal assistants. Enroll for free ➡️

LangChain

41,016 views • 6 months ago

Imagine a computer where you don’t need to learn 10 apps to get work done. You just tell it what you want, and it adapts to how you work. I tested Happycapy with a real use case and created an image and a video. No coding, no complex software. Do this: - Build automations that run on schedule - Deploy agent teams that work for you AI molds how you work, not the other way around. If you can use a computer, you can make anything happen. Start your first Automation and Agent teams today.

Imagine a computer where you don’t need to learn 10 apps to get work done. You just tell it what you want, and it adapts to how you work. I tested Happycapy with a real use case and created an image and a video. No coding, no complex software. Do this: - Build automations that run on schedule - Deploy agent teams that work for you AI molds how you work, not the other way around. If you can use a computer, you can make anything happen. Start your first Automation and Agent teams today.

Aaliya

17,042 views • 4 months ago

Anthropic just announced Computer Use It allows Claude to control your computer screen based on a prompt and take actions on your behalf The use cases in agentic coding with automated debugging, customer support, and education are going to be INSANE

Anthropic just announced Computer Use It allows Claude to control your computer screen based on a prompt and take actions on your behalf The use cases in agentic coding with automated debugging, customer support, and education are going to be INSANE

Rowan Cheung

1,024,281 views • 1 year ago

New short course: Long-Term Agentic Memory with LangGraph. Learn to build an agent with long-term memory in this course developed in collaboration with taught by its Co-Founder and CEO, Harrison Chase! Personal assistance and productivity tasks have become important use cases for agents. An important feature of an AI assistant, such as a coding or calendar assistant, is its ability to keep improving over time from its experience. Agent memory is the key capability that enables this. To add memory to an agent, you must first figure out what to store and what to retrieve when it is time to use the information. Additionally, you’ll have to decide when to update the stored information. For example, you might update in each iteration loop of the agent or perform updates in the background, with a helper agent. In this course, you will learn a mental framework to build agents with long-term memory. You'll create a useful email assistant that can respond, ignore, and notify using writing, scheduling, and memory-management tools. You’ll develop your agent's memory by adding facts to its memory store, provide examples to learn the user's preferences, and optimize system prompts to evolve instructions based on previous responses. In detail, you’ll: - Learn how the three types of memory--semantic, episodic, and procedural–and the two update mechanisms–via hot path and in the background–apply to your agents. - Build an email agent with writing, scheduling, and availability tools, along with a router that triages incoming email and handles it accordingly by ignoring, responding, or notifying the user. - Add tools to your email agent that allow it to operate on semantic memory by learning facts about the user, storing them in a long-term memory store, and searching over them in future interactions. - Incorporate episodic memory, in the form of few-shot examples, in the triage step of your agents to help them learn and update user preferences. - Add procedural memory as system prompts, optimized with feedback to improve the instructions the agent follows. Learn how to approach memory in agents, and start building agents with long-term memory with LangGraph! Please sign up here:

New short course: Long-Term Agentic Memory with LangGraph. Learn to build an agent with long-term memory in this course developed in collaboration with taught by its Co-Founder and CEO, Harrison Chase! Personal assistance and productivity tasks have become important use cases for agents. An important feature of an AI assistant, such as a coding or calendar assistant, is its ability to keep improving over time from its experience. Agent memory is the key capability that enables this. To add memory to an agent, you must first figure out what to store and what to retrieve when it is time to use the information. Additionally, you’ll have to decide when to update the stored information. For example, you might update in each iteration loop of the agent or perform updates in the background, with a helper agent. In this course, you will learn a mental framework to build agents with long-term memory. You'll create a useful email assistant that can respond, ignore, and notify using writing, scheduling, and memory-management tools. You’ll develop your agent's memory by adding facts to its memory store, provide examples to learn the user's preferences, and optimize system prompts to evolve instructions based on previous responses. In detail, you’ll: - Learn how the three types of memory--semantic, episodic, and procedural–and the two update mechanisms–via hot path and in the background–apply to your agents. - Build an email agent with writing, scheduling, and availability tools, along with a router that triages incoming email and handles it accordingly by ignoring, responding, or notifying the user. - Add tools to your email agent that allow it to operate on semantic memory by learning facts about the user, storing them in a long-term memory store, and searching over them in future interactions. - Incorporate episodic memory, in the form of few-shot examples, in the triage step of your agents to help them learn and update user preferences. - Add procedural memory as system prompts, optimized with feedback to improve the instructions the agent follows. Learn how to approach memory in agents, and start building agents with long-term memory with LangGraph! Please sign up here:

Andrew Ng

131,640 views • 1 year ago

WOW! I'm so excited about this. OpenAI Developers said Codex was good at Computer Use, but I wasn't prepared for this. For the last two weeks I've been working on a Computer Use skill to work with Linux. And while I had some success, it was a pretty frustrating experience. That is...until the breakthrough. Using accessibility tools, Codex can now control my entire computer, not just the browser. There are limits to this, of course, but what a time to be alive. This Computer Use skill will unlock and entirely new set of automations, all powered by Codex. Demonstration below 👇

WOW! I'm so excited about this. OpenAI Developers said Codex was good at Computer Use, but I wasn't prepared for this. For the last two weeks I've been working on a Computer Use skill to work with Linux. And while I had some success, it was a pretty frustrating experience. That is...until the breakthrough. Using accessibility tools, Codex can now control my entire computer, not just the browser. There are limits to this, of course, but what a time to be alive. This Computer Use skill will unlock and entirely new set of automations, all powered by Codex. Demonstration below 👇

am.will

79,589 views • 3 months ago

Build and customize complex AI applications with a flexible framework in this new short course, Building AI Applications with Haystack. Created in collaboration with deepset, makers of Haystack, and taught by Tuana, who is the developer relations lead for Haystack at deepset. Generative AI technology is changing rapidly and it can be challenging to integrate APIs from different LLMs, vector databases, and various tools such as web search. In this course, you will learn how to use the Haystack framework to make your development process more modular, allowing you to manage complexity and focus more on building your application. In detail, you’ll: - Build a RAG pipeline using Haystack’s main building blocks – components, pipelines, and document stores. - Create custom components in your pipeline by building a Hacker News summarizer that extends your app’s ability to access APIs. - Use conditional routing to create a branching pipeline with a fallback to web search mechanism when the LLM does not have the necessary context to respond to the user's query. - Build a self-reflecting agent for named entity recognition that loops using an output validator custom component. - Create a chat agent using OpenAI's function-calling capabilities which allow you to provide Haystack pipelines as tools to the LLM, enhancing that agent's capabilities. By the end of this course, you will learn a high-level orchestration framework that can help make your applications flexible, extendible, and maintainable, even as the technology stack changes, new user needs arise, and you add new features to your application. Please sign up here:

Build and customize complex AI applications with a flexible framework in this new short course, Building AI Applications with Haystack. Created in collaboration with deepset, makers of Haystack, and taught by Tuana, who is the developer relations lead for Haystack at deepset. Generative AI technology is changing rapidly and it can be challenging to integrate APIs from different LLMs, vector databases, and various tools such as web search. In this course, you will learn how to use the Haystack framework to make your development process more modular, allowing you to manage complexity and focus more on building your application. In detail, you’ll: - Build a RAG pipeline using Haystack’s main building blocks – components, pipelines, and document stores. - Create custom components in your pipeline by building a Hacker News summarizer that extends your app’s ability to access APIs. - Use conditional routing to create a branching pipeline with a fallback to web search mechanism when the LLM does not have the necessary context to respond to the user's query. - Build a self-reflecting agent for named entity recognition that loops using an output validator custom component. - Create a chat agent using OpenAI's function-calling capabilities which allow you to provide Haystack pipelines as tools to the LLM, enhancing that agent's capabilities. By the end of this course, you will learn a high-level orchestration framework that can help make your applications flexible, extendible, and maintainable, even as the technology stack changes, new user needs arise, and you add new features to your application. Please sign up here:

Andrew Ng

53,785 views • 1 year ago

New course: MCP: Build Rich-Context AI Apps with Anthropic. Learn to build AI apps that access tools, data, and prompts using the Model Context Protocol in this short course, created in partnership with Anthropic Anthropic and taught by Elie Schoppik Elie Schoppik, its Head of Technical Education. Connecting AI applications to external systems that bring rich context to LLM-based applications has often meant writing custom integrations for each use case. MCP is an open protocol that standardizes how LLMs access tools, data, and prompts from external sources, and simplifies how you provide context to your LLM-based applications. For example, you can provide context via third-party tools that let your LLM make API calls to search the web, access data from local docs, retrieve code from a GitHub repo, and so on. MCP, developed by Anthropic, is based on a client-server architecture that defines the communication details between an MCP client, hosted inside the AI application, and an MCP server that exposes tools, resources, and prompt templates. The server can be a subprocess launched by the client that runs locally or an independent process running remotely. In this hands-on course, you'll learn the core architecture behind MCP. You’ll create an MCP-compatible chatbot, build and deploy an MCP server, and connect the chatbot to your MCP server and other open-source servers. Here’s what you’ll do: - Understand why MCP makes AI development less fragmented and standardizes connections between AI applications and external data sources - Learn the core components of the client-server architecture of MCP and the underlying communication mechanism - Build a chatbot with custom tools for searching academic papers, and transform it into an MCP-compatible application - Build a local MCP server that exposes tools, resources, and prompt templates using FastMCP, and test it using MCP Inspector - Create an MCP client inside your chatbot to dynamically connect to your server - Connect your chatbot to reference servers built by Anthropic’s MCP team, such as filesystem, which implements filesystem operations, and fetch, which extracts contents from the web as markdown - Configure Claude Desktop to connect to your server and others, and explore how it abstracts away the low-level logic of MCP clients - Deploy your MCP server remotely and test it with the Inspector or other MCP-compatible applications - Learn about the roadmap for future MCP development, such as multi-agent architecture, MCP registry API, server discovery, authorization, and authentication MCP is an exciting and important technology that lets you build rich-context AI applications that connect to a growing ecosystem of MCP servers, with minimal integration work. Please sign up here!

New course: MCP: Build Rich-Context AI Apps with Anthropic. Learn to build AI apps that access tools, data, and prompts using the Model Context Protocol in this short course, created in partnership with Anthropic Anthropic and taught by Elie Schoppik Elie Schoppik, its Head of Technical Education. Connecting AI applications to external systems that bring rich context to LLM-based applications has often meant writing custom integrations for each use case. MCP is an open protocol that standardizes how LLMs access tools, data, and prompts from external sources, and simplifies how you provide context to your LLM-based applications. For example, you can provide context via third-party tools that let your LLM make API calls to search the web, access data from local docs, retrieve code from a GitHub repo, and so on. MCP, developed by Anthropic, is based on a client-server architecture that defines the communication details between an MCP client, hosted inside the AI application, and an MCP server that exposes tools, resources, and prompt templates. The server can be a subprocess launched by the client that runs locally or an independent process running remotely. In this hands-on course, you'll learn the core architecture behind MCP. You’ll create an MCP-compatible chatbot, build and deploy an MCP server, and connect the chatbot to your MCP server and other open-source servers. Here’s what you’ll do: - Understand why MCP makes AI development less fragmented and standardizes connections between AI applications and external data sources - Learn the core components of the client-server architecture of MCP and the underlying communication mechanism - Build a chatbot with custom tools for searching academic papers, and transform it into an MCP-compatible application - Build a local MCP server that exposes tools, resources, and prompt templates using FastMCP, and test it using MCP Inspector - Create an MCP client inside your chatbot to dynamically connect to your server - Connect your chatbot to reference servers built by Anthropic’s MCP team, such as filesystem, which implements filesystem operations, and fetch, which extracts contents from the web as markdown - Configure Claude Desktop to connect to your server and others, and explore how it abstracts away the low-level logic of MCP clients - Deploy your MCP server remotely and test it with the Inspector or other MCP-compatible applications - Learn about the roadmap for future MCP development, such as multi-agent architecture, MCP registry API, server discovery, authorization, and authentication MCP is an exciting and important technology that lets you build rich-context AI applications that connect to a growing ecosystem of MCP servers, with minimal integration work. Please sign up here!

Andrew Ng

141,952 views • 1 year ago

Today we're excited to launch Action: a Claude computer use launcher for macOS. Action is a macOS launcher that can take actions (click, type, and more) on your Mac using Claude’s computer use API. The interface is a floating window triggered by a keyboard shortcut, similar to Spotlight. This lets you see what the model outputs as it performs actions.

Today we're excited to launch Action: a Claude computer use launcher for macOS. Action is a macOS launcher that can take actions (click, type, and more) on your Mac using Claude’s computer use API. The interface is a floating window triggered by a keyboard shortcut, similar to Spotlight. This lets you see what the model outputs as it performs actions.

Lawrence Chen

21,622 views • 1 year ago

New short course: Build AI Apps with MCP Servers: Working with Box Files, built with Box and taught by Ben Kus , their CTO. Many AI applications require custom code for basic file operations. The Model Context Protocol (MCP) standardizes this by letting you offload file tasks to dedicated servers that provide tools an LLM can use directly. In this course, you'll process documents stored in a Box folder using the Box MCP server. Rather than writing custom integration code to connect to the Box API and download files, you'll design your application to use the tools provided via MCP. Skills you'll gain: - Build an LLM-powered document processing app, using the Box MCP server to access files - Design a multi-agent system using Google's Agent Development Kit (ADK), consisting of specialized agents for file operations - Coordinate the multi-agent workflow through an orchestrator that uses the Agent2Agent (A2A) protocol to connect to the agents You'll start with a local file-processing app, refactor it to work with Box's MCP server, then evolve it into a multi-agent system. Sign up here:

New short course: Build AI Apps with MCP Servers: Working with Box Files, built with Box and taught by Ben Kus , their CTO. Many AI applications require custom code for basic file operations. The Model Context Protocol (MCP) standardizes this by letting you offload file tasks to dedicated servers that provide tools an LLM can use directly. In this course, you'll process documents stored in a Box folder using the Box MCP server. Rather than writing custom integration code to connect to the Box API and download files, you'll design your application to use the tools provided via MCP. Skills you'll gain: - Build an LLM-powered document processing app, using the Box MCP server to access files - Design a multi-agent system using Google's Agent Development Kit (ADK), consisting of specialized agents for file operations - Coordinate the multi-agent workflow through an orchestrator that uses the Agent2Agent (A2A) protocol to connect to the agents You'll start with a local file-processing app, refactor it to work with Box's MCP server, then evolve it into a multi-agent system. Sign up here:

Andrew Ng

81,563 views • 9 months ago

New short course: Open Source Models with Hugging Face 🤗, taught by Maria Khalusova, Marc Sun, and Younes Belkada! Hugging Face has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models. You’ll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces. You can sign up here:

New short course: Open Source Models with Hugging Face 🤗, taught by Maria Khalusova, Marc Sun, and Younes Belkada! Hugging Face has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models. You’ll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces. You can sign up here:

Andrew Ng

224,520 views • 2 years ago

New course to bring you up to state-of-the-art at using AI to help you code: Build Apps with Windsurf's AI Coding Agents, built in partnership with WIndsurf (Codeium) and taught by Anshul Ramachandran! AI-assisted IDEs (Integrated Development Environments) make developers’ workflows faster, more efficient, and much more fun. Agentic tools like Windsurf are more than just code autocomplete—they are collaborative coding agents that help you break down complex applications, iterate efficiently, and generate code that spans multiple files. Although a lot of coding assistants share the same underlying large language models for planning and reasoning, a major point of distinction is how they handle tools, keep track of context, and stay aligned with your intent as a developer. For instance, if you make modifications to a class definition in your code and make the same modifications to other classes in the same directory, you might tell the AI agent "Do the same thing in similar places in this directory." Here, tracking your intent means understanding that “the same thing" refers to that recent edit you just made, which must be followed by appropriate search and tool-calling to implement the changes. In this course, you'll learn the inner workings of coding agents, their strengths and limitations, and how to use Windsurf to quickly build several applications. In detail, you'll: - Build a mental model of how agents work by combining human-action tracking, tool integration, and context awareness to carry out an agentic coding workflow. - Learn the challenges of code search and discovery and how a multi-step retrieval approach helps coding agents address them. - Use Windsurf to analyze and understand a large, old codebase and update it to the latest versions of the frameworks and packages it uses. - Build a Wikipedia data analysis app that retrieves, parses, and analyzes word frequencies. - Enhance the performance of your Wikipedia analysis app by adding caching, and through this, also learn how to course-correct when the AI agent produces unexpected results. - Learn tips and tricks such as keyboard shortcuts, autocomplete, and @ mentions to quickly call on agentic capabilities. - Use image/multimodal capabilities of the AI agent to increase your development velocity; you'll see an example of uploading a mockup with sketched-out UI features, and ask the agent to use that to build new functionality to an app. By the end of this course, you’ll understand agentic coding in-depth and know how to use it to make your development process much faster, more efficient, and enjoyable. Please sign up here!

New course to bring you up to state-of-the-art at using AI to help you code: Build Apps with Windsurf's AI Coding Agents, built in partnership with WIndsurf (Codeium) and taught by Anshul Ramachandran! AI-assisted IDEs (Integrated Development Environments) make developers’ workflows faster, more efficient, and much more fun. Agentic tools like Windsurf are more than just code autocomplete—they are collaborative coding agents that help you break down complex applications, iterate efficiently, and generate code that spans multiple files. Although a lot of coding assistants share the same underlying large language models for planning and reasoning, a major point of distinction is how they handle tools, keep track of context, and stay aligned with your intent as a developer. For instance, if you make modifications to a class definition in your code and make the same modifications to other classes in the same directory, you might tell the AI agent "Do the same thing in similar places in this directory." Here, tracking your intent means understanding that “the same thing" refers to that recent edit you just made, which must be followed by appropriate search and tool-calling to implement the changes. In this course, you'll learn the inner workings of coding agents, their strengths and limitations, and how to use Windsurf to quickly build several applications. In detail, you'll: - Build a mental model of how agents work by combining human-action tracking, tool integration, and context awareness to carry out an agentic coding workflow. - Learn the challenges of code search and discovery and how a multi-step retrieval approach helps coding agents address them. - Use Windsurf to analyze and understand a large, old codebase and update it to the latest versions of the frameworks and packages it uses. - Build a Wikipedia data analysis app that retrieves, parses, and analyzes word frequencies. - Enhance the performance of your Wikipedia analysis app by adding caching, and through this, also learn how to course-correct when the AI agent produces unexpected results. - Learn tips and tricks such as keyboard shortcuts, autocomplete, and @ mentions to quickly call on agentic capabilities. - Use image/multimodal capabilities of the AI agent to increase your development velocity; you'll see an example of uploading a mockup with sketched-out UI features, and ask the agent to use that to build new functionality to an app. By the end of this course, you’ll understand agentic coding in-depth and know how to use it to make your development process much faster, more efficient, and enjoyable. Please sign up here!

Andrew Ng

139,763 views • 1 year ago

"Introducing Multimodal Llama 3.2": As promised two weeks ago, here's the short course on Meta's latest open model! This short course is created with Meta and taught by Amit Sangani, Director of AI Partner Engineering at Meta. Meta’s Llama family of models is leading the way in open models, allowing anyone to download, customize, fine-tune, or build new applications on top of them. Learn about the vision capabilities of the Llama 3.2, and use it for image classification, prompting, tokenization, tool-calling. You'll also learn about the open-source Llama stack, which gives building blocks for many different stages of the LLM application life cycle. In detail, you’ll: - Learn what are the features of Meta's four newest models, and when to use which Llama model. - Learn best practices for multimodal prompting, with applications to advanced image reasoning, illustrated by many examples: Understanding errors on a car dashboard, adding up the total of photographed restaurant receipts, grading written math homework. - Use different roles—system, user, assistant, ipython—in the Llama 3.1 and 3.2 models and the prompt format that identifies those roles. - Understand how Llama uses the tiktoken tokenizer, and how it has expanded to a 128k vocabulary size that improves encoding efficiency and multilingual support. - Learn how to prompt Llama to call built-in and custom tools (functions) with examples for web search and solving math equations. - Learn about Llama Stack, a standardized interface for common toolchain components like fine-tuning or synthetic data generation, useful for building agentic applications. By the end of this course, you’ll be equipped to build out new applications with the new Llama 3.2. Thank you to Ahmad Al-Dahle, Amit Sangani, and the whole AI at Meta team AI at Meta for all the hard work on Llama 3.2 — we’re excited to make these open models even more accessible to more developers with this new course! Please sign up here!

"Introducing Multimodal Llama 3.2": As promised two weeks ago, here's the short course on Meta's latest open model! This short course is created with Meta and taught by Amit Sangani, Director of AI Partner Engineering at Meta. Meta’s Llama family of models is leading the way in open models, allowing anyone to download, customize, fine-tune, or build new applications on top of them. Learn about the vision capabilities of the Llama 3.2, and use it for image classification, prompting, tokenization, tool-calling. You'll also learn about the open-source Llama stack, which gives building blocks for many different stages of the LLM application life cycle. In detail, you’ll: - Learn what are the features of Meta's four newest models, and when to use which Llama model. - Learn best practices for multimodal prompting, with applications to advanced image reasoning, illustrated by many examples: Understanding errors on a car dashboard, adding up the total of photographed restaurant receipts, grading written math homework. - Use different roles—system, user, assistant, ipython—in the Llama 3.1 and 3.2 models and the prompt format that identifies those roles. - Understand how Llama uses the tiktoken tokenizer, and how it has expanded to a 128k vocabulary size that improves encoding efficiency and multilingual support. - Learn how to prompt Llama to call built-in and custom tools (functions) with examples for web search and solving math equations. - Learn about Llama Stack, a standardized interface for common toolchain components like fine-tuning or synthetic data generation, useful for building agentic applications. By the end of this course, you’ll be equipped to build out new applications with the new Llama 3.2. Thank you to Ahmad Al-Dahle, Amit Sangani, and the whole AI at Meta team AI at Meta for all the hard work on Llama 3.2 — we’re excited to make these open models even more accessible to more developers with this new course! Please sign up here!

Andrew Ng

131,606 views • 1 year ago

8/ Computer Use Are A Sadness - 🎖️Engagement Award Exploiting computer use agents using LLM security vulnerabilities for fun and profit Hijack agents that try to access your computer

8/ Computer Use Are A Sadness - 🎖️Engagement Award Exploiting computer use agents using LLM security vulnerabilities for fun and profit Hijack agents that try to access your computer

Alex Reibman 🖇️

25,040 views • 1 year ago

New short course: LLMs as Operating Systems: Agent Memory, created with Letta, and taught by its founders Charles Packer and Sarah Wooders. An LLM's input context window has limited space. Using a longer input context also costs more and results in slower processing. So, managing what's stored in this context window is important. In the innovative paper MemGPT: Towards LLMs as Operating Systems, its authors (which include the instructors) proposed using an LLM agent to manage this context window. Their system uses a large persistent memory that stores everything that could be included in the input context, and an agent decides what is actually included. Take the example of building a chatbot that needs to remember what's been said earlier in a conversation (perhaps over many days of interaction with a user). As the conversation's length grows, the memory management agent will move information from the input context to a persistent searchable database; summarize information to keep relevant facts in the input context; and restore relevant conversation elements from further back in time. This allows a chatbot to keep what's currently most relevant in its input context memory to generate the next response. When I read the original MemGPT paper, I thought it was an innovative technique for handling memory for LLMs. The open-source Letta framework, which we'll use in this course, makes MemGPT easy to implement. It adds memory to your LLM agents and gives them transparent long-term memory. In detail, you’ll learn: - How to build an agent that can edit its own limited input context memory, using tools and multi-step reasoning - What is a memory hierarchy (an idea from computer operating systems, which use a cache to speed up memory access), and how these ideas apply to managing the LLM input context (where the input context window is a "cache" storing the most relevant information; and an agent decides what to move in and out of this to/from a larger persistent storage system) - How to implement multi-agent collaboration by letting different agents share blocks of memory This course will give you a sophisticated understanding of memory management for LLMs, which is important for chatbots having long conversations, and for complex agentic workflows. Please sign up here!

New short course: LLMs as Operating Systems: Agent Memory, created with Letta, and taught by its founders Charles Packer and Sarah Wooders. An LLM's input context window has limited space. Using a longer input context also costs more and results in slower processing. So, managing what's stored in this context window is important. In the innovative paper MemGPT: Towards LLMs as Operating Systems, its authors (which include the instructors) proposed using an LLM agent to manage this context window. Their system uses a large persistent memory that stores everything that could be included in the input context, and an agent decides what is actually included. Take the example of building a chatbot that needs to remember what's been said earlier in a conversation (perhaps over many days of interaction with a user). As the conversation's length grows, the memory management agent will move information from the input context to a persistent searchable database; summarize information to keep relevant facts in the input context; and restore relevant conversation elements from further back in time. This allows a chatbot to keep what's currently most relevant in its input context memory to generate the next response. When I read the original MemGPT paper, I thought it was an innovative technique for handling memory for LLMs. The open-source Letta framework, which we'll use in this course, makes MemGPT easy to implement. It adds memory to your LLM agents and gives them transparent long-term memory. In detail, you’ll learn: - How to build an agent that can edit its own limited input context memory, using tools and multi-step reasoning - What is a memory hierarchy (an idea from computer operating systems, which use a cache to speed up memory access), and how these ideas apply to managing the LLM input context (where the input context window is a "cache" storing the most relevant information; and an agent decides what to move in and out of this to/from a larger persistent storage system) - How to implement multi-agent collaboration by letting different agents share blocks of memory This course will give you a sophisticated understanding of memory management for LLMs, which is important for chatbots having long conversations, and for complex agentic workflows. Please sign up here!

Andrew Ng

200,729 views • 1 year ago

New short course: Prompt Engineering with Llama 2, built in collaboration with Meta AI at Meta, and taught by Amit Sangani! Meta's Llama 2 has been game-changing for AI. Building with open source lets you control your own data, scrutinize errors, update (or not) the models as you please, and work alongside the global community advancing open models. Llama isn't a single model, it's a collection of models. In this course, you'll: - Learn the differences between different Llama 2 flavors, and when to use each. - Prompt the Llama chat models -- you'll also see how Llama's instruction tags work -- so they can help you with day-to-day tasks, like writing or summarization. - Use advanced prompting, like few-shot prompting for classification, and chain-of-thought prompting for solving logic problems. - Use specialized models in the Llama collection for specific tasks, like Code Llama to help you write, analyze, and improve code, and Llama Guard, which checks prompts and model responses for harmful content. The course also touches on how to run Llama 2 locally on your own computer. I hope you’ll take this course and try out these powerful, open models!

New short course: Prompt Engineering with Llama 2, built in collaboration with Meta AI at Meta, and taught by Amit Sangani! Meta's Llama 2 has been game-changing for AI. Building with open source lets you control your own data, scrutinize errors, update (or not) the models as you please, and work alongside the global community advancing open models. Llama isn't a single model, it's a collection of models. In this course, you'll: - Learn the differences between different Llama 2 flavors, and when to use each. - Prompt the Llama chat models -- you'll also see how Llama's instruction tags work -- so they can help you with day-to-day tasks, like writing or summarization. - Use advanced prompting, like few-shot prompting for classification, and chain-of-thought prompting for solving logic problems. - Use specialized models in the Llama collection for specific tasks, like Code Llama to help you write, analyze, and improve code, and Llama Guard, which checks prompts and model responses for harmful content. The course also touches on how to run Llama 2 locally on your own computer. I hope you’ll take this course and try out these powerful, open models!

Andrew Ng

162,798 views • 2 years ago