New short course: Build Long-Context AI Apps with Jamba. Learn about state space models (SSMs), which have emerged as an alternative to transformers! Specifically, Jamba is a hybrid transformer-Mamba architecture that combines strengths of the transformer with ideas from SSMs. This course is built with AI21 Labs and taught...

77,792 views • 1 year ago • via X (Twitter)

Comments: 10

Nayak Satya • 1 year ago

Amazing , your contents and learning insights are best in AI thanks for helping genuinely to many people

Robert Hovden • 1 year ago

Atlas of Fourier Transforms - Unlock the potential of Fourier transforms with "Atlas of Fourier Transforms" - join us on Kickstarter today! *A Coffee Table Book with Math*

Fabrizio Milo • 1 year ago

I feel like deeplearning ai is just becoming the modern version of the TV shopping channels.

Data & Analytics • 1 year ago

@AndrewYNg, that course sounds dope, merging transformers with SSMs is something fresh. Curious about how well Jamba balances those two approaches.

Cliffinkent 🇬🇧 • 1 year ago

This course sounds like a great opportunity for developers interested in long-context AI! Hybrid architectures like Jamba show how innovation can balance computational efficiency with high-quality results. Curious: How might these advances impact industries relying on document-heavy workflows?

Dr. Komi NAGBE • 1 year ago

Wahouuuu. This is my PhD topic. Vector auto-regressive State space

Gabriel • 1 year ago

Thank you Andrew

baldmas • 1 year ago

The Jamba architecture seems to have a lot of potential for creative applications. Have you considered collaborating with artists or musicians to explore the possibilities of using state space models in their work?

Abu • 1 year ago

that sounds pretty neat! hybrid tech mixing things up is always fun.

Vincent Valentine (CEO of UnOpen.ai) • 1 year ago

Exciting to see innovations in AI application development.

Related videos

Announcing a new Coursera course: Retrieval Augmented Generation (RAG). You'll learn to build high-performance, production-ready RAG systems in this hands-on, in-depth course created and taught by Zain, an experienced AI and ML engineer, researcher, and educator.

RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, and even many of the leading chatbots that use web search to answer your questions. This course teaches you in depth how to make RAG work well.

LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in their training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM relevant context from private, recent, or specialized information. This lets it generate more grounded and accurate responses.

In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels.

As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent decides autonomously at runtime (rather than having this hardcoded at development time) what data to retrieve, and when and how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications.

You'll learn via hands-on experiences to:
- Build a RAG system with retrieval and prompt augmentation
- Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion
- Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset
- Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions
- Use evals to drive improvements in reliability, and incorporate multi-modal data

RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Andrew Ng

124,458 views • 11 months ago
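
The RAG announcement above lists Reciprocal Rank Fusion among the retrieval methods it compares. As a rough illustration (not taken from the course), the sketch below fuses two ranked lists by summing `1 / (k + rank)` per document. The document IDs are made up, and `k=60` is a commonly used default rather than anything the post specifies.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a keyword (BM25) retriever and a semantic retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_b", "doc_c", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale, which is the usual reason for choosing this kind of fusion.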

Introducing "Building with Llama 4." This short course is created with Meta AI at Meta, and taught by Amit Sangani, Director of Partner Engineering for Meta’s AI team. Meta’s new Llama 4 has added three new models and introduced the Mixture-of-Experts (MoE) architecture to its family of open-weight models, making them more efficient to serve. In this course, you’ll work with two of the three new models introduced in Llama 4. First is Maverick, a 400B parameter model, with 128 experts and 17B active parameters. Second is Scout, a 109B parameter model with 16 experts and 17B active parameters. Maverick and Scout support long context windows of up to a million tokens and 10M tokens, respectively. The latter is enough to support directly inputting even fairly large GitHub repos for analysis! In hands-on lessons, you’ll build apps using Llama 4’s new multimodal capabilities including reasoning across multiple images and image grounding, in which you can identify elements in images. You’ll also use the official Llama API, work with Llama 4’s long-context abilities, and learn about Llama’s newest open-source tools: its prompt optimization tool that automatically improves system prompts and synthetic data kit that generates high-quality datasets for fine-tuning. If you need an open model, Llama is a great option, and the Llama 4 family is an important part of any GenAI developer's toolkit. Through this course, you’ll learn to call Llama 4 via API, use its optimization tools, and build features that span text, images, and large context. Please sign up here:

Andrew Ng

67,587 views • 1 year ago

Announcing How Transformer LLMs Work, created with Jay Alammar and Maarten Grootendorst, co-authors of the beautifully illustrated book, “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs).

The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-Trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need," by Vaswani and others, transformers were a highly scalable model for machine translation tasks. Variants of this architecture now power today’s LLMs such as those from OpenAI, Google, Meta, Cohere, Anthropic and DeepSeek.

In this course, you’ll learn in detail how LLMs process text. You'll also work through code examples that illustrate the transformer's individual components. In detail, you’ll learn:
- How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture that captures a word's meanings taking into account the context of other words in the input.
- How inputs are broken down into tokens before they are sent to the language model.
- The details of a transformer's main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
- The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training.
- How cached calculations make transformers faster.
- Some of the most recent ideas in the latest models, such as Mixture-of-Experts (MoE), which uses multiple sub-models and a router on each layer to improve the quality of LLMs.

By the end of this course, you’ll have a deep understanding of how LLMs actually process text and be able to read through papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications. Please sign up here:

Andrew Ng

253,812 views • 1 year ago
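
The last bullet in the post above describes Mixture-of-Experts as multiple sub-models plus a router on each layer. A minimal sketch of that idea, under several assumptions: the router logits are given directly (in a real model they are computed from the token's hidden state), the "experts" are toy scalar functions, and the top-k weights are renormalized, which is one possible design rather than a description of any particular model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Four toy experts; only two of them run for this input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]
logits = [0.1, 2.0, -1.0, 2.0]

chosen = route_top_k(logits, k=2)
output = moe_layer(3.0, experts, logits, k=2)
print(chosen, output)
```

Only the selected experts are evaluated, which is why a model can hold many more parameters than it uses for any single token.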

New course: MCP: Build Rich-Context AI Apps with Anthropic. Learn to build AI apps that access tools, data, and prompts using the Model Context Protocol in this short course, created in partnership with Anthropic and taught by Elie Schoppik, its Head of Technical Education.

Connecting AI applications to external systems that bring rich context to LLM-based applications has often meant writing custom integrations for each use case. MCP is an open protocol that standardizes how LLMs access tools, data, and prompts from external sources, and simplifies how you provide context to your LLM-based applications. For example, you can provide context via third-party tools that let your LLM make API calls to search the web, access data from local docs, retrieve code from a GitHub repo, and so on.

MCP, developed by Anthropic, is based on a client-server architecture that defines the communication details between an MCP client, hosted inside the AI application, and an MCP server that exposes tools, resources, and prompt templates. The server can be a subprocess launched by the client that runs locally or an independent process running remotely.

In this hands-on course, you'll learn the core architecture behind MCP. You’ll create an MCP-compatible chatbot, build and deploy an MCP server, and connect the chatbot to your MCP server and other open-source servers.

Here’s what you’ll do:
- Understand why MCP makes AI development less fragmented and standardizes connections between AI applications and external data sources
- Learn the core components of the client-server architecture of MCP and the underlying communication mechanism
- Build a chatbot with custom tools for searching academic papers, and transform it into an MCP-compatible application
- Build a local MCP server that exposes tools, resources, and prompt templates using FastMCP, and test it using MCP Inspector
- Create an MCP client inside your chatbot to dynamically connect to your server
- Connect your chatbot to reference servers built by Anthropic’s MCP team, such as filesystem, which implements filesystem operations, and fetch, which extracts contents from the web as markdown
- Configure Claude Desktop to connect to your server and others, and explore how it abstracts away the low-level logic of MCP clients
- Deploy your MCP server remotely and test it with the Inspector or other MCP-compatible applications
- Learn about the roadmap for future MCP development, such as multi-agent architecture, MCP registry API, server discovery, authorization, and authentication

MCP is an exciting and important technology that lets you build rich-context AI applications that connect to a growing ecosystem of MCP servers, with minimal integration work. Please sign up here!

Andrew Ng

141,952 views • 1 year ago
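
To make the client-server exchange described in the MCP post a little more concrete, here is a toy round trip using only the standard library. It assumes, as far as I understand the protocol, that MCP messages are JSON-RPC 2.0 and that a tool invocation uses the `tools/call` method; the exact fields should be checked against the specification. The `search_papers` tool and the dispatch function are hypothetical stand-ins, not the course's code or the FastMCP API.

```python
import json


def make_tool_call_request(request_id, tool_name, arguments):
    """Client side: build a JSON-RPC 2.0 request asking a server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # assumed method name, see the lead-in
        "params": {"name": tool_name, "arguments": arguments},
    }


def handle_request(request, tools):
    """Toy server side: look up the named tool and wrap its text result."""
    params = request["params"]
    text = tools[params["name"]](**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


def search_papers(topic: str) -> str:
    """Hypothetical tool; a real server would query a paper index here."""
    return f"(no real search performed for '{topic}')"


request = make_tool_call_request(1, "search_papers", {"topic": "state space models"})
wire = json.dumps(request)  # what would travel over stdio or HTTP
response = handle_request(json.loads(wire), {"search_papers": search_papers})
print(json.dumps(response, indent=2))
```

In practice a library such as FastMCP handles this framing for you; the sketch only shows the shape of the messages an MCP client and server might exchange.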

New short course: Attention in Transformers: Concepts and Code in PyTorch. Last week we released a course on how LLM transformers work. This week, go deeper and learn about the technical ideas behind the attention mechanism, and see how to code it in PyTorch. This course is built with Joshua Starmer, Founder and CEO of StatQuest.

The attention mechanism was a breakthrough that led to transformers, the architecture powering large language models like ChatGPT. Transformers, introduced in the 2017 paper "Attention Is All You Need" by Vaswani and others, took off because of their highly scalable design.

In this course, you’ll learn how the attention mechanism, a key element of transformer-based LLMs, works and implement it in PyTorch. You'll develop deep intuition about building reliable, functional, and scalable AI applications.

What you will do:
- Understand the evolution of the attention mechanism, a key breakthrough that led to transformers.
- Learn the relationships between word embeddings, positional embeddings, and attention.
- Learn about the Query, Key, and Value matrices, and how to produce and use them in attention.
- Walk through the math required to calculate self-attention and masked self-attention to learn why and how they work.
- Understand the difference between self-attention and masked self-attention and how one is used in the encoder to build context-aware embeddings and the other is used in the decoder for generative outputs.
- Learn the details of the encoder-decoder architecture, cross-attention, and multi-head attention and how they are all incorporated into a transformer.
- Use PyTorch to code a class that implements self-attention, masked self-attention, and multi-head attention.

There are lots of exciting technical details in this course. Please sign up here:

Andrew Ng

132,135 views • 1 year ago
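
The course above implements attention in PyTorch. The sketch below is a dependency-free stand-in in plain Python, meant only to show the arithmetic of masked (causal) self-attention: scaled dot products between queries and keys, a softmax over the positions a token is allowed to see, and a weighted sum of values. The tiny Q, K, and V matrices are made-up inputs; in a real model they come from learned projections of the embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(q, k, v):
    """q, k, v are lists of vectors, one per token position."""
    d = len(q[0])
    outputs = []
    for i, q_i in enumerate(q):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))]
        )
    return outputs


# Two tokens with 2-dimensional queries, keys, and values (made-up numbers).
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

out = masked_self_attention(q, k, v)
print(out)
```

Dropping the `range(i + 1)` restriction in favour of all positions would give unmasked self-attention, the variant the post associates with the encoder.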

New short course: LLMs as Operating Systems: Agent Memory, created with Letta, and taught by its founders Charles Packer and Sarah Wooders.

An LLM's input context window has limited space. Using a longer input context also costs more and results in slower processing. So, managing what's stored in this context window is important.

In the innovative paper MemGPT: Towards LLMs as Operating Systems, its authors (who include the instructors) proposed using an LLM agent to manage this context window. Their system uses a large persistent memory that stores everything that could be included in the input context, and an agent decides what is actually included.

Take the example of building a chatbot that needs to remember what's been said earlier in a conversation (perhaps over many days of interaction with a user). As the conversation's length grows, the memory management agent will move information from the input context to a persistent searchable database; summarize information to keep relevant facts in the input context; and restore relevant conversation elements from further back in time. This allows a chatbot to keep what's currently most relevant in its input context memory to generate the next response.

When I read the original MemGPT paper, I thought it was an innovative technique for handling memory for LLMs. The open-source Letta framework, which we'll use in this course, makes MemGPT easy to implement. It adds memory to your LLM agents and gives them transparent long-term memory.

In detail, you’ll learn:
- How to build an agent that can edit its own limited input context memory, using tools and multi-step reasoning
- What a memory hierarchy is (an idea from computer operating systems, which use a cache to speed up memory access), and how these ideas apply to managing the LLM input context (where the input context window is a "cache" storing the most relevant information, and an agent decides what to move between it and a larger persistent storage system)
- How to implement multi-agent collaboration by letting different agents share blocks of memory

This course will give you a sophisticated understanding of memory management for LLMs, which is important for chatbots having long conversations, and for complex agentic workflows. Please sign up here!

Andrew Ng

200,729 views • 1 year ago
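
The memory hierarchy described in the post above, with the context window acting as a small "cache" in front of a larger persistent store, can be sketched in a few lines. This is not the Letta API: the class and method names are hypothetical, and a fixed "evict the oldest message" rule stands in for the LLM agent that, in MemGPT, decides what to move in and out.

```python
from collections import deque


class ContextManager:
    """Toy two-tier memory: a bounded context window plus a searchable archive."""

    def __init__(self, max_context_messages=3):
        self.max_context_messages = max_context_messages
        self.context = deque()  # the "cache": what the LLM would actually see
        self.archive = []       # larger persistent store (a database in practice)

    def add(self, message):
        """Append a message, evicting the oldest ones into the archive if full."""
        self.context.append(message)
        while len(self.context) > self.max_context_messages:
            self.archive.append(self.context.popleft())

    def recall(self, keyword):
        """Search the archive for evicted messages mentioning the keyword."""
        return [m for m in self.archive if keyword.lower() in m.lower()]


memory = ContextManager(max_context_messages=3)
for line in [
    "user: my favorite food is pizza",
    "assistant: noted!",
    "user: what's the weather like?",
    "assistant: sunny today",
    "user: any dinner ideas?",
]:
    memory.add(line)

print(list(memory.context))    # the three most recent messages
print(memory.recall("pizza"))  # an older fact restored from the archive
```

A fuller version would also summarize evicted messages and copy recalled ones back into the context, which are the other two operations the post mentions.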

New course to bring you up to the state of the art in using AI to help you code: Build Apps with Windsurf's AI Coding Agents, built in partnership with Windsurf (Codeium) and taught by Anshul Ramachandran!

AI-assisted IDEs (Integrated Development Environments) make developers’ workflows faster, more efficient, and much more fun. Agentic tools like Windsurf are more than just code autocomplete: they are collaborative coding agents that help you break down complex applications, iterate efficiently, and generate code that spans multiple files.

Although a lot of coding assistants share the same underlying large language models for planning and reasoning, a major point of distinction is how they handle tools, keep track of context, and stay aligned with your intent as a developer. For instance, if you make modifications to a class definition in your code and want to make the same modifications to other classes in the same directory, you might tell the AI agent "Do the same thing in similar places in this directory." Here, tracking your intent means understanding that “the same thing" refers to the edit you just made, which must be followed by appropriate search and tool-calling to implement the changes.

In this course, you'll learn the inner workings of coding agents, their strengths and limitations, and how to use Windsurf to quickly build several applications. In detail, you'll:
- Build a mental model of how agents work by combining human-action tracking, tool integration, and context awareness to carry out an agentic coding workflow.
- Learn the challenges of code search and discovery and how a multi-step retrieval approach helps coding agents address them.
- Use Windsurf to analyze and understand a large, old codebase and update it to the latest versions of the frameworks and packages it uses.
- Build a Wikipedia data analysis app that retrieves, parses, and analyzes word frequencies.
- Enhance the performance of your Wikipedia analysis app by adding caching, and through this, also learn how to course-correct when the AI agent produces unexpected results.
- Learn tips and tricks such as keyboard shortcuts, autocomplete, and @ mentions to quickly call on agentic capabilities.
- Use image/multimodal capabilities of the AI agent to increase your development velocity; you'll see an example of uploading a mockup with sketched-out UI features and asking the agent to use it to add new functionality to an app.

By the end of this course, you’ll understand agentic coding in depth and know how to use it to make your development process much faster, more efficient, and enjoyable. Please sign up here!

Andrew Ng

139,763 views • 1 year ago

New short course: Long-Term Agentic Memory with LangGraph. Learn to build an agent with long-term memory in this course developed in collaboration with LangChain and taught by its Co-Founder and CEO, Harrison Chase!

Personal assistance and productivity tasks have become important use cases for agents. An important feature of an AI assistant, such as a coding or calendar assistant, is its ability to keep improving over time from its experience. Agent memory is the key capability that enables this.

To add memory to an agent, you must first figure out what to store and what to retrieve when it is time to use the information. Additionally, you’ll have to decide when to update the stored information. For example, you might update in each iteration loop of the agent or perform updates in the background, with a helper agent.

In this course, you will learn a mental framework to build agents with long-term memory. You'll create a useful email assistant that can respond, ignore, and notify using writing, scheduling, and memory-management tools. You’ll develop your agent's memory by adding facts to its memory store, providing examples so it learns the user's preferences, and optimizing system prompts to evolve instructions based on previous responses.

In detail, you’ll:
- Learn how the three types of memory (semantic, episodic, and procedural) and the two update mechanisms (via the hot path and in the background) apply to your agents.
- Build an email agent with writing, scheduling, and availability tools, along with a router that triages incoming email and handles it accordingly by ignoring, responding, or notifying the user.
- Add tools to your email agent that allow it to operate on semantic memory by learning facts about the user, storing them in a long-term memory store, and searching over them in future interactions.
- Incorporate episodic memory, in the form of few-shot examples, in the triage step of your agents to help them learn and update user preferences.
- Add procedural memory as system prompts, optimized with feedback to improve the instructions the agent follows.

Learn how to approach memory in agents, and start building agents with long-term memory with LangGraph! Please sign up here:

Andrew Ng

131,640 views • 1 year ago

Build and customize complex AI applications with a flexible framework in this new short course, Building AI Applications with Haystack, created in collaboration with deepset, makers of Haystack, and taught by Tuana, the developer relations lead for Haystack at deepset.

Generative AI technology is changing rapidly and it can be challenging to integrate APIs from different LLMs, vector databases, and various tools such as web search. In this course, you will learn how to use the Haystack framework to make your development process more modular, allowing you to manage complexity and focus more on building your application.

In detail, you’ll:
- Build a RAG pipeline using Haystack’s main building blocks: components, pipelines, and document stores.
- Create custom components in your pipeline by building a Hacker News summarizer that extends your app’s ability to access APIs.
- Use conditional routing to create a branching pipeline with a web-search fallback for when the LLM does not have the necessary context to respond to the user's query.
- Build a self-reflecting agent for named entity recognition that loops using a custom output-validator component.
- Create a chat agent using OpenAI's function-calling capabilities, which allow you to provide Haystack pipelines as tools to the LLM, enhancing that agent's capabilities.

By the end of this course, you will have learned a high-level orchestration framework that can help make your applications flexible, extensible, and maintainable, even as the technology stack changes, new user needs arise, and you add new features to your application. Please sign up here:

Andrew Ng

53,788 views • 1 year ago

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski.

This course focuses on retrieval-augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. You will also learn to measure and improve retrieval quality, speed, and memory use.

In detail, you’ll:
- Learn about the internal workings of the embedding models and how your text turns into vectors.
- Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece, work.
- Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
- Understand how to measure the quality of your search across relevance, ranking, and score-related metrics.
- Understand how the main parameters of HNSW, a graph-based algorithm, affect the relevance and speed of vector search, and how to tune them.
- Experiment with the three major quantization methods (product, scalar, and binary) and learn how they impact memory requirements, search quality, and speed.

By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,200 views • 1 year ago
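
Of the three quantization methods named in the post above, scalar quantization is the simplest to show in code. The sketch below is a generic illustration, not Qdrant's implementation: it assumes every vector component lies in a known range `[lo, hi]` and maps each float to one of 256 integer codes, so a component that would take 4 bytes as a 32-bit float fits in 1 byte at the cost of a small rounding error.

```python
def scalar_quantize(vector, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255."""
    scale = (hi - lo) / 255
    # Clamp out-of-range values before scaling so codes stay within 0..255.
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vector]


def dequantize(codes, lo, hi):
    """Approximate the original floats from their integer codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]


# A made-up embedding whose components are assumed to lie in [-1, 1].
embedding = [-1.0, -0.25, 0.1, 0.5, 1.0]
codes = scalar_quantize(embedding, lo=-1.0, hi=1.0)
restored = dequantize(codes, lo=-1.0, hi=1.0)

max_error = max(abs(a - b) for a, b in zip(embedding, restored))
print(codes)
print(max_error)
```

The reconstruction error per component is bounded by half a quantization step, which is the trade-off between memory and search quality the post refers to.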