New short course: LLMs as Operating Systems: Agent Memory, created with Letta, and taught by its founders Charles Packer and Sarah Wooders.

An LLM's input context window has limited space. Using a longer input context also costs more and results in slower processing. So, managing what's stored in this context window is important.

In the innovative paper MemGPT: Towards LLMs as Operating Systems, its authors (who include the instructors) proposed using an LLM agent to manage this context window. Their system uses a large persistent memory that stores everything that could be included in the input context, and an agent decides what is actually included.

Take the example of building a chatbot that needs to remember what's been said earlier in a conversation (perhaps over many days of interaction with a user). As the conversation's length grows, the memory management agent will move information from the input context to a persistent searchable database; summarize information to keep relevant facts in the input context; and restore relevant conversation elements from further back in time. This allows a chatbot to keep what's currently most relevant in its input context memory to generate the next response.

When I read the original MemGPT paper, I thought it was an innovative technique for handling memory for LLMs. The open-source Letta framework, which we'll use in this course, makes MemGPT easy to implement. It adds memory to your LLM agents and gives them transparent long-term memory.

In detail, you’ll learn:
- How to build an agent that can edit its own limited input context memory, using tools and multi-step reasoning
- What a memory hierarchy is (an idea from computer operating systems, which use a cache to speed up memory access), and how these ideas apply to managing the LLM input context (where the input context window is a "cache" storing the most relevant information, and an agent decides what to move in and out of this to/from a larger persistent storage system)
- How to implement multi-agent collaboration by letting different agents share blocks of memory

This course will give you a sophisticated understanding of memory management for LLMs, which is important for chatbots having long conversations, and for complex agentic workflows. Please sign up here!
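The memory management described in the post (moving older messages out of the input context into a searchable store, keeping a summary in context, and restoring older material on demand) can be illustrated with a toy sketch in Python. This is not the Letta or MemGPT API: the class `MemoryHierarchy`, its methods, the word-count budget, and the first-five-words "summarizer" are all hypothetical stand-ins chosen to keep the example runnable without an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryHierarchy:
    """Toy two-tier memory: a small in-context buffer plus a searchable archive."""

    max_context_words: int = 30  # stand-in for a token budget (assumption)
    context: list[str] = field(default_factory=list)  # what the LLM would see
    archive: list[str] = field(default_factory=list)  # persistent store
    summary: str = ""  # compressed trace of evicted messages

    def _context_size(self) -> int:
        return sum(len(m.split()) for m in self.context)

    def add(self, message: str) -> None:
        """Append a message, then evict the oldest ones until the budget fits."""
        self.context.append(message)
        while self._context_size() > self.max_context_words and len(self.context) > 1:
            evicted = self.context.pop(0)
            self.archive.append(evicted)
            # Placeholder summarizer: keep the first five words of each evicted
            # message. A real system would ask an LLM to summarize instead.
            self.summary += " ".join(evicted.split()[:5]) + " ... "

    def recall(self, query: str) -> list[str]:
        """Naive keyword search over the archive (stand-in for a real search index)."""
        terms = query.lower().split()
        return [m for m in self.archive if any(t in m.lower() for t in terms)]

    def prompt(self) -> str:
        """Assemble what would be sent to the model: summary plus recent messages."""
        header = f"[summary of earlier conversation] {self.summary}".strip()
        return "\n".join([header, *self.context])


mem = MemoryHierarchy(max_context_words=10)
mem.add("My dog is named Biscuit and she loves the beach")
mem.add("I work as a nurse in Lisbon")  # pushes the first message to the archive
print(mem.prompt())
print(mem.recall("dog"))
```

The eviction policy here is a fixed rule (oldest first). In the approach the post describes, the agent itself decides what to move in and out of the context, so `add` and `recall` would more plausibly be exposed as tools the LLM can call.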

Andrew Ng

1,695,918 subscribers

200,788 views • 1 year ago • via X (Twitter)

Science & Technology, Education


10 Comments

Rémi 📎 • 1 year ago

@Letta_AI @charlespacker @sarahwooders Would be very happy to give a course on structured output for LLMs @AndrewYNg @dottxtai

Charles Packer • 1 year ago

@Letta_AI @sarahwooders @Letta_AI is open source and free to use at our github 👉

Sarah Wooders 👾 • 1 year ago

@Letta_AI @charlespacker Was great working together and fantastic summary on the importance of memory for agents :) Really excited for this course to finally be released!

Tim Urista • 1 year ago

@Letta_AI @charlespacker @sarahwooders Multi-agent collaboration unlocks new potential for shared learning and efficiency.

Franck SN • 1 year ago

@Letta_AI @charlespacker @sarahwooders That's why we need to build large reasoning models and drop o1

Vasek Mlejnsky • 1 year ago

@Letta_AI @charlespacker @sarahwooders Whoa, nicely done @charlespacker @sarahwooders !

Omariba Collins • 1 year ago

@Letta_AI @charlespacker @sarahwooders Did you by any chance get this idea from @karpathy ?

Sir Mr Meow Meow • 1 year ago

@Letta_AI @charlespacker @sarahwooders ooh so it's like a summary type memory where the key-values are stored and attention is applied to compress it. :3 interesting.

Ali Sheheryar • 1 year ago

@Letta_AI @charlespacker @sarahwooders From LSTMs to Transformers. We have indeed come a long way!

Tim Urista • 1 year ago

@Letta_AI @charlespacker @sarahwooders An LLM agent deciding what enters the input context is reminiscent of MemGPT's strategic context window use.

Related Videos

New short course: Long-Term Agentic Memory with LangGraph. Learn to build an agent with long-term memory in this course developed in collaboration with taught by its Co-Founder and CEO, Harrison Chase!

Personal assistance and productivity tasks have become important use cases for agents. An important feature of an AI assistant, such as a coding or calendar assistant, is its ability to keep improving over time from its experience. Agent memory is the key capability that enables this.

To add memory to an agent, you must first figure out what to store and what to retrieve when it is time to use the information. Additionally, you’ll have to decide when to update the stored information. For example, you might update in each iteration loop of the agent or perform updates in the background, with a helper agent.

In this course, you will learn a mental framework to build agents with long-term memory. You'll create a useful email assistant that can respond, ignore, and notify using writing, scheduling, and memory-management tools. You’ll develop your agent's memory by adding facts to its memory store, providing examples to learn the user's preferences, and optimizing system prompts to evolve instructions based on previous responses.

In detail, you’ll:
- Learn how the three types of memory (semantic, episodic, and procedural) and the two update mechanisms (via hot path and in the background) apply to your agents.
- Build an email agent with writing, scheduling, and availability tools, along with a router that triages incoming email and handles it accordingly by ignoring, responding, or notifying the user.
- Add tools to your email agent that allow it to operate on semantic memory by learning facts about the user, storing them in a long-term memory store, and searching over them in future interactions.
- Incorporate episodic memory, in the form of few-shot examples, in the triage step of your agents to help them learn and update user preferences.
- Add procedural memory as system prompts, optimized with feedback to improve the instructions the agent follows.

Learn how to approach memory in agents, and start building agents with long-term memory with LangGraph! Please sign up here:


Andrew Ng

131,850 views • 1 year ago

New course: Agent Memory: Building Memory-Aware Agents, built in partnership with Oracle and taught by Richmond Alake and Nacho Martínez. Many agents work well within a single session but their memory resets once the session ends. Consider a research agent working on dozens of papers across multiple days: without memory, it has no way to store and retrieve what it learned across sessions. This short course teaches you to build a memory system that enables agents to persist memory and thereby learn across sessions. You'll design a Memory Manager that handles different memory types, implement semantic tool retrieval that scales without bloating the context, and build write-back pipelines that let your agent autonomously update and refine what it knows over time. Skills you'll gain: - Build persistent memory stores for different agent memory types - Implement a Memory Manager that orchestrates how your agent reads, writes, and retrieves memory - Treat tools as procedural memory and retrieve only relevant ones at inference time using semantic search Join and learn to build agents that remember and improve over time!
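The idea of treating tools as procedural memory and retrieving only the relevant ones with semantic search can be sketched roughly as follows. This is an illustration under loud assumptions, not the course's implementation: the bag-of-words `embed` function is a toy replacement for a learned embedding model, and the tool names and descriptions are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, tool_descriptions: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(q, embed(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical tool registry; only the retrieved subset would be put in the prompt.
TOOLS = {
    "search_papers": "search academic papers by topic",
    "send_email": "send an email message to a contact",
    "plot_chart": "plot a chart from numeric data",
}

print(retrieve_tools("find papers on this topic", TOOLS, top_k=1))
```

Only the names returned by `retrieve_tools` would be offered to the model on a given turn, which is how the context stays small as the registry grows.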


Andrew Ng

160,148 views • 4 months ago

New AI Agentic course! Learn to use LangGraph to build single and multi-agent LLM applications in AI Agents in LangGraph. This short course, taught by LangChain founder Harrison Chase and Tavily founder @weiss_rotem, shows how to integrate agentic search to enhance an agent's knowledge with query-focused answers in predictable formats. Also learn to implement agentic memory to save state for reasoning and debugging, and see how human-in-the-loop input can guide agents at key junctures. You'll build an agent from scratch, then reconstruct it with LangGraph to thoroughly understand the framework. Finally, you'll build a sophisticated essay-writing agent that incorporates all the learnings from the course. Sign up here!


Andrew Ng

152,597 views • 2 years ago

I’m excited to kick off the first of our short courses focused on agents, starting with Building Agentic RAG with LlamaIndex, taught by Jerry Liu, CEO of LlamaIndex 🦙.

This covers an important shift in RAG (retrieval augmented generation), in which rather than having the developer write explicit routines to retrieve information to feed into the LLM context, we instead build a RAG agent that has access to tools for retrieving information. This lets the agent decide what information to fetch, and enables it to answer more complex questions using multi-step reasoning.

In detail, you'll learn about:
- Routing: Where your agent will use decision-making to route requests to multiple tools.
- Tool Use: Where you'll create an interface for agents to select what tool (function call) to use as well as generate the right arguments.
- Multi-step reasoning with tool use: Where you'll use an LLM to carry out multiple steps of reasoning, while retaining memory throughout the process.

You’ll also learn how to step through what your agent is doing to debug and improve it iteratively. It’s an exciting time to build agents. Sign up and get started here!


Andrew Ng

297,131 views • 2 years ago

"What happens when we can access each other's AI memory in enterprises?" Scott Belsky, investor/founder of Behance: "If I was working as an intern for years, and the context window is accessible to the enterprise, can the enterprise (keep prompting) and asking me what I'd think or do, after I leave?" "Every company right now is trying to build this AI memory for each of its products and users. A lot of them are building connectors to form a collective memory." "In the social/consumer world, imagine your girlfriend saying she wants access to your context window or memory." "There are all kinds of weird questions and implications that arise from this."


TBPN

44,908 views • 1 year ago

Everyone wants agent swarms. Very few people are talking seriously enough about the context layer that makes swarms useful. Even with one agent, context is fragile. Too little context and the agent guesses. Too much context and it wastes tokens, loses focus, or reasons over irrelevant noise. The sweet spot is precise context: the right knowledge, in the right structure, at the right moment. With many agents, that challenge explodes. Each agent produces decisions, assumptions, findings, summaries, risks, and partial conclusions. Unless that knowledge becomes shared, structured, and reusable, every new agent is forced to rediscover what another agent already learned. That is not a swarm. That is a crowd. Shared context graphs are what turn agent activity into agent collaboration, and OriginTrail DKG V10 brings them to life. Was just playing with some final polishing for the V10 release, and it is really powerful to see shared context graphs where multiple agents contribute knowledge into the same connected memory, with attribution visible directly in the graph ui. That matters for three reasons. First, agents can access and build on one shared memory instead of staying trapped in isolated sessions. Second, the graph structure helps them retrieve the exact context they need, instead of stuffing everything into a prompt and hoping the model sorts it out. Third, verifiability of provenance. You can see which agent contributed each piece of knowledge, trace the source, and decide what to trust. Tokenmaxxing starts with fewer tokens, but the deeper story is coordination - agents stop reloading the world and start building on shared, verifiable context. That is the foundation for serious multi-agent work across software engineering, research, finance, operations, project management, and far beyond. The future is not more agents, it is agents working from shared, verifiable context. But the more the merrier, of course.


Jurij Skornik

11,070 views • 2 months ago

Context memory essentially unlocks Agentic AI. Much needed for Opus 4.6's "multi-agent swarms".

In this SemiDoped pod, Vikram Sekar talks to Val Bercovici from Weka about context storage.
- How token warehouses save inference costs
- A new networking tier? Context Storage Network!
- High Bandwidth Flash for context?
- Weka's Augmented Memory Grid for context storage
- Where this is all headed

The convo is info packed. Don't miss out on it!

b/acc, context platform engineer

Chapters
(00:00) Introduction to Weka and AI Storage Solutions
(05:18) The Evolution of Context Memory in AI
(09:30) Understanding Memory Hierarchies and Their Impact
(16:24) Latency Challenges in Modern Storage Solutions
(21:32) The Role of Networking in AI Storage Efficiency
(29:42) Dynamic Resource Utilization in AI Networks
(30:04) Introducing the Context Memory Network
(31:13) High Bandwidth Flash: A Game Changer
(32:54) Weka’s Neural Mesh and Storage Solutions
(35:01) Axon: Transforming GPU Storage into Memory
(39:00) Augmented Memory Grid Explained
(42:00) Pooling DRAM and CXL Innovations
(46:02) Token Warehouses and Inference Economics
(52:10) The Future of Storage Innovations


Semi Doped

12,796 views • 5 months ago

CONTEXT ENGINEERING > PROMPT ENGINEERING

Everyone is obsessed with writing better prompts. The next generation of AI builders is focused on context engineering instead.
• Prompt engineering shapes the question. Context engineering shapes everything the AI sees.
• Great prompts can't save an agent missing critical context, memory, or tools.
• AI failures often come from missing information, not weak models.
• Context engineering combines prompts, memory, RAG, state management, and tool access into one system.

What powers effective AI agents?
→ Memory: Remembers preferences, past interactions, and ongoing tasks.
→ State Management: Tracks progress across multi-step workflows.
→ RAG: Retrieves only the most relevant information when needed.
→ Tools: Connects AI to APIs, databases, code execution, and real-world actions.
→ Dynamic Prompts: Enriches instructions with live context at runtime.

The key insight:
Prompt Engineering = Better Questions
Context Engineering = Better Systems

The future of AI isn't building smarter prompts. It's building smarter environments for AI to think, remember, retrieve, and act.


Dami-Defi

10,127 views • 1 month ago

New Course: ACP: Agent Communication Protocol

Learn to build agents that communicate and collaborate across different frameworks using ACP in this short course built with IBM Research's BeeAI, and taught by Sandi Besen, AI Research Engineer & Ecosystem Lead at IBM, and Nicholas Renotte, Head of AI Developer Advocacy at IBM.

Building a multi-agent system with agents built or used by different teams and organizations can become challenging. You may need to write custom integrations each time a team updates their agent design or changes their choice of agentic orchestration framework.

The Agent Communication Protocol (ACP) is an open protocol that addresses this challenge by standardizing how agents communicate, using a unified RESTful interface that works across frameworks. In this protocol, you host an agent inside an ACP server, which handles requests from an ACP client and passes them to the appropriate agent. Using a standardized client-server interface allows multiple teams to reuse agents across projects. It also makes it easier to switch between frameworks, replace an agent with a new version, or update a multi-agent system without refactoring the entire system.

In this course, you’ll learn to connect agents through ACP. You’ll understand the lifecycle of an ACP Agent and how it compares to other protocols, such as MCP (Model Context Protocol) and A2A (Agent-to-Agent). You’ll build ACP-compliant agents and implement both sequential and hierarchical workflows of multiple agents collaborating using ACP.

Through hands-on exercises, you’ll build:
- A RAG agent with CrewAI and wrap it inside an ACP server.
- An ACP Client to make calls to the ACP server you created.
- A sequential workflow that chains an ACP server, created with Smolagents, to the RAG agent.
- A hierarchical workflow using a router agent that transforms user queries into tasks, delegated to agents available through ACP servers.
- An agent that uses MCP to access tools and ACP to communicate with other agents.

You’ll finish up by importing your ACP agents into the BeeAI platform, an open-source registry for discovering and sharing agents. ACP enables collaboration between agents across teams and organizations. By the end of this course, you’ll be able to build ACP agents and workflows that communicate and collaborate regardless of framework. Please sign up here:


Andrew Ng

105,343 views • 1 year ago

New course on serving LLMs efficiently -- how do you serve models to many concurrent users at low latency and reasonable cost? This short course is built with Red Hat and taught by Cedric Clyburn. Efficient LLM serving requires efficient memory management. A 70B-parameter model takes ~140 GB just to load the weights. On top of that, every active request needs its own chunk of GPU memory, the KV cache, to store the token context it has built up so far. In this course, you'll learn to reduce a model's memory footprint with quantization and serve it using vLLM, which handles many concurrent requests efficiently through smart memory management. Skills you'll gain: - Quantize a model and measure the accuracy tradeoff - Serve a model with vLLM and watch it handle concurrent requests efficiently - Benchmark your deployment and make informed tradeoffs between speed, cost, and accuracy Join and learn to serve LLMs efficiently:
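The memory figures in the post can be checked with back-of-the-envelope arithmetic. The sketch below assumes 2 bytes per parameter (16-bit weights), which reproduces the ~140 GB figure for 70B parameters, and uses a standard per-token KV cache estimate (one key and one value vector per layer). The layer count, number of KV heads, and head dimension are illustrative assumptions for a 70B-class model with grouped-query attention, not numbers taken from the post or the course.

```python
def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    n_tokens: int,
    bytes_per_value: int = 2,
) -> int:
    """KV cache size for one request: a key and a value vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


GB = 1e9
GIB = 2**30

# Weights: 70B parameters at 16-bit vs. a 4-bit quantized copy.
print(f"16-bit weights: {weight_bytes(70e9, 2) / GB:.0f} GB")
print(f"4-bit weights:  {weight_bytes(70e9, 0.5) / GB:.0f} GB")

# KV cache for a single 4096-token request under the assumed architecture
# (80 layers, 8 KV heads, head dimension 128, 16-bit values).
per_request = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, n_tokens=4096)
print(f"KV cache per 4096-token request: {per_request / GIB:.2f} GiB")
```

Because the KV cache grows linearly with both context length and the number of concurrent requests, it can rival the weights themselves under load, which is why the post pairs quantization with careful cache management.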


Andrew Ng

129,020 views • 1 month ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG)

You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by , experienced AI and ML engineer, researcher, and educator.

RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well.

LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in their training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses.

In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels.

As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications.

You'll learn via hands-on experiences to:
- Build a RAG system with retrieval and prompt augmentation
- Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion
- Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset
- Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions
- Use evals to drive improving reliability, and incorporate multi-modal data

RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Andrew Ng

124,656 views • 1 year ago
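
Reciprocal Rank Fusion, one of the retrieval methods listed in the RAG course post above, merges several ranked lists by summing 1 / (k + rank) for each document across the lists. A minimal sketch, assuming the commonly used constant k = 60 and hypothetical document IDs; it is not taken from the course materials:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; a higher fused score means a better rank."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a keyword (BM25) retriever and a semantic retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_b", "doc_c", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale, which is one reason the method is a popular way to combine keyword and vector search.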

New short course: Practical Multi AI Agents and Advanced Use Cases with crewAI. Learn to build and deploy advanced agent-based systems in real applications in this course, created with CrewAI and taught by its founder, João Moura! (Disclosure: I've made a small seed investment in CrewAI.)
In this course, you’ll learn how to create advanced agent-based apps that use external tools, do performance testing, can be trained with human feedback, and perform multiple tasks with different large language models. You will build several practical agentic apps that provide real business value, such as an automated project planning system, lead scoring and engagement pipeline, customer support data analysis, and a robust content creation system.
In detail, you will learn how to:
- Create these multi-agent systems with the building blocks of tasks, agents, and crews, along with the different things that make them work, such as caching, memory, and guardrails.
- Integrate your multi-agent application with internal and external systems.
- Connect multiple agents in complex setups, including parallel, sequential, and hybrid configurations, and create flows involving multiple agentic applications working together.
- Test your agentic workflow and train it using human feedback to optimize its performance for better and more consistent results.
- Work with multiple LLMs in your multi-agent system, using the appropriate model sizes and providers to fit each agent’s specific task.
- Start a project from scratch in your environment and prepare it for deployment.
You’ll also learn from an interview between João and Jacob Wilson, the Commercial GenAI Principal at PwC, in which they discuss deploying agentic workflows in real industry use cases. By the end of this course, you will be equipped to start building custom multi-agentic systems for your work. Please sign up here!

Andrew Ng

341,204 views • 1 year ago

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents.
Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results.
You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production.
In detail, you’ll learn:
- How agentic systems have evolved, gaining greater levels of agency over time, and why code agents are a next step.
- How code agents write their actions in code.
- When code agents outperform function-calling agents.
- How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B.
- How to trace, debug, and assess the code agent to optimize its behaviours for complex requests.
- How to build a research multi-agent system that can find information online and organize it into an interactive report.
By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

127,724 views • 1 year ago

New short course: Build Long-Context AI Apps with Jamba. Learn about state space models (SSMs), which have emerged as an alternative to transformers! Specifically, Jamba is a hybrid transformer-Mamba architecture that combines strengths of the transformer with ideas from SSMs. This course is built with AI21 Labs and taught by Chen Wang and Chen Almagor.
The transformer architecture is computationally expensive when handling very long input contexts. But there's an alternative called Mamba, a selective state space model that can process very long contexts with a much lower computational cost. However, researchers found that the pure Mamba architecture underperforms in understanding the context, and gives lower-quality responses. To overcome this, AI21 developed the Jamba model, which combines Mamba's computational efficiency with the transformer's attention mechanism to help with the output quality.
In this course, you’ll learn about how state space models, and Jamba, work. You’ll also learn how to prompt Jamba, use it to process long documents, and build long-context RAG apps.
- Learn how Jamba combines transformer and state space model architectures to achieve high performance and quality
- Use the AI21 SDK, with an example of prompting over a large 200k-token annual financial report of Nvidia
- Use Jamba for tool-calling, with hands-on examples from calling simple arithmetic calculations to a function that returns quarterly company financial reports.
- Learn how training for long context is done, and the metrics used for its evaluation
- Create a RAG app using the AI21 Conversational RAG tool and build your own RAG pipeline that uses Jamba and LangChain.
By the end of this course, you'll learn how to build applications that can handle context as long as an entire book. Please sign up here:

Andrew Ng

77,792 views • 1 year ago
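
The Jamba post above contrasts attention with state space models. As a rough illustration of why a state space model's cost per token does not grow with context length, here is a plain linear recurrence with a diagonal state. This is a deliberately simplified, non-selective SSM, not Mamba or Jamba (Mamba additionally makes its parameters depend on the input); `ssm_scan` and its parameter values are assumptions for illustration only:

```python
def ssm_scan(xs, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c . h_t over a scalar input sequence.

    a, b, c are equal-length lists: a diagonal state transition, an input
    projection, and an output projection.
    """
    h = [0.0] * len(a)  # fixed-size state, independent of sequence length
    ys = []
    for x in xs:
        # Each step touches only the state, never the earlier inputs.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# A single impulse decays geometrically through the one-dimensional state.
ys = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0])
print(ys)
```

Each step reads and writes only the fixed-size state `h`, whereas attention compares each new token against every earlier one, so the work per token grows with context length.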

most agent memory is one file that grows forever and gets re-read every single turn. it overflows the context, overwrites old facts, loses the thread. that's why your agent feels sharp on day one and lost by week three.
Sibyl Memory replaces the pile with a structure:
→ leaner context, lower token cost. a bloated agent file and flat memory get re-read in full every message. Sibyl Memory keeps a light hot layer and retrieves the rest on demand, so each turn carries only the tokens that matter.
→ relations are first-class. people link to projects, projects to deals, deals to the decisions behind them. ask about one partner and the agent surfaces the whole connected web around them. that relational context is what lets it help run a company.
→ one source of truth per fact, person, and project. no duplicates, no silent overwrites.
works with hermes, claude code, and codex. beta open.

SIBYL

81,361 views • 2 months ago
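
The "relations are first-class" idea in the post above can be pictured as a graph walk: start from one entity and follow links outward for a few hops. A hypothetical sketch, not Sibyl Memory's actual implementation or API; the entity names, the `relations` dictionary, and `connected_context` are all invented for illustration:

```python
from collections import deque


def connected_context(relations, start, max_hops=2):
    """Breadth-first walk returning entities within max_hops links of start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        # Sort neighbours so the result is deterministic.
        for neighbour in sorted(relations.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, hops + 1))
    return order


# Hypothetical links: person -> project -> deal -> decision.
relations = {
    "partner:Dana": {"project:Atlas"},
    "project:Atlas": {"deal:Q3-renewal"},
    "deal:Q3-renewal": {"decision:discount-approved"},
}

context = connected_context(relations, "partner:Dana", max_hops=2)
print(context)
```

Capping the number of hops is one simple way to keep the retrieved context small; only the entities reached by the walk would be loaded into the prompt.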

Introducing the context course: a free course on doing ML with agent context. You will learn how to train models, optimize inference, and build datasets, all by defining harness context with `SKILLS.md`, Plugins, MCP, Subagents, and Hooks.
The course includes:
- Weekly live AMA on YouTube
- Weekly practical projects for ML with context
- Instructions in Pi, Codex, Claude, and Opencode
- Tutorials and guides on fundamentals
- Interactive Quizzes
Learn to give AI agents the right knowledge, tools, and structure to actually get work done. Skills, MCP servers, plugins, multi-agent workflows, and building an agent from scratch. Join here:

Ben Burtenshaw

16,905 views • 2 months ago

researchers gave a tiny local model human-style memory and its context limit basically stopped existing.
a team from MBZUAI, Princeton and Weizmann took a 1B model and rebuilt how it reads. instead of attending to everything at once, the model reads in 1,024 token chunks and passes the important stuff forward through an associative memory, the same way you carry the plot of a book between chapters without rereading them.
the design mirrors human memory on purpose. full attention inside a chunk works as short-term memory. the module that carries information between chunks works as long-term memory. they even trained it like a person, starting with short easy texts and raising the difficulty gradually, because memory thrown into the deep end learns nothing.
the numbers back it up. the normal model burns 40GB of GPU memory on a long document and collapses hard past its limit, dropping from 0.86 to 0.32 accuracy. the memory version holds 0.71 at double that length while using a flat 12GB no matter how long the input gets. it also needs about 30% fewer FLOPs.
the part i keep thinking about is that nobody scaled anything here. they didn't build a bigger model, didn't stretch the window, didn't add compute. they looked at how a brain handles a long day and copied the architecture. a model small enough to run on a consumer gpu now survives documents its own architecture used to choke on.
we keep treating intelligence as a compute problem. sometimes it's a memory problem.

Alex Veremeyenko

16,147 views • 9 days ago

Happy to properly launch Anna, the proactive AI agent for parents! Uncovering a bit of the technology behind the scenes! Building Anna is where I learned:
💾 Memory as plain text sucks. You need structured memory. Like a full-blown PostgreSQL DB that stores your tasks and calendar in a structured manner. Most harnesses are good at coding-related stuff. Let it do the query. Don't let it vibe-search the memory. Let it vibe your SQL query.
💭 Dreaming is a useful concept for enhancing memory to feed the LLM context. But DO NOT vibe your dream. Asking your agent to "hey, just dream and keep the relevant memory around" is a recipe for deleting a bunch of important information and keeping trash around. Your dream needs to have some Taxonomy (or better, Ontology). What information is important? For whom? With what object? What can they do? And again, these are impossible to describe and act on well without a proper schema.
🔄 Loop Engineering is important for smoothing out rough edges in the system we build. But even expensive loop engineering with a state-of-the-art model can't out-engineer bad system design. The highest-leverage thing an AI Engineer can do is actually building the right system design, and having an eye on both product delight and engineering scalability.
There are several more insights that I plan to cover in a dedicated video about Agentic AI Engineering. But it's actually a huge relief that the future of software engineering... is still software engineering.

Gogo | Dota for Toxicity

30,766 views • 1 month ago
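
The structured-memory point in the post above (store tasks in tables and let the agent write the query) can be sketched as follows. The post mentions PostgreSQL; this sketch uses Python's built-in `sqlite3` so it runs without a server, and the `tasks` schema, sample rows, and `open_tasks_due_by` helper are hypothetical:

```python
import sqlite3

# In-memory database standing in for the agent's structured memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        title TEXT NOT NULL,
        due_date TEXT NOT NULL,          -- ISO 8601 dates sort correctly as text
        done INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO tasks (owner, title, due_date, done) VALUES (?, ?, ?, ?)",
    [
        ("alex", "Sign permission slip", "2025-03-03", 0),
        ("alex", "Book dentist", "2025-03-10", 0),
        ("alex", "Pay club fee", "2025-03-01", 1),
        ("sam", "Pack lunch", "2025-03-02", 0),
    ],
)


def open_tasks_due_by(conn, owner, date):
    """Return (title, due_date) for the owner's unfinished tasks due on or before date."""
    return conn.execute(
        "SELECT title, due_date FROM tasks "
        "WHERE owner = ? AND done = 0 AND due_date <= ? "
        "ORDER BY due_date",
        (owner, date),
    ).fetchall()


due_soon = open_tasks_due_by(conn, "alex", "2025-03-05")
print(due_soon)
```

The answer comes from an exact filter on typed columns instead of a fuzzy search over free text, so a completed or later-dated task cannot leak into the result.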

RAG might already be becoming obsolete.
A month ago, Andrej Karpathy dropped a simple GitHub gist called “LLM Wiki.” Now the comments section looks like the birth of an entirely new AI category.
5000+ stars later, developers are rapidly building:
• persistent AI memory systems
• self-maintaining knowledge bases
• multi-agent research environments
• contradiction detection engines
• AI-native company operating systems
• local-first memory architectures
• graph-based reasoning layers
• evolving second brains
And the craziest part? Most of them were built in DAYS.
Because the core idea is insanely powerful: instead of AI repeatedly retrieving raw chunks like traditional RAG, the model continuously maintains a living knowledge system. Not temporary context. Persistent synthesis.
The shift sounds subtle until you realize what it changes:
RAG: retrieve → answer → forget
LLM Wiki: ingest → synthesize → evolve
That one architectural difference is causing an explosion of experimentation right now. People are already building:
• agent memory operating systems
• AI-maintained engineering documentation
• self-healing knowledge graphs
• persistent research environments
• conversational memory architectures
• contradiction-aware wikis
• context compression engines
• machine-readable company systems
The comments section alone feels like watching an ecosystem form in real time.
One developer built deterministic contradiction detection using sheaf cohomology.
Another built “sleep consolidation” for AI memory systems inspired by human memory formation.
Another created persistent multi-agent vault conversations.
Another turned entire repositories into continuously maintained AI wikis.
Another built local-first memory systems with audit trails, provenance, graph exports, and MCP integration.
This is the important part: Karpathy didn’t launch a product. He introduced a pattern. And patterns are what create ecosystems.
The same way:
• transformers created modern AI
• RAG created AI retrieval startups
• agents created orchestration frameworks
LLM Wikis may create persistent AI memory infrastructure.
That’s why this moment feels different. For years, AI systems have been stateless. Now developers are trying to build systems that actually accumulate understanding over time. And once knowledge compounds instead of resetting, the entire interface layer of AI changes.
(Link in comments)

Suryansh Tiwari

141,998 views • 2 months ago

Big moment for Postgres! AI agents broke the idea of what a database is supposed to do. Traditional databases were built for humans, and Agents broke that model.
- They branch endlessly.
- They run ten experiments at once.
- They need isolation, context, memory, structured reasoning, and safe sandboxes.
Letting agents touch production systems is terrifying because the old model of Postgres was never built for this kind of behavior. Agentic Postgres is an agent-ready version of Postgres by TimescaleDB (by Tiger Data) that solves this. I think it is one of the biggest upgrades to the Agent stack this year and Tiger Data is working with me on this post to share what they did.
Some key features:
> It instantly creates branches of an entire database, which is perfect for parallel agent evals, safe experiments, migrations, or isolated testing. Forks take seconds and cost almost nothing.
> It comes with a built-in MCP server, which agents can use to get schema guidance, best practices, and safe, structured access to Postgres. This is also helpful to run migrations with a real understanding.
> It comes with actual hybrid search (vector search and BM25), so Agents can retrieve data directly inside the database.
> The database is Memory native. This gives a persistent context for Agents to evolve.
This is one of the first times I have seen Postgres feel ready for the AI native era.

Avi Chawla

94,290 views • 8 months ago