New short course on sophisticated RAG (Retrieval Augmented Generation) techniques is out! Taught by Jerry Liu and Anupam Datta of LlamaIndex 🦙 and TruEra, this teaches advanced techniques that help your LLM generate good answers. Topics include: - Sentence-window retrieval, which retrieves not just the most relevant sentence,...

655,486 views • 2 years ago • via X (Twitter)

Comments: 10

AshutoshShrivastava • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai Thank you so much Andrew for bringing us such amazing course and that too for free 🫡🙏

Indira Negi • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai Thank you for creating these courses @AndrewYNg! People learn in different ways. For some it is really nice to have all concepts presented in a hierarchical order, where each builds on the previous lesson. Plus a sense of completeness of knowledge about the subject

Scott Thorpe • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai RAG to riches : )

Angelina Yang • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai My cofounder @MehdiAllahyari and I also wrote a book about RAG systems from our experience using @llama_index and @LangChainAI. Share here as well!

MindMeld Maven • 2 years ago

As much as I love your short courses, I can't help but feel unfulfilled after completing them. I believe there needs to be more attention on deepening the concept within our minds by doing a single project from scratch to the end by ourselves (not just filling out some code blocks in a very complicated project). Please do consider it.

Jose Pozuelo • 2 years ago

@llama_index @jerryjliu0 @datta_cs @truera_ai Amazing timing!

Darby Bailey 🖍️💫 • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai Ng is 🐐

Sri Vigneshwar DJ • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai Thank you @AndrewYNg sir for this course

Didier Lopes • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai This is sick @jerryjliu0 🔥🔥🔥

Dreyton Scott • 2 years ago

@jerryjliu0 @datta_cs @llama_index @truera_ai Perfect

Related videos

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski.
This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory.
In detail, you’ll:
- Learn about the internal workings of the embedding models and how your text turns into vectors.
- Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work.
- Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
- Understand how to measure the quality of your search across relevance, ranking, and score-related metrics.
- Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters.
- Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed.
By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,200 views • 1 year ago
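
To make the quantization item in the post above concrete, here is a minimal Python sketch of scalar and binary quantization on a toy vector. The function names, the fixed [lo, hi] value range, and the 8-bit code width are assumptions chosen for illustration; they are not taken from the course or from Qdrant's implementation. Product quantization is left out.

```python
def scalar_quantize(vec, lo, hi):
    """Map each float in [lo, hi] to an integer code 0..255 (one byte per dimension)."""
    scale = (hi - lo) / 255
    # Clamp to the assumed range first so outliers cannot overflow the code.
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vec]


def scalar_dequantize(codes, lo, hi):
    """Approximate the original floats from the byte codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]


def binary_quantize(vec):
    """Keep only the sign of each dimension (one bit per dimension)."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a, b):
    """Count differing bits; a cheap distance between binary codes."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    vec = [-0.8, 0.1, 0.45, 1.0]
    codes = scalar_quantize(vec, lo=-1.0, hi=1.0)
    print("byte codes:", codes)
    print("restored:  ", scalar_dequantize(codes, lo=-1.0, hi=1.0))
    print("bits:      ", binary_quantize(vec))
```

The trade-off the post mentions shows up directly: a 32-bit float per dimension shrinks to 8 bits with scalar codes and to 1 bit with binary codes, at the cost of a coarser approximation of the original vector.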

Announcing a new Coursera course: Retrieval Augmented Generation (RAG)
You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created and taught by Zain, an experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well.
LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in their training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses.
In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels.
As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications.
You'll learn via hands-on experiences to:
- Build a RAG system with retrieval and prompt augmentation
- Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion
- Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset
- Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions
- Use evals to drive improving reliability, and incorporate multi-modal data
RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Andrew Ng

124,458 views • 11 months ago
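
The post above lists Reciprocal Rank Fusion among the retrieval methods to compare. As a rough illustration, the sketch below fuses two ranked lists, standing in for a BM25 ranking and a semantic-search ranking, by summing 1 / (k + rank) per document. The document IDs are made up, and k = 60 is the commonly used default rather than a value taken from the course.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_ranking = ["doc_a", "doc_b", "doc_c"]       # hypothetical keyword results
    semantic_ranking = ["doc_b", "doc_c", "doc_a"]   # hypothetical vector results
    print(reciprocal_rank_fusion([bm25_ranking, semantic_ranking]))
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale, which is the usual reason for choosing this fusion method.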

There have been two papers released in the past couple of months, one by Google and one by NVIDIA, that argue that ordering the documents retrieved by RAG systems can enhance performance. However, they give two different strategies on HOW these documents should be ordered 🤔
Both papers agree on two main points:
1️⃣ There’s a fundamental issue in RAG - as more documents are retrieved, more irrelevant context (e.g., hard negatives) is introduced, which leads to confusion for the LLM and eventually degrades the quality of the generated output. This is called an inverted-U performance curve.
2️⃣ Ordering the retrieved documents is a key lever for optimizing RAG performance.
Google Cloud researchers proposed ordering results based on relevance scores: The authors in this paper argue for relevance-based reordering, or ordering the retrieved chunks based on their similarity scores, so the most relevant documents are at the beginning and the end of the inputs to counter the “lost in the middle” effect.
NVIDIA researchers proposed ordering results based on the original sequence of document chunks: The authors of this paper argue for Order-Preserving Reordering, or Order-Preserve RAG (OP-RAG), to maintain the logically coherent content flow of the document. So they preserved the original order of retrieved document chunks in the source text, instead of ranking them by relevance scores.
So which one is right? It probably depends on the specific use case and dataset - relevance-based reordering could perform better in tasks where you need fast access to the most critical information (e.g., fact retrieval, QA systems), while order-preserving RAG might be better where you need to understand the sequential structure of information (e.g., narrative or legal documents).
There are still so many uncertainties in AI - we don’t actually know what we’re doing, and it takes a while to figure out the best strategies for most things! Excited to see more research about this.

Victoria Slocum

15,213 views • 1 year ago
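
The two ordering strategies in the post above can be sketched side by side. This is a toy illustration, not either paper's code: chunks are assumed to be (position, score, text) tuples, and the alternating front/back placement used for the relevance-based variant is just one simple way to put the strongest chunks at both ends of the prompt.

```python
def order_preserving(chunks, top_k):
    """Keep the top_k chunks by score, then restore their source order."""
    selected = sorted(chunks, key=lambda c: c[1], reverse=True)[:top_k]
    return sorted(selected, key=lambda c: c[0])


def relevance_at_ends(chunks):
    """Place the most relevant chunks at the start and end of the context.

    Chunks are ranked by score, then dealt alternately to the front and the
    back, so the weakest chunks land in the middle (assumed placement rule).
    """
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


if __name__ == "__main__":
    # (position in source document, similarity score, text); values are made up.
    chunks = [(0, 0.2, "A"), (1, 0.9, "B"), (2, 0.5, "C"), (3, 0.7, "D"), (4, 0.1, "E")]
    print([c[2] for c in order_preserving(chunks, top_k=3)])
    print([c[2] for c in relevance_at_ends(chunks)])
```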

New short course: LLMs as Operating Systems: Agent Memory, created with Letta, and taught by its founders Charles Packer and Sarah Wooders.
An LLM's input context window has limited space. Using a longer input context also costs more and results in slower processing. So, managing what's stored in this context window is important.
In the innovative paper MemGPT: Towards LLMs as Operating Systems, its authors (which include the instructors) proposed using an LLM agent to manage this context window. Their system uses a large persistent memory that stores everything that could be included in the input context, and an agent decides what is actually included.
Take the example of building a chatbot that needs to remember what's been said earlier in a conversation (perhaps over many days of interaction with a user). As the conversation's length grows, the memory management agent will move information from the input context to a persistent searchable database; summarize information to keep relevant facts in the input context; and restore relevant conversation elements from further back in time. This allows a chatbot to keep what's currently most relevant in its input context memory to generate the next response.
When I read the original MemGPT paper, I thought it was an innovative technique for handling memory for LLMs. The open-source Letta framework, which we'll use in this course, makes MemGPT easy to implement. It adds memory to your LLM agents and gives them transparent long-term memory.
In detail, you’ll learn:
- How to build an agent that can edit its own limited input context memory, using tools and multi-step reasoning
- What is a memory hierarchy (an idea from computer operating systems, which use a cache to speed up memory access), and how these ideas apply to managing the LLM input context (where the input context window is a "cache" storing the most relevant information; and an agent decides what to move in and out of this to/from a larger persistent storage system)
- How to implement multi-agent collaboration by letting different agents share blocks of memory
This course will give you a sophisticated understanding of memory management for LLMs, which is important for chatbots having long conversations, and for complex agentic workflows. Please sign up here!

Andrew Ng

200,729 views • 1 year ago
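
The cache-and-storage idea in the post above can be pictured with a toy class. This is not the Letta API or the MemGPT implementation: the class name, the message-count limit, and the oldest-first eviction rule are all assumptions. In the system the post describes, an LLM agent decides what moves in and out of the context; here a fixed rule stands in for that decision.

```python
class ToyContextMemory:
    """A bounded in-context buffer backed by a searchable archive (hypothetical)."""

    def __init__(self, max_messages):
        self.max_messages = max_messages
        self.context = []   # plays the role of the limited input context ("cache")
        self.archive = []   # plays the role of the larger persistent store

    def add(self, message):
        """Append a message, evicting the oldest ones to the archive when full."""
        self.context.append(message)
        while len(self.context) > self.max_messages:
            self.archive.append(self.context.pop(0))

    def recall(self, keyword):
        """Search the archive by keyword, case-insensitively."""
        return [m for m in self.archive if keyword.lower() in m.lower()]


if __name__ == "__main__":
    memory = ToyContextMemory(max_messages=2)
    for msg in ["My dog is called Rex.", "I live in Oslo.", "What should I cook?"]:
        memory.add(msg)
    print("in context:", memory.context)
    print("recalled:  ", memory.recall("dog"))
```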

Traditional chunking: cheap but dumb. ColBERT: smart but expensive. 𝗟𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴: the solution we've been waiting for.
Here’s a quick evolution of chunking strategies:
→ 𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the basics we all started with)
• Token Chunking - split by token count
• Sentence Chunking - split by sentence boundaries
• Document-Based Chunking - split by sections/paragraphs
→ 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (when things got sophisticated)
• Semantic Chunking - split by meaning
• LLM-Based Chunking - let the model decide
But each chunking method separates text at defined points, meaning context is lost within the document from one chunk to the next.
→ 𝗘𝗻𝘁𝗲𝗿 𝗟𝗮𝘁𝗲 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the game changer)
Traditional approach: Chunk first → Embed each chunk separately
Late chunking approach: Embed the entire document → Then chunk with context preserved
𝗪𝗵𝘆 𝗰𝗵𝗼𝗼𝘀𝗲 𝗹𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴?
When you chunk first, each piece loses its contextual relationship to the rest of the document. It's like reading a book by randomly picking paragraphs - you miss the flow.
With late chunking, every chunk maintains awareness of its neighbors because the embedding happens at the document level first. Mean pooling is done on segments AFTER the full context is embedded.
Jina AI tested and saw significant improvements in retrieval quality - chunks that were previously disconnected now maintain their semantic relationships.
As documents get longer and context windows expand, late chunking might just become the new standard for high-quality retrieval systems.
𝗪𝗵𝗮𝘁 𝗱𝗼 𝘆𝗼𝘂 𝗻𝗲𝗲𝗱 𝘁𝗼 𝗺𝗮𝗸𝗲 𝘁𝗵𝗶𝘀 𝘄𝗼𝗿𝗸?
No modifications to your retrieval pipeline are needed.
1. Long context embedding models (8192+ tokens)
2. Chunking logic that tracks token spans
3. Less than 30 lines of code to implement
All you need is to switch the order in which you chunk and embed. Embed FIRST, then chunk, not the other way around.
Dive deeper into late chunking:

Femke Plantinga

125,305 views • 10 months ago
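
The "embed first, then chunk" step described in the post above comes down to mean pooling over token spans. The sketch below assumes the per-token embeddings have already been produced by a long-context embedding model run over the whole document; the tiny two-dimensional vectors and the span boundaries are made-up stand-ins, and the function name is hypothetical.

```python
def late_chunk(token_embeddings, spans):
    """Mean-pool document-level token embeddings over each (start, end) span.

    token_embeddings: one vector per token, computed over the full document,
        so each vector already reflects context from outside its own chunk.
    spans: half-open token index ranges marking the chunk boundaries.
    """
    chunk_vectors = []
    for start, end in spans:
        segment = token_embeddings[start:end]
        dim = len(segment[0])
        chunk_vectors.append(
            [sum(token[d] for token in segment) / len(segment) for d in range(dim)]
        )
    return chunk_vectors


if __name__ == "__main__":
    # Stand-in for the output of a long-context embedding model (4 tokens, 2 dims).
    token_embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
    spans = [(0, 2), (2, 4)]  # two chunks of two tokens each
    print(late_chunk(token_embeddings, spans))
```

The contrast with traditional chunking is in where the token vectors come from: there, each chunk would be embedded on its own, so its tokens would never see the rest of the document.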

New short course: Evaluating AI Agents! Evals are important for driving AI system improvements, and in this course you'll learn to systematically assess and improve an AI agent’s performance. This is built in partnership with Arize AI and taught by John Gilhuly, Head of Developer Relations, and , Director of Product.
I've often found evals to be a critical tool in the agent development process - they can be the difference between picking the right thing to work on vs. wasting weeks of effort. Whether you’re building a shopping assistant, coding agent, or research assistant, having a structured evaluation process helps you refine its performance systematically, rather than relying on random trial and error.
This course shows you how to structure your evals to assess the performance of each component of an agent and its end-to-end performance. For each component, you select the appropriate evaluators, test examples, and performance metrics. This helps you identify areas for improvement both during development and in production. (If you're familiar with error analysis in supervised learning, think of this as adapting those ideas to agentic workflows.)
In this course, you'll build an AI agent, and add observability to visualize and debug its steps. You’ll learn about code-based evals, in which you write code explicitly to test a certain step, as well as LLM-as-a-Judge evals, in which you prompt an LLM to efficiently come up with ways to evaluate more open-ended outputs.
In detail, you’ll:
- Understand key differences between evaluating LLM-based systems and traditional software testing.
- Add observability to an agent by collecting traces of the steps taken by the agent and visualizing them
- Choose the appropriate evaluator - code-based, LLM-as-a-Judge, human-annotation based - for each component.
- Compute a convergence score to evaluate if your agent can respond to a query in an efficient number of steps.
- Run structured experiments to improve the agent’s performance by exploring changes to the prompt, LLM model, or the agent’s logic.
- Understand how to deploy these evaluation techniques to monitor the agent’s performance in production.
By the end of this course, you’ll know how to trace AI agents, systematically evaluate them, and improve their performance. Please sign up here:

Andrew Ng

126,355 views • 1 year ago
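
The post above mentions a convergence score without giving a formula. One plausible reading, and it is only an assumption here rather than the course's definition, is to compare each run's step count against the shortest path: average optimal_steps / steps over repeated runs of the same query, so 1.0 means every run took the shortest path.

```python
def convergence_score(step_counts, optimal_steps=None):
    """Average ratio of the optimal step count to each run's step count.

    step_counts: number of steps the agent took in each run of the same query.
    optimal_steps: known shortest path length; if omitted, the shortest
        observed run is used as the reference (an assumption of this sketch).
    """
    if not step_counts:
        raise ValueError("step_counts must not be empty")
    best = optimal_steps if optimal_steps is not None else min(step_counts)
    return sum(best / steps for steps in step_counts) / len(step_counts)


if __name__ == "__main__":
    runs = [3, 3, 4, 6]  # hypothetical step counts pulled from agent traces
    print(f"convergence: {convergence_score(runs):.3f}")
```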

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
And it hit 98.7% accuracy on a financial benchmark (SOTA).
Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance.
When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.
PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree.
For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"
That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to see in-document references ("see Table 5.3") the way a human would.
But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.
Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines.
For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis.
Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!

Avi Chawla

971,622 views • 5 months ago
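
The tree traversal described in the post above can be sketched in a few lines. This is not PageIndex's code or API: the `Node` class, the `traverse` function, and the sample document tree are hypothetical. In the approach the post describes, an LLM reasons about which section to open next; here `keyword_chooser` is a deliberately crude placeholder for that reasoning step, so the control flow can run without a model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a table-of-contents style tree (hypothetical structure)."""
    title: str
    text: str = ""
    children: list = field(default_factory=list)


def traverse(root, query, choose_child):
    """Walk from the root to a leaf, asking choose_child which branch to open.

    Returns the leaf and the path of titles, which makes retrieval traceable.
    """
    node, path = root, [root.title]
    while node.children:
        node = choose_child(query, node.children)
        path.append(node.title)
    return node, path


def keyword_chooser(query, children):
    """Placeholder for the LLM reasoning step: pick the title sharing most words."""
    words = set(query.lower().split())
    return max(children, key=lambda c: len(words & set(c.title.lower().split())))


if __name__ == "__main__":
    report = Node("Annual Report", children=[
        Node("Business Overview", text="..."),
        Node("Financial Statements and Debt", children=[
            Node("Revenue Summary", text="..."),
            Node("Debt Schedule 2023", text="..."),
        ]),
    ])
    leaf, path = traverse(report, "debt trends 2023", keyword_chooser)
    print(" > ".join(path))
```

Swapping `keyword_chooser` for a function that prompts a model with the query and the child titles would keep the same traversal loop while replacing the lexical shortcut.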