Video yükleniyor...

Video Yüklenemedi

Ana Sayfaya Dön

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that...

139,979 görüntüleme • 1 yıl önce •via X (Twitter)

9 Yorum

Abhishek Singh profil fotoğrafı
Abhishek Singh1 yıl önce

Really effective and simple explanation Femke! How did you generate the diagram animation?

Femke Plantinga profil fotoğrafı
Femke Plantinga1 yıl önce

Thank you! Kudos @victorialslocum to animating these designs. 🫶 Rive is great for making these.

esamyak Indore profil fotoğrafı
esamyak Indore1 yıl önce

cleared concepts thanks

Femke Plantinga profil fotoğrafı
Femke Plantinga1 yıl önce

Thank you! 🙏

digid profil fotoğrafı
digid1 yıl önce

Wow! Awesome work! I will check the material to learn more, but what I've watched so far it's pretty clever. Congrats!

QuantizedHuman profil fotoğrafı
QuantizedHuman1 yıl önce

The dutch accent always shines through, lol. Well explained. (Goed gedaan, en ik dol gwn een beetje.)

Nikhil Kumar profil fotoğrafı
Nikhil Kumar1 yıl önce

Nice, but still need to maintain relation between tables for complex schemas

John the Scott profil fotoğrafı
John the Scott1 yıl önce

huh? sql derives from predicate logic. would you describe predicate logic as "keyword-based"?

Manny Bernabe profil fotoğrafı
Manny Bernabe1 yıl önce

So good.

Benzer Videolar

Traditional data pipelines don't work for RAG applications. There are 3 issues with them: ​ 1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data. ​ 2. The connector ecosystem to load data from unstructured data sources is very immature. ​ 3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index. ​ The goal of a RAG Pipeline is to solve these problems. ​ The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications. ​ At a high level, there are four different stages in the architecture of a RAG pipeline: ​ 1. Ingestion: Here is where the pipeline loads the information from the data source. ​ 2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside them. ​ 3. Transform: Where the pipeline chunks the data and generates document embeddings. ​ 4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings. ​ There are different rabbit holes at each one of these stages. Here are three of them: ​ 1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes. ​ 2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references. ​ 3. A simple continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it. ​ In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. ​ I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. ​ ​ If you have a few documents lying around, set up a free account and give it a try.

Santiago

40,441 görüntüleme • 1 yıl önce

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,200 görüntüleme • 1 yıl önce

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by Zain, experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Andrew Ng

124,314 görüntüleme • 10 ay önce

Researchers built a new RAG approach that: - does not need a vector DB. - does not embed data. - involves no chunking. - performs no similarity search. And it hit 98.7% accuracy on a financial benchmark (SOTA). Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it. PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?" That's a fundamentally different approach with: - No arbitrary chunking that breaks context. - No vector DB infrastructure to maintain. - Traceable retrieval to see exactly why it chose a specific section. - The ability to see in-document references ("see Table 5.3") the way a human would. But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings. Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis. Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!

Avi Chawla

970,893 görüntüleme • 4 ay önce