Загрузка видео...

Не удалось загрузить видео

Возникла проблема при загрузке этого видео. Это может быть связано с временными проблемами сети или видео может быть недоступно.

На главную

Researchers built a new RAG approach that: - does not need a vector DB. - does not embed data. - involves no chunking. - performs no similarity search. And it hit 98.7% accuracy on a financial benchmark (SOTA). Here's the core problem with RAG that this new approach solves:... Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it. PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?" That's a fundamentally different approach with: - No arbitrary chunking that breaks context. - No vector DB infrastructure to maintain. - Traceable retrieval to see exactly why it chose a specific section. - The ability to see in-document references ("see Table 5.3") the way a human would. But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings. Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis. Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!show more

Avi Chawla

70,414 subscribers

971,454 просмотров • 4 месяцев назад •via X (Twitter)

Образование Наука и технологии Финансы

Anya Rossi• Live Now

Private livecam show

Комментарии: 0

Нет доступных комментариев

Здесь появятся комментарии из оригинального поста

Похожие видео

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:

Femke Plantinga

140,428 просмотров • 1 год назад

How do professional RAG applications chunk their text? Let’s cover some Advanced Chunking Techniques. In our latest video, we cover simple chunking methods like splitting documents into sentences or sections. But these methods often miss out on ensuring each chunk has independent meaning. Semantic chunking solved exactly this! By measuring the semantic similarity between sentences using vector embeddings, we can combine similar sentences into meaningful chunks. With LLM-based chunking, large language models help break down text effectively, although it can be slow and costly. And what about the newest Late Chunking? Which keeps context intact across chunks—more on that soon. 👀 In this video, we cover these advanced techniques in detail. Watch it to learn more. A big shoutout to Daniel Williams for helping create this video! 💚

How do professional RAG applications chunk their text? Let’s cover some Advanced Chunking Techniques. In our latest video, we cover simple chunking methods like splitting documents into sentences or sections. But these methods often miss out on ensuring each chunk has independent meaning. Semantic chunking solved exactly this! By measuring the semantic similarity between sentences using vector embeddings, we can combine similar sentences into meaningful chunks. With LLM-based chunking, large language models help break down text effectively, although it can be slow and costly. And what about the newest Late Chunking? Which keeps context intact across chunks—more on that soon. 👀 In this video, we cover these advanced techniques in detail. Watch it to learn more. A big shoutout to Daniel Williams for helping create this video! 💚

Femke Plantinga

29,660 просмотров • 1 год назад

Traditional data pipelines don't work for RAG applications. There are 3 issues with them: 1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data. 2. The connector ecosystem to load data from unstructured data sources is very immature. 3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index. The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications. At a high level, there are four different stages in the architecture of a RAG pipeline: 1. Ingestion: Here is where the pipeline loads the information from the data source. 2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside them. 3. Transform: Where the pipeline chunks the data and generates document embeddings. 4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings. There are different rabbit holes at each one of these stages. Here are three of them: 1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes. 2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references. 3. A simple continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it. In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.

Traditional data pipelines don't work for RAG applications. There are 3 issues with them: 1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data. 2. The connector ecosystem to load data from unstructured data sources is very immature. 3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index. The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications. At a high level, there are four different stages in the architecture of a RAG pipeline: 1. Ingestion: Here is where the pipeline loads the information from the data source. 2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside them. 3. Transform: Where the pipeline chunks the data and generates document embeddings. 4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings. There are different rabbit holes at each one of these stages. Here are three of them: 1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes. 2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references. 3. A simple continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it. In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.

Santiago

40,441 просмотров • 1 год назад

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

Andrew Ng

137,034 просмотров • 2 лет назад

Verba is an open source Retrieval Augmented Generation (RAG) application that performs RAG on your own data. To showcase its capabilities, we've customized it as an Airbnb chatbot using Airbnb’s customer documentation. How it works: • Ask any questions, related to your booking, policies, or anything related to your Airbnb experience. • Get relevant, human-like responses: Verba provides natural and informative answers. • Access original sources: One of the standout features of RAG is its ability to directly indicate the sources it used to generate each response. Under the hood, Verba uses a RAG pipeline to deliver these exceptional results. Your query is transformed into a numerical representation (vector) and be used to search through our vector database for the most similar context using Hybrid Search. The most relevant context is then combined with your original question and fed into a powerful large language model (LLM). The LLM will then use all of that information to generate a conversational response. Et voilà! 💫 Try Verba: Verba on GitHub: Learn more in our video:

Verba is an open source Retrieval Augmented Generation (RAG) application that performs RAG on your own data. To showcase its capabilities, we've customized it as an Airbnb chatbot using Airbnb’s customer documentation. How it works: • Ask any questions, related to your booking, policies, or anything related to your Airbnb experience. • Get relevant, human-like responses: Verba provides natural and informative answers. • Access original sources: One of the standout features of RAG is its ability to directly indicate the sources it used to generate each response. Under the hood, Verba uses a RAG pipeline to deliver these exceptional results. Your query is transformed into a numerical representation (vector) and be used to search through our vector database for the most similar context using Hybrid Search. The most relevant context is then combined with your original question and fed into a powerful large language model (LLM). The LLM will then use all of that information to generate a conversational response. Et voilà! 💫 Try Verba: Verba on GitHub: Learn more in our video:

Femke Plantinga

120,565 просмотров • 1 год назад

Traditional Chunking can lose context between chunks. (Let's explore a better way!) Enter Late Chunking… Here's how it works: Traditional Chunking • Split the text into chunks • Embed each chunk separately Late Chunking • Embed the entire text first • Split it into chunks after the embedding Advantages of Late Chunking • Maintains connections between segments • Reduces the need for complex chunking strategies • Cost-effective: extremely similar cost to regular chunking methods Late Chunking is a promising alternative to traditional methods like ColBERT and naive chunking. It's particularly useful for applications where the documents are long, and context needs to be retained across many pages of text when retrieving information. Want to learn more? • Blog post: • Notebook: Special thanks to Daniel Williams for his invaluable collaboration on this one! 🔥

Traditional Chunking can lose context between chunks. (Let's explore a better way!) Enter Late Chunking… Here's how it works: Traditional Chunking • Split the text into chunks • Embed each chunk separately Late Chunking • Embed the entire text first • Split it into chunks after the embedding Advantages of Late Chunking • Maintains connections between segments • Reduces the need for complex chunking strategies • Cost-effective: extremely similar cost to regular chunking methods Late Chunking is a promising alternative to traditional methods like ColBERT and naive chunking. It's particularly useful for applications where the documents are long, and context needs to be retained across many pages of text when retrieving information. Want to learn more? • Blog post: • Notebook: Special thanks to Daniel Williams for his invaluable collaboration on this one! 🔥

Femke Plantinga

19,718 просмотров • 1 год назад

New short course: Building Multimodal Search and RAG", by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿. Contrastive learning is used to train models to map vectors into an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems. In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG – so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:

New short course: Building Multimodal Search and RAG", by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿. Contrastive learning is used to train models to map vectors into an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems. In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG – so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:

Andrew Ng

104,371 просмотров • 2 лет назад

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by Zain, experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by Zain, experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Andrew Ng

124,458 просмотров • 11 месяцев назад

There’s been two papers released in the past couple months, one by Google and one by NVIDIA, that argue that ordering the documents retrieved by RAG systems can enhance performance. However, they both give two different strategies on HOW these documents should be ordered 🤔 Both papers agree on two main points: 1️⃣ There’s a fundamental issue in RAG - as more documents are retrieved, more irrelevant context (e.g., hard negatives) are introduced, which leads to confusion for the LLM and eventually degrades the quality of the generated output. This is called an inverted-U performance curve. 2️⃣ Ordering the retrieved documents is a key lever for optimizing RAG performance. Google Cloud researchers proposed ordering results based on relevance scores: The authors in this paper argue for relevance-based reordering, or ordering the retrieved chunks based on their similarity scores, so the most relevant documents are at the beginning and the end of the inputs to counter the “lost in the middle” effect. NVIDIA researchers proposed ordering results based on the original sequence of document chunks: The authors of this paper argue for Order-Preserving Reordering, or Order-Preserve RAG (OP-RAG), to maintain the logically coherent content flow of the document. So they preserved the original order of retrieved document chunks in the source text, instead of ranking them by relevance scores. So which one is right? It probably depends on the specific use case and dataset - relevance-based reordering could perform better in tasks where you need fast access to the most critical information (e.g., fact retrieval, QA systems), while order-preserving RAG might be better where you need to understand the sequential structure of information (e.g., narrative or legal documents). There are still so many uncertainties in AI - we don’t actually know what we’re doing, and it takes awhile to figure out the best strategies for most things! Excited to see more research about this.

There’s been two papers released in the past couple months, one by Google and one by NVIDIA, that argue that ordering the documents retrieved by RAG systems can enhance performance. However, they both give two different strategies on HOW these documents should be ordered 🤔 Both papers agree on two main points: 1️⃣ There’s a fundamental issue in RAG - as more documents are retrieved, more irrelevant context (e.g., hard negatives) are introduced, which leads to confusion for the LLM and eventually degrades the quality of the generated output. This is called an inverted-U performance curve. 2️⃣ Ordering the retrieved documents is a key lever for optimizing RAG performance. Google Cloud researchers proposed ordering results based on relevance scores: The authors in this paper argue for relevance-based reordering, or ordering the retrieved chunks based on their similarity scores, so the most relevant documents are at the beginning and the end of the inputs to counter the “lost in the middle” effect. NVIDIA researchers proposed ordering results based on the original sequence of document chunks: The authors of this paper argue for Order-Preserving Reordering, or Order-Preserve RAG (OP-RAG), to maintain the logically coherent content flow of the document. So they preserved the original order of retrieved document chunks in the source text, instead of ranking them by relevance scores. So which one is right? It probably depends on the specific use case and dataset - relevance-based reordering could perform better in tasks where you need fast access to the most critical information (e.g., fact retrieval, QA systems), while order-preserving RAG might be better where you need to understand the sequential structure of information (e.g., narrative or legal documents). There are still so many uncertainties in AI - we don’t actually know what we’re doing, and it takes awhile to figure out the best strategies for most things! Excited to see more research about this.

Victoria Slocum

15,213 просмотров • 1 год назад

New short course on advanced retrieval for RAG (retrieval augmented generation)! RAG fetches relevant documents to give context to an LLM. In Advanced Retrieval for AI with Chroma, taught by Chroma founder anton 🇺🇸, you’ll learn: (i) Query expansion using an LLM to rewrite and improve a query, by either generating either additional relevant queries or a hypothetical answer to the query. (ii) Reranking using a cross-encoder - a model trained to measure similarity between two inputs presented simultaneously. Reranking reorders retrieved documents based on the cross-encoder similarity measure. (iii) Constructing and training an Embedding Adaptor, which is a model that adapts the embedding values to be more relevant to your use case. Each of these techniques can help you build much better RAG systems. Please sign up for the course here:

New short course on advanced retrieval for RAG (retrieval augmented generation)! RAG fetches relevant documents to give context to an LLM. In Advanced Retrieval for AI with Chroma, taught by Chroma founder anton 🇺🇸, you’ll learn: (i) Query expansion using an LLM to rewrite and improve a query, by either generating either additional relevant queries or a hypothetical answer to the query. (ii) Reranking using a cross-encoder - a model trained to measure similarity between two inputs presented simultaneously. Reranking reorders retrieved documents based on the cross-encoder similarity measure. (iii) Constructing and training an Embedding Adaptor, which is a model that adapts the embedding values to be more relevant to your use case. Each of these techniques can help you build much better RAG systems. Please sign up for the course here:

Andrew Ng

191,219 просмотров • 2 лет назад

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll - Explore how knowledge graphs work by building a graph of public financial documents from scratch - Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot - Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems Sign up here!

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll - Explore how knowledge graphs work by building a graph of public financial documents from scratch - Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot - Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems Sign up here!

Andrew Ng

244,247 просмотров • 2 лет назад

Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen.) First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model. No one wants to send their data anywhere (those who do haven't found an alternative.) Every single company would rather have an air-gapped system with no internet access. GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.) I installed GroundX on my AWS account and recorded a video to show you how to use it. There are two services you can use: 1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base. 2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base. A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea. In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.

Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen.) First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model. No one wants to send their data anywhere (those who do haven't found an alternative.) Every single company would rather have an air-gapped system with no internet access. GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.) I installed GroundX on my AWS account and recorded a video to show you how to use it. There are two services you can use: 1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base. 2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base. A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea. In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.

Santiago

89,624 просмотров • 1 год назад

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built HydraDB for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built HydraDB for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️

Nishkarsh

3,857,066 просмотров • 3 месяцев назад

everybody talks about building AI chatbots, but nobody tells you HOW to do it that's why I made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledgebase inside of the walk-through i go over: – data collection: gathering all relevant documents, conversations, and info - preprocessing: cleaning up and formatting the collected data - chunking: break down the cleaned data into smaller, manageable pieces - embedding & storing in a vector database - RAG & chatbot integration: using RAG to allow the chatbot to retrieve relevant information from the vector database based on a user's question reply to this tweet w/ the word “RAG” & I’ll send it to you (must be following so I can DM)

everybody talks about building AI chatbots, but nobody tells you HOW to do it that's why I made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledgebase inside of the walk-through i go over: – data collection: gathering all relevant documents, conversations, and info - preprocessing: cleaning up and formatting the collected data - chunking: break down the cleaned data into smaller, manageable pieces - embedding & storing in a vector database - RAG & chatbot integration: using RAG to allow the chatbot to retrieve relevant information from the vector database based on a user's question reply to this tweet w/ the word “RAG” & I’ll send it to you (must be following so I can DM)

Tyler

83,489 просмотров • 1 год назад

Traditional chunking: cheap but dumb. ColBERT: smart but expensive. 𝗟𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴: the solution we've been waiting for. Here’s a quick evolution of chunking strategies: → 𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the basics we all started with) • Token Chunking - split by token count • Sentence Chunking - split by sentence boundaries • Document-Based Chunking - split by sections/paragraphs → 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (when things got sophisticated) • Semantic Chunking - split by meaning • LLM-Based Chunking - let the model decide But each chunking method separates text at defined points, meaning context is lost within the document from one chunk to the next. → 𝗘𝗻𝘁𝗲𝗿 𝗟𝗮𝘁𝗲 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the game changer) Traditional approach: Chunk first → Embed each chunk separately Late chunking approach: Embed the entire document → Then chunk with context preserved 𝗪𝗵𝘆 𝗰𝗵𝗼𝗼𝘀𝗲 𝗹𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴? When you chunk first, each piece loses its contextual relationship to the rest of the document. It's like reading a book by randomly picking paragraphs - you miss the flow. With late chunking, every chunk maintains awareness of its neighbors because the embedding happens at the document level first. Mean pooling is done on segments AFTER the full context is embedded. Jina AI tested and saw significant improvements in retrieval quality - chunks that were previously disconnected now maintain their semantic relationships. As documents get longer and context windows expand, late chunking might just become the new standard for high-quality retrieval systems. 𝗪𝗵𝗮𝘁 𝗱𝗼 𝘆𝗼𝘂 𝗻𝗲𝗲𝗱 𝘁𝗼 𝗺𝗮𝗸𝗲 𝘁𝗵𝗶𝘀 𝘄𝗼𝗿𝗸? No modifications to your retrieval pipeline are needed. 1. Long context embedding models (8192+ tokens) 2. Chunking logic that tracks token spans 3. Less than 30 lines of code to implement All you need is to switch the order at which you chunk and embed. Embed FIRST, then chunk, not the other way around. Dive deeper into late chunking:

Traditional chunking: cheap but dumb. ColBERT: smart but expensive. 𝗟𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴: the solution we've been waiting for. Here’s a quick evolution of chunking strategies: → 𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the basics we all started with) • Token Chunking - split by token count • Sentence Chunking - split by sentence boundaries • Document-Based Chunking - split by sections/paragraphs → 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (when things got sophisticated) • Semantic Chunking - split by meaning • LLM-Based Chunking - let the model decide But each chunking method separates text at defined points, meaning context is lost within the document from one chunk to the next. → 𝗘𝗻𝘁𝗲𝗿 𝗟𝗮𝘁𝗲 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the game changer) Traditional approach: Chunk first → Embed each chunk separately Late chunking approach: Embed the entire document → Then chunk with context preserved 𝗪𝗵𝘆 𝗰𝗵𝗼𝗼𝘀𝗲 𝗹𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴? When you chunk first, each piece loses its contextual relationship to the rest of the document. It's like reading a book by randomly picking paragraphs - you miss the flow. With late chunking, every chunk maintains awareness of its neighbors because the embedding happens at the document level first. Mean pooling is done on segments AFTER the full context is embedded. Jina AI tested and saw significant improvements in retrieval quality - chunks that were previously disconnected now maintain their semantic relationships. As documents get longer and context windows expand, late chunking might just become the new standard for high-quality retrieval systems. 𝗪𝗵𝗮𝘁 𝗱𝗼 𝘆𝗼𝘂 𝗻𝗲𝗲𝗱 𝘁𝗼 𝗺𝗮𝗸𝗲 𝘁𝗵𝗶𝘀 𝘄𝗼𝗿𝗸? No modifications to your retrieval pipeline are needed. 1. Long context embedding models (8192+ tokens) 2. Chunking logic that tracks token spans 3. Less than 30 lines of code to implement All you need is to switch the order at which you chunk and embed. Embed FIRST, then chunk, not the other way around. Dive deeper into late chunking:

Femke Plantinga

125,283 просмотров • 10 месяцев назад

New short course on sophisticated RAG (Retrieval Augmented Generation) techniques is out! Taught by Jerry Liu and Anupam Datta of LlamaIndex 🦙 and TruEra , this teaches advanced techniques that help your LLM generate good answers. Topics include: - Sentence-window retrieval, which retrieves not just the most relevant sentence, but a window of sentences around it for higher quality context. - Auto-merging retrieval, which organizes your document into a hierarchical tree structure, where each parent node's text is split among its child nodes. Based on the relevance of the child nodes to a user query, this lets you better decide whether the entire parent node should be provided as context to the LLM. - Evaluation methodology for separately evaluating the quality of the key steps of RAG (context relevance, answer relevance, groundedness) so that you can perform error analysis, identify which part of your pipeline needs work, and tune components systematically. Please check out the course!

New short course on sophisticated RAG (Retrieval Augmented Generation) techniques is out! Taught by Jerry Liu and Anupam Datta of LlamaIndex 🦙 and TruEra , this teaches advanced techniques that help your LLM generate good answers. Topics include: - Sentence-window retrieval, which retrieves not just the most relevant sentence, but a window of sentences around it for higher quality context. - Auto-merging retrieval, which organizes your document into a hierarchical tree structure, where each parent node's text is split among its child nodes. Based on the relevance of the child nodes to a user query, this lets you better decide whether the entire parent node should be provided as context to the LLM. - Evaluation methodology for separately evaluating the quality of the key steps of RAG (context relevance, answer relevance, groundedness) so that you can perform error analysis, identify which part of your pipeline needs work, and tune components systematically. Please check out the course!

Andrew Ng

655,486 просмотров • 2 лет назад

This will retire 90% of RAG systems with dignity (and a sad song playlist). Powered by DSPy: If you're still building "text in, text out" chatbots that only perform blind vector and text searches, you're not gonna make it! My team just dropped Elysia, and it's not just an incremental successor to Verba… It's a whole rethink of how we interact with our data using AI. 𝗪𝗵𝗮𝘁 𝗶𝘀 𝗘𝗹𝘆𝗶𝘀𝗮? An open-source platform for building agentic RAG architectures. It learns from your preferences, intelligently categorizes, labels, and searches through your data, and provides complete transparency into its decision-making process. The long & exciting feature list: • 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝘁 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻-𝗧𝗿𝗲𝗲 𝗔𝗴𝗲𝗻𝘁𝘀: Elysia’s core is a customizable decision tree, and it visualizes its entire reasoning process, showing you why it chooses a specific tool or path. It enables advanced error handling, self-healing from failed queries, and prevents infinite loops. You can also add custom tools and branches to build complex, state-aware workflows. • 𝗗𝗮𝘁𝗮 𝗔𝘄𝗮𝗿𝗲𝗻𝗲𝘀𝘀: Before it even attempts a query, Elysia performs a full analysis of your data collections. This eliminates the blind search problem plaguing most RAG systems and allows for far more complex and accurate query generation. • 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗗𝗮𝘁𝗮 𝗗𝗶𝘀𝗽𝗹𝗮𝘆𝘀: Your RAG pipeline shouldn't be limited to text, right? That’s why Elysia analyzes each query's results and chooses the best way to display them, from tables and charts to product cards and GitHub tickets. It also features a comprehensive data explorer with search, sorting, and filtering capabilities. • 𝗛𝘆𝗽𝗲𝗿-𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝘃𝗶𝗮 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸: It uses your positively-rated queries as few-shot examples to improve future responses. This allows you to use smaller, faster models that perform like larger ones over time, cutting costs without sacrificing quality for most use cases. • 𝗖𝗵𝘂𝗻𝗸-𝗢𝗻-𝗗𝗲𝗺𝗮𝗻𝗱: Elysia chunks documents at query time. It performs initial searches on document-level vectors and only chunks relevant documents on the fly, storing them in a parallel quantized collection with cross references for future use. 𝗧𝗵𝗲 𝗦𝘁𝗮𝗰𝗸 Elysia is built from scratch on Weaviate, using its native features like named vectors, a variety of search types, filters, cross references, quantization, etc. It uses DSPy for LLM interactions and is delivered as a production-ready application via FastAPI, serving a NextJS frontend as static HTML. Also available as a Python package via pip: 𝗽𝗶𝗽 𝗶𝗻𝘀𝘁𝗮𝗹𝗹 𝗲𝗹𝘆𝘀𝗶𝗮-𝗮𝗶 Type: 𝗲𝗹𝘆𝘀𝗶𝗮 𝘀𝘁𝗮𝗿𝘁 Connect your Weaviate cluster and go explore what’s possible.

This will retire 90% of RAG systems with dignity (and a sad song playlist). Powered by DSPy: If you're still building "text in, text out" chatbots that only perform blind vector and text searches, you're not gonna make it! My team just dropped Elysia, and it's not just an incremental successor to Verba… It's a whole rethink of how we interact with our data using AI. 𝗪𝗵𝗮𝘁 𝗶𝘀 𝗘𝗹𝘆𝗶𝘀𝗮? An open-source platform for building agentic RAG architectures. It learns from your preferences, intelligently categorizes, labels, and searches through your data, and provides complete transparency into its decision-making process. The long & exciting feature list: • 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝘁 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻-𝗧𝗿𝗲𝗲 𝗔𝗴𝗲𝗻𝘁𝘀: Elysia’s core is a customizable decision tree, and it visualizes its entire reasoning process, showing you why it chooses a specific tool or path. It enables advanced error handling, self-healing from failed queries, and prevents infinite loops. You can also add custom tools and branches to build complex, state-aware workflows. • 𝗗𝗮𝘁𝗮 𝗔𝘄𝗮𝗿𝗲𝗻𝗲𝘀𝘀: Before it even attempts a query, Elysia performs a full analysis of your data collections. This eliminates the blind search problem plaguing most RAG systems and allows for far more complex and accurate query generation. • 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗗𝗮𝘁𝗮 𝗗𝗶𝘀𝗽𝗹𝗮𝘆𝘀: Your RAG pipeline shouldn't be limited to text, right? That’s why Elysia analyzes each query's results and chooses the best way to display them, from tables and charts to product cards and GitHub tickets. It also features a comprehensive data explorer with search, sorting, and filtering capabilities. • 𝗛𝘆𝗽𝗲𝗿-𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝘃𝗶𝗮 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸: It uses your positively-rated queries as few-shot examples to improve future responses. This allows you to use smaller, faster models that perform like larger ones over time, cutting costs without sacrificing quality for most use cases. • 𝗖𝗵𝘂𝗻𝗸-𝗢𝗻-𝗗𝗲𝗺𝗮𝗻𝗱: Elysia chunks documents at query time. It performs initial searches on document-level vectors and only chunks relevant documents on the fly, storing them in a parallel quantized collection with cross references for future use. 𝗧𝗵𝗲 𝗦𝘁𝗮𝗰𝗸 Elysia is built from scratch on Weaviate, using its native features like named vectors, a variety of search types, filters, cross references, quantization, etc. It uses DSPy for LLM interactions and is delivered as a production-ready application via FastAPI, serving a NextJS frontend as static HTML. Also available as a Python package via pip: 𝗽𝗶𝗽 𝗶𝗻𝘀𝘁𝗮𝗹𝗹 𝗲𝗹𝘆𝘀𝗶𝗮-𝗮𝗶 Type: 𝗲𝗹𝘆𝘀𝗶𝗮 𝘀𝘁𝗮𝗿𝘁 Connect your Weaviate cluster and go explore what’s possible.

Philip Vollet

93,431 просмотров • 10 месяцев назад

We saw exactly this in my previous startup RAG + vector DB gives decent results, but agentic search over the repo (glob/grep/read, etc) consistently worked better on real-world codebases. We even pushed further: RAG + embeddings + AST + tree-sitter. The quality was excellent But exactly as Boris Cherny mentions: staleness and privacy, you need continuous re-indexing, and all the code and embeddings must live on your servers. In practice, fast models + bash-style agentic search ended up outperforming general RAG search, even if it requires more tool calls This is what we built 👇

We saw exactly this in my previous startup RAG + vector DB gives decent results, but agentic search over the repo (glob/grep/read, etc) consistently worked better on real-world codebases. We even pushed further: RAG + embeddings + AST + tree-sitter. The quality was excellent But exactly as Boris Cherny mentions: staleness and privacy, you need continuous re-indexing, and all the code and embeddings must live on your servers. In practice, fast models + bash-style agentic search ended up outperforming general RAG search, even if it requires more tool calls This is what we built 👇

Daniel San

447,639 просмотров • 4 месяцев назад

Finally, a RAG solution that works with complex documents! Real-world documents can be messy, filled with text, tables, images, and intricate flow charts. Traditional parsing and chunking methods struggle to handle these. So, what’s the solution? We need smart techniques that can intuitively chunk relevant context and understand what’s inside each chunk, whether it's text, images, or diagrams. In this video, I’ll walk you through a breakthrough technique for extracting structured information from complex documents. It's unlike any other technique you've seen before.✨ It takes any unstructured (text, tables, images, flow-charts) input and parses it into a JSON format that LLMs can easily process. I used EyeLevel.AI's GroundX platform for this – a powerful tool that allows you to build a RAG application in just 3 steps. It also comes with a nice Python SDK and can be easily deployed on-premise (K8s cluster)! Try it yourself:

Finally, a RAG solution that works with complex documents! Real-world documents can be messy, filled with text, tables, images, and intricate flow charts. Traditional parsing and chunking methods struggle to handle these. So, what’s the solution? We need smart techniques that can intuitively chunk relevant context and understand what’s inside each chunk, whether it's text, images, or diagrams. In this video, I’ll walk you through a breakthrough technique for extracting structured information from complex documents. It's unlike any other technique you've seen before.✨ It takes any unstructured (text, tables, images, flow-charts) input and parses it into a JSON format that LLMs can easily process. I used EyeLevel.AI's GroundX platform for this – a powerful tool that allows you to build a RAG application in just 3 steps. It also comes with a nice Python SDK and can be easily deployed on-premise (K8s cluster)! Try it yourself:

Akshay 🚀

102,660 просмотров • 1 год назад

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,200 просмотров • 1 год назад