Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum.
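
To make the combination concrete, here is a minimal sketch of one way to fuse the two result lists: rescale the keyword (BM25) scores and the vector similarity scores to a common range, then blend them with a weight. The function names, the `alpha` parameter, and the scores are illustrative assumptions, not any particular database's API, and real systems may fuse results differently.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores.

    alpha=0 is pure keyword ranking, alpha=1 is pure vector ranking.
    """
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    docs = set(kw) | set(vec)
    fused = {d: (1 - alpha) * kw.get(d, 0.0) + alpha * vec.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three documents (assumption, for illustration only)
bm25 = {"doc_a": 7.2, "doc_b": 1.1, "doc_c": 0.0}
cosine = {"doc_a": 0.31, "doc_b": 0.88, "doc_c": 0.80}

ranking = hybrid_scores(bm25, cosine, alpha=0.5)
```

With these made-up scores, `doc_b` ranks first: it is only a weak keyword match but the strongest semantic match, so the blend favors it over the keyword-only winner `doc_a`.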

Femke Plantinga

12,967 subscribers

140,618 views • 2 years ago • via X (Twitter)

News & Politics • Science & Technology • Education

Comments: 9

Abhishek Singh • 2 years ago

Really effective and simple explanation Femke! How did you generate the diagram animation?

Femke Plantinga • 2 years ago

Thank you! Kudos to @victorialslocum for animating these designs. 🫶 Rive is great for making these.

esamyak Indore • 2 years ago

cleared concepts thanks

Femke Plantinga • 2 years ago

Thank you! 🙏

digid • 2 years ago

Wow! Awesome work! I will check the material to learn more, but what I've watched so far it's pretty clever. Congrats!

QuantizedHuman • 2 years ago

The dutch accent always shines through, lol. Well explained. (Goed gedaan, en ik dol gwn een beetje.)

Nikhil Kumar • 2 years ago

Nice, but still need to maintain relation between tables for complex schemas

John the Scott • 2 years ago

huh? sql derives from predicate logic. would you describe predicate logic as "keyword-based"?

Manny Bernabe • 1 year ago

So good.

Related videos

How do computers understand data? With semantic search! Instead of just matching keywords, it understands context using vector embeddings. Here’s how: 1) Convert data (text, images, etc.) into vectors (embeddings) 2) Store these vectors in a vector database 3) Search by meaning, not just the keywords Semantic search makes finding data across formats easier. Learn more in this blog post by Leonie, my all-time favorite:
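
The three steps above can be sketched in plain Python. The vectors here are hand-written stand-ins for what an embedding model would produce, and the dictionary stands in for a vector database; both are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, store, k=2):
    """Return the k stored items most similar to the query vector."""
    scored = [(doc, cosine_similarity(query_vector, vec)) for doc, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Steps 1 and 2: pretend these vectors came from an embedding model
# and were stored in a vector database (toy values, not real embeddings).
store = {
    "a photo of a kitten": [0.9, 0.1, 0.0],
    "how to adopt a cat": [0.8, 0.3, 0.1],
    "quarterly revenue report": [0.0, 0.2, 0.9],
}

# Step 3: search by meaning. The query vector is also a toy stand-in.
query = [0.85, 0.2, 0.05]
results = search(query, store)
```

The two cat-related items come back as the top results even though the query is only a vector and shares no keywords with them, while the revenue report is left out.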

Femke Plantinga

23,911 views • 1 year ago

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

Andrew Ng

137,091 views • 2 years ago

Today we announced the new Einstein Copilot Search and Salesforce Vector Database in Data Cloud to power semantic search and retrieval augmented generation. This will enhance Einstein Copilot's ability to understand, generate outputs, and automate actions across a wide variety of use cases, contexts, and data/content types. LLMs can't be relied on to recall specific data they were trained on, so the way to make them work in the enterprise, where accuracy is paramount, is to feed them with the right data using hybrid keyword and vector search-powered RAG. By bringing all this seamlessly into our apps and Einstein Copilot UX as well as Hyperforce cloud infrastructure, we can offer high performance, low latency, addressing data privacy, security, and residency requirements🌟

Clara Shih

49,620 views • 2 years ago

Verba is an open source Retrieval Augmented Generation (RAG) application that performs RAG on your own data. To showcase its capabilities, we've customized it as an Airbnb chatbot using Airbnb’s customer documentation.

How it works:
• Ask any questions related to your booking, policies, or anything else about your Airbnb experience.
• Get relevant, human-like responses: Verba provides natural and informative answers.
• Access original sources: One of the standout features of RAG is its ability to directly indicate the sources it used to generate each response.

Under the hood, Verba uses a RAG pipeline to deliver these exceptional results. Your query is transformed into a numerical representation (vector), which is used to search through our vector database for the most similar context using Hybrid Search. The most relevant context is then combined with your original question and fed into a powerful large language model (LLM). The LLM will then use all of that information to generate a conversational response. Et voilà! 💫

Try Verba: Verba on GitHub: Learn more in our video:
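
As a rough sketch of the "combine the context with your question" step described above, the function below assembles retrieved chunks and the user question into one prompt and numbers each chunk so the answer can point back to its sources. This is not Verba's actual code; `build_prompt`, the chunk fields, and the sample texts are hypothetical.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved chunks and the user question into one LLM prompt."""
    context_lines = [
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(retrieved_chunks, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical chunks, as a retrieval step might return them.
# The texts are invented for illustration, not real policy documents.
chunks = [
    {"source": "cancellation-policy.md", "text": "Guests can cancel free of charge within 48 hours of booking."},
    {"source": "refunds.md", "text": "Refunds are returned to the original payment method."},
]

prompt = build_prompt("Can I cancel my booking for free?", chunks)
```

The resulting string is what would be sent to the LLM; keeping the source name next to each numbered chunk is what makes it possible to show the user which documents an answer drew on.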

Femke Plantinga

120,565 views • 1 year ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them:

1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data.
2. The connector ecosystem to load data from unstructured data sources is very immature.
3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index.

The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications.

At a high level, there are four different stages in the architecture of a RAG pipeline:

1. Ingestion: Where the pipeline loads the information from the data source.
2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside it.
3. Transform: Where the pipeline chunks the data and generates document embeddings.
4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings.

There are different rabbit holes at each one of these stages. Here are three of them:

1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes.
2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references.
3. A continuous chunking strategy with an overlap is simple to implement. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it.

In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.
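
For reference, the "simple" baseline from the third rabbit hole, chunking with an overlap, can look like the sketch below. It splits on whitespace and counts words, which is an assumption made for brevity; real pipelines often count tokens or characters instead, and `chunk_words` is a hypothetical helper, not part of any product mentioned here.

```python
def chunk_words(text, chunk_size=5, overlap=2):
    """Split text into word chunks where neighbours share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_words(
    "one two three four five six seven eight nine ten",
    chunk_size=5,
    overlap=2,
)
# chunks == ["one two three four five",
#            "four five six seven eight",
#            "seven eight nine ten"]
```

Each chunk repeats the last two words of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk. Choosing `chunk_size` and `overlap` for a specific knowledge base is the hard part the post refers to.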

Santiago

40,441 views • 1 year ago

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!
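
Of the three quantization methods mentioned above, scalar quantization is the easiest to show in a few lines. The sketch below is a toy version, not Qdrant's implementation: it maps each float in a vector to one byte using that vector's own minimum and maximum, and the function names are hypothetical.

```python
def quantize(vector):
    """Map each float to an integer code in 0..255 (one byte per dimension)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for a constant vector
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximately reconstruct the original floats from the byte codes."""
    return [lo + c * scale for c in codes]


vector = [-0.42, 0.0, 0.13, 0.87]  # toy embedding values
codes, lo, scale = quantize(vector)
restored = dequantize(codes, lo, scale)
```

Storing one byte per dimension instead of a 4-byte float32 cuts memory to roughly a quarter, at the cost of a rounding error of at most half a quantization step per value. That trade-off between memory and search quality is what the course description refers to.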

Andrew Ng

146,313 views • 1 year ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by , experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. 
You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:
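
Reciprocal Rank Fusion, one of the methods listed above, merges several ranked lists by giving each document a score of 1 / (k + rank) per list and summing. The sketch below is a minimal version and not the course's code: the document ids are made up, and `k=60` is the commonly used default constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up result lists from a keyword search and a semantic search
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_b", "doc_c", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
```

Here `doc_b` comes out on top because it ranks well in both lists, ahead of `doc_a`, which is first in one list but last in the other. Because only ranks are used, the two searches do not need comparable score scales.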

Andrew Ng

124,625 views • 1 year ago

Learn to optimize RAG for cost and performance in our new short course, Prompt Compression and Query Optimization, created with MongoDB and taught by Richmond Alake. This course teaches you to combine traditional database capabilities with vector search using MongoDB for RAG. You'll learn these techniques: - Vector search: For semantic matching of user queries - Filtering using metadata: Pre- and post-filtering to narrow search results - Projections: Selecting only necessary fields to minimize data returned - Boosting: Reranking results to improve relevance - Prompt compression: Using a small LLM to compress context, significantly reducing token count and processing costs These methods address scaling, performance, and security challenges in large-scale RAG applications. You can sign up here:
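
The difference between pre- and post-filtering mentioned above is easy to see on toy data. The sketch below uses plain Python lists and a dot product rather than MongoDB's query API; the documents, field names, and helper functions are all hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, docs, k):
    """Return the k documents whose vectors score highest against the query."""
    return sorted(docs, key=lambda d: dot(query, d["vector"]), reverse=True)[:k]


def pre_filter_search(query, docs, predicate, k):
    """Apply the metadata filter first, then rank what is left."""
    return top_k(query, [d for d in docs if predicate(d)], k)


def post_filter_search(query, docs, predicate, k):
    """Rank everything first, then drop results that fail the filter."""
    return [d for d in top_k(query, docs, k) if predicate(d)]


docs = [
    {"id": 1, "year": 2021, "vector": [0.9, 0.1]},
    {"id": 2, "year": 2024, "vector": [0.8, 0.2]},
    {"id": 3, "year": 2024, "vector": [0.1, 0.9]},
]
query = [1.0, 0.0]


def is_recent(doc):
    return doc["year"] >= 2024


pre = [d["id"] for d in pre_filter_search(query, docs, is_recent, k=2)]
post = [d["id"] for d in post_filter_search(query, docs, is_recent, k=2)]
# pre == [2, 3], post == [2]
```

Pre-filtering always fills the requested `k` when enough documents match, while post-filtering can return fewer results because the best-scoring document (id 1) is discarded after ranking.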

Andrew Ng

71,710 views • 2 years ago

🚨 Tiger Data - Creators of TimescaleDB (Yes! the team behind TimescaleDB) is quietly revolutionizing how we build agents. The Old Way: Glueing together Postgres + Vector DBs + Search tools = complexity. The New Way: Tiger Data’s Agentic Postgres handles it all natively 🔥 What’s now possible 🧵 ↓ ✦ Hybrid retrieval → combine keyword filtering and vector similarity in a single query ✦ Forkable databases → retrieval and safe experimentation without external systems ✦ Native Postgres semantics → everything stays queryable, inspectable, and debuggable in SQL Why this matters: → Fewer moving parts in RAG pipelines → No extra infrastructure to stitch together → Agents can iterate and learn safely → Production data stays isolated and protected This is an awesome shift! Stop bolting workflows onto your database. Let your database be the workflow 🔥

Charly Wargnier

20,181 views • 5 months ago

This is, by far, one of the best uses of modern AI. If you don't use embeddings when querying your database, you are definitely leaving a lot on the table. In this video, I'll show you how to run semantic searches using OpenAI and PostgreSQL. It's all thanks to Pgai, an open-source PostgreSQL extension: Here's what will happen: 1. We'll create a simple table with news articles 2. We'll generate embeddings for those articles 3. We'll run queries on top of those embeddings For this video, I generated the embeddings using a simple query, but pgai Vectorizer would do the same automatically as new information makes it into the database. This is awesome! If you have a PostgreSQL database with data you are searching over, you should start experimenting with semantic searches immediately. For most use cases, a combination of full-text search + semantic search is the best approach. If you don't have a PostgreSQL database around, you can try free for 30 days using Timescale: Thanks to the Timescale (now TigerData) team for partnering with me on this post!

Santiago

109,517 views • 1 year ago

Researchers built a new RAG approach that: - does not need a vector DB. - does not embed data. - involves no chunking. - performs no similarity search. And it hit 98.7% accuracy on a financial benchmark (SOTA). Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it. PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?" That's a fundamentally different approach with: - No arbitrary chunking that breaks context. - No vector DB infrastructure to maintain. - Traceable retrieval to see exactly why it chose a specific section. - The ability to see in-document references ("see Table 5.3") the way a human would. But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings. Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. 
For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis. Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!

Avi Chawla

972,565 views • 6 months ago

New short course: "Building Multimodal Search and RAG", by Weaviate AI Database's Sebastian Witalec.

Contrastive learning is used to train models to map inputs to vectors in an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems.

In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate's open-source vector database.

Please sign up here:
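
As a rough illustration of "pulling similar concepts closer and pushing dissimilar ones away", here is a minimal sketch of one common contrastive objective (InfoNCE). The post does not say which loss the course uses, so treat the choice of loss, the function names, and the toy 2-D vectors as assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1) -> float:
    """Loss is low when the anchor is closer to the positive than to the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable negative log-softmax of the positive logit.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy 2-D embeddings: a caption, a matching image, and an unrelated image.
caption = [1.0, 0.0]
matching_image = [0.9, 0.1]
unrelated_image = [0.0, 1.0]

aligned = info_nce(caption, matching_image, [unrelated_image])
misaligned = info_nce(caption, unrelated_image, [matching_image])
print(aligned < misaligned)
```

Training minimizes this loss over many pairs, which is what moves matching text and image embeddings toward each other in the shared space.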


Andrew Ng

104,371 views • 2 years ago

I don't think people realize the scale of data behind the JBP Search Engine. Every single episode since Episode 1 has been transcribed and converted into embeddings for semantic search. This isn't keyword matching. It understands the context of your query and returns the exact moment you're looking for, not just a list of episodes where something was mentioned.

1,415 episodes. ~3 hours each. Over 600 GB of locally stored audio, plus additional metadata powering search quality.

Could you do it? Maybe, but just know that I've personally spent thousands of dollars in compute to make this possible. There's no way around it when dealing with this much data.


burner account

130,769 views • 1 month ago

Big moment for Postgres!

AI agents broke the idea of what a database is supposed to do. Traditional databases were built for humans, and agents broke that model.
- They branch endlessly.
- They run ten experiments at once.
- They need isolation, context, memory, structured reasoning, and safe sandboxes.

Letting agents touch production systems is terrifying because the old model of Postgres was never built for this kind of behavior. Agentic Postgres is an agent-ready version of Postgres by TimescaleDB (by Tiger Data) that solves this. I think it is one of the biggest upgrades to the agent stack this year, and Tiger Data is working with me on this post to share what they did.

Some key features:
> It instantly creates branches of an entire database, which is perfect for parallel agent evals, safe experiments, migrations, or isolated testing. Forks take seconds and cost almost nothing.
> It comes with a built-in MCP server, which agents can use to get schema guidance, best practices, and safe, structured access to Postgres. This also helps them run migrations with real understanding.
> It comes with actual hybrid search (vector search and BM25), so agents can retrieve data directly inside the database.
> The database is memory-native. This gives agents a persistent context to evolve with.

This is one of the first times I have seen Postgres feel ready for the AI-native era.


Avi Chawla

94,290 views • 8 months ago

LangChain: Chat with Your Data, a new free short course created with Harrison Chase, is now available! In this 1-hour course, you'll learn how to build one of the most requested LLM-based applications: answering questions using information from a document or collection of documents (often called Retrieval-Augmented Generation). You'll also learn how to use vector stores and embeddings to retrieve document chunks relevant to a query. I hope you enjoy the course!


Andrew Ng

384,282 views • 3 years ago

We're expanding the Gemini API File Search tool 🔍 with 3 new updates that enable developers to more easily build multimodal RAG systems with enhanced precision:
+ Multimodal Support: By leveraging our Gemini Embedding 2 model, File Search can now reason across images and text simultaneously.
+ Custom Metadata Filtering: Bring structure to unstructured data by tagging files with custom key-value labels. This pre-filters your data and boosts search speed.
+ Exact Citations: File Search can now capture and return the exact source (down to the page number) for every piece of information indexed.

See multimodal File Search in action with our example app in Google AI Studio. Chat with your entire image and doc library, ask questions, and trace answers back to the source:


Google AI Developers

108,622 views • 2 months ago

everybody talks about building AI chatbots, but nobody tells you HOW to do it.

that's why I made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledge base.

inside the walkthrough I go over:
- data collection: gathering all relevant documents, conversations, and info
- preprocessing: cleaning up and formatting the collected data
- chunking: breaking down the cleaned data into smaller, manageable pieces
- embedding & storing in a vector database
- RAG & chatbot integration: using RAG to allow the chatbot to retrieve relevant information from the vector database based on a user's question

reply to this tweet w/ the word "RAG" & I'll send it to you (must be following so I can DM)
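
The chunk, embed, store, retrieve steps listed in this post can be sketched as follows. This is a toy under stated assumptions, not the walkthrough's code: word counts stand in for a real embedding model, a Python list stands in for the vector database, and the sample text and function names are made up.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into chunks of `size` words (real pipelines often overlap chunks)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up knowledge base text, for illustration only.
docs = "Refunds are issued within 14 days of purchase. Shipping takes 3 to 5 business days worldwide."
store = [(c, embed(c)) for c in chunk(docs)]  # "vector database" stand-in

context = retrieve("How long do refunds take?", store)
prompt = f"Answer using this context: {context[0]}"
print(prompt)
```

The final `prompt` is what would be passed to the chatbot's language model; the model call itself is left out here.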


Tyler

83,505 views • 1 year ago

I can search a database with 32 million records in less than 50 milliseconds. This speed is insane!

There's no reason your search is slow. There's no reason you aren't using this library when building applications with a lot of data.

Take a look at Typesense. It's an open-source search engine. It's crazy fast and easy to use. They are sponsoring this post.

Here is their GitHub repository:

You can use the library in Python, JavaScript, PHP, Ruby, Swift, and pretty much any other programming language out there.

Here are some of the things you can do with Typesense:
• Vector search over embeddings
• Conversational search for RAG applications
• Image similarity search
• Voice search

And it does all of this extremely fast.

A few interesting notes about the library:
• Optimized for developer happiness
• Works out of the box for most use cases
• Very intuitive API
• Easy to deploy and scale

Probably the best comparison is against Elasticsearch, which is much more complex to configure and run and has a steep learning curve.

Check their GitHub repository. They have examples, tutorials, and everything you need.


Santiago

160,154 views • 2 years ago

We're excited to introduce the Retrieval Harness in LlamaParse, which is the 2026 version of RAG over documents.

Generalized agents need the right set of tools to scalably search and read through an arbitrary corpus of data (from 10 docs to 1M+ docs). They can already demonstrate great retrieval performance over a local filesystem, but they need a proper backend for a large collection of managed data.

The Retrieval Harness exposes a diverse set of tools for various needs:
1. Hybrid Retrieval: Combine vector search with keyword search, and let the agent set the alpha value to shift between the two
2. List Files: A scalable version of `ls` to list files within an index
3. File Grep: Enable regex search within a given file
4. File Read: Allow agents to read a subsection from an existing document

The agent can choose to interleave any sequence of these tools in order to complete a variety of tasks, from simple to hard.

Come check it out!
Blog:
Sign up to LlamaParse:
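
To make the alpha setting in hybrid retrieval concrete, here is a minimal sketch of one common way to blend the two rankers: min-max normalize each score set, then take a weighted sum. The post does not describe how the Retrieval Harness fuses scores, so the fusion method, the convention that `alpha=1.0` means vector-only, and every name and number below are assumptions.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1] so the two rankers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(vector_scores: dict[str, float],
                keyword_scores: dict[str, float],
                alpha: float) -> list[str]:
    """alpha=1.0 uses only vector scores; alpha=0.0 uses only keyword scores."""
    v, k = normalize(vector_scores), normalize(keyword_scores)
    docs = set(v) | set(k)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores: cosine similarities and BM25-style keyword scores.
vector_scores = {"doc_a": 0.91, "doc_b": 0.40, "doc_c": 0.75}
keyword_scores = {"doc_a": 1.2, "doc_b": 7.5, "doc_c": 3.0}

print(hybrid_rank(vector_scores, keyword_scores, alpha=1.0))
print(hybrid_rank(vector_scores, keyword_scores, alpha=0.0))
```

An agent that knows the query contains an exact identifier (a part number, say) could lower alpha toward keyword search, and raise it for paraphrased or conceptual questions.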


Jerry Liu

18,615 views • 24 days ago

Our new short course, Knowledge Graphs for RAG, is now available!

A knowledge graph is a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications.

In this course, taught by Andreas Kollegger of Neo4j, you'll:
- Explore how knowledge graphs work by building a graph of public financial documents from scratch
- Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot
- Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems

Sign up here!
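
As a small illustration of retrieving context by following relationships rather than by similarity alone, here is a sketch using an in-memory graph. It is not the course's code and does not use Neo4j: the triples, entity names, relation names, and the `neighborhood` helper are all made up for illustration.

```python
from collections import defaultdict

# (subject, relation, object) triples; all names are made up for illustration.
triples = [
    ("Acme Corp", "FILED", "Form 10-K 2023"),
    ("Form 10-K 2023", "HAS_SECTION", "Risk Factors"),
    ("Form 10-K 2023", "HAS_SECTION", "Debt Summary"),
    ("Globex Fund", "OWNS_STOCK_IN", "Acme Corp"),
]

# Adjacency list keyed by subject.
graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))


def neighborhood(entity: str, hops: int = 2) -> list[str]:
    """Collect facts reachable from `entity` by following outgoing edges."""
    facts, frontier = [], [entity]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for rel, obj in graph.get(node, []):
                facts.append(f"{node} {rel} {obj}")
                next_frontier.append(obj)
        frontier = next_frontier
    return facts


context = neighborhood("Acme Corp")
print("\n".join(context))
```

The collected facts could then be added to the prompt alongside any text chunks, which is one way graph structure can enrich the context passed to an LLM.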


Andrew Ng

244,329 views • 2 years ago