Loading video...

Video Failed to Load

There was a problem loading this video. This could be due to a temporary network issue or the video might be unavailable.

Verba is an open source Retrieval Augmented Generation (RAG) application that performs RAG on your own data. To showcase its capabilities, we've customized it as an Airbnb chatbot using Airbnb’s customer documentation. How it works: • Ask any questions, related to your booking, policies, or anything related to your... Airbnb experience. • Get relevant, human-like responses: Verba provides natural and informative answers. • Access original sources: One of the standout features of RAG is its ability to directly indicate the sources it used to generate each response. Under the hood, Verba uses a RAG pipeline to deliver these exceptional results. Your query is transformed into a numerical representation (vector) and be used to search through our vector database for the most similar context using Hybrid Search. The most relevant context is then combined with your original question and fed into a powerful large language model (LLM). The LLM will then use all of that information to generate a conversational response. Et voilà! 💫 Try Verba: Verba on GitHub: Learn more in our video:show more

Femke Plantinga

12,622 subscribers

120,565 views • 1 year ago •via X (Twitter)

Science & Technology News & Politics Education

Anya Rossi• Live Now

Private livecam show

10 Comments

Muratcan Koylan1 year ago

Thanks for sharing this, the UI is super cool.

Leonie1 year ago

Love your new content format! 👏

Femke Plantinga1 year ago

Says you! 🫶

Paul Seth1 year ago

Can I use any llama based LLM with it? I want to try some science based LLM or use my PDFs to train it.

Femke Plantinga1 year ago

Absolutely! You can use any model supported by Ollama, including the newest llama3.1, llama3, or even llama2. Here's a full list of supported models To set it up in Verba, I recommend checking out the GitHub Readme of Verba

Peter Okwara1 year ago

Verba is really dope Once I get cash I will probably go live with a proper mvp

Femke Plantinga1 year ago

Nice!

Joel1 year ago

Can I use Verba in my code with DSPY?

Femke Plantinga1 year ago

DSPY is not supported by Verba yet

Mark Virag1 year ago

Loved this thanks for sharing

Related Videos

Build a RAG Application from Scratch 🐕 This video covers the architecture behind Verba 0.3.0! The design behind Verba is to have an explicit manager for each core component of RAG pipelines (Read, Chunk, Embed, Retrieve, Generate) 1. ReaderManager: Load in your data (GitHubReader, PDFReader, SimpleReader, etc.) 2. ChunkManager: Chunk up your data into smaller bits (avoids hitting the token limit and results in better retrieval) 3. EmbeddingManager: Convert your data into embeddings (use OpenAI, Cohere, SentenceTransformer, etc.) 4. RetrieverManager: Retrieve the relevant context from your query 5. GenerationManager: Generate an answer from the retrieved chunks Test out Verba on the Weaviate AI Database docs/blogs here: Verba repo:

Build a RAG Application from Scratch 🐕 This video covers the architecture behind Verba 0.3.0! The design behind Verba is to have an explicit manager for each core component of RAG pipelines (Read, Chunk, Embed, Retrieve, Generate) 1. ReaderManager: Load in your data (GitHubReader, PDFReader, SimpleReader, etc.) 2. ChunkManager: Chunk up your data into smaller bits (avoids hitting the token limit and results in better retrieval) 3. EmbeddingManager: Convert your data into embeddings (use OpenAI, Cohere, SentenceTransformer, etc.) 4. RetrieverManager: Retrieve the relevant context from your query 5. GenerationManager: Generate an answer from the retrieved chunks Test out Verba on the Weaviate AI Database docs/blogs here: Verba repo:

Erika Shorten

155,273 views • 2 years ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by Zain, experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by Zain, experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Andrew Ng

124,458 views • 1 year ago

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:

Femke Plantinga

140,618 views • 2 years ago

New short course on advanced retrieval for RAG (retrieval augmented generation)! RAG fetches relevant documents to give context to an LLM. In Advanced Retrieval for AI with Chroma, taught by Chroma founder anton 🇺🇸, you’ll learn: (i) Query expansion using an LLM to rewrite and improve a query, by either generating either additional relevant queries or a hypothetical answer to the query. (ii) Reranking using a cross-encoder - a model trained to measure similarity between two inputs presented simultaneously. Reranking reorders retrieved documents based on the cross-encoder similarity measure. (iii) Constructing and training an Embedding Adaptor, which is a model that adapts the embedding values to be more relevant to your use case. Each of these techniques can help you build much better RAG systems. Please sign up for the course here:

New short course on advanced retrieval for RAG (retrieval augmented generation)! RAG fetches relevant documents to give context to an LLM. In Advanced Retrieval for AI with Chroma, taught by Chroma founder anton 🇺🇸, you’ll learn: (i) Query expansion using an LLM to rewrite and improve a query, by either generating either additional relevant queries or a hypothetical answer to the query. (ii) Reranking using a cross-encoder - a model trained to measure similarity between two inputs presented simultaneously. Reranking reorders retrieved documents based on the cross-encoder similarity measure. (iii) Constructing and training an Embedding Adaptor, which is a model that adapts the embedding values to be more relevant to your use case. Each of these techniques can help you build much better RAG systems. Please sign up for the course here:

Andrew Ng

191,219 views • 2 years ago

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

Andrew Ng

137,091 views • 2 years ago

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,313 views • 1 year ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them: 1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data. 2. The connector ecosystem to load data from unstructured data sources is very immature. 3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index. The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications. At a high level, there are four different stages in the architecture of a RAG pipeline: 1. Ingestion: Here is where the pipeline loads the information from the data source. 2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside them. 3. Transform: Where the pipeline chunks the data and generates document embeddings. 4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings. There are different rabbit holes at each one of these stages. Here are three of them: 1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes. 2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references. 3. A simple continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it. In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.

Traditional data pipelines don't work for RAG applications. There are 3 issues with them: 1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data. 2. The connector ecosystem to load data from unstructured data sources is very immature. 3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index. The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications. At a high level, there are four different stages in the architecture of a RAG pipeline: 1. Ingestion: Here is where the pipeline loads the information from the data source. 2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside them. 3. Transform: Where the pipeline chunks the data and generates document embeddings. 4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings. There are different rabbit holes at each one of these stages. Here are three of them: 1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes. 2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references. 3. A simple continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it. In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.

Santiago

40,441 views • 1 year ago

everybody talks about building AI chatbots, but nobody tells you HOW to do it that's why I made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledgebase inside of the walk-through i go over: – data collection: gathering all relevant documents, conversations, and info - preprocessing: cleaning up and formatting the collected data - chunking: break down the cleaned data into smaller, manageable pieces - embedding & storing in a vector database - RAG & chatbot integration: using RAG to allow the chatbot to retrieve relevant information from the vector database based on a user's question reply to this tweet w/ the word “RAG” & I’ll send it to you (must be following so I can DM)

everybody talks about building AI chatbots, but nobody tells you HOW to do it that's why I made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledgebase inside of the walk-through i go over: – data collection: gathering all relevant documents, conversations, and info - preprocessing: cleaning up and formatting the collected data - chunking: break down the cleaned data into smaller, manageable pieces - embedding & storing in a vector database - RAG & chatbot integration: using RAG to allow the chatbot to retrieve relevant information from the vector database based on a user's question reply to this tweet w/ the word “RAG” & I’ll send it to you (must be following so I can DM)

Tyler

83,505 views • 1 year ago

Run state-of-the-art RAG applications locally on your computer with ollama and use all the fantastic open-source models like llama3, msk's awesome models, or Command R from Cohere With Verba 1.0, we put it all in your hands 🙌 Get on board for a wild open-source ride, we're bridging any moat as open-source is here to win

Run state-of-the-art RAG applications locally on your computer with ollama and use all the fantastic open-source models like llama3, msk's awesome models, or Command R from Cohere With Verba 1.0, we put it all in your hands 🙌 Get on board for a wild open-source ride, we're bridging any moat as open-source is here to win

Philip Vollet

41,450 views • 2 years ago

New short course: Building Multimodal Search and RAG", by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿. Contrastive learning is used to train models to map vectors into an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems. In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG – so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:

New short course: Building Multimodal Search and RAG", by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿. Contrastive learning is used to train models to map vectors into an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems. In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG – so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:

Andrew Ng

104,371 views • 2 years ago

Introducing RAGs, a Streamlit app that allows you to create and customize your own RAG agent and then use it over your own data, all with natural language 🔥 Directly inspired by OpenAI GPTs, you can converse with an agent to help you do search/retrieval over any data you specify. The app contains three main pages: 🏠 Home Page : Have a “builder agent” build your RAG agent through natural language (you specify the data). ⚙️ RAG Config: Look at configured parameters 🤖 Use your RAG agent! Check out details below 👇 Blog: Repo:

Introducing RAGs, a Streamlit app that allows you to create and customize your own RAG agent and then use it over your own data, all with natural language 🔥 Directly inspired by OpenAI GPTs, you can converse with an agent to help you do search/retrieval over any data you specify. The app contains three main pages: 🏠 Home Page : Have a “builder agent” build your RAG agent through natural language (you specify the data). ⚙️ RAG Config: Look at configured parameters 🤖 Use your RAG agent! Check out details below 👇 Blog: Repo:

LlamaIndex 🦙

475,732 views • 2 years ago

Startups have raised $1B+ to crack Enterprise AI search. Now someone open-sourced it for FREE. The video below shows a self-hosted Slack assistant that answers your questions by searching across all your company's tools in a single query. It's built on top of Airweave, an open-source context retrieval layer that makes all your tools searchable for Agents using semantic, keyword, and agentic search. Here's how the app works: - The app watches for questions in Slack. - It searches every connected tool at once using Airweave (Notion, GitHub, Jira, Linear, etc.) to find relevant context. - The Airweave engine ranks the results by relevance and returns references to the original docs. - An LLM generates the final response and sends it back to Slack with citations. The entire stack is self-hostable via Docker. Find the project in the replies!

Startups have raised $1B+ to crack Enterprise AI search. Now someone open-sourced it for FREE. The video below shows a self-hosted Slack assistant that answers your questions by searching across all your company's tools in a single query. It's built on top of Airweave, an open-source context retrieval layer that makes all your tools searchable for Agents using semantic, keyword, and agentic search. Here's how the app works: - The app watches for questions in Slack. - It searches every connected tool at once using Airweave (Notion, GitHub, Jira, Linear, etc.) to find relevant context. - The Airweave engine ranks the results by relevance and returns references to the original docs. - An LLM generates the final response and sends it back to Slack with citations. The entire stack is self-hostable via Docker. Find the project in the replies!

Avi Chawla

11,927 views • 4 months ago

New short course on sophisticated RAG (Retrieval Augmented Generation) techniques is out! Taught by Jerry Liu and Anupam Datta of LlamaIndex 🦙 and TruEra , this teaches advanced techniques that help your LLM generate good answers. Topics include: - Sentence-window retrieval, which retrieves not just the most relevant sentence, but a window of sentences around it for higher quality context. - Auto-merging retrieval, which organizes your document into a hierarchical tree structure, where each parent node's text is split among its child nodes. Based on the relevance of the child nodes to a user query, this lets you better decide whether the entire parent node should be provided as context to the LLM. - Evaluation methodology for separately evaluating the quality of the key steps of RAG (context relevance, answer relevance, groundedness) so that you can perform error analysis, identify which part of your pipeline needs work, and tune components systematically. Please check out the course!

New short course on sophisticated RAG (Retrieval Augmented Generation) techniques is out! Taught by Jerry Liu and Anupam Datta of LlamaIndex 🦙 and TruEra , this teaches advanced techniques that help your LLM generate good answers. Topics include: - Sentence-window retrieval, which retrieves not just the most relevant sentence, but a window of sentences around it for higher quality context. - Auto-merging retrieval, which organizes your document into a hierarchical tree structure, where each parent node's text is split among its child nodes. Based on the relevance of the child nodes to a user query, this lets you better decide whether the entire parent node should be provided as context to the LLM. - Evaluation methodology for separately evaluating the quality of the key steps of RAG (context relevance, answer relevance, groundedness) so that you can perform error analysis, identify which part of your pipeline needs work, and tune components systematically. Please check out the course!

Andrew Ng

655,572 views • 2 years ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results. Let's talk about three techniques, in order of complexity, starting with the easiest one: • In-Context Learning • Indexing + In-Context Learning • Fine-tuning In-Context Learning The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video. Indexing + In-Context Learning Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation. Fine-tuning Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model. Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches. I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results. Let's talk about three techniques, in order of complexity, starting with the easiest one: • In-Context Learning • Indexing + In-Context Learning • Fine-tuning In-Context Learning The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video. Indexing + In-Context Learning Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation. Fine-tuning Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model. Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches. I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.

Santiago

384,510 views • 3 years ago

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll - Explore how knowledge graphs work by building a graph of public financial documents from scratch - Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot - Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems Sign up here!

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll - Explore how knowledge graphs work by building a graph of public financial documents from scratch - Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot - Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems Sign up here!

Andrew Ng

244,329 views • 2 years ago

I’m excited to kick off the first of our short courses focused on agents, starting with Building Agentic RAG with LlamaIndex, taught by Jerry Liu, CEO of LlamaIndex 🦙. This covers an important shift in RAG (retrieval augmented generation), in which rather than having the developer write explicit routines to retrieve information to feed into the LLM context, we instead build a RAG agent that that has access to tools for retrieving information. This lets the agent decide what information to fetch, and enables it to answer more complex questions using multi-step reasoning. In detail, you'll learn about: - Routing: Where your agent will use decision-making to route requests to multiple tools. - Tool Use: Where you'll create an interface for agents to select what tool (function call) to use as well as generate the right arguments. - Multi-step reasoning with tool use: Where you'll use an LLM to carry out multiple steps of reasoning, while retaining memory throughout the process. You’ll also learn how to step through what your agent is doing to debug and improve it iteratively. It’s an exciting time to build agents. Sign up and get started here!

I’m excited to kick off the first of our short courses focused on agents, starting with Building Agentic RAG with LlamaIndex, taught by Jerry Liu, CEO of LlamaIndex 🦙. This covers an important shift in RAG (retrieval augmented generation), in which rather than having the developer write explicit routines to retrieve information to feed into the LLM context, we instead build a RAG agent that that has access to tools for retrieving information. This lets the agent decide what information to fetch, and enables it to answer more complex questions using multi-step reasoning. In detail, you'll learn about: - Routing: Where your agent will use decision-making to route requests to multiple tools. - Tool Use: Where you'll create an interface for agents to select what tool (function call) to use as well as generate the right arguments. - Multi-step reasoning with tool use: Where you'll use an LLM to carry out multiple steps of reasoning, while retaining memory throughout the process. You’ll also learn how to step through what your agent is doing to debug and improve it iteratively. It’s an exciting time to build agents. Sign up and get started here!

Andrew Ng

297,131 views • 2 years ago

This will retire 90% of RAG systems with dignity (and a sad song playlist). Powered by DSPy: If you're still building "text in, text out" chatbots that only perform blind vector and text searches, you're not gonna make it! My team just dropped Elysia, and it's not just an incremental successor to Verba… It's a whole rethink of how we interact with our data using AI. 𝗪𝗵𝗮𝘁 𝗶𝘀 𝗘𝗹𝘆𝗶𝘀𝗮? An open-source platform for building agentic RAG architectures. It learns from your preferences, intelligently categorizes, labels, and searches through your data, and provides complete transparency into its decision-making process. The long & exciting feature list: • 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝘁 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻-𝗧𝗿𝗲𝗲 𝗔𝗴𝗲𝗻𝘁𝘀: Elysia’s core is a customizable decision tree, and it visualizes its entire reasoning process, showing you why it chooses a specific tool or path. It enables advanced error handling, self-healing from failed queries, and prevents infinite loops. You can also add custom tools and branches to build complex, state-aware workflows. • 𝗗𝗮𝘁𝗮 𝗔𝘄𝗮𝗿𝗲𝗻𝗲𝘀𝘀: Before it even attempts a query, Elysia performs a full analysis of your data collections. This eliminates the blind search problem plaguing most RAG systems and allows for far more complex and accurate query generation. • 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗗𝗮𝘁𝗮 𝗗𝗶𝘀𝗽𝗹𝗮𝘆𝘀: Your RAG pipeline shouldn't be limited to text, right? That’s why Elysia analyzes each query's results and chooses the best way to display them, from tables and charts to product cards and GitHub tickets. It also features a comprehensive data explorer with search, sorting, and filtering capabilities. • 𝗛𝘆𝗽𝗲𝗿-𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝘃𝗶𝗮 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸: It uses your positively-rated queries as few-shot examples to improve future responses. This allows you to use smaller, faster models that perform like larger ones over time, cutting costs without sacrificing quality for most use cases. • 𝗖𝗵𝘂𝗻𝗸-𝗢𝗻-𝗗𝗲𝗺𝗮𝗻𝗱: Elysia chunks documents at query time. It performs initial searches on document-level vectors and only chunks relevant documents on the fly, storing them in a parallel quantized collection with cross references for future use. 𝗧𝗵𝗲 𝗦𝘁𝗮𝗰𝗸 Elysia is built from scratch on Weaviate, using its native features like named vectors, a variety of search types, filters, cross references, quantization, etc. It uses DSPy for LLM interactions and is delivered as a production-ready application via FastAPI, serving a NextJS frontend as static HTML. Also available as a Python package via pip: 𝗽𝗶𝗽 𝗶𝗻𝘀𝘁𝗮𝗹𝗹 𝗲𝗹𝘆𝘀𝗶𝗮-𝗮𝗶 Type: 𝗲𝗹𝘆𝘀𝗶𝗮 𝘀𝘁𝗮𝗿𝘁 Connect your Weaviate cluster and go explore what’s possible.

This will retire 90% of RAG systems with dignity (and a sad song playlist). Powered by DSPy: If you're still building "text in, text out" chatbots that only perform blind vector and text searches, you're not gonna make it! My team just dropped Elysia, and it's not just an incremental successor to Verba… It's a whole rethink of how we interact with our data using AI. 𝗪𝗵𝗮𝘁 𝗶𝘀 𝗘𝗹𝘆𝗶𝘀𝗮? An open-source platform for building agentic RAG architectures. It learns from your preferences, intelligently categorizes, labels, and searches through your data, and provides complete transparency into its decision-making process. The long & exciting feature list: • 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝘁 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻-𝗧𝗿𝗲𝗲 𝗔𝗴𝗲𝗻𝘁𝘀: Elysia’s core is a customizable decision tree, and it visualizes its entire reasoning process, showing you why it chooses a specific tool or path. It enables advanced error handling, self-healing from failed queries, and prevents infinite loops. You can also add custom tools and branches to build complex, state-aware workflows. • 𝗗𝗮𝘁𝗮 𝗔𝘄𝗮𝗿𝗲𝗻𝗲𝘀𝘀: Before it even attempts a query, Elysia performs a full analysis of your data collections. This eliminates the blind search problem plaguing most RAG systems and allows for far more complex and accurate query generation. • 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗗𝗮𝘁𝗮 𝗗𝗶𝘀𝗽𝗹𝗮𝘆𝘀: Your RAG pipeline shouldn't be limited to text, right? That’s why Elysia analyzes each query's results and chooses the best way to display them, from tables and charts to product cards and GitHub tickets. It also features a comprehensive data explorer with search, sorting, and filtering capabilities. • 𝗛𝘆𝗽𝗲𝗿-𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝘃𝗶𝗮 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸: It uses your positively-rated queries as few-shot examples to improve future responses. This allows you to use smaller, faster models that perform like larger ones over time, cutting costs without sacrificing quality for most use cases. • 𝗖𝗵𝘂𝗻𝗸-𝗢𝗻-𝗗𝗲𝗺𝗮𝗻𝗱: Elysia chunks documents at query time. It performs initial searches on document-level vectors and only chunks relevant documents on the fly, storing them in a parallel quantized collection with cross references for future use. 𝗧𝗵𝗲 𝗦𝘁𝗮𝗰𝗸 Elysia is built from scratch on Weaviate, using its native features like named vectors, a variety of search types, filters, cross references, quantization, etc. It uses DSPy for LLM interactions and is delivered as a production-ready application via FastAPI, serving a NextJS frontend as static HTML. Also available as a Python package via pip: 𝗽𝗶𝗽 𝗶𝗻𝘀𝘁𝗮𝗹𝗹 𝗲𝗹𝘆𝘀𝗶𝗮-𝗮𝗶 Type: 𝗲𝗹𝘆𝘀𝗶𝗮 𝘀𝘁𝗮𝗿𝘁 Connect your Weaviate cluster and go explore what’s possible.

Philip Vollet

93,598 views • 11 months ago

LangChain: Chat with Your Data, a new free short course created with Harrison Chase, is now available! In this 1 hour course, you’ll learn how to build one of the most requested LLM-based applications: Answering questions using information from a document or collection of documents (often called Retrieval Augmented Generation). You'll also learn how to use vector stores and embeddings to retrieve document chunks relevant to a query. I hope you enjoy the course!

LangChain: Chat with Your Data, a new free short course created with Harrison Chase, is now available! In this 1 hour course, you’ll learn how to build one of the most requested LLM-based applications: Answering questions using information from a document or collection of documents (often called Retrieval Augmented Generation). You'll also learn how to use vector stores and embeddings to retrieve document chunks relevant to a query. I hope you enjoy the course!

Andrew Ng

384,282 views • 3 years ago

MCP is an absolute game-changer. (Together with DeepSeek, MCP is probably the hottest thing in AI over the last 6 months.) I use Cursor to write code 90% of the time. I built an MCP server to connect the Cursor agent to GroundX, an open-source RAG system, and I'm not going back. This is officially insane! Here is what I did, step by step: First, a little bit of context. I maintain an end-to-end Machine Learning System with several pipelines to process data, train, evaluate, register, deploy, and monitor a model. I've written a lot of documentation explaining how the system works and how to modify and maintain it. There's also the documentation of the few libraries I used to build the system. I'm a massive fan of GroundX, an open-source enterprise-grade RAG system you can run on your servers or deploy to any cloud provider. I've been working with them for a long time. GroundX offers two services. First, the "ingest" service uses a custom, pretrained vision model to ingest and understand your data. I used this to process all the documentation I have for my code. Markdown files, source code, HTML files, and even PDF documents. Everything I've written related to my project went into GroundX. Their second service is "search," which combines text and vector search with a fine-tuned re-ranker model to retrieve information from the data. I needed to connect Cursor with this service, and that's where MCP came in. I built an MCP server with two tools: 1. The first tool would go to GroundX and retrieve the available topics. Splitting the data into topics (or "buckets," as GroundX calls them) allows me to use the same setup to serve documentation from different topics. 2. The second tool would search GroundX under a specific topic for the context related to the supplied query. The magic happens after connecting the MCP server with Cursor. Now, I can ask any questions related to my project, and Cursor's AI agent retrieves the list of available topics from the RAG system and then searches it to provide relevant context to the model. I went from getting mediocre, sometimes wrong answers to 100% truthful, complete answers. Here is the crazy part:

MCP is an absolute game-changer. (Together with DeepSeek, MCP is probably the hottest thing in AI over the last 6 months.) I use Cursor to write code 90% of the time. I built an MCP server to connect the Cursor agent to GroundX, an open-source RAG system, and I'm not going back. This is officially insane! Here is what I did, step by step: First, a little bit of context. I maintain an end-to-end Machine Learning System with several pipelines to process data, train, evaluate, register, deploy, and monitor a model. I've written a lot of documentation explaining how the system works and how to modify and maintain it. There's also the documentation of the few libraries I used to build the system. I'm a massive fan of GroundX, an open-source enterprise-grade RAG system you can run on your servers or deploy to any cloud provider. I've been working with them for a long time. GroundX offers two services. First, the "ingest" service uses a custom, pretrained vision model to ingest and understand your data. I used this to process all the documentation I have for my code. Markdown files, source code, HTML files, and even PDF documents. Everything I've written related to my project went into GroundX. Their second service is "search," which combines text and vector search with a fine-tuned re-ranker model to retrieve information from the data. I needed to connect Cursor with this service, and that's where MCP came in. I built an MCP server with two tools: 1. The first tool would go to GroundX and retrieve the available topics. Splitting the data into topics (or "buckets," as GroundX calls them) allows me to use the same setup to serve documentation from different topics. 2. The second tool would search GroundX under a specific topic for the context related to the supplied query. The magic happens after connecting the MCP server with Cursor. Now, I can ask any questions related to my project, and Cursor's AI agent retrieves the list of available topics from the RAG system and then searches it to provide relevant context to the model. I went from getting mediocre, sometimes wrong answers to 100% truthful, complete answers. Here is the crazy part:

Santiago

255,433 views • 1 year ago

Today we announced the new Einstein Copilot Search and Salesforce Vector Database in Data Cloud to power semantic search and retrieval augmented generation. This will enhance Einstein Copilot's ability to understand, generate outputs, and automate actions across a wide variety of use cases, contexts, and data/content types. LLMs can't be relied on to recall specific data they were trained on, so the way to make them work in the enterprise, where accuracy is paramount, is to feed them with the right data using hybrid keyword and vector search-powered RAG. By bringing all this seamlessly into our apps and Einstein Copilot UX as well as Hyperforce cloud infrastructure, we can offer high performance, low latency, addressing data privacy, security, and residency requirements🌟

Today we announced the new Einstein Copilot Search and Salesforce Vector Database in Data Cloud to power semantic search and retrieval augmented generation. This will enhance Einstein Copilot's ability to understand, generate outputs, and automate actions across a wide variety of use cases, contexts, and data/content types. LLMs can't be relied on to recall specific data they were trained on, so the way to make them work in the enterprise, where accuracy is paramount, is to feed them with the right data using hybrid keyword and vector search-powered RAG. By bringing all this seamlessly into our apps and Einstein Copilot UX as well as Hyperforce cloud infrastructure, we can offer high performance, low latency, addressing data privacy, security, and residency requirements🌟

Clara Shih

49,620 views • 2 years ago