Announcing a new Coursera course: Retrieval Augmented Generation (RAG)

You'll learn to build high-performance, production-ready RAG systems in this hands-on, in-depth course created and taught by Zain, an experienced AI and ML engineer, researcher, and educator.

RAG is a critical component of many LLM-based applications today: customer support, internal company Q&A systems, and even many of the leading chatbots that use web search to answer your questions. This course teaches you in depth how to make RAG work well.

LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in their training data. RAG is the most widely used technique for addressing this. It brings in data from new sources, such as internal documents or recent news, to give the LLM relevant context containing private, recent, or specialized information. This lets it generate more grounded and accurate responses.

In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels.

As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and be part of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent autonomously decides at runtime (rather than having it hardcoded at development time) what data to retrieve, and when and how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications.

You'll learn via hands-on experiences to:
- Build a RAG system with retrieval and prompt augmentation
- Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion
- Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset
- Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions
- Use evals to drive improvements in reliability, and incorporate multi-modal data

RAG is an important foundational technique. Become good at it through this course! Please sign up here:
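Reciprocal Rank Fusion, one of the retrieval methods named above, can be sketched in a few lines. This is a minimal illustration rather than the course's implementation: the document IDs and the two input rankings are hypothetical, and `k=60` is simply a commonly used default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]      # hypothetical keyword results
semantic_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

Because only ranks are used, the two retrievers do not need comparable score scales, which is the main appeal of this fusion rule.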

Andrew Ng

1,721,541 subscribers

124,625 views • 1 year ago • via X (Twitter)

Science & Technology, Education

9 Comments

Zain • 1 year ago

It was a pleasure working with you and the entire DLAI team!

Nenad Mancevic • 1 year ago

@ZainHasan6 @AndrewYNg RAG is soo 2024 :) you need a course on how AI Agents are solving the world’s greatest problems. 🤣

Mohammed Lubbad, PhD • 1 year ago

@ZainHasan6 Retrieval Augmented Generation systems are transformational in AI. How do you envision their future impact on content creation? 🤔 #RAGTraining

Tony Sousan • 1 year ago

@ZainHasan6 @AndrewYNg, exciting opportunity. What unique insights can RAG systems offer us in solving real-world challenges? #AIinnovation

Julian (Jiayuan) Zhang • 1 year ago

@ZainHasan6 Awesome! Planning to crash it this weekend.

SOb (💙,🧡) | METTO | 💚🌙 • 1 year ago

@ZainHasan6 @MirraTerminal

Hassan • 1 year ago

@ZainHasan6 Incredible @ZainHasan6!!

Himanshu Kumar • 1 year ago

@ZainHasan6 Impressive, but will RAG systems replace traditional search engines, or just enhance them?

Songbird • 1 year ago

@ZainHasan6 Excited to take this course

Related Videos

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications.

In this course, taught by Andreas Kollegger of Neo4j, you’ll:
- Explore how knowledge graphs work by building a graph of public financial documents from scratch
- Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot
- Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems

Sign up here!

Andrew Ng

244,329 views • 2 years ago

Vector databases are a key part of many LLM applications that need search or data retrieval, for example with Retrieval Augmented Generation (RAG). Learn how they work and how to use them in our new short course, taught by Weaviate's Sebastian Witalec!
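The core operation a vector database performs, finding the stored vectors most similar to a query vector, can be sketched with a brute-force scan. This is an illustration only: the three-dimensional vectors and their labels are made up, and production systems typically replace the full scan with an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store, top_k=2):
    """Return the keys of the top_k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]


# Hypothetical toy embeddings; real ones come from an embedding model.
store = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([1.0, 0.0, 0.0], store))
```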

Andrew Ng

420,683 views • 2 years ago

LLMs can make sense of retrieved context because of how transformers work. In one of the lessons from the Retrieval Augmented Generation (RAG) course, we unpack how LLMs process augmented prompts using token embeddings, positional vectors, and multi-head attention. Understanding these internals helps you design more reliable and efficient RAG systems. Watch the breakdown and keep learning how to build production-ready RAG systems in this course, taught by Zain:
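The attention step mentioned above can be illustrated with a single head of scaled dot-product attention. This sketch makes simplifying assumptions that are not from the lesson: there are no learned projection matrices, no positional vectors, and only one head, so it shows the weighting mechanism rather than a full transformer layer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each query gets a weighted mix of the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# A question token attending over two retrieved-context tokens (toy vectors).
out = attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print(out)
```

The output leans toward the value whose key matches the query, which is the sense in which a model can pull relevant retrieved text into its answer.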

DeepLearning.AI

11,500 views • 11 months ago

New short course on advanced retrieval for RAG (retrieval augmented generation)! RAG fetches relevant documents to give context to an LLM. In Advanced Retrieval for AI with Chroma, taught by Chroma founder Anton, you’ll learn:
(i) Query expansion: using an LLM to rewrite and improve a query, by generating either additional relevant queries or a hypothetical answer to the query.
(ii) Reranking using a cross-encoder, a model trained to measure similarity between two inputs presented simultaneously. Reranking reorders retrieved documents based on the cross-encoder similarity measure.
(iii) Constructing and training an Embedding Adaptor, which is a model that adapts the embedding values to be more relevant to your use case.

Each of these techniques can help you build much better RAG systems. Please sign up for the course here:
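The reranking step can be sketched as a function that rescores each (query, document) pair and reorders the candidates. The scorer below is a deliberately simple word-overlap stand-in, not a cross-encoder; in practice `score_pair` would call a trained model, and both the function names and the sample documents are hypothetical.

```python
def rerank(query, documents, score_pair, top_k=2):
    """Reorder retrieved documents by a pairwise relevance score, best first."""
    scored = [(score_pair(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def toy_overlap_score(query, doc):
    """Stand-in scorer (Jaccard word overlap); a cross-encoder would go here."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)


candidates = [
    "Shipping takes five days",
    "How to reset your password",
    "Password reset emails can take a few minutes",
]
print(rerank("reset password", candidates, toy_overlap_score))
```

Keeping the scorer as a parameter means the first-stage retriever can stay cheap while the more expensive pairwise model only sees a short candidate list.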

Andrew Ng

191,219 views • 2 years ago

Learn to optimize RAG for cost and performance in our new short course, Prompt Compression and Query Optimization, created with MongoDB and taught by Richmond Alake. This course teaches you to combine traditional database capabilities with vector search using MongoDB for RAG.

You'll learn these techniques:
- Vector search: For semantic matching of user queries
- Filtering using metadata: Pre- and post-filtering to narrow search results
- Projections: Selecting only necessary fields to minimize data returned
- Boosting: Reranking results to improve relevance
- Prompt compression: Using a small LLM to compress context, significantly reducing token count and processing costs

These methods address scaling, performance, and security challenges in large-scale RAG applications. You can sign up here:
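Pre-filtering, vector search, and projection from the list above can be combined in a small in-memory sketch. This is plain Python rather than MongoDB's query API, and the documents, field names, and two-dimensional embeddings are all hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(docs, query_vec, metadata_filter, fields, top_k=2):
    """Pre-filter on metadata, rank by vector similarity, then project fields."""
    # Pre-filter: drop documents whose metadata does not match before scoring.
    candidates = [d for d in docs
                  if all(d.get(key) == value for key, value in metadata_filter.items())]
    # Vector search: rank the survivors by dot-product similarity to the query.
    candidates.sort(key=lambda d: dot(d["embedding"], query_vec), reverse=True)
    # Projection: return only the requested fields to keep the payload small.
    return [{f: d[f] for f in fields} for d in candidates[:top_k]]


docs = [
    {"title": "Loft", "city": "Paris", "embedding": [0.9, 0.1], "notes": "long text"},
    {"title": "Villa", "city": "Rome", "embedding": [1.0, 0.0], "notes": "long text"},
    {"title": "Studio", "city": "Paris", "embedding": [0.2, 0.8], "notes": "long text"},
]
print(search(docs, [1.0, 0.0], {"city": "Paris"}, fields=["title"]))
```

Filtering before scoring keeps the best overall match ("Villa") out of the results because it fails the metadata condition, and the projection means the large `notes` field never reaches the prompt.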

Andrew Ng

71,710 views • 2 years ago

Building a reliable RAG system doesn’t stop at retrieval and generation; you need observability too. In the Retrieval Augmented Generation course, you'll explore how LLM observability platforms can help you:
- Trace prompts through each step of the pipeline
- Log and evaluate component behavior
- Run experiments and monitor system performance over time

The lesson uses Phoenix (by Arize) as an open-source example, but the techniques apply broadly, and you'll also learn where traditional tools like Grafana or Datadog fit in.

🧠 Learn how to monitor and improve your RAG systems in the full course:
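The idea of tracing each pipeline step can be shown with a hand-rolled timing context manager. This is a generic sketch using only the standard library; it does not use or imitate the Phoenix API, and the step names and placeholder pipeline stages are hypothetical.

```python
import time
from contextlib import contextmanager

trace = []  # one record per pipeline step, in execution order


@contextmanager
def span(name):
    """Record how long a named pipeline step takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        trace.append({"step": name, "seconds": time.perf_counter() - start})


# Placeholder stages standing in for a real retriever and generator.
with span("retrieve"):
    chunks = ["chunk one", "chunk two"]
with span("generate"):
    answer = f"Answer based on {len(chunks)} chunks"

for record in trace:
    print(record["step"], f"{record['seconds']:.6f}s")
```

A dedicated observability platform adds storage, dashboards, and evaluation on top of records like these.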

DeepLearning.AI

14,702 views • 11 months ago

New short course: Building Multimodal Search and RAG, by Weaviate's Sebastian Witalec.

Contrastive learning is used to train models to map inputs to vectors in an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems.

In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG, so that your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:
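The pull-together, push-apart behavior of contrastive learning can be made concrete with an InfoNCE-style loss, one common contrastive objective. The post does not say which loss the course uses, so treat this as an assumed example; the toy embeddings are unit vectors and the temperature value is arbitrary.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy of picking the positive among the positive plus negatives."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # Low when the anchor is closer to the positive than to every negative.
    return log_sum_exp - logits[0]


# Toy case: a text embedding (anchor) with a matching and a non-matching image.
aligned = info_nce_loss([1.0, 0.0], positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
misaligned = info_nce_loss([1.0, 0.0], positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
print(aligned, misaligned)
```

Minimizing this loss over many pairs is what moves matching text and image embeddings toward each other in the shared space.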

Andrew Ng

104,371 views • 2 years ago

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications.

In this course, you'll learn how to use vector databases to build:
(i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content.
(ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on.
(iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender.
(iv) Hybrid Search: Build an application that finds items using both images and descriptive text (by combining sparse and dense vector representations of the data), using an eCommerce dataset as an example.
(v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them.
(vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs.

I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

Andrew Ng

137,091 views • 2 years ago

I’m excited to announce a new course with DeepLearning.AI - Building Agentic RAG 💫 In this course, you’ll learn how to build a research assistant that can reason over multiple documents and answer complex questions. You’ll also learn how to step through the execution of the agent and steer it with human feedback. This represents a big step beyond any standard RAG pipeline, which is mostly good for simple questions over a small set of documents. Learn the layers first and then put them together: ✅ Routing ✅ Tool Use ✅ Multi-step reasoning with Memory ✅ Tool retrieval ✅ Debugging + user input Check it out!

Jerry Liu

76,293 views • 2 years ago

Verba is an open-source Retrieval Augmented Generation (RAG) application that performs RAG on your own data. To showcase its capabilities, we've customized it as an Airbnb chatbot using Airbnb’s customer documentation.

How it works:
• Ask any question related to your booking, policies, or anything else about your Airbnb experience.
• Get relevant, human-like responses: Verba provides natural and informative answers.
• Access original sources: One of the standout features of RAG is its ability to directly indicate the sources it used to generate each response.

Under the hood, Verba uses a RAG pipeline to deliver these results. Your query is transformed into a numerical representation (a vector) and used to search our vector database for the most similar context using Hybrid Search. The most relevant context is then combined with your original question and fed into a large language model (LLM). The LLM then uses all of that information to generate a conversational response. Et voilà! 💫

Try Verba: Verba on GitHub: Learn more in our video:
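The step where retrieved context is combined with the original question can be sketched as simple prompt assembly. This is not Verba's actual prompt; the wording of the instruction, the function name, and the sample context are assumptions chosen to show how numbered sources make citation possible.

```python
def build_augmented_prompt(question, contexts):
    """Combine retrieved context chunks and the user question into one prompt."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the sources below, "
        "and cite them by number.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved chunks for a booking-related question.
prompt = build_augmented_prompt(
    "Can I cancel my booking for free?",
    ["Free cancellation is available within 48 hours of booking.",
     "Refunds are issued to the original payment method."],
)
print(prompt)
```

The resulting string is what gets sent to the LLM, and the bracketed numbers give the model a way to point back to its sources.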

Femke Plantinga

120,565 views • 1 year ago

LangChain: Chat with Your Data, a new free short course created with Harrison Chase, is now available! In this 1-hour course, you’ll learn how to build one of the most requested LLM-based applications: answering questions using information from a document or collection of documents (often called Retrieval Augmented Generation). You'll also learn how to use vector stores and embeddings to retrieve document chunks relevant to a query. I hope you enjoy the course!

Andrew Ng

384,282 views • 3 years ago

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:

Femke Plantinga

140,618 views • 2 years ago

Data preprocessing is critical for building effective RAG systems. Our new short course, Preprocessing Unstructured Data for LLM Applications, taught by Matt Robinson of Unstructured, demonstrates important but sometimes overlooked aspects of RAG systems:
- How to extract and normalize content from diverse formats like PDF, PowerPoint, and HTML to expand your LLM's knowledge
- Enriching data with metadata to enable more powerful retrieval and reasoning
- Applying document layout analysis and vision transformers to process embedded images and tables

Then you’ll use all these skills to build a RAG bot that draws from a corpus that includes PDF, PowerPoint, and Markdown documents. Please sign up here:

Andrew Ng

150,317 views • 2 years ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them:
1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data.
2. The connector ecosystem to load data from unstructured data sources is very immature.
3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index.

The goal of a RAG pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications.

At a high level, there are four stages in the architecture of a RAG pipeline:
1. Ingestion: Where the pipeline loads the information from the data source.
2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside it.
3. Transform: Where the pipeline chunks the data and generates document embeddings.
4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings.

There are different rabbit holes at each one of these stages. Here are three of them:
1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes.
2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references.
3. A continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it.

In the attached video, I'll show you how you can build an enterprise-grade RAG pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.
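The simple chunking strategy with an overlap mentioned in the Transform stage can be sketched as a sliding window over characters. This is a generic illustration, not Vectorize's implementation; splitting on characters (rather than tokens or sentences) and the parameter names are assumptions made to keep the example short.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size chunks where neighbors share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing and embedding some text twice.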

Santiago

40,441 views • 1 year ago

Tokenization (turning text into a sequence of integers) is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski.

This course focuses on retrieval augmented generation (RAG), which has two steps: first, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. You will also learn to measure and improve retrieval quality, speed, and memory.

In detail, you’ll:
- Learn about the internal workings of embedding models and how your text turns into vectors.
- Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece, work.
- Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
- Understand how to measure the quality of your search across relevance, ranking, and score-related metrics.
- Understand how the main parameters of HNSW, a graph-based algorithm, affect the relevance and speed of vector search, and how to tune them.
- Experiment with the three major quantization methods (product, scalar, and binary) and learn how they impact memory requirements, search quality, and speed.

By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!
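Of the quantization methods listed above, scalar quantization is the easiest to sketch: each float is mapped to one of 256 integer levels. This is a simplified per-vector version written for illustration; it is not Qdrant's implementation, and real systems usually calibrate the range over the whole collection.

```python
def quantize(vector):
    """Map each float to an integer code in 0..255 using the vector's own range."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant vectors
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate the original floats from the integer codes."""
    return [lo + c * scale for c in codes]


original = [-1.0, -0.2, 0.35, 1.0]
codes, lo, scale = quantize(original)
restored = dequantize(codes, lo, scale)
print(codes)
print(max(abs(a - b) for a, b in zip(original, restored)))
```

Storing one byte per dimension instead of a four-byte float32 cuts memory roughly fourfold, and the reconstruction error per value is bounded by half a quantization step.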

Andrew Ng

146,313 views • 1 year ago

I’m excited to kick off the first of our short courses focused on agents, starting with Building Agentic RAG with LlamaIndex, taught by Jerry Liu, CEO of LlamaIndex 🦙.

This covers an important shift in RAG (retrieval augmented generation): rather than having the developer write explicit routines to retrieve information to feed into the LLM context, we instead build a RAG agent that has access to tools for retrieving information. This lets the agent decide what information to fetch, and enables it to answer more complex questions using multi-step reasoning.

In detail, you’ll learn about:
- Routing: Where your agent will use decision-making to route requests to multiple tools.
- Tool Use: Where you’ll create an interface for agents to select what tool (function call) to use as well as generate the right arguments.
- Multi-step reasoning with tool use: Where you’ll use an LLM to carry out multiple steps of reasoning, while retaining memory throughout the process.

You’ll also learn how to step through what your agent is doing to debug and improve it iteratively. It’s an exciting time to build agents. Sign up and get started here!
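The routing idea can be sketched in a few lines of plain Python. This is not LlamaIndex code and does not use its API; the `Tool` class, the two tool functions, and the tool names are hypothetical, and `choose_tool` is a keyword rule standing in for the decision an LLM would make in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Tool:
    name: str
    description: str  # in a real agent, the LLM reads this to decide which tool fits
    run: Callable[[str], str]


def vector_search(query: str) -> str:
    # Placeholder for retrieving a few specific passages.
    return f"[top passages for: {query}]"


def summarize(query: str) -> str:
    # Placeholder for summarizing a whole document.
    return f"[summary of the whole document for: {query}]"


TOOLS: Dict[str, Tool] = {
    "vector_tool": Tool("vector_tool", "Answer specific questions from passages.", vector_search),
    "summary_tool": Tool("summary_tool", "Summarize an entire document.", summarize),
}


def choose_tool(question: str, tools: Dict[str, Tool]) -> str:
    """Assumption: a keyword rule stands in for the LLM's routing decision."""
    wants_summary = any(w in question.lower() for w in ("summarize", "summary", "overview"))
    return "summary_tool" if wants_summary else "vector_tool"


def run_agent(questions: List[str], tools=TOOLS, chooser=choose_tool) -> List[Tuple[str, str, str]]:
    """Handle several steps in sequence, keeping a memory of what was asked and returned."""
    memory = []
    for question in questions:
        name = chooser(question, tools)
        memory.append((question, name, tools[name].run(question)))
    return memory


for step in run_agent(["Give me a summary of the paper", "What dataset was used in section 3?"]):
    print(step)
```

Printing each step is the simplest version of stepping through what the agent did: the memory records which tool was picked for each question, which is what you would inspect when debugging.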


Andrew Ng

297,131 views • 2 years ago

everybody talks about building AI chatbots, but nobody tells you HOW to do it

that's why I made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledge base

inside the walkthrough I go over:
- data collection: gathering all relevant documents, conversations, and info
- preprocessing: cleaning up and formatting the collected data
- chunking: breaking down the cleaned data into smaller, manageable pieces
- embedding & storing in a vector database
- RAG & chatbot integration: using RAG to allow the chatbot to retrieve relevant information from the vector database based on a user's question

reply to this tweet w/ the word “RAG” & I’ll send it to you (must be following so I can DM)
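here's a rough sketch of the chunking, embedding, and retrieval steps in plain Python. it is not from the walkthrough: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and every function name and the sample text are assumptions for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into word windows of `chunk_size` that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, index, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Augment the user's question with the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Use only this context to answer.\n{context}\n\nQuestion: {question}"


chunks = [
    "Refunds are issued within 30 days of purchase.",
    "Our store opens at 9am on weekdays.",
    "Shipping takes 3 to 5 business days.",
]
index = [(c, embed(c)) for c in chunks]  # stand-in for a vector database
question = "When are refunds issued?"
print(build_prompt(question, retrieve(question, index, k=1)))
```

the prompt that comes out is what you would send to the chatbot's LLM, so the model answers from your own data instead of guessing.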


Tyler

83,505 views • 1 year ago

New short course: Build Long-Context AI Apps with Jamba. Learn about state space models (SSMs), which have emerged as an alternative to transformers! Specifically, Jamba is a hybrid transformer-Mamba architecture that combines strengths of the transformer with ideas from SSMs. This course is built with AI21 Labs and taught by Chen Wang and Chen Almagor.

The transformer architecture is computationally expensive when handling very long input contexts. But there’s an alternative called Mamba, a selective state space model that can process very long contexts with a much lower computational cost. However, researchers found that the pure Mamba architecture underperforms in understanding the context, and gives lower-quality responses. To overcome this, AI21 developed the Jamba model, which combines Mamba’s computational efficiency with the transformer’s attention mechanism to help with the output quality.

In this course, you’ll learn how state space models, and Jamba, work. You’ll also learn how to prompt Jamba, use it to process long documents, and build long-context RAG apps.
- Learn how Jamba combines transformer and state space model architectures to achieve high performance and quality.
- Use the AI21 SDK, with an example of prompting over a large 200k-token annual financial report of Nvidia.
- Use Jamba for tool-calling, with hands-on examples from calling simple arithmetic calculations to a function that returns quarterly company financial reports.
- Learn how training for long context is done, and the metrics used for its evaluation.
- Create a RAG app using the AI21 Conversational RAG tool and build your own RAG pipeline that uses Jamba and LangChain.

By the end of this course, you’ll know how to build applications that can handle context as long as an entire book. Please sign up here:
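To give a feel for why a state space model can be cheaper on long inputs, here is a toy sketch. It is a scalar, non-selective linear recurrence, not Mamba and not Jamba: Mamba makes its parameters depend on the input, which this example does not attempt. The function names and the constants `a`, `b`, `c` are assumptions chosen for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One pass over the sequence with a fixed-size state, so the work grows
    linearly with sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_score_count(n):
    """Full self-attention scores every token against every token: an n x n matrix."""
    return n * n


print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
for n in (1_000, 100_000):
    print(n, "tokens:", n, "recurrence steps vs", attention_score_count(n), "attention scores")
```

The fixed-size state is also the intuition behind the quality trade-off described above: everything the model remembers about earlier tokens must fit in that state, whereas attention can look back at every token directly.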


Andrew Ng

77,792 views • 1 year ago

Production-ready RAG systems need observability. From tracking latency and throughput to evaluating response quality with human feedback or LLM-as-a-judge, robust observability gives you visibility into both system performance and output quality, on both a component and system-wide level. This lesson from our Retrieval Augmented Generation course breaks down the core components of an effective eval system and how to balance cost, automation, and accuracy when choosing your metrics. 📚 Learn more in the full course:
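As a rough sketch of what component-level observability can look like, the snippet below records latency per component and computes one simple retrieval metric. It is not from the lesson; `LatencyTracker`, `precision_at_k`, the component names, and the document ids are assumptions for illustration, and a production system would normally send these numbers to a monitoring tool rather than keep them in memory.

```python
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyTracker:
    """Collect wall-clock latency samples per component (e.g. retriever, generator)."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def track(self, component):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the component raised an error.
            self.samples[component].append(time.perf_counter() - start)

    def summary(self):
        return {
            name: {"count": len(v), "mean_s": statistics.fmean(v), "max_s": max(v)}
            for name, v in self.samples.items()
        }


def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


tracker = LatencyTracker()
with tracker.track("retriever"):
    retrieved = ["doc_a", "doc_b", "doc_c"]  # placeholder for a real retrieval call
with tracker.track("generator"):
    answer = "placeholder answer"  # placeholder for a real LLM call

print(tracker.summary())
print(precision_at_k(retrieved, relevant_ids={"doc_a", "doc_c"}, k=2))
```

Latency is cheap to collect automatically, while `precision_at_k` needs relevance labels from a human or an LLM judge, which is the cost versus accuracy trade-off the lesson refers to.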


DeepLearning.AI

18,793 views • 6 months ago

Our first Generative AI short course in JavaScript! GitHub recently reported that JavaScript is again the world’s most popular programming language. To support web developers exploring and developing with generative AI, we just launched a new short course in JavaScript taught by Jacob Lee, founding engineer at .

In Build LLM Apps with LangChain.js you’ll learn elements common in AI development, including:
(i) Using data loaders to pull data from common sources such as PDFs, websites, and databases
(ii) Prompts, which are used to provide the LLM context
(iii) Modules to support RAG such as text splitters and integrations with vector stores
(iv) Working with different models to write applications that are not vendor-specific
(v) Parsers, which extract and format the output for your downstream code to process

You’ll also build with the LangChain Expression Language, which lets you easily compose sequences (also called chains) of modules to perform complex tasks using LLMs. Putting all this together, you’ll also work on a conversational question-answering LLM application capable of using external data as context. Please sign up here:
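The idea of composing a prompt, a model, and a parser into a chain can be sketched generically. The snippet below is plain Python, not LangChain.js and not the LangChain Expression Language; `chain`, `prompt_template`, `fake_model`, and `output_parser` are hypothetical names, and the model is a canned stand-in for a real LLM call.

```python
def chain(*steps):
    """Compose steps so the output of each one becomes the input of the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def prompt_template(inputs):
    """Turn structured inputs into the text prompt that gives the LLM its context."""
    return (
        "Answer using only the context.\n"
        f"Context: {inputs['context']}\n"
        f"Question: {inputs['question']}"
    )


def fake_model(prompt):
    """Assumption: a canned reply stands in for a call to any vendor's model."""
    return "ANSWER:  Paris  "


def output_parser(raw):
    """Strip the model's formatting so downstream code gets a clean string."""
    return raw.split(":", 1)[-1].strip()


qa_chain = chain(prompt_template, fake_model, output_parser)
print(qa_chain({"context": "Paris is the capital of France.",
                "question": "What is the capital of France?"}))
```

Because each step only agrees on its input and output, `fake_model` could be swapped for a different provider without touching the prompt or parser, which is the point of writing applications that are not vendor-specific.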


Andrew Ng

284,320 views • 2 years ago