Video yükleniyor...

Video Yüklenemedi

Bu video yüklenirken bir sorun oluştu. Bu geçici bir ağ sorunundan kaynaklanıyor olabilir veya video kullanılamıyor olabilir.

Ana Sayfaya Dön

We just released "Large Language Models with Semantic Search”, built with Cohere, and taught by Jay Alammar and Luis Serrano. Search is a key part of many applications. Say, you need to retrieve documents or products in response to a user query; how can LLMs help? You’ll learn about... show more

Andrew Ng

1,704,718 subscribers

596,792 görüntüleme • 3 yıl önce •via X (Twitter)

Bilim & Teknoloji

Anya Rossi• Live Now

Private livecam show

10 Yorum

Mario Caronna profil fotoğrafı

Mario Caronna3 yıl önce

@cohere @JayAlammar @SerranoAcademy Imho, who is following this courses is like doing courses of www in the early '90. We are putting foundations for the next decades.

Jindong Wang profil fotoğrafı

Jindong Wang2 yıl önce

@cohere @JayAlammar @SerranoAcademy We did similar things a few months ago. Check our project: SearchAnything

𝕨𝕒𝕟𝕟𝕒ℂ𝕣𝕪 profil fotoğrafı

𝕨𝕒𝕟𝕟𝕒ℂ𝕣𝕪3 yıl önce

@cohere @JayAlammar @SerranoAcademy كفايا علم يسطى ابوس ايدك العالم هيفرقع

SGK 🔥 🚩 profil fotoğrafı

SGK 🔥 🚩3 yıl önce

@cohere @JayAlammar @SerranoAcademy So this basically just RAG? Searching your domain dataset is basically searching your embeddings in the vectorised database, how does sending that back to the LLM improve the search quality?

Chirag profil fotoğrafı

Chirag3 yıl önce

@cohere @JayAlammar @SerranoAcademy @AndrewYNg would it be possible to make those courses available offline? I usually find most time when I am on a flight. Would be great to get them offline.

crowdinsights profil fotoğrafı

crowdinsights2 yıl önce

@cohere @JayAlammar @SerranoAcademy This is amazing Andrew! Your short course is very well explained. Can't wait to share this to our community. Truly a pioneer!💯

Eyad Aiman 🦜🦙 profil fotoğrafı

Eyad Aiman 🦜🦙3 yıl önce

@cohere @JayAlammar @SerranoAcademy Legend❤️

Russ profil fotoğrafı

Russ3 yıl önce

@cohere @JayAlammar @SerranoAcademy Been following the quick movement to smaller, open source LLMs to save time and compute for companies for fine tuning? Vs. latest Anthropic deal to customize (fine tune) their LLM for telco use cases. $100Mil is corporate signalling and crazy talk. @karpathy

OpenTools profil fotoğrafı

OpenTools2 yıl önce

@cohere @JayAlammar @SerranoAcademy This is truly awesome!

Mo Rebaie profil fotoğrafı

Mo Rebaie3 yıl önce

@cohere @JayAlammar @SerranoAcademy Exciting!

Benzer Videolar

LangChain: Chat with Your Data, a new free short course created with Harrison Chase, is now available! In this 1 hour course, you’ll learn how to build one of the most requested LLM-based applications: Answering questions using information from a document or collection of documents (often called Retrieval Augmented Generation). You'll also learn how to use vector stores and embeddings to retrieve document chunks relevant to a query. I hope you enjoy the course!

LangChain: Chat with Your Data, a new free short course created with Harrison Chase, is now available! In this 1 hour course, you’ll learn how to build one of the most requested LLM-based applications: Answering questions using information from a document or collection of documents (often called Retrieval Augmented Generation). You'll also learn how to use vector stores and embeddings to retrieve document chunks relevant to a query. I hope you enjoy the course!

Andrew Ng

384,282 görüntüleme • 3 yıl önce

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by , experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by , experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:

Andrew Ng

124,656 görüntüleme • 1 yıl önce

New short course on advanced retrieval for RAG (retrieval augmented generation)! RAG fetches relevant documents to give context to an LLM. In Advanced Retrieval for AI with Chroma, taught by Chroma founder anton 🇺🇸, you’ll learn: (i) Query expansion using an LLM to rewrite and improve a query, by either generating either additional relevant queries or a hypothetical answer to the query. (ii) Reranking using a cross-encoder - a model trained to measure similarity between two inputs presented simultaneously. Reranking reorders retrieved documents based on the cross-encoder similarity measure. (iii) Constructing and training an Embedding Adaptor, which is a model that adapts the embedding values to be more relevant to your use case. Each of these techniques can help you build much better RAG systems. Please sign up for the course here:

New short course on advanced retrieval for RAG (retrieval augmented generation)! RAG fetches relevant documents to give context to an LLM. In Advanced Retrieval for AI with Chroma, taught by Chroma founder anton 🇺🇸, you’ll learn: (i) Query expansion using an LLM to rewrite and improve a query, by either generating either additional relevant queries or a hypothetical answer to the query. (ii) Reranking using a cross-encoder - a model trained to measure similarity between two inputs presented simultaneously. Reranking reorders retrieved documents based on the cross-encoder similarity measure. (iii) Constructing and training an Embedding Adaptor, which is a model that adapts the embedding values to be more relevant to your use case. Each of these techniques can help you build much better RAG systems. Please sign up for the course here:

Andrew Ng

191,219 görüntüleme • 2 yıl önce

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

Andrew Ng

137,091 görüntüleme • 2 yıl önce

Learn to optimize RAG for cost and performance in our new short course, Prompt Compression and Query Optimization, created with MongoDB and taught by Richmond Alake. This course teaches you to combine traditional database capabilities with vector search using MongoDB for RAG. You'll learn these techniques: - Vector search: For semantic matching of user queries - Filtering using metadata: Pre- and post-filtering to narrow search results - Projections: Selecting only necessary fields to minimize data returned - Boosting: Reranking results to improve relevance - Prompt compression: Using a small LLM to compress context, significantly reducing token count and processing costs These methods address scaling, performance, and security challenges in large-scale RAG applications. You can sign up here:

Learn to optimize RAG for cost and performance in our new short course, Prompt Compression and Query Optimization, created with MongoDB and taught by Richmond Alake. This course teaches you to combine traditional database capabilities with vector search using MongoDB for RAG. You'll learn these techniques: - Vector search: For semantic matching of user queries - Filtering using metadata: Pre- and post-filtering to narrow search results - Projections: Selecting only necessary fields to minimize data returned - Boosting: Reranking results to improve relevance - Prompt compression: Using a small LLM to compress context, significantly reducing token count and processing costs These methods address scaling, performance, and security challenges in large-scale RAG applications. You can sign up here:

Andrew Ng

71,710 görüntüleme • 2 yıl önce

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll: - Learn about the internal workings of the embedding models and how your text turns into vectors. - Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work. - Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search. - Understand how to measure the quality of your search across relevance, ranking, and score-related metrics. - Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters. - Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed. By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,313 görüntüleme • 1 yıl önce

New short course: Building Multimodal Search and RAG", by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿. Contrastive learning is used to train models to map vectors into an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems. In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG – so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:

New short course: Building Multimodal Search and RAG", by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿. Contrastive learning is used to train models to map vectors into an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems. In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG – so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:

Andrew Ng

104,371 görüntüleme • 2 yıl önce

This Python script helps you better understand how embeddings work for SEO. Input a sentence + a query and BERT will calculate a content "Similarity Score": Search engines use embedding models to translate your content into numeric values. This is how they're able to mathematically determine whether a page on your site is actually relevant to a query someone is searching for. Once both the content and query are run through embeddings - they can calculate the "cosine similarity" or how similar the two entities are to each one. With this Python script, you can actually visualize how cosine similarity works for yourself. To you use you'll simply: 1. Save the Python script to a text file 2. Use your Terminal to run the script 3. You'll be prompted for both a "Sentence" and "Query" 4. Once entered, the script will calculate a "Similarity Score" between the text and query. This is how relevant your sentence is for the target keyword. By running this, you'll see that search engines are able to come with an calculation of content relevance. You'll need to think about the content on your site the same way - which sections have high scores and which ones have low ones? I've linked the Python script in the comments below. No coding knowledge is required and it walks you step by step on how to implement it.

This Python script helps you better understand how embeddings work for SEO. Input a sentence + a query and BERT will calculate a content "Similarity Score": Search engines use embedding models to translate your content into numeric values. This is how they're able to mathematically determine whether a page on your site is actually relevant to a query someone is searching for. Once both the content and query are run through embeddings - they can calculate the "cosine similarity" or how similar the two entities are to each one. With this Python script, you can actually visualize how cosine similarity works for yourself. To you use you'll simply: 1. Save the Python script to a text file 2. Use your Terminal to run the script 3. You'll be prompted for both a "Sentence" and "Query" 4. Once entered, the script will calculate a "Similarity Score" between the text and query. This is how relevant your sentence is for the target keyword. By running this, you'll see that search engines are able to come with an calculation of content relevance. You'll need to think about the content on your site the same way - which sections have high scores and which ones have low ones? I've linked the Python script in the comments below. No coding knowledge is required and it walks you step by step on how to implement it.

Chris Long

13,185 görüntüleme • 2 yıl önce

Today I’m excited to introduce Copilot Search in Bing. Copilot Search blends the best of traditional and generative search to help you find what you need. Whether it’s a navigational search result, a quick straightforward answer, or a complex query that leads you on a journey of discovery, Bing is your AI-powered search and answer engine. Copilot Search is rolling out today to everyone. To get started, go to and start exploring. This is a meaningful next step in our evolution of search, building on our learnings from Bing Chat, Copilot, and Bing Generative Search to provide our users the best search experience while supporting and building a healthy web ecosystem. Learn more in today’s announcement:

Today I’m excited to introduce Copilot Search in Bing. Copilot Search blends the best of traditional and generative search to help you find what you need. Whether it’s a navigational search result, a quick straightforward answer, or a complex query that leads you on a journey of discovery, Bing is your AI-powered search and answer engine. Copilot Search is rolling out today to everyone. To get started, go to and start exploring. This is a meaningful next step in our evolution of search, building on our learnings from Bing Chat, Copilot, and Bing Generative Search to provide our users the best search experience while supporting and building a healthy web ecosystem. Learn more in today’s announcement:

Jordi Ribas

16,729 görüntüleme • 1 yıl önce

Vector databases are a key part of many LLM applications that need search or data retrieval, for example with Retrieval Augmented Generation (RAG). Learn how they work + how to use them in our new short course, taught by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿!

Vector databases are a key part of many LLM applications that need search or data retrieval, for example with Retrieval Augmented Generation (RAG). Learn how they work + how to use them in our new short course, taught by Weaviate AI Database's Sebastia(N_) Witalec ✊🏽✊🏾✊🏿!

Andrew Ng

420,683 görüntüleme • 2 yıl önce

I’m excited to announce a new course with DeepLearning.AI - Building Agentic RAG 💫 In this course, you’ll learn how to build a research assistant that can reason over multiple documents and answer complex questions. You’ll also learn how to step through the execution of the agent and steer it with human feedback. This represents a big step beyond any standard RAG pipeline, which is mostly good for simple questions over a small set of documents. Learn the layers first and then put them together: ✅ Routing ✅ Tool Use ✅ Multi-step reasoning with Memory ✅ Tool retrieval ✅ Debugging + user input Check it out!

I’m excited to announce a new course with DeepLearning.AI - Building Agentic RAG 💫 In this course, you’ll learn how to build a research assistant that can reason over multiple documents and answer complex questions. You’ll also learn how to step through the execution of the agent and steer it with human feedback. This represents a big step beyond any standard RAG pipeline, which is mostly good for simple questions over a small set of documents. Learn the layers first and then put them together: ✅ Routing ✅ Tool Use ✅ Multi-step reasoning with Memory ✅ Tool retrieval ✅ Debugging + user input Check it out!

Jerry Liu

76,293 görüntüleme • 2 yıl önce

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:

Femke Plantinga

140,618 görüntüleme • 2 yıl önce

New short course: Attention in Transformers: Concepts and Code in PyTorch. Last week we released a course on how LLM transformers work. This week, go deeper and learn about the technical ideas behind the attention mechanism, and see how to code it in PyTorch. This course is built with Joshua Starmer, Founder and CEO of StatQuest. The attention mechanism was a breakthrough that led to transformers, the architecture powering large language models like ChatGPT. Transformers, introduced in the 2017 paper: "Attention is All You Need" by Viswani and others, took off because of its highly scalable design. In this course, you’ll learn how the attention mechanism, a key element of transformer-based LLMs, works and implement it in PyTorch. You'll develop deep intuition about building reliable, functional, and scalable AI applications. What you will do: - Understand the evolution of the attention mechanism, a key breakthrough that led to transformers. - Learn the relationships between word embeddings, positional embeddings, and attention. - Learn about the Query, Key, and Value matrices, and how to produce and use them in attention. - Walk through the math required to calculate self-attention and masked self-attention to learn why and how they work. - Understand the difference between self-attention and masked self-attention and how one is used in the encoder to build context-aware embeddings and the other is used in the decoder for generative outputs. - Learn the details of the encoder-decoder architecture, cross-attention, and multi-head attention and how they are all incorporated into a transformer. - Use PyTorch to code a class that implements self-attention, masked self-attention, and multi-head attention. There're lots of exciting technical details in this course. Please sign up here:

New short course: Attention in Transformers: Concepts and Code in PyTorch. Last week we released a course on how LLM transformers work. This week, go deeper and learn about the technical ideas behind the attention mechanism, and see how to code it in PyTorch. This course is built with Joshua Starmer, Founder and CEO of StatQuest. The attention mechanism was a breakthrough that led to transformers, the architecture powering large language models like ChatGPT. Transformers, introduced in the 2017 paper: "Attention is All You Need" by Viswani and others, took off because of its highly scalable design. In this course, you’ll learn how the attention mechanism, a key element of transformer-based LLMs, works and implement it in PyTorch. You'll develop deep intuition about building reliable, functional, and scalable AI applications. What you will do: - Understand the evolution of the attention mechanism, a key breakthrough that led to transformers. - Learn the relationships between word embeddings, positional embeddings, and attention. - Learn about the Query, Key, and Value matrices, and how to produce and use them in attention. - Walk through the math required to calculate self-attention and masked self-attention to learn why and how they work. - Understand the difference between self-attention and masked self-attention and how one is used in the encoder to build context-aware embeddings and the other is used in the decoder for generative outputs. - Learn the details of the encoder-decoder architecture, cross-attention, and multi-head attention and how they are all incorporated into a transformer. - Use PyTorch to code a class that implements self-attention, masked self-attention, and multi-head attention. There're lots of exciting technical details in this course. Please sign up here:

Andrew Ng

132,270 görüntüleme • 1 yıl önce

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll - Explore how knowledge graphs work by building a graph of public financial documents from scratch - Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot - Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems Sign up here!

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll - Explore how knowledge graphs work by building a graph of public financial documents from scratch - Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot - Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems Sign up here!

Andrew Ng

244,352 görüntüleme • 2 yıl önce

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents. Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results. You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production. In detail, you’ll learn: - How agentic systems have evolved, gaining greater levels of agency over time—and why code agents are a next step. - How code agents write their actions in code. - When code agents outperform function-calling agents. - How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B. - To trace, debug, and assess the code agent to optimize its behaviours for complex requests. - How to build a research multi-agent system that can find information online and organize it into an interactive report. By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents. Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results. You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production. In detail, you’ll learn: - How agentic systems have evolved, gaining greater levels of agency over time—and why code agents are a next step. - How code agents write their actions in code. - When code agents outperform function-calling agents. - How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B. - To trace, debug, and assess the code agent to optimize its behaviours for complex requests. - How to build a research multi-agent system that can find information online and organize it into an interactive report. By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

127,724 görüntüleme • 1 yıl önce

New short course: Open Source Models with Hugging Face 🤗, taught by Maria Khalusova, Marc Sun, and Younes Belkada! Hugging Face has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models. You’ll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces. You can sign up here:

New short course: Open Source Models with Hugging Face 🤗, taught by Maria Khalusova, Marc Sun, and Younes Belkada! Hugging Face has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models. You’ll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces. You can sign up here:

Andrew Ng

224,538 görüntüleme • 2 yıl önce

MCP is an absolute game-changer. (Together with DeepSeek, MCP is probably the hottest thing in AI over the last 6 months.) I use Cursor to write code 90% of the time. I built an MCP server to connect the Cursor agent to GroundX, an open-source RAG system, and I'm not going back. This is officially insane! Here is what I did, step by step: First, a little bit of context. I maintain an end-to-end Machine Learning System with several pipelines to process data, train, evaluate, register, deploy, and monitor a model. I've written a lot of documentation explaining how the system works and how to modify and maintain it. There's also the documentation of the few libraries I used to build the system. I'm a massive fan of GroundX, an open-source enterprise-grade RAG system you can run on your servers or deploy to any cloud provider. I've been working with them for a long time. GroundX offers two services. First, the "ingest" service uses a custom, pretrained vision model to ingest and understand your data. I used this to process all the documentation I have for my code. Markdown files, source code, HTML files, and even PDF documents. Everything I've written related to my project went into GroundX. Their second service is "search," which combines text and vector search with a fine-tuned re-ranker model to retrieve information from the data. I needed to connect Cursor with this service, and that's where MCP came in. I built an MCP server with two tools: 1. The first tool would go to GroundX and retrieve the available topics. Splitting the data into topics (or "buckets," as GroundX calls them) allows me to use the same setup to serve documentation from different topics. 2. The second tool would search GroundX under a specific topic for the context related to the supplied query. The magic happens after connecting the MCP server with Cursor. Now, I can ask any questions related to my project, and Cursor's AI agent retrieves the list of available topics from the RAG system and then searches it to provide relevant context to the model. I went from getting mediocre, sometimes wrong answers to 100% truthful, complete answers. Here is the crazy part:

MCP is an absolute game-changer. (Together with DeepSeek, MCP is probably the hottest thing in AI over the last 6 months.) I use Cursor to write code 90% of the time. I built an MCP server to connect the Cursor agent to GroundX, an open-source RAG system, and I'm not going back. This is officially insane! Here is what I did, step by step: First, a little bit of context. I maintain an end-to-end Machine Learning System with several pipelines to process data, train, evaluate, register, deploy, and monitor a model. I've written a lot of documentation explaining how the system works and how to modify and maintain it. There's also the documentation of the few libraries I used to build the system. I'm a massive fan of GroundX, an open-source enterprise-grade RAG system you can run on your servers or deploy to any cloud provider. I've been working with them for a long time. GroundX offers two services. First, the "ingest" service uses a custom, pretrained vision model to ingest and understand your data. I used this to process all the documentation I have for my code. Markdown files, source code, HTML files, and even PDF documents. Everything I've written related to my project went into GroundX. Their second service is "search," which combines text and vector search with a fine-tuned re-ranker model to retrieve information from the data. I needed to connect Cursor with this service, and that's where MCP came in. I built an MCP server with two tools: 1. The first tool would go to GroundX and retrieve the available topics. Splitting the data into topics (or "buckets," as GroundX calls them) allows me to use the same setup to serve documentation from different topics. 2. The second tool would search GroundX under a specific topic for the context related to the supplied query. The magic happens after connecting the MCP server with Cursor. Now, I can ask any questions related to my project, and Cursor's AI agent retrieves the list of available topics from the RAG system and then searches it to provide relevant context to the model. I went from getting mediocre, sometimes wrong answers to 100% truthful, complete answers. Here is the crazy part:

Santiago

255,433 görüntüleme • 1 yıl önce

RLM is the most import foundation of my Pi Harness (other than Pi of course). It's seeded with late interaction retrieval results (thanks to @lightonai for pylate). The Agent initiates it with query then.. 𝐒𝐞𝐭𝐮𝐩 A python REPL is created and seeded with: 1. Late interaction search to pre-filter. Instead of doing top 3/5/10, it's top hundreds of documents. This is set into a `context` variable. 2. Python functions are loaded in to do more searches if `context` variable isn't enough. And to make llm calls with cheaper models in parallel batches. 𝐈𝐭𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐋𝐨𝐨𝐩 From there, an LLM iterates in the REPL based on the query. It's just like exploring in a jupyter notebook. The LLM writes prose (like a markdown cell) and code to be run in the REPL each turn. This allows the LLM to sort, filter, and synthesize information. It can fan out and ask smaller models to summarize, combine, contrast, or do anything else to documents to help it understand the data. After several turns the LLM reponds with the final answer. Either because it found the answer, or hit the budget limit. Context as a Python variable, LLM as the programmer, REPL as the runtime. 𝐖𝐡𝐲 𝐃𝐨𝐞𝐬 𝐓𝐡𝐢𝐬 𝐖𝐨𝐫𝐤 1. Richer Shell. Agents (and subagents) work by intermixing code and prose/thinking. But they use static scripts or bash that run and exit and start over each tool call. That's not ideal for exploration and synthesis of data. For that, state is useful to continue building and exploring the data as you learn more. There's a reason jupyter notebooks have been popular with data scientists. 2. Keeps main agent context clean. The better context you have the better the agent will perform (duh!). This means three thing: better human input, less missing search results, and less incorrect search results. Letting the agent iterate allows it to synthesize just what is needed and nothing else. All bad paths or peeks at something that turns out to be irrelevant stays out of main agent context. 3. Stack the good ideas! People often compare late interaction search vs RLM. Or static vs dynamic languages. Or agentic search vs semantic search. But...You can just use them all together for what they're each good at. Use them all for the area they're really great for. Read the full post which has more detail about how and why.

RLM is the most import foundation of my Pi Harness (other than Pi of course). It's seeded with late interaction retrieval results (thanks to @lightonai for pylate). The Agent initiates it with query then.. 𝐒𝐞𝐭𝐮𝐩 A python REPL is created and seeded with: 1. Late interaction search to pre-filter. Instead of doing top 3/5/10, it's top hundreds of documents. This is set into a `context` variable. 2. Python functions are loaded in to do more searches if `context` variable isn't enough. And to make llm calls with cheaper models in parallel batches. 𝐈𝐭𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐋𝐨𝐨𝐩 From there, an LLM iterates in the REPL based on the query. It's just like exploring in a jupyter notebook. The LLM writes prose (like a markdown cell) and code to be run in the REPL each turn. This allows the LLM to sort, filter, and synthesize information. It can fan out and ask smaller models to summarize, combine, contrast, or do anything else to documents to help it understand the data. After several turns the LLM reponds with the final answer. Either because it found the answer, or hit the budget limit. Context as a Python variable, LLM as the programmer, REPL as the runtime. 𝐖𝐡𝐲 𝐃𝐨𝐞𝐬 𝐓𝐡𝐢𝐬 𝐖𝐨𝐫𝐤 1. Richer Shell. Agents (and subagents) work by intermixing code and prose/thinking. But they use static scripts or bash that run and exit and start over each tool call. That's not ideal for exploration and synthesis of data. For that, state is useful to continue building and exploring the data as you learn more. There's a reason jupyter notebooks have been popular with data scientists. 2. Keeps main agent context clean. The better context you have the better the agent will perform (duh!). This means three thing: better human input, less missing search results, and less incorrect search results. Letting the agent iterate allows it to synthesize just what is needed and nothing else. All bad paths or peeks at something that turns out to be irrelevant stays out of main agent context. 3. Stack the good ideas! People often compare late interaction search vs RLM. Or static vs dynamic languages. Or agentic search vs semantic search. But...You can just use them all together for what they're each good at. Use them all for the area they're really great for. Read the full post which has more detail about how and why.

Isaac Flath

40,212 görüntüleme • 3 ay önce

99% of AI applications are cool-looking demos. Impressive, but don't get fooled by the hype. It takes a lot to build enterprise-grade products that deliver real value. I have at least three weekly conversations with companies that want to use a Large Language Model with their data. The demand is huge! Here is one idea about what you can do to help. The use cases that most of these companies want to solve are similar: They have an extensive knowledge base and want to build a simple application that uses that information to answer questions. In other words, they need help building Retrieval Augmented Generation (RAG) applications they can use in many different scenarios: 1. To train new employees 2. To help their support team 3. To search old meetings and documents 4. To help with their research However, building these systems is not straightforward. Yes, there's a lot of information online, but there aren't enough people who know how to create solutions that work. Here is the idea: Today, you can build an enterprise-grade RAG application without writing code. A couple of MIT PhDs with 10+ years of experience building AI applications created . It's a no-code platform for building applications using Large Language Models. They are partnering with me on this post. You can use Stack AI to create, test, and deploy an end-to-end production-ready AI system. It's SOC-2, HIPAA, and GDPR compliant and offers SSO, role management, access control, and on-premise deployments. Of course, you can use the platform with any LLM on the market now. It's the whole nine yards for building AI applications. Check them out here: 2023 was about models. 2024 is about the tools using these models to build production-ready applications. That's where I'd start.

99% of AI applications are cool-looking demos. Impressive, but don't get fooled by the hype. It takes a lot to build enterprise-grade products that deliver real value. I have at least three weekly conversations with companies that want to use a Large Language Model with their data. The demand is huge! Here is one idea about what you can do to help. The use cases that most of these companies want to solve are similar: They have an extensive knowledge base and want to build a simple application that uses that information to answer questions. In other words, they need help building Retrieval Augmented Generation (RAG) applications they can use in many different scenarios: 1. To train new employees 2. To help their support team 3. To search old meetings and documents 4. To help with their research However, building these systems is not straightforward. Yes, there's a lot of information online, but there aren't enough people who know how to create solutions that work. Here is the idea: Today, you can build an enterprise-grade RAG application without writing code. A couple of MIT PhDs with 10+ years of experience building AI applications created . It's a no-code platform for building applications using Large Language Models. They are partnering with me on this post. You can use Stack AI to create, test, and deploy an end-to-end production-ready AI system. It's SOC-2, HIPAA, and GDPR compliant and offers SSO, role management, access control, and on-premise deployments. Of course, you can use the platform with any LLM on the market now. It's the whole nine yards for building AI applications. Check them out here: 2023 was about models. 2024 is about the tools using these models to build production-ready applications. That's where I'd start.

Santiago

197,675 görüntüleme • 2 yıl önce

New course! Generative AI with Large Language Models, created with Amazon Web Services and hosted on Coursera. This course goes deep into the technical foundations of LLMs and how to use them. You can sign up here: You’ll work through the full life-cycle of a generative AI project, and learn specific techniques like RLHF; zero-shot, one-shot, and few-shot learning with LLMs; advanced prompting frameworks like ReAct; even fine-tuning LLMs, and gain hands-on practice with all of these techniques. Instructors Antje Barth Chris Fregly Shelbee Eigenbrode and Mike G Chambers all do incredible Generative AI work at AWS, and have supported many companies to build creative LLM applications. They bring tremendous practical LLM expertise to this course. I'm confident you’ll finish this course with a deeper understanding of how LLMs work, and how to use them. I hope you enjoy the course!

New course! Generative AI with Large Language Models, created with Amazon Web Services and hosted on Coursera. This course goes deep into the technical foundations of LLMs and how to use them. You can sign up here: You’ll work through the full life-cycle of a generative AI project, and learn specific techniques like RLHF; zero-shot, one-shot, and few-shot learning with LLMs; advanced prompting frameworks like ReAct; even fine-tuning LLMs, and gain hands-on practice with all of these techniques. Instructors Antje Barth Chris Fregly Shelbee Eigenbrode and Mike G Chambers all do incredible Generative AI work at AWS, and have supported many companies to build creative LLM applications. They bring tremendous practical LLM expertise to this course. I'm confident you’ll finish this course with a deeper understanding of how LLMs work, and how to use them. I hope you enjoy the course!

Andrew Ng

467,912 görüntüleme • 3 yıl önce