We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is... close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built HydraDB for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️

Nishkarsh

8,079 subscribers

3,865,174 views • 4 months ago • via X (Twitter)

Education • News & Politics • Science & Technology

Related videos

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:
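To make the hybrid idea above concrete, here is a minimal sketch of one way to fuse keyword scores and vector scores into a single ranking. It is not any particular database's API: the function names, the `alpha` weight, and the min-max normalization are all assumptions chosen for illustration, and the input scores are made up.

```python
def min_max(scores):
    """Scale a {doc_id: score} map into [0, 1]; a constant map becomes all 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score maps. alpha=0 is keyword only, alpha=1 is vector only (assumed convention)."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    fused = {
        doc: (1 - alpha) * kw.get(doc, 0.0) + alpha * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


if __name__ == "__main__":
    # Hypothetical scores: BM25-style keyword scores and cosine-style vector scores.
    keyword_scores = {"doc_a": 7.2, "doc_b": 1.1, "doc_c": 0.0}
    vector_scores = {"doc_a": 0.62, "doc_b": 0.91, "doc_d": 0.88}
    print(hybrid_rank(keyword_scores, vector_scores, alpha=0.5))
```

Normalizing before blending matters because keyword and vector scores usually live on different scales; without it, one signal can silently dominate the other.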

Femke Plantinga

140,618 views • 2 years ago

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
And it hit 98.7% accuracy on a financial benchmark (SOTA).
Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.
PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"
That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to see in-document references ("see Table 5.3") the way a human would.
But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.
Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines.
For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis. Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!
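The tree-traversal idea described above can be sketched roughly as follows. This is not PageIndex's actual code or API. The `Node` class, the `traverse` function, and the sample document are hypothetical, and the reasoning step that a language model would perform is replaced here by a toy word-overlap score so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a table-of-contents style tree (hypothetical structure)."""
    title: str
    summary: str
    children: list = field(default_factory=list)


def overlap(query, node):
    """Toy stand-in for the model's reasoning step: count shared lowercase words."""
    query_words = set(query.lower().split())
    node_words = set((node.title + " " + node.summary).lower().split())
    return len(query_words & node_words)


def traverse(root, query):
    """Walk from the root to a leaf, choosing the best child at each level.

    Returns the list of titles visited, which doubles as a retrieval trace.
    """
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: overlap(query, child))
        path.append(node.title)
    return path


if __name__ == "__main__":
    report = Node("Annual Report", "full filing", [
        Node("Business Overview", "products markets strategy"),
        Node("Financial Statements", "balance sheet income debt liabilities", [
            Node("Income Statement", "revenue expenses"),
            Node("Notes", "long-term debt maturities 2023 2022"),
        ]),
    ])
    print(traverse(report, "debt trends in 2023"))
```

The returned path is what makes this style of retrieval traceable: every hop records which section was chosen, so a wrong answer can be traced back to a specific decision.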

Avi Chawla

972,565 views • 6 months ago

Vector databases are a scam. Not technically, they do exactly what they say. Return the most cosine-similar string to your query. The scam is the entire industry pretending that's the same thing as relevance. It isn't. Search "Apple." You get the fruit, the company, the watch, and a recipe blog. Your agent picks one at random and calls it retrieval. Your customer calls it broken. Most AI agents shipping right now are duct-taped on top of this. They demo well because demos are easy. They die in production because production is real. HydraDB's Founder Nish (Nishkarsh) said the quiet part out loud — "vector databases suck, similarity is not relevance" — and the demo signups haven't stopped since. He raised $6.5M because he was the first to name what everyone in the room already knew. If your retrieval layer is a flat embedding index, you're not building infrastructure. You're building a liability with a prettier name. 𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒 (00:00) AI Needs Context (01:30) HydraDB Explained (07:41) Vector Search Breaks (09:32) Messaging That Converts (13:41) Writing the Viral Tweet (16:07) Similarity Not Relevance (20:46) POC to Production Gap (35:35) Raising 6.5 Million Fast (39:33) Founder Lesson on Messaging This is a Composio "Agents at Work" podcast, where I chat with founders building the next leap of AI. Follow for more:)

Julia Fedorin

186,958 views • 2 months ago

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:
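The core operation described above (store a collection of vectors, then find the ones nearest to a new vector) can be sketched with a brute-force in-memory store. `TinyVectorStore` and the sample vectors are made up for illustration. Production vector databases typically use approximate nearest-neighbor indexes rather than the linear scan shown here.

```python
import math


class TinyVectorStore:
    """Brute-force nearest-neighbor store (illustrative only, not a real product's API)."""

    def __init__(self):
        self._items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        """Store a copy of the vector under the given id."""
        self._items.append((item_id, list(vector)))

    def query(self, vector, k=2):
        """Return the ids of the k stored vectors closest in Euclidean distance."""
        ranked = sorted(self._items, key=lambda item: math.dist(item[1], vector))
        return [item_id for item_id, _ in ranked[:k]]


if __name__ == "__main__":
    store = TinyVectorStore()
    store.add("cat", [0.9, 0.1])
    store.add("dog", [0.8, 0.2])
    store.add("car", [0.1, 0.9])
    print(store.query([0.9, 0.12], k=2))
```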

Andrew Ng

137,091 views • 2 years ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them:
1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data.
2. The connector ecosystem to load data from unstructured data sources is very immature.
3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index.
The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications.
At a high level, there are four different stages in the architecture of a RAG pipeline:
1. Ingestion: Where the pipeline loads the information from the data source.
2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside it.
3. Transform: Where the pipeline chunks the data and generates document embeddings.
4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings.
There are different rabbit holes at each one of these stages. Here are three of them:
1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes.
2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references.
3. Implementing a continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it.
In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post.
You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.
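As a rough illustration of the Transform stage mentioned above, here is a minimal sketch of fixed-size chunking with an overlap. It is not Vectorize's implementation. The function name, the character-based sizes, and the default values are assumptions, and a real pipeline would more likely split on tokens, sentences, or document structure.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks that share `overlap` characters.

    Sizes are measured in characters purely to keep the example simple.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means a sentence cut at one chunk boundary still appears whole, or nearly whole, in the neighboring chunk, at the cost of storing some text twice.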

Santiago

40,441 views • 1 year ago

This Python script helps you better understand how embeddings work for SEO. Input a sentence + a query and BERT will calculate a content "Similarity Score". Search engines use embedding models to translate your content into numeric values. This is how they're able to mathematically determine whether a page on your site is actually relevant to a query someone is searching for. Once both the content and query are run through embeddings, they can calculate the "cosine similarity", or how similar the two entities are to each other. With this Python script, you can actually visualize how cosine similarity works for yourself. To use it, you'll simply:
1. Save the Python script to a text file
2. Use your Terminal to run the script
3. You'll be prompted for both a "Sentence" and "Query"
4. Once entered, the script will calculate a "Similarity Score" between the text and query. This is how relevant your sentence is for the target keyword.
By running this, you'll see that search engines are able to come up with a calculation of content relevance. You'll need to think about the content on your site the same way: which sections have high scores and which ones have low ones? I've linked the Python script in the comments below. No coding knowledge is required and it walks you step by step through how to implement it.
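The linked script is not reproduced in this post, so the following is only a stand-in sketch of the cosine similarity calculation itself. To stay dependency-free it swaps the BERT embeddings for simple word-count vectors, which is a much cruder representation. The function names are made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn text into a word-count vector (a crude substitute for a BERT embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    sentence = input("Sentence: ")
    query = input("Query: ")
    score = cosine_similarity(bag_of_words(sentence), bag_of_words(query))
    print(f"Similarity Score: {score:.3f}")
```

With word counts, two texts only score above zero when they share literal words. An embedding model would also give credit to paraphrases, which is the point the post is making.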

Chris Long

13,185 views • 2 years ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by , experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. 
You'll learn via hands-on experiences to:
- Build a RAG system with retrieval and prompt augmentation
- Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion
- Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset
- Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions
- Use evals to drive improving reliability, and incorporate multi-modal data
RAG is an important foundational technique. Become good at it through this course! Please sign up here:
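Reciprocal Rank Fusion, one of the methods named in the list above, is compact enough to sketch directly. This is not course material. The function name and the sample rankings are made up, and `k=60` is the commonly used smoothing constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores the sum of 1 / (k + rank) over every list it appears in,
    with ranks starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]      # hypothetical keyword results
    semantic_ranking = ["b", "d", "a"]  # hypothetical vector results
    print(reciprocal_rank_fusion([bm25_ranking, semantic_ranking]))
```

Because it uses only rank positions, this method sidesteps the problem of keyword and vector scores living on different scales.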

Andrew Ng

124,639 views • 1 year ago

Is AI being designed to fail? Everyone talks about reasoning. But when given a task, the AI isn't reasoning the way you might expect. It looks at your input, finds the closest match it's seen before, and predicts the most likely next action. That process is called vector similarity search. It's genuinely powerful. It's also not the same thing as understanding what you actually meant. Think of a plumber who hears the word "leak" and starts pulling up floorboards before you've finished the sentence. He's not being careless. He's pattern-matching - that's exactly how he was trained. Your AI agent is doing the same thing. Context is the one thing that gets deprioritized when teams are racing to ship. But without it, you don't have an intelligent agent. You have a very fast guesser. Similarity ≠ relevance. How? Find out with the link in the comments ⬇️

Nishkarsh

822,957 views • 4 months ago

Verba is an open source Retrieval Augmented Generation (RAG) application that performs RAG on your own data. To showcase its capabilities, we've customized it as an Airbnb chatbot using Airbnb’s customer documentation. How it works:
• Ask any questions related to your booking, policies, or anything else about your Airbnb experience.
• Get relevant, human-like responses: Verba provides natural and informative answers.
• Access original sources: One of the standout features of RAG is its ability to directly indicate the sources it used to generate each response.
Under the hood, Verba uses a RAG pipeline to deliver these exceptional results. Your query is transformed into a numerical representation (vector) and is used to search through our vector database for the most similar context using Hybrid Search. The most relevant context is then combined with your original question and fed into a powerful large language model (LLM). The LLM will then use all of that information to generate a conversational response. Et voilà! 💫 Try Verba: Verba on GitHub: Learn more in our video:
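The step where retrieved context is combined with the original question can be sketched as plain prompt assembly. This is not Verba's code. The `build_prompt` function, the prompt wording, and the sample passages are assumptions, and the retrieval and LLM calls are deliberately left out.

```python
def build_prompt(question, passages):
    """Combine ranked (source, text) passages with the question into one prompt string.

    Passages are numbered so the model can cite the sources it used.
    """
    context = "\n".join(
        f"[{number}] ({source}) {text}"
        for number, (source, text) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below, "
        "and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Hypothetical passages, as if already returned by a hybrid search step.
    passages = [
        ("cancellation-policy", "Guests can cancel up to 24 hours before check-in."),
        ("refunds", "Refunds are issued to the original payment method."),
    ]
    print(build_prompt("Can I cancel my booking?", passages))
```

Numbering the passages in the prompt is what lets the final answer point back to its sources, which is the feature the post highlights.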

Femke Plantinga

120,565 views • 2 years ago

The United States Officially Activates AI Warfare Why was a US-based AI company paying Nigerians for personal data and information from their surroundings? That is the bigger question behind Kled’s decision to ban Nigeria from its platform over alleged fraud-related activities. It is not a question about whether or not some users broke the rules. Why was a foreign AI company operating a business model that offered financial incentives in exchange for the personal data and environmental information of Nigerians in the first place? At a time when U.S. tech companies are becoming increasingly tied to the country’s military and intelligence outfits – if there ever even was a time when they weren’t – Africans must treat data extraction as a serious sovereignty issue. AI platforms cannot be considered neutral tools when they are built inside systems that serve foreign political, economic, and military interests. Data is power. In the wrong hands, it becomes weaponizable intelligence. Biggest Mack reports for the Spearhead.

The Spearhead

90,405 views • 2 months ago

For 40 years the file browser hasn’t changed. Today, we’re launching with $8 million in seed funding to rebuild the file browser into something more intelligent, searchable, and delightful. The world is in the middle of a data explosion. We’re generating and using more files than ever, but the apps we’re using to manage our files don’t even understand them. It’s time for file browsers to become useful. When you search for “dog”, it should show you content with dogs in it, not just files with “dog” in the name! When you want to edit, convert, summarize, or organize a file, your browser should do that, too. Your files tell the story of your life, but when you need a specific one, you usually can’t even find it anymore. Why can’t your file browser find it for you, or cross-reference it when you have a question? Prompting can give an LLM a million tokens of context. With Poly, you can give it the next trillion. As long as we can afford it, all new users receive 100GB of free cloud storage. We can’t wait for you to try it out!

Abhay Agarwal

1,776,989 views • 8 months ago

For 60 years every computer ever built did the same thing. Stored information and retrieved it on demand. Jensen Huang just explained why that era is over and what replaces it. His framing was the clearest I have ever heard. Think about everything a computer has ever done for you. You wrote a document, you saved it to a file. You took a photo, it saved to a file. You recorded music, it saved to a file. When you wanted it back, you retrieved it from a disc. That is it. That is 60 years of computing. Store and retrieve. He pointed out something hiding in plain sight. We call them data centers. Not computer centers. Because we were not really computing anything meaningful. We were storing data that you retrieved based on what you tapped on your phone. Then he explained what changed. Every time you give AI a prompt today, the response is produced originally in real time. It is not retrieved from storage. It is generated fresh based on your specific context, your specific question, your specific moment. What you see is completely different from what anyone else sees because it was made for you. Jensen said every pixel you see, every word you read, every video you watch in the future will be originally generated. Not retrieved. 60 years of computing was about building better storage and faster retrieval. The entire paradigm flipped overnight. He said this simply: we went from a retrieval industry to a generation industry. And the machines that generate intelligence are what Nvidia builds. The buildings used to be called data centers because they stored data. Nobody has renamed them yet. But the job description changed completely.

Ihtesham Ali

30,059 views • 1 month ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results. Let's talk about three techniques, in order of complexity, starting with the easiest one:
• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning
In-Context Learning
The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video.
Indexing + In-Context Learning
Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation.
Fine-tuning
Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model. Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches. I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.
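The "Indexing + In-Context Learning" idea described in this post can be sketched in a few lines of Python. This is only a toy illustration, not the author's code: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the "database" is a plain list, and the passages are made up for the example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(passages):
    """Store every passage next to its embedding (the 'vector database')."""
    return [(p, embed(p)) for p in passages]

def retrieve(index, question, k=2):
    """Embed the question and return the k most similar passages."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

def build_prompt(index, question, k=2):
    """Put only the retrieved passages into the prompt instead of the whole book."""
    context = "\n".join(retrieve(index, question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Made-up passages standing in for the book.
passages = [
    "Mara found the brass key under the lighthouse stairs.",
    "The storm kept the ferry in the harbor for three days.",
    "Old Tobias taught Mara how to read the tide tables.",
]
index = build_index(passages)
print(build_prompt(index, "Where did Mara find the key?", k=1))
```

A real system would swap `embed` for a learned embedding model and the list scan for a vector index, but the flow is the same: embed the passages once, embed each question, and pass only the closest passages to the LLM.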

Santiago

384,510 views • 3 years ago

how to use Google's NEW open source Design.md + AI Skills to make your startup look like a $100 million company in 1 hour:
1. Design.md is an open source file from Google that captures the soul of a design. Typography, colors, spacing, all in one markdown file. You attach it to your prompt and your agent builds beautiful things every time.
2. Think of it this way. The HTML is the finished dish. The design.md is the recipe. The skills are the ingredients. Put them together and everything you build looks consistent and professional.
3. Don't create a design system from scratch. Find a brand you love. Linear, Stripe, Vercel, whatever resonates. Study it. Use ChatGPT or Claude to help you extract the design language into your own design.md file.
4. Build skills on top of your design.md. A landing page skill. A mobile app skill. A motion design skill. A slide deck skill. Each one references the same design.md so everything looks like it came from the same designer.
5. The biggest mistake people make: they nail one screen and then everything else looks generic. Design.md solves this. One file keeps every page, every format, every medium consistent.
6. Use it across everything. Your landing page. Your app. Your pitch deck. Your promo videos. Same DNA. Same taste. Same system. That's what separates a startup that looks real from one that looks vibe-coded.
7. Build a second brain for design inspiration. When you see something beautiful in the real world or online, capture it. Save it. When you're building something new, reference it. Taste is developed, not downloaded.
8. It's obvious but the difference between a product people trust and a product people bounce from is how it looks and feels. Design.md gives you that edge.
you can watch below. shoutout to Meng To for coming on The Startup Ideas Podcast (SIP) 🧃 and walking through his full workflow. if you want to use AI to actually build gorgeous designs, you'll want to see this.

GREG ISENBERG

506,491 views • 2 months ago

lando talking about how special it feels for him to win constructors’ titles after starting with the team in tougher times and their dominance 🥹❤️‍🩹 “i mean another one is just a great thing. it's another constructor feels the same as the first, because to get the first was quite an achievement if you still look at where we were just three years ago. we've overtaken every team in terms of development. we've outdone them by a long way in terms of development, and in a time when it's almost harder to do than ever, with more restrictions and less wind tunnel time, all of those different things, budget cap, that's really been more in our favor over the last five years comparing to the budget that the other teams could run at. but in a time when it should be more difficult than ever to dominate, that's exactly what the team have done and given us, by a long way, the best car on the grid. i mean, it's always a very nice thing to say. every driver that gets to say that always puts a smile on your face. but we've also done very well as a team in terms of drivers between oscar and myself pushing each other and delivering every single weekend. and you don't see that on any other team. so i think we're also very proud of that as drivers. but for me, i've been with mclaren since i started. especially it was a very different time and different place then to where we are now. so that journey makes it more special to know the downs because that's a lot of what it was back then to see the rise that we've had to see the teamwork, the changes, the atmosphere difference and the leadership from zak, from andrea especially, has turned things around and made us as a team the best in the world. and that's something that many people don't ever get to say”

ray

35,603 views • 9 months ago

MCP is an absolute game-changer. (Together with DeepSeek, MCP is probably the hottest thing in AI over the last 6 months.) I use Cursor to write code 90% of the time. I built an MCP server to connect the Cursor agent to GroundX, an open-source RAG system, and I'm not going back. This is officially insane! Here is what I did, step by step: First, a little bit of context. I maintain an end-to-end Machine Learning System with several pipelines to process data, train, evaluate, register, deploy, and monitor a model. I've written a lot of documentation explaining how the system works and how to modify and maintain it. There's also the documentation of the few libraries I used to build the system. I'm a massive fan of GroundX, an open-source enterprise-grade RAG system you can run on your servers or deploy to any cloud provider. I've been working with them for a long time. GroundX offers two services. First, the "ingest" service uses a custom, pretrained vision model to ingest and understand your data. I used this to process all the documentation I have for my code. Markdown files, source code, HTML files, and even PDF documents. Everything I've written related to my project went into GroundX. Their second service is "search," which combines text and vector search with a fine-tuned re-ranker model to retrieve information from the data. I needed to connect Cursor with this service, and that's where MCP came in. I built an MCP server with two tools: 1. The first tool would go to GroundX and retrieve the available topics. Splitting the data into topics (or "buckets," as GroundX calls them) allows me to use the same setup to serve documentation from different topics. 2. The second tool would search GroundX under a specific topic for the context related to the supplied query. The magic happens after connecting the MCP server with Cursor. 
Now, I can ask any questions related to my project, and Cursor's AI agent retrieves the list of available topics from the RAG system and then searches it to provide relevant context to the model. I went from getting mediocre, sometimes wrong answers to 100% truthful, complete answers. Here is the crazy part:

Santiago

255,433 views • 1 year ago

Vector Database by hand ✍️ ~ 10 steps walkthrough below
Vector databases are the backbone of Retrieval Augmented Generation (RAG). How do they actually work? Goal: index three sentences, then answer a query by finding the nearest one, filling in every cell yourself.
= 1. Given = A dataset of three sentences, three words each. In practice it is millions of them.
= 2. Word embeddings = Let us look up each word in an embedding table. Here the vocabulary is 22 words; in practice it is tens of thousands, and the vectors have thousands of dimensions rather than four.
= 3. Encoding = We feed the sequence to an encoder, one linear layer and a ReLU, and get one feature vector per word. In practice the encoder is a transformer.
= 4. Mean pooling = Let us average across the columns. Three word vectors collapse into one, which is what people mean by a text embedding or a sentence embedding.
= 5. Indexing = We multiply by a projection matrix and the four dimensions become two. It is doing the job of a hash: a short representation that is faster to compare, and it is what gets saved in the vector storage.
= 6. Process "who are you" = Let us repeat steps 2 to 5 on the second sentence.
= 7. Process "who am I" = We do it a third time. The database is now indexed.
= 8. Query "am I you" = Let us push the query through the very same pipeline: lookup, encoder, mean pooling, projection, and it lands as a 2D vector in the same space.
= 9. Dot products = We transpose the query and multiply, which takes the dot product against every stored vector at once. The dot product is the estimate of similarity.
= 10. Nearest neighbour = Let us scan for the largest: 60/9 beats 44/9 and 40/9, so the answer is "who am I". Scanning billions of vectors one at a time is what makes this the slow step in practice, which is why real databases use an approximate nearest neighbour index like HNSW.
The outputs:
Stored index vectors = [5/3, 2/3], [5/3, 0], [7/3, 2/3]
Query vector = [8/3, 2/3]
Dot products = 44/9, 40/9, 60/9
Nearest neighbour = "who am I"
The takeaway: a vector database is an embedding pipeline, a projection, and a dot product. Every step here is arithmetic you can do in pen, which is worth remembering when the word "database" makes it sound like something else. 💾 Save this post!
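Steps 9 and 10 can be checked with exact fractions. The short Python sketch below reuses the stored vectors and the query vector listed in the outputs above; the label "sentence 1" is a placeholder, since the post does not quote the first sentence, and the embedding, encoder, and projection steps are not reproduced here.

```python
from fractions import Fraction as F

# Stored 2D index vectors, in the order given in the post.
# "sentence 1" is a placeholder label: its text is not quoted in the post.
stored = {
    "sentence 1": (F(5, 3), F(2, 3)),
    "who are you": (F(5, 3), F(0)),
    "who am I": (F(7, 3), F(2, 3)),
}
query = (F(8, 3), F(2, 3))  # "am I you" after the same pipeline

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Step 9: score every stored vector against the query.
scores = {sentence: dot(vec, query) for sentence, vec in stored.items()}

# Step 10: brute-force scan for the largest score.
nearest = max(scores, key=scores.get)

for sentence, score in scores.items():
    print(sentence, score)
print("nearest:", nearest)
```

Note that `Fraction` prints values in lowest terms, so 60/9 appears as 20/3; the ranking is unchanged.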

Tom Yeh

35,070 views • 4 days ago

Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen.) First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model. No one wants to send their data anywhere (those who do haven't found an alternative.) Every single company would rather have an air-gapped system with no internet access. GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.) I installed GroundX on my AWS account and recorded a video to show you how to use it. There are two services you can use: 1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base. 2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base. A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea. In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.

Santiago

89,664 views • 1 year ago

The highest-leverage thing you have to optimize for AI agents is their context. If you’re not getting enough out of AI agents, you probably have a context problem. By default the AI model knows absolutely nothing relevant to perform the task successfully. The whole job of context engineering is to prioritize what the model has as information to work with. Small deviations in the goals, instructions, domain understanding, or underlying data you supply the model will lead to wildly different outcomes. This is where you should spend almost all your time and energy in a workflow. We will start to religiously document more of our workflows and business processes, create clearer goals for people and agents, and ensure our tech infra is set up to get AI agents the right information. The lessons are currently being learned the fastest with coding agents, but the same lessons will apply to every field of knowledge work going forward. dex has a great talk on this that’s absolutely worth your time to watch.

Aaron Levie

87,086 views • 11 months ago

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski. This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory. In detail, you’ll:
- Learn about the internal workings of the embedding models and how your text turns into vectors.
- Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work.
- Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
- Understand how to measure the quality of your search across relevance, ranking, and score-related metrics.
- Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters.
- Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed.
By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!
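To make two of the quantization methods named above concrete, here is a minimal Python sketch of scalar and binary quantization on a single vector. It is an illustration under simple assumptions (a fixed value range of [-1, 1], 8-bit codes, thresholding at zero), not the course's or Qdrant's implementation, and all function names are hypothetical.

```python
def scalar_quantize(vec, lo, hi):
    """Map each float (clamped to [lo, hi]) to an integer code in 0..255."""
    scale = (hi - lo) / 255
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vec]

def scalar_dequantize(codes, lo, hi):
    """Approximate the original floats from their 8-bit codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]

def binary_quantize(vec):
    """Keep only the sign of each dimension: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]

def hamming(a, b):
    """Number of positions where two bit vectors differ (a cheap distance)."""
    return sum(x != y for x, y in zip(a, b))

vec = [-1.0, -0.25, 0.5, 1.0]

codes = scalar_quantize(vec, -1.0, 1.0)          # one byte per dimension
restored = scalar_dequantize(codes, -1.0, 1.0)   # lossy reconstruction
bits = binary_quantize(vec)                      # one bit per dimension

print(codes)
print(restored)
print(bits, hamming(bits, binary_quantize([0.3, 0.9, 0.1, -0.4])))
```

The trade-off is visible even at this scale: scalar quantization stores one byte instead of a 4-byte float per dimension and keeps a small reconstruction error, while binary quantization stores one bit per dimension and keeps only the sign.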

Andrew Ng

146,313 views • 1 year ago