Traditional chunking can lose context between chunks. (Let's explore a better way!) Enter Late Chunking… Here's how it works:

Traditional Chunking
• Split the text into chunks
• Embed each chunk separately

Late Chunking
• Embed the entire text first
• Split the resulting token embeddings into chunks afterwards (pooling each chunk into one vector)

Advantages of Late Chunking
• Maintains connections between segments
• Reduces the need for complex chunking strategies
• Cost-effective: costs about the same as regular chunking methods

Late Chunking is a promising alternative to traditional methods like ColBERT and naive chunking. It's particularly useful for applications where documents are long and context must be retained across many pages of text when retrieving information.

Want to learn more?
• Blog post:
• Notebook:

Special thanks to Daniel Williams for his invaluable collaboration on this one! 🔥
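
A toy sketch of the difference, in Python. Everything here is an assumption made for illustration: `toy_contextual_encoder` is a hypothetical stand-in for a long-context embedding model (each token vector is half its own word vector and half the average of all word vectors it was encoded with, loosely mimicking attention), and `WORD_VECTORS` is a made-up three-dimensional vocabulary. Traditional chunking encodes each chunk separately; late chunking encodes the whole text once and then mean-pools the token embeddings inside each chunk span.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # [start, end) token indices of one chunk

# Hypothetical 3-d word vectors: axis 0 ~ "Berlin", axis 1 ~ "population", axis 2 ~ generic.
WORD_VECTORS: Dict[str, Vector] = {
    "berlin": [1.0, 0.0, 0.0],
    "capital": [0.5, 0.0, 0.5],
    "population": [0.0, 1.0, 0.0],
    "grows": [0.0, 0.5, 0.5],
}
GENERIC: Vector = [0.0, 0.0, 1.0]  # fallback for words not in the table


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def toy_contextual_encoder(tokens: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for a long-context embedding model.

    Each output token vector mixes its own word vector with the mean of every
    word vector in the input, so tokens "see" the text they were encoded with.
    """
    base = [WORD_VECTORS.get(t, GENERIC) for t in tokens]
    context = mean_pool(base)
    return [[0.5 * b + 0.5 * c for b, c in zip(vec, context)] for vec in base]


def traditional_chunking(tokens: Sequence[str], spans: Sequence[Span]) -> List[Vector]:
    # Split first, then encode each chunk on its own: no cross-chunk context.
    return [mean_pool(toy_contextual_encoder(tokens[s:e])) for s, e in spans]


def late_chunking(tokens: Sequence[str], spans: Sequence[Span]) -> List[Vector]:
    # Encode the whole text once, then pool the token embeddings inside each span.
    token_embeddings = toy_contextual_encoder(tokens)
    return [mean_pool(token_embeddings[s:e]) for s, e in spans]


tokens = ["berlin", "is", "capital", "its", "population", "grows"]
spans = [(0, 3), (3, 6)]
print(traditional_chunking(tokens, spans)[1])  # [0.0, 0.5, 0.5]
print(late_chunking(tokens, spans)[1])         # [0.125, 0.375, 0.5]
```

In the second chunk ("its population grows"), only the late-chunked vector keeps some of the "berlin" signal on axis 0, which is the cross-chunk context the post describes. The encoder still processes each token once, so the cost stays close to regular chunking; texts longer than the model's context window would first have to be split into large windows (one possible workaround, not something the post covers).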

Femke Plantinga

13,204 subscribers

19,793 views • 1 year ago • via X (Twitter)

Education · News & Politics · Science & Technology

Comments: 9

Laurent Sorber • 1 year ago

No need to choose: you can apply late chunking (to pool token embeddings) _and_ semantic chunking (to partition the document) for even better retrieval results! An example implementation that applies both techniques:

dontreadonmeow • 1 year ago

I thought this was going to be a video about cats getting fat later in life…late-chonking

Femke Plantinga • 1 year ago

hahaha

Data knight • 1 year ago

Thanks for sharing

Femke Plantinga • 1 year ago

😁 You're welcome!

Tommy Xiao • 1 year ago

thanks share

八一菜刀 • 1 year ago

Better chunking to solve the problem of context loss. As for context, I think the problem is that the information needed to answer a user's question may be scattered across various parts of the article, so it can only be answered after reading the full text. This situation seems difficult to solve?

Deedax Inc. • 1 year ago

Thanks twitter algorithm for putting this in my feed. Great share @femke_plantinga Will late chunking still work for very very long documents?

mert⚡️ • 1 year ago

Thank you for explanations! 😎

Related videos

How do professional RAG applications chunk their text? Let’s cover some Advanced Chunking Techniques. In our previous video, we covered simple chunking methods like splitting documents into sentences or sections. But these methods often fail to ensure that each chunk has independent meaning. Semantic chunking solves exactly this! By measuring the semantic similarity between sentences using vector embeddings, we can combine similar sentences into meaningful chunks. With LLM-based chunking, large language models help break down text effectively, although it can be slow and costly. And what about the newest technique, Late Chunking, which keeps context intact across chunks? More on that soon. 👀 In this video, we cover these advanced techniques in detail. Watch it to learn more. A big shoutout to Daniel Williams for helping create this video! 💚
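
The semantic chunking idea described above can be sketched as follows, assuming sentence embeddings have already been computed by some embedding model (the vectors below are hypothetical). This variant compares each sentence only with the previous one and starts a new chunk when cosine similarity falls below a threshold; other variants compare against the running chunk's centroid instead.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.8,
) -> List[List[str]]:
    """Group consecutive sentences; start a new chunk when similarity drops."""
    if len(sentences) != len(embeddings):
        raise ValueError("need exactly one embedding per sentence")
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) >= threshold:
            chunks[-1].append(sentences[i])  # similar enough: same topic
        else:
            chunks.append([sentences[i]])    # similarity dropped: new chunk
    return chunks


sentences = ["Cats purr when content.", "Kittens also purr.", "Tomorrow it will rain."]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical sentence vectors
print(semantic_chunks(sentences, embeddings))
# [['Cats purr when content.', 'Kittens also purr.'], ['Tomorrow it will rain.']]
```

The threshold is the main tuning knob: raising it produces smaller, more topically uniform chunks.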

Femke Plantinga

29,660 views • 1 year ago

ColPali is changing the game for PDF retrieval by eliminating the need for OCR and chunking methods 🚀 Inspired by ColBERT’s success with text, ColPali splits an image of a document into patches, which are then processed by a vision-language model called PaliGemma. The embeddings for each patch retain contextual information, similar to text embeddings in methods like ColBERT. During retrieval, user queries are embedded in the same space and then compared to document patches using the MaxSim operator. ColPali recipe POC Weaviate AI Database: ColPali paper: As always, shoutout to the awesome Daniel Williams for the help! 💙
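
A minimal sketch of the MaxSim operator in Python, assuming query-token and patch embeddings are already available. The two-dimensional vectors are made up; real ColBERT/ColPali embeddings are much higher-dimensional and typically normalized, so the dot product behaves like cosine similarity. For every query token, MaxSim keeps the best-matching document patch, then sums those maxima over all query tokens.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_patches: Sequence[Vector]) -> float:
    """Late-interaction score: each query token keeps only its best-matching patch."""
    return sum(max(dot(q, p) for p in doc_patches) for q in query_tokens)


def rank_documents(
    query_tokens: Sequence[Vector], docs: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Score every document with MaxSim and sort from best to worst."""
    scores = {name: maxsim(query_tokens, patches) for name, patches in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


query = [[1.0, 0.0], [0.0, 1.0]]          # two query-token embeddings (hypothetical)
docs = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],   # one patch matches each query token well
    "page_b": [[0.5, 0.5], [0.4, 0.4]],   # patches only partly match both tokens
}
print(rank_documents(query, docs))
```

Because each query token is matched independently, a page can score highly when different regions answer different parts of the query, which a single pooled page vector would blur together.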

Victoria Slocum

67,833 views • 1 year ago

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in an appendix, referenced on another page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. In other words, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval, so you can see exactly why it chose a specific section.
- The ability to follow in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case, since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis.

Everything is fully open-source, so you can see the full implementation on GitHub and try it yourself. I have shared the GitHub repo in the replies!
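
The tree-traversal idea can be sketched in Python as below. This is not PageIndex's code or API: `Node`, `traverse`, and `keyword_chooser` are hypothetical names, and the word-overlap chooser is only a placeholder for the LLM reasoning step the post describes (it is exactly the kind of surface matching that step is meant to replace, so swap in a model call in practice). The returned trace illustrates how tree search makes retrieval traceable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    title: str
    summary: str = ""
    children: List["Node"] = field(default_factory=list)


Chooser = Callable[[str, List[Node]], Node]


def keyword_chooser(query: str, children: List[Node]) -> Node:
    """Placeholder for the reasoning step: pick the child whose title/summary
    shares the most words with the query. A reasoning-based system would ask
    an LLM "where would an expert look?" here instead."""
    query_words = set(query.lower().split())

    def overlap(node: Node) -> int:
        return len(query_words & set(f"{node.title} {node.summary}".lower().split()))

    return max(children, key=overlap)


def traverse(root: Node, query: str, choose: Chooser = keyword_chooser) -> Tuple[Node, List[str]]:
    """Walk from the root to a leaf, recording the path as a retrieval trace."""
    node, trace = root, [root.title]
    while node.children:
        node = choose(query, node.children)
        trace.append(node.title)
    return node, trace


# Hypothetical document tree (a "table of contents" with short summaries).
report = Node("Annual Report 2023", children=[
    Node("Letter to shareholders", "strategy and outlook"),
    Node("Financial statements", "balance sheet debt and cash flow", children=[
        Node("Balance sheet", "assets liabilities debt"),
        Node("Cash flow statement", "operating investing financing"),
    ]),
    Node("Appendix", "notes and definitions"),
])
leaf, trace = traverse(report, "debt trends in 2023")
print(" > ".join(trace))  # Annual Report 2023 > Financial statements > Balance sheet
```

Passing a different `choose` function is the extension point: the traversal and the trace stay the same whether the decision comes from keywords or from a model.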

Avi Chawla

972,565 views • 6 months ago

Finally, a RAG solution that works with complex documents! Real-world documents can be messy, filled with text, tables, images, and intricate flow charts. Traditional parsing and chunking methods struggle to handle these. So, what’s the solution? We need smart techniques that can intuitively chunk relevant context and understand what’s inside each chunk, whether it's text, images, or diagrams. In this video, I’ll walk you through a breakthrough technique for extracting structured information from complex documents. It's unlike any other technique you've seen before.✨ It takes any unstructured (text, tables, images, flow-charts) input and parses it into a JSON format that LLMs can easily process. I used EyeLevel.AI's GroundX platform for this – a powerful tool that allows you to build a RAG application in just 3 steps. It also comes with a nice Python SDK and can be easily deployed on-premise (K8s cluster)! Try it yourself:

Akshay 🚀

102,660 views • 1 year ago

An underrated issue with document parsing for RAG / agent use cases is dealing with multi-page tables: sometimes a big table spills over onto multiple pages. This breaks chunking algorithms that generally operate at the page level or smaller, and causes LLMs to lose the full view of the data. With LlamaParse Continuous Mode (in beta), you can now parse a document with multi-page tables and join them into a single table! This means you can now:
💡 Do contiguous chunking for RAG use cases, OR
💡 Parse the table for text-to-SQL
Check out our blog post highlighting this feature. Huge shoutout to Pierre-Loic Doulcet and Sacha Bron! Sign up here: It's in beta, so let us know your feedback!
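
One simple heuristic for stitching per-page table fragments back together is sketched below. It is an assumption for illustration, not how LlamaParse Continuous Mode works internally: a fragment is treated as a continuation of the previous table if it has no header row or repeats the previous table's header (as many PDFs do on each new page). The table contents are made up.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[str]
Fragment = Tuple[Optional[Row], List[Row]]  # (header row or None, body rows)


def merge_table_fragments(fragments: Sequence[Fragment]) -> List[Fragment]:
    """Join per-page table fragments into logical tables (heuristic sketch)."""
    merged: List[Fragment] = []
    for header, rows in fragments:
        continues = bool(merged) and (header is None or header == merged[-1][0])
        if continues:
            merged[-1][1].extend(rows)              # same table spilling onto a new page
        else:
            merged.append((header, list(rows)))     # start a new logical table
    return merged


pages = [
    (["Year", "Debt"], [["2021", "10"]]),   # page 1
    (["Year", "Debt"], [["2022", "12"]]),   # page 2 repeats the header
    (None, [["2023", "15"]]),               # page 3 has no header row
    (["Region", "Sales"], [["EU", "5"]]),   # a different table
]
tables = merge_table_fragments(pages)
print(len(tables), tables[0][1])
# 2 [['2021', '10'], ['2022', '12'], ['2023', '15']]
```

Real documents need more care (tables with identical headers that are genuinely separate, headers split across lines), which is why a parser-level feature is useful.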

Jerry Liu

24,245 views • 1 year ago

MICROSOFT OPEN SOURCED A 7B PARAMETER MODEL THAT TRANSCRIBES 60 MINUTES OF AUDIO IN A SINGLE PASS and it's completely free VIBEVOICE ASR no chunking, no context loss, full speaker diarization baked in not just speech to text..not a basic wrapper who spoke, when they spoke, exactly what they said..all in one shot and it handles the hard stuff too..50+ languages, custom hotwords, long form audio that breaks every other tool the model doesn't know what "context window" means apparently Available on macOS and Windows right now. Free to use. Free to fine tune. Free to build on.

Rahul

1,373,048 views • 2 months ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them:
1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data.
2. The connector ecosystem for loading data from unstructured data sources is very immature.
3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index.

The goal of a RAG pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications.

At a high level, there are four stages in the architecture of a RAG pipeline:
1. Ingestion: where the pipeline loads the information from the data source.
2. Extraction: where the pipeline processes the input data and decides how to retrieve the text it contains.
3. Transform: where the pipeline chunks the data and generates document embeddings.
4. Load: where the pipeline creates a search index in a vector database and loads the document embeddings.

There are rabbit holes at each of these stages. Here are three of them:
1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes.
2. Extracting the content of a plain text document is simple. The hard part is extracting content from complex documents containing tables, images, or cross-references.
3. A simple sequential chunking strategy with overlap is easy to implement. The hard part is finding the optimal strategy for your specific knowledge base and the way you plan to query it.

In the attached video, I'll show you how you can build an enterprise-grade RAG pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.
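
The overlap-based chunking mentioned in the third rabbit hole can be sketched like this. It counts whitespace-separated words for simplicity (an assumption; real pipelines usually count model tokens), and `chunk_size` and `overlap` are illustrative parameter names rather than any tool's settings.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share
    `overlap` words, so text cut at a boundary appears in both chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_with_overlap("a b c d e f g h", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h']
```

This is the easy part; choosing `chunk_size` and `overlap` for a particular corpus and query pattern is where the real tuning effort goes.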

Santiago

40,441 views • 1 year ago

everybody talks about building AI chatbots, but nobody tells you HOW to do it

that's why i made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledge base

inside the walkthrough i go over:
- data collection: gathering all relevant documents, conversations, and info
- preprocessing: cleaning up and formatting the collected data
- chunking: breaking the cleaned data down into smaller, manageable pieces
- embedding & storing in a vector database
- RAG & chatbot integration: using RAG to let the chatbot retrieve relevant information from the vector database based on a user's question

reply to this tweet w/ the word “RAG” & I’ll send it to you (must be following so I can DM)

Tyler

83,505 views • 1 year ago

Microsoft did it again!

Speech AI models have a major limitation. They slice long recordings into tiny chunks, lose track of who's speaking, and forget all context halfway through. This is exactly what Microsoft's VibeVoice solves. It's an open-source family of frontier voice AI models for both speech recognition and speech generation.

Here's what it can do:
> VibeVoice-ASR processes up to 60 minutes of audio in a single pass. No chunking. It outputs structured transcriptions with who spoke, when they spoke, and what they said.
> You can feed it custom hotwords like names, technical jargon, or domain-specific terms. The model uses them to significantly improve accuracy on specialized content.
> VibeVoice-TTS generates up to 90 minutes of multi-speaker speech with up to 4 distinct speakers. Natural turn-taking, emotional expression, all in one pass.
> VibeVoice-Realtime is a 0.5B streaming TTS model with ~300ms first-audio latency. Small enough to deploy practically anywhere.

All of this is powered by continuous speech tokenizers running at just 7.5 Hz. This ultra-low frame rate preserves audio quality while making long sequences computationally feasible.

I have shared the link to the GitHub repo in the replies!
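
Some quick arithmetic shows why the 7.5 Hz frame rate matters: the number of frames is minutes × 60 × frame rate. This sketch assumes one latent frame per tokenizer step and ignores text tokens or any other inputs the model consumes; the 50 Hz comparison rate is hypothetical, chosen only for contrast.

```python
def latent_frames(minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames for `minutes` of audio at `frame_rate_hz`."""
    return round(minutes * 60 * frame_rate_hz)


print(latent_frames(60, 7.5))   # 27000: a 60-minute ASR input at 7.5 Hz
print(latent_frames(90, 7.5))   # 40500: a 90-minute TTS output at 7.5 Hz
print(latent_frames(60, 50))    # 180000: the same hour at a hypothetical 50 Hz rate
```

At 7.5 Hz, an hour of audio becomes a sequence in the tens of thousands of frames rather than the hundreds of thousands, which is what makes single-pass processing plausible.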

Akshay 🚀

45,206 views • 3 months ago

Introducing Marvis-TTS 🔥🚀 A new local-first TTS model Lucas Newman and I built for efficiency, accessibility, and real-time performance right on consumer devices like Apple Silicon, iPhones, iPads, and more. Traditional TTS models often demand the full text input up front or sacrifice real-time capabilities. Marvis flips the script: it streams audio chunks as text is processed, creating a truly conversational experience. No more awkward pauses or unnatural breaks; Marvis handles the entire text context intelligently to deliver coherent, expressive speech. Get started today:
> pip install -U mlx-audio

Prince Canuma

81,345 views • 11 months ago

someone just made a tool that indexes code by meaning, not string matching this project is the first time i have seen someone treat code like a knowledge graph. sometimes i feel they are the only ones actually pushing cs most rag pipelines for agents are embarrassing and we should be chunking by semantic scope, imagine your agent's accuracy if this was used instead of simple grep google spent millions on kythe to index code and devs just did it better as a weekend hobby project. never bet against a functional programmer with a strict type system.

𝕱𝖔𝖗𝕷𝖔𝖔𝖕

25,587 views • 5 months ago

Traditional (SQL) databases rely primarily on keyword-based searches to retrieve information. These searches match the exact words or phrases in your query to the text stored in the database. While effective for many applications, this method has limitations when it comes to understanding context or finding relevant information that doesn’t include the exact keywords. Hybrid search combines the strengths of traditional keyword-based BM25 search with the advanced capabilities of semantic search. To effectively implement a hybrid search, a vector database is essential. Vector databases go beyond just words; they understand the meaning behind the data. They transform data such as text, images, or audio into numerical representations called vectors. These vector embeddings enable the database to find similar items, even if they don't share exact keywords. When you integrate hybrid search with Retrieval-Augmented Generation (RAG) systems, you can achieve higher accuracy in retrieved context and better output in generated responses. Learn more about RAG systems in this video with Victoria Slocum:
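
A minimal sketch of one way to fuse the two signals: min-max normalize the BM25 scores and the vector-similarity scores, then blend them with a weight `alpha`. This is an illustrative assumption, similar in spirit to the alpha-weighted hybrid search that vector databases such as Weaviate expose, but it is not any product's actual implementation; all scores below are made up.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float], vector: Dict[str, float], alpha: float = 0.5
) -> List[Tuple[str, float]]:
    """alpha = 1.0 -> pure vector search, alpha = 0.0 -> pure keyword search.
    A document missing from one result list gets 0 for that signal."""
    kw, vec = min_max(bm25), min_max(vector)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


bm25 = {"doc_exact": 12.0, "doc_other": 3.0}                         # hypothetical keyword scores
vector = {"doc_synonym": 0.92, "doc_exact": 0.70, "doc_other": 0.40}  # hypothetical cosine scores
print(hybrid_rank(bm25, vector, alpha=0.5)[0][0])  # doc_exact
print(hybrid_rank(bm25, vector, alpha=1.0)[0][0])  # doc_synonym
```

Here `doc_synonym` never appears in the keyword results, yet pure vector search ranks it first; the blended score lets a document with both an exact keyword match and good semantic similarity win.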

Femke Plantinga

140,618 views • 2 years ago

⚡️We are excited to announce that our new no-code Enterprise Platform is NOW available in private beta! As RAG apps advance from prototype to production, we’ve been overwhelmed by requests for an enterprise-grade solution to provide these applications with the data they need. Designed to make it easy to get your data #RAGready, our Platform can preprocess more than 25 file types and soon will be fully #multimodal, also able to ingest audio, video and image files. We ship with a baseline suite of source connectors, including Amazon Web Services S3, Microsoft Azure Blob Storage, OneDrive, SFTP, Databricks Delta Table, Google Drive, Salesforce, Elastic, OpenSearch, and Google Cloud storage, with many more fast following. The Platform transforms your documents into a standardized JSON schema, broken down into semantically coherent elements, allowing you to reconstruct your document in the manner most useful to you. Want only the narrative text but not the headers and footers? This is entirely configurable through the UI. Additionally, we generate more than 30 types of metadata for each element to make it easy to curate the data being written downstream and to support metadata filtering during retrieval. Smart chunking and the ability to choose from a range of embedding models are in from launch, delivering a turnkey solution for chunk and embedding experimentation. As for destination connectors, we've got that covered too, with Amazon Web Services S3, Pinecone, Chroma, Weaviate AI Database, Google Cloud storage, MongoDB, Microsoft Azure cognitive search, PostgreSQL, Elastic, OpenSearch, and Databricks Delta Table. And of course, all of this can be scheduled to keep your data continuously hydrated. The private beta is live today!
Sign up to get access and come build the future of LLM data foundations with us: 🚀 #ETLforLLMs #AI #DataPreprocessing #DataScience #DataTransformation #LLMs #ETL #ML #PreppingData #MachineLearning #RAG #Engineer #Unstructured #Unstructuredio #RetrievalAugmentedGeneration #multimodal #AIJobs

Unstructured

21,874 views • 2 years ago

In our latest benchmark, a traditional text-to-SQL approach required more than 4,400 schema-related tokens per query just to provide the model with enough context to understand a 28-table insurance data model. Think about that for a second: before AI can interpret the question, it first has to read the instruction manual. What we found is that when business context is already organized and governed, AI doesn't need to repeatedly rediscover how your data works. We created this white paper to help companies direct their attention to where costs are piling up and how Mosaic directly reduces them. Read the benchmark here: #EnterpriseAI #DataAnalytics #SemanticLayer #BusinessIntelligence

Strategy

11,097 views • 1 month ago

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll:
- Explore how knowledge graphs work by building a graph of public financial documents from scratch
- Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot
- Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems
Sign up here!
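A toy sketch of graph-based retrieval in plain Python, showing how following relationships can pull in context that a single similarity match would miss. The triples, entity names, and naive substring entity linking are all hypothetical; a real setup such as the course's Neo4j graph would query the graph with Cypher instead of this hand-rolled traversal.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relation, object) triples; invented for illustration.
TRIPLES = [
    ("AcmeCorp", "FILED", "Form10K_2023"),
    ("Form10K_2023", "MENTIONS_RISK", "SupplyChain"),
    ("AcmeCorp", "HAS_CEO", "Jane Doe"),
    ("BetaInc", "FILED", "Form10K_2023_Beta"),
]

def build_index(triples):
    """Map each entity to every triple it takes part in (as subject or object)."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
        index[o].append((s, r, o))
    return index

def graph_context(query, index, hops=2):
    """Collect facts within `hops` steps of entities named in the query."""
    # Naive entity linking: an entity is a seed if its name appears in the query.
    frontier = {e for e in index if e.lower() in query.lower()}
    seen_entities = set(frontier)
    seen_facts, facts = set(), []
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for fact in index[entity]:
                if fact not in seen_facts:
                    seen_facts.add(fact)
                    facts.append(fact)
                for neighbor in (fact[0], fact[2]):
                    if neighbor not in seen_entities:
                        seen_entities.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    # Render facts as lines that can be pasted into an LLM prompt.
    return [f"{s} -[{r}]-> {o}" for s, r, o in facts]

index = build_index(TRIPLES)
for line in graph_context("What risks did AcmeCorp report?", index):
    print(line)
```

The supply-chain risk is two hops from the entity named in the question, which is the kind of connection the course description attributes to graph retrieval over similarity search alone.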

Andrew Ng

244,329 views • 2 years ago

Two papers released in the past couple of months, one by Google and one by NVIDIA, argue that the way documents retrieved by RAG systems are ordered can enhance performance. However, they propose two different strategies for HOW these documents should be ordered 🤔 Both papers agree on two main points:
1️⃣ There’s a fundamental issue in RAG: as more documents are retrieved, more irrelevant context (e.g., hard negatives) is introduced, which confuses the LLM and eventually degrades the quality of the generated output. Performance therefore follows an inverted-U curve, improving at first as more documents are added and then declining.
2️⃣ Ordering the retrieved documents is a key lever for optimizing RAG performance.
Google Cloud researchers proposed ordering results by relevance score: relevance-based reordering places the chunks with the highest similarity scores at the beginning and the end of the input, to counter the “lost in the middle” effect.
NVIDIA researchers proposed ordering results by the original sequence of document chunks: their Order-Preserve RAG (OP-RAG) keeps the retrieved chunks in the order they appear in the source text, instead of ranking them by relevance score, to maintain the document’s logically coherent content flow.
So which one is right? It probably depends on the specific use case and dataset. Relevance-based reordering could perform better in tasks where you need fast access to the most critical information (e.g., fact retrieval, QA systems), while order-preserving RAG might be better where you need to understand the sequential structure of information (e.g., narrative or legal documents). There are still so many uncertainties in AI; we don’t actually know what we’re doing, and it takes a while to figure out the best strategies for most things! Excited to see more research on this.
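The two strategies can be contrasted in a short sketch. Each chunk is assumed to be a dict with an `id`, its `position` in the source document, and a retrieval `score` (all hypothetical values); the alternating front/back placement is one common way to put the strongest chunks at both ends, and the papers’ exact procedures may differ in detail.

```python
def relevance_reorder(chunks):
    """Put the highest-scoring chunks at both ends and the weakest in the middle."""
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ranked):
        # Alternate: best -> front, 2nd best -> back, 3rd -> front, ...
        (front if i % 2 == 0 else back).append(chunk)
    # Reverse the back half so the 2nd-best chunk ends up last.
    return front + back[::-1]

def order_preserving(chunks, k):
    """OP-RAG-style: select the top-k chunks by score, then restore source order."""
    top_k = sorted(chunks, key=lambda c: c["score"], reverse=True)[:k]
    return sorted(top_k, key=lambda c: c["position"])

# Hypothetical retrieval results for one document.
chunks = [
    {"id": "c0", "position": 0, "score": 0.62},
    {"id": "c1", "position": 1, "score": 0.91},
    {"id": "c2", "position": 2, "score": 0.55},
    {"id": "c3", "position": 3, "score": 0.78},
    {"id": "c4", "position": 4, "score": 0.70},
]
print([c["id"] for c in relevance_reorder(chunks)])      # ['c1', 'c4', 'c2', 'c0', 'c3']
print([c["id"] for c in order_preserving(chunks, k=3)])  # ['c1', 'c3', 'c4']
```

Both functions see the same scores; they differ only in whether the final prompt order follows relevance or the source text.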

Victoria Slocum

15,333 views • 1 year ago

What started as building a personal taste.md skill for myself turned into building a pipeline that can turn any taste into a skill. The most important piece is the references. This is where you should spend your time: if the references suck, so does the skill. I find that high-resolution references cropped tightly on details work best. Each image gets analyzed by both Opus 4.7 and GPT 5.5. The analysis is about why the reference succeeds as a piece of design, not what it does functionally. Using two models helps rule out the biases and gaps of each. The models focus on layout, spacing, typography, rhythm, composition, hierarchy, etc. At the end, each image has:
reference-01/
- opus-4-7-analysis.md
- gpt-5-5-analysis.md
Then we fuse them together using GPT 5.5, but the md files are anonymized so 5.5 doesn't prefer its own analysis:
reference-01/
- fused-analysis.md
reference-02/
- fused-analysis.md
etc.
After fusion, we have one synthesized analysis per reference. Now the goal is to combine all of those into a single rule set. This is where chunking matters. If you ask one model to combine 100 image analyses at once, the result becomes too broad: it summarizes instead of preserving the granular design rules we want. Instead, we chunk the fused analyses into smaller groups. Each group, usually around 6 to 8 image notes, is merged into a chunk-level synthesis. Then one final model pass fuses those chunks into a single md rule set. Finally, using the rule set, we write a skill of concrete instructions. It enforces constraints, uses imperative wording, and avoids vague taste words.
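The chunk-then-fuse step can be sketched as a two-level reduction. `fake_merge` is a hypothetical stand-in for the LLM call that merges notes, and splitting into evenly sized groups of at most 8 is an assumption chosen to stay near the 6 to 8 range mentioned above, not necessarily how the author groups them.

```python
import math
from typing import Callable

def balanced_groups(items: list[str], max_size: int = 8) -> list[list[str]]:
    """Split items into the fewest groups of at most max_size, with sizes differing by at most 1."""
    if not items:
        return []
    n_groups = math.ceil(len(items) / max_size)
    base, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups

def hierarchical_fuse(analyses: list[str], merge: Callable[[list[str]], str], max_size: int = 8) -> str:
    """Merge per-reference analyses chunk by chunk, then fuse the chunk syntheses once."""
    if not analyses:
        raise ValueError("need at least one analysis")
    chunk_syntheses = [merge(group) for group in balanced_groups(analyses, max_size)]
    return merge(chunk_syntheses)

# Hypothetical stand-in for an LLM merge call: it just reports how many notes it received.
def fake_merge(notes: list[str]) -> str:
    return f"synthesis of {len(notes)} notes"

analyses = [f"fused-analysis for reference-{i:02d}" for i in range(1, 101)]
print([len(g) for g in balanced_groups(analyses)])  # nine groups of 8, then four of 7
print(hierarchical_fuse(analyses, fake_merge))       # synthesis of 13 notes
```

Even grouping avoids a tiny leftover group (for example, 100 notes in fixed groups of 7 would leave a final group of 2).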

Jaytel

59,652 views • 2 months ago

‘pip install elysia’ and ‘elysia start’. That’s literally all it takes to get the most advanced open source agentic RAG app running on your data. We just released 𝗘𝗹𝘆𝘀𝗶𝗮, our open source agentic RAG framework, and an app this cool needed a cool video to go with it. Watch the full video: In the video, we go through these components of Elysia:
1️⃣ 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝗧𝗿𝗲𝗲 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: Instead of giving agents access to all tools at once, Elysia uses a pre-defined web of nodes with corresponding actions. Each decision agent has global context awareness.
2️⃣ 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗗𝗮𝘁𝗮 𝗗𝗶𝘀𝗽𝗹𝗮𝘆𝘀: Seven different data display formats, including tables, e-commerce product cards, GitHub tickets, and charts. The system automatically chooses the best display format.
3️⃣ 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗰 𝗗𝗮𝘁𝗮 𝗘𝘅𝗽𝗲𝗿𝘁𝗶𝘀𝗲: Unlike naive RAG systems that perform blind vector searches, Elysia analyzes your collections to understand data structure and meaning before performing queries.
𝗢𝘁𝗵𝗲𝗿 𝗖𝗼𝗼𝗹 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀:
• 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸 𝗦𝘆𝘀𝘁𝗲𝗺: Uses positive examples as few-shot demonstrations for smaller, faster models
• 𝗖𝗵𝘂𝗻𝗸-𝗢𝗻-𝗗𝗲𝗺𝗮𝗻𝗱: Dynamically chunks documents at query time instead of pre-chunking
• 𝗠𝘂𝗹𝘁𝗶-𝗠𝗼𝗱𝗲𝗹 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆: Routes different tasks to appropriate model sizes based on complexity
…And also how to get started with your own data! The entire project is open source and designed with customization in mind. You can use it as-is for effective data searching, or install the Python package to create custom tools for whatever agentic AI purposes you need. Big kudos to Edward for the vision, filming, and editing of this masterpiece
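Chunk-on-demand can be illustrated with a minimal sketch: documents are stored whole, and only the ones retrieved for a query are split and ranked at query time. This is not Elysia’s implementation; the word-window chunker and the word-overlap score are crude, hypothetical stand-ins for a real chunker and relevance model.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Fixed-size word windows with overlap, produced only when called (size must exceed overlap)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def chunk_on_demand(query, retrieved_docs, top_n=2, size=8, overlap=2):
    """Chunk only the documents retrieved for this query, then rank the chunks."""
    q = tokenize(query)
    candidates = []
    for doc_id, text in retrieved_docs.items():
        for chunk in chunk_words(text, size, overlap):
            candidates.append((len(q & tokenize(chunk)), doc_id, chunk))
    # Python's sort is stable, so equal scores keep their document order.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_n]

# Hypothetical documents returned by a whole-document retrieval step.
docs = {
    "faq": "Shipping takes five days. Returns are accepted within thirty days of delivery for unused items.",
    "about": "Our company was founded in 1999 and ships worldwide.",
}
for score, doc_id, chunk in chunk_on_demand("How many days for returns?", docs):
    print(score, doc_id, chunk)
```

Because chunking happens after retrieval, the chunk size and overlap could in principle vary per query without re-indexing anything.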

Victoria Slocum

45,497 views • 11 months ago

RLM is the most important foundation of my Pi Harness (other than Pi, of course). It's seeded with late interaction retrieval results (thanks to @lightonai for pylate). The agent initiates it with a query, then:
𝐒𝐞𝐭𝐮𝐩
A Python REPL is created and seeded with:
1. Late interaction search to pre-filter. Instead of the top 3/5/10, it keeps the top hundreds of documents. These are stored in a `context` variable.
2. Python functions for running more searches if the `context` variable isn't enough, and for making LLM calls to cheaper models in parallel batches.
𝐈𝐭𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐋𝐨𝐨𝐩
From there, an LLM iterates in the REPL based on the query. It's just like exploring in a Jupyter notebook: each turn, the LLM writes prose (like a markdown cell) and code to be run in the REPL. This allows the LLM to sort, filter, and synthesize information. It can fan out and ask smaller models to summarize, combine, contrast, or do anything else with documents to help it understand the data. After several turns, the LLM responds with the final answer, either because it found the answer or because it hit the budget limit. Context as a Python variable, LLM as the programmer, REPL as the runtime.
𝐖𝐡𝐲 𝐃𝐨𝐞𝐬 𝐓𝐡𝐢𝐬 𝐖𝐨𝐫𝐤
1. Richer shell. Agents (and subagents) work by intermixing code and prose/thinking. But they use static scripts or bash commands that run, exit, and start over with each tool call. That's not ideal for exploring and synthesizing data; for that, state is useful, so you can keep building on and exploring the data as you learn more. There's a reason Jupyter notebooks have been popular with data scientists.
2. Keeps the main agent context clean. The better the context, the better the agent will perform (duh!). This means three things: better human input, fewer missing search results, and fewer incorrect search results. Letting the agent iterate allows it to synthesize just what is needed and nothing else. All the bad paths, and peeks at things that turn out to be irrelevant, stay out of the main agent context.
3. Stack the good ideas! People often compare late interaction search vs RLM. Or static vs dynamic languages. Or agentic search vs semantic search. But... you can just use them all together, each for what it's really great at. Read the full post for more detail about how and why.
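The “context as a Python variable” and parallel fan-out ideas can be sketched with the standard library alone. `cheap_llm` is a hypothetical stand-in for a call to a smaller model, and the synthetic `context` list replaces the late interaction results (no pylate API is used here); this illustrates the pattern, not the author's harness.

```python
from concurrent.futures import ThreadPoolExecutor

def cheap_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a smaller, cheaper model."""
    return f"summary[{len(prompt.splitlines())} lines]"

def batched(items: list[str], size: int) -> list[list[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def fan_out(docs: list[str], instruction: str, batch_size: int = 10, max_workers: int = 4) -> list[str]:
    """Send batches of documents to the cheap model in parallel; results keep batch order."""
    prompts = [instruction + "\n" + "\n".join(batch) for batch in batched(docs, batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cheap_llm, prompts))

# In the REPL session, `context` would already hold the pre-filtered retrieval results;
# here it is a synthetic list of 30 documents.
context = [f"doc {i}: {'refund policy' if i % 3 == 0 else 'shipping info'}" for i in range(30)]

# Turn 1: narrow the state with ordinary Python.
relevant = [d for d in context if "refund" in d]
# Turn 2: fan out over the narrowed list; `summaries` stays in the session for later turns.
summaries = fan_out(relevant, "Summarize the refund terms:", batch_size=4)
print(summaries)  # ['summary[5 lines]', 'summary[5 lines]', 'summary[3 lines]']
```

Only `summaries` (or whatever the LLM finally extracts from it) needs to flow back to the main agent; the raw `context` stays inside the REPL.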

Isaac Flath

40,212 views • 2 months ago