We just hit 94% accuracy on RobustQA, beating industry standards. Traditional RAG chunks perfectly sized documents into small pieces, destroying context. We preserve complete documents instead. Better accuracy, complete context, more efficient storage.

webAI

5,588 subscribers

542,761 views • 11 months ago • via X (Twitter)

Science & Technology, Education

Similar videos

Check it out: An entire lesson from BloomTech's AI for Developer Productivity course! Fundamentals of RAG (Retrieval-Augmented-Generation). How we enhance accuracy and reliability of generative AI models. This is the foundation we build on to give AI important context.

Austen Allred

79,507 views • 2 years ago

Launched Activeloop L0 on Y Combinator. L0 turns multimodal documents into cited answers with state-of-the-art accuracy. Check out the link below!

Activeloop

12,238 views • 1 year ago

Traditional Chunking can lose context between chunks. (Let's explore a better way!) Enter Late Chunking… Here's how it works:

Traditional Chunking
• Split the text into chunks
• Embed each chunk separately

Late Chunking
• Embed the entire text first
• Split it into chunks after the embedding

Advantages of Late Chunking
• Maintains connections between segments
• Reduces the need for complex chunking strategies
• Cost-effective: extremely similar cost to regular chunking methods

Late Chunking is a promising alternative to traditional methods like ColBERT and naive chunking. It's particularly useful for applications where the documents are long, and context needs to be retained across many pages of text when retrieving information.

Want to learn more?
• Blog post:
• Notebook:

Special thanks to Daniel Williams for his invaluable collaboration on this one! 🔥

Femke Plantinga

19,718 views • 1 year ago
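
The two pipelines contrasted in the late-chunking post above can be sketched in a few lines. This is only a toy illustration, not the method from the linked blog post or notebook: `token_vector` and `contextualize` are hypothetical stand-ins for a real long-context embedding model (here a hash-based vector and a neighbour average that mimics how attention mixes in surrounding context), and `DIM`, `window`, and `chunk_size` are made-up parameters.

```python
import hashlib
from typing import List

DIM = 8  # toy embedding width (assumption, not a real model dimension)


def token_vector(token: str) -> List[float]:
    """Deterministic toy embedding for a single token (stand-in for a model)."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def contextualize(tokens: List[str], window: int = 2) -> List[List[float]]:
    """Toy 'encoder': each token becomes the mean of its neighbours within
    `window` positions, so a token's vector depends on the text around it."""
    raw = [token_vector(t) for t in tokens]
    out = []
    for i in range(len(raw)):
        span = raw[max(0, i - window): i + window + 1]
        out.append([sum(v[d] for v in span) / len(span) for d in range(DIM)])
    return out


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average token vectors into one chunk vector."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(DIM)]


def naive_chunking(tokens: List[str], chunk_size: int) -> List[List[float]]:
    """Split first, then encode each chunk in isolation."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    return [mean_pool(contextualize(chunk)) for chunk in chunks]


def late_chunking(tokens: List[str], chunk_size: int) -> List[List[float]]:
    """Encode the whole text once, then pool token vectors per chunk span."""
    contextual = contextualize(tokens)
    return [mean_pool(contextual[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]


tokens = "Berlin is the capital of Germany . It has many inhabitants".split()
naive = naive_chunking(tokens, chunk_size=6)
late = late_chunking(tokens, chunk_size=6)
print(len(naive), len(late))  # one vector per chunk in both cases
```

Both functions return one vector per chunk, which is why the storage cost is about the same. The difference is that in `late_chunking` the tokens near a chunk boundary (such as "It" in the second chunk) were encoded while their neighbours in the other chunk were still visible.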

Full speech with English subtitles (Disclosure: GPT translated, no promises on complete accuracy)

OSINTtechnical

114,189 views • 1 year ago

Box AI showcases Google Gemini 2.5 Flash’s speed and accuracy across diverse documents—from invoices to research papers and syllabi—delivering quick, reliable answers on enterprise data with efficient reasoning and processing.

Box

19,807 views • 1 year ago

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to see in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines.

For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis. Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!

Avi Chawla

971,934 views • 5 months ago
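
The tree-based retrieval idea in the post above can be illustrated with a small sketch. This is not PageIndex's actual API or implementation: `Node`, `keyword_judge`, and `navigate` are hypothetical names, the sample tree is invented, and a simple keyword-overlap score stands in for the LLM reasoning step that would decide which section to open next.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One entry in a table-of-contents-like tree (hypothetical structure)."""
    title: str
    summary: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def keyword_judge(query: str, node: Node) -> float:
    """Stand-in for an LLM's judgement of where an expert would look:
    the fraction of query words found in the node's title and summary."""
    words = set(query.lower().split())
    haystack = set(f"{node.title} {node.summary}".lower().split())
    return len(words & haystack) / len(words) if words else 0.0


def navigate(root: Node, query: str,
             judge: Callable[[str, Node], float] = keyword_judge
             ) -> Tuple[Node, List[str]]:
    """Walk from the root to a leaf, picking the best child at each level.
    The returned path records why a section was chosen (traceability)."""
    node, path = root, [root.title]
    while node.children:
        node = max(node.children, key=lambda child: judge(query, child))
        path.append(node.title)
    return node, path


# Invented toy document; no embeddings or vector search are involved.
report = Node("Annual Report", "full report", children=[
    Node("Business Overview", "products markets and strategy"),
    Node("Financial Statements", "income balance sheet and debt figures", children=[
        Node("Income Statement", "revenue and expenses"),
        Node("Notes on Debt", "maturity schedule of long-term debt in 2023",
             text="(toy placeholder for the debt tables)"),
    ]),
])

leaf, path = navigate(report, "debt trends in 2023")
print(" > ".join(path))  # Annual Report > Financial Statements > Notes on Debt
```

Swapping `keyword_judge` for a function that prompts a language model with the query and the child summaries would be the natural next step; the traversal loop and the recorded path stay the same.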

🚀 Introducing Genspark Hub—a dedicated space where you keep all relevant files in one place. Every task in the hub shares the same context—Genspark automatically references your files, past work, and custom instructions. No more scattered documents, no more repeating context — just projects that get smarter over time. Ready to get more organized and efficient? Try Genspark Hub:

Genspark

13,629 views • 8 months ago

you wanted more no context players, we give you more no context players 🤝

ATP Tour

144,455 views • 9 months ago

Two papers released in the past couple of months, one by Google and one by NVIDIA, argue that ordering the documents retrieved by RAG systems can enhance performance. However, they propose different strategies for HOW these documents should be ordered 🤔

Both papers agree on two main points:
1️⃣ There’s a fundamental issue in RAG: as more documents are retrieved, more irrelevant context (e.g., hard negatives) is introduced, which confuses the LLM and eventually degrades the quality of the generated output. The result is an inverted-U performance curve.
2️⃣ Ordering the retrieved documents is a key lever for optimizing RAG performance.

Google Cloud researchers proposed ordering results based on relevance scores: The authors of this paper argue for relevance-based reordering, that is, ordering the retrieved chunks by their similarity scores so that the most relevant documents sit at the beginning and the end of the input, to counter the “lost in the middle” effect.

NVIDIA researchers proposed ordering results based on the original sequence of document chunks: The authors of this paper argue for Order-Preserving Reordering, or Order-Preserve RAG (OP-RAG), to maintain the logically coherent content flow of the document. They keep the retrieved chunks in the order in which they appear in the source text, instead of ranking them by relevance scores.

So which one is right? It probably depends on the specific use case and dataset: relevance-based reordering could perform better in tasks where you need fast access to the most critical information (e.g., fact retrieval, QA systems), while order-preserving RAG might be better where you need to understand the sequential structure of information (e.g., narrative or legal documents).

There are still so many uncertainties in AI. We don’t actually know what we’re doing, and it takes a while to figure out the best strategies for most things! Excited to see more research about this.

Victoria Slocum

15,213 views • 1 year ago
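
The two ordering strategies described in the post above can be compared side by side in a short sketch. It follows only the post's summary, not either paper's code: the `Chunk` type, the sample scores, and the alternating placement used in `edges_first` are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    position: int   # index of the chunk in the source document
    score: float    # retriever similarity score (made-up values below)
    text: str


def top_k(chunks: List[Chunk], k: int) -> List[Chunk]:
    """Keep the k highest-scoring chunks."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:k]


def edges_first(chunks: List[Chunk]) -> List[Chunk]:
    """Relevance-based reordering: alternate ranked chunks between the front
    and the back so the strongest sit at both ends and the weakest in the
    middle (one possible reading of 'beginning and end')."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def order_preserving(chunks: List[Chunk]) -> List[Chunk]:
    """Order-preserving reordering: restore the original document order."""
    return sorted(chunks, key=lambda c: c.position)


retrieved = [
    Chunk(0, 0.6, "a"), Chunk(1, 0.2, "b"), Chunk(2, 0.9, "c"),
    Chunk(3, 0.1, "d"), Chunk(4, 0.8, "e"),
]
best = top_k(retrieved, 3)
print([c.position for c in edges_first(best)])       # [2, 0, 4]
print([c.position for c in order_preserving(best)])  # [0, 2, 4]
```

Both functions take the same retrieved set and differ only in the final sort, so trying each on a given dataset is cheap.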

Crazy accuracy for what we just call a Monday now.

MAGA Cult Slayer🦅🇺🇸

23,859 views • 10 days ago

token efficiency isn’t just about speed, it’s also about accuracy. less noise in context == fewer hallucinations == better results. at we’ve been using call graphs + static analysis to give LLMs the best possible code context. big win not just for codegen, but for generating docs that stay true to the code itself.

ayman nadeem

15,896 views • 9 months ago

Create documents that actually sound like you. Use Gemini in Google Docs to pull from your work context and generate polished, ready-to-share documents. ✅

Google Workspace

22,507 views • 3 months ago

How do professional RAG applications chunk their text? Let’s cover some Advanced Chunking Techniques. In our latest video, we cover simple chunking methods like splitting documents into sentences or sections. But these methods often fail to ensure that each chunk carries independent meaning. Semantic chunking solves exactly this! By measuring the semantic similarity between sentences using vector embeddings, we can combine similar sentences into meaningful chunks. With LLM-based chunking, large language models help break down text effectively, although it can be slow and costly. And what about the newest technique, Late Chunking? It keeps context intact across chunks; more on that soon. 👀 In this video, we cover these advanced techniques in detail. Watch it to learn more. A big shoutout to Daniel Williams for helping create this video! 💚

Femke Plantinga

29,660 views • 1 year ago
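
The semantic chunking step mentioned in the post above (merge neighbouring sentences whose embeddings are similar) can be sketched as follows. This is a minimal illustration rather than the approach from the video: `embed` is a bag-of-words counter standing in for a real embedding model, and the `threshold` value is an arbitrary assumption.

```python
import math
import re
from collections import Counter
from typing import List


def embed(sentence: str) -> Counter:
    """Toy embedding: word counts (a real system would call a vector model)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Append a sentence to the current chunk when it is similar enough to the
    previous sentence; otherwise start a new chunk."""
    chunks: List[List[str]] = []
    for sentence in sentences:
        if chunks and cosine(embed(chunks[-1][-1]), embed(sentence)) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
    return chunks


sentences = [
    "Cats sleep most of the day.",
    "Cats also sleep at night.",
    "Vector databases store embeddings.",
    "Embeddings live in vector databases.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

With these four sentences the two cat sentences end up in one chunk and the two database sentences in another, because the similarity drops to zero at the topic change.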

"We didn't get into this to build more infrastructure layers. We got into this to provide efficiency to the traditional finance system." - Colin Cunningham, Chainlink Labs. At #HederaCon2026, digital asset leaders reflected on what the industry is here to do: make current systems more efficient, trusted and connected.

Hedera

22,708 views • 1 month ago

Today, we’re excited to announce Deep Extract. We’re pushing the boundary of complex document extraction by utilizing an agent harness approach, iterating and verifying outputs until they are at human-level accuracy. The results speak for themselves: we’ve handled documents exceeding 2,500 pages, extracting a total of 30M+ fields in the last couple weeks with results in our production beta reaching 99-100% accuracy. It’s particularly effective on complex documents with long lists, such as invoice line items, brokerage statement transactions, equipment manifests, and more. See more on how we built it, how to use it, and more👇

Reducto

23,949,618 views • 2 months ago

Your AI agents are giving you generic outputs because they have zero context... I just built a RAG workflow that ingests your data and creates a searchable database that your AI can reference. Now your workflows will give precise, contextual responses instead of generic slop. Follow + comment "RAG" and I'll send you the complete n8n workflow.

Tom

51,469 views • 1 year ago

⚛️ React optimization tip: break your context into chunks so only the updated parts re-render ↓

George Moller

42,898 views • 1 year ago

Introducing HydraDB. The graph-native context infrastructure for agents. Purpose-built to deliver precise context & observability into why agents act the way they do. We've always believed graphs are the best way to manage AI context, but they've been too expensive to scale or impractical for storing full context. Until now. HydraDB combines in-memory, NVMe, and object storage into a single graph layer, making context delivery faster, cheaper, and more precise. We want context delivery to be extremely fast, 1000x cheaper, and highly precise. Give your agents a brain.

Nishkarsh

2,291,203 views • 1 month ago