Finally, a RAG solution that works with complex documents! Real-world documents can be messy, filled with text, tables, images, and intricate flow charts. Traditional parsing and chunking methods struggle to handle these. So, what’s the solution? We need smart techniques that can intuitively chunk relevant context and understand what’s inside each chunk, whether it's text, images, or diagrams. In this video, I’ll walk you through a breakthrough technique for extracting structured information from complex documents. It's unlike any other technique you've seen before.✨ It takes any unstructured input (text, tables, images, flow charts) and parses it into a JSON format that LLMs can easily process. I used EyeLevel.AI's GroundX platform for this, a powerful tool that allows you to build a RAG application in just 3 steps. It also comes with a nice Python SDK and can be easily deployed on-premise (K8s cluster)! Try it yourself:

Akshay 🚀

268,509 subscribers

102,660 views • 1 year ago • via X (Twitter)

Education, Science & Technology

10 comments

Daily Dose of Data Science • 1 year ago

This is amazing! 💯 It is indeed hard to extract data from complex docs!

Akshay 🚀 • 1 year ago

That’s what I loved about it!!

Avi Kumar Talaviya • 1 year ago

Seems like an amazing solution for advanced RAG

Akshay 🚀 • 1 year ago

Absolutely! 💯

Avi Chawla • 1 year ago

Thanks for the walkthrough. Parsing looks quite incredible.

Akshay 🚀 • 1 year ago

Absolutely! 💯

Shivang Agarwal • 1 year ago

Amazing, looks like GroundX are doing object detection on steroids!

Akshay 🚀 • 1 year ago

Exactly, and then state-of-the-art parsing of what's inside that object! 🔥

Shashank Gupta. • 1 year ago

This was amazing. Signed up on GroundX.

Akshay 🚀 • 1 year ago

Awesome! Also check the SDK that devs like us care about a lot! :)

Related videos

How do professional RAG applications chunk their text? Let’s cover some Advanced Chunking Techniques. In our latest video, we cover simple chunking methods like splitting documents into sentences or sections. But these methods often fail to ensure that each chunk has independent meaning. Semantic chunking solves exactly this! By measuring the semantic similarity between sentences using vector embeddings, we can combine similar sentences into meaningful chunks. With LLM-based chunking, large language models help break down text effectively, although it can be slow and costly. And what about the newest technique, Late Chunking, which keeps context intact across chunks? More on that soon. 👀 In this video, we cover these advanced techniques in detail. Watch it to learn more. A big shoutout to Daniel Williams for helping create this video! 💚
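A minimal sketch of the semantic chunking idea described above, under loud assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the `threshold` value and function names are illustrative choices, not taken from the video. Consecutive sentences stay in the same chunk while their cosine similarity remains above the threshold.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Group consecutive sentences; start a new chunk when similarity drops."""
    chunks: list[list[str]] = []
    for sentence in sentences:
        if chunks and cosine(embed(chunks[-1][-1]), embed(sentence)) >= threshold:
            chunks[-1].append(sentence)  # similar to the previous sentence
        else:
            chunks.append([sentence])    # topic shift: open a new chunk
    return chunks


sentences = [
    "Cats sleep most of the day.",
    "Cats also sleep at night.",
    "Vector databases store embeddings.",
    "Embeddings are stored as vectors in databases.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

With a real embedding model the threshold would need tuning on your own documents; word overlap is only used here so the example runs without external dependencies.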

Femke Plantinga

29,660 views • 1 year ago

Traditional Chunking can lose context between chunks. (Let's explore a better way!) Enter Late Chunking… Here's how it works:

Traditional Chunking
• Split the text into chunks
• Embed each chunk separately

Late Chunking
• Embed the entire text first
• Split it into chunks after the embedding

Advantages of Late Chunking
• Maintains connections between segments
• Reduces the need for complex chunking strategies
• Cost-effective: extremely similar cost to regular chunking methods

Late Chunking is a promising alternative to traditional methods like ColBERT and naive chunking. It's particularly useful for applications where the documents are long, and context needs to be retained across many pages of text when retrieving information.

Want to learn more?
• Blog post:
• Notebook:

Special thanks to Daniel Williams for his invaluable collaboration on this one! 🔥
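The ordering described above (embed the whole text first, split afterwards) can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the linked notebook: `contextual_token_vectors` is a hypothetical toy that blends each token with the document mean, where a real long-context embedding model would use attention, and all names and sizes are made up for the example.

```python
import hashlib

DIM = 8  # toy embedding width (assumption)


def token_vector(token: str) -> list[float]:
    """Deterministic pseudo-embedding for one token (toy stand-in)."""
    digest = hashlib.sha256(token.lower().encode()).digest()
    return [byte / 255.0 for byte in digest[:DIM]]


def mean(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def contextual_token_vectors(tokens: list[str]) -> list[list[float]]:
    """Toy 'whole document' pass: blend every token with the document mean."""
    raw = [token_vector(token) for token in tokens]
    doc = mean(raw)
    return [[0.5 * r + 0.5 * d for r, d in zip(vec, doc)] for vec in raw]


def late_chunk(tokens: list[str], chunk_size: int) -> list[list[float]]:
    """Late chunking: embed all tokens first, then split and mean-pool."""
    vectors = contextual_token_vectors(tokens)
    return [
        mean(vectors[start:start + chunk_size])
        for start in range(0, len(vectors), chunk_size)
    ]


tokens = "The city is large . It has many parks .".split()
late = late_chunk(tokens, chunk_size=5)
print(len(late), "chunk embeddings of width", len(late[0]))
```

The point of the ordering is that the second chunk ("It has many parks .") is pooled from token vectors that were computed with the first sentence in view, so the pronoun keeps some of its context.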

Femke Plantinga

19,793 views • 1 year ago

Building a RAG system that works with real-life documents is crazy hard. Why is nobody talking about this? (I'll show you what a complex document looks like in the attached video. Good luck with the 10-line code demos if you want to deal with this.) All I see online are "how to talk to a PDF" demos that won't take you far. If you think that's all you need, you won't like what happens when you try. In the video below, I'll show you what it takes to build a system that processes huge numbers of documents without losing accuracy. There are a few tricks here that I'm sure will impress you. The magic happens in three phases: • In the way we process the documents • In the way we chunk the content • In the way we augment and store the chunks This is all built into GroundX, EyeLevel.AI's out-of-the-box RAG system. You can use it as a SaaS or install it in your own Kubernetes cluster and run it locally. If you have seen any other RAG system doing what I do in the attached video, please let me know. I'd love to check them out. Disclaimer: I work with the team at EyeLevel. My advice is to take any document you have lying around, go to their site, create an account, and try them out. Here is their website:

Santiago

75,788 views • 1 year ago

New short course: "Building Multimodal Search and RAG", by Weaviate's Sebastian Witalec. Contrastive learning is used to train models to map vectors into an embedding space by pulling similar concepts closer together and pushing dissimilar concepts away from each other. This technique is also used to train multimodal embedding models that capture semantic similarity across different modalities like text, images, and audio. These multimodal embeddings can be used to build multimodal search and RAG systems. In this course, you'll learn how contrastive learning works, and how to add multimodality to RAG, so your models can draw on diverse, relevant context to answer questions. For example, a query about a financial report might synthesize information from text snippets, graphs, tables, and slides. You will also learn how visual instruction tuning lets you integrate image understanding into language models, and build a multi-vector recommender system using Weaviate’s open-source vector database. Please sign up here:

Andrew Ng

104,371 views • 2 years ago

This is probably the most complex workflow I’ve ever built, using only open-source tools. It took me four days. It takes four inputs: author, title, and style; and generates a full visual animated story in one click in ComfyUI. There are still some bugs, but here’s the first preview. Here’s a quick breakdown:
- The four inputs are sent to LLMs with precise instructions to generate: first, prompts for images and image modifications; second, prompts for animations; third, prompts for generating music.
- All voices are generated from the text and timed precisely, as they determine the length of each animation segment.
- The first image and video are generated to serve as the title, but also as the guide for all other images created for the video.
- Titles and subtitles are also added automatically in Comfy.
- I also developed a lot of custom nodes for minor frame calculations, mostly to match audio and video.
- The full system is a large loop that, for each line of text, generates an image and then a video from that image. The loop was the hardest part to build in this workflow, so it can process either a 20-second video or a 2-minute video with the same input.
- There are multiple combinations of LLMs that try to understand the text in the best way to provide the best prompts for images and video.
- The final video is assembled entirely within ComfyUI.
- The music is generated based on the LLM output and matches the exact timing of the full animation.
- Done!
For reference, this workflow uses a lot of models and only works on an RTX 6000 Pro with plenty of RAM. My goal is not to replace humans. As I’ll try to explain later, this workflow is highly controlled and can be adapted or reworked at any point by real artists! My aim was to create a tool that can animate text in one go, allowing the AI some freedom while keeping a strict flow. I don’t know yet how I’ll share this workflow with people. I still need to polish it properly, but maybe through Patreon. Anyway, I hope you enjoy my research, and let’s always keep pushing further! :)

Lovis Odin

58,769 views • 10 months ago

We’re excited to officially launch LlamaParse, the first genAI-native document parsing solution. Not only is it better at parsing out images/tables/charts 📊📈 than virtually every other parser, it is now steerable through natural language instructions - output the document in whatever format you desire! It is also the only parsing solution that seamlessly allows you to build accurate RAG over complex documents, free of hallucinations 🔥 We launched it in private preview a few weeks ago and hit 2k users, 1M total PDF pages parsed. And now it’s better than ever. LlamaParse contains the following killer features: ✅ SOTA table/chart extraction ✅ Seamless integration with LlamaIndex 🦙 advanced RAG/agents ✅✨ Natural language Parsing Instructions ✅✨JSON mode and image extraction ✅✨Support for ~10 document types (.pdf, .pptx, .docx, .xml) and more Our pricing is simple: 1k free per day, and additional pages at 0.3c a page, or $3 for 1k pages. If you want advanced document RAG and/or private deployments, come get in touch with us to chat about LlamaCloud. Check out our full blog post here: LlamaParse client repo: Signup at 🦙☁️: Come talk to us:

LlamaIndex 🦙

143,136 views • 2 years ago

Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen.) First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model. No one wants to send their data anywhere (those who do haven't found an alternative.) Every single company would rather have an air-gapped system with no internet access. GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.) I installed GroundX on my AWS account and recorded a video to show you how to use it. There are two services you can use: 1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base. 2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base. A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea. In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.
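The post says the Search service combines text and vector search but does not say how the two result lists are merged. One common, generic way to do this is reciprocal rank fusion (RRF), sketched below. This is an assumption for illustration only, not GroundX's documented method; the document IDs are invented, and `k = 60` is just a commonly used constant. A re-ranker model would typically then re-score the fused list.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each hit adds 1 / (k + rank) to its score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical text-search ranking
vector_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical vector-search ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, which is why `doc_b` and `doc_a` outrank the documents found by only one search type.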

Santiago

89,624 views • 1 year ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them: 1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data. 2. The connector ecosystem to load data from unstructured data sources is very immature. 3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index. The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications. At a high level, there are four different stages in the architecture of a RAG pipeline: 1. Ingestion: Here is where the pipeline loads the information from the data source. 2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside them. 3. Transform: Where the pipeline chunks the data and generates document embeddings. 4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings. There are different rabbit holes at each one of these stages. Here are three of them: 1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes. 2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references. 3. A simple continual chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it. In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. 
You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.
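For reference, the "simple" chunking strategy with an overlap mentioned in the post can be sketched in a few lines. This is a generic illustration, not Vectorize's implementation; the function name and the default `size` and `overlap` values are assumptions.

```python
def chunk_with_overlap(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap  # how far the window advances each time
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + size] for start in range(0, last_start, step)]


pieces = chunk_with_overlap("abcdefghij", size=4, overlap=2)
print(pieces)
```

The overlap means a sentence cut at a window boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice. Choosing `size` and `overlap` for a specific knowledge base is the hard part the post refers to.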

Santiago

40,441 views • 1 year ago

I’m excited to announce a new course with DeepLearning.AI - Building Agentic RAG 💫 In this course, you’ll learn how to build a research assistant that can reason over multiple documents and answer complex questions. You’ll also learn how to step through the execution of the agent and steer it with human feedback. This represents a big step beyond any standard RAG pipeline, which is mostly good for simple questions over a small set of documents. Learn the layers first and then put them together: ✅ Routing ✅ Tool Use ✅ Multi-step reasoning with Memory ✅ Tool retrieval ✅ Debugging + user input Check it out!

Jerry Liu

76,293 views • 2 years ago

An underrated issue with document parsing for RAG / agent use cases is dealing with multi-page tables - sometimes a big table spills over into multiple pages. This breaks chunking algorithms that generally operate at the page-level or smaller, and causes LLMs to lose the full view of the data. With LlamaParse Continuous Mode (in beta), you can now parse a document with multi-page tables and join them into a single table! This means you can now: 💡 Do contiguous chunking for RAG use cases OR 💡 Parse the table for text-to-SQL Check out our blog post highlighting this feature. Huge shoutout to Pierre-Loic Doulcet and Sacha Bron : Signup here: It's in beta, let us know your feedback!
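To make the multi-page table problem concrete, here is a naive sketch of joining table fragments. It assumes each page fragment repeats the same header row, which real documents often do not, and it is not how LlamaParse works internally; the function name and sample data are hypothetical.

```python
Table = list[list[str]]  # first row is the header, remaining rows are data


def join_table_fragments(fragments: list[Table]) -> list[Table]:
    """Merge consecutive fragments whose header row matches the previous table."""
    tables: list[Table] = []
    for fragment in fragments:
        if not fragment:
            continue
        header, rows = fragment[0], fragment[1:]
        if tables and tables[-1][0] == header:
            tables[-1].extend(rows)          # continuation: drop repeated header
        else:
            tables.append([header, *rows])   # a new table starts here
    return tables


page_1 = [["Year", "Debt"], ["2022", "10"]]
page_2 = [["Year", "Debt"], ["2023", "12"]]
joined = join_table_fragments([page_1, page_2])
print(joined)
```

Once the fragments are joined, the table can be chunked as one contiguous unit or loaded into a database for text-to-SQL, the two uses named in the post.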

Jerry Liu

24,245 views • 1 year ago

This will retire 90% of RAG systems with dignity (and a sad song playlist). Powered by DSPy: If you're still building "text in, text out" chatbots that only perform blind vector and text searches, you're not gonna make it! My team just dropped Elysia, and it's not just an incremental successor to Verba… It's a whole rethink of how we interact with our data using AI.

What is Elysia? An open-source platform for building agentic RAG architectures. It learns from your preferences, intelligently categorizes, labels, and searches through your data, and provides complete transparency into its decision-making process.

The long & exciting feature list:
• Transparent Decision-Tree Agents: Elysia’s core is a customizable decision tree, and it visualizes its entire reasoning process, showing you why it chooses a specific tool or path. It enables advanced error handling, self-healing from failed queries, and prevents infinite loops. You can also add custom tools and branches to build complex, state-aware workflows.
• Data Awareness: Before it even attempts a query, Elysia performs a full analysis of your data collections. This eliminates the blind search problem plaguing most RAG systems and allows for far more complex and accurate query generation.
• Dynamic Data Displays: Your RAG pipeline shouldn't be limited to text, right? That’s why Elysia analyzes each query's results and chooses the best way to display them, from tables and charts to product cards and GitHub tickets. It also features a comprehensive data explorer with search, sorting, and filtering capabilities.
• Hyper-Personalization via Feedback: It uses your positively-rated queries as few-shot examples to improve future responses. This allows you to use smaller, faster models that perform like larger ones over time, cutting costs without sacrificing quality for most use cases.
• Chunk-On-Demand: Elysia chunks documents at query time. It performs initial searches on document-level vectors and only chunks relevant documents on the fly, storing them in a parallel quantized collection with cross references for future use.

The Stack: Elysia is built from scratch on Weaviate, using its native features like named vectors, a variety of search types, filters, cross references, quantization, etc. It uses DSPy for LLM interactions and is delivered as a production-ready application via FastAPI, serving a NextJS frontend as static HTML.

Also available as a Python package via pip: pip install elysia-ai
Type: elysia start
Connect your Weaviate cluster and go explore what’s possible.
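The chunk-on-demand idea (search at document level first, chunk only the matching documents, and keep those chunks for later) can be sketched as below. This is a toy illustration and not Elysia's code or API: the class name is hypothetical, word overlap stands in for vector search, and a plain dictionary stands in for the parallel collection.

```python
class ChunkOnDemandIndex:
    """Toy index that chunks a document only when a query first selects it."""

    def __init__(self, documents: dict[str, str], chunk_size: int = 8):
        self.documents = documents
        self.chunk_size = chunk_size                  # words per chunk
        self.chunk_cache: dict[str, list[str]] = {}   # stands in for the stored chunk collection

    @staticmethod
    def _score(query: str, text: str) -> int:
        """Word-overlap score; a real system would compare embeddings."""
        return len(set(query.lower().split()) & set(text.lower().split()))

    def _chunks(self, doc_id: str) -> list[str]:
        if doc_id not in self.chunk_cache:            # chunk lazily, then reuse
            words = self.documents[doc_id].split()
            self.chunk_cache[doc_id] = [
                " ".join(words[i:i + self.chunk_size])
                for i in range(0, len(words), self.chunk_size)
            ]
        return self.chunk_cache[doc_id]

    def search(self, query: str, top_docs: int = 1) -> str:
        # Step 1: rank whole documents. Step 2: chunk only the best ones.
        ranked = sorted(self.documents, key=lambda d: -self._score(query, self.documents[d]))
        candidates = [chunk for d in ranked[:top_docs] for chunk in self._chunks(d)]
        return max(candidates, key=lambda chunk: self._score(query, chunk))


index = ChunkOnDemandIndex({
    "pets": "cats sleep a lot and dogs like long walks in the park every morning",
    "db": "vector databases store embeddings and support similarity search over many documents",
})
best = index.search("vector databases")
print(best)
print(sorted(index.chunk_cache))
```

Only the document that matched the query has been chunked after the search, which is the cost saving the feature description points at.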

Philip Vollet

93,598 views • 11 months ago

The web was never meant to be flattened into text. Yet most web RAG systems start by parsing HTML --- a complex and lossy process. 🔥 Introducing PixelRAG: the first RAG system that retrieves and reads 30M+ web pages as pixels. Instead of extracting text, PixelRAG retrieves screenshots and lets a VLM read them directly. PixelRAG not only preserves visual information, but also outperforms text-based RAG on text-only QA benchmarks by +18.1%. Why? (1) HTML-to-text conversion often discards layout, structure, tables, and other useful signals. (2) We continued pretraining a VLM on web page screenshots and turned it into a surprisingly strong visual retriever. (3) Recent VLMs are remarkably good at understanding web pages, often with better accuracy and token efficiency than text-only pipelines. Takeaway: HTML parsing may be one of the biggest self-inflicted bottlenecks in web RAG. Demo below 👇 Code: Paper: Playground:

Yichuan Wang

89,492 views • 1 month ago

Researchers built a new RAG approach that: - does not need a vector DB. - does not embed data. - involves no chunking. - performs no similarity search. And it hit 98.7% accuracy on a financial benchmark (SOTA). Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it. PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?" That's a fundamentally different approach with: - No arbitrary chunking that breaks context. - No vector DB infrastructure to maintain. - Traceable retrieval to see exactly why it chose a specific section. - The ability to see in-document references ("see Table 5.3") the way a human would. But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings. Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. 
For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis. Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!
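To make the tree-traversal idea concrete, here is a minimal sketch under stated assumptions. It is not PageIndex's implementation or API: the `Section` dataclass, `keyword_chooser`, and `retrieve` are hypothetical names, and the keyword-overlap chooser is only a stand-in for the LLM reasoning step the post describes. What it does show is retrieval as a walk down a table-of-contents-like tree, with the visited path kept for traceability.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """One node of a hypothetical document tree (an 'intelligent table of contents')."""
    title: str
    summary: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_chooser(query, candidates):
    """Stand-in for the reasoning step: pick the child whose title and summary
    share the most words with the query. A real system would ask an LLM here."""
    q = tokens(query)
    return max(range(len(candidates)),
               key=lambda i: len(q & tokens(candidates[i].title + " " + candidates[i].summary)))


def retrieve(root, query, choose=keyword_chooser):
    """Walk from the root to a leaf, recording the path taken."""
    node, path = root, [root.title]
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node.title)
    return node, path


report = Section("Annual Report", "full report", children=[
    Section("Business Overview", "products, markets and strategy"),
    Section("Financial Statements", "balance sheet, income, debt and liabilities", children=[
        Section("Income Statement", "revenue and expenses"),
        Section("Appendix G: Debt Schedule", "debt maturities and trends by year",
                text="(debt schedule text would go here)"),
    ]),
])

leaf, path = retrieve(report, "What were the debt trends in 2023?")
print(" > ".join(path))
```

Because `choose` is a parameter, the keyword heuristic can be swapped for a model call without touching the traversal, and the returned `path` is the record of why a section was selected.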

Avi Chawla

972,565 views • 6 months ago

I've been playing with a feature for my prose editor that I call "ghost cut". Y'all know how to cut and paste. You hit ctrl+x, the text disappears and goes into the clipboard. Then you move the caret and hit ctrl+v to paste it into a new location. You've done that a bazillion times, I bet. An annoyance I have with this is that as soon as you cut, the text reflows, and you have to reorient yourself in the "page". The solution I'm playing with is to ghost the text that was just cut. It only disappears from view once you paste it. I'm kinda liking it so far. But there is more testing to be done. Does anyone know of any precedent for this? I've not seen this in any editor or word processor.
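One way the "ghost cut" behaviour could be modelled is sketched below. This is an assumption about the mechanics, not the author's editor code: `GhostCutBuffer` and its methods are hypothetical, offsets are plain string indices, and the ghosted text is shown in square brackets instead of being dimmed.

```python
class GhostCutBuffer:
    """Hypothetical text buffer where cut text stays in place until pasted."""

    def __init__(self, text):
        self.text = text
        self.clipboard = ""
        self.ghost = None  # (start, end) of the cut-but-still-visible range

    def cut(self, start, end):
        # Copy to the clipboard but leave the text where it is, so nothing reflows.
        self.clipboard = self.text[start:end]
        self.ghost = (start, end)

    def render(self):
        # Mark the ghosted range; a real editor would dim it instead.
        if self.ghost is None:
            return self.text
        s, e = self.ghost
        return self.text[:s] + "[" + self.text[s:e] + "]" + self.text[e:]

    def paste(self, pos):
        if self.ghost is not None:
            s, e = self.ghost
            if s < pos < e:
                raise ValueError("cannot paste inside the ghosted range")
            # Only now does the cut text actually leave the buffer.
            self.text = self.text[:s] + self.text[e:]
            if pos >= e:
                pos -= e - s  # shift the target left by the removed length
            self.ghost = None
        self.text = self.text[:pos] + self.clipboard + self.text[pos:]


buf = GhostCutBuffer("A B C ")
buf.cut(0, 2)
print(buf.render())  # the cut text is still visible, only marked
buf.paste(6)
print(buf.text)
```

The only subtle step is the offset adjustment in `paste`: a target position after the ghosted range has to shift left once the ghost is removed.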

Will McGugan

20,180 views • 18 days ago

✨ I made my first video game with ChatGPT: 1) ChatGPT generates a text-based adventure game with DALL-E 3 generating images for it 2) Every time you play, the game is different because it generates the story and images live 3) The images from DALL-E are sent to Runway, which turns images into video 4) The text is sent to ElevenLabs, which turns the text adventure into a pirate narrator voice 5) It's merged into a video 6) Interactive buttons are overlaid The game is called: 🐒🏝️🇳🇱The Secret of Monkey Island: Amsterdam (unofficial) And you can play it here: (video + TTS + buttons don't work automatically yet, for now it's manual, but text + img works, I'm building an interface for it now)

@levelsio

2,724,987 views • 2 years ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by , experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. 
You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:
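Of the retrieval methods listed above, Reciprocal Rank Fusion is compact enough to sketch in a few lines. The sketch uses the standard formulation, in which each document scores the sum of 1 / (k + rank) over every ranking it appears in. The function name, the example document IDs, and the commonly used default k = 60 are illustrative choices here, not taken from the course material.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. Only rank positions are used, never raw scores,
    so lists from BM25 and from semantic search can be combined directly.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results from two retrievers for the same query.
bm25_ranking = ["d1", "d2", "d3"]
semantic_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents that both retrievers rank well ("d1", "d3") rise above documents only one retriever found, without any score normalization between the two systems.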

Andrew Ng

124,625 views • 1 year ago

Building with AI gets easier every day. Here is an open-source library that makes integrating AI into an application extremely easy: Star the repository! This library alone can make React the best front-end framework out there! There are a bunch of cool things I like about CopilotKit. Here are 3 of them: 1. It allows you to take any -powered agent and bring it into your application. (This is a brand-new feature!) 2. You can build an AI-powered chatbot in your application. The chatbot will have access to your context and can act on the application. 3. You can build a RAG workflow to process and answer questions from a real-time knowledge base. I recorded a video to show you how simple it is to make some of this happen. A few lines of code, and you are in business. Here is a link to the sample application: CopilotKit is open-source. You can self-host it. You can use it with any LLM. Thanks to the team for showing me their tool and collaborating with me on this post!

Santiago

108,824 views • 2 years ago

We built a neat tool that lets you convert a directory of PowerPoint files into clean, structured markdown that Claude Code / agent SDK / any generalized agent wrapper can easily understand. The pptx skill in Claude Code is quite basic and doesn’t have high-fidelity understanding of graphics/charts/tables. Our project Surreal Slides uses LlamaParse to convert presentations into clean structured data that you can put into a db (SurrealDB) for simple retrieval, without having to take screenshots of the data on the fly. Thanks to Clelia Bertelli (🦙/acc) for this project, check it out:

Jerry Liu

39,872 views • 4 months ago

I asked Garry Tan how to use meta prompting to get better at AI: "My partners at YC Jared Friedman and Pete Koomen showed me how to do this. You can take almost anything that you do all the time and just drop it into a context window. And then say, “Here’s a bunch of inputs and outputs.” And maybe you also add a bunch of notes. And then you tell it, “Write me a prompt that can act as an agent that takes this input and makes this output over here.” You can do this for almost any type of knowledge work. And you can even introspect. “What are things you notice that I did to convert this from the input to the output?” And then you can just start using the prompt. Initially, it’s going to suck. Because it’s just not that smart yet. But what’s funny is now, I also use it to iterate my writing. You can be very direct, "I would never say that", "Don’t say it like this", or "Oh, you used the long word there, use the short word". Just speak to it conversationally. And then when you're happy with the output, you can use that new output to make a new prompt. "Based on this conversation, give me a better initial prompt that incorporates all the things we talked about." And you can do this with literally everything. And in theory, there’s so much it applies to that people do day-to-day. You could use it for tweets. You could use it for editing podcasts. You can use it for pretty much everything. I have a folder of prompts that I use all the time. My YouTube prompt is on v27 or something. I'll go through this process with all the different max models. I'll use GPT 5.2 Pro. I’ll use Grok. I'll use Claude. Then, I’ll take all the outputs from all the models and put them into Claude and say "Here’s my prompt, here’s the output from four LLMs, including yourself. Rate each response and tell me what the pros and cons of each approach are." And I usually say "give it to me in numbered form". And then you can agree with one, disagree with two, tell it three is this or that. And then after that, you say given all of this, synthesize it."

The Peel

51,632 views • 5 months ago

New course: Document AI: From OCR to Agentic Doc Extraction, built with LandingAI, where I'm executive chairman, and taught by David Park and Andrea Kropp. Much of the world's data is locked in PDFs, JPEGs, and other documents. This short course shows you how to build agentic workflows that process documents accurately: breaking them into parts, examining each piece carefully, and extracting information through multiple iterations. Traditional Optical Character Recognition (OCR) captures text but loses context from table headers, chart captions, or the reading order of columns. After exploring OCR's limitations, you’ll use LandingAI's Agentic Document Extraction (ADE) framework to process documents. ADE treats pages visually, as images, to parse information and extract fields. Skills you'll gain: - Build agents to convert unstructured files into structured Markdown/HTML and JSON - Use ADE to parse complex data like forms, handwriting, or equations - Map extracted information to named fields using a specified schema, with bounding boxes for grounding and validation - Deploy RAG applications with event-driven document processing Come learn about the best tools for intelligently processing documents like financial invoices, medical records, or academic papers:

Andrew Ng

200,141 views • 6 months ago