Vision RAG with a vector database is all you need. It uses a vision language model to embed PDF pages directly as vectors, without the tedious chunking process. 100% Opensource code.

Shubham Saboo

116,899 subscribers

106,727 views • 1 year ago • via X (Twitter)

Education, Science and Technology
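The post doesn't show the retrieval code. The sketch below is a rough illustration only. It assumes a ColPali-style model that turns each PDF page image into several patch vectors and the query into several token vectors, and it ranks pages with late-interaction (MaxSim) scoring. The vectors are made-up toy numbers, not model output. `maxsim_score` and `rank_pages` are hypothetical helper names, not part of the linked repo.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction (MaxSim) score: for every query token vector, keep its
    best match among the page's patch vectors, then add those maxima up."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page IDs ordered from most to least relevant."""
    return sorted(pages, key=lambda pid: maxsim_score(query_vecs, pages[pid]), reverse=True)


# Toy 2-D vectors standing in for real model output (hypothetical values).
pages = {
    "page_1": [[1.0, 0.0], [0.0, 0.2]],
    "page_2": [[0.1, 0.9], [0.7, 0.7]],
}
query = [[0.0, 1.0], [1.0, 0.0]]
print(rank_pages(query, pages))  # ['page_2', 'page_1']
```

Because each page keeps one vector per patch, this approach needs no text chunking. The cost is that it stores many vectors per page instead of one per chunk.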

11 comments

Shubham Saboo • 1 year ago

GitHub repo: 50+ step-by-step tutorials for LLM apps with AI agents and RAG. P.S. Don't forget to subscribe for FREE to access future tutorials.

Shubham Saboo • 1 year ago

Find all the awesome LLM apps with AI agents and RAG in the following GitHub repo. P.S. Don't forget to star the repo to show your support 🌟

Shubham Saboo • 1 year ago

If you find this useful, RT to share it with your friends. Don't forget to follow me @Saboo_Shubham_ for daily tips and tutorials on LLMs, RAG and AI Agents.

PDF GPT • 1 year ago

This is my favorite AI tool for reviewing reports. Just upload a report, ask for a summary, and get one in seconds. It's like ChatGPT, but built for documents. Try it for free.

Gargi • 1 year ago

would love a tutorial on this

Shubham Saboo • 1 year ago

On it!

Kairos Data Labs • 1 year ago

This is amazing. Thanks for sharing!

Shubham Saboo • 1 year ago

You’re welcome!

Jason • 1 year ago

That’s an amazing share. Thank you brother

Shubham Saboo • 1 year ago

You’re welcome!

SAMO • 1 year ago

Thank you for this. It could solve a use case I have: very large PDF docs that need semantic search.

Related videos

Voice RAG Agent using OpenAI Agents SDK and Qdrant as the vector database. 100% Opensource Code.

Shubham Saboo

28,219 views • 1 year ago

Build a Vision RAG app with Gemini 2.5 Flash and Cohere Multimodal Embedding that can understand images and diagrams in PDF. 100% Opensource code with step-by-step tutorial.

Shubham Saboo

60,638 views • 1 year ago

Build AI Agents with RAG without writing a single line of Python Code. 100% Opensource and Nocode.

Unwind AI

70,756 views • 1 year ago

Build a Local RAG Reasoning Agent with Qwen 3, using a Vector Database with Web Search for fallback. You can also choose Gemma 3 or DeepSeek R1. 100% free, local, and Opensource.

Unwind AI

52,006 views • 1 year ago
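The Qwen 3 post above describes a vector database with web search as a fallback. Below is a minimal sketch of that routing decision. It assumes the agent falls back whenever no stored passage clears a similarity threshold. The 0.75 default, the function names and the stub retrievers are all hypothetical, and the actual project may decide differently (for example, by letting the model judge relevance).

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: a vector search returning (passage, similarity) pairs
# and a web search returning plain-text snippets.
VectorSearch = Callable[[str], List[Tuple[str, float]]]
WebSearch = Callable[[str], List[str]]


def retrieve_with_fallback(
    question: str,
    vector_search: VectorSearch,
    web_search: WebSearch,
    threshold: float = 0.75,
) -> Tuple[str, List[str]]:
    """Use vector-DB passages that clear the similarity threshold;
    if none do, fall back to web search. Returns (source, passages)."""
    relevant = [text for text, score in vector_search(question) if score >= threshold]
    if relevant:
        return "vector_db", relevant
    return "web", web_search(question)


# Stub retrievers with made-up scores, for illustration only.
def fake_vector_search(question: str) -> List[Tuple[str, float]]:
    return [("Passage A from the local docs", 0.82), ("Passage B", 0.40)]


def fake_web_search(question: str) -> List[str]:
    return [f"web snippet for: {question}"]


print(retrieve_with_fallback("What is in the docs?", fake_vector_search, fake_web_search))
# ('vector_db', ['Passage A from the local docs'])
```

With the default threshold, only Passage A is returned. Raising `threshold` to 0.9 makes every stored passage fail the check, and the sketch routes the question to web search instead.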

Today, we're introducing our document parser built specifically for RAG. The parser combines the best vision, OCR, and vision language models to deliver unmatched accuracy. Try it for free today—the first 500+ pages are on us! 🧵 1/

Douwe Kiela

1,308,495 views • 1 year ago

I present to you my AI automation that converts your Google Drive folder into a RAG vector database. It also updates the RAG vector database whenever a file is added to or deleted from the Google Drive folder. You can download the workflow for free (link in comment)

Victor

260,273 views • 1 year ago
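The Google Drive workflow above isn't shown. The core of "update the vector database when files are added or deleted" can be expressed as a diff between two snapshots. This hypothetical Python sketch covers the diffing step only and is not the author's workflow. It assumes you can list each file's ID and last-modified time both from Drive and from the vector database's metadata. It also treats a changed timestamp as an update, which goes slightly beyond what the post describes. Fetching files, re-embedding and deleting vectors are left out.

```python
from typing import Dict, List, Tuple


def plan_sync(
    drive_files: Dict[str, str], indexed_files: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare file_id -> last-modified maps and return (to_upsert, to_delete).

    to_upsert: files that are new in Drive or changed since they were indexed.
    to_delete: files still in the vector DB but no longer in the Drive folder.
    """
    to_upsert = sorted(
        fid for fid, modified in drive_files.items() if indexed_files.get(fid) != modified
    )
    to_delete = sorted(fid for fid in indexed_files if fid not in drive_files)
    return to_upsert, to_delete


# Hypothetical snapshots: file IDs mapped to modification timestamps.
drive = {"a": "2024-05-01", "b": "2024-05-03", "c": "2024-05-04"}
indexed = {"a": "2024-05-01", "b": "2024-05-02", "d": "2024-04-30"}
print(plan_sync(drive, indexed))  # (['b', 'c'], ['d'])
```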

I just built a RAG Agent with web search using Cohere's ⌘R 7B model. 100% Opensource Code with step-by-step tutorial.

Shubham Saboo

47,032 views • 1 year ago

⭐ The first foundation model available on LeRobot ⭐ Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior. It was trained by Physical Intelligence and ported to PyTorch by Pablo Montalvo 👇🧵

Remi Cadene

130,941 views • 1 year ago

Traditional chunking can lose context between chunks. (Let's explore a better way!)

Enter late chunking… Here's how it works:

Traditional chunking
• Split the text into chunks
• Embed each chunk separately

Late chunking
• Embed the entire text first
• Split the resulting token embeddings into chunks and pool each chunk

Advantages of late chunking
• Maintains connections between segments
• Reduces the need for complex chunking strategies
• Cost-effective: roughly the same cost as regular chunking methods

Late chunking is a promising alternative to both naive chunking and multi-vector approaches like ColBERT. It's particularly useful when documents are long and context needs to be retained across many pages of text at retrieval time.

Want to learn more?
• Blog post:
• Notebook:

Special thanks to Daniel Williams for his invaluable collaboration on this one! 🔥

Femke Plantinga

19,718 views • 1 year ago
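The late-chunking idea in the post above reduces to a pooling step. This minimal sketch assumes two inputs you already have: per-token embeddings for the whole text, from a long-context embedding model (not shown), and the token span of each chunk. `late_chunk` and the toy vectors are hypothetical. The naive alternative would instead embed each chunk's text on its own.

```python
from typing import List, Tuple


def late_chunk(
    token_embeddings: List[List[float]], spans: List[Tuple[int, int]]
) -> List[List[float]]:
    """Mean-pool contextual token embeddings over each [start, end) token span.

    token_embeddings are assumed to come from ONE pass of a long-context
    embedding model over the whole document, so each token vector already
    reflects that context.
    """
    chunk_vectors = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span {(start, end)}")
        window = token_embeddings[start:end]
        dim = len(window[0])
        chunk_vectors.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return chunk_vectors


# Toy 2-D token vectors (hypothetical), split into two chunks of two tokens each.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(late_chunk(tokens, [(0, 2), (2, 4)]))  # [[2.0, 0.0], [0.0, 3.0]]
```

Each chunk still yields one vector, so storage and search cost stay the same as with regular chunking. Only the order of embedding and splitting changes.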

Planning with Reasoning using Vision Language World Model

AK

26,274 views • 9 months ago

Vision-based (ColPali) RAG is becoming popular, so we built a platform to compare:
- Simple OCR RAG
- VisionRAG
- ColPali
- Hybrid ColPali
🚀 Introducing VARAG – the Vision-First RAG Engine (Vision Augmented Retrieval and Generation).

Adithya S K

95,370 views • 1 year ago

StarVector is out on Hugging Face. StarVector is a foundation model for generating Scalable Vector Graphics (SVG) code from images and text. It uses a Vision-Language Modeling architecture to understand both visual and textual inputs, enabling high-quality vectorization and text-guided SVG creation.

AK

254,259 views • 1 year ago

As part of the ongoing Vision-Language-Autonomy project, CMU researchers have developed a general navigation platform with an onboard autonomy stack to support Vision-Language-Navigation. It functions in simulation and on real robots 🤖💥 Check it out!

CMU Robotics Institute

23,706 views • 1 year ago

Verba is an open-source Retrieval Augmented Generation (RAG) application that performs RAG on your own data. To showcase its capabilities, we've customized it as an Airbnb chatbot using Airbnb’s customer documentation.

How it works:
• Ask any question related to your booking, policies, or anything else about your Airbnb experience.
• Get relevant, human-like responses: Verba provides natural and informative answers.
• Access original sources: one of the standout features of RAG is its ability to directly indicate the sources it used to generate each response.

Under the hood, Verba uses a RAG pipeline to deliver these results. Your query is transformed into a numerical representation (vector) and used to search our vector database for the most similar context using hybrid search. The most relevant context is then combined with your original question and fed into a large language model (LLM), which uses all of that information to generate a conversational response. Et voilà! 💫

Try Verba:
Verba on GitHub:
Learn more in our video:

Femke Plantinga

120,565 views • 1 year ago
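Verba's own code isn't shown here. The sketch below is a rough illustration of the hybrid-search and prompt-assembly steps the Verba post describes. It min-max normalizes the vector and keyword scores and blends them with a weight `alpha`. This is similar in spirit to the `alpha` setting in Weaviate's hybrid search, but it is not that implementation. All names, scores and the prompt wording are hypothetical.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so vector and keyword scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    vector_scores: Dict[str, float], keyword_scores: Dict[str, float], alpha: float = 0.5
) -> List[str]:
    """alpha=1.0 -> pure vector search, alpha=0.0 -> pure keyword search.
    A document missing from one result list gets 0 for that side."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    fused = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def build_prompt(question: str, contexts: List[str]) -> str:
    """Combine retrieved context with the original question for the LLM."""
    context_block = "\n\n".join(contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


# Made-up scores: cosine similarities and BM25-style keyword scores.
vector_scores = {"refunds": 0.91, "checkin": 0.55, "pets": 0.30}
keyword_scores = {"refunds": 2.0, "pets": 7.5}
ranking = hybrid_rank(vector_scores, keyword_scores, alpha=0.7)
print(ranking)  # ['refunds', 'pets', 'checkin']
print(build_prompt("Can I get a refund?", [f"(text of document '{d}')" for d in ranking[:2]]))
```

Normalizing first matters because cosine similarities and keyword scores live on different scales. Without it, the keyword side would dominate the blend.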

I built a multimodal AI coding agent team with multiple agents. It has 3 AI agents working together as a team to generate and execute the code:
1. Coding Agent using o3-mini
2. Vision Agent using Gemini
3. Code Execution Agent using o3-mini and E2B
100% Opensource Code.

Shubham Saboo

42,269 views • 1 year ago

Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

AI at Meta

94,330 views • 1 year ago

everybody talks about building AI chatbots, but nobody tells you HOW to do it

that's why I made a full practical walkthrough on how to build an AI chatbot that's hooked up to your own custom knowledge base

inside the walkthrough I go over:
- data collection: gathering all relevant documents, conversations, and info
- preprocessing: cleaning up and formatting the collected data
- chunking: breaking the cleaned data down into smaller, manageable pieces
- embedding & storing in a vector database
- RAG & chatbot integration: using RAG to let the chatbot retrieve relevant information from the vector database based on a user's question

reply to this tweet w/ the word “RAG” & I’ll send it to you (must be following so I can DM)

Tyler

83,489 views • 1 year ago
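Here is one way the chunking step from the walkthrough list above could look. This minimal sketch uses whitespace-separated words as a stand-in for real tokenizer tokens. `chunk_words` and its default sizes are hypothetical, and the walkthrough itself may use a different strategy.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share
    `overlap` words, so text near a boundary appears in both neighbouring chunks."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reached the end
            break
    return chunks


print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
# ['a b c', 'c d e', 'e f g']
```

Each chunk would then be embedded and stored in the vector database. The overlap trades a little extra storage for fewer facts being split across a chunk boundary.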

Announcing moondream 0.5B, the world's smallest vision language model.

vik

58,888 views • 1 year ago

Traditional chunking: cheap but dumb. ColBERT: smart but expensive. 𝗟𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴: the solution we've been waiting for.

Here’s a quick evolution of chunking strategies:

→ 𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the basics we all started with)
• Token Chunking - split by token count
• Sentence Chunking - split by sentence boundaries
• Document-Based Chunking - split by sections/paragraphs

→ 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (when things got sophisticated)
• Semantic Chunking - split by meaning
• LLM-Based Chunking - let the model decide

But every one of these methods cuts the text at fixed points before embedding, so context is lost from one chunk to the next.

→ 𝗘𝗻𝘁𝗲𝗿 𝗟𝗮𝘁𝗲 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the game changer)
Traditional approach: chunk first → embed each chunk separately
Late chunking approach: embed the entire document → then chunk with context preserved

𝗪𝗵𝘆 𝗰𝗵𝗼𝗼𝘀𝗲 𝗹𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴?
When you chunk first, each piece loses its contextual relationship to the rest of the document. It's like reading a book by randomly picking paragraphs: you miss the flow.

With late chunking, every chunk maintains awareness of its neighbors because the embedding happens at the document level first. Mean pooling is applied to each segment's token embeddings AFTER the full context has been embedded.

Jina AI tested this and saw significant improvements in retrieval quality: chunks that were previously disconnected now keep their semantic relationships.

As documents get longer and context windows expand, late chunking might just become the new standard for high-quality retrieval systems.

𝗪𝗵𝗮𝘁 𝗱𝗼 𝘆𝗼𝘂 𝗻𝗲𝗲𝗱 𝘁𝗼 𝗺𝗮𝗸𝗲 𝘁𝗵𝗶𝘀 𝘄𝗼𝗿𝗸?
The rest of your retrieval pipeline stays unchanged. You need:
1. A long-context embedding model (8192+ tokens)
2. Chunking logic that tracks token spans
3. Fewer than 30 lines of code to implement

In short: switch the order in which you chunk and embed. Embed FIRST, then chunk, not the other way around.

Dive deeper into late chunking:

Femke Plantinga

125,283 views • 10 months ago
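Requirement 2 in the late-chunking post above, "chunking logic that tracks token spans", is the only new piece of bookkeeping. This minimal sketch assumes you chunk by sentence at the character level and know each token's character offsets. Real tokenizers can report these offsets; for example, Hugging Face fast tokenizers do so via `return_offsets_mapping=True`. The regex tokenizer and sentence splitter here are toy stand-ins, and all function names are hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) offsets


def regex_token_offsets(text: str) -> List[Span]:
    """Toy tokenizer: words and single punctuation marks, with character offsets."""
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]


def sentence_char_spans(text: str) -> List[Span]:
    """Toy sentence splitter: runs of text ending in '.', '!' or '?'."""
    return [m.span() for m in re.finditer(r"[^.!?]+[.!?]", text)]


def char_spans_to_token_spans(token_offsets: List[Span], char_spans: List[Span]) -> List[Span]:
    """Map each character span to the [start, end) range of token indices
    whose character offsets lie entirely inside it."""
    token_spans = []
    for c_start, c_end in char_spans:
        inside = [i for i, (t_start, t_end) in enumerate(token_offsets)
                  if t_start >= c_start and t_end <= c_end]
        if not inside:
            raise ValueError(f"no tokens inside character span {(c_start, c_end)}")
        token_spans.append((inside[0], inside[-1] + 1))
    return token_spans


text = "Cats purr. Dogs bark."
print(char_spans_to_token_spans(regex_token_offsets(text), sentence_char_spans(text)))
# [(0, 3), (3, 6)]
```

The resulting token spans are the ranges that a late-chunking step mean-pools over, after the whole text has been embedded once.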

Just built a better & nicer Bank Statement Converter ($50K MRR) using Rocket + PDF Vector in just 15 minutes:
1️⃣ Use Rocket to build the frontend and integrate it with the PDF Vector API
2️⃣ Use the PDF Vector API to process the documents (PDF, Word, or images) & convert them to CSV/Excel

Minh-Phuc Tran

30,185 views • 7 months ago