Web scraping will never be the same. (100% open-source visual search at scale) PixelRAG is a retrieval system that skips HTML parsing completely. Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off...

796,224 views • 2 days ago • via X (Twitter)

Related Videos

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to see in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis.

Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!

Avi Chawla

971,622 views • 5 months ago
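
The tree-based retrieval described in the PageIndex post above can be illustrated with a small sketch. This is not PageIndex's code or API: the `Node` class, the `retrieve` function, and the sample tree are hypothetical names and made-up data. In particular, `keyword_chooser` is only a toy stand-in for the step where a language model would reason about which section to open next.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One entry in a hypothetical document tree (a section or subsection)."""
    title: str
    summary: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def _words(s: str) -> set:
    # Lowercase alphanumeric tokens, punctuation stripped.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def keyword_chooser(query: str, candidates: List[Node]) -> int:
    """Toy stand-in for the reasoning step: return the index of the child
    whose title and summary share the most words with the query.
    A real system would ask a language model to pick the section instead."""
    q = _words(query)
    scores = [len(q & _words(c.title + " " + c.summary)) for c in candidates]
    return scores.index(max(scores))


def retrieve(
    root: Node,
    query: str,
    choose: Callable[[str, List[Node]], int] = keyword_chooser,
) -> Tuple[Node, List[str]]:
    """Walk from the root to a leaf, recording the path so the
    retrieval decision stays traceable."""
    node, path = root, [root.title]
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node.title)
    return node, path


# Made-up example tree, for illustration only.
report = Node("Annual Report", "full filing", children=[
    Node("Business Overview", "products markets strategy"),
    Node("Financial Statements", "balance sheet debt income", children=[
        Node("Income Statement", "revenue and net income"),
        Node("Notes on Debt", "long-term debt maturities 2023",
             text="(section text would go here)"),
    ]),
])

leaf, path = retrieve(report, "What were the debt trends in 2023?")
print(" > ".join(path))
```

The returned path is what makes the retrieval traceable: it records each section that was opened on the way to the answer. Swapping `keyword_chooser` for a model call would leave the traversal loop unchanged.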

PDF parsing is still painful because LLMs reorder text in complex layouts, break tables across pages, and fail on graphs or images.

💡 I'm testing the new open-source OCRFlux model, and the results are really good for a change.

OCRFlux is a multimodal, LLM-based toolkit for converting PDFs and images into clean, readable, plain Markdown text. Because the underlying VLM has only 3B parameters, it runs even on a 3090 GPU. The model is available on Hugging Face.

The engine that powers OCRFlux teaches the model to rebuild every page and then stitch fragments across pages into one clean Markdown file. It bundles one vision-language model with 3B parameters that was fine-tuned from Qwen 2.5-VL-3B-Instruct for both page parsing and cross-page merging.

OCRFlux reads raw page images and, guided by task prompts, outputs Markdown for each page and merges split elements across pages. The evaluation shows Edit Distance Similarity (EDS) 0.967 and cross-page table Tree Edit Distance 0.950, so the parser is both accurate and layout aware.

How it works while parsing each page:
- Converts the page into text with a natural reading order, even in the presence of multi-column layouts, figures, and insets
- Supports complicated tables and equations
- Automatically removes headers and footers

Cross-page table/paragraph merging:
- Cross-page table merging
- Cross-page paragraph merging

A compact vision-language model can beat bigger models once cross-page context is added.

🧵 1/n Read on 👇

Rohan Paul

149,264 views • 11 months ago

Claude Code + computer use is f*cking cracked 🤯

Build a landing page → Claude opens Chrome, looks at it, spots every issue, and fixes it, without you describing a single thing. All inside Claude Code.

Perfect for DTC brands and agencies who are still vibe-coding landing pages and advertorials in Claude Code, then manually opening them in Chrome, spotting 15 things wrong, and describing every visual issue back to Claude one at a time.

If you're building pages in Claude Code and your workflow looks like this: build the page, open it in Chrome, spot broken spacing, go back to Claude, type "the CTA button is too low and the hero image is cut off," wait for the fix, open Chrome again, find 3 new issues, describe those too ...

Claude Code + computer use eliminates the entire loop:
→ Claude writes the full landing page or advertorial
→ Opens Chrome and navigates to it
→ Spots layout issues, broken spacing, off-brand colors, missing elements
→ Fixes everything and re-checks until the page looks right
→ Tests your Shopify product pages by clicking through like a real customer
→ Walks through your checkout flow and flags friction before customers hit it
→ You only see the finished, visually verified result

No describing what you see on screen. No "the CTA button needs more contrast" back-and-forth. No being the eyeballs for an AI that can't see.

What you get:
→ Landing pages and advertorials Claude builds AND visually QAs before you ever look at them
→ Product pages Claude clicks through, testing layout, images, and CTAs like a real user
→ HTML dashboards Claude opens and verifies the charts actually render
→ Checkout flows Claude walks through step by step to catch friction
→ All of it happening in one session: build, test, fix, done

One prompt. Claude builds it, checks it, and fixes it. You just review the finished page.

I put together a full playbook with the exact setup, the prompts, and 5 DTC workflows that use Claude Code + computer use. Want it for free?
> Like this post
> Comment "CLAUDE"
And I'll send it over (must be following so I can DM)

Mike Futia

19,000 views • 2 months ago

Figma Capture + Claude Fable 5 = clone any competitor's landing page in minutes 🤯

Figma just dropped a Chrome extension that copy/pastes any live website into Figma as editable layers. Point Claude Code (running the new Fable 5 model) at the capture, and it rebuilds the whole page in YOUR brand: structure, copy, design system, photography. All inside Claude Code.

Perfect for DTC brands and agencies who keep losing weeks rebuilding proven landers from scratch.

If you're still cloning competitor pages the old way:
- You're screenshotting their site and praying the AI guesses the structure right.
- You're rebuilding sections by hand because the fonts and spacing never come out true.
- You're paying a designer $3K to recreate a page that already exists.

This workflow eliminates the entire loop:
→ Capture the competitor's landing page with Figma's new Chrome extension: editable layers, not screenshots
→ Claude Code reads the page structure through the Figma MCP: every section, every headline, in order
→ It rebuilds the page in YOUR brand: your colors, your fonts, your voice (it even respects your banned-words list)
→ GPT Image 2 generates on-brand product photography from the layer names
→ Claude places every image automatically: finished page, ready to ship

No screenshots. No guessing at structure. No design invoice.

What you get:
→ A pixel-structured clone of any proven lander, rebuilt in your brand
→ Every line of copy rewritten in your voice
→ 16 on-brand images generated and placed for ~$2 in API costs
→ Start to finish in about 15 minutes

Built 100% with Figma + Claude Code on Claude Fable 5.

I put together a step-by-step playbook showing you exactly how to set it up. Want access for free?
> Like this post
> Comment "CLONE"
And I'll send it over (must be following so I can DM)

Mike Futia

45,133 views • 9 days ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them:

1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data.
2. The connector ecosystem to load data from unstructured data sources is very immature.
3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index.

The goal of a RAG Pipeline is to solve these problems.

The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications.

At a high level, there are four different stages in the architecture of a RAG pipeline:

1. Ingestion: Where the pipeline loads the information from the data source.
2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside it.
3. Transform: Where the pipeline chunks the data and generates document embeddings.
4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings.

There are different rabbit holes at each one of these stages. Here are three of them:

1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes.
2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references.
3. Implementing a basic contiguous chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it.

In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems.

I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval.

If you have a few documents lying around, set up a free account and give it a try.

Santiago

40,441 views • 1 year ago
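
The "basic chunking strategy with an overlap" that the post above calls the easy part of the Transform stage might look like the following minimal sketch. It assumes whitespace-separated words as the unit of length; the function name `chunk_with_overlap` and its parameters are illustrative and do not come from Vectorize or any other library.

```python
from typing import List


def chunk_with_overlap(words: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split a word list into consecutive chunks of at most `size` words,
    where each chunk repeats the last `overlap` words of the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = "a b c d e f g h i j".split()
for chunk in chunk_with_overlap(sample, size=4, overlap=1):
    print(" ".join(chunk))
```

With `size=4` and `overlap=1`, the ten sample words produce three chunks, each starting with the last word of the previous one. Choosing `size` and `overlap` for a specific knowledge base is the part the post describes as hard, and this sketch does not address it.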

I just built a plugin with Claude Fable 5 that turns Claude Code into a $5,000/mo SEO consultant 🤯

9 skills, one plugin: it connects straight to your Search Console + GA4 data, finds the wins, ships the fixes, and renders a live SEO dashboard that looks like a $200/mo SaaS product. All inside Claude Code.

Perfect for DTC brands and agencies sitting on months of Search Console data nobody has time to read.

Right now, you probably can't answer:
- Which keywords are sitting on page 2, one title tag away from page 1
- Which pages are bleeding traffic to redirect chains and broken canonicals
- Which blog posts rank for commercial terms but never link to a product page

This plugin answers all of it from your live data, then ships the fixes:
→ Finds your page-2 keywords and ships the fix: new title, headings, content, paste-ready
→ Clusters every query into a hub-and-spoke content map with the gaps flagged
→ Drafts posts from your actual search data, not guesses
→ Writes dev tickets for redirect chains and slow pages, ranked by traffic at risk
→ Builds the internal links between your blog and your money pages
→ Flags toxic backlinks and ranks outreach targets
→ Drops a Monday report with 3 priorities before the client even asks
→ Renders it all as a one-file HTML dashboard with a 0-100 SEO health score

No dashboard staring. No CSV archaeology. No $5K/mo retainer for a PDF.

What you get:
→ Page-2 keywords moved to page 1
→ A content calendar that fills itself from data
→ Dev tickets that write themselves
→ A live SEO dashboard on command

Built 100% in Claude Code with Claude Fable 5.

I put the entire build into a step-by-step Playbook: all 8 workflow prompts (including the dashboard), how to turn them into a plugin, and the full Google setup (including the 2 landmines Google doesn't tell you about). Want access for free?
> Like this post
> Comment "SEO"
And I'll send it over (must be following so I can DM)

Mike Futia

78,315 views • 10 days ago