Loading video...

Video Failed to Load

Go Home

33,989 views • 10 months ago •via X (Twitter)

0 Comments

No comments available

Comments from the original post will appear here

Related Videos

Web scraping will never be the same. (100% open-source visual search at scale) PixelRAG is a retrieval system that skips HTML parsing completely. Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off the pixels. Why that matters: parsing is where web RAG quietly loses information. - A single HTML-to-text parser can drop 40%+ of a page. - Tables, charts, and layout get flattened or thrown out. - Swapping parsers alone can move accuracy ~10 points on the same docs. PixelRAG indexes the page a person actually sees. The team built a visual index of all of Wikipedia, 30M+ screenshots, and it still beats the strongest text RAG baseline by 18.1% on text-only QA. The repo also ships a Claude Code plugin that gives Claude eyes. It lets Claude screenshot any URL and read the rendered page instead of scraping the DOM. So you can hand it a live page, an arXiv paper, or your local site and ask what it actually looks like. One setup script. No MCP server, no backend. How the pipeline works: - Renders each document (web, PDF, image) to image tiles. - Embeds them with Qwen3-VL-Embedding, LoRA fine-tuned on screenshots. - Builds a FAISS index and serves a search API. A stronger reader model lifts accuracy with no re-indexing, since the index is just pixels. Everything is open-source under Apache-2.0. GitHub repo: Talking about RAG, I recently wrote an article on a new approach that makes retrieval much more efficient by cutting corpus size by 40x, reducing tokens per query by 3x, and improving vector search relevance by 2.3x. The article is quoted below.

Akshay 🚀

794,194 views • 1 day ago