
Jerry Liu
@jerryjliu0 • 75,193 subscribers
Parsing the world's hardest PDFs @llama_index. cofounder/CEO Careers: https://t.co/EUnMNmbCtx Enterprise: https://t.co/Ht5jwxSrQB

Parse PDFs at lightspeed (this video is at 1x) Absolute cinema
Jerry Liu • 125,463 views • 5 days ago

Introducing LiteParse - the best model-free document parsing tool for AI agents 💫 ✅ It’s completely open-source and free. ✅ No GPU required; it processes ~500 pages in 2 seconds on commodity hardware. ✅ More accurate than PyPDF, PyMuPDF, Markdown. Also way more readable - see below for how we parse tables!! ✅ Supports 50+ file formats, from PDFs to Office docs to images. ✅ Designed to plug and play with Claude Code, OpenClaw, and any other AI agent with a one-line skills install. Supports native screenshotting capabilities. We spent years building up LlamaParse by orchestrating state-of-the-art VLMs over the most complex documents. Along the way we realized that you could get quite far on most docs through fast and cheap text parsing. Take a look at the video below. For really complex tables within PDFs, we output them in a spatial grid that’s both AI- and human-interpretable. Other free/lightweight parsers like PyPDF will destroy the representation of this table and output a sequential list. This is not a replacement for a VLM-based OCR tool (it requires 0 GPUs and doesn’t use models), but it is shocking how well it parses most documents. Huge shoutout to Logan Markewich and Clelia Bertelli (🦙/acc) for all the work here. Come check it out: Repo:
Jerry Liu • 255,674 views • 2 months ago
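The spatial-grid table output described in the LiteParse post can be illustrated with a small sketch. This is not LiteParse's implementation; it assumes each text run already comes with an (x, y) position in PDF points, and the `TextItem` class, the `render_spatial_grid` function, and the `char_width`/`line_height` defaults are all hypothetical. Snapping each run onto a character grid keeps table columns aligned instead of flattening them into a sequential list.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    x: float     # left edge of the text run (assumed PDF points)
    y: float     # top edge of the text run (assumed PDF points)
    text: str


def render_spatial_grid(items, char_width=6.0, line_height=12.0):
    """Snap text items onto a character grid so table columns stay aligned."""
    rows = {}
    for item in items:
        row = round(item.y / line_height)
        col = round(item.x / char_width)
        rows.setdefault(row, []).append((col, item.text))

    lines = []
    for row in sorted(rows):
        line = ""
        for col, text in sorted(rows[row]):
            if len(line) < col:
                # pad with spaces up to the item's target column
                line += " " * (col - len(line))
            elif line:
                # column already occupied: keep at least one space between items
                line += " "
            line += text
        lines.append(line.rstrip())
    return "\n".join(lines)
```

With two header cells at x=0 and x=60 and two value cells below them, both columns start at character 10 in the output, so the table stays readable as plain text.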

We’re open-sourcing the first document OCR benchmark for the agentic era, ParseBench. Document parsing is the foundation of every AI agent that works with real-world files. ParseBench is a benchmark that measures parsing quality specifically for agent knowledge work: ✅ It optimizes for semantic correctness (instead of exact similarity). ✅ It has the most comprehensive distribution of real-world enterprise documents. It contains ~2,000 human-verified enterprise document pages with 167,000+ test rules across the five dimensions that matter most: tables, charts, content faithfulness, semantic formatting, and visual grounding. We benchmarked 14 known document parsers on ParseBench, from frontier/OSS VLMs to specialized parsers to LlamaParse. Here are some of our findings: 💡 Increasing compute budget yields diminishing returns - Gemini/gpt-5-mini/haiku gain 3-5 points from minimal to high thinking, at 4x the cost. 💡 Charts are the most polarizing dimension for evaluation. Most specialized parsers score below 6%, while some VLM-based parsers do a bit better. 💡 VLMs are great at visual understanding but terrible at layout extraction. GPT-5-mini/haiku score below 10% on our visual grounding task, while all specialized parsers do much better. 💡 No method crushes all 5 dimensions at once, but LlamaParse achieves the highest overall score at 84.9%, and is the leader in 4 out of the 5 dimensions. This is by far the deepest technical work that we’ve published as a company. I would encourage you to start with our blog and explore our links to Hugging Face and GitHub. All the details are in our full 35-page (!!) ArXiv whitepaper. 🌐: Blog: 📄 Paper: 💻 Code: 📊 Dataset: 🎥 YouTube:
Jerry Liu • 107,433 views • 1 month ago
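ParseBench scores parsers with test rules that check semantic correctness rather than exact similarity. The sketch below is a hypothetical illustration of that idea, not ParseBench's actual rule format or code: the `contains` and `order` rule types, `check_rule`, and `score` are all assumptions. Normalizing whitespace and case lets a parse pass a rule even when its formatting differs from the reference.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so formatting differences don't fail a rule."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_rule(parsed, rule):
    """Evaluate one hypothetical rule against the parser's text output."""
    doc = normalize(parsed)
    kind = rule["type"]
    if kind == "contains":
        # the expected content must appear somewhere in the output
        return normalize(rule["text"]) in doc
    if kind == "order":
        # reading order: 'before' must appear earlier than 'after'
        a = doc.find(normalize(rule["before"]))
        b = doc.find(normalize(rule["after"]))
        return a != -1 and b != -1 and a < b
    raise ValueError(f"unknown rule type: {kind}")


def score(parsed, rules):
    """Fraction of rules that pass."""
    if not rules:
        return 0.0
    return sum(check_rule(parsed, r) for r in rules) / len(rules)
```

An exact-match metric would penalize the extra spaces and line break in a parse like `"Revenue   grew\n to $5M."`; the rule-based check does not.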

LiteParse is the best model-free, open-source document parser for AI agents. It now gets a first-class landing page on our website 💫 Our company mission is building the world's best agentic document processing platform, and LiteParse is the central pillar behind our OSS efforts. It's blazing fast (and getting faster soon!), supports 50+ file formats, and is one-shot installable as an agent skill. Webpage: Come check it out:
Jerry Liu • 69,844 views • 1 month ago

I built a Claude Code skill that lets it generate a deep research report over any collection of complex docs (PDFs, Word, Pptx)… and generate word-level citations and bounding boxes directly back to the source! 📝 Check out “/research-docs”. 1. It parses out text and bounding boxes from every doc with liteparse, in seconds. 2. It then generates a full HTML report of the outputs that lets you see word-level citations on each page. Raw Claude obviously has deep research capabilities, but it lacks an audit trail back to the source. This skill gives you a researched report that can be audited by others. Check it out: LiteParse:
Jerry Liu • 76,329 views • 1 month ago
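Word-level citations come down to mapping a quoted phrase back to the bounding boxes of the words it came from. A minimal sketch, assuming the parser already emits one record per word with a page number and box coordinates; the `Word` dataclass and the `cite` function are hypothetical, not the `/research-docs` skill's code. It matches whole words case-insensitively, so punctuation attached to a word would need extra normalization.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    page: int
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def cite(words, phrase):
    """Return (page, bbox) for the first run of words matching phrase, else None."""
    target = phrase.lower().split()
    n = len(target)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        same_page = len({w.page for w in window}) == 1
        if same_page and [w.text.lower() for w in window] == target:
            # union of the matched words' boxes = the highlight rectangle
            bbox = (min(w.x0 for w in window), min(w.y0 for w in window),
                    max(w.x1 for w in window), max(w.y1 for w in window))
            return window[0].page, bbox
    return None
```

The returned page and rectangle are what an HTML report would need to draw a highlight over the source page.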

Claude Cowork is great at reading markdown files, and not great at reading PDFs 📑 I made a simple command that lets you batch parse PDFs -> markdown in an output folder, so you can just point Cowork to it. It lets the agent understand massive amounts of docs much more quickly and accurately (far fewer hallucinations over complex visuals, tables, etc.). In this example, after I parsed 100+ Supreme Court filings into markdown, Cowork is able to return the answer in seconds. My wrapper is a simple extension of our very own semtools. Very DIY, will clean this up:
Jerry Liu • 65,057 views • 4 months ago
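The batch PDF-to-markdown step in the Cowork post can be sketched as a loop over an input folder. The parser is passed in as `parse_fn` because this sketch deliberately does not assume semtools' or LiteParse's interface; `batch_to_markdown` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Callable, List


def batch_to_markdown(input_dir, output_dir, parse_fn: Callable[[Path], str],
                      pattern: str = "*.pdf") -> List[Path]:
    """Parse every file matching pattern in input_dir and write <stem>.md to output_dir."""
    src = Path(input_dir)
    dst = Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob(pattern)):
        out = dst / (path.stem + ".md")
        # parse_fn is whatever document parser you plug in (assumed to return markdown text)
        out.write_text(parse_fn(path), encoding="utf-8")
        written.append(out)
    return written
```

The agent is then pointed at `output_dir`, where every document is a plain markdown file it can read directly.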

Introducing NotebookLlama - an open-source version of NotebookLM! 📓🦙 NotebookLlama is a full implementation of NotebookLM that includes all the capabilities that make it so great for researchers+business users: ✅ Create a knowledge repository of documents. It likely has higher accuracy than NotebookLM since it’s using LlamaCloud under the hood for high-quality parsing/extraction over complex docs. ✅ Generate summaries and knowledge graph mind-maps 🤯 ✅ Generate podcasts thanks to ElevenLabs 🗣️ ✅ Agentic chat with docs and view metrics with OpenTelemetry. This all lives within an open-source repo so you can clone/modify at will to swap in your own components! Huge shoutout to Clelia Bertelli (🦙/acc) for leading this. Repo: LlamaCloud helps power the parsing/ingestion. You can always use your own stuff too, but in the meantime you can check out LlamaCloud here:
Jerry Liu • 137,708 views • 11 months ago

Tutorial: Automating KYC with AI agents 🪪🕵 I’m creating a new tutorial series on automating practical document workflows with agents. Every financial institution needs to perform KYC (know your customer) to verify a customer’s identity, and this involves manually sifting through IDs, bank statements, etc. and doing the cross-checking by hand. This is a great first use case for agentic document workflows: 1. Extract identification information from the user-supplied ID (license, passport). 2. Extract fields from utility bills/bank statements and then use LLMs to cross-validate extracted fields with the extracted ID fields. It obviously doesn’t cover the full e2e process and uses publicly available online data, but should be a good reference guide to get started. To make this work well, you do need high-quality document extraction with confidence scores and citations! Check out the tutorial: If you’re interested, come check out LlamaParse:
Jerry Liu • 29,887 views • 1 month ago
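Step 2 of the KYC workflow cross-validates fields from the ID against fields extracted from other documents. The tutorial uses LLMs for this; the sketch below swaps in plain fuzzy string matching with Python's `difflib.SequenceMatcher` to show the shape of the check. The field names, the `cross_validate` function, and the 0.9 threshold are assumptions.

```python
import re
from difflib import SequenceMatcher


def _norm(value):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    value = re.sub(r"[^\w\s]", " ", str(value).lower())
    return " ".join(value.split())


def cross_validate(id_fields, doc_fields, threshold=0.9):
    """Compare fields present in both sources; return {field: (matches, similarity)}."""
    report = {}
    for field in sorted(id_fields.keys() & doc_fields.keys()):
        ratio = SequenceMatcher(None, _norm(id_fields[field]), _norm(doc_fields[field])).ratio()
        report[field] = (ratio >= threshold, round(ratio, 3))
    return report
```

Fields that exist in only one source (e.g. a date of birth that a bank statement never shows) are skipped rather than flagged; a real workflow would also carry the extractor's confidence scores and citations into the report.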

Claude Code over Excel++ 🤖📊 Claude already 'works' over Excel, but in a naive manner - it writes raw python/openpyxl to analyze an Excel sheet cell-by-cell and generally lacks a semantic understanding of the content. Basically the coding abstractions used are too low-level to have the coding agent accurately do more sophisticated analysis. Our new LlamaSheets API lets you automatically segment and structure complex Excel sheets into well-formatted 2D tables. This both gives Claude Code immediate semantic awareness of the sheet, and allows it to run Pandas/SQL over well-structured dataframes. We've written a guide showing you specifically how to use LlamaSheets with coding agents! Guide: Sign up to LlamaCloud:
Jerry Liu • 75,700 views • 6 months ago
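Segmenting a sheet into separate 2D tables can be approximated by finding connected blocks of non-empty cells. This is a naive heuristic for illustration, not how LlamaSheets works; `find_table_regions` and `extract_table` are hypothetical, and real sheets with blank rows inside a table, merged cells, or headers set apart from their data would defeat it.

```python
from collections import deque


def find_table_regions(grid):
    """Return (top, left, bottom, right) bounds of each connected block of non-empty cells."""
    rows = len(grid)
    cols = max((len(r) for r in grid), default=0)

    def filled(r, c):
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # breadth-first search over 4-connected non-empty neighbours
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen and filled(nr, nc):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append((top, left, bottom, right))
    return regions


def extract_table(grid, region):
    """Slice one region out of the grid as a list of rows."""
    top, left, bottom, right = region
    return [list(grid[r][left:right + 1]) for r in range(top, bottom + 1)]
```

Each extracted region could then be handed to Pandas (first row as header) so the agent queries a dataframe instead of walking cells one at a time.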

We built a neat tool that lets you convert a directory of PowerPoint files into clean, structured markdown that Claude Code / agent SDK / any generalized agent wrapper can easily understand. The pptx skill in Claude Code is quite basic and doesn’t have high-fidelity understanding of graphics/charts/tables. Our project Surreal Slides uses LlamaParse to convert presentations into clean structured data that you can put into a db (SurrealDB) for simple retrieval, without having to take screenshots of the data on the fly. Thanks to Clelia Bertelli (🦙/acc) for this project, check it out:
Jerry Liu • 39,872 views • 2 months ago

Parse text from any PDF in seconds and give it to Claude Code 📑🤖 LiteParse is our open-source, model-free document parser that lets you extract text from any document in seconds. This is especially useful for coding agents, which are great at reading plaintext files but terrible at reading traditional document formats (PDF, Office docs). We have a one-line installable skill that lets you plug LiteParse into Claude Code and 40+ other agents. Repo is here:
Jerry Liu • 29,746 views • 2 months ago

Document OCR benchmarks are still an open problem. Existing document OCR benchmarks are either too narrowly focused on a specific type (e.g. FinTabNet, ChartQA), or built on documents that aren’t reflective of real-world tasks (e.g. OmniDocBench, OlmOCR-bench over academic papers). ParseBench is a step towards solving this problem. * It tries to comprehensively cover real-world document distributions within the enterprise. * It contains comprehensive evaluations across 5 different dimensions (tables, charts, content faithfulness, formatting, grounding). * It tries to use metrics that optimize for agent semantic understanding rather than structural similarity. We released this yesterday, and there’s a TON of content: 1. Whitepaper 2. HF dataset 3. GitHub repo 4. Blog 5. Video. And today, we’re excited to launch a home page for ParseBench 💫 come check it out! Take a look at some of our other materials if you’re interested: Blog: Paper:
Jerry Liu • 21,373 views • 1 month ago

Give Claude Code a semantic filesystem 🗃️🛠️ Giving Claude Code access to the right CLI tools over your filesystem turns it into a general agent capable of automating far more knowledge work beyond code - it can do dynamic financial/legal/medical/technical/backoffice analysis over any subset of documents. With our latest release of semtools 💫, you can now manually or *agentically* create a persistent workspace over any subset of files. This gives Claude Code blazing-fast, local semantic search over any data, while still allowing it to chain with commands like grep/cat/etc. so that it can load in dynamic context instead of relying on naive top-k vector search. The coding agent can dynamically index data and reuse those indexes, instead of having to rebuild them every time. So you get the benefits of fast search along with agentic reasoning over the CLI tools mentioned above. Come check it out!
Jerry Liu • 77,373 views • 8 months ago
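The benefit of a persistent workspace is that an index survives across sessions and is only refreshed for files that changed. A minimal sketch of that idea, assuming a JSON file as the index store and a bag-of-words cosine score as a stand-in for real semantic embeddings; `Workspace`, `add`, and `search` are hypothetical and not semtools' interface.

```python
import json
import math
import re
from collections import Counter
from pathlib import Path


def _vector(text):
    """Term-frequency vector over lowercase alphanumeric tokens (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Workspace:
    """Index persisted to a JSON file; a file is re-indexed only when its mtime changes."""

    def __init__(self, index_path):
        self.index_path = Path(index_path)
        if self.index_path.exists():
            self.entries = json.loads(self.index_path.read_text(encoding="utf-8"))
        else:
            self.entries = {}

    def add(self, path):
        path = Path(path)
        key = str(path.resolve())
        mtime = path.stat().st_mtime
        if self.entries.get(key, {}).get("mtime") != mtime:
            text = path.read_text(encoding="utf-8")
            self.entries[key] = {"mtime": mtime, "vector": dict(_vector(text))}
            # persist immediately so a later session can reuse the index
            self.index_path.write_text(json.dumps(self.entries), encoding="utf-8")

    def search(self, query, top_k=3):
        """Return the top_k (score, path) pairs by cosine similarity to the query."""
        q = _vector(query)
        scored = [(_cosine(q, Counter(e["vector"])), key) for key, e in self.entries.items()]
        return sorted(scored, reverse=True)[:top_k]
```

Because `search` returns file paths, an agent can pass the hits straight to `cat` or `grep` to load the exact context it needs.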

Building “RAG 2.0” is just running Claude Code over your filesystem 🤖🗂️ To make this work well, you need to solve three things: 1️⃣ Virtualize your filesystem to prevent the agent from messing stuff up. AgentFS by Turso is a nice example of how you can give the agent access to a copy of all your files without messing up your raw data. 2️⃣ Parse unstructured documents like PDFs, pptx, Word into an LLM-ready format. Agentic OCR solutions like LlamaParse can help here. 3️⃣ Create an agentic loop with human-in-the-loop. If you want to control the agent implementation instead of using Claude Code out of the box, you can use LlamaIndex 🦙 workflows to help orchestrate these long-running agent tasks. Shoutout Clelia Bertelli (🦙/acc), check it out! Blog: Repo:
Jerry Liu • 55,620 views • 5 months ago
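Point 1️⃣ of the RAG 2.0 post (virtualizing the filesystem) can be illustrated with a copy-on-write overlay: reads fall through to the real directory, while the agent's writes and deletes stay in a separate layer. This is a toy in-memory sketch, not AgentFS; `OverlayFS` and its methods are hypothetical.

```python
from pathlib import Path


class OverlayFS:
    """Reads fall through to a base directory that is never modified; changes stay in memory."""

    def __init__(self, base_dir):
        self.base = Path(base_dir)
        self.writes = {}      # relative path -> new content
        self.deleted = set()  # relative paths the agent removed

    def read(self, rel):
        if rel in self.deleted:
            raise FileNotFoundError(rel)
        if rel in self.writes:
            return self.writes[rel]
        return (self.base / rel).read_text(encoding="utf-8")

    def write(self, rel, content):
        self.deleted.discard(rel)
        self.writes[rel] = content

    def delete(self, rel):
        self.writes.pop(rel, None)
        self.deleted.add(rel)

    def pending_changes(self):
        """Summarize what the agent changed, e.g. for human review before applying."""
        return {"written": sorted(self.writes), "deleted": sorted(self.deleted)}
```

Keeping changes in a separate layer also fits point 3️⃣: a human can inspect `pending_changes()` before anything touches the raw data.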

I built a form-filling agent that anyone can use 💫 This is an extremely simple but useful (I hope) app. Upload a fillable form 📋, some context files, and chat with the agent to fill the form out automatically ✍️ 1️⃣ Yes it is a Claude Code SDK wrapper 2️⃣ It is better and faster than ChatGPT/Claude UI out of the box 3️⃣ We use LlamaParse to parse the context files, so you can have more trust that we are able to read context without hallucinations (e.g. messy scanned handwriting, drivers license photo, and more). This was one of my holiday Claude Code vibe-coding projects. Built with Opus 4.5, and also powered by Opus 4.5. Feeling the AGI 🫡 App is here: Repo is open-source:
Jerry Liu • 48,342 views • 4 months ago

I made an AI agent that can fill out complicated forms from unstructured context 📝 For instance: automatically fill out your expense report 💳 by dragging and dropping 5-10 receipt pictures/scans 🧾 Uses Claude Agent SDK + LlamaParse to parse unstructured docs + custom tools for form understanding. It now semantically understands each field, handles multi-turn conversations, and lets you drag up to 10 files. Deployed on Vercel/Render. App: Repo: LlamaCloud:
Jerry Liu • 34,714 views • 4 months ago

Build a loan underwriting agent workflow in ~5 mins 💵🤖 (No code, just English) I built an agentic workflow that can process incoming PDFs of financial statements: checking statements, bank statements, brokerage accounts, and extract/normalize the financials in each source of data. I did this by specifying the schemas and classification logic in plain English; our agent compiles this into a deterministic workflow with calls to agentic parsing to extract out all the information that you need. Our agent builder is now in LlamaCloud, you can sign up and try it yourself! Blog: Sign up:
Jerry Liu • 30,015 views • 4 months ago
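The extract/normalize step of the loan underwriting workflow can be pictured as mapping each document type's field labels onto one shared schema and parsing the amounts. In the workflow described above, an agent compiles this from plain-English instructions; the hand-written `FIELD_MAP`, `parse_amount`, and `normalize` below are hypothetical stand-ins for that compiled logic.

```python
import re
from decimal import Decimal

# Hypothetical mapping from each document type's labels to one shared schema.
FIELD_MAP = {
    "checking_statement": {"Ending Balance": "ending_balance", "Total Deposits": "total_inflows"},
    "brokerage_account": {"Total Account Value": "ending_balance", "Contributions": "total_inflows"},
}


def parse_amount(raw):
    """Parse '$1,234.50' -> Decimal('1234.50'); '(200.00)' or '-200.00' -> negative."""
    s = raw.strip()
    negative = (s.startswith("(") and s.endswith(")")) or s.startswith("-")
    digits = re.sub(r"[^0-9.]", "", s)
    value = Decimal(digits)  # Decimal avoids float rounding on currency
    return -value if negative else value


def normalize(doc_type, extracted):
    """Keep only mapped fields, renamed to the shared schema, with parsed amounts."""
    mapping = FIELD_MAP[doc_type]
    return {mapping[label]: parse_amount(raw) for label, raw in extracted.items() if label in mapping}
```

Once every statement type lands in the same schema, comparing or summing figures across checking, bank, and brokerage accounts is a plain dictionary operation.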

I used Claude Code Skills + souped-up PDF parsing to create an M&A deal comp agent 🤖 Given a directory of public M&A filings (DEF 14A), it parses and analyzes each pdf, and generates an Excel sheet with deal terms and comparables. 1️⃣ The native parsing uses pypdf, which sucks, so I swapped it out with LlamaIndex semtools. This uses LlamaCloud for parsing - letting it handle deeply complex financial tables, charts, and anything else you throw at it. 2️⃣ Claude has native skills to write Excel spreadsheets! Full prompt is in the video below 👇 LlamaCloud + Semtools access also takes < 5 mins to set up, come check it out! Some notes: - disclaimer: Some of the values in the Excel spreadsheet are 100x bigger because they’re formatted as percentages, not raw values - We’re working on native Claude skill integrations too! Semtools: LlamaCloud:
Jerry Liu • 46,702 views • 7 months ago

We built LobsterX 🦞, an OpenClaw🦞 specialized for document work on your computer. It uses high-accuracy document parsing, extraction, classification through LlamaCloud, meaning it can comb through complicated PDFs (with scans, tables, diagrams) and extract out 100% accurate context! It can run as a Telegram bot and is built on top of agentfs (Turso) as a file system. Big shoutout to Clelia Bertelli (🦙/acc). This is a fun project inspired by OpenClaw🦞’s success, and besides being a fun tool to use, it can be a great reference for building your own generalized coding agents! Readme: LlamaCloud:
Jerry Liu • 27,071 views • 3 months ago