
LlamaIndex 🦙
@llama_index • 115,303 subscribers
The world's best AI Document OCR LlamaParse: https://t.co/yQGTiRSfFL Docs: https://t.co/us6GCS14vD
Videos

We've spent years building LlamaParse into the most accurate document parser for production AI. Along the way, we learned a lot about what fast, lightweight parsing actually looks like under the hood. Today, we're open-sourcing a lightweight core of that tech as LiteParse 🦙 It's a CLI + TS-native library for layout-aware text parsing from PDFs, Office docs, and images. Local, zero Python dependencies, and built specifically for agents and LLM pipelines. Think of it as our way of giving the community a solid starting point for document parsing:
npm i -g @llamaindex/liteparse
lit parse anything.pdf
- preserves spatial layout (columns, tables, alignment)
- built-in local OCR, or bring your own server
- screenshots for multimodal LLMs
- handles PDFs, Office docs, images
Blog: Repo:
LlamaIndex 🦙 580,651 views • 2 months ago
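
For pipelines driven from a script, the `lit parse anything.pdf` command quoted above could be wrapped as follows. This is only a minimal sketch: it uses nothing beyond the invocation shown in the post, and it assumes (unconfirmed by the post) that the parsed text is written to standard output. The helper names `build_command` and `parse_document` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(document: Path) -> List[str]:
    """Build the `lit parse <file>` invocation quoted in the post."""
    return ["lit", "parse", str(document)]


def parse_document(document: Path) -> Optional[str]:
    """Run the CLI if it is on PATH and return whatever it prints.

    Assumption: the parsed text goes to stdout. Returns None when the
    `lit` executable is not installed, so the sketch stays runnable.
    """
    if shutil.which("lit") is None:
        return None
    result = subprocess.run(
        build_command(document),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    text = parse_document(Path("anything.pdf"))
    print("lit is not installed" if text is None else text[:500])
```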

LiteParse hit 4.3K+ GitHub stars in a few weeks. Today it officially joins the LlamaIndex ecosystem, with its own page. ~500 pages in 2 sec. 50+ formats. Zero cloud dependency. Already powering agents in Claude Code, Cursor, and production pipelines. In a few days our head of OSS, Logan Markewich, is hosting a live workshop: build a fintech due diligence agent with LiteParse.
LlamaIndex 🦙 85,279 views • 1 month ago

The team at Google just released the Agents API, a service for building and running custom agents inside a sandboxed Linux environment, and we built a template that gives these agents access to LlamaParse / LiteParse, enabling them to process unstructured documents automatically. Here's how it works:
- Configure a Git repository where data and outputs will be stored
- Clone the repository into the agent sandbox
- Install the LiteParse CLI, the LlamaParse SDK, and agent skills to use both
- Prompt the agent with a task and watch it process documents autonomously
The result? An agent that can work directly with messy, real-world documents using LlamaParse and LiteParse within Google's new agent runtime. Check out the GitHub repository: Get started with LlamaParse:
LlamaIndex 🦙 20,619 views • 15 days ago

Introducing RAGs, a Streamlit app that allows you to create and customize your own RAG agent and then use it over your own data, all with natural language. Directly inspired by OpenAI GPTs, you can converse with an agent to help you do search/retrieval over any data you specify. The app contains three main pages:
- Home Page: Have a "builder agent" build your RAG agent through natural language (you specify the data).
- RAG Config: Look at configured parameters.
- Use your RAG agent!
Check out details below. Blog: Repo:
LlamaIndex 🦙 475,732 views • 2 years ago

LlamaParse now has an official Agent Skill you can use across 40+ agents. With built-in instructions for parsing complex documents, including different formats, tables, charts, and images, your agents gain access to deeper document understanding, not just raw text extraction. Watch the demo. Read the docs: Get started with LlamaCloud:
LlamaIndex 🦙 51,845 views • 2 months ago

Let's talk parsing tables. Two days ago we launched ParseBench, the first document OCR benchmark built for AI agents. This deep dive breaks down TableRecordMatch (GTRM), our metric for evaluating complex tables the way your pipeline actually consumes them: as records keyed by column headers.
LlamaIndex 🦙 25,999 views • 1 month ago
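
To make "records keyed by column headers" concrete, here is a deliberately simplified sketch of that idea. It is not the ParseBench implementation of TableRecordMatch: the functions `table_to_records` and `record_match_score` and the exact-match scoring rule are assumptions made purely for illustration.

```python
def table_to_records(header, rows):
    """Turn a header row plus data rows into dicts keyed by column header."""
    return [dict(zip(header, row)) for row in rows]


def record_match_score(predicted, expected):
    """Fraction of expected records found exactly in the prediction.

    Illustrative stand-in only: each predicted record can satisfy at
    most one expected record, and matching is exact equality.
    """
    if not expected:
        return 1.0
    remaining = list(predicted)
    hits = 0
    for record in expected:
        if record in remaining:
            remaining.remove(record)
            hits += 1
    return hits / len(expected)


expected = table_to_records(["Quarter", "Revenue"], [["Q1", "10"], ["Q2", "12"]])

# Same data with the columns in a different order: the records are identical.
reordered = table_to_records(["Revenue", "Quarter"], [["10", "Q1"], ["12", "Q2"]])

# One cell misread: only one of the two records survives.
misread = table_to_records(["Quarter", "Revenue"], [["Q1", "10"], ["Q2", "21"]])

print(record_match_score(reordered, expected))  # 1.0
print(record_match_score(misread, expected))    # 0.5
```

Comparing records rather than raw cell grids means a column reordering costs nothing, while a single wrong value invalidates the whole record a downstream pipeline would have consumed.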

Parsing documents with AI agents just got a lot more seamless. We've rebuilt the LlamaParse MCP server to handle your document processing workflows, and you can connect it today to any MCP-compatible client. Once connected, you'll be able to:
- Parse documents into clean markdown
- Classify files against your own categories
- Split long documents into labelled sections
- Upload files via URL or a browser-based upload flow
Building a production MCP server surfaced some non-obvious challenges: getting auth to align with an existing platform identity system using WorkOS, working around MCP's lack of built-in file upload support, and making deployments, rate limiting and observability feel native with Vercel and Axiom. We wrote up all of it, from the OAuth flow, to the token-based upload design, to the tradeoffs we hit along the way. Read the full blog: GitHub repository:
LlamaIndex 🦙 19,601 views • 1 month ago

Built a vibe-coded presentation app generator that turns natural language into polished slides. This project by Jerry Liu combines the Claude Agent SDK with LlamaParse to create an AI-powered presentation tool that handles everything from content creation to PDF export:
- Chat-based slide creation: just describe what you want and watch slides appear
- Real-time editing through natural conversation: refine slides by chatting with the AI
- Smart document parsing with LlamaParse for incorporating reference materials
- Full export functionality to PowerPoint and PDF formats
Perfect example of how LlamaParse makes document processing seamless while Claude's conversational abilities create an intuitive slide editing experience. Check out the full project:
LlamaIndex 🦙 50,184 views • 4 months ago

Check out the form-filling agent that automates PDF forms using AI by Jerry Liu. Use any fillable PDF with an agent that fills it out based on your prompts and context files. Our new experiment creates a multi-turn chat experience for form completion.
- Upload fillable PDFs and automatically detect form fields using PyMuPDF
- Add custom prompts and context files (parsed via LlamaParse) to guide the AI
- Multi-turn conversations let you refine and correct form entries after initial completion
- Download your completed forms when done
The agent uses simple tools to list, set, get, and validate form fields. You can chat with it to make corrections and adjustments until your form is perfect. Check out the code on GitHub: Or the deployed app here:
LlamaIndex 🦙 55,131 views • 4 months ago

The team at Google DeepMind just released Gemini Embedding 2, a frontier embeddings model with 3072 dimensions and state-of-the-art semantic quality. We built a demo showing how to integrate it across the LlamaIndex ecosystem, from LlamaParse to LlamaAgents: audio-kb, a knowledge base for your audio notes. With audio-kb, you can:
- Upload an MP3 or record directly from your terminal
- LlamaParse extracts the transcript from the audio
- Gemini Embedding 2 generates embeddings
- Metadata + vectors are stored in SurrealDB and indexed with HNSW
Once ingested, you can search all your audio notes directly from the terminal. Perfect for turning voice memos, meetings, or lectures into a searchable knowledge base. Full blog: GitHub: Try LlamaParse:
LlamaIndex 🦙 34,765 views • 2 months ago
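
The search step of a pipeline like audio-kb comes down to ranking stored vectors by similarity to a query vector. The sketch below shows that idea with a brute-force cosine-similarity scan in plain Python. It is a stand-in, not the demo's code: the real project uses SurrealDB with an HNSW index and 3072-dimensional Gemini embeddings, while the toy 3-dimensional vectors, note titles, and function names here are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, notes, top_k=2):
    """Return the titles of the top_k notes most similar to the query.

    Brute-force scan over every stored vector; an HNSW index answers the
    same question approximately without visiting every vector.
    """
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in notes.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Toy "embeddings" of three transcripts (hypothetical values).
notes = {
    "standup": [1.0, 0.0, 0.0],
    "lecture": [0.0, 1.0, 0.0],
    "memo": [0.9, 0.1, 0.0],
}

print(search([1.0, 0.0, 0.0], notes))  # ['standup', 'memo']
```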

We're excited to officially launch LlamaParse, the first genAI-native document parsing solution. Not only is it better at parsing out images/tables/charts than virtually every other parser, it is now steerable through natural language instructions: output the document in whatever format you desire! It is also the only parsing solution that seamlessly allows you to build accurate RAG over complex documents, free of hallucinations. We launched it in private preview a few weeks ago and hit 2k users, 1M total PDF pages parsed. And now it's better than ever. LlamaParse contains the following killer features:
- SOTA table/chart extraction
- Seamless integration with LlamaIndex 🦙 advanced RAG/agents
- Natural language Parsing Instructions
- JSON mode and image extraction
- Support for ~10 document types (.pdf, .pptx, .docx, .xml) and more
Our pricing is simple: 1k free pages per day, and additional pages at 0.3c a page, or $3 for 1k pages. If you want advanced document RAG and/or private deployments, come get in touch with us to chat about LlamaCloud. Check out our full blog post here: LlamaParse client repo: Signup at: Come talk to us:
LlamaIndex 🦙 143,087 views • 2 years ago
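
The launch pricing quoted above works out as a simple calculation. The sketch below assumes the free allowance resets each day and is subtracted before any page is billed; the post does not spell that out. It reflects only the launch-time figures in this post, and the function name is hypothetical.

```python
from decimal import Decimal

FREE_PAGES_PER_DAY = 1_000
PRICE_PER_PAGE_USD = Decimal("0.003")  # 0.3 cents per page, i.e. $3 per 1k pages


def daily_cost_usd(pages_parsed: int) -> Decimal:
    """Cost of one day's parsing under the launch pricing in the post.

    Assumption: the first 1,000 pages each day are free and only pages
    beyond that are billed.
    """
    billable = max(0, pages_parsed - FREE_PAGES_PER_DAY)
    return billable * PRICE_PER_PAGE_USD


print(daily_cost_usd(500))    # within the free tier
print(daily_cost_usd(2_000))  # 1,000 billable pages
```

Using `Decimal` rather than floats keeps the per-page price exact, so 1,000 billable pages come to exactly 3 dollars.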

Let's talk parsing charts. Last week we released ParseBench, the first document OCR benchmark for AI agents. New in ParseBench: ChartDataPointMatch. Most document parsers look at a chart and OCR the caption. Agents need the actual numbers. That's the gap between "OCR'd the text around the chart" and "actually read the chart." More about ParseBench, the GitHub code, Hugging Face dataset, and scientific paper:
LlamaIndex 🦙 13,987 views • 1 month ago

Introducing LlamaCloud 🦙 Today we're thrilled to introduce LlamaCloud, a managed service designed to bring production-grade data to your LLM and RAG apps. Spend less time data wrangling and more time on application logic. Launching with the following components:
1. LlamaParse: a proprietary parser designed to be really, really good at complex documents with embedded tables. Build advanced RAG over semi-structured PDFs, and ask questions that simply aren't possible with the naive stack. Available publicly day 1.
2. Managed Ingestion/Retrieval API: An API letting you easily ingest/retrieve data from data sources. Opening up in private beta to select enterprises.
We're excited to be joined by launch users, partners, and collaborators: Mendable POC MongoDB Qdrant NVIDIA + some awesome hackathon projects at the LlamaIndex 🦙 hackathon. Check out our FULL blog post on LlamaCloud and LlamaParse: LlamaParse Client Repo: Signup for a LlamaCloud account to use LlamaParse: Interested in the broader LlamaCloud offering? Come talk to us: Also we have a slick new website:
LlamaIndex 🦙 141,230 views • 2 years ago

We're listening: LlamaSheets is in beta and we want your feedback. Spreadsheets in the wild are messy: merged cells, broken layouts, headers spanning multiple rows. LlamaSheets (now in beta) extracts regions and tables from these files and outputs clean Parquet files you can actually use. What it does:
- Identifies and isolates regions in your spreadsheet
- Extracts them as Parquet files (load directly into pandas/polars/DuckDB)
- Generates cell-level metadata (40+ features: formatting, position, data types)
- Creates titles and descriptions for sheets and regions
Built for the spreadsheets nobody wants to deal with manually. We need your feedback. We're in beta and actively improving based on real-world use cases. Try it out and let us know what works, what doesn't, and what you need. Get started here:
LlamaIndex 🦙 35,405 views • 5 months ago

Today we're excited to feature RAGApp v0.1, which lets any user construct a multi-agent application without writing a single line of code. Add any number of agents that you wish, and assign each agent a role, system prompt, and a set of tools. In this example, use a researcher, analyst, and report generation agent to write a news article. This directly generates a full chat interface where you can ask questions and get back answers with full streaming and sources. Huge shoutout to Marcus Schiesser for working on this! RAGApp: create-llama: If you want finer control, you can define your own agentic workflows through code:
LlamaIndex 🦙 92,941 views • 1 year ago

We're excited to introduce RAGs v2: build, customize, and use multiple ChatGPTs over your data, all with natural language. A huge upgrade vs. the initial launch:
- Easily create multiple RAG pipelines and save them
- Easily swap between and customize each one (e.g. over different data, or w/ different system prompts)
- Delete unused RAG pipelines
- (dev quality) added much-needed linting/CI
Check out the video for details. It's super easy to set up and use. Some additional features:
- Supports a lot of LLMs both for building RAG and within each RAG pipeline
- Supports loading files or web pages
Check out our repo here:
LlamaIndex 🦙 124,174 views • 2 years ago

After the release of Parse v2, Extract is also getting an upgrade: introducing Extract v2! We've been reworking the experience from the ground up to make document extraction more powerful and easier to use than ever. Here's what's new:
- Simplified tiers: we've replaced modes with cleaner, more intuitive tiers. (And stay tuned: agentic plus is coming to Extract too, very soon.)
- Pre-saved extract configurations: load your saved extraction configs directly, so you can skip the setup and get straight to extracting.
- Configurable document parsing: now you can control how your documents get parsed before extraction, giving you more flexibility and better results end to end.
And for those who need a transition period: Extract v1 will remain accessible via the UI under 'Settings → General' for a limited time. Try Extract v2 today.
LlamaIndex 🦙 15,041 views • 2 months ago

Our OSS engineer Clelia Bertelli (🦙/acc) recently built litesearch, a fully local document ingestion and retrieval CLI/TUI application powered by LiteParse. litesearch demonstrates how developers can assemble a high-performance, local-first retrieval pipeline using open tools from across the ecosystem:
- Parsing: LiteParse, the fast and accurate document parser we recently open sourced
- Chunking: Chonkie
- Embeddings: A local Nomic model via Hugging Face transformers.js
- Vector storage: A local Qdrant edge shard (custom-built in Rust and compiled as a native add-on)
- Retrieval: Query stored files with optional path-based filtering and configurable relevance thresholds
- Runtime: Bun for speed and versatility
Check out the repository and try it yourself: LiteParse docs:
LlamaIndex 🦙 14,806 views • 2 months ago
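
The retrieval step listed above (path-based filtering plus a relevance threshold) can be pictured as a post-filter over scored hits. The sketch below is an assumption-laden illustration in Python, not litesearch's own code (which runs on Bun with a Qdrant shard): the `Hit` type, the `filter_hits` function, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass
class Hit:
    path: str     # source file the chunk came from
    score: float  # relevance score, higher is better
    text: str     # the retrieved chunk


def filter_hits(hits: List[Hit], min_score: float = 0.0,
                path_prefix: Optional[str] = None) -> List[Hit]:
    """Keep hits at or above min_score and, optionally, under path_prefix."""
    kept = []
    for hit in hits:
        if hit.score < min_score:
            continue
        # is_relative_to compares whole path components, so "docs2/x.pdf"
        # is not treated as being under "docs". Requires Python 3.9+.
        if path_prefix is not None and not PurePosixPath(hit.path).is_relative_to(path_prefix):
            continue
        kept.append(hit)
    return sorted(kept, key=lambda h: h.score, reverse=True)


hits = [
    Hit("docs/a.pdf", 0.9, "quarterly revenue"),
    Hit("docs/b.pdf", 0.2, "table of contents"),
    Hit("notes/c.pdf", 0.8, "meeting notes"),
    Hit("docs2/d.pdf", 0.7, "appendix"),
]

for hit in filter_hits(hits, min_score=0.5, path_prefix="docs"):
    print(hit.path, hit.score)  # docs/a.pdf 0.9
```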

The Google DeepMind team just added Gemini 3.1 to the Live API, so we built a small demo showing how Gemini voice agents can plug directly into the document processing ecosystem powered by LlamaIndex. In this example, we integrate LiteParse to enable fast, fully-local document parsing. With our TUI-based voice assistant, you can literally talk to your terminal:
- Speak commands
- Trigger live document parsing via tool calls
- Hear the agent read back results in real time
The assistant can extract content from single files or entire folders, leveraging the lightning-fast local parsing that LiteParse provides. Take a look at the demo. GitHub repo. LiteParse docs.
LlamaIndex 🦙 14,766 views • 2 months ago