Announcing: Agentic Document Extraction! PDF files represent information visually - via layout, charts, graphs, etc. - and are more than just text. Unlike traditional OCR and most PDF-to-text approaches, which focus on extracting the text, an agentic approach lets us break a document down into components and reason about...

Andrew Ng

1,600,350 subscribers

689,047 views • 1 year ago • via X (Twitter)

Education Science & Technology

11 Comments

Arnav Jaitly • 1 year ago

This is such a pain point for a lot of companies. Agentic information extraction will be such a productivity and efficiency boost

PDF GPT • 1 year ago

This is my favorite AI tool for reviewing reports. Just upload a report, ask for a summary, and get one in seconds. It's like ChatGPT, but built for documents. Try it for free.

Andrei Hasna • 1 year ago

Well done and well needed.

Vasilis K. • 1 year ago

@JackPNicholls

The AI Agent Architect • 1 year ago

Great use case. Using an agentic approach to this is kinda like getting the LLM to add metadata to what it finds in the PDF. So many things you could do with that. If you want to see the opposite side of that coin - the creation of a structured Word document by an agent from only two simple commands - there will be a video demo showing this in my next article tomorrow.

🔮 • 1 year ago

Love what you’re doing! For what it’s worth the URL in your video is wrong and hits a 404

Mr Rio • 1 year ago

this is what I need

RyanRejoice • 1 year ago

This is incredible and a game changer for handling and analyzing pdf data.

Clara Data • 1 year ago

Innovative spin on PDFs! Looking promising for future efficiency!

Huxley MindOS • 1 year ago

Mastering the PDF mystery: deciphering both scroll & scribe! Sounds innovative. Looking forward to the details.

Javier Modified • 1 year ago

Agentic Document Extraction? Finally, AI can stop pretending it understands charts. This is how you handle a PDF! 🚀

Related Videos

Agentic Document Extraction just got much faster! From previous 135sec median processing time down to 8sec. Extracts not just text but diagrams, charts, and form fields from PDFs to give LLM-ready output. Please see the video for details and some application ideas.

Andrew Ng

290,336 views • 1 year ago

New course: Document AI: From OCR to Agentic Doc Extraction, built with LandingAI, where I'm executive chairman, and taught by David Park and Andrea Kropp. Much of the world's data is locked in PDFs, JPEGs, and other documents. This short course shows you how to build agentic workflows that process documents accurately: breaking them into parts, examining each piece carefully, and extracting information through multiple iterations. Traditional Optical Character Recognition (OCR) captures text but loses context from table headers, chart captions, or reading order of columns. After exploring OCR's limitations, you’ll use LandingAI's Agentic Document Extraction (ADE) framework to process documents. ADE treats pages visually -- as images -- to parse information and extract fields.
Skills you'll gain:
- Build agents to convert unstructured files into structured Markdown/HTML and JSON
- Use ADE to parse complex data like forms, handwriting, or equations
- Map extracted information to named fields using a specified schema, with bounding boxes for grounding and validation
- Deploy RAG applications with event-driven document processing
Come learn about the best tools for processing documents like financial invoices, medical records, or academic papers intelligently:

Andrew Ng

199,328 views • 4 months ago

Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts even from complex docs. For example, from large, complex tables, which is important for many finance and healthcare applications. And a new SDK makes using it require only 3 simple lines of code. Please see the video for technical details. I hope this unlocks a lot of value from the "dark data" currently stuck in PDF files, and that you'll build something cool with this!

Andrew Ng

298,970 views • 8 months ago

Parse text from any PDF in seconds and give it to Claude Code 📑🤖 LiteParse is our open-source, model-free document parser that lets you digitalize text from any document in seconds. This is especially useful for coding agents, which are great at reading plaintext files but terrible at reading traditional document formats (PDF, Office docs). We have a one-line installable skill that lets you plug LiteParse into Claude Code and 40+ other agents. Repo is here:

Jerry Liu

29,746 views • 2 months ago

Agentic Document Extraction now supports field extraction! Many doc extraction use cases extract specific fields from forms and other structured documents. You can now input a picture or PDF of an invoice, request the vendor name, item list, and prices, and get back the extracted fields. Or input a medical form and specify a schema to extract patient name, patient ID, insurance number, etc. One cool feature: If you don't feel like writing a schema (json specification of what fields to extract) yourself, upload one sample document and write a natural language prompt saying what you want, and we automatically generate a schema for you. See the video for details!

Andrew Ng

192,833 views • 11 months ago
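A minimal sketch of what a field-extraction schema like the one described above might look like, written as a JSON Schema style dictionary. The field names (`vendor_name`, `items`, `description`, `price`), the helper `missing_required`, and the sample result are hypothetical and are not taken from the ADE documentation; the post only says that a schema is a JSON specification of the fields to extract.

```python
import json

# Hypothetical schema: field names and types are illustrative assumptions,
# not the format documented by any particular extraction service.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "price": {"type": "number"},
                },
                "required": ["description", "price"],
            },
        },
    },
    "required": ["vendor_name", "items"],
}


def missing_required(schema: dict, data: dict) -> list:
    """Return the required top-level fields that are absent from an extraction result."""
    return [name for name in schema.get("required", []) if name not in data]


# Made-up extraction result, used only to exercise the check above.
extracted = {
    "vendor_name": "Acme Supply",
    "items": [{"description": "Widget", "price": 12.5}],
}

print(json.dumps(invoice_schema, indent=2))
print(missing_required(invoice_schema, extracted))
```

Checking the result against the schema's `required` list is one simple way to flag documents where a requested field could not be found.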

ColPali is changing the game for PDF retrieval by eliminating the need for OCR and chunking methods 🚀 Inspired by ColBERT’s success with text, ColPali splits an image of a document into patches, which are then processed through a vision LLM called PaliGemma. The embeddings for each patch retain contextual information, similarly to text embeddings in methods like ColBERT. During retrieval, user queries are embedded in the same space, and then compared to document patches using the MaxSim operator. ColPali recipe POC Weaviate AI Database: ColPali paper: As always, shoutout to the awesome Daniel Williams for the help! 💙

Victoria Slocum

67,802 views • 1 year ago
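The MaxSim comparison mentioned above can be sketched in a few lines: each query embedding is matched to its most similar document patch embedding, and those best scores are summed. This is a toy illustration with made-up 2-D vectors and plain dot products, not ColPali's actual code; the names `maxsim_score`, `page_a`, and `page_b` are assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], patch_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector, take the similarity of
    its best-matching patch vector, then sum over all query vectors."""
    return sum(max(dot(q, p) for p in patch_vecs) for q in query_vecs)


# Toy 2-D embeddings, invented for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
page_b = [[0.1, 0.1], [0.3, 0.2]]

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)

# Rank pages by score, highest first.
ranked = sorted(
    [("page_a", score_a), ("page_b", score_b)],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Because every query vector picks its own best patch, a page can score well when different regions of it answer different parts of the query.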

OCR can process characters but it doesn’t understand pixels. OCR has no way to reason about the headers, totals, or checkboxes found in tables, invoices, or forms. In our course with LandingAI, "Document AI: From OCR to Agentic Doc Extraction," we build agents to address these failure modes by breaking documents into pieces, applying the right tools, and mapping information to expected formats. Learn more and enroll today:

DeepLearning.AI

21,490 views • 4 months ago

Let's talk parsing charts 📊📈. Last week we released ParseBench, the first document OCR benchmark for AI agents. New in ParseBench: ChartDataPointMatch. Most document parsers look at a chart and OCR the caption. Agents need the actual numbers. That's the gap between "OCR'd the text around the chart" and "actually read the chart." More about ParseBench, the GitHub code, Hugging Face dataset, and scientific paper→

LlamaIndex 🦙

13,987 views • 1 month ago

Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.

Andrew Ng

397,697 views • 1 year ago

This is the future of web design. I generated a website in seconds from a PDF document (with drawings, text, and images). No input from my part. No code. No work whatsoever other than uploading the document. This is incredible.

Santiago

103,690 views • 1 year ago

Most organizations run on unstructured content, but it’s the data locked within that content that moves business forward. Box Extract is now generally available, delivering agentic data extraction at scale for smart process automation. Combining the latest AI models with advanced OCR capabilities and agentic approaches that understand document structure and meaning, Box Extract automatically and accurately extracts high-quality data from a variety of content to automate workflows, speed content discovery, and drive smarter business decisions.

Box

759,033 views • 4 months ago

A TikTok user reportedly discovered a way to read through redacted text in the Epstein files. The user explained that in certain documents, text was only visually covered in black and could be revealed by copying and pasting it into a blank document. “If you simply copy paste some of them into a blank doc the original text appears!” [🎥: evangeleeeen/TT ]

Complex

491,588 views • 5 months ago

IN NEWS: Extend raises a $17M round. "More documents will be processed in the next 6 months than all of history combined since the PDF was invented." - Kushal Byatnal "The global economy runs off of PDF files." "With LLMs, the capabilities are so much higher for document processing, and today's demand for document processing is through the roof."

TBPN

34,026 views • 11 months ago

We’re excited to officially launch LlamaParse, the first genAI-native document parsing solution. Not only is it better at parsing out images/tables/charts 📊📈 than virtually every other parser, it is now steerable through natural language instructions - output the document in whatever format you desire! It is also the only parsing solution that seamlessly allows you to build accurate RAG over complex documents, free of hallucinations 🔥 We launched it in private preview a few weeks ago and hit 2k users, 1M total PDF pages parsed. And now it’s better than ever. LlamaParse contains the following killer features: ✅ SOTA table/chart extraction ✅ Seamless integration with LlamaIndex 🦙 advanced RAG/agents ✅✨ Natural language Parsing Instructions ✅✨JSON mode and image extraction ✅✨Support for ~10 document types (.pdf, .pptx, .docx, .xml) and more Our pricing is simple: 1k free per day, and additional pages at 0.3c a page, or $3 for 1k pages. If you want advanced document RAG and/or private deployments, come get in touch with us to chat about LlamaCloud. Check out our full blog post here: LlamaParse client repo: Signup at 🦙☁️: Come talk to us:

LlamaIndex 🦙

143,087 views • 2 years ago
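The pricing quoted above (1k pages free per day, then 0.3 cents per page, i.e. $3 per 1k pages) works out as in this small sketch. It assumes the free allowance resets daily and does not carry over, and it reflects only the figures in the post, which may have changed since; the function name `daily_cost` is hypothetical.

```python
from decimal import Decimal

FREE_PAGES_PER_DAY = 1000            # "1k free per day" from the post
PRICE_PER_PAGE = Decimal("0.003")    # 0.3 cents per page, in dollars


def daily_cost(pages_parsed: int) -> Decimal:
    """Dollar cost for one day's parsing, assuming only pages beyond the
    daily free allowance are billed."""
    billable = max(0, pages_parsed - FREE_PAGES_PER_DAY)
    return PRICE_PER_PAGE * billable


# 5,000 pages in a day: 4,000 billable pages at $0.003 each.
print(daily_cost(5000))
# 800 pages in a day stays within the free allowance.
print(daily_cost(800))
```

`Decimal` is used instead of floats so that the per-page price multiplies out without rounding noise.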

Gemini has always been my go-to LLM for document extraction - but accurate bounding boxes with agentic vision are a huge value-add in the latest Gemini models. Shown: extracting structured data from an oil & gas lease in Ector County Texas with the best bboxes I've seen yet.

Kyle Walker

79,504 views • 3 months ago

we built a 40% better pdf search than traditional rag. introducing Nozomio Labs document search. index research papers, sec filings, books, and other complex PDFs, then make them searchable for your AI (thread):

Arlan

61,446 views • 3 months ago

LlamaParse now has an official Agent Skill you can use across 40+ agents. With built-in instructions for parsing complex documents, including different formats, tables, charts, and images, your agents gain access to deeper document understanding, not just raw text extraction. 👇 Watch the demo 📖 Read the docs: 🚀 Get started with LlamaCloud:

LlamaIndex 🦙

51,845 views • 2 months ago

Pulse (Pulse) has just launched Meridian, an AI-powered financial document processor that can automatically convert any PDF, Word doc, PowerPoint presentation, or image into a structured Excel export with charts and graphs. Congrats on the launch, sid and Ritvik Pandey!

Y Combinator

52,596 views • 11 months ago

Discover how to integrate the Box MCP server into agentic workflows using Pydantic and OpenAI's GPT-4.1 mini model. Automate tasks like invoice extraction, file uploads, and document generation with minimal code and maximum efficiency.

Box

184,439 views • 1 year ago

Build better RAG by letting a team of agents extract and connect your reference materials into a knowledge graph. Our new short course, “Agentic Knowledge Graph Construction,” taught by Neo4j Innovation Lead Andreas Kollegger, shows you how. Knowledge graphs are an important way to store information accurately, but they are a lot of work to build manually. In this course you’ll learn how to build a team of agents that turn data (in this case, product reviews and invoices from suppliers) into structured graphs of entities and relationships for RAG. Learn how agents can automatically handle the time-consuming work of building graphs: extracting entities and relationships (e.g., Product "contains" Assembly, Part "supplied_by" Supplier, Customer review "mentions" Product), deduplicating them, fact-checking them, and committing them to a graph database, so your retrieval system can find the right information to generate accurate output. For example, you can use agents to help trace customer complaints directly to specific suppliers, manufacturing processes, and product hierarchies, thus turning fragmented information into queryable business intelligence.
Skills you’ll gain:
- Build, store, and access knowledge graphs using the Neo4j graph database
- Build multi-agent systems using Google’s Agent Development Kit (ADK)
- Set up a loop of agentic workflows to propose and refine a graph schema through fact-checking
- Connect agent-generated graphs of unstructured and structured data into a unified knowledge graph
This course gets into the practicum of why knowledge graphs give more accurate information retrieval than vector search alone, especially for high-stakes applications where precision matters more than fuzzy similarity matching. Sign up here:

Andrew Ng

167,710 views • 9 months ago