Agentic Document Extraction just got much faster! Median processing time is down from 135 seconds to 8 seconds. It extracts not just text but also diagrams, charts, and form fields from PDFs to produce LLM-ready output. Please see the video for details and some application ideas.

Andrew Ng

1,638,962 subscribers

290,420 views • 1 year ago • via X (Twitter)

Science & Technology • Education
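For a sense of scale, the two medians quoted in the post can be turned into a speed-up ratio and a rough sequential throughput. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and treating a median as a fixed per-document time is an assumption, since real processing times vary.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old processing time to the new one."""
    return before_s / after_s


def docs_per_hour(seconds_per_doc: float) -> float:
    """Sequential throughput, assuming every document takes exactly this long."""
    return 3600 / seconds_per_doc


# Figures from the post: 135 s median before, 8 s median after.
print(f"speed-up: {speedup(135, 8):.1f}x")
print(f"docs/hour: {docs_per_hour(135):.0f} -> {docs_per_hour(8):.0f}")
```

Under those assumptions the change works out to roughly a 17x speed-up for a single worker processing documents one at a time.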

11 Comments

⟁ndrew V • 1 year ago

I challenge you to prove that this can accurately and precisely extract typed hierarchy JSON output from 25 PDFs and not miss a beat. I don’t care if it takes an hour. And I don’t want to have to predefine the schema. It should iteratively learn, using any reasonable language model, NLP, and NER processes and functions, to develop the appropriate REGEX and Pydantic models to accomplish perfect fidelity of data extraction. Do you think this is possible?

Matt Figdore • 2 years ago

This is the biggest productivity cheat code right now. Kiss reading documents goodbye. You can get an instant summary of any document with this tool.

Awan Farz • 1 year ago

I was building this with agentic document advance extraction, which is useful.

Vincent Valentine (CEO of UnOpen.ai) • 1 year ago

incredible progress. this will definitely streamline the workflow.

Robert Rizk • 1 year ago

KEEP BUILDING!

Reverend Christian Moore • 1 year ago

You are my love, dear Andrew, I want to kiss you ❤️😘

Shine Gupta • 1 year ago

That's an insane speed-up! From 135s to just 8s?🔥 Total game-changer for document pipelines. I was working on a similar project recently but used PaddleOCR for extraction.

Mohammed Lubbad, PhD • 1 year ago

The reduction in processing time is impressive. Efficiency allows more room for innovation. How do we leverage this technology for broader applications? 🚀 #InnovationInsights

Tony Sousan • 1 year ago

@AndrewYNg, how does this breakthrough change your workflow? Speed is crucial, but what about accuracy? #InnovationPotential

ash prasad • 1 year ago

Amazing, will look to experiment and integrate.

Richy Hashmore • 1 year ago

The improvement drastically cuts down on waiting time.

Related Videos

Announcing: Agentic Document Extraction! PDF files represent information visually - via layout, charts, graphs, etc. - and are more than just text. Unlike traditional OCR and most PDF-to-text approaches, which focus on extracting the text, an agentic approach lets us break a document down into components and reason about them, resulting in more accurate extraction of the underlying meaning for RAG and other applications. Watch the video for details.

Andrew Ng

689,130 views • 1 year ago

Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts even from complex docs. For example, from large, complex tables, which is important for many finance and healthcare applications. And a new SDK makes using it require only 3 simple lines of code. Please see the video for technical details. I hope this unlocks a lot of value from the "dark data" currently stuck in PDF files, and that you'll build something cool with this!

Andrew Ng

299,065 views • 8 months ago

Agentic Document Extraction now supports field extraction! Many doc extraction use cases extract specific fields from forms and other structured documents. You can now input a picture or PDF of an invoice, request the vendor name, item list, and prices, and get back the extracted fields. Or input a medical form and specify a schema to extract patient name, patient ID, insurance number, etc. One cool feature: If you don't feel like writing a schema (json specification of what fields to extract) yourself, upload one sample document and write a natural language prompt saying what you want, and we automatically generate a schema for you. See the video for details!

Andrew Ng

192,883 views • 11 months ago
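The field-extraction post above describes a schema only as a "json specification of what fields to extract". As a rough illustration of that idea, the sketch below writes a small JSON-Schema-style spec for the invoice example (vendor name, item list, prices) and checks a result against it. The field names, the `validate` helper, and the sample payload are all hypothetical; this is not LandingAI's schema format or SDK, just a minimal stand-in using the Python standard library.

```python
import json

# Hypothetical schema: field names are illustrative, not an official format.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "price": {"type": "number"},
                },
                "required": ["description", "price"],
            },
        },
    },
    "required": ["vendor_name", "items"],
}

# Only the four types used above are supported by this sketch.
_TYPES = {"string": str, "number": (int, float), "object": dict, "array": list}


def validate(value, schema, path="$"):
    """Return a list of problems; an empty list means the value fits the schema."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, _TYPES[schema["type"]]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if schema["type"] == "object":
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    elif schema["type"] == "array":
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Made-up extraction result standing in for whatever an extractor returns.
extracted = json.loads(
    '{"vendor_name": "Acme Corp", "items": [{"description": "Widget", "price": 9.5}]}'
)
print(validate(extracted, INVOICE_SCHEMA))
```

Returning a list of problems rather than raising on the first one makes it easier to see every missing or mistyped field in a single pass over an extracted document.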

This AI extracts text, tables, and charts from PDFs with perfect structure and turns them into clean Markdown or JSON. link in comment

Farhan

23,398 views • 8 months ago

New course: Document AI: From OCR to Agentic Doc Extraction, built with LandingAI, where I'm executive chairman, and taught by David Park and Andrea Kropp. Much of the world's data is locked in PDFs, JPEGs, and other documents. This short course shows you how to build agentic workflows that process documents accurately: breaking them into parts, examining each piece carefully, and extracting information through multiple iterations. Traditional Optical Character Recognition (OCR) captures text but loses context from table headers, chart captions, or reading order of columns. After exploring OCR's limitations, you’ll use LandingAI's Agentic Document Extraction (ADE) framework to process documents. ADE treats pages visually, as images, to parse information and extract fields.

Skills you'll gain:
- Build agents to convert unstructured files into structured Markdown/HTML and JSON
- Use ADE to parse complex data like forms, handwriting, or equations
- Map extracted information to named fields using a specified schema, with bounding boxes for grounding and validation
- Deploy RAG applications with event-driven document processing

Come learn about the best tools for processing documents like financial invoices, medical records, or academic papers intelligently:

Andrew Ng

199,562 views • 5 months ago

PDFs suck. We just raised $17,000,000 in funding to fix this problem once and for all. Extend is building the modern document processing cloud. See how Brex, Square, Checkr, and Fortune 500s use it to process millions of documents:

Kushal Byatnal

210,936 views • 1 year ago

Agent Fill (our agentic PDF filler) just got smarter:
- Hover to see exactly why each field was filled
- Review and complete missing fields directly in the editor

We've been working with finance and ops teams to save hours every week - thank you all for the feedback. Try it out today and automate PDFs.

Ramp Labs

13,513 views • 9 months ago

PLEASE give me some direction… …but more importantly some time to recover after watching Caroline Bowman perform “They Just Keep Moving The Line” from #SmashBway.

SmashBway

11,960 views • 1 year ago

I just built a Voice AI Agent for insurance claims using Gemini 3.1 Flash Live and Google ADK. Talk to it. It fills the intake form, extracts claim details, and routes to an adjuster in real-time. 100% Opensource.

Shubham Saboo

71,739 views • 1 month ago

Got questions about NELFUND student loans? You’re not alone. From application status to disbursement details, we’ve sorted out some of the most frequently asked questions from students just like you. Still need help? Visit or check your student dashboard for updates. #NELFUNDStudentLoan

NELFUND

12,980 views • 10 months ago

NotebookLM from Google is just mindblowing stuff. 🤯 Feed any document, pdfs, URLs or just plain text, and it will create such a realistic discussion on that content. Absolutely falling in love with it. I let NotebookLM, talk about NotebookLM by feeding it just some plain text about NotebookLM 😄 The possibilities are endless.

Rohan Paul

68,398 views • 1 year ago

Check this!! You can get LLM-ready text from ANY website in 2 mins. Using Firecrawl's /llmstxt endpoint, transform ANY website into clean LLM-ready text—by just specifying a URL. Use this data for RAG, training LLMs, and more. Everything is just 5 lines of code!

Avi Chawla

12,936 views • 1 year ago

Our Compute Agent (Operator) Got Much Faster!! It's about 3-4x faster. In this example, it finds the cheapest flights from London to NYC and emails you the details. In all fairness, we are still 2x slower than OpenAI. We will continue to make it faster and more agentic. The next update will be later this week.

Bindu Reddy

20,716 views • 1 year ago

Structured Output from Multipage PDF with Sparrow (Qwen2 Vision LLM and MLX) I explain how multipage PDFs are handled in Sparrow to extract structured data in a single call.

Andrej Baranovskij

30,645 views • 1 year ago

Get ready for lots of silly Subverse clips from the latest stream~ I wanted to give some thanks and gratitude in light of Christmas time but the kloi were not having it!

SingingSamine🎵

74,132 views • 1 year ago

please give him more screen time… the screen time of him we got in these two episodes was this much, and it’s just not enough for us 😭😭 #StudyGroup

kdrama diary

26,085 views • 1 year ago

The median home-buyer just hit 59 years old. Up from 39. Meanwhile, first time homebuyers just hit 40 — a little late to be starting a family. Home prices were supposed to come down after rate hikes and Bidenflation. They did not.

Peter St Onge, Ph.D.

227,035 views • 6 months ago

Typing is great, but speaking is faster. Bring Gemini directly into your desktop workflow to seamlessly reformat text between apps using just your voice. Watch it pull raw details from a plain document and transform them into a festive, emoji-filled email invite exactly where you need it.

Google Gemini

30,759 views • 1 month ago

Open-weight MiniMax M3 filled out a US customs form from a driver's license photo. For this test we deployed MiniMax M3 Q4 using MLX-VLM on a Mac Studio M3 Ultra 512GB RAM. The model was tasked with reading a scanned document and an ID card photo, then completing a declaration form. Output: 736 tokens · Input: 1,847 tokens · Time: ~31s. The model analyzed both inputs, streamed its reasoning, and then called three tools: write_field for text fields, mark for Yes/No checkboxes, and sign for the signature and date. It extracted the required information, mapped it to the correct fields, and completed the form without any manual input.

atomic.chat

107,294 views • 7 days ago

OpenAI just dropped GPT-4.1. It’s a huge jump over GPT-4o on basically every metric, and importantly document processing and data extraction from enterprise content. Box is now offering it in beta in the Box AI Studio, and will be rolling out to everyone shortly.

Aaron Levie

108,961 views • 1 year ago