Effective Table Data Extraction from PDF without LLM

Sparrow Parse helps read tabular data from PDFs by relying on libraries such as Unstructured or PyMuPDF4LLM. This avoids the data hallucination errors that LLMs often produce when processing complex data structures.

27,886 views • 2 years ago • via X (Twitter)

10 Comments

ViGa • 2 years ago

Cross-page tables?

Andrej Baranovskij • 2 years ago

Work in progress.

Nasser Builds • 2 years ago

Thank you

Sumit Shekhar • 2 years ago

How is the performance on borderless tables?

Andrej Baranovskij • 2 years ago

I tested it with bank statements, which are borderless, and it performs with 95% accuracy.

Ashish • 2 years ago

Very useful

Marlon • 2 years ago

This is a lot more challenging than people realize. I went through a ton of approaches for table extraction recently, and ended up with a pipeline revolving around a fine-tuned table-transformer and GPT-4V with visual cues. Excited to try this out as well.

Andrej Baranovskij • 2 years ago

Agree 💯

Khalid Jamal- خالد جمال • 1 year ago

Can it extract equations from scientific PDF papers?

Andrej Baranovskij • 1 year ago

Haven’t tried. I doubt the 7B model could, but the 72B model should handle it, depending on complexity.
