Effective Table Data Extraction from PDF without LLM
Sparrow Parse helps to read tabular data from PDFs, relying on libraries such as Unstructured or PyMuPDF4LLM. This avoids the data hallucination errors that LLMs often produce when processing complex data structures.
27,886 views • 2 years ago • via X (Twitter)
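To illustrate the idea of reading tables without an LLM, here is a minimal sketch. It is not Sparrow Parse's own code. It assumes the PDF has already been converted to Markdown (for example with `pymupdf4llm.to_markdown`, wrapped in the hypothetical helper `pdf_to_markdown`) and then parses pipe-delimited tables deterministically. The function names and the sample bank statement rows are illustrative assumptions.

```python
def pdf_to_markdown(path: str) -> str:
    """Hypothetical helper: convert a PDF to Markdown (needs `pip install pymupdf4llm`)."""
    import pymupdf4llm  # imported lazily so the parser below runs without it
    return pymupdf4llm.to_markdown(path)


def _split_cells(line: str) -> list[str]:
    # Drop the outer pipes, then split the remainder into trimmed cells.
    return [cell.strip() for cell in line[1:-1].split("|")]


def _is_separator(line: str) -> bool:
    # A separator row looks like |---|:---:|---| (dashes, optional colons).
    cells = _split_cells(line)
    return all("-" in cell and set(cell) <= set("-: ") for cell in cells)


def _rows_from_block(block: list[str]) -> list[dict[str, str]]:
    header = _split_cells(block[0])
    rows = []
    for line in block[2:]:  # skip the header and separator rows
        cells = _split_cells(line)
        cells += [""] * (len(header) - len(cells))  # pad short rows
        rows.append(dict(zip(header, cells)))
    return rows


def parse_markdown_tables(markdown: str) -> list[list[dict[str, str]]]:
    """Return every Markdown table as a list of {column: value} rows."""
    tables = []
    block: list[str] = []
    # The trailing "" flushes a table that ends on the last line.
    for line in markdown.splitlines() + [""]:
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|"):
            block.append(stripped)
            continue
        if len(block) >= 2 and _is_separator(block[1]):
            tables.append(_rows_from_block(block))
        block = []
    return tables


# Illustrative input only; real text would come from pdf_to_markdown("statement.pdf").
sample = """
Statement

|Date|Description|Amount|
|---|---|---|
|2024-01-02|Coffee|-3.50|
|2024-01-03|Salary|2500.00|
"""
for row in parse_markdown_tables(sample)[0]:
    print(row)
```

Because every value is copied from the extracted text rather than generated, this kind of parsing cannot invent cell contents, although it still depends on the quality of the upstream PDF-to-Markdown step.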
10 Comments

ViGa • 2 years ago
Cross-page tables?

Andrej Baranovskij • 2 years ago
Work in progress.

Nasser Builds • 2 years ago
Thank you

Sumit Shekhar • 2 years ago
How is the performance on borderless tables?

Andrej Baranovskij • 2 years ago
I tested it with bank statements, which are borderless, and it performs with 95% accuracy.

Ashish • 2 years ago
Very useful

Marlon • 2 years ago
This is a lot more challenging than people realize - I went through a ton of approaches for table extraction recently, and ended up with a pipeline revolving around a fine-tuned table-transformer and GPT-4V with visual cues. Excited to try this out as well.

Andrej Baranovskij • 2 years ago
Agree 💯

Khalid Jamal - خالد جمال • 1 year ago
Can it extract equations from scientific PDF papers?

Andrej Baranovskij • 1 year ago
Haven't tried. I doubt the 7B model can, but the 72B model should handle it, depending on complexity.


