Effective Table Data Extraction from PDF without LLM

Sparrow Parse helps read tabular data from PDFs, relying on libraries such as Unstructured or PyMuPDF4LLM. This avoids the data hallucination errors that LLMs often produce when processing complex data structures. Learn more: ✅ ✅ Katana
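Why a non-LLM parser avoids hallucination: a rule-based extractor only regroups words that already exist in the PDF's text layer, so it can misplace a cell but cannot invent a value. The sketch below illustrates that idea for a borderless table. It is not Sparrow Parse's implementation (which builds on Unstructured or PyMuPDF4LLM). It assumes words arrive with bounding-box coordinates, as a PDF text-layer extractor would provide, and the names `Word`, `group_rows`, `split_cells`, `extract_table` and the tolerance values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record: left/right x edges, top y edge, text.
    x0: float
    x1: float
    y0: float
    text: str


def group_rows(words, y_tol=2.0):
    """Cluster words whose top edges lie within y_tol of a row's first word."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y0, w.x0)):
        if rows and abs(rows[-1][0].y0 - w.y0) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Order each row left to right before splitting it into cells.
    return [sorted(r, key=lambda w: w.x0) for r in rows]


def split_cells(row, gap=10.0):
    """Join neighbouring words into one cell unless the horizontal gap exceeds `gap`."""
    cells = []
    prev = None
    for w in row:
        if prev is not None and w.x0 - prev.x1 <= gap:
            cells[-1] += " " + w.text
        else:
            cells.append(w.text)
        prev = w
    return cells


def extract_table(words, y_tol=2.0, gap=10.0):
    """Return the table as a list of rows, each a list of cell strings."""
    return [split_cells(r, gap) for r in group_rows(words, y_tol)]


if __name__ == "__main__":
    # Made-up coordinates for a two-row, bank-statement-style table.
    words = [
        Word(10, 40, 100, "Date"), Word(100, 170, 100, "Description"),
        Word(250, 290, 100, "Amount"),
        Word(10, 60, 115, "2023-01-05"), Word(100, 130, 115.5, "Coffee"),
        Word(133, 160, 115.5, "shop"), Word(250, 280, 115, "-4.50"),
    ]
    for row in extract_table(words):
        print(row)
```

Every output cell is text taken from the input words; the only decisions are where rows and cells break. The `y_tol` and `gap` thresholds would need tuning to a document's font size and column spacing, and tables that continue across pages are not handled here.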
27,886 views • 2 years ago • via X (Twitter)
10 comments

ViGa · 2 years ago
Cross-page tables?

Andrej Baranovskij · 2 years ago
Work in progress.

Nasser Builds · 2 years ago
Thank you

Sumit Shekhar · 2 years ago
How is the performance on borderless tables?

Andrej Baranovskij · 2 years ago
I tested it with bank statements, which are borderless, and it performs with 95% accuracy.

Ashish · 2 years ago
Very useful

Marlon · 2 years ago
This is a lot more challenging than people realize. I went through a ton of approaches for table extraction recently and ended up with a pipeline built around a fine-tuned table-transformer and GPT-4V with visual cues. Excited to try this out as well.

Andrej Baranovskij · 2 years ago
Agree 💯

Khalid Jamal- خالد جمال · 1 year ago
Can it extract equations from scientific PDF papers?

Andrej Baranovskij · 1 year ago
Haven't tried it. I doubt the 7B model can, but the 72B model should handle it, depending on complexity.


