Effective Table Data Extraction from PDF without LLM

Sparrow Parse helps read tabular data from PDFs by relying on libraries such as Unstructured or PyMuPDF4LLM. This avoids the data hallucination errors that LLMs often produce when processing complex data structures.
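To illustrate why a non-LLM pipeline cannot hallucinate cell values, here is a minimal sketch, assuming the extraction library has already rendered the page as Markdown text containing a pipe-delimited table. The function names and the sample statement are hypothetical and are not part of Sparrow Parse's API; the point is only that every output value is copied verbatim from the input.

```python
import re

# A separator cell in a Markdown table looks like "---", ":---" or ":---:".
SEPARATOR_CELL = re.compile(r"^:?-+:?$")


def split_row(line: str) -> list[str]:
    """Split one pipe-delimited row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Return the rows of the first Markdown table as header-keyed dicts."""
    table_lines: list[str] = []
    for line in markdown.splitlines():
        if line.strip().startswith("|"):
            table_lines.append(line)
        elif table_lines:
            break  # the first table has ended
    if len(table_lines) < 2:
        return []
    header = split_row(table_lines[0])
    # The second line must be the header separator, otherwise this is not a table.
    if not all(SEPARATOR_CELL.match(c) for c in split_row(table_lines[1])):
        return []
    return [dict(zip(header, split_row(line))) for line in table_lines[2:]]


# Hypothetical sample: invented data shaped like a borderless bank statement.
sample = """Statement for March

| Date | Description | Amount |
|---|---|---|
| 2024-03-01 | Coffee | -3.50 |
| 2024-03-02 | Salary | 2500.00 |

Closing text.
"""

rows = parse_markdown_table(sample)
for row in rows:
    print(row)
```

Because the parser only splits and copies strings, any error comes from the upstream layout detection rather than from invented values.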
27,886 views • 2 years ago • via X (Twitter)
Comments: 10

ViGa • 2 years ago
Cross-page tables?

Andrej Baranovskij • 2 years ago
Work in progress.

Nasser Builds • 2 years ago
Thank you

Sumit Shekhar • 2 years ago
How is the performance on borderless tables?

Andrej Baranovskij • 2 years ago
I tested it with bank statements, which are borderless, and it performs with 95% accuracy.

Ashish • 2 years ago
Very useful

Marlon • 2 years ago
This is a lot more challenging than people realize - I went through a ton of approaches for table extraction recently, and ended up with a pipeline revolving around a fine-tuned table-transformer and gpt4-v with visual cues. Excited to try this out as well.

Andrej Baranovskij • 2 years ago
Agree 💯

Khalid Jamal- خالد جمال • 1 year ago
Can it extract equations from scientific PDF papers?

Andrej Baranovskij • 1 year ago
Haven't tried. I doubt the 7B model can, but the 72B model should handle it, depending on complexity.


