Loading video...

Video Failed to Load

Go Home

๐Ÿš€ New in Heptabase: PDF Parser - Extract text, tables, equations & images (even from scanned PDFs) โ†’ save as clean Markdown - Ask AI with precise context (page ranges & paragraphs) โ†’ responses include link refs back to the source - MAX mode ensures AI reads every word...

36,363 views โ€ข 9 months ago โ€ขvia X (Twitter)

0 Comments

No comments available

Comments from the original post will appear here

Related Videos

PDF parsing is still painful because LLMs reorder text in complex layouts, break tables across pages, and fail on graphs or images. ๐Ÿ’กTesting the new open-source OCRFlux model, and here the results are really good for a change. So OCRFlux is a multimodal, LLM based toolkit for converting PDFs and images into clean, readable, plain Markdown text. Because the underlying VLM is only 3B param, it runs even on a 3090 GPU. The model is available on Hugging Face . The engine that powers the OCRFlux, teaches the model to rebuild every page and then stitch fragments across pages into one clean Markdown file. It bundles one vision language model with 3B parameters that was fine-tuned from Qwen 2.5-VL-3B-Instruct for both page parsing and cross-page merging. OCRFlux reads raw page images and, guided by task prompts, outputs Markdown for each page and merges split elements across pages. The evaluation shows Edit Distance Similarity (EDS) 0.967 and crossโ€‘page table Tree Edit Distance 0.950, so the parser is both accurate and layout aware. How it works while parsing each page - Convert into text with a natural reading order, even in the presence of multi-column layouts, figures, and insets - Support for complicated tables and equations - Automatically removes headers and footers Cross-page table/paragraph merging - Cross-page table merging - Cross-page paragraph merging A compact visionโ€‘language models can beat bigger models once crossโ€‘page context is added. ๐Ÿงต 1/n Read on ๐Ÿ‘‡

Rohan Paul

149,264 views โ€ข 11 months ago