AlphaSignal AI

@AlphaSignalAI • 13,130 subscribers

The latest news from the top 100 companies in AI. Over 280,000 devs read our newsletter.

Shorts

A peanut-sized Chinese model just dethroned Gemini at reading documents.

GLM-OCR is a 0.9B parameter vision-language model. It scores 94.62 on OmniDocBench V1.5, ranking #1 overall. For context, it outperforms models 100x its size. 100% open-source.

It works in two stages.
1. A layout engine detects every region in a document.
2. Each region gets read in parallel.

The model predicts multiple tokens per step instead of one. That's what makes it so fast at small size.

It handles things most OCR tools struggle with:
> Complex tables and nested layouts
> Handwritten text and stamps
> Math formulas and code blocks
> Mixed image-and-text documents

You can run it locally through Ollama. It fits on edge devices with limited compute. Every expensive OCR API just got a free competitor.
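The two-stage flow described above can be sketched roughly as follows. This is an illustration only, not GLM-OCR's actual code or API: `detect_layout`, `read_region`, `ocr_page`, and the fake `page` dictionary are all hypothetical stand-ins, and the "model" here is a simple lookup so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str   # e.g. "text", "table", "formula"
    box: tuple  # (x0, y0, x1, y1) in page pixels


def detect_layout(page):
    """Stage 1 (hypothetical stub): list the regions found on a page."""
    return [Region(kind, box) for kind, box in page["regions"]]


def read_region(page, region):
    """Stage 2 (hypothetical stub): recognize the text inside one region."""
    return page["text"][region.box]


def ocr_page(page, workers=4):
    """Detect regions first, then read them concurrently, keeping page order."""
    regions = detect_layout(page)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input regions.
        texts = list(pool.map(lambda r: read_region(page, r), regions))
    return [(r.kind, t) for r, t in zip(regions, texts)]


# A fake page standing in for a real image plus model output.
page = {
    "regions": [("text", (0, 0, 10, 10)), ("table", (0, 10, 10, 20))],
    "text": {(0, 0, 10, 10): "Hello", (0, 10, 10, 20): "a|b"},
}
result = ocr_page(page)
```

Because each region is read independently, the second stage parallelizes naturally; only the layout step has to see the whole page.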

91,821 views

Videos

This is so cool. Generate sleek architecture diagrams for any GitHub repo or local project in seconds. It’s a single Markdown skill file that you can import and use right away in Claude.

83,617 views • 3 months ago

Google released a 1MB AI model that quickly catches malware disguised as PDFs.

Magika is an open source file detection system. It doesn't trust file extensions. Instead, it reads the actual content to figure out what a file really is.

That means malware disguised as a PDF gets caught. Hidden scripts inside images get flagged. Fake extensions don't fool it.

It runs on a single CPU and classifies files in about 5ms. It supports 200+ file types and reaches ~99% average accuracy on its test set, having been trained and evaluated on ~100M samples.

It works as:
> A Rust command line tool
> A Python library via pip
> JavaScript and Go bindings

Google already uses it across Gmail, Drive, and Safe Browsing. Now anyone can run it locally, scan entire directories, and build it into their own security pipelines.
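To make the "content over extension" idea concrete, here is a minimal sketch using classic magic-byte signatures from the Python standard library alone. This is not how Magika works (Magika uses a trained model and covers far more types), and the names `sniff_type`, `extension_mismatch`, and both lookup tables are made up for this illustration.

```python
from pathlib import Path

# A handful of well-known leading byte signatures (far from exhaustive).
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),
    (b"MZ", "pe_executable"),
    (b"\x7fELF", "elf_executable"),
]

# Extensions considered consistent with each detected type (illustrative).
EXPECTED_EXTENSIONS = {
    "pdf": {"pdf"},
    "png": {"png"},
    "jpeg": {"jpg", "jpeg"},
    "zip": {"zip", "docx", "xlsx", "jar"},
    "pe_executable": {"exe", "dll"},
    "elf_executable": {"", "so", "elf"},
}


def sniff_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the file name."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """Return True when the content is recognized and contradicts the extension."""
    claimed = Path(filename).suffix.lower().lstrip(".")
    actual = sniff_type(data)
    if actual == "unknown":
        return False  # nothing recognized, so nothing to contradict
    return claimed not in EXPECTED_EXTENSIONS[actual]


# A Windows executable header hiding behind a .pdf name is flagged.
suspicious = extension_mismatch("invoice.pdf", b"MZ\x90\x00")
```

Fixed signatures like these break down on text-based formats such as scripts and source code, which is the gap a learned classifier is meant to close.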

13,164 views • 3 months ago
