
John T Davies 🇪🇺
@jtdavies • 2,954 subscribers
Entrepreneur, CTO in AI & FinTech, investor, father to 3 grown boys, husband to Rachel, astrophysicist, keen photographer, cyclist, über-geek, travelled a lot.
Videos

MLX + TurboQuant = Local Super Power

Take a local (private) document, set of documents or codebase, pre-fill the 256k KV cache (the context) with it and the system prompt, quantise, and run on Apple's MLX: you get almost instantaneous, lossless document queries with total privacy. For a 75-page PDF (some 30k tokens), I was seeing sub-150ms TTFT (Time To First Token), which is "instant" for a human.

We run this on an office server (one of our Mac Minis) so that staff get instant answers on critical, confidential company/client documents from phones, iPads or laptops. Imagine "What were the 3Q25 profits?", "When did Joe start working for us?" or "Please list the project milestones and team count" answered within seconds of hitting the return key. It works with voice too.

This example is 17k lines of code from 177 source files plus an index of all 1,902 files: the entire Claude Code codebase (oops Anthropic!). Zero loss (unlike Anthropic's security).

What this means is the ability to work better on local (private) codebases, plus off-the-shelf private document analysis with better response than the best public models, all locally.
John T Davies 🇪🇺 • 85,589 views • 2 months ago
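
To give a feel for why quantising the pre-filled KV cache matters on a Mac Mini, here is a rough back-of-the-envelope sketch in Python. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical assumption and is not taken from the post, "256k" is assumed to mean 262,144 tokens, and the estimate ignores the per-group scale overhead that real quantisation schemes add.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bits: int) -> int:
    """Approximate KV cache size in bytes.

    Each layer stores two tensors (keys and values), each of shape
    tokens x kv_heads x head_dim. Quantisation metadata (scales,
    zero points) is deliberately ignored in this estimate.
    """
    elements = 2 * layers * kv_heads * head_dim * tokens
    return elements * bits // 8


# Hypothetical model shape, chosen for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

if __name__ == "__main__":
    # 30k tokens is the 75-page PDF from the post; 262,144 is the
    # assumed meaning of a "256k" context.
    for tokens in (30_000, 262_144):
        for bits in (16, 4):
            size = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM, bits)
            print(f"{tokens:>7} tokens @ {bits:>2}-bit: {size / 2**30:.2f} GiB")
```

With these assumed dimensions, a full 262,144-token cache comes to 32 GiB at 16 bits and 8 GiB at 4 bits, which is roughly the difference between not fitting and fitting alongside the model weights on a small machine.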

I finally got tool-calling working with Qwen3.5 on iPhone. It runs at just over 20 tokens/sec and so far looks very useful, especially once I get some more local tools working to access things on the phone. I'll upload it to GitHub when I've done a little more testing.
John T Davies 🇪🇺 • 27,435 views • 3 months ago