
Hrishi
@hrishioa • 11,477 subscribers
Building artificially intelligent bridges at Southbridge, prev-CTO Greywing (YC W21). Chop wood carry water.

Kimi is the real deal. Unless it's really Sonnet in a trench coat, this is the best agentic open-source model I've tested - BY A MILE. Here's a slice* of a 4 HOUR run (~1 second per minute) with not much more than 'keep going' from me every 90 minutes or so. The task involved editing multiple files, reading new context, and maintaining agentic state (not losing track of where you were or forgetting instructions). This is a repo with included prompts, notes, plans, lots of things to mistake as instructions and be poisoned by. The output was over 1M tokens of exactly what I asked for, and it wasn't an easy task.
What do I mean by agentic model? Very simply put, it's the ability to hold macro instructions in view across an increasing number of turns, and to use primary tools (read, write, edit, shell) consistently without getting lost. Added bonus is the ability to learn from mistakes further up the chain!
*edited for brevity and data concerns haha
Hrishi • 209,968 views • 11 months ago

Opus 4.5 (new model from Anthropic) can do something no model could do until now: It can close the design loop. This is (in my testing) the first model that can reliably do the visual loop of Generate Code ⇛ Render ⇛ Look at it ⇛ Improve. Step 3 (Visual critique) has been a real problem for so long. Models will be myopic or defensive when reviewing designs - it's been hard to tell if this has been a training or a vision problem. Opus 4.5 seems to have fixed it - it's not amazing, but it finally *can* do it. Claude models have historically (once again, in my testing) lagged behind in terms of visual understanding. Gemini for the last year has been the best - but there might be a new leader.
Hrishi • 57,822 views • 6 months ago
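For readers who want the Generate Code ⇛ Render ⇛ Look at it ⇛ Improve loop from the post above in concrete form, here is a minimal sketch. Everything in it is an assumption for illustration: `design_loop`, `Critique`, the three callables, and the `good_enough` threshold are hypothetical names, not any model's or vendor's API. You would plug in your own model call, renderer, and vision critique.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    score: float  # hypothetical scale: 0.0 (poor) to 1.0 (good)
    notes: str    # what the critic wants changed in the next revision


def design_loop(
    generate: Callable[[str, str], str],         # (brief, feedback) -> code
    render: Callable[[str], bytes],              # code -> screenshot bytes
    critique: Callable[[bytes, str], Critique],  # (screenshot, brief) -> Critique
    brief: str,
    max_rounds: int = 5,
    good_enough: float = 0.8,
) -> tuple[str, list[Critique]]:
    """Run generate -> render -> critique -> improve until the critic is satisfied."""
    feedback = ""  # first round has no critique to act on
    history: list[Critique] = []
    code = ""
    for _ in range(max_rounds):
        code = generate(brief, feedback)  # step 1: generate code
        image = render(code)              # step 2: render it
        result = critique(image, brief)   # step 3: look at the rendering
        history.append(result)
        if result.score >= good_enough:
            break                         # design accepted, stop iterating
        feedback = result.notes           # step 4: feed the critique back in
    return code, history
```

The loop only works if step 3 returns honest, specific notes, which is the part the post says models have struggled with. The `max_rounds` cap is there so a critic that never approves cannot run forever.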

This is scary - ETL pipelines and ORMs are likely going away - or at least I shouldn't be getting paid for doing them anymore. This is AI generating thousands of lines of typespecs and DDLs (with no more context than the dataset), and somehow it's all 100% correct. Rant?👇
Hrishi • 165,597 views • 2 years ago