Hrishi

@hrishioa • 11,477 subscribers

Building artificially intelligent bridges at Southbridge, prev-CTO Greywing (YC W21). Chop wood carry water.

Shorts

Just a reminder that Opus 1m is still way cheaper than GPT-4 (the 32K version)

50,633 views

Videos

Kimi is the real deal. Unless it's really Sonnet in a trench coat, this is the best agentic open-source model I've tested - BY A MILE. Here's a slice* of a 4 HOUR run (~1 second per minute) with not much more than 'keep going' from me every 90 minutes or so. The task involved editing multiple files, reading new context, maintaining agentic state (not forgetting where you were or forgetting instructions). This is a repo with included prompts, notes, plans, lots of things to mistake as instructions and be poisoned by. The output was over 1M tokens of exactly what I asked for, and it wasn't an easy task. What do I mean by agentic model? Very simply put, it's the ability to hold macro instructions in view across an increasing number of turns, and to use primary tools (read, write, edit, shell) consistently without getting lost. Added bonus is the ability to learn from mistakes further up the chain! *edited for brevity and data concerns haha

209,968 views • 1 year ago

This is genuinely blowing my mind - four years of everything we've done at Greywing, finished in 60 seconds. The rest is just me fooling around. Before you ask, it's not the Assistants API - that's why we have interactive charts, abort, <200ms latency.

453,035 views • 2 years ago

Opus 4.5 (new model from Anthropic) can do something no model could do until now: It can close the design loop. This is (in my testing) the first model that can reliably do the visual loop of Generate Code ⇛ Render ⇛ Look at it ⇛ Improve. Step 3 (Visual critique) has been a real problem for so long. Models will be myopic or defensive when reviewing designs - it's been hard to tell if this has been a training or a vision problem. Opus 4.5 seems to have fixed it - it's not amazing, but it finally *can* do it. Claude models have historically (once again, in my testing) lagged behind in terms of visual understanding. Gemini for the last year has been the best - but there might be a new leader.

57,822 views • 8 months ago

This is scary - ETL pipelines and ORMs are likely going away - or at least I shouldn't be getting paid for doing them anymore. This is AI generating thousands of lines of typespecs and DDLs (with no more context than the dataset), and somehow it's all 100% correct. Rant?👇

165,597 views • 2 years ago

WalkingRAG is finally out! After so many versions and prototypes the UI is done, and I couldn't help but record a quick demo. Glad to have this out as we close the week, pretty excited to get this in the hands of customers 🚀

40,508 views • 2 years ago
