
Surya
@sdand • 15,603 subscribers
prev #1 on polymarket by pnl, made the #1 chatgpt plugin

Can computer-use models play games one-shot now? I gave Claude Opus 4.5 a simple prompt like "play league of legends" and it started clicking and typing around my computer pretty effectively, even though it doesn't win because of latency. More interestingly, across Minecraft, finding car insurance, and booking flights, I noticed some emergent behavior: it persistently maximizes EV, even if that involves taking shortcuts.
Surya • 410,812 views • 6 months ago

claude playing TFT -- i ran it for two games and it improved in-context from 0 to 3 of 30 rounds won. it figured out "3-starring" on its own (buying pairs to upgrade units), which is a core mechanic of this game, even though it hasn't been told how to play beyond "play tft and win". compare this cua agent to OpenAI Five's result from 2017: that was a different game and a different model (a big LSTM), it had access to Dota's API, and it output a discrete action at 4.6hz. this cua agent runs at 0.15hz, since it takes anywhere from 2-15 seconds to think (0.07hz without), it is just pixels in and any mouse/keyboard out, and it (potentially) wasn't trained on TFT specifically, though i'm not sure. despite this, 2-15s seems to be fine for a turn-based game like TFT. games from prior videos like league/minecraft are considerably harder for a cua agent: by the time you take a screenshot the game state may have changed, you've wasted 7 seconds thinking, then you spend another 7 seconds thinking about the new game state, and it fails catastrophically like that. the intention here is not to play TFT or league with a bot. please don't use it as such.
Surya • 221,997 views • 6 months ago
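
To put the action rates in the TFT description side by side, the conversion is just period = 1 / frequency. A quick check using only the numbers quoted there (the constant names are illustrative):

```python
def period_s(rate_hz: float) -> float:
    """Seconds between consecutive actions for a given action rate in Hz."""
    return 1.0 / rate_hz

# Rates quoted in the description above.
OPENAI_FIVE_HZ = 4.6
CUA_HZ = 0.15
CUA_HZ_ALT = 0.07  # the "(0.07hz without)" figure

for name, hz in [("OpenAI Five", OPENAI_FIVE_HZ),
                 ("cua agent", CUA_HZ),
                 ("cua agent (0.07hz)", CUA_HZ_ALT)]:
    print(f"{name}: {hz} Hz -> one action every {period_s(hz):.2f} s")

# How many times more often OpenAI Five acts than the cua agent.
print(f"ratio: {OPENAI_FIVE_HZ / CUA_HZ:.1f}x")
```

This prints roughly 0.22 s, 6.67 s and 14.29 s between actions, and a ratio of about 30.7x, which is why a screenshot can be stale by the time a real-time game sees the agent's click.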

I made an RL policy that guesses where a picture was taken without GPS data. It continuously learns, updating its weights in real time with every use; over the weekend it improved 13.9% with <100 images. Best of all, it does this without ever storing any image data. Link below
Surya Dantuluri • 166,853 views • 7 months ago
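
The geolocation description doesn't say which RL algorithm is used. The toy below is a minimal sketch, assuming a REINFORCE-style policy gradient with a running baseline over a hypothetical fixed set of candidate locations, driven by a hypothetical precomputed image feature vector. It shows how a policy can take one weight update per guess and then discard the input. `OnlineGeoPolicy`, the cell list, and the 2000 km reward scale are all made up for illustration; `haversine_km` is the standard great-circle distance formula.

```python
import math
import random

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

class OnlineGeoPolicy:
    """Hypothetical linear softmax policy over candidate location cells."""

    def __init__(self, cells, dim, lr=0.1, seed=0):
        self.cells = cells                      # list of (lat, lon) cell centers
        self.dim = dim                          # length of the feature vector
        self.lr = lr
        self.w = [[0.0] * dim for _ in cells]   # one weight row per cell
        self.baseline = 0.0                     # running mean reward
        self.rng = random.Random(seed)

    def probs(self, x):
        logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        m = max(logits)                         # subtract max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, x):
        p = self.probs(x)
        k = self.rng.choices(range(len(p)), weights=p)[0]  # sample a guess
        return k, p

    def update(self, x, true_latlon):
        """Guess, score the guess, take one REINFORCE step; x is never stored."""
        k, p = self.act(x)
        lat, lon = self.cells[k]
        dist = haversine_km(lat, lon, *true_latlon)
        reward = math.exp(-dist / 2000.0)       # assumed shaping: 1 at 0 km, decays with distance
        adv = reward - self.baseline
        self.baseline += 0.1 * (reward - self.baseline)
        # Gradient of log softmax w.r.t. logit j is onehot(k)_j - p_j.
        for j, row in enumerate(self.w):
            g = (1.0 if j == k else 0.0) - p[j]
            for i in range(self.dim):
                row[i] += self.lr * adv * g * x[i]
        return dist
```

In use, each incoming picture would be turned into a feature vector, passed to `update` once its true location is known, and then dropped, so only the weights persist.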

inspired by SDPO, i made continualcode -- a minimal claude code clone that learns from your corrections in real time, built on tinker. when you deny a diff, the model uses your correction as context to teach itself, takes a gradient step on its LoRA adapter, and retries with the updated weights. claude code, but it updates the model weights!
Surya • 82,476 views • 3 months ago
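
The deny, gradient step, retry loop in the continualcode description can be illustrated on a single linear layer. This is a toy sketch, not the tinker API and not the continualcode code: a squared-error loss on a vector "correction" stands in for the language-model loss on the user's correction, and `LoRALinear` and `on_deny` are hypothetical names. It does follow the usual LoRA setup: the base weight `W` stays frozen, only the low-rank factors `A` and `B` are trained, and `B` starts at zero so the adapter is initially a no-op.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LoRALinear:
    """y = W x + scale * B (A x); W is frozen, only A and B are updated."""

    def __init__(self, W: Matrix, A: Matrix, scale: float = 1.0):
        self.W = W                                   # frozen base weight, d_out x d_in
        self.A = A                                   # trainable, r x d_in
        self.B = [[0.0] * len(A) for _ in W]         # trainable, d_out x r, zero-init
        self.scale = scale

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def loss(self, x: Vector, target: Vector) -> float:
        y = self.forward(x)
        return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

    def step(self, x: Vector, target: Vector, lr: float) -> None:
        """One gradient-descent step on A and B for the squared-error loss."""
        h = matvec(self.A, x)
        e = [yi - ti for yi, ti in zip(self.forward(x), target)]  # dL/dy
        grad_B = [[self.scale * ei * hk for hk in h] for ei in e]
        # dL/dA uses the current (pre-update) B.
        bt_e = [sum(self.B[i][k] * e[i] for i in range(len(e))) for k in range(len(h))]
        grad_A = [[self.scale * g * xj for xj in x] for g in bt_e]
        for i in range(len(self.B)):
            for k in range(len(h)):
                self.B[i][k] -= lr * grad_B[i][k]
        for k in range(len(self.A)):
            for j in range(len(x)):
                self.A[k][j] -= lr * grad_A[k][j]

def on_deny(layer: LoRALinear, x: Vector, correction: Vector,
            lr: float = 0.1, steps: int = 1) -> Vector:
    """Hypothetical handler: the user rejected the output, so fit the correction and retry."""
    for _ in range(steps):
        layer.step(x, correction, lr)
    return layer.forward(x)  # the retry, using the updated adapter weights
```

Only the small `A` and `B` matrices change, which is what makes taking a step on every denied diff cheap compared with updating the full weight matrix.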

Introducing vmux: incredibly fast, stateful cloud sandboxes for coding agents. For the first time you get persistent GPU/CPU sandboxes via Modal/Cloudflare, backed by Durable Objects, with live log streaming, native preview URLs, and a real shell you can attach to. Spin up a notebook or train nanogpt via codex, with a Modal sandbox spun up in seconds.
Surya • 75,173 views • 3 months ago
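
Durable Objects are Cloudflare's single-instance stateful objects, so one reading of "backed by Durable Objects to stream logs live" is one object per sandbox that owns an ordered log buffer. The description doesn't cover vmux internals. The sketch below is a hypothetical, in-memory, single-threaded stand-in: `LogChannel` and its methods are made up for illustration and use neither the Durable Objects nor the Modal API. It shows the replay-then-subscribe pattern that lets a viewer attach late and still see every earlier line in order.

```python
from dataclasses import dataclass, field
from typing import Callable, List

OnLine = Callable[[int, str], None]  # callback receiving (sequence number, line)

@dataclass
class LogChannel:
    """Hypothetical per-sandbox log stream: one ordered, append-only buffer."""
    lines: List[str] = field(default_factory=list)
    subscribers: List[OnLine] = field(default_factory=list)

    def append(self, line: str) -> int:
        """Store a line, push it to every live subscriber, return its sequence number."""
        seq = len(self.lines)
        self.lines.append(line)
        for cb in list(self.subscribers):
            cb(seq, line)
        return seq

    def attach(self, on_line: OnLine, from_seq: int = 0) -> None:
        """Replay the backlog from from_seq, then receive new lines live."""
        for seq in range(from_seq, len(self.lines)):
            on_line(seq, self.lines[seq])
        self.subscribers.append(on_line)
```

Because everything here runs on one thread, replaying the backlog and registering the subscriber cannot interleave with an `append`; a real multi-client service would have to preserve that ordering explicitly.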

deploy to preview in under 3 seconds cold with vmux! on saturday i told a friend i didn't have any modal credits left; by monday night i had rolled my own cpu modal. it seems clear that if llms can code, you should give them easy access (besides ssh) to provision, run code, and deploy containers fast.
Surya Dantuluri • 12,275 views • 5 months ago