will depue

@willdepue • 65,929 subscribers

dei ex machina ex-@openai (sora 1 & 2, posttraining o3/4o, pretraining moonshots)

Shorts

⚡️Introducing WebGPT⚡️ Just this month, Chrome announced WebGPU's release. What does this mean? Near-native GPU speeds, from the web! I took the opportunity to build WebGPT: a package to run GPT models entirely on the browser. Here's why this is a big deal:

977,711 views

sora is launching today to all chatgpt pro and plus users! it's been a big effort to make this possible + i think the product is really fun & intuitive. my fav thing to do is generate fake historical found footage. video inpainting is also really strong. have fun!

134,004 views

moving into a new apartment. god bless america.

50,349 views

turned 22 and saw the incredible mr bob dylan and willie nelson tonight. i now plan to ride out the singularity in minnesota, spending my days learning the harmonica.

18,246 views

Videos

you tend to hear this a lot from people outside or new to ML, and I often point to a talk Ilya gave a few years back:
1) think of any decent deep neural net that has enough memory and sequential ops as just a big parallel computer.
2) training this neural net is doing search over computer programs that maximize your objective.
3) unless you have some large bottleneck (and given you can successfully optimize this system) you’ll find that these parallel computers are highly robust to architectural changes.
4) this is because computers are great at simulating each other. your new architecture can usually be straightforwardly simulated ‘inside’ your old architecture.
5) it’s not that architecture doesn’t matter, but it mostly matters with respect to (a) fundamental bottlenecks in this parallel computer, (b) modifications that make models easier to optimize, since this argument only holds if your optimization is good, and (c) compute efficiency/system efficiency wins that make learning easier or faster.
6) it’s quite possible that new architectures will lead to breakthroughs in machine learning, but we should first start with bottlenecks, not naturalist intuitions about the ‘form’ AI should take.
until you understand this it seems surprising that small models trained longer are better than undertrained big models, that depth and width are surprisingly interchangeable, and that talking to a model with an MoE or sparse attention or linear attention is approximately the same iso evals.

will depue

214,428 views • 5 months ago
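
A toy illustration of point 4 above, offered as a sketch and not as anything from the talk or the post: a one-hidden-layer network with an absolute-value activation (standing in for a "new" architecture) can be reproduced exactly by a plain ReLU network of twice the width, because |z| = relu(z) + relu(-z). The function names, layer sizes, and random weights are all assumptions made up for this example.

```python
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def abs_net(W1, W2, x):
    """Hypothetical 'new' architecture: hidden layer with |.| activation."""
    hidden = [abs(z) for z in matvec(W1, x)]
    return matvec(W2, hidden)

def relu_net_simulating_abs(W1, W2, x):
    """'Old' architecture: ReLU hidden layer, twice as wide.

    Relies on |z| = relu(z) + relu(-z): the first half of the hidden
    units computes relu(W1 x), the second half computes relu(-W1 x),
    and the output layer adds the two halves with the same weights.
    """
    W1_wide = W1 + [[-w for w in row] for row in W1]  # stack W1 on top of -W1
    W2_wide = [row + row for row in W2]               # reuse W2 for both halves
    hidden = relu(matvec(W1_wide, x))
    return matvec(W2_wide, hidden)

if __name__ == "__main__":
    random.seed(0)
    # Arbitrary sizes: 3 inputs, 4 hidden units, 2 outputs.
    W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
    x = [random.uniform(-1, 1) for _ in range(3)]
    print(abs_net(W1, W2, x))
    print(relu_net_simulating_abs(W1, W2, x))
```

The two printed vectors should agree up to floating-point rounding. The simulation costs a 2x increase in hidden width, which is the kind of compute-efficiency difference point 5 says is where architecture choices tend to matter.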
