from Jeff Dean on the Dwarkesh Patel podcast: "asynchronous training where each copy of the model does local computation [...] it makes people uncomfortable [...] but it actually works". yep, i can confirm, it does work for real
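
The idea in the quote, each model copy computing locally and only occasionally syncing, can be sketched roughly as below. This is a toy illustration under stated assumptions, not the DiLoCo implementation: the 1-D regression task, the names `local_sgd_round` and `train`, and every hyperparameter are made up for the example, and plain SGD is used for both the inner and the outer step. The replicas also run one after another and sync at fixed rounds, so the sketch shows infrequent communication rather than true asynchrony.

```python
import random

def local_sgd_round(global_w, shards, inner_steps, inner_lr):
    """Each replica starts from the shared weight and trains alone on its own shard."""
    replicas = []
    for shard in shards:
        w = global_w
        for step in range(inner_steps):
            x, y = shard[step % len(shard)]
            grad = 2.0 * (w * x - y) * x  # gradient of (w*x - y)^2 w.r.t. w
            w -= inner_lr * grad
        replicas.append(w)
    return replicas

def train(shards, rounds=20, inner_steps=50, inner_lr=0.01, outer_lr=1.0):
    """Replicas communicate once per round, i.e. once every `inner_steps` local steps."""
    global_w = 0.0
    for _ in range(rounds):
        replicas = local_sgd_round(global_w, shards, inner_steps, inner_lr)
        # Outer "gradient": average displacement of the replicas from the shared weight.
        outer_grad = global_w - sum(replicas) / len(replicas)
        # With outer_lr = 1.0 this is plain parameter averaging (an assumption made
        # here for simplicity; other outer optimizers could be swapped in).
        global_w -= outer_lr * outer_grad
    return global_w

# Hypothetical toy data: 4 shards of noiseless samples from y = 3x.
random.seed(0)
true_w = 3.0
shards = [
    [(x, true_w * x) for x in (random.uniform(-1, 1) for _ in range(32))]
    for _ in range(4)
]
learned_w = train(shards)
```

In this toy setup the replicas exchange one number every 50 local steps instead of one gradient per step, which is the property that lowers the bandwidth requirement.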

101,448 views • 1 year ago • via X (Twitter)

10 comments

Arthur Douillard · 1 year ago

@dwarkesh_sp some people actually came to me in SF and told me "but DiLoCo is actually working!", very surprised that it wasn't just another paper with misleading, outlandish claims. learn more:

Fast Company · 1 year ago

“In our cross-functional teams, everyone has an equal seat at the table because everyone will bring different perspectives and expertise.” In this @Atlassian-sponsored podcast, learn more about why the @Hot_Wheels #brand has had such staying power. #collaboration #ad

Kal · 1 year ago

@JeffDean @dwarkesh_sp Does this diminish the value of ultra-high-bandwidth, low-latency interconnects like InfiniBand, or are they still important?

Arthur Douillard · 1 year ago

@JeffDean @dwarkesh_sp methods like DiLoCo add a new axis. in our published experiments, but also in @PrimeIntellect's Intellect-1 and @flwrlabs's Photon, you have multiple levels of parallelism, in order of required bandwidth/latency: tensor parallelism > (fs)dp > diloco

Ben (e/treats) · 1 year ago

@JeffDean @dwarkesh_sp diloco is the nightmare of people who think we can just ban technology to prevent danger

O · 1 year ago

@JeffDean @dwarkesh_sp How far is this from hogwild? I'm a bit out of the loop on the latest

Gwen Cheni · 1 year ago

@JeffDean @dwarkesh_sp Omg I did this! And I thought I was so clever, but of course Jeff's team already did it a decade ago 😂

Blake Camp · 1 year ago

@JeffDean @dwarkesh_sp @Ar_Douillard, I'm curious about your thoughts on optimizing the longevity of modules or sub-networks, such that they may still be useful in many future/downstream models. Will tomorrow's large open source models be composed of recycled parts?

Brian Jordan · 1 year ago

@JeffDean @dwarkesh_sp wild

Siddharth · 1 year ago

@JeffDean @dwarkesh_sp If Jeff says it works, it just does. Even physics can't change that
