Flash attention explained by Andrej Karpathy

168,185 views • 1 year ago • via X (Twitter)

10 comments

water • 1 year ago

@karpathy What lecture is this from?

ℏεsam • 1 year ago

@karpathy

Mathias Mendoza • 1 year ago

@karpathy Andrej Karpathy’s tutorials and classes are gold 🤌🏻

Zack Angelo • 1 year ago

@karpathy He openly asks why PyTorch can't figure out how to call FlashAttention (FA) automatically. I think it's because FA has limitations that make it unusable in some cases; e.g., you can't pass it an arbitrary attention mask. FlexAttention should fix some of these.
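
To make the masking point concrete, here is a small pure-Python sketch (an illustration only, not PyTorch's, FlashAttention's, or FlexAttention's actual code) of the tiled, online-softmax loop that FlashAttention-style kernels are built around. Masking is expressed through a hypothetical `score_mod(score, q_idx, kv_idx)` hook, loosely modeled on FlexAttention's `score_mod` idea (the real hook also receives batch and head indices). The names `tiled_attention`, `naive_attention`, `causal`, and `no_mask` are made up for this example, and the sketch assumes every query row has at least one unmasked key.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical hook, loosely modeled on FlexAttention's score_mod idea:
# (raw score, query index, key index) -> modified score.
ScoreMod = Callable[[float, int, int], float]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def no_mask(score: float, q_idx: int, kv_idx: int) -> float:
    return score


def causal(score: float, q_idx: int, kv_idx: int) -> float:
    # Block attention to future positions by sending their score to -inf.
    return score if kv_idx <= q_idx else -math.inf


def naive_attention(q: Matrix, k: Matrix, v: Matrix, score_mod: ScoreMod) -> Matrix:
    """Reference: materialize the full score row, then softmax and weight V."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        s = [score_mod(dot(qi, kj) * scale, i, j) for j, kj in enumerate(k)]
        m = max(s)
        w = [math.exp(x - m) for x in s]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z for c in range(len(v[0]))])
    return out


def tiled_attention(q: Matrix, k: Matrix, v: Matrix, score_mod: ScoreMod, block: int = 2) -> Matrix:
    """Flash-style sketch: visit K/V in blocks, keeping only a running max,
    a running softmax denominator, and an unnormalized output per query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        m, l = -math.inf, 0.0
        acc = [0.0] * len(v[0])
        for start in range(0, len(k), block):
            js = list(range(start, min(start + block, len(k))))
            # The mask is applied inside the loop, one tile at a time.
            s = [score_mod(dot(qi, k[j]) * scale, i, j) for j in js]
            m_new = max(m, max(s))
            if m_new == -math.inf:
                continue  # every score seen so far is masked; nothing to add yet
            rescale = math.exp(m - m_new)  # shrink earlier sums to the new max (0.0 if m was -inf)
            p = [math.exp(x - m_new) for x in s]
            l = l * rescale + sum(p)
            acc = [a * rescale + sum(pj * v[j][c] for pj, j in zip(p, js)) for c, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])
    return out
```

Because `causal` is computed from the indices inside the loop, it never has to be stored. An arbitrary mask can also be written as a hook, e.g. `lambda s, i, j: s if allowed[i][j] else float("-inf")` with `allowed` a hypothetical boolean table, but a fused GPU kernel would then have to read that table from memory; FlexAttention pairs its hook with a `BlockMask` so that fully masked tiles can be skipped.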

Albert Buchard 🇪🇺 • 1 year ago

@karpathy He is a master teacher

high_byte • 1 year ago

@karpathy I was wondering what that was about

ℏεsam • 1 year ago

@karpathy Let's Reproduce GPT-2

RecurseChat • 1 year ago

@karpathy Thanks for sharing this, Andrej's explanations are always gold.

Xirtam Esrevni • 1 year ago

@karpathy This is who I need to be, always investing the time to dissect the ideas in papers & build from the ground up. My suspicion is that Karpathy is given the leeway to do what he wants, since he probably doesn't need the money & people/companies just want him around.

schatt • 1 year ago

@karpathy Woah! It was on my to-do list for tomorrow.
