Flash attention explained by Andrej Karpathy
168,185 views • 1 year ago • via X (Twitter)
10 Comments

@karpathy What lecture is this from?

@karpathy

@karpathy Andrej Karpathy’s tutorials and classes are gold 🤌🏻

@karpathy He openly asks why PyTorch can't figure out how to call FlashAttention automatically. I think it's because FlashAttention has limitations that make it unusable in some cases, e.g., you can't pass it an arbitrary attention mask. FlexAttention should fix some of these.

@karpathy He is a master teacher

@karpathy I was wondering what that was about

@karpathy Let's Reproduce GPT-2

@karpathy Thanks for sharing this, Andrej's explanations are always gold.

@karpathy This is who I need to be, always investing the time to dissect the ideas in papers & build from the ground up. My suspicion is Karpathy is given the leeway to do what he wants since he probably doesn't need the money & people/companies just want him around.

@karpathy Whoa! It was on my to-do list for tomorrow
