Horace He

@cHHillee • 49,716 subscribers

@thinkymachines. Formerly @PyTorch. "My learning style is Horace twitter threads" - @typedfemale

Shorts

Happy to OSS gpt-fast, a fast and hackable implementation of transformer inference in <1000 lines of native PyTorch with support for quantization, speculative decoding, TP, Nvidia/AMD support, and more! Code: Blog: (1/12)

476,827 views

Grok-1 support in gpt-fast, at faster(?) speeds than anyone else has reported so far: 75 tok/s for a 300B+ parameter model on an 8xA100 node. If I understand correctly, ColossalAI reported 15 seconds to generate 100 tokens. gpt-fast takes 4.2 seconds to generate *400* tokens.

54,137 views
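
The comparison in the Grok-1 post is easier to read once both timings are converted to tokens per second. The sketch below does only that arithmetic, using the two timings quoted in the post (15 seconds for 100 tokens, 4.2 seconds for 400 tokens). The helper name `tokens_per_second` is made up for illustration, and the calculation assumes both timings cover generation end to end. The 75 tok/s figure in the post is a separately reported number and is not derived here.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average generation throughput over one run (hypothetical helper)."""
    return tokens / seconds


# Timings as quoted in the post; nothing here is measured independently.
colossalai_tps = tokens_per_second(100, 15.0)  # 100 tokens in 15 s
gpt_fast_tps = tokens_per_second(400, 4.2)     # 400 tokens in 4.2 s

# Ratio of the two average throughputs.
speedup = gpt_fast_tps / colossalai_tps

print(f"ColossalAI (as quoted): {colossalai_tps:.1f} tok/s")
print(f"gpt-fast (as quoted):   {gpt_fast_tps:.1f} tok/s")
print(f"Ratio:                  {speedup:.1f}x")
```

On those quoted numbers the averages come to roughly 6.7 tok/s and 95.2 tok/s, a ratio of about 14x. The two runs may differ in prompt length, batch size, and hardware, so this is a rough comparison rather than a benchmark.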