`transformers` + `torchao` quantization + `torch.compile` for faster inference and lower memory usage 🔥 Demo: "meta-llama/Meta-Llama-3.1-8B-Instruct" with 4-bit weight-only quantization.
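Since the demo video is not available here, the following is a rough sketch of what such a setup might look like, not the code from the original post. It assumes a CUDA GPU, access to the gated Llama checkpoint, and a `transformers` release whose `TorchAoConfig` accepts the string `"int4_weight_only"` (newer releases may expect a `torchao` config object instead). The function names, `group_size=128`, the static KV cache, the compile mode, and the nominal 8e9 parameter count are all illustrative assumptions.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Rough weight storage in GiB; ignores quantization scales and zero-points."""
    return num_params * bits_per_weight / 8 / 2**30


def load_int4_compiled(model_id: str = "meta-llama/Meta-Llama-3.1-8B-Instruct"):
    """Hypothetical helper: load the model with 4-bit weight-only quantization."""
    # Heavy imports stay local so the estimate above works without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

    # Assumption: string-based quantization type; group_size=128 is illustrative.
    quant_config = TorchAoConfig("int4_weight_only", group_size=128)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # int4 weight-only kernels expect bf16 activations
        device_map="cuda",
        quantization_config=quant_config,
    )

    # A static KV cache keeps tensor shapes fixed, which suits torch.compile.
    model.generation_config.cache_implementation = "static"
    model.forward = torch.compile(model.forward, mode="max-autotune")
    return tokenizer, model


def generate_demo(prompt: str, max_new_tokens: int = 32) -> str:
    """Hypothetical helper: run one generation with the quantized, compiled model."""
    tokenizer, model = load_int4_compiled()
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Back-of-the-envelope weight footprint for a nominal 8e9-parameter model.
bf16_gib = weight_memory_gib(8e9, 16)
int4_gib = weight_memory_gib(8e9, 4)
print(f"bf16 weights: ~{bf16_gib:.1f} GiB, int4 weights: ~{int4_gib:.1f} GiB")
```

Calling `generate_demo("What are we having for dinner?")` would download the checkpoint and trigger compilation on the first forward pass, so the first call is slow and any speedup only shows on later ones. The memory estimate covers weights only; activations, the KV cache, and quantization metadata add to it.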

24,515 views • 1 year ago • via X (Twitter)