`transformers` + `torchao` quantization + `torch.compile` for faster inference and lower memory usage 🔥 Demo of "meta-llama/Meta-Llama-3.1-8B-Instruct" quantized to 4-bit weight-only:
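For readers who cannot view the video, here is a minimal sketch of how the three pieces might fit together. It is not the code from the demo. It assumes a CUDA GPU, installed `torch`, `torchao`, and `transformers`, and access to the gated model on the Hub. The group size, the prompt, the compile mode, the static KV cache, and the helper name `run_demo` are all illustrative assumptions, and the string form `"int4_weight_only"` may need to be replaced by a torchao config object in newer releases.

```python
MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # model named in the post (gated on the Hub)
GROUP_SIZE = 128  # assumption: the post does not state a group size
PROMPT = "What are we having for dinner?"  # hypothetical prompt


def run_demo(model_id=MODEL_ID, prompt=PROMPT, max_new_tokens=32):
    """Load the model with 4-bit weight-only quantization, compile it, and generate."""
    # Heavy imports are deferred so this file can be imported without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

    # 4-bit weight-only quantization through torchao; activations stay in bfloat16.
    quant_config = TorchAoConfig("int4_weight_only", group_size=GROUP_SIZE)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        quantization_config=quant_config,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Assumption: a static KV cache keeps tensor shapes fixed, which tends to
    # reduce recompilation during decoding.
    model.generation_config.cache_implementation = "static"
    # Compile the forward pass; the first calls are slow while kernels are tuned.
    model.forward = torch.compile(model.forward, mode="max-autotune")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(run_demo())
```

Expect the first generation to take noticeably longer than later ones, since compilation happens on first use; any speed or memory comparison should be measured after a warm-up run.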
24,515 views • 1 year ago •via X (Twitter)

