Microsoft researchers have released bitnet.cpp, the official inference framework for 1-bit LLMs such as BitNet b1.58. It provides optimized kernels for fast, lossless inference on CPUs, achieving speedups of 1.37x to 5.07x on ARM CPUs and 2.37x to 6.17x on x86 CPUs, along with energy reductions of 55.4% to 82.2%.

75,133 views • 1 year ago • via X (Twitter)
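BitNet b1.58 restricts every weight to {-1, 0, +1} (log2 3 ≈ 1.58 bits), which is what lets CPU kernels replace most multiplications with additions and subtractions. The following is a minimal plain-Python sketch of that idea. It assumes the absmean weight quantization described in the BitNet b1.58 paper and omits activation quantization and bit packing. The function names are hypothetical, and this is not bitnet.cpp's kernel code.

```python
# Illustrative sketch (not bitnet.cpp code): BitNet b1.58-style absmean weight
# ternarization followed by a multiplication-free matrix-vector product.

def absmean_ternarize(W, eps=1e-5):
    """Scale by the mean absolute weight, then round and clip each entry to {-1, 0, +1}."""
    n = sum(len(row) for row in W)
    gamma = sum(abs(x) for row in W for x in row) / n
    Q = [[max(-1, min(1, round(x / (gamma + eps)))) for x in row] for row in W]
    return Q, gamma

def ternary_matvec(Q, gamma, x):
    """Add activations where the weight is +1, subtract where it is -1, skip zeros."""
    out = []
    for row in Q:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(gamma * acc)  # a single rescale per output restores the magnitude
    return out

W = [[0.9, -0.05, -1.2], [0.1, 0.6, -0.4]]
Q, gamma = absmean_ternarize(W)
print(Q)  # [[1, 0, -1], [0, 1, -1]]
y = ternary_matvec(Q, gamma, [1.0, 2.0, 3.0])
```

The inner loop contains no multiplications. The only multiply is the one rescale by `gamma` per output element.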

9 Comments

BensenHsu • 1 year ago

This study introduces bitnet.cpp, a software stack designed for fast and efficient inference of 1-bit large language models (LLMs), such as BitNet b1.58, on CPUs. The researchers aim to unlock the potential of 1-bit LLMs by developing optimized kernels that deliver significant speedups and lower energy consumption than existing solutions. The results show that bitnet.cpp outperforms the existing llama.cpp framework in both inference speed and energy consumption:
• On the Apple M2 Ultra, bitnet.cpp achieves speedups from 1.37x to 5.07x, with larger models seeing greater gains.
• On the Intel i7-13700H, bitnet.cpp achieves speedups from 2.37x to 6.17x, again with the largest improvements for larger models.
• bitnet.cpp reduces energy consumption by 55.4% to 70.0% on the Apple M2 Ultra and by 71.9% to 82.2% on the Intel i7-13700H, depending on model size.
Full paper:

Emily • 1 year ago

I hope we see real-world results with Llama 3.2 and other LLMs.

Sandro Hanea • 1 year ago

Cool work! I played a bit with it, and it does degrade quality somewhat, but there are definitely use cases for it. Also worth mentioning: this builds on top of @ggerganov's llama.cpp, and most of the inference still uses ggml.

Paul Calcraft • 1 year ago

Will you release your own 1.58-bit BitNet models?

Srini Gundelli • 1 year ago

🫶🏼🤯

ΜΛΛNΙ • 1 year ago

So fast, but sadly there's still some degradation in quality... Still, it's so much better than, say, a 4-bit model of the same size, which is a huge leap. If there's a way to mitigate the degradation, you're golden.

تطوير الالعاب - Ludology • 1 year ago

Can this be ported to the SNES? jk

🙂🙏 Özv. Dízelné Hadházy Aranka, 1.8T • 1 year ago

Ecosystem services include water, air, soil, energy, and biodiversity. Excellent essay!

Desmond • 1 year ago

In the implementation of bitnet.cpp’s TL2 Kernel, which compresses every three weights into a 5-bit index with a 1-bit sign, how does the LUT method handle potential collisions or overlapping index values during the computation phase, especially in scenarios involving high-dimensional matrices?
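One way to reason about the collision question is to check whether the packing is injective. The sketch below assumes a TL2-style layout. Each group of three ternary weights (27 possible patterns) is folded by sign symmetry onto 14 canonical patterns and stored as a 4-bit index plus a 1-bit sign (5 bits per three weights). The exact bit ordering in bitnet.cpp may differ, and all function names here are hypothetical. The activations for each triple are pre-reduced into a 14-entry lookup table, so each weight group costs one lookup and an optional negation.

```python
from itertools import product

# Hypothetical TL2-style packing: three ternary weights -> (sign bit, 4-bit index).
# The ordering below is an illustrative assumption, not bitnet.cpp's exact layout.

def encode_triple(w):
    """Map a triple in {-1, 0, 1}^3 to (sign, idx) with idx in 0..13."""
    assert len(w) == 3 and all(x in (-1, 0, 1) for x in w)
    v = (w[0] + 1) * 9 + (w[1] + 1) * 3 + (w[2] + 1)  # base-3 value, 0..26
    if v <= 13:
        return 0, v       # canonical half; 13 is the all-zero triple
    return 1, 26 - v      # 26 - v flips every base-3 digit, i.e. negates the triple

def decode_idx(idx):
    """Recover the canonical triple for idx in 0..13."""
    d0, rem = divmod(idx, 9)
    d1, d2 = divmod(rem, 3)
    return (d0 - 1, d1 - 1, d2 - 1)

def build_lut(a):
    """Precompute dot(pattern, a) for all 14 canonical patterns of one activation triple."""
    return [sum(p * x for p, x in zip(decode_idx(i), a)) for i in range(14)]

def lut_matvec(W, acts):
    """Ternary matrix-vector product using one table lookup per weight triple.

    Each LUT depends only on the activations, so it is built once and shared by
    every row. Weights are encoded on the fly here; a real kernel would pack them offline.
    """
    assert len(acts) % 3 == 0
    luts = [build_lut(acts[k:k + 3]) for k in range(0, len(acts), 3)]
    out = []
    for row in W:
        total = 0
        for j in range(len(luts)):
            sign, idx = encode_triple(row[3 * j:3 * j + 3])
            total += -luts[j][idx] if sign else luts[j][idx]
        out.append(total)
    return out

# Every one of the 27 triples gets a distinct 5-bit code, so lookups cannot collide.
codes = {(s << 4) | i for s, i in (encode_triple(w) for w in product((-1, 0, 1), repeat=3))}
print(len(codes))  # 27

W = [[1, 0, -1, 1, 1, 1], [-1, -1, 0, 0, 0, 0]]
print(lut_matvec(W, [2, 3, 5, 7, 11, 13]))  # [28, -5]
```

Under this assumed scheme, matrix size does not create collisions. Each index addresses only its own triple's table, so a larger matrix simply means more tables (one per activation triple) and more lookups per row.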
