Microsoft researchers release bitnet.cpp, the official inference framework for 1-bit LLMs like BitNet b1.58. It has optimized kernels for fast, lossless inference on CPUs, achieving impressive speedups on ARM and x86 CPUs and significant energy reductions.

75,133 views • 1 year ago • via X (Twitter)

9 Comments

BensenHsu • 1 year ago

This study introduces bitnet.cpp, a software stack designed to enable fast and efficient inference of 1-bit large language models (LLMs), such as BitNet b1.58, on CPUs. The researchers aim to unlock the potential of 1-bit LLMs by developing optimized kernels that achieve significant speedups and reduce energy consumption compared to existing solutions. The results show that bitnet.cpp significantly outperforms the existing llama.cpp framework in both inference speed and energy consumption:
• On the Apple M2 Ultra, bitnet.cpp achieves speedups ranging from 1.37x to 5.07x, with larger models experiencing greater performance gains.
• On the Intel i7-13700H, bitnet.cpp achieves speedups ranging from 2.37x to 6.17x, with significant improvements for larger models.
• bitnet.cpp reduces energy consumption by 55.4% to 70.0% on the Apple M2 Ultra and 71.9% to 82.2% on the Intel i7-13700H, depending on the model size.
full paper:

Emily • 1 year ago

I hope we see real-world results with models like Llama 3.2 and other LLMs.

Sandro Hanea • 1 year ago

Cool work! Played with it a bit and it does degrade quality somewhat, but there are definitely use cases for it. Also worth mentioning that this builds on top of @ggerganov's llama.cpp, and most of the inference still uses ggml.

Paul Calcraft • 1 year ago

Will you release your own 1.58 bitnet models?

Srini Gundelli • 1 year ago

🫶🏼🤯

ΜΛΛNΙ • 1 year ago

so fast, but sadly there's still some degradation of quality... but it's so much better than, say, a 4-bit model of the same size, which is a huge leap. if there's a way to mitigate the degradation, you're golden.

تطوير الالعاب - Ludology • 1 year ago

Can this be ported to the SNES? jk

🙂🙏 Özv. Dízelné Hadházy Aranka, 1.8T • 1 year ago

Ecosystem services include water, air, soil, energy, and biodiversity. Excellent essay!

Desmond • 1 year ago

In the implementation of bitnet.cpp’s TL2 Kernel, which compresses every three weights into a 5-bit index with a 1-bit sign, how does the LUT method handle potential collisions or overlapping index values during the computation phase, especially in scenarios involving high-dimensional matrices?
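
One way to see why a sign-plus-index code need not collide: three ternary weights have 3^3 = 27 combinations, and negating all three maps each nonzero combination onto its mirror image. That leaves 14 distinct "magnitude" patterns (13 mirrored pairs plus all-zero), which fit in 4 bits, so 4 index bits plus 1 sign bit give 5 bits per triple. The sketch below is a toy illustration of that counting argument, not bitnet.cpp's actual bit layout or kernel; the function names (`encode_triple`, `decode_triple`, `build_lut`, `lut_dot`) and the balanced-ternary ordering are assumptions made for the example.

```python
from itertools import product

def encode_triple(w):
    """Map three weights in {-1, 0, 1} to (sign_bit, index). Hypothetical layout."""
    v = 9 * w[0] + 3 * w[1] + w[2]    # balanced ternary value, unique in -13..13
    return (1 if v < 0 else 0), abs(v)  # index in 0..13 fits in 4 bits

def decode_triple(sign, index):
    """Inverse of encode_triple: recover the three ternary weights."""
    v = -index if sign else index
    digits = []
    for _ in range(3):
        r = (v + 1) % 3 - 1           # balanced ternary digit in {-1, 0, 1}
        digits.append(r)
        v = (v - r) // 3
    return tuple(reversed(digits))

def build_lut(acts):
    """Partial sums for the 14 non-negative patterns of one activation triple."""
    return [sum(w * a for w, a in zip(decode_triple(0, i), acts))
            for i in range(14)]

def lut_dot(weights, acts):
    """Dot product via table lookups; len(weights) must be a multiple of 3."""
    total = 0
    for g in range(0, len(weights), 3):
        # A real kernel would build each table once per activation group and
        # reuse it across many weight rows; it is rebuilt here for clarity.
        lut = build_lut(acts[g:g + 3])
        sign, idx = encode_triple(weights[g:g + 3])
        total += -lut[idx] if sign else lut[idx]
    return total

# Every one of the 27 weight triples gets its own (sign, index) code.
codes = {encode_triple(t) for t in product((-1, 0, 1), repeat=3)}
```

Because the mapping from triples to (sign, index) codes is one-to-one, each lookup returns exactly the partial sum for that triple regardless of how large the matrix is; the matrix dimensions only change how many lookups are accumulated.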
