Microsoft researchers have released bitnet.cpp, the official inference framework for 1-bit LLMs such as BitNet b1.58. Its optimized kernels enable fast, lossless inference on CPUs, with speedups of 1.37x to 5.07x on ARM and 2.37x to 6.17x on x86 CPUs, and energy savings of 55.4% to 82.2%.
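
BitNet b1.58 restricts each weight to one of three values (-1, 0, +1), so a matrix-vector product can in principle be computed with additions and subtractions only. The sketch below illustrates that idea and is not bitnet.cpp code: it assumes NumPy, follows the absmean quantization described for BitNet b1.58 (divide by the mean absolute weight, then round and clip to [-1, 1]), keeps activations in floating point for simplicity, and uses hypothetical function names `absmean_ternary` and `ternary_matvec`.

```python
import numpy as np


def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with one per-tensor scale.

    Absmean recipe: scale = mean(|W|), W_q = RoundClip(W / (scale + eps), -1, 1).
    """
    scale = np.abs(w).mean()
    w_q = np.clip(np.round(w / (scale + eps)), -1, 1).astype(np.int8)
    return w_q, scale


def ternary_matvec(w_q: np.ndarray, x: np.ndarray, scale) -> np.ndarray:
    """Compute (scale * W_q) @ x without multiplying by any weight."""
    out = np.empty(w_q.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_q):
        # +1 weights add the activation, -1 weights subtract it, 0 weights skip it.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    # A single multiplication per output restores the original weight scale.
    return scale * out
```

Only the final rescaling multiplies; everything inside the row loop is a masked sum, which is the property that makes ternary weights attractive for CPU kernels.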

75,133 views • 1 year ago • via X (Twitter)

9 comments

BensenHsu • 1 year ago

This study introduces bitnet.cpp, a software stack for fast, efficient inference of 1-bit large language models (LLMs), such as BitNet b1.58, on CPUs. The researchers aim to unlock the potential of 1-bit LLMs with optimized kernels that deliver significant speedups and lower energy consumption than existing solutions. The results show that bitnet.cpp clearly outperforms the existing llama.cpp framework in both inference speed and energy consumption:
• On the Apple M2 Ultra, bitnet.cpp achieves speedups of 1.37x to 5.07x, with larger models gaining more.
• On the Intel i7-13700H, bitnet.cpp achieves speedups of 2.37x to 6.17x, again with the largest improvements for larger models.
• Depending on model size, bitnet.cpp reduces energy consumption by 55.4% to 70.0% on the Apple M2 Ultra and by 71.9% to 82.2% on the Intel i7-13700H.
Full paper:
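
The figures above mix two kinds of measures: speedups are ratios, while energy savings are percentages. The short calculation below puts them on a common footing. It is plain arithmetic on the quoted ranges, treats each speedup as a throughput ratio (so time per token scales by 1/speedup), and the helper names `time_saved` and `energy_ratio` are hypothetical. For example, a 6.17x speedup means each token takes about 16% of the baseline time, i.e. roughly 84% less.

```python
def time_saved(speedup: float) -> float:
    """Fraction of time per token saved by a throughput speedup (2.0x -> 0.5)."""
    return 1.0 - 1.0 / speedup


def energy_ratio(reduction: float) -> float:
    """Baseline energy divided by new energy for a fractional cut (0.5 -> 2.0x)."""
    return 1.0 / (1.0 - reduction)


# Ranges reported for bitnet.cpp vs. llama.cpp.
speedups = {"Apple M2 Ultra": (1.37, 5.07), "Intel i7-13700H": (2.37, 6.17)}
energy_cuts = {"Apple M2 Ultra": (0.554, 0.700), "Intel i7-13700H": (0.719, 0.822)}

for cpu, (s_lo, s_hi) in speedups.items():
    e_lo, e_hi = energy_cuts[cpu]
    print(
        f"{cpu}: {time_saved(s_lo):.1%} to {time_saved(s_hi):.1%} less time per token, "
        f"{energy_ratio(e_lo):.2f}x to {energy_ratio(e_hi):.2f}x less energy"
    )
```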

Emily • 1 year ago

I hope we see real-world results with models like Llama 3.2 and other LLMs.

Sandro Hanea • 1 year ago

Cool work! I played with it a bit, and it does degrade quality somewhat, but there are definitely use cases for it. It's also worth mentioning that this builds on top of @ggerganov's llama.cpp, and most of the inference still runs on ggml.

Paul Calcraft • 1 year ago

Will you release your own 1.58-bit BitNet models?

Srini Gundelli • 1 year ago

🫶🏼🤯

ΜΛΛNΙ • 1 year ago

So fast, but sadly there's still some degradation in quality... but it's so much better than, say, a 4-bit model of the same size, which is a huge leap. If there's a way to mitigate the degradation, you're golden.
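
The "1.58" in BitNet b1.58 comes from log2(3) ≈ 1.58, the information content of a weight that can take three values. The arithmetic below compares weight storage for a hypothetical 3B-parameter model at several bit widths. It counts only the weights themselves (no scales, embeddings, or KV cache), and the packing formats listed are illustrative assumptions, not a description of bitnet.cpp's on-disk layout.

```python
import math


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Storage for the weights alone, ignoring scales, embeddings and metadata."""
    return num_params * bits_per_weight / 8


PARAMS = 3_000_000_000  # hypothetical 3B-parameter model, for illustration only

formats = {
    "FP16": 16.0,
    "4-bit": 4.0,
    "2-bit packed ternary": 2.0,
    "3 ternary weights in 5 bits": 5 / 3,
    "ternary bound (log2 3)": math.log2(3),
}

for name, bits in formats.items():
    gigabytes = weight_bytes(PARAMS, bits) / 1e9
    print(f"{name:28s} {bits:6.3f} bits/weight -> {gigabytes:5.2f} GB")
```

At the same parameter count, the ternary formats need well under half the weight memory of a 4-bit model, which is part of why the comparison in the comment above is favorable.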

تطوير الالعاب - Ludology • 1 year ago

Can this be ported to the SNES? jk

🙂🙏 Özv. Dízelné Hadházy Aranka, 1.8T • 1 year ago

Ecosystem services include water, air, soil, energy, and biodiversity. Excellent essay!

Desmond • 1 year ago

In bitnet.cpp's TL2 kernel, which compresses every three weights into a 5-bit code (a 1-bit sign plus a 4-bit index), how does the LUT method handle potential collisions or overlapping index values during computation, especially for high-dimensional matrices?
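
On the collision question: if the code comes from a one-to-one mapping, two different weight triples can never share a code, and the matrix dimensions only change how many lookup tables are built, not how a code is resolved. The sketch below illustrates that idea under stated assumptions and is not bitnet.cpp's kernel. It assumes the 5 bits split into a 1-bit sign and a 4-bit index, uses the fact that each of the 26 non-zero triples has a negated twin (so 14 canonical patterns suffice), and builds one small table per group of three activations. The real bit layout, index ordering, and SIMD lookup instructions are not reproduced, and all names are hypothetical.

```python
from itertools import product

# All 27 patterns of three ternary weights.
TRIPLES = list(product((-1, 0, 1), repeat=3))


def canonical(t):
    """Return (sign, pattern) where the pattern's first non-zero weight is +1."""
    for v in t:
        if v != 0:
            sign = 1 if v > 0 else -1
            return sign, tuple(sign * w for w in t)
    return 1, t  # (0, 0, 0) is its own negation


# 14 canonical patterns: (0, 0, 0) plus one representative per +/- pair.
PATTERNS = sorted({canonical(t)[1] for t in TRIPLES})
INDEX = {p: i for i, p in enumerate(PATTERNS)}
assert len(PATTERNS) == 14  # fits in a 4-bit index


def encode(t):
    """Pack three ternary weights into 5 bits: bit 4 = sign, bits 0-3 = index."""
    sign, pattern = canonical(t)
    return ((1 if sign < 0 else 0) << 4) | INDEX[pattern]


def build_lut(acts):
    """Precompute each canonical pattern's dot product with three activations."""
    return [sum(w * a for w, a in zip(p, acts)) for p in PATTERNS]


def lut_dot(code, lut):
    """Recover the dot product from a 5-bit code with a single table lookup."""
    value = lut[code & 0b1111]
    return -value if code >> 4 else value
```

In this sketch the table for one group of activations is built once and reused for every row of the weight matrix, so a larger matrix means more lookups against the same tables, not a different or ambiguous encoding.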