Microsoft researchers release bitnet.cpp, the official inference framework for 1-bit LLMs like BitNet b1.58. It has optimized kernels for fast, lossless inference on CPUs, achieving impressive speedups on ARM and x86 CPUs and significant energy reductions.

75,133 views • 1 year ago • via X (Twitter)

9 comments

BensenHsu • 1 year ago

This study introduces bitnet.cpp, a software stack designed to enable fast and efficient inference of 1-bit large language models (LLMs), such as BitNet b1.58, on CPUs. The researchers aim to unlock the potential of 1-bit LLMs by developing optimized kernels that achieve significant speedups and reduce energy consumption compared to existing solutions. The results show that bitnet.cpp significantly outperforms the existing llama.cpp framework in both inference speed and energy consumption:

• On the Apple M2 Ultra, bitnet.cpp achieves speedups ranging from 1.37x to 5.07x, with larger models experiencing greater performance gains.
• On the Intel i7-13700H, bitnet.cpp achieves speedups ranging from 2.37x to 6.17x, with significant improvements for larger models.
• bitnet.cpp reduces energy consumption by 55.4% to 70.0% on the Apple M2 Ultra and 71.9% to 82.2% on the Intel i7-13700H, depending on the model size.

full paper:

Emily • 1 year ago

I hope we see real-world results, like Llama 3.2 and other LLMs.

Sandro Hanea • 1 year ago

Cool work! Played with it a bit, and it does indeed degrade quality a bit, but there are definitely use cases for it. Also worth mentioning that this builds on top of @ggerganov's llama.cpp, and most of the inference still uses ggml.

Paul Calcraft • 1 year ago

Will you release your own 1.58 bitnet models?

Srini Gundelli • 1 year ago

🫶🏼🤯

ΜΛΛNΙ • 1 year ago

So fast, but sadly there's still some degradation of quality... but it's so much better than, say, a 4-bit model of the same size, which is a huge leap. If there's a way to mitigate the degradation, you're golden.

تطوير الالعاب - Ludology • 1 year ago

Can this be ported to SNES? jk

🙂🙏 Özv. Dízelné Hadházy Aranka, 1.8T • 1 year ago

Ecosystem services include water, air, soil, energy, and biodiversity. Excellent essay!

Desmond • 1 year ago

In the implementation of bitnet.cpp’s TL2 Kernel, which compresses every three weights into a 5-bit index with a 1-bit sign, how does the LUT method handle potential collisions or overlapping index values during the computation phase, especially in scenarios involving high-dimensional matrices?
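
For readers following the question above, here is a minimal sketch of how a sign-plus-index lookup can stay collision-free. It is an illustration under stated assumptions, not bitnet.cpp's actual kernel: it assumes the 5 bits are split into a 1-bit sign and a 4-bit index, and that a triple of ternary weights is stored as its "mirrored" form whenever its first nonzero weight is negative. The names `CANONICAL`, `encode`, `build_lut`, and `lut_dot` are hypothetical.

```python
from itertools import product

# Canonical triples: the all-zero triple plus every triple whose first
# nonzero weight is +1. There are 14 of them, so an index fits in 4 bits.
CANONICAL = [t for t in product((-1, 0, 1), repeat=3)
             if next((w for w in t if w != 0), 1) > 0]
INDEX_OF = {t: i for i, t in enumerate(CANONICAL)}


def encode(triple):
    """Map three ternary weights to (sign_bit, index) under the assumed scheme."""
    first = next((w for w in triple if w != 0), 1)
    if first < 0:
        # Store the negated (mirrored) triple and remember the flip in the sign bit.
        return 1, INDEX_OF[tuple(-w for w in triple)]
    return 0, INDEX_OF[tuple(triple)]


def build_lut(activations):
    """Precompute the dot product of three activations with each canonical triple."""
    return [sum(w * x for w, x in zip(t, activations)) for t in CANONICAL]


def lut_dot(code, lut):
    """Recover the dot product from a (sign_bit, index) code via table lookup."""
    sign, idx = code
    return -lut[idx] if sign else lut[idx]


if __name__ == "__main__":
    all_triples = list(product((-1, 0, 1), repeat=3))
    codes = {encode(t) for t in all_triples}
    print(len(CANONICAL), len(all_triples), len(codes))
```

Under these assumptions, each of the 27 possible weight triples maps to its own (sign, index) pair, so two different triples never share a table entry. The table itself is rebuilt per group of three activations, which is why matrix dimensionality does not introduce overlaps in this sketch.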
