The first natively trained 1-bit model: BitNet 2B. Trained on 4 trillion tokens, it can run on CPUs like the Apple M2. Native 1.58-bit weights and 8-bit activations (W1.58A8). Outperforms LLaMA and comes close to Qwen 2.5 1.5B while using only 0.4GB memory versus 2GB, and processes tokens 40%

Md Ismail Šojal 🕷️

43,855 subscribers

43,617 views • 3 months ago • via X (Twitter)

Related videos

DeepSeek R1 Qwen 7B 4bit M2 Ultra vs M4 Max on Apple MLX 🤫 Let them think... (video 4x in center part) M2 Ultra: 114.9 tokens per sec M4 Max (14"): 88.3 tokens per sec

Ivan Fioravanti ᯅ

59,734 views • 1 year ago

🔥Apple MLX first 6bit model is on Hugging Face!🔥 Qwen2.5-Coder-32B-Instruct-6bit! 3bit conversion and test in progress! Video 8x below on M4 Max 40GPU: - Prompt: 38 tokens, 61.731 tokens-per-sec - Generation: 1181 tokens, 16.939 tokens-per-sec - Peak memory: 25.122 GB

Ivan Fioravanti ᯅ

46,494 views • 1 year ago

DeepSeek R1 distilled to Qwen 1.5B easily runs on my iPhone 16 with MLX swift. Here's the 4-bit model reasoning entirely on device at almost 60 toks/sec:

Awni Hannun

1,139,656 views • 1 year ago

Latest mlx-lm has faster and lower memory prompt processing! Thanks to causal fused attention from Jagrit Digani 7B 4-bit Mistral v3 can do ~30,000 tokens under a minute on my M4 Max laptop and only needs 8.5GB:

Awni Hannun

22,156 views • 1 year ago

You are not prepared for this, 250+ tokens/sec, 1B model < 2GB memory

anton

372,371 views • 2 years ago

Qwen 3.6 models are now 2.5x faster on Atomic Chat with new MTP speedups. > MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on the memory moved per pass. Users can run Qwen 3.6 models locally via the open-source Atomic Chat to test them!

🚨 AI News | TestingCatalog

46,013 views • 26 days ago

GEMMA 4 FINISHED 14 MINUTES FASTER THAN QWEN 3.6 DESPITE LOWER TOKENS PER SECOND BY USING 5X FEWER TOKENS

0xMarioNawfal

71,957 views • 1 month ago

🧙🏼‍♂️! We are excited to announce the launch of $MIM and the first cauldrons on Berachain’s Artio Testnet!🐻⛓️ Users can now borrow testnet $MIM using HONEY/MIM LP tokens, natively on Artio!🔮🔥 Interact with Artio directly👇🏻

🧙🏼‍♂️

400,352 views • 2 years ago

Perplexity's Sonar—built on Llama 3.3 70b—outperforms GPT-4o-mini and Claude 3.5 Haiku while matching or surpassing top models like GPT-4o and Claude 3.5 Sonnet in user satisfaction. At 1200 tokens/second, Sonar is optimized for answer quality and speed.

Perplexity

565,953 views • 1 year ago

MLX + M3 Ultra 512GB + Qwen3-235B-A22B-8bit = 🔥 "write a beautiful p5js particles animation that reacts to mouse clicks movements" Prompt: 22 tokens, 64.471 tokens-per-sec Generation: 6197 tokens, 18.916 tokens-per-sec Peak memory: 251.077 GB

Ivan Fioravanti ᯅ

24,160 views • 1 year ago

🚀 Igniter & Solaxy Wallet Are LIVE! 🛸 The wait is over. You can now launch tokens on Solaxy using Igniter, and interact with the rollup like never before using the official Solaxy Wallet! 🔹 Igniter Solaxy’s native launchpad. Launch your own token directly on the rollup. Tokens that hit a 1M $SOLX market cap are automatically rooted into Neptoon DEX.

SOLAXY

298,591 views • 10 months ago

Qwen has just released a model on par with GPT-4o... And you can run it locally easily 🤯 Yep. GPT-4o level AI running offline on a laptop. - Fully open source - Only 3B active parameters - 262k context length natively Quick steps to run it on your machine and details below

Paul Couvert

503,787 views • 10 months ago

Today Meta released "Code Llama", a large language model fine-tuned for coding tasks. It's publicly available and can be used for commercial use! It outperforms GPT 3.5 and you can even run it locally on your Macbook using Ollama.

Marcel Pociot 🧪

49,806 views • 2 years ago

`transformers` + `torchao` quantization + `torch.compile` for faster inference speed and less memory usage 🔥 Demo of "meta-llama/Meta-Llama-3.1-8B-Instruct" quantized in 4-bit weight-only :

Marc Sun

24,515 views • 1 year ago

This is amazing! Someone trained a 1.5B model (with GRPO) to solve mazes! model:

Victor M

139,407 views • 1 year ago

/1 Gemma 4 31B just crushed Qwen 3.6 27B in a local LLM gamedev contest inside atomic.chat (prompt is below) Device: MacBook Pro M5 Max, 64GB RAM Results: Qwen 3.6 27B: 32 tokens/sec · 18m 04s · 33,946 tokens Gemma 4 31B: 27 tokens/sec · 3m 51s · 6,209 tokens So what is more important: tokens per second, or the quality of the final answer? Qwen made a very long response and showed more creativity and visual style. But Gemma gave a shorter, clearer, and more logical answer in much less time. In this one-shot Pac-Man gamedev contest, Gemma 4 31B was the clear winner. Its game logic was stronger: click reactions were smoother, and it handled interactions with elements like walls, ghosts, and particle effects better. But this was only one test. Maybe Qwen 3.6 27B can show better results with better settings. Open the comments, try our prompt, and share your result below.

Chubby♨️

71,368 views • 1 month ago

BREAKING: Grok set a new record on OpenRouter this year with over 16 trillion tokens of usage. 16,060,000,000,000 tokens. That is about 64% more usage than the next closest model.

DogeDesigner

27,783 views • 5 months ago

We are thrilled to be a launch partner for Meta Llama 3. Experience Llama 3 now at up to 350 tokens per second for Llama 3 8B and up to 150 tokens per second for Llama 3 70B, running in full FP16 precision on the Together API! 🤯

Together AI

88,229 views • 2 years ago

Put your $ZEC, $DOGE, and $XRP to work on Unichain Thousands in daily rewards up for grabs to LPs on USDC pairs. Two ways: 1. Swap to ZEC/DOGE/XRP on Unichain and LP there 2. Bridge your native tokens using the Universal App on your Unichain wallet Zuniswap.

Universal

27,512 views • 7 months ago

Fuck yeah! MaskGCT - New open SoTA Text to Speech model! 🔥 > Zero-shot voice cloning > Emotional TTS > Trained on 100K hours of data > Long form synthesis > Variable speed synthesis > Bilingual - Chinese & English > Available on Hugging Face Fully non-autoregressive architecture: > Stage 1: Predicts semantic tokens from text, using tokens extracted from a speech self-supervised learning (SSL) model > Stage 2: Predicts acoustic tokens conditioned on the semantic tokens. Synthesised: "Would you guys personally like to have a fake fireplace, an electric one, in your house? Or would you rather have a real fireplace? Let me know down below. Okay everybody, that's all for today's video and I hope you guys learned a bunch of furniture vocabulary!" TTS scene keeps getting lit! 🐐

Vaibhav (VB) Srivastav

139,061 views • 1 year ago