1 cpu core - 160 Tokens per second

nisten🇨🇦e/acc

15,509 subscribers

72,142 views • 1 year ago • via X (Twitter)

Science & Technology

11 comments

nisten - e/acc • 1 year ago

on a 3-year-old base model M1 MacBook Air

nisten - e/acc • 1 year ago

Can't make this up, these are the actual training params for this particular checkpoint: 269 steps, 2469 sequence length, training at 4.2 it/s on 22420 MB of RAM on an NVIDIA L4, and it converged at 142 steps. This is art at this point.
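For a sense of scale, the figures quoted above imply a very short run. This is a minimal sketch of that arithmetic, assuming the 4.2 it/s rate held constant for the whole run and that one "it" is one training step (the comment states neither). The variable and function names are illustrative.

```python
# Figures quoted in the comment above.
total_steps = 269
converged_step = 142
steps_per_second = 4.2  # ASSUMPTION: constant rate, one "it" == one training step


def seconds_for(steps: int, rate: float) -> float:
    """Wall-clock seconds needed to run `steps` at `rate` steps per second."""
    return steps / rate


full_run_s = seconds_for(total_steps, steps_per_second)
converged_s = seconds_for(converged_step, steps_per_second)

print(f"full run: ~{full_run_s:.0f} s, convergence: ~{converged_s:.0f} s")
```

Under those assumptions the whole run takes about a minute and convergence is reached in roughly half that. Batch size is not given, so the number of tokens seen cannot be derived from these figures.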

nisten - e/acc • 1 year ago

wget

nisten - e/acc • 1 year ago

This is in 8-bit, by the way. It actually did 198 tps in mixed BitNet mode, but it was rambling a lot at 76 MB gguf size, and that was a BitNet training run.

nisten - e/acc • 1 year ago

If someone wants to code-review: I implemented a whole new optimizer from @cognitivecompai to train this and am getting some of the best conversational results so far. Also, Eric's looking for help to do a PR for HF transformers & axolotl. Training run code here

nisten - e/acc • 1 year ago

it looks like the optimizer step may not be being applied properly

nisten - e/acc • 1 year ago

If you need to reproduce results, I uploaded an 8-bit .gguf of the training checkpoint from this video for you to try. Careful, it's still very finicky. You may have to start prompts with "Human:", e.g. "Human: How do raccoons meow". The base model is untrained.

Maziyar PANAHI • 1 year ago

This is a lie! 🔥 don’t believe it people! ❤️ PS: when do you release? @nisten 🤣

nisten - e/acc • 1 year ago

Try the checkpoint:

./llama-cli -n 768 -fa -b 768 --min-p 0.3 --top-p 0.85 -ctk q8_0 -ctv q8_0 --keep -1 -p "You're a Nasa jpl engineer teaching huamn about cats in space." -m model.gguf --temp 1.69 -ngl 0 -t 1 -co -cnv -n 2000 --reverse-prompt "Assistant:"
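The flags in that command are easier to read one per line. The sketch below rebuilds the same invocation as a Python argument list with a comment on each option; it only prints the command and does not run it. The descriptions reflect llama.cpp's CLI as generally documented and may differ between versions. The original passes `-n` twice (768, then 2000); keeping only the later value is an assumption that the last occurrence wins. The flag order is rearranged for readability.

```python
import shlex

# Flags copied from the comment above; only the order and the duplicate -n changed.
args = [
    "./llama-cli",
    "-m", "model.gguf",               # the downloaded 8-bit .gguf checkpoint
    "-t", "1",                        # a single CPU thread
    "-ngl", "0",                      # offload no layers to the GPU
    "-fa",                            # enable flash attention
    "-b", "768",                      # batch size for prompt processing
    "-ctk", "q8_0", "-ctv", "q8_0",   # store the KV cache in 8-bit
    "--temp", "1.69",                 # sampling temperature
    "--min-p", "0.3",                 # min-p sampling threshold
    "--top-p", "0.85",                # nucleus (top-p) sampling threshold
    "--keep", "-1",                   # keep the whole initial prompt on context overflow
    "-n", "2000",                     # ASSUMPTION: the later of the two -n values applies
    "-co",                            # colored output
    "-cnv",                           # conversation mode
    "--reverse-prompt", "Assistant:", # hand control back to the user at this string
    "-p", "You're a Nasa jpl engineer teaching huamn about cats in space.",
]

# shlex.join quotes each argument so the result can be pasted into a POSIX shell.
command = shlex.join(args)
print(command)
```

Together, `-t 1` and `-ngl 0` pin generation to one CPU core, which is the setup the video title describes.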

atharva • 1 year ago

so goooood, trying this out tonight

nisten - e/acc • 1 year ago

release is coming, just hang on; multiple new merges are going on at the same time for the llama.cpp and aphrodite engine implementation
