0xSero

@0xSero • 58,033 subscribers

I want to be free

Shorts

Codex nooooooo I can't close this popup

135,833 views

What 4 concurrency on GLM sounds like. So gosh damn loud, open air frames are easier to build but so gosh darn loud

11,809 views

GLM-5.2-REAP-NVFP4 This experiment I calibrated the model on 30,000+ samples of my agent sessions, tweets, writing, and codebases. Pretty charming, every dataset produced something different.

22,535 views

The fact that a 284B parameter model is able to in a single 400K token session - ssh into my Homelab to find docs - ssh into lambda / prime intellect - rewrite reap to support the DS4 attention - download models & datasets - run tests to check it works

22,762 views

Asking Codex (GPU) to find and kill Codex (Cerebras)

24,131 views

Videos

GLM-5.2 Nvidia NVFP4 No pruning, no quantising, exact same model card. 110 tok/s single stream. My friend is a genius, holy fudge. Here's the repo, he got full DS4-Flash on a 5090 + DDR5 at 38.5 tok/s This is a breakthrough.

155,785 views • 11 days ago

GLM-5.2 on 4x 6000s in ZCode /goal build a nintendo ds emulator in C++, do not use a ready made emulator. Don't stop until you can run the Mario 64 DS rom.

38,993 views • 12 days ago

Kimi IS COOKING holy mackerel this is way better than anything I can get out of opus or GPT Has some bugs.. but looks soooo unique and well into my brand, for 1 shot I can't complain.

235,591 views • 5 months ago

Exo No more worrying about it. Best model best config best harness for your hardware based on local . ai benchmarks just: exo run

43,386 views • 1 month ago

I promise this is the last pleometric video I drop on you.

131,800 views • 3 months ago

If you're a frontend dev, let me make your life easier: 1. Pay for Mobbin. 2. Set up the Mobbin MCP. 3. The model will be able to pull all designs off the screens. 4. Basically tons of free Figma designs.

73,323 views • 2 months ago

It's not a dream anymore, this is frontier at home. My next mission is to make this cheaper to run. Do you see the length of the session? Do you see the successful tool calls?

31,883 views • 24 days ago

Here's my conversation with Philip Kiely, a deep dive into inference engineering. His book is available online for free, and it's worth reading. Inference is becoming more important to people and organisations by the day; learning about the topic will prepare you well.

24,290 views • 18 days ago

. is continuing to grow and improve, it's gotten to a place where I'm very proud of it. Now to release it.

11,221 views • 7 days ago

Holy moly, MiniMax-M2.7 is amazing, watch till the end.

110,349 views • 4 months ago

GLM-5.1-478B-NVFP4 Running on: - 4x RTX Pro 6000 - Sglang - 370,000 max tokens (1.75x full context) - p10 27.7 | p90 45.6 tok/s decode (gen) - 1340 tok/s prefill I could get 2x decode if I limit to 64k context (100 tok/s) In this video it operates Figma (:

74,772 views • 3 months ago

Local AI can be so good, but you’d need about 12k USD to get it. Then it’s not so great Here’s a Q4 of Qwen3.5-262B-REAP Weights 131GB KVCache 50GB 256,000 context 350 tokens/s prefill warmup 4,000 tokens/s prefill cache 36 tokens/s generation Vision enabled REAP is good

87,959 views • 3 months ago

GLM-5.2 running under my desk, modifying GGUFs to enable MTP on the Spark. What a world we live in, where you can have a megamind running off electrons going really fast.

23,422 views • 24 days ago

MiniMax-M3 running locally at 100+ tok/s. Good questions; it's not often that models ask good questions.

32,537 views • 1 month ago

Finally, full precision MiniMax-M2.7 running at home. 100 tokens/s decode 5050 tokens/s prefill

72,465 views • 3 months ago

OAI has to be prioritising low/no thinking GPT-5.5 inference, it's so fast. I think this is better than high thinking, so much less annoying, so much more token efficient.

65,255 views • 2 months ago

Guess who got GLM-5.2 at home? You can't deny I'm elite at inference.

27,083 views • 1 month ago

Let me save you hours of testing frontends. If you're ever working on a front-end, instead of writing tests and adding puppeteer slop to your repo: 1. Get an LLM to write you a prompt covering whatever needs to be tested. 2. Copy that, go to the browser. 3. Open localhost with your selected app. 4. Use the Claude Chrome Extension or Parchi. 5. Send it the prompt. 6. QA engineering, there you go. Take the model's results and pass them back to your coding agent to fix whatever is flagged.

109,943 views • 5 months ago

I LOVE Deepseek-v4-flash: incredibly reliable, capable, and logical. It's lacking in frontend, but I have MiMo for that. I would recommend any company spending 100k+ a year on AI to purchase roughly 8-10 6000s and have a few of the workers blind test these models for work.

51,035 views • 2 months ago

I yapped about LLM compression for 40 minutes. How much misinformation did I spread this time (,:

35,181 views • 1 month ago