Elliot Arledge

@elliotarledge • 30,601 subscribers

Shorts

Introducing x-cli! Use it in claude code, codex, openclaw, opencode, or anything you'd like. It's a cli tool, not a MCP. It won't waste context space when not being used. Simply paste this into your agent session: "Setup

187,260 views

How DDA raymarching for graphics works.

98,202 views

I got a very efficient neural net playing mario kart on my macbook.

70,402 views

8 coding agents (written in rust) who are each writing rust for a simulator while I scroll Grokipedia.

26,976 views

Videos

timelapse #83 (22 hrs): - it was very easy to dive super deep into anything i needed to (this is what i focused on today because not all days are like this) - finding the grok code fast 1 + grok 4 for deep thinking and verification combo to be super useful in cursor. speed was solid - hard to imagine myself spending many more mental clock cycles in a 24 hr period - had to pull out qwen3-next’s gated deltanet + linear attention from bleeding edge hf transformers to begin implementing a multi-gpu fp8 trainer from scratch. this is so damn bleeding edge and i underestimated how much effort this has and will require lol - lots of diet coke and oats - shipped the template which the core chapters of my book will be built on: - all im missing now is flash attention 1/2 mastery (fa2 tmrw), intuition on making topk faster (for arbitrary row length), what i should and shouldnt teach in cutlass/cute, hopper/blackwell gemm kernel mastery (down to fp4) — shoutout to Pranjal for making this easier for me. his blog post is amazing - caught up w/ Mati Roy - im feeling great mentally but not so good physically as im writing this and about to pass out

2,255,154 views • 10 months ago

timelapse #162 (17 hrs) - Idk what it is, grok makes it super easy to get into flow state. No model does this for me - Went deep into spec decoding architecture learning with grok - Continued Minecraft reimplementation in C. The biggest bottleneck is samples to verify against (capturing my own gameplay for a few mins rn) as well as fly wheel speed for an agent to iterate. was bottlenecked by replay and compile, now bottlenecked by tokens. this is the hardest task ive given an llm to date. - Contract work - Getting tinygrad’s glm 5.2 at c=1 to be stable - more meetings

47,900 views • 8 days ago

timelapse #147 (15.5 hrs) - woke up to MiniMax (official) M3 launch including my kernelbench-hard (lowest score at below 30% which emphasizes the hardness) - did a space with the minimax and together ai folks - burned 1.1B tokens - got nanogpt at nvfp4 training stability to match bf16. this is a prereq for another problem im trying to solve - got my timelapse workflow nailed with a solid html page lol - losing patience with anthropic rate limits

257,031 views • 1 month ago

rewriting minecraft from scratch in C/CUDA. my verifiers are very strong and i think i can finish today/tmrw. this is the C version (no gpu render... yet) done w/ fable orchestrator and gpt 5.5 medium subagents

61,277 views • 12 days ago

Chang: "Explain this to me. Daniela runs day-to-day operations. All the leadership team reports to you." Dario: "Yes." Chang: "No one reports to you. That sounds like a pretty sweet job." Dario: "It's incredibly freeing. It lets me do all the things that I do much more easily than I would otherwise..." Chang: "And she does all the work. Is that what you're saying?" Dario: "If you had to go through the things I had to go through during the pandemic, the things I had to go through during the DoW... No"

149,153 views • 1 month ago

Co-Founder of Cerebras explains their WSE simplified design compared to classical GPUs made by NVIDIA.

179,235 views • 1 month ago

timelapse #119 (15 hrs): - been trying out the 4am wake up and 8pm sleep schedule and its pretty good for getting ahead in the day. feels like less pressure is on me - upgraded my cpu and ram to ryzen 9 9950x3d + 96gb ddr5 - speedrunning contract work - planning out the entire software and hardware stack for what a space datacenter might look like and how its manufactured - built an MCP server called "thinkingcap" which ill be pushing out for you all today - getting used to opus 4.5 and gemini 3 pro still, although grok 4.1 has been very helpful to me

617,066 views • 7 months ago

timelapse #153 (11 hrs) - Woke up early and went to gym - Read Paul graham’s article and did some cold outreach for RL sim engineering (just wanna see where it goes) - got wireless keyboard and mouse. feels cleaner - Co-worked virtually with a friend - Trying out RL sims for battery materials optimization since the space seems pretty open for GPU optimization - Paused some ambitious projects for when fable comes back

56,353 views • 1 month ago

Timelapse #156 (36 hrs) - Worked with the tiny corp on getting GLM 5.2 running on 8xMI300X (sglang won here) - Launched KernelBench-Mega and updated Kernelbench-Hard with h100 and b200 sweeps - Took care of boring business stuff - Did some training sweeps for specialized technical vocab audio model - Some bugs with putting kernelbench-mega and hard on cloud instances so had to do some reruns. Learned a lot though - Setting up my own local rl infra and profiling concurrency 128 rollouts with vllm. Became clear to me that I need to serve in nvfp4, use MoE only for throughput, reap so training doesn’t OOM, dig into the vllm kernel graph itself to not underutilize my hardware from poor flashinfer/cutlass selections for my rtx pro 6000 sm120 architecture - Might do online distillation from glm 5.2 but for now taking it one step at a time - Slept for a bit then woke up and showered - Fixed an issue with SGLang tensor parallel deadlock on GLM 5.2 architecture with MTP enabled - GLM 5.2 inference is 2-3x faster than coding plans and running on amd boxes - Spent time with family - Hung out with some friends - Recorded some yoctogpt lectures with the revamped notebook (high taste btw) - Setting up dflash training for GLM 5.2

45,447 views • 29 days ago

timelapse #85 (27.5 hrs): - currently cant rely on any other coding models except grok code fast 1 + grok 4 fast (for complex reasoning grok 4 fast is 20 cents for 1M tokens) - wrote qwen3-next trainer entirely from scratch to make it more manageable - each piece completely done by grok-code-fast-1 in cursor as it seems to handle this task pretty well without the grok 4 fast reasoning - take on smaller problems and complete them quickly (makes it easier with 400 toks/sec over the api) - got distributed fp8 qwen3-next trainer running at 0.8 seconds per step on 8xH100s (still need to finish checkpoint loading logic) - perfect timing as the fp8 version of qwen3-next drops as im writing this - ill be in LA in 2 days (will visit SF mid way through as well) - 12.5% margarita - steak dinner with family - gained intuition on FlashAttention in very long context settings - caught up w/ Kearm h/eng and Arnie Ramesh

283,820 views • 10 months ago

timelapse #160 (13.5 hrs) - grok 4.5 high in grok build was my main for today. just getting a vibe check. it understands my intent really well and never messed anything up. i do prefer to stay in the loop with it more than claude though. SpaceXAI really cooked! - not sure what it was, but i was able to stay in flow state with grok build for hours and hours. my memory and retention helped me get a lot done - exhausted 4th claude max 20x plan in 48 hours - training dspark drafters on my traces (kept gpu warm the whole time) - did grok 4.5 high sweeps and was quite impressed at how far the model has come. its around gpt 5.5 level (official post coming) - some very important meetings - contract work - optimizing amd kernels for glm 5.2 fp8 (mostly custom all-reduce whilst avoid deadlocks) for the tiny corp - one shotted and ran all frontier models on a new benchmark which ill be revealing shortly. its the toughest benchmark ive ever made. fable doesnt even come close to beating it - had grok 4.5 take over my minecraft C/CUDA implementation. felt like whack a mole but maybe i just wasnt using it right

13,619 views • 11 days ago

Timelapse #158 (12 hrs) - pushed myself super hard at the gym. Body hurts. - talked with George Hotz a little bit in the tinygrad discord server - Private contract work - Started a tinygrad bounty - Made some improvements to my local RL infra (rtx pro 6000 for inference/training and 3090 for the isolated compute to have the agents run kernels). Mainly on memory usage and moving away from cold loading as much as possible (SSDs are slow) so trying out vllm sleep mode

18,280 views • 15 days ago

This is my favorite clip of the new Elon pod. He opens up saying xAI struggles with memory usage/bandwidth and CUDA kernel optimization (matmul, attention, MoE, etc). If you are good at kernel or performance engineering in general, you should apply. Steer the world in a better direction.

163,922 views • 6 months ago

How DDA raymarching for graphics works.

98,202 views • 4 months ago

daily reminder llms aren't that complicated

227,310 views • 1 year ago

timelapse #74 (11.5 hrs): - 95% done the most insane transformer training and inference chapter ever (competing w/ llm.c at this point) - talking with Luminal team - contract work - watching Minecraft videos while waiting for claude code and build scripts - starting learning multiple things at same time so I can parallelize chapter creation in my book based on what im feeling at a given moment - went a layer deeper into quantization: training challenges, group-wise vs block-wise vs tensor-wise vs channel-wise vs all the wises, input type vs compute type vs accumulate type vs epilogue, dealing w/ outliers

121,761 views • 10 months ago

timelapse #86 (15 hrs): - got my first OOM on 8xB200 node - defaulting back to grok-code-fast-1, the fastest reliable coding model with by far most intuitive instruction following, combined with grok 4 fast reasoning to plan before i let grok code work its magic - drank 2 large tim hortons iced capps, loaded myself w/ creatine, daily nootropics - tried out gpt-5-codex but it simply doesnt match the speed i require when i go deep into one thing at a time sequentially - got caught watching youtube videos in the middle, need to make sure i block any and all content that could get in my way - caught up on all book revisions so getting super ahead with other chapters - developed an overnight addiction to switching color themes in cursor - did some pair programming w/ Kearm h/eng using Tuple on free trial - applying for O-1

102,928 views • 9 months ago

timelapse #152 (18.5 hrs) - been seeing what magic i can do from the opus 4.8 + gpt 5.5 combo while fable is gone - studied some pendulum physics to help build intuition for sims and what type of optimizations are possible on GPUs (caching, reg, data structure level) - pushed GLM 5.2 results to - getting deeper into new kernel contract (can’t say cuz private) - spent time with a girl I like - came back and did a space w/ Adrian Dittmann

16,698 views • 1 month ago

synthetic data neural network has been trained

59,949 views • 6 months ago