AVB

@neural_avb • 12,535 subscribers

Neural Breakdown on YT | Read research with AI: https://t.co/Ef6m4nUpcZ | Latest vid: RLMs, Post Training | Next: Reasoning SLM

Shorts

Most prompt optimizers are designed to evolve a single prompt. This algorithm literally simulates a market (auctions, bids, wallets) to optimize multi-agent prompt systems to collaboratively complete tasks!

101,973 views

Videos

Recently trained a tiny lil (135M) SLM through CPT -> SFT -> DPO -> RL for a YouTube series. DeepSeek-V4-Flash costs pennies and was responsible for 80% of the synthetic data, reasoning traces, and reward evals. Insane value for finetuning SLMs, and now it got a HUGE upgrade.

47,115 views • 1 day ago

Damn, check out this new SVG generation model. This looks really impressive!

540,802 views • 3 months ago

Multi-agent RL is just the most beautiful thing when it works

112,560 views • 1 month ago

Deep learning bros and sisters, don't sleep on this. You can cluster millions of documents in embedding space, mass-annotate them, visualize them... basically for free and within seconds.

169,918 views • 2 months ago

Hands down the craziest RL project of my life. Competitive self-playing agents learning to shoot and evade (jump/dash/duck) bullets. Trained from scratch in 4 hours on my Mac. This was in 2022, before vibe coding.

356,598 views • 5 months ago

This is what you can achieve with 5-6 hours of Self-Play RL training, by the way. Actors view the projectiles with lidar scans, pick an action using a PPO policy, and compete against past versions of themselves in an iterative self-improvement loop. Made in Unity with ML-Agents.

97,865 views • 2 months ago
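
The "compete against past versions of itself" loop can be sketched in a few lines. This is only a toy illustration, not the Unity project's training code: the `skill` number stands in for policy weights, the update rule is a placeholder for a real PPO step, and all names here are hypothetical.

```python
import random

def self_play(iterations=5, snapshot_every=1, seed=0):
    """Toy self-play loop: train against frozen snapshots of earlier selves."""
    rng = random.Random(seed)
    skill = 0.0       # stand-in for the current policy's weights
    pool = [skill]    # frozen past versions available as opponents
    for step in range(1, iterations + 1):
        opponent = rng.choice(pool)  # sample a past version to play against
        # Placeholder for a PPO update: improve faster when not yet ahead.
        skill += 1.0 if skill <= opponent else 0.5
        if step % snapshot_every == 0:
            pool.append(skill)       # freeze the new version into the pool
    return skill, pool

final_skill, opponent_pool = self_play()
print(final_skill, opponent_pool)
```

Sampling from the whole pool, rather than only the latest snapshot, is one common way to keep the agent from overfitting to its most recent self.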

Been thinking about what this paper really means. "Video diffusion" and "World Models" are becoming synonymous. Neural Computers are basically video diffusion world models for terminal envs and GUIs. Lots of talk last week about automating Manim videos. In theory, we should be able to train these world models on 10,000 hours of diverse Manim videos and "see where it goes". If an NC can generate outputs to terminal commands, it should be able to generate videos like this directly from a prompt too. Without writing code.

119,949 views • 3 months ago

My favourite piece of Schmidhuber lore is when he challenged Ian Goodfellow during a NIPS presentation on GANs. Live, in public. Deep learning drama peaked here. You have seen nothing like this.

106,665 views • 3 months ago

Watch this 45 min video to learn how to create synthetic datasets and train tiny (100M params) local language models that specialize in narrow tasks. Code, datasets, models, harnesses all in comments.

39,424 views • 2 months ago

My next video will be positively awesome!
- Train sub-nano 0.1B language models on custom narrow tasks
- Write model SDKs/APIs that run blazingly fast on the client machine. I'm talking 350 tok/s with 0.3GB peak memory
- How to create synthetic datasets and ship vertical SLMs

64,720 views • 3 months ago

One of the coolest RLM trajectories that made me go "woah". RLMs (Minimax M3) launching subagent swarms with clear pydantic contracts, type checking, schema validation... Reduces hallucination rates and failed subagent calls. Article goes through details!

29,966 views • 1 month ago

New RLM trajectory that blew my mind! I will use this one as the main example in the YT tutorial.
I passed in a CSV containing transcripts of 320 episodes of the Lex Fridman podcast and asked it to find what his first 10 ML guests had to say about AGI. The context had 60,855,062 characters.
> Main agent explored the data format, understood it's a CSV
> Extracted all 320 guests, identified the first 10 ML guys (Bengio, Brockman, Goodfellow etc)
> Launched parallel subagents, passing just their corresponding transcripts (about 35K chars each)
> Subagents performed find operations to search for AGI, read the context and returned outputs
> Main agent gathered all the data, generated a summary of all AGI conversations
It took 4 minutes to crunch, and the fun part is it cost me $0.2 with Minimax-M2.5. It read 1M tokens (825K were cache hits so it was quite cheap), produced just 69K tokens (19K were reasoning).
----
My notes:
- This would be basically impossible to do at this quality with a base LM. (context rot, since 99% of the data is useless)
- It would cost 20x more with a ReAct model (too many tasks)
- It would cost 10x more with a ReAct + Subagent model (read/write contexts instead of using symbolic variables)
- I'm a happy panda. (thanks for reading)

43,794 views • 5 months ago
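
The fan-out pattern in this trajectory (slice the CSV, hand each subagent only its own transcript, gather small outputs) can be sketched roughly as below. This is an assumption-laden toy, not the actual RLM harness: the column names, the three-row CSV, and the `run_subagent` / `find_snippets` helpers are all hypothetical, and a plain `str.find` loop stands in for an LLM subagent.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Tiny stand-in for the real 60M-character CSV; column names are assumed.
RAW = """guest,transcript
Guest A,We talked about robots. I think AGI is decades away.
Guest B,Mostly about chess engines and search.
Guest C,AGI might arrive sooner than people expect.
"""

def find_snippets(text, keyword, window=40):
    """Stub subagent step: locate the keyword with str.find, return nearby text."""
    snippets, start = [], 0
    while (idx := text.find(keyword, start)) != -1:
        snippets.append(text[max(0, idx - window): idx + len(keyword) + window])
        start = idx + len(keyword)
    return snippets

def run_subagent(row):
    # Each subagent sees only its own transcript, never the whole CSV.
    return row["guest"], find_snippets(row["transcript"], "AGI")

rows = list(csv.DictReader(io.StringIO(RAW)))
selected = rows[:3]  # the real run selected the first 10 ML guests

# Launch the subagents in parallel and gather their small outputs.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(run_subagent, selected))

# The main agent would summarize only these snippets, not the raw context.
for guest, snippets in results.items():
    print(guest, "->", snippets)
```

The point of the design is that the main agent's context only ever holds the guest list and the returned snippets, which is why most of the 60M characters never reach a model.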

My RLM finally went recursive! Looking at these logs is way too addictive, please send help.
Notes:
> Sent it 10 long Wikipedia articles about deep learning (~2M context).
> Asked it to find BLEU scores from the Attention paper & explain MHA from these articles
> RLM controlled by the new Minimax 2.5! Minor prompt changes were needed from the RLM paper.
> Spends the first 3 iterations understanding the data format, works through errors, until it locates the Attention article in the mess. Like a human would use a Jupyter Notebook.
> Launches a subagent on only the AIAYN article
> This subagent launches 2 more subagents to fetch (a) BLEU score and (b) MHA (my original two-part question)
> The lowest subagent returns the output using "FINAL_VAR" (i.e. it does not generate the text! Just finds the correct location in the context and sends it back as a variable)
> Recursion propagates upwards
> Outermost LLM receives the RLM output, and generates the full text response.
> Took 2.5 minutes walltime. Max recursion depth level was 2. 12 LLM calls in total. (This video contains cuts when the LLM is thinking/generating)
> Subagents never get to see more than 2000 characters. Only the outermost LLM gets to see the full output - it's needed to answer the final question, but it's only 200-300 tokens compared to 2M!
> Fully async. Code execution and subagent tasks can happen simultaneously!
I feel soooo satisfied. Been some time since I've been this excited about shooting a tutorial video.

38,241 views • 5 months ago
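
The "return a variable, not generated text" idea can be illustrated with a small sketch. Everything here is hypothetical: the `Span` type, the three agent functions, and the made-up two-article context are stand-ins for the real RLM and its "FINAL_VAR" mechanism, with `str.find` playing the role of each subagent's search.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """Pointer into the shared context: the 'variable' a subagent hands back."""
    key: str
    start: int
    end: int

# Toy stand-in for the ~2M-character context; the contents are made up.
CONTEXT = {
    "cnn": "Convolutional networks slide filters over images.",
    "attention": "Multi-head attention runs several attention heads in parallel. "
                 "The model reports a BLEU score on translation benchmarks.",
}

def leaf_agent(key, needle, window=40):
    """Depth 2: find the needle and return a Span instead of the text itself."""
    idx = CONTEXT[key].find(needle)
    if idx == -1:
        return None
    return Span(key, max(0, idx - window), idx + len(needle) + window)

def article_agent(key, needles):
    """Depth 1: launch one leaf agent per sub-question, pass Spans upward."""
    return {needle: leaf_agent(key, needle) for needle in needles}

def root_agent(topic, needles):
    """Depth 0: pick the article, delegate, then read only the returned Spans."""
    key = next(k for k, text in CONTEXT.items() if topic in text)
    spans = article_agent(key, needles)
    return {n: CONTEXT[s.key][s.start:s.end] for n, s in spans.items() if s}

answer = root_agent("attention", ["BLEU", "Multi-head attention"])
print(answer)
```

Only `root_agent` ever dereferences a `Span`, which mirrors the note that the outermost LLM is the only one that sees the final text.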

SAM-2 is Apache licensed - you can indeed just build crazy apps with it without fear.
This weekend I ported SAM-2 models to run optimally on Apple Silicon. I have also quantized all the SAM models to 16bit, 8bit, and 4bit variants (collection below).
Repo is open source and pip installable. Comes with all you'll need for interactive video segmentation:
- bidirectional propagation,
- positive and negative clicks,
- bounding boxes,
- streaming masks back to user,
- lightweight FastAPI server that you can extend
I basically asked Codex to port over the original Facebook sam2 codebase to mlx and optimize it for mlx. And it just worked.
This demo for example is a 4bit base-plus model (~100MB download).
To speed up, you can also do spatial or temporal downsampling. That speeds up inference on my old MacBook by ~3x and the hit in parity is negligible.

17,363 views • 2 months ago
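
For intuition on what a 4bit variant stores, here is a minimal pure-Python sketch of affine quantization over one small group of weights. It is an illustration under assumptions, not the repo's or mlx's actual implementation: the function names and the five sample weights are hypothetical, and real libraries pack the codes into bits and work on large tensors.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]

group = [-0.8, -0.1, 0.0, 0.35, 0.7]
codes, scale, lo = quantize_group(group)
restored = dequantize_group(codes, scale, lo)

# Rounding to the nearest level bounds the error by about half a step.
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(codes, max_err)
```

Each weight costs 4 bits plus a shared scale and offset per group, which is where the smaller download comes from.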

Watch this 50 minute video to learn the low-level details of GRPO and training tiny models (<1B) on RLVR envs.
Also:
- text-based gym envs
- visual/animated tour of how GRPO works
- deep dive into PPO math: we will literally see logits update with each policy update
- code

10,288 views • 3 months ago
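
As a quick taste of the GRPO part: its core trick is scoring each sampled completion relative to the other completions for the same prompt, instead of using a learned value function. A minimal sketch of that group-relative advantage is below; the function name, the `eps` constant, and the sample rewards are assumptions for illustration, and the video's own code may differ.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt, scored by a verifiable reward
# (1.0 = correct answer, 0.0 = wrong answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Note that if every completion in a group gets the same reward, all advantages are zero and that prompt contributes no gradient signal.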

Today, I am launching Paper Breakdown.
- PBD gets you academic paper recommendations and lets you study CS/ML/AI research with LLM agents.
- It highlights relevant sections directly in the actual PDF
- generates flowcharts/illustrations too
- we provide an in-built screenshot tool to send images to the agent directly from the paper.
- we also have agentic paper search that lets you search our database of 70,000+ CS and ML arXiv papers in seconds using natural language.
I have been building PBD for almost half a year - it all started as a means for me to keep up with research and use AI to produce visuals and scripts for my own YouTube videos. I have developed it enough to confidently recommend it to you. Visit our landing page to learn more.

14,579 views • 7 months ago
