
AVB
@neural_avb • 11,749 subscribers
Neural Breakdown on YT | Read research with AI: https://t.co/Ef6m4nUpcZ | Latest vid: RLMs, Synthetic data gen | Next: DPO

Damn, check out this new SVG generation model. This looks really impressive!
AVB • 539,908 views • 2 months ago

This is what you can achieve with 5-6 hours of Self-Play RL training, by the way. Actors view the projectiles with lidar scans, pick an action using a PPO policy, and compete against past versions of themselves in an iterative self-improvement loop. Made in Unity with ML-Agents.
AVB • 97,865 views • 1 month ago
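The self-play loop described in the post above can be sketched in a few lines. This is a minimal illustration and not the ML-Agents API: `OpponentPool`, `self_play`, and `update_fn` are hypothetical names, and `update_fn` stands in for a real PPO update computed from a match between the current policy and a frozen past version.

```python
import copy
import random


class OpponentPool:
    """Keeps frozen snapshots of past policies to compete against."""

    def __init__(self, max_size=10, seed=0):
        self.snapshots = []
        self.max_size = max_size
        self.rng = random.Random(seed)

    def add(self, policy):
        # Store a copy so later training does not change the snapshot.
        self.snapshots.append(copy.deepcopy(policy))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # forget the oldest version

    def sample(self):
        return self.rng.choice(self.snapshots)


def self_play(initial_policy, update_fn, iterations, snapshot_every=5):
    """Train against sampled past versions, snapshotting periodically."""
    pool = OpponentPool()
    policy = initial_policy
    pool.add(policy)
    for step in range(1, iterations + 1):
        opponent = pool.sample()
        # Assumption: update_fn plays a match and returns the improved policy.
        policy = update_fn(policy, opponent)
        if step % snapshot_every == 0:
            pool.add(policy)
    return policy, pool
```

Sampling from a pool of older snapshots, rather than always playing the latest version, is one common way to keep the learner from overfitting to a single opponent.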

Been thinking about what this paper really means. "Video diffusion" and "World Models" are becoming synonymous. Neural Computers are basically video diffusion world models for terminal envs and GUIs. Lots of talk last week about automating Manim videos. In theory, we should be able to train these world models on 10,000 hours of diverse Manim videos and "see where it goes". If an NC can generate outputs for terminal commands, it should be able to generate videos like this directly from a prompt too. Without writing code.
AVB • 119,949 views • 2 months ago

My next video will be positively awesome!
- Train sub-nano 0.1B language models on custom narrow tasks
- Write model SDKs/APIs that run blazingly fast on the client machine. I'm talking 350 tok/s with 0.3GB peak memory
- How to create synthetic datasets and ship vertical SLMs
AVB • 64,720 views • 2 months ago

SAM-2 is Apache-licensed, so you can indeed just build crazy apps with it without fear. This weekend I ported the SAM-2 models to run optimally on Apple Silicon. I have also quantized all the SAM models to 16-bit, 8-bit, and 4-bit variants (collection below). The repo is open source and pip-installable. It comes with all you'll need for interactive video segmentation:
- bidirectional propagation,
- positive and negative clicks,
- bounding boxes,
- streaming masks back to the user,
- a lightweight FastAPI server that you can extend
I basically asked Codex to port the original Facebook sam2 codebase to MLX and optimize it for MLX. And it just worked. This demo, for example, is a 4-bit base-plus model (~100MB download). To speed things up, you can also do spatial or temporal downsampling. That speeds up inference on my old MacBook by ~3x, and the hit in parity is negligible.
AVB • 17,227 views • 28 days ago
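To give a feel for what the 8-bit and 4-bit variants mentioned above trade away, here is a generic sketch of affine quantization for one group of weights. It is an illustration under assumptions, not the scheme used by the repo or by MLX: `quantize_group`, `dequantize_group`, and `max_error` are hypothetical helpers, and real implementations pack the integer codes and work on tensors rather than Python lists.

```python
def quantize_group(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] using one scale and offset."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


def max_error(values, bits):
    """Largest absolute round-trip error for the given bit width."""
    codes, scale, lo = quantize_group(values, bits)
    restored = dequantize_group(codes, scale, lo)
    return max(abs(v - r) for v, r in zip(values, restored))
```

Calling `max_error(weights, 4)` and `max_error(weights, 8)` on the same group shows the basic trade-off: fewer bits means a coarser grid, so a smaller download but a larger worst-case rounding error per weight.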

New RLM trajectory that blew my mind! I will use this one as the main example in the YT tutorial. I passed in a CSV containing transcripts of 320 episodes of the Lex Fridman podcast and asked it to find what his first 10 ML guests had to say about AGI. The context had 60,855,062 characters.
> Main agent explored the data format, understood it's a CSV
> Extracted all 320 guests, identified the first 10 ML guys (Bengio, Brockman, Goodfellow, etc.)
> Launched parallel subagents, passing each just its corresponding transcript (about 35K chars each)
> Subagents performed find operations to search for AGI, read the context, and returned outputs
> Main agent gathered all the data, generated a summary of all AGI conversations
It took 4 minutes to crunch, and the fun part is it cost me $0.20 with Minimax-M2.5. It read 1M tokens (825K were cache hits, so it was quite cheap) and produced just 69K tokens (19K were reasoning).
----
My notes:
- This would be basically impossible to do at this quality with a base LM (context rot, since 99% of the data is useless).
- It would cost 20x more with a ReAct model (too many tasks).
- It would cost 10x more with a ReAct + Subagent model (read/write contexts instead of using symbolic variables).
- I'm a happy panda. (Thanks for reading.)
AVB • 38,963 views • 4 months ago
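The fan-out pattern in the post above (split the CSV, hand each subagent only its own transcript, gather the results) can be sketched as follows. This is a toy under assumptions: the column names `guest` and `transcript` are made up, and `find_mentions` is a plain string search standing in for a subagent LLM call.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def find_mentions(transcript, keyword="AGI", window=40):
    """Stand-in for a subagent: return text windows around each keyword hit."""
    hits, start = [], 0
    while (idx := transcript.find(keyword, start)) != -1:
        hits.append(transcript[max(0, idx - window): idx + len(keyword) + window])
        start = idx + len(keyword)
    return hits


def fan_out(csv_text, guests, keyword="AGI"):
    """Main agent: pick the relevant rows, then search them in parallel."""
    # Assumption: the CSV has 'guest' and 'transcript' columns.
    rows = {r["guest"]: r["transcript"] for r in csv.DictReader(io.StringIO(csv_text))}
    selected = [g for g in guests if g in rows]
    with ThreadPoolExecutor() as pool:
        # Each worker sees only one transcript, never the full context.
        results = list(pool.map(lambda g: find_mentions(rows[g], keyword), selected))
    return dict(zip(selected, results))
```

The main agent only ever handles the short windows that come back, which is the property that keeps the token count low relative to the size of the full context.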

My RLM finally went recursive! Looking at these logs is way too addictive, please send help. Notes:
> Sent it 10 long Wikipedia articles about deep learning (~2M context).
> Asked it to find BLEU scores from the Attention paper & explain MHA from these articles.
> RLM controlled by the new Minimax 2.5! Minor prompt changes were needed from the RLM paper.
> Spends the first 3 iterations understanding the data format and working through errors until it locates the Attention article in the mess. Like a human would use a Jupyter Notebook.
> Launches a subagent on only the AIAYN article.
> This subagent launches 2 more subagents to fetch (a) the BLEU score and (b) MHA (my original two-part question).
> The lowest subagent returns the output using "FINAL_VAR" (i.e. it does not generate the text! It just finds the correct location in the context and sends it back as a variable).
> Recursion propagates upwards.
> Outermost LLM receives the RLM output and generates the full text response.
> Took 2.5 minutes wall time. Max recursion depth was 2. 12 LLM calls in total. (This video contains cuts when the LLM is thinking/generating.)
> Subagents never get to see more than 2000 characters. Only the outermost LLM gets to see the full output. It's needed to answer the final question, but it's only 200-300 tokens compared to 2M!
> Fully async. Code execution and subagent tasks can happen simultaneously!
I feel soooo satisfied. It's been some time since I've been this excited about shooting a tutorial video.
AVB • 38,241 views • 4 months ago
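The "return a variable, not text" idea from the post above can be illustrated with a small sketch. Everything here is hypothetical (the `Workspace` class, the agent functions, the `width` parameter), and plain string search stands in for the LLM calls; the point is only that inner agents hand back names of stored slices, and the text is read once at the top.

```python
class Workspace:
    """Holds named variables; agents pass names around instead of text."""

    def __init__(self, context):
        self.vars = {"context": context}

    def slice_var(self, source, start, end, name):
        # Store a slice under a new name and return only the name.
        self.vars[name] = self.vars[source][start:end]
        return name


def leaf_agent(ws, source, keyword, name, width):
    """Depth 2: locate the keyword and return a variable name, not text."""
    idx = ws.vars[source].find(keyword)
    if idx == -1:
        return None
    return ws.slice_var(source, idx, idx + width, name)


def article_agent(ws, source, keywords, width):
    """Depth 1: spawn one leaf per sub-question."""
    return {kw: leaf_agent(ws, source, kw, f"answer_{i}", width)
            for i, kw in enumerate(keywords)}


def root_agent(ws, article_marker, keywords, width=200):
    """Depth 0: narrow the context, recurse, then dereference the results."""
    start = ws.vars["context"].find(article_marker)
    if start == -1:
        return {}
    ws.slice_var("context", start, len(ws.vars["context"]), "article")
    refs = article_agent(ws, "article", keywords, width)
    return {kw: ws.vars[name] for kw, name in refs.items() if name is not None}
```

Only `root_agent` reads the stored slices, which mirrors the post's observation that the outermost LLM is the only one that sees the full output.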

Today, I am launching Paper Breakdown.
- PBD gets you academic paper recommendations and lets you study CS/ML/AI research with LLM agents.
- It highlights relevant sections directly in the actual PDF.
- It generates flowcharts/illustrations too.
- We provide a built-in screenshot tool to send images to the agent directly from the paper.
- We also have agentic paper search that lets you search our database of 70,000+ CS and ML arXiv papers in seconds using natural language.
I have been building PBD for almost half a year. It all started as a means for me to keep up with research and use AI to produce visuals and scripts for my own YouTube videos. I have developed it enough to confidently recommend it to you. Visit our landing page to learn more.
AVB • 14,579 views • 5 months ago