Hao AI Lab

@haoailab • 6,586 subscribers

Hao AI Lab at UCSD. Our mission is to democratize large machine learning models, algorithms, and their underlying systems.

Shorts

(1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing the FastWan series, a family of fast video generation models trained via a new recipe we call “sparse distillation”, which speeds up video denoising by 70X! 🖥️ Live demo: (Thanks to @gmicloud for the support!) 🔗 Blog: 🔓 We fully open-source our models, code, and data under the Apache-2.0 license

78,660 views

Reasoning models often waste tokens self-doubting. Dynasor saves up to 81% of the tokens needed to arrive at the correct answer! 🧠✂️ - Probe the model halfway to get its certainty - Use that certainty to stop reasoning - 100% training-free, plug-and-play 🎮Demo:

99,563 views
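
The probe-then-stop idea in the Dynasor post can be sketched roughly as follows. This is a minimal illustration, not Dynasor's actual implementation: `next_chunk` and `probe_answer` are hypothetical callables standing in for the model, and the certainty measure used here (several consecutive probes agreeing) is an assumption, since the post does not say how certainty is computed.

```python
from typing import Callable, List, Tuple


def reason_with_early_exit(
    next_chunk: Callable[[str], str],    # hypothetical: returns the next piece of reasoning text
    probe_answer: Callable[[str], str],  # hypothetical: forces a short answer from partial reasoning
    prompt: str,
    max_chunks: int = 16,
    window: int = 3,
) -> Tuple[str, int]:
    """Generate reasoning chunk by chunk and stop once the probed answer is stable.

    Returns (answer, number_of_chunks_generated).
    """
    context = prompt
    recent: List[str] = []
    answer = ""
    for used in range(1, max_chunks + 1):
        context += next_chunk(context)
        answer = probe_answer(context)
        recent = (recent + [answer])[-window:]
        # Assumed certainty proxy: the last `window` probes all gave the same answer.
        if len(recent) == window and len(set(recent)) == 1:
            return answer, used
    return answer, max_chunks


# Toy stand-ins for a model, used only to exercise the loop.
def toy_chunk(context: str) -> str:
    return " step"


def toy_probe(context: str) -> str:
    steps = context.count(" step")
    return "42" if steps >= 2 else f"unsure-{steps}"


if __name__ == "__main__":
    print(reason_with_early_exit(toy_chunk, toy_probe, "Q:"))
```

With the toy stand-ins, the loop stops after four of the sixteen allowed chunks, because the probe has returned the same answer three times in a row by then.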

Videos

🚀Generate a 30-second 1080p video in just 7 seconds! We’re open-sourcing FastVideo Dreamverse: real-time vibe directing for video generation on a single NVIDIA B200 GPU with LTX-2 model LTX Repo: Blog:

225,241 views • 1 month ago

When Ilya Sutskever once explained why next-word prediction leads to intelligence, he made a metaphor: if you can piece together the clues and deduce the criminal’s name on the last page, you have a real understanding of the story. 🕵️‍♂️ Inspired by that idea, we turned to Ace Attorney to test AI's reasoning. It’s the perfect stage: the AI plays as a detective to collect clues, expose contradictions, and uncover the truth. We put the latest top AI models—GPT-4.1, Gemini 2.5 Pro, Llama-4 Maverick, and more—to the test in Ace Attorney, to see if they could shout Objection! ⚖️, turn the case around, and uncover the truth behind the lies.

999,297 views • 1 year ago

(1/N) We're launching Dreamverse. Most AI video models take minutes to generate a 5 s 1080p clip. In 4.5 seconds, we can generate 30 s 1080p clips on a single GPU. Our videos generate faster than you can watch them: stop waiting on prompts and start directing scenes live. 🕹️Demo: 📑 Blog: Welcome to the era of vibe-directing 👇

90,330 views • 4 months ago

Claude-3.7 was tested on Pokémon Red, but what about more real-time games like Super Mario 🍄🌟? We threw AI gaming agents into LIVE Super Mario games and found Claude-3.7 outperformed other models with simple heuristics. 🤯 Claude-3.5 is also strong, but less capable of planning complex maneuvers. Gemini-1.5-pro and GPT-4o perform less well.

234,336 views • 1 year ago

(1/5) FP4 hardware is here, but 4-bit attention still kills model quality, blocking true end-to-end FP4 serving. To fix that, we propose Attn-QAT, the first systematic study of quantization-aware training for attention. The result: FP4 attention quality is comparable to BF16 attention with 1.1x–1.5x higher throughput than SageAttention3 on an RTX 5090 and 1.39x speedup over FlashAttention-4 on a B200. Blog: Code: Checkpoints:

37,512 views • 3 months ago

(1/5) 5 seconds of video. 1.8 seconds of generation. One NVIDIA GeForce RTX 5090 on FastVideo. 🤯🚀 - FastWan-QAD, a new family of video generation models - Trained with FastVideo's Quantization-Aware Distillation (QAD) recipe - Powered by FastVideo, we push a single NVIDIA GeForce RTX 5090 to its absolute limit: generating a 5-second 480P video in 1.8 s end-to-end! 📜 Blog: 💻 Code: 💽 Model:

12,020 views • 27 days ago

(1/N) Content creators have been stuck with costly and slow video generation APIs for far too long. We couldn’t take it anymore.😅😭 FastVideo’s new real-time inference stack has the fastest 1080p TI2AV pipeline ever.😍🚀🚀 Our optimized LTX-2.3 pipeline creates 5-second 1080p videos with audio in 4.55 s, on a single GPU! 3.9x faster than the next fastest option. 🕹️Live demo: 📜Blog:

29,601 views • 4 months ago

🔥 Pokémon Red is becoming a go-to benchmark for testing advanced AIs such as Gemini. But is Pokémon Red really a good eval? We study this problem and identify three issues: 1️⃣ Navigation tasks are too hard. 2️⃣ Combat control is too simple. 3️⃣ Raising a strong Pokémon team is slow and expensive as an eval. We find most of the problems are not fundamental to the games themselves, but to how they have been used. We believe game-as-an-eval remains a compelling and underutilized evaluation strategy. We introduce Lmgame Bench to standardize game-as-an-eval. More details and findings in our blog post:

69,004 views • 1 year ago

[1/N] 🚀 New decoding paradigm drop! 🚀 Introducing Lookahead Reasoning (LR): step-level speculation that stacks with Speculative Decoding (SD). It has been accepted to #NeurIPS2025 🎉 📖 Blog: 💻 Code: 📄 Paper:

43,177 views • 10 months ago

🎥 Frustrated by Sora's credit limits? Still waiting for Veo 2? 🚀 Open-source video DiTs are actually on par. We introduce FastVideo, an open-source stack to support fast video generation for SoTA open models. We have supported Mochi and Hunyuan, 8x faster inference, 720P 5-second video in 62 seconds.

69,735 views • 1 year ago

🎥 Video DiTs are painfully slow: HunyuanVideo takes 16 min to generate a 5s 720P video on an H100. 🤯 Announcing Sliding Tile Attention (STA): * Accelerate 3D full attention (FA3) by up to 10x * Slash the end-to-end time from 16 to 5 mins * NO extra training. NO quality loss! 🚀 Can you tell which videos are generated by the original HunyuanVideo, and which by STA? 👀 Blog:

58,034 views • 1 year ago

🔧🤖 A new wave of open-source LLMs like DeepSeek-R1-0528 and Qwen3-235B-A22B is leveling up with stronger agentic performance. We test them in head-to-head gameplay — the upgraded DeepSeek-R1-0528 outsmarts strong reasoning models like o4-mini across several games, and it nearly matches SOTA performance on Tetris, going toe-to-toe with o3. ✨🧠 Check out how R1 manages to clear lines in Tetris while other models still struggle 👇

36,265 views • 1 year ago

LLaMA-4 Maverick performs well on reasoning benchmarks and ranks 2nd on the Chatbot Arena, yet its true performance remains controversial. What if we put them in a transparent gaming environment? 🎮 Our benchmark tells a different story...🤔 Will true intelligence shine through play? Let’s find out 👇

39,050 views • 1 year ago

[1/5] Did you know that random gameplay in 2048 can yield a 128 tile, and that only 1% of human gameplay reaches a 2048 tile? Check out how today’s top AI models compare! ⚖️ For top reasoning models, the results were wild: only Claude-3.7 (with reasoning) and o1 managed to outperform random moves, achieving a 256 tile in 114 and 116 steps respectively! 😱

26,104 views • 1 year ago
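
For readers who want to try the random-play baseline mentioned in the 2048 post, a small simulator along these lines may help. It is a rough sketch and not the benchmark's own code: it assumes the usual 4x4 rules, where each tile merges at most once per move and a new tile is a 2 with 90% probability and a 4 otherwise. All function names are made up for illustration.

```python
import random
from typing import List

Board = List[List[int]]


def slide_row_left(row: List[int]) -> List[int]:
    """Slide one row to the left, merging each pair of equal neighbours once."""
    tiles = [v for v in row if v]
    merged: List[int] = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))


def move(board: Board, direction: int) -> Board:
    """Return a new board after a move: 0=left, 1=right, 2=up, 3=down."""
    if direction in (2, 3):
        board = [list(col) for col in zip(*board)]  # transpose so columns become rows
    if direction in (1, 3):
        board = [row[::-1] for row in board]
    board = [slide_row_left(row) for row in board]
    if direction in (1, 3):
        board = [row[::-1] for row in board]
    if direction in (2, 3):
        board = [list(col) for col in zip(*board)]
    return board


def spawn(board: Board, rng: random.Random) -> None:
    """Place a new tile on a random empty cell (assumed 90% a 2, 10% a 4)."""
    empty = [(r, c) for r in range(4) for c in range(4) if board[r][c] == 0]
    r, c = rng.choice(empty)
    board[r][c] = 4 if rng.random() < 0.1 else 2


def play_random(seed: int) -> int:
    """Play uniformly random legal moves until stuck; return the largest tile."""
    rng = random.Random(seed)
    board: Board = [[0] * 4 for _ in range(4)]
    spawn(board, rng)
    spawn(board, rng)
    while True:
        legal = [d for d in range(4) if move(board, d) != board]
        if not legal:
            return max(max(row) for row in board)
        board = move(board, rng.choice(legal))
        spawn(board, rng)


if __name__ == "__main__":
    results = [play_random(seed) for seed in range(100)]
    print({tile: results.count(tile) for tile in sorted(set(results))})
```

Running it prints how often each maximum tile appears across 100 random games, which gives a rough floor to compare a model's result against.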

You might have heard top reasoning models now match AIME gold medalists in 2025 🏅, but watch them crumble in box-pushing Sokoban (倉庫番) from the 80s! 🧩 Again, we put top reasoning models into the game: o3-mini (medium) took the crown, reaching level 4 before getting tangled up with just two boxes. 😵‍💫 Claude-3.7-thinking managed two levels, and DeepSeek-R1 cleared one level. Gemini-2.0-flash-thinking solved none.

23,760 views • 1 year ago

🚨 New Challenger: GROK joins the Game Arena Benchmark! We evaluated Grok3-mini-beta: thinking on four games: 🧩 2048 | 🧱 Sokoban | 🍬 Candy Crush | 🎮 Phoenix Wright With fast progress, it’s already comparable to top models like OpenAI’s O1, previous O3-mini, and Gemini-2.5-Pro, ranked 🥇 1st in 2048 and 🥈 2nd in Sokoban. But the Grok3-beta Thinking API is still unreleased—leaving an unanswered question: Can it match or surpass O3’s dominant reasoning performance? For now, let’s take a closer look at how Grok3-mini-beta: thinking performs in gameplay so far. 👇

19,487 views • 1 year ago

We built Gaming agents to run platformers and puzzle video games in real time. Check out our demos and try our repo yourself to customize your own gaming agent! 🎮 In addition to Super Mario Bros, we also support 2048, as well as Tetris. More games are coming soon! 👾

17,238 views • 1 year ago

Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers: - A simple, consistent Python API - State of the art model performance optimizations - Optimized implementations of popular models Blog:

15,003 views • 1 year ago

This week, we tested the 3 latest models in our Game Arena Benchmark: → O3 → O4-mini → Gemini 2.5 Flash Across 4 games—Phoenix Wright, Sokoban, Candy Crush, and 2048—O3 dominated the zero-shot leaderboard, ranking #1 or #2 in nearly every task and outperforming previous SOTA models like O3-mini and Gemini 2.5 Flash. 🔥 Beyond our customized tests, we took on a real challenge: Sokoban (1989)—the classic, unforgiving original. 🗣️ Many say O3 shows strong image reasoning, but does it really? Let’s see what our Game Arena Benchmark reveals. 👇

14,636 views • 1 year ago