Today we're announcing #GAIA1: a 9B-parameter world model, trained on 4,700 hours of driving data, able to simulate complex and diverse driving scenes from video, text and action inputs. This model is 480x larger than the preview we shared earlier this year and the results are incredible. These...

Here's a thread with some of my favourite examples!

Exciting advancements with #GAIA1, @alexgkendall! However, how will we ensure the synthetic data's diversity truly represents real-world scenarios without bias? How do we validate the correctness of AI-imagined outcomes against real-world driving nuances? Isn't there a risk of overfitting to the generated scenarios, and how do we bridge the gap between synthetic training and real-world robustness? While the promise is grand, isn't real-world data still invaluable for nuances hard to replicate synthetically?

You're right, these are the key questions, and they're where we've been focused, to make sure these aren't just pretty videos but can accelerate the robustness of our driving policies. With the right approach, you can build a world model that balances realism and diversity (and can even train your policy adversarially).

The videos are really coherent. How many seconds can it generate? Do you use any driving video, or is it just text to video?

It can keep generating videos perpetually, so no limit to the length... here's an example of a long scene

Very impressive

You guys should be training AIs on the roads of India. The model would be way better than any made to date.

This is awesome. Now we can generate the most absurd cases, the ones far out in the tails of the distribution. Major unlock.

@WholeMarsBlog Next level !! ♥️♥️

I think you are off by several orders of magnitude on the amount of training data, but having said that, the fundamental issue is the need for variance in the data, rather than just the amount of data.
