Wow, diffusion models (used in AI image generation) are also game engines - a type of world simulation. By predicting the next frame of the classic shooter DOOM, you get a playable game at 20 fps without any underlying real game engine. This video is from the diffusion model.
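To make "predicting the next frame" concrete, here is a minimal toy sketch of an action-conditioned rollout loop. It is not the paper's method: `toy_next_frame` is a hypothetical stand-in for the diffusion model, frames are tiny lists rather than images, and the context length of 4 is an arbitrary assumption. The point is only the shape of the loop: each new frame is produced from a bounded window of recent frames and player actions, then fed back in as input.

```python
from collections import deque

CONTEXT_FRAMES = 4  # assumed context length, chosen only for illustration


def toy_next_frame(past_frames, past_actions):
    """Hypothetical stand-in for a learned next-frame model.

    A real system would run a neural network here. This toy just rotates
    the last frame left or right depending on the latest action.
    """
    last = past_frames[-1]
    shift = {"left": -1, "right": 1, "none": 0}[past_actions[-1]]
    if shift == 0:
        return list(last)
    return last[-shift:] + last[:-shift]


def rollout(first_frame, actions, context=CONTEXT_FRAMES):
    """Generate frames autoregressively: each output becomes future input."""
    frames = deque([first_frame], maxlen=context)  # bounded memory of frames
    acts = deque(maxlen=context)                   # bounded memory of actions
    generated = []
    for action in actions:
        acts.append(action)
        nxt = toy_next_frame(list(frames), list(acts))
        frames.append(nxt)  # the prediction is fed back as context
        generated.append(nxt)
    return generated


if __name__ == "__main__":
    for frame in rollout([1, 0, 0, 0], ["right", "right", "left"]):
        print(frame)
```

Because the windows are bounded, anything that falls out of them is forgotten unless it is still visible in a recent frame, which is one reason long-term state is a hard problem for this kind of approach.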

1,768,653 views • 1 year ago • via X (Twitter)

9 Comments

Ethan Mollick • 1 year ago

Paper and details:

Elon Musk • 1 year ago

Tesla can do something similar with real world video

Kristoph • 1 year ago

I honestly don’t find this compelling. They obviously trained on a large corpus of game screenshots and it’s just generating the screen that probabilistically follows the current one. The issue is that this can only be achieved by having an initial game from which a corpus can be derived. If there is no game, there is no corpus, so where is the value add?

Kristoph • 1 year ago

How does the game maintain state? When you turn around, how does it know what came before?

Gurwinder • 1 year ago

Gives a whole new meaning to P(doom)!

George Saoulidis ⚡ • 1 year ago

So you made DOOM run inside an LLM. Just say it.

Rammy • 1 year ago

So you’re saying that it shows you images based on where you’re looking? As in it only renders when observed? 👀 *Tinfoil hat intensifies*

Bart Trzynadlowski • 1 year ago

Amazing, but also supremely irritating that the source code isn’t available, given that it’s based on an open source model.

Lucidyn • 1 year ago

OK, a few questions that I have: what's the difference in power consumption between the original DOS release, the DOOM 64-bit release, and the AI-generated version here? It's cool, but I am guessing it will come at a massive cost, showing that traditional rendering will be more efficient.

Related Videos

At Avalon we are building "Real-time creating" - the ability to generate gameplay-ready persistent worlds prompted from text. While others are building real-time video world models, Avalon is building real-time world generation inside a fully playable, persistent multiplayer engine. Internally running at 3840×2180 at 60 FPS. Built on Unreal Engine. Multiplayer by default. Persistent by default. Gameplay-ready by default. This is not a video latent replay. Not a simulation of interaction. It is a real 3D world with physics, logic, and authoritative multiplayer state. Avalon is trained on proprietary Avalon interaction data and powered by a hybrid system that combines language understanding, 3D model generation, procedural systems, and structured gameplay logic synthesis. Players can walk through a live world and generate environments, assets, mechanics, and entirely new gameplay modes using natural language. We accomplish this through a combination of 3D model generation, game logic generation based on our proprietary systems, and AI-driven world creation. While other players are inside it. Changes persist instantly. State is synchronized in real time. Creation happens inside the world, not outside of it. Describe a biome. Spawn a civilization. Create a survival mode. Build a dungeon crawler. Launch a new game inside the world. Avalon interprets intent and integrates it directly into the live multiplayer environment. This is not a world model predicting video. This is a gameplay engine that understands language. If you can describe it, you can build it. And others can walk into it instantly.

AVALON

59,410 views • 4 months ago

Tencent presents GameGen-O: Open-world Video Game Generation. We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, thus allowing for gameplay simulation. The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process. GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on OGameData via text-to-video and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen and fine-tuned using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. This whole training process imparts the model with the ability to generate and interactively control content. In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, which can efficiently combine creative generation with interactive capabilities.

AK

366,948 views • 1 year ago