Wow, diffusion models (used in AI image generation) are also game engines - a type of world simulation. By predicting the next frame of the classic shooter DOOM, you get a playable game at 20 fps without any underlying real game engine. This video is from the diffusion model.

1,768,653 views • 1 year ago • via X (Twitter)
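The "predict the next frame" idea in the post can be sketched as a feedback loop: the model's own output becomes part of its next input, together with the player's action. The sketch below is only an illustration under stated assumptions. `toy_predict_next_frame` is a hypothetical stand-in that rotates a row of pixels, not a diffusion model, and `CONTEXT_FRAMES` is an arbitrary value rather than anything taken from the paper.

```python
from collections import deque

CONTEXT_FRAMES = 4  # hypothetical context length, not taken from the paper


def toy_predict_next_frame(frames, actions):
    """Stand-in for a learned frame predictor (NOT a diffusion model).

    Rotates the most recent frame one pixel left or right depending on the
    most recent action, so the loop below has something to call.
    """
    last, action = frames[-1], actions[-1]
    if action == "left":
        return last[1:] + last[:1]
    if action == "right":
        return last[-1:] + last[:-1]
    return list(last)


def play(first_frame, inputs, predict=toy_predict_next_frame):
    """Autoregressive loop: each predicted frame is fed back in as context."""
    frames = deque([list(first_frame)], maxlen=CONTEXT_FRAMES)
    actions = deque(maxlen=CONTEXT_FRAMES)
    shown = []
    for action in inputs:
        actions.append(action)
        frame = predict(list(frames), list(actions))
        frames.append(frame)  # oldest frame falls out once the window is full
        shown.append(frame)
    return shown


if __name__ == "__main__":
    for row in play([1, 0, 0, 0], ["right", "right", "left"]):
        print(row)
```

In this sketch there is no game state anywhere except the bounded window of recent frames and actions. Anything older than the window is unavailable to the predictor.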

9 comments

Ethan Mollick • 1 year ago

Paper and details:

Elon Musk • 1 year ago

Tesla can do something similar with real world video

Kristoph • 1 year ago

I honestly don’t find this compelling. They obviously trained on a large corpus of game screenshots, and it’s just generating the screen that probabilistically follows the current one. The issue is that this can only be achieved by having an initial game from which a corpus can be derived. If there is no game, there is no corpus, so where is the value add?

Kristoph • 1 year ago

How does the game maintain state? When you turn around how does it know what came before?

Gurwinder • 1 year ago

Gives a whole new meaning to P(doom)!

George Saoulidis ⚡ • 1 year ago

So you made DOOM run inside an LLM. Just say it

Rammy • 1 year ago

So you’re saying that it shows you images based on where you’re looking? As in it only renders when observed? 👀 *Tinfoil hat intensifies*

Bart Trzynadlowski • 1 year ago

Amazing but also supremely irritating that the source code isn’t available given that it’s based on an open source model.

Lucidyn • 1 year ago

OK, a question I have: what's the difference in power consumption between the original DOS release, the DOOM 64-bit release, and the AI-generated version here? It's cool, but I'm guessing it comes at a massive cost, showing that traditional rendering is more efficient.

Related videos

At Avalon we are building "Real-time creating" - the ability to generate gameplay ready persistent worlds prompted from text. While others are building real-time video world models, Avalon is building real-time world generation inside a fully playable, persistent multiplayer engine.

Internally running at 3840×2180 at 60 FPS. Built on Unreal Engine. Multiplayer by default. Persistent by default. Gameplay-ready by default.

This is not a video latent replay. Not a simulation of interaction. It is a real 3D world with physics, logic, and authoritative multiplayer state.

Avalon is trained on proprietary Avalon interaction data and powered by a hybrid system that combines language understanding, 3D model generation, procedural systems, and structured gameplay logic synthesis.

Players can walk through a live world and generate environments, assets, mechanics, and entirely new gameplay modes using natural language. We accomplish this through a combination of 3D model generation, game logic generation based on our proprietary systems, and AI driven world creation. While other players are inside it. Changes persist instantly. State is synchronized in real time. Creation happens inside the world, not outside of it.

Describe a biome. Spawn a civilization. Create a survival mode. Build a dungeon crawler. Launch a new game inside the world. Avalon interprets intent and integrates it directly into the live multiplayer environment.

This is not a world model predicting video. This is a gameplay engine that understands language. If you can describe it, you can build it. And others can walk into it instantly.

AVALON

59,487 views • 4 months ago

Tencent presents GameGen-O: Open-world Video Game Generation

We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, thus allowing for gameplay simulation.

The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process.

GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on OGameData via text-to-video and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen and fine-tuned using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. This whole training process imparts the model with the ability to generate and interactively control content.

In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, which can efficiently combine creative generation with interactive capabilities.

AK

366,948 views • 1 year ago