Wow, diffusion models (used in AI image generation) are also game engines: a type of world simulation. By predicting the next frame of the classic shooter DOOM, you get a playable game at 20 fps without any underlying real game engine. This video was generated by the diffusion model.

Ethan Mollick

361,263 subscribers

1,768,653 views • 1 year ago • via X (Twitter)

Science & Technology, Education
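The mechanism described in the post, generating each new frame from the previous frames plus the player's input, can be sketched as an autoregressive loop. This is a minimal illustration, not GameNGen's code: `predict_next_frame` is a hypothetical stand-in for the diffusion model, frames are toy lists of floats, and the context length of 4 is an arbitrary assumption. The sketch also shows one consequence of the design: anything older than the context window is no longer an input, so the model has no direct record of it.

```python
from collections import deque
from typing import Callable, List

Frame = List[float]  # toy stand-in for an image
FRAME_BUDGET_MS = 1000 / 20  # 20 fps leaves 50 ms to generate each frame


def run_world_model(
    predict_next_frame: Callable[[List[Frame], List[str]], Frame],
    first_frames: List[Frame],
    actions: List[str],
    context_len: int = 4,
) -> List[Frame]:
    """Generate one frame per player action, conditioning only on the most
    recent `context_len` frames and actions (a sliding window)."""
    frames = deque(first_frames[-context_len:], maxlen=context_len)
    recent_actions = deque(maxlen=context_len)
    generated: List[Frame] = []
    for action in actions:
        recent_actions.append(action)
        next_frame = predict_next_frame(list(frames), list(recent_actions))
        frames.append(next_frame)  # the oldest frame drops out of the window
        generated.append(next_frame)
    return generated


# Usage with a toy "model" whose single pixel brightens when moving forward.
def toy_model(frames: List[Frame], acts: List[str]) -> Frame:
    return [frames[-1][0] + (1.0 if acts[-1] == "forward" else 0.0)]


print(run_world_model(toy_model, [[0.0]], ["forward", "forward", "turn"]))
# [[1.0], [2.0], [2.0]]
```

A `deque` with `maxlen` discards its oldest entry automatically on `append`, which keeps the per-step input size fixed no matter how long the session runs.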

9 comments

Ethan Mollick • 1 year ago

Paper and details:

Elon Musk • 1 year ago

Tesla can do something similar with real world video

Kristoph • 1 year ago

I honestly don’t find this compelling. They obviously trained on a large corpus of game screenshots, and it’s just generating the screen that probabilistically follows the current one. The issue is that this can only be achieved by having an initial game from which a corpus can be derived. If there is no game, there is no corpus, so where is the value add?

Kristoph • 1 year ago

How does the game maintain state? When you turn around how does it know what came before?

Gurwinder • 1 year ago

Gives a whole new meaning to P(doom)!

George Saoulidis ⚡ • 1 year ago

So you made DOOM run inside an LLM. Just say it

Rammy • 1 year ago

So you’re saying that it shows you images based on where you’re looking? As in it only renders when observed? 👀 *Tinfoil hat intensifies*

Bart Trzynadlowski • 1 year ago

Amazing but also supremely irritating that the source code isn’t available given that it’s based on an open source model.

Lucidyn • 1 year ago

OK, a few questions I have: what's the difference in power consumption between the original DOS release, the DOOM 64 bit release, and the AI-generated version here? It's cool, but I am guessing it will come at a massive cost, showing that traditional rendering will be more efficient.

Related videos

This may look like a game of Counterstrike running slowly, but it is actually extraordinary. The entire game is created, frame by frame, on my *home computer* by an AI diffusion model in response to my actions. There is no game engine, just a "world model" trained on Counterstrike.

Ethan Mollick

376,399 views • 1 year ago

Google researchers just developed GameNGen, an AI that can simulate DOOM at over 20 fps. It works by predicting each frame in real time with a diffusion model. At scale, this could mean AI will be able to create games on the fly, personalized to each player.

Rowan Cheung

880,304 views • 1 year ago

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models paper page: github: Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or common sense reasoning. We propose to equip diffusion models with enhanced reasoning capabilities by using off-the-shelf pretrained large language models (LLMs) in a novel two-stage generation process. First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with an image prompt, an LLM outputs a scene layout in the form of bounding boxes along with corresponding individual descriptions. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning. Additionally, our method naturally allows dialog-based scene specification and is able to handle prompts in a language that is not well-supported by the underlying diffusion model.

AK

83,657 views • 2 years ago
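The first stage of the two-stage LLM-grounded Diffusion process produces a scene layout as bounding boxes with per-box descriptions. The sketch below shows how such a reply might be turned into structured boxes before the second, layout-conditioned diffusion stage. The reply format, the `[x, y, w, h]` convention, and the 512×512 canvas are assumptions for illustration (the description does not specify them), and `parse_layout` is a hypothetical helper, not part of the project's code.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    description: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int
    h: int


def parse_layout(llm_output: str, canvas: int = 512) -> List[Box]:
    """Turn an LLM reply of the assumed form "[('a cat', [x, y, w, h]), ...]"
    into boxes clipped to a square canvas; boxes entirely off-canvas are dropped."""
    boxes: List[Box] = []
    for desc, (x, y, w, h) in ast.literal_eval(llm_output.strip()):
        x0, y0 = max(0, int(x)), max(0, int(y))
        x1, y1 = min(canvas, int(x) + int(w)), min(canvas, int(y) + int(h))
        if x1 > x0 and y1 > y0:
            boxes.append(Box(str(desc), x0, y0, x1 - x0, y1 - y0))
    return boxes


reply = "[('a gray cat', [67, 243, 120, 126]), ('a sofa', [450, 300, 100, 100])]"
for box in parse_layout(reply):
    print(box)  # the sofa box is clipped at the right edge to w=62
```

`ast.literal_eval` is used instead of `eval` so that model output can only produce Python literals, never execute code.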

Recap on Tencent Hunyuan Game: the first industrial-grade AIGC engine and AI-model series for game asset generation. Beyond the features currently available in the Hunyuan Game platform, the team is actively developing more capabilities (many not yet live). These advancements were unveiled in the Hunyuan Game technical report, which details four image-generation models and five video-generation models for game visuals. Watch the video for a glimpse of these innovations and check out the technical report to learn more 👉🏻

Hunyuan

46,802 views • 1 year ago

At Avalon, we are building "Real-time creating": the ability to generate gameplay-ready persistent worlds prompted from text. While others are building real-time video world models, Avalon is building real-time world generation inside a fully playable, persistent multiplayer engine. Internally running at 3840×2180 at 60 FPS. Built on Unreal Engine. Multiplayer by default. Persistent by default. Gameplay-ready by default. This is not a video latent replay. Not a simulation of interaction. It is a real 3D world with physics, logic, and authoritative multiplayer state. Avalon is trained on proprietary Avalon interaction data and powered by a hybrid system that combines language understanding, 3D model generation, procedural systems, and structured gameplay logic synthesis. Players can walk through a live world and generate environments, assets, mechanics, and entirely new gameplay modes using natural language. We accomplish this through a combination of 3D model generation, game logic generation based on our proprietary systems, and AI-driven world creation, while other players are inside it. Changes persist instantly. State is synchronized in real time. Creation happens inside the world, not outside of it. Describe a biome. Spawn a civilization. Create a survival mode. Build a dungeon crawler. Launch a new game inside the world. Avalon interprets intent and integrates it directly into the live multiplayer environment. This is not a world model predicting video. This is a gameplay engine that understands language. If you can describe it, you can build it. And others can walk into it instantly.

AVALON

59,524 views • 4 months ago

V3D: Video Diffusion Models are Effective 3D Generators. Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency.

AK

31,997 views • 2 years ago
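The V3D description mentions generating 360-degree orbit frames around an object. As an illustration of what such a camera path looks like, the sketch below computes evenly spaced camera positions on a circle around the origin. The y-up convention, radius, elevation, and view count are assumptions; V3D's actual camera parameterization may differ, and `orbit_camera_positions` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def orbit_camera_positions(n_views: int, radius: float = 2.0,
                           elevation_deg: float = 0.0) -> List[Tuple[float, float, float]]:
    """Evenly spaced camera centers on a 360-degree orbit around the origin.
    Each camera is assumed to look at (0, 0, 0), with y as the up axis."""
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(n_views):
        azim = 2 * math.pi * i / n_views  # consecutive views are 360/n_views degrees apart
        x = radius * math.cos(elev) * math.sin(azim)
        y = radius * math.sin(elev)
        z = radius * math.cos(elev) * math.cos(azim)
        positions.append((x, y, z))
    return positions


for pos in orbit_camera_positions(4):
    print(tuple(round(c, 3) for c in pos))
```

With 18 views, for example, consecutive cameras are 20 degrees apart, and every camera stays at the same distance from the object.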

Introducing MirageLSD: The First Live-Stream Diffusion (LSD) AI Model Input any video stream, from a camera or video chat to a computer screen or game, and transform it into any world you desire, in real-time (<40ms latency). Here’s how it works (w/ demo you can use!):

Decart

1,008,611 views • 11 months ago

Tencent presents GameGen-O: Open-world Video Game Generation. We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, thus allowing for gameplay simulation. The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process. GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on OGameData via text-to-video and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen and a trainable InstructNet is fine-tuned on top of it, enabling the production of subsequent frames based on multimodal structural instructions. This whole training process imparts the model with the ability to generate and interactively control content. In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, efficiently combining creative generation with interactive capabilities.

AK

366,948 views • 1 year ago

working towards action-conditioned video diffusion models. for now this is just cs2 gameplay, and i parse the keypresses from the .dem game file. next steps will be to work on a scalable data loader, then coding the model.

Arnie Ramesh

31,090 views • 4 months ago
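A data loader for an action-conditioned video model needs one action label per video frame, while a demo file records key presses as timestamped events. The sketch below assumes those events have already been extracted as `(tick, key, pressed)` tuples (no .dem parsing library is shown or implied) and resamples them onto the video frame timeline. The defaults of 64 ticks per second and 16 fps are illustrative assumptions, and `actions_per_frame` is a hypothetical helper.

```python
from typing import List, Set, Tuple

# (tick, key, pressed): assumed to be already extracted from the .dem file;
# the parsing step itself is not shown here.
KeyEvent = Tuple[int, str, bool]


def actions_per_frame(events: List[KeyEvent], n_frames: int,
                      tick_rate: int = 64, fps: int = 16) -> List[Set[str]]:
    """Return the set of keys held down at the start of each video frame."""
    events = sorted(events)  # process in tick order
    held: Set[str] = set()
    labels: List[Set[str]] = []
    i = 0
    for f in range(n_frames):
        frame_tick = f * tick_rate / fps  # game tick at which frame f starts
        # Apply every key press/release that happened up to this frame.
        while i < len(events) and events[i][0] <= frame_tick:
            _, key, pressed = events[i]
            if pressed:
                held.add(key)
            else:
                held.discard(key)
            i += 1
        labels.append(set(held))  # copy, so later changes don't alias
    return labels


events = [(0, "W", True), (8, "A", True), (12, "W", False)]
print(actions_per_frame(events, n_frames=4))  # keys held per frame: W, W, W+A, A
```

Labeling each frame with the keys held at its start is one design choice; labeling with every key pressed at any point during the frame would also capture taps shorter than one frame.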

This video explains how diffusion models are overtaking Large Language Models for generation tasks like:
1. Code Generation
2. Image Generation
3. Video Generation

00:00 Agenda
00:20 How are they different from LLMs?
05:09 Internal Mechanism
10:09 How are vectors generated?
12:08 Conclusion
13:02 Opinion Piece

AI Engineering Course: #Diffusion #AI #LLMs

Gaurav Sen

27,927 views • 7 months ago

This is the end of developers. Lovable just launched a new AI agent, which can literally build any app, game, or extension without coding in under 10 minutes. I cloned 50 apps and games just by writing a prompt. 1: Bubble Shooter Game (Playable)

Hamza Khalid

363,602 views • 11 months ago

AI DOOM: Fully AI Generated Video Games Are HERE! Google's AI DOOM project generates games with NO GAME ENGINE. This will be the future of video games. Here's a full breakdown:

Matthew Berman

79,239 views • 1 year ago

Introducing The Matrix, a foundation world model for generating infinite-length, hyper-realistic videos with real-time, frame-level control:
- Infinite-length video generation
- 720p high-quality rendering
- Real-time, frame-level control at 16 FPS
- Generalization to real-world video control

🔗Blog:
📄Paper:
💻Code & Playable Demo: Coming soon!

Key Innovation: A brand-new technique called the shift-window denoise process model, enabling autoregressive generation for diffusion and consistency models in real time.

Special thanks to project leader Ruili Feng and the entire Matrix team for their dedication and hard work over the year-long project.

Hongyang Zhang

178,322 views • 1 year ago

MVDream: Multi-view Diffusion for 3D Generation paper page: We propose MVDream, a multi-view diffusion model that is able to generate geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets and a multi-view dataset rendered from 3D assets, the resulting multi-view diffusion model can achieve both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can thus be applied as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation (i.e., a DreamBooth3D application), where consistency is maintained after learning the subject identity.

AK

294,442 views • 2 years ago
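The MVDream description mentions using the multi-view model as a prior via Score Distillation Sampling. For reference, the SDS gradient as introduced in DreamFusion is shown below. Here g renders an image x from 3D parameters θ, x_t is that image noised to timestep t, ε̂_φ is the frozen diffusion model's noise prediction given prompt y, and w(t) is a weighting function. How MVDream applies it across several views at once is not spelled out in the description, so this is only the single-view form.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, \mathbf{x} = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(\mathbf{x}_t; y, t) - \epsilon\big)\,\frac{\partial \mathbf{x}}{\partial \theta} \right],
\qquad \mathbf{x}_t = \alpha_t \mathbf{x} + \sigma_t \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```

The diffusion model's own Jacobian is skipped in this gradient, so only the renderer g needs to be differentiated.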

Google presents VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis. We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of

AK

66,375 views • 2 years ago

GameFactory: Creating New Games with Generative Interactive Videos. We present GameFactory, a generalizable world model that learns from a small-scale dataset of Minecraft game videos. By leveraging the prior knowledge of a pretrained video diffusion model, it can create new games in an open domain.

AK

70,029 views • 1 year ago

Genie-3 just achieved what AAA game engines do - but WITHOUT any 3D models. Interactive REAL-TIME video generation @ 24 fps Wild how this model figured out complex effects like exposure shifts, volumetric god rays, and phenomena we need to code explicitly in 3D engines TL;DR 🧵

Bilawal Sidhu

324,666 views • 10 months ago

Just a month later, the first AI model that can generate live and interactive playable dynamic worlds is here. It's called GameNGen and is trained on the classic game DOOM.

@levelsio

519,154 views • 1 year ago

Diffusion is the foundational ML framework behind state-of-the-art AI image and video generation, including Sora, Midjourney and Google Veo. In this episode of Decoded, Ankit Gupta sits down with Francois Chaubard to discuss how diffusion works, walk through a code sample, and explain why everyone training models should understand it.

00:00 Intro
00:33 What is diffusion?
02:50 What are applications of diffusion today?
04:06 Key innovations
07:01 Code examples
19:25 The "squint test"
22:27 Other areas diffusion is widely accessible
24:49 Outro

Y Combinator

74,604 views • 4 months ago
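The episode's own code sample is not reproduced here. As an independent sketch of the core idea, the block below implements the standard DDPM noise-prediction objective in plain Python: corrupt a clean sample to a random timestep t using the cumulative schedule ᾱ_t, then score a model's guess of the added noise with mean squared error. The linear β schedule from 1e-4 to 0.02 over 1000 steps follows the original DDPM paper; `predict_noise` is a hypothetical stand-in for the neural network.

```python
import math
import random
from typing import Callable, List


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def training_loss(x0: List[float],
                  predict_noise: Callable[[List[float], int], List[float]],
                  alpha_bars: List[float], rng: random.Random) -> float:
    """One DDPM-style step: noise x0 to a random timestep t, then compare the
    model's noise prediction with the true noise using mean squared error."""
    t = rng.randrange(len(alpha_bars))
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    # Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    pred = predict_noise(x_t, t)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(x0)


alpha_bars = linear_alpha_bars()
zero_model = lambda x_t, t: [0.0] * len(x_t)  # always predicts "no noise"
print(training_loss([0.5, -0.2, 0.1], zero_model, alpha_bars, random.Random(0)))
```

A quick sanity check: with x0 = 0, an "oracle" that returns x_t / sqrt(1 - ᾱ_t) recovers the noise exactly, so its loss is essentially zero.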

Another day, another odd and old-school technical issue in #Nioh3. This game doesn’t just expect the frame rate to hit its FPS cap to work properly, it also wants the FPS to line up with the display refresh rate. Using the 60 FPS cap at 120 Hz (or higher) can result in lots of frame-time spikes and microstuttering. The only way to avoid this is to switch the refresh rate to 60 Hz, which gives a much smoother frame-time graph. Frame Generation on top of a 60 FPS base at 120 Hz works flawlessly without any frame-time issues.

BenchmarKing

37,677 views • 4 months ago
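The arithmetic behind the 60 FPS / 120 Hz observation: a 120 Hz display refreshes every 1000 / 120 ≈ 8.33 ms, so a steady 60 FPS (16.67 ms per frame) should keep each frame on screen for exactly two refreshes. The sketch below assumes simple vsync and does not model anything Nioh 3 does internally; `refreshes_per_frame` is a hypothetical helper. If one frame arrives 1 ms late, the frame before it stays up for three refreshes (25 ms) and the late frame for only one (8.3 ms), the kind of uneven frame time that reads as microstutter.

```python
import math
from typing import List


def refreshes_per_frame(ready_ms: List[float], refresh_hz: float) -> List[int]:
    """Assuming vsync: each frame appears at the first refresh at or after it is
    ready and stays until the next frame appears. Returns the number of refresh
    intervals each frame (except the last) stays on screen; 0 means dropped."""
    interval = 1000.0 / refresh_hz
    # Small epsilon so a ratio like 2.0000000001 counts as refresh 2, not 3.
    shown_at = [math.ceil(t / interval - 1e-9) for t in ready_ms]
    return [later - earlier for earlier, later in zip(shown_at, shown_at[1:])]


steady = [i * 1000 / 60 for i in range(4)]                   # perfect 60 FPS pacing
late = [0.0, 1000 / 60, 2 * 1000 / 60 + 1.0, 3 * 1000 / 60]  # third frame 1 ms late
print(refreshes_per_frame(steady, 120))  # [2, 2, 2]
print(refreshes_per_frame(late, 120))    # [2, 3, 1]
```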