I'm playing around with generative AI tools and stitching them together into visual stories. Here I took the first few sentences of Pride and Prejudice and made them into a video. The gen stack used for this one:
- Anthropic Claude took the first chapter, generated the scenes and... the individual prompts to the image generator.
- Ideogram took the prompts and generated the images.
- Luma took the images and animated them.
- for narration
- VEED to stitch it together.
(Many of these choices are just what I happened to use for this one while exploring a bunch of things.) Anyway, honestly it was pretty messy: there was a ton of copy-pasting between all of the tools, and even this little video with 3 scenes took me about an hour. There is a huge storytelling opportunity here for whoever can make this convenient. Who is building the first 100% AI-native movie maker?
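The bottleneck described here is the hand-off between tools. A minimal sketch of what removing the copy-paste could look like, assuming each tool could be wrapped as a plain function that takes text (or a file handle) and returns one. Everything below (`Scene`, `Stack`, `make_story`, `demo_stack`) is hypothetical; the stubs only tag strings and stand in for the Claude, Ideogram, Luma, narration, and VEED steps, none of whose real APIs are shown or implied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    text: str               # passage of the source text this scene covers
    image_prompt: str = ""  # prompt written for the image generator
    image: str = ""         # placeholder handle for the generated still
    clip: str = ""          # placeholder handle for the animated clip
    narration: str = ""     # placeholder handle for the narration audio


@dataclass
class Stack:
    """One pluggable callable per stage (hypothetical), so any tool can be swapped in."""
    split_scenes: Callable[[str], List[str]]  # LLM: chapter -> scene texts
    write_prompt: Callable[[str], str]        # LLM: scene text -> image prompt
    generate_image: Callable[[str], str]      # image model: prompt -> image
    animate: Callable[[str], str]             # video model: image -> clip
    narrate: Callable[[str], str]             # TTS: scene text -> audio
    stitch: Callable[[List[Scene]], str]      # editor: scenes -> final video


def make_story(chapter: str, stack: Stack) -> str:
    """Run every stage in order; each manual copy-paste becomes a function call."""
    scenes = [Scene(text=t) for t in stack.split_scenes(chapter)]
    for s in scenes:
        s.image_prompt = stack.write_prompt(s.text)
        s.image = stack.generate_image(s.image_prompt)
        s.clip = stack.animate(s.image)
        s.narration = stack.narrate(s.text)
    return stack.stitch(scenes)


def demo_stack() -> Stack:
    """Offline stubs that only wrap strings; no real model is called."""
    return Stack(
        split_scenes=lambda ch: [p.strip() for p in ch.split(".") if p.strip()],
        write_prompt=lambda t: f"period illustration: {t}",
        generate_image=lambda p: f"img<{p}>",
        animate=lambda img: f"clip<{img}>",
        narrate=lambda t: f"voice<{t}>",
        stitch=lambda scenes: " | ".join(s.clip for s in scenes),
    )


if __name__ == "__main__":
    print(make_story("Scene one. Scene two. Scene three.", demo_stack()))
```

The design choice is simply that every stage shares one interface, so swapping Luma for Runway (or adding a background-music stage) means replacing one callable rather than re-wiring the whole flow.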

Andrej Karpathy

3,210,067 subscribers

608,574 views • 2 years ago • via X (Twitter)

Science & Technology Arts Education

10 Comments

Andrej Karpathy • 1 year ago

Tried Runway Gen-3 now that they support image prompting. A lot better results on this scene. Damn, this is fun. Now if I just tweak the prompt a little more and roll the dice again...

near • 2 years ago

@AnthropicAI and just like that, an entire new genre of youtube videos was created from a single tweet

illusion diffusion • 2 years ago

@AnthropicAI
1. stable diffusion checkpoint/lora from @HelloCivitai or @midjourney
2. @runwayml gen 3 or @KlingAIOfficial for image-to-vid
3. @elevenlabsio for voiceover/sound fx/foley
4. @sudo_ai or @udiomusic for bg music
5. non-AI/Premiere text overlay (no logo)

Andrej Karpathy • 2 years ago

@AnthropicAI @midjourney @runwayml @KlingAIOfficial @elevenlabsio @sudo_ai @udiomusic doh, I totally forgot background music. Fail 🤦‍♂️

Felix Wang • 2 years ago

@AnthropicAI Hey Andrej! I'm working on this exact problem: building an AI-native tool that integrates various models (LLMs, image, video, audio models) to help folks tell stories. Here's a video I made (the tool is at

Michael Kuliasov • 2 years ago

@AnthropicAI Tried a similar thing a bit earlier too.

Andrej Karpathy • 2 years ago

@AnthropicAI Very cool!!

AshutoshShrivastava • 2 years ago

@AnthropicAI Andrej try out Gen-3.

Andrej Karpathy • 2 years ago

@AnthropicAI I'm trying! People seem to be getting really good results with it but I can't quite get that myself so far. It's kind of ignoring my instructions and generating videos that look way too modern, or just wrong or unrelated. I'll keep trying because the consistency is really great.

Dave Lee • 2 years ago

@AnthropicAI Yeah it's a big opportunity.

Related Videos

August 1, 2024: The Music Video

Fun hack just stitching up gen AI tools :), in this case to create a music video for today:
- copy-paste the entire WSJ front page into Claude
- ask it to generate multiple scenes and give visual descriptions for them
- copy-paste the scene descriptions into an image generator (Ideogram here)
- copy-paste the generated images into Runway Gen 3 Alpha to make each image into a 10-second video
- ask Claude to generate lyrics that depict that day
- copy-paste the lyrics into Suno to generate music
- stitch things up in iMovie
:D :D :D

Andrej Karpathy

416,211 views • 1 year ago

This is probably the most complex workflow I’ve ever built, using only open-source tools. It took me four days. It takes four inputs: author, title, and style; and generates a full visual animated story in one click in ComfyUI. There are still some bugs, but here’s the first preview.

Here’s a quick breakdown:
- The four inputs are sent to LLMs with precise instructions to generate: first, prompts for images and image modifications; second, prompts for animations; third, prompts for generating music.
- All voices are generated from the text and timed precisely, as they determine the length of each animation segment.
- The first image and video are generated to serve as the title, but also as the guide for all other images created for the video.
- Titles and subtitles are also added automatically in Comfy.
- I also developed a lot of custom nodes for minor frame calculations, mostly to match audio and video.
- The full system is a large loop that, for each line of text, generates an image and then a video from that image. The loop was the hardest part to build, and it is what lets the workflow process either a 20-second video or a 2-minute video with the same input.
- There are multiple combinations of LLMs that try to understand the text as well as possible to provide the best prompts for images and video.
- The final video is assembled entirely within ComfyUI.
- The music is generated based on the LLM output and matches the exact timing of the full animation.
- Done!

For reference, this workflow uses a lot of models and only works on an RTX 6000 Pro with plenty of RAM. My goal is not to replace humans: as I’ll try to explain later, this workflow is highly controlled and can be adapted or reworked at any point by real artists! My aim was to create a tool that can animate text in one go, allowing the AI some freedom while keeping a strict flow.
I don’t know yet how I’ll share this workflow with people, as I still need to polish it properly, but maybe through Patreon. Anyway, I hope you enjoy my research, and let’s always keep pushing further! :)
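The post says narration length drives each animation segment's length, and that custom nodes handle frame math to match audio and video. A minimal sketch of that kind of calculation, assuming a fixed frame rate (24 fps here, an assumption) and known per-line narration durations. `frames_per_segment` is a hypothetical helper, not one of the author's ComfyUI nodes; rounding cumulative end times instead of each segment alone is one common way to keep rounding error from drifting the video out of sync with the audio.

```python
from typing import List


def frames_per_segment(durations_s: List[float], fps: int = 24) -> List[int]:
    """Frame count for each narrated line so the video tracks the audio.

    Rounding each segment on its own lets errors pile up; rounding the
    cumulative end time of each segment keeps every cut within half a
    frame of its audio boundary.
    """
    frames: List[int] = []
    elapsed_s = 0.0  # running end time of the audio, in seconds
    prev_end = 0     # frame index where the previous segment ended
    for d in durations_s:
        elapsed_s += d
        end = round(elapsed_s * fps)
        frames.append(end - prev_end)
        prev_end = end
    return frames


if __name__ == "__main__":
    lines = [1.02] * 10  # ten narrated lines of 1.02 s each (made-up durations)
    naive = [round(d * 24) for d in lines]
    synced = frames_per_segment(lines)
    print(sum(naive), sum(synced))  # 240 245
```

With those made-up durations, rounding each line alone gives 24 frames apiece (240 frames, 10.0 s of video against 10.2 s of audio), while the cumulative version alternates 24 and 25 frames and lands on 245, within half a frame of 10.2 s × 24 fps = 244.8.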

Lovis Odin

58,571 views • 9 months ago

Part 1/2 This one took me a while. I made the images in #Midjourney using a bunch of images for reference. I ran them through Runway, then used @lumaAI to create the transitions between each video. Music made with #Sunoai. #fyp #ai #aiart #aiimages #aivideo #aimusic #airt

Kelly Boesch🏳️‍🌈

17,728 views • 1 year ago

A student built a whole faceless passive business by creating AI backyards. I kept seeing these AI backyard builds hitting 50M views and thought they took months of editing, but I was wrong. The creators aren't starting with a dirt lot. They start with the perfect final image and make the AI work backward. The secret is stupidly simple: generate the final picture first, and let the AI reverse-engineer it. Here is the exact 3-step system you can use to build this:
- The Blueprint: Upload your finished backyard to the "Restoration Timelapse" GPT. It reverse-engineers the final image into text prompts for the empty lot.
- The Setup: Paste those prompts into Dzine. This generates your "before" images with perfectly matched geometry.
- The Animation: Upload your empty lot and finished yard into Kling 3.0 to animate the build.
Once Kling spits out the video file, drop it into CapCut, keep the raw construction audio, and export. I broke down the complete, step-by-step architecture with GPT + GROK + CAPCUT in my full guide below 👇

Spivach

17,421 views • 26 days ago

I created a 3D Gaussian Splat of my kitchen using one-third of the images it used to take me, thanks to NVIDIA AI Developer's 3DGRUT! I can now use 180-degree fisheye images and ray tracing to make detailed splats. The only reason the scene isn't sharper is that my input images weren't super sharp: when I took them back in October, I was still learning to use the lens. I plan to make a "first reactions/overview" video, with a tutorial after that. For reference, this took 206 images, while capturing with the ultrawide on my iPhone took 608 images. #3D #AEC #Computervision

Jonathan Stephens

30,646 views • 1 year ago

UI/UX Designers, this might be one of the best sites for discovering trending AI prompts from X. Meigen gives you access to the hottest prompt posts weekly, including image and video generation prompts, all curated in one place for easy access and inspiration. You can even generate illustration images with one click and bring them into Figma for vectorization. Bookmark it for later 💜

Abraham John 🦄🦓

15,835 views • 2 months ago

This is an AI video. Made with Luma Ray3. For the past couple of weeks, I've played with it, and I tried to give it the most challenging but also very natural scenes. The result is unbelievable. Take a moment to realize that this is an AI video in 2025:

Alex Patrascu

98,526 views • 9 months ago

This is basically what it took to bring this video to life. Below are the prompts I used for the character bible, shot list, camera direction, notes, etc. Just copy and paste them into your favorite LLM and ask it to swap in your own character and world. 🌸

Glitter Gal

115,809 views • 2 months ago

This is Pika 2.1 🤯 This is serendipitous as Pika was one of the first Gen AI video tools I ever used. The first time I generated a video on Discord, my life changed forever. And now they have released a game-changing new model with fantastic fidelity—native 1080p. We have arrived at the future of storytelling. And I'm glad they are part of our journey. Cheers.

Dave Clark

22,341 views • 1 year ago

It took me a bit, but I was able to dub and edit the first chapter of this comic! I honestly love the story, and the artwork is just stellar! Thanks to ⛧ 𝙹𝚎𝚗 ( Comms- CLOSED ) for the go-ahead! Special thanks to Katabelle Ansari | Kansas City 🏙️ and Ronald S. for their assistance with this dub!

Zack Trevle {VTuber and Script Writer} 🔞

49,276 views • 8 months ago

I made this in a couple of hours this evening just playing with some techniques and styles. I didn't want to start anything big as I know a few big new things are dropping soon. This was made with Dreamina AI Seedream (all images built from the first frame of the second shot), Kling AI 2.5 turbo and Freepik, music with the one and only Suno. Night AI peeps! A little game of cat and mouse but who is the mouse?

Uncanny Harry AI

22,432 views • 7 months ago

Just a trippy dream of how Grok and I traversed the universe to understand it. All images to create the video were generated by @Grok, then I face swapped and animated them with Luma and Sora. This is also an original song, "Across the Universe," I made with Suno.

Katia Karpenko

8,246,101 views • 1 year ago

this entire video is 100% AI generated, and we fully automated the process. it took me one message to Claude and 3 images from Nano Banana to generate a full edited video in just a few minutes. total cost was only ~$10. if you’re not using AI to get fk u rich rn, you’re running out of time. best time to be alive ngl

MAX

38,413 views • 3 months ago

This is THE moment of Physical AI! We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀
- Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.
- It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies, enabling models to truly begin to “touch the world.”
- Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy across many benchmarks.
Huge thanks to every teammate who fought side by side on this journey, from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate. The future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act,” and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point for this direction, and I’m excited to push Physical AI into its next stage together with the open-source community. Welcome to the era of Physical AI.
HuggingFace:
Project Website:
Code:

Max Zhaoshuo Li 李赵硕

1,077,546 views • 1 month ago

The first fully AI-generated movie just went viral at Cannes. The film is called Hell Grind. According to reports, it was made using Higgsfield AI in around 14 days and cost about $500,000, roughly $400,000 of which went to compute. For context, a traditional Hollywood feature can take 1-3 years and cost tens of millions to produce. This one was created by a team of about 15 directors, cinematographers, and editors using AI tools like Higgsfield’s Soul Cinema, Seedance 2.0, and other video models. The craziest part? For just the first 25 minutes, CineD reported they generated 16,181 clips to get 253 final shots. That’s roughly 64 attempts for every shot. Some prompts were reportedly around 3,000 words each, just to get 15 seconds of usable footage. So yes, AI is getting faster. But the real takeaway is not “humans are useless now.” It’s actually the opposite: the people who understand storytelling, directing, editing, taste, and tools are about to get insane leverage.

Jade 💋

152,520 views • 1 month ago

✨ Added Gemini 2's new experimental AI edit functionality to Photo AI! It's the state-of-the-art model for editing photos with prompts, I think. It's just one button in Photo AI called [ 📝 AI Edit ]: you type a prompt in a JS prompt() window, and just a few seconds later your edit is done. Here I took an AI photo of myself, added 1000s of puppies to it with [ 📝 AI Edit ], and then turned it into a video with [ 🎞️ Make video ]. Gemini 2 isn't perfect btw: the quality of faces reduces a bit with every edit. A short-term fix would be pressing Remix and then getting the resemblance back.

@levelsio

653,195 views • 1 year ago

Responding to popular demand, here is the first #BottomLine update from Israel. This one is about how Israel took the Shifa Hospital for the second time, what Israel Defense Forces found there and why it matters for the future of the fighting.

Jonathan Conricus

97,086 views • 2 years ago