I'm playing around with generative AI tools and stitching them together into visual stories. Here I took the first few sentences of Pride and Prejudice and made it into a video. The gen stack used for this one: - Anthropic Claude took the first chapter, generated the scenes and...

608,574 views • 2 years ago • via X (Twitter)

10 comments

Andrej Karpathy · 1 year ago

Tried Runway Gen-3 now that they support image prompting. Much better results on this scene. Damn, this is fun. Now if I just tweak the prompt a little more and roll the dice again...

near · 2 years ago

@AnthropicAI and just like that, an entire new genre of YouTube videos was created from a single tweet

illusion diffusion · 2 years ago

@AnthropicAI
1. Stable Diffusion checkpoint/LoRA from @HelloCivitai or @midjourney
2. @runwayml Gen-3 or @KlingAIOfficial for image-to-video
3. @elevenlabsio for voiceover/sound FX/foley
4. @sudo_ai or @udiomusic for background music
5. Non-AI/Premiere text overlay (no logo)

Andrej Karpathy · 2 years ago

@AnthropicAI @midjourney @runwayml @KlingAIOfficial @elevenlabsio @sudo_ai @udiomusic Doh, I totally forgot background music. Fail 🤦‍♂️

Felix Wang · 2 years ago

@AnthropicAI Hey Andrej! I'm working on this exact problem: building an AI-native tool that integrates various models (LLMs, image, video, audio models) to help folks tell stories. Here's a video I made (the tool is at

Michael Kuliasov · 2 years ago

@AnthropicAI Tried a similar thing a bit earlier too

Andrej Karpathy · 2 years ago

@AnthropicAI Very cool!!

AshutoshShrivastava · 2 years ago

@AnthropicAI Andrej, try out Gen-3.

Andrej Karpathy · 2 years ago

@AnthropicAI I'm trying! People seem to be getting really good results with it but I can't quite get that myself so far. It's kind of ignoring my instructions and generating videos that look way too modern, or just wrong or unrelated. I'll keep trying because the consistency is really great.

Dave Lee · 2 years ago

@AnthropicAI Yeah it's a big opportunity.

Related videos

This is probably the most complex workflow I’ve ever built, using only open-source tools. It took me four days. It takes four inputs: author, title, and style, and generates a full animated visual story in one click in ComfyUI. There are still some bugs, but here’s the first preview.

Here’s a quick breakdown:
- The four inputs are sent to LLMs with precise instructions to generate, first, prompts for images and image modifications; second, prompts for animations; and third, prompts for music.
- All voices are generated from the text and timed precisely, since they determine the length of each animation segment.
- The first image and video are generated to serve as the title, and also as the guide for all other images created for the video.
- Titles and subtitles are also added automatically in Comfy.
- I also developed many custom nodes for minor frame calculations, mostly to match audio and video.
- The full system is a large loop that, for each line of text, generates an image and then a video from that image. The loop was the hardest part of this workflow to build; thanks to it, the same input can produce either a 20-second or a 2-minute video.
- Multiple combinations of LLMs try to interpret the text as well as possible to produce the best prompts for images and video.
- The final video is assembled entirely within ComfyUI.
- The music is generated from the LLM output and matches the exact timing of the full animation.
- Done!

For reference, this workflow uses many models and only runs on an RTX 6000 Pro with plenty of RAM. My goal is not to replace humans. As I’ll try to explain later, this workflow is highly controlled and can be adapted or reworked at any point by real artists. My aim was to create a tool that can animate text in one go, giving the AI some freedom while keeping a strict flow.
I don’t know yet how I’ll share this workflow, maybe through Patreon; I still need to polish it properly. Anyway, I hope you enjoy my research, and let’s always keep pushing further! :)

Lovis Odin

58,571 views • 9 months ago
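The workflow post says each voiceover line sets the length of its animation segment and that custom nodes handle the frame math to keep audio and video aligned, with the music matched to the total runtime. The actual ComfyUI nodes are not shared, so the following is only a minimal sketch of that arithmetic, assuming a fixed frame rate (24 fps here) and known per-line voice durations in seconds. `plan_segments` and `Segment` are hypothetical names for illustration, not part of ComfyUI or the author's workflow.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One line of text and the video frames allotted to it (hypothetical structure)."""
    text: str
    voice_seconds: float
    frames: int
    start_frame: int


def plan_segments(lines, voice_durations, fps=24):
    """Hypothetical helper: size each video segment from its voiceover duration.

    Frames are rounded up so a segment is never shorter than its voice line.
    Returns the segments and the total runtime in seconds, which a music
    generation step could use as its target length.
    """
    if len(lines) != len(voice_durations):
        raise ValueError("need exactly one voice duration per line of text")
    segments = []
    cursor = 0  # running frame offset where the next segment starts
    for text, seconds in zip(lines, voice_durations):
        frames = max(1, math.ceil(seconds * fps))
        segments.append(Segment(text, seconds, frames, cursor))
        cursor += frames
    return segments, cursor / fps
```

For example, two lines voiced for 2.0 s and 3.5 s at 24 fps get 48 and 84 frames, starting at frames 0 and 48, for a 5.5 s total that the music step would need to fill. Rounding up rather than to nearest trades a few extra frames of padding for never clipping the end of a voice line.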

This is THE moment of Physical AI! We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀

- Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.
- It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies, enabling models to truly begin to “touch the world.”
- Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy model across many benchmarks.

Huge thanks to every teammate who fought side by side on this journey, from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate.

The future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act,” and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point in this direction, and I’m excited to push Physical AI into its next stage together with the open-source community. Welcome to the era of Physical AI.

HuggingFace:
Project Website:
Code:

Max Zhaoshuo Li 李赵硕

1,077,248 views • 29 days ago