Introducing SDXL Turbo: A real-time text-to-image generation model. SDXL Turbo achieves state-of-the-art performance with a new distillation technology, enabling single-step image generation with unprecedented quality, reducing the required step count from 50 to just one. The code, research paper, and weights for non-commercial use are now available on our...

Stability AI

257,940 subscribers

976,339 views • 2 years ago • via X (Twitter)

Art • Education • Science • Technology

10 comments

Tom Osman 🐦‍⬛ • 2 years ago

Can you let us catch our breath for 1 minute at least

Simon C • 2 years ago

The world is ridiculous right now. God. Real time images updating as I type. Amazing work @StabilityAI team. As usual.

Walgtech 👨🏻‍💻 • 2 years ago

First run 🤯😅

Smoke-away • 2 years ago

SDXL Turbo 🔥 Now I just need DALL-E Turbo @ChatGPTapp It's the Stable Diffusion/DALL-E 2 release cycle all over again 💯

Stable Diffusion 🎨 AI Art • 2 years ago

👀

無限 💀 • 2 years ago

LFG!!! Stability killing it out here :)

Fabella • 2 years ago

@replicate make it real.

Burkay Gur • 2 years ago

Just gonna drop a playground here:

Cavit Erginsoy • 2 years ago

Lol are you trolling OAI with that naming?

s3nh • 2 years ago

Pls I have a hangover from img2vid let me rest

Related videos

Step into the future of AI image generation with Qwen-Image! From superior text rendering to consistent image editing across multiple languages, it sets a new benchmark! 💡What will you create with Qwen-Image?

Alibaba Group

203,475 views • 10 months ago

Introducing Nano Banana Pro (Gemini 3 Pro Image), our new state-of-the-art image generation and editing model from Google DeepMind. It improves on the original model while adding new advanced capabilities, enhanced world knowledge and text rendering, allowing you to create and edit studio-quality, production-ready visuals.

Google

1,897,025 views • 7 months ago

TurboEdit Instant text-based image editing discuss: We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disentangled controls can be easily achieved in the few-step diffusion model by conditioning on an (automatically generated) detailed text prompt. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt (either manually or via instruction based editing driven by an LLM), resulting in the generation of a new image similar to the input image with only one attribute changed. It can further control the editing strength and accept instructive text prompt. Our approach facilitates realistic text-guided image edits in real-time, requiring only 8 number of functional evaluations (NFEs) in inversion (one-time cost) and 4 NFEs per edit. Our method is not only fast, but also significantly outperforms state-of-the-art multi-step diffusion editing techniques.

AK

16,062 views • 1 year ago

Google presents MobileDiffusion Subsecond Text-to-Image Generation on Mobile Devices paper page: MobileDiffusion achieves a remarkable sub-second inference speed for generating a 512 × 512 image on mobile devices, establishing a new state of the art.

AK

150,538 views • 2 years ago

Introducing Stable Diffusion XL beta! Our latest image generation model, now available through our API, excels at photorealism & adds many cool features like enhanced face generation, minimal prompts & legible text. Try out #SDXL in DreamStudio now:

Stability AI

154,812 views • 3 years ago

Krea 2 Open-Source is now available on fal The most aesthetic open-source image model, released with open weights Turbo delivers fast, polished text-to-image at low step counts RAW serves as the malleable base checkpoint for LoRA training

fal

46,395 views • 9 days ago

[1/2] We’ve released the code for #pix2pixturbo and #CycleGANTurbo. These conditional GANs are able to adapt a text-to-image model such as SD-Turbo for both paired and unpaired image translation with a single step (0.11 sec on A100 and 0.29 sec on A6000). Try our code and the Gradio demo. Paper: Code: Demo: This is a joint work with Gaurav Parmar (the leading author), Taesung Park, and Srinivasa Narasimhan. This work shows that a pre-trained one-step model can be easily adapted to conditional GANs frameworks for downstream image editing and synthesis tasks. #Edges2Cats

Jun-Yan Zhu

36,488 views • 2 years ago

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model.🎨 ✨ New in 2.1: 🔹Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image. 🔹Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design to meet the needs of various fields. 🔹Rich Styles and High Aesthetic: Capable of generating images in various styles—including photorealistic portraits, comics, and vinyl figures—it delivers outstanding visual appeal and artistic quality. 🔹High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image. HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters. We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation. Now, creators turn complex ideas—like posters with slogans or multi-panel comics—into visuals faster than ever. We’re just getting started. Stay tuned for our native multimodal image generation model coming soon. 🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 views • 9 months ago

Nvidia presents ConsiStory Training-Free Consistent Text-to-Image Generation paper page: enable Stable Diffusion XL (SDXL) to generate consistent subjects across a series of images, without additional training.

AK

161,685 views • 2 years ago

The quality, cost, and control you can achieve for upscaling + fixing plastic AI skin with open source models still amazes me... Models used: → Z-image-turbo for image gen (~3s) → SDXL + Lora for skin texture (~15s) → SeedVR2 for upscaling (~40s)

rob - comfyui

41,252 views • 4 months ago

Introducing Ideogram 1.0: the most advanced text-to-image model, now available on This offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting.

Ideogram

255,979 views • 2 years ago

Today we launched Nano Banana Pro (Gemini 3 Pro Image). This state-of-the-art image generation and editing model turns your vision into functional reality with unprecedented control, improved text rendering and factual accuracy.

News from Google

215,740 views • 7 months ago

Introducing StyleDrop, a model that allows a significantly higher level of stylized text-to-image synthesis by using a few style reference images that describe the style for text-to-image generation, bypassing the burden of text prompt engineering. More→

Google AI

80,357 views • 2 years ago

The winner of Lovable's weekend competition: Kolbo ai - A powerful tool to help make all sorts of social media content with AI Features of the winning app: - Supabase for backend - Project-based organization system - OpenAI for text & image generation - Anthropic for text generation - Google Gemini for text generation - Midjourney for image generation - for image generation - Text-to-speech - Speech-to-text - Stripe for payments - mu for music generation Built by Zohar Vanunu 👇

Lovable

35,841 views • 1 year ago

UI design is underrated for AI. Sometimes they go hand in hand with new model capabilities in delightful ways. Here's a very clever demo: you can use rich text (bold, font style, color) to assign different types of weights to each token, making image generation controllable in a highly intuitive & flexible fashion. Paper: Expressive Text-to-Image Generation with Rich Text HuggingFace demo: Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang

Jim Fan

146,851 views • 2 years ago

We trained a new version of Gen-3 Alpha, Turbo, that can generate videos 7x faster than the original Gen-3 Alpha, while matching its performance on many use cases. We’ll be rolling out Turbo for Image to Video with significantly lower pricing over the coming days while also making it available to free users. Gen-3 Alpha Turbo redefines the efficiency frontier for high-fidelity video generation, unlocking many new possibilities of near real-time interactivity.

Runway

179,415 views • 1 year ago

Introducing TurboMemeBot In collaboration with Turbo 🐸 team for the meme community, we’re delighted to launch AI-powered Image and GIF Generation Bot for Turbo, bringing a whole new way to create Memes and GIFs instantly! No design skills needed—just type your text and watch AI turn your words into viral Images and GIFs. Try it out now - Use /meme or /gif [Turbo + Your Prompt]

Kreaitor AI

16,460 views • 1 year ago

1/4 🚀We are launching Qwen-Image-2.0, a next-generation foundational image generation model. The key highlights of Qwen-Image-2.0 include: Professional Typography Rendering: Supports 1k-token instructions for direct generation of professional infographics, including PPTs, posters, comics, and more. Stronger Semantic Adherence: Native 2K resolution support for finely detailed realistic scenes, including people, nature, and architecture. Improved Text Rendering: Integrated understanding and generation capabilities, unifying image generation and editing in a single mode Lighter Model Architecture: Smaller model size with faster inference speed.

Tongyi Lab

164,097 views • 4 months ago

Introducing the SpAItial API to help you build immersive 3D worlds programmatically. You can now create world from text, image url, uploaded image, and 360 panorama with code. Track generation status, download results, export meshes and more. ⬇️

SpAItial AI

43,034 views • 1 month ago

We’re excited to announce the release and open-source of HunyuanImage 3.0 — the largest and most powerful open-source text-to-image model to date, with over 80 billion total parameters, of which 13 billion are activated per token during inference. The effect is completely comparable to the industry’s flagship closed-source model.🚀🚀🚀 HunyuanImage 3.0 originates from our internally developed native multimodal large language model, with fine-tuning and post-training focused on text-to-image generation. This unique foundation gives the model a powerful set of capabilities: ✅Reason with world knowledge ✅Understand complex, thousand-word prompts ✅Generate precise text within images Different from traditional DiT architecture image generation models, HunyuanImage 3.0’s MoE architecture uses a Transfusion-based approach to deeply couple Diffusion and LLM training for a single, powerful system. Built on Hunyuan-A13B, HunyuanImage 3.0 was trained on a massive dataset: 5 billion image-text pairs, video frames, interleaved image-text data, and 6 trillion tokens of text corpora. This hybrid training across multimodal generation, understanding, and LLM capabilities allows the model to seamlessly integrate multiple tasks. Whether you're an illustrator, designer, or creator, this is built to slash your workflow from hours to minutes. HunyuanImage 3.0 can generate intricate text, detailed comics, expressive emojis, and lively, engaging illustrations for educational content. The current release focuses solely on text-to-image generation and future updates will include image-to-image, image editing, multi-turn interaction, and more. 👉🏻Try it now: 🔗GitHub: 🤗Hugging Face:

Tencent Hy

412,616 views • 9 months ago