
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

paper page: github:

Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or...

83,657 views • 2 years ago • via X (Twitter)

6 Comments

Boyi Li • 2 years ago

Thanks @_akhaliq for sharing our work!

zorr0 (ττ) • 2 years ago

@replytensor

haareblond • 2 years ago

cool but still feels hacky

Takomo AI • 2 years ago

That's great progress!

Cavit Erginsoy • 2 years ago

@yuliangxiu I saw this about a month ago and played around with it. Is this the same or a parallel dev? Wish someone built an extension for A1111

VIJAY KUMAR REDDY BOMMIREDDY • 2 years ago

Impressive work! Expanding the text-to-image domain with diffusion models showcases great potential. Looking forward to exploring the paper and GitHub repository. Keep up the great work! 👍

Similar Videos

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model.🎨

✨ New in 2.1:
🔹Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image.
🔹Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design.
🔹Rich Styles and High Aesthetic: Capable of generating images in various styles, including photorealistic portraits, comics, and vinyl figures, it delivers outstanding visual appeal and artistic quality.
🔹High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image.

HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters.

We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation.

Now, creators turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We're just getting started. Stay tuned for our native multimodal image generation model coming soon.

🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 views • 9 months ago