
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

paper page:
github:

Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or...

83,657 views • 2 years ago • via X (Twitter)

6 comments

Boyi Li · 2 years ago

Thanks @_akhaliq for sharing our work!

zorr0 (ττ) · 2 years ago

@replytensor

haareblond · 2 years ago

cool but still feels hacky

Takomo AI · 2 years ago

That's great progress!

Cavit Erginsoy · 2 years ago

@yuliangxiu I saw this about a month ago and played around with it. Is it the same or a parallel dev? Wish someone built an extension for A1111

VIJAY KUMAR REDDY BOMMIREDDY · 2 years ago

Impressive work! Expanding the text-to-image domain with diffusion models showcases great potential. Looking forward to exploring the paper and GitHub repository. Keep up the great work! 👍

Related videos

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model. 🎨

✨ New in 2.1:
🔹Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image.
🔹Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design to meet the needs of various fields.
🔹Rich Styles and High Aesthetic: The model generates images in various styles, including photorealistic portraits, comics, and vinyl figures, with outstanding visual appeal and artistic quality.
🔹High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image.

HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters. We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation. Now, creators can turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We’re just getting started. Stay tuned for our native multimodal image generation model coming soon.

🌐Website:
🔗Github:
🤗Hugging Face:
✨Hugging Face Demo:

Tencent Hy

89,257 views • 9 months ago
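The HunyuanImage 2.1 post says the meanflow-accelerated version cuts inference from 100 to 8 steps. As a rough sketch, that figure can be turned into numbers under two stated assumptions, neither of which comes from the post: every denoising step costs the same, and (in the second function) each image also pays a hypothetical fixed overhead, such as text encoding and decoding, that does not shrink with fewer steps. The costs are arbitrary illustrative units, not measurements, and the sketch says nothing about image quality.

```python
def step_reduction(baseline_steps: int, accelerated_steps: int) -> float:
    """How many times fewer denoising steps the accelerated model runs."""
    if baseline_steps <= 0 or accelerated_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / accelerated_steps


def end_to_end_speedup(baseline_steps: int, accelerated_steps: int,
                       per_step: float, overhead: float) -> float:
    """Wall-clock speedup when each image also pays a fixed cost.

    per_step and overhead are hypothetical costs in arbitrary units;
    the fixed overhead is assumed not to shrink with fewer steps.
    """
    baseline = overhead + baseline_steps * per_step
    accelerated = overhead + accelerated_steps * per_step
    return baseline / accelerated


# Step counts quoted in the post: 100 (base) vs. 8 (meanflow version).
print(f"{step_reduction(100, 8):.1f}x fewer steps")  # 12.5x fewer steps

# Hypothetical costs: 1 unit per step plus 10 units of fixed work per image.
print(f"{end_to_end_speedup(100, 8, per_step=1.0, overhead=10.0):.2f}x end to end")  # 6.11x end to end
```

With zero fixed overhead the two functions agree at 12.5x. The larger the share of per-image work that does not scale with step count, the further the end-to-end gain falls below the step ratio.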