🚀 Introducing Emu3.5, a large-scale multimodal world model that natively predicts the next vision-language state. 🔥 Trained on over 10T interleaved vision-language tokens and enhanced with reinforcement learning, Emu3.5 achieves powerful multimodal reasoning and generation. ⚡ Powered by our new Discrete Diffusion Adaptation (DiDA) for 20× faster inference....

51,683 views • 7 months ago • via X (Twitter)

Related videos

We're excited to announce the open-source release of HunyuanImage 3.0, the largest and most powerful open-source text-to-image model to date, with over 80 billion total parameters, of which 13 billion are activated per token during inference. The results are fully comparable to the industry's flagship closed-source model. 🚀🚀🚀

HunyuanImage 3.0 originates from our internally developed native multimodal large language model, with fine-tuning and post-training focused on text-to-image generation. This unique foundation gives the model a powerful set of capabilities:
✅ Reason with world knowledge
✅ Understand complex, thousand-word prompts
✅ Generate precise text within images

Unlike traditional DiT-based image generation models, HunyuanImage 3.0's MoE architecture uses a Transfusion-based approach to deeply couple diffusion and LLM training in a single, powerful system. Built on Hunyuan-A13B, HunyuanImage 3.0 was trained on a massive dataset: 5 billion image-text pairs, video frames, interleaved image-text data, and 6 trillion tokens of text corpora. This hybrid training across multimodal generation, understanding, and LLM capabilities allows the model to seamlessly integrate multiple tasks.

Whether you're an illustrator, designer, or creator, this is built to slash your workflow from hours to minutes. HunyuanImage 3.0 can generate intricate text, detailed comics, expressive emojis, and lively, engaging illustrations for educational content. The current release focuses solely on text-to-image generation; future updates will include image-to-image, image editing, multi-turn interaction, and more.

👉🏻 Try it now: 🔗 GitHub: 🤗 Hugging Face:

Tencent Hy

412,523 views • 8 months ago