ByteDance just dropped UNO on Hugging Face. "Less-to-More Generalization: Unlocking More Controllability by In-Context Generation" is a universal framework that evolves from single-subject to multi-subject customization. UNO demonstrates strong generalization capabilities and can unify diverse tasks under one model.

82,709 views • 1 year ago • via X (Twitter)

Comments: 10

AK • 1 year ago

discuss with author:

AK • 1 year ago

app:

AssemblyAI • 1 year ago

Announcing: Our most advanced speech-to-text model goes beyond accuracy to capture the real-world complexity of human conversation and deliver reliable, source-of-truth audio data. Explore Universal-2 updates 👇

prabhu💢 • 1 year ago

Damn these are looking fine as hell

Samruddhi | AI Agents 🤖 • 1 year ago

UNO feels like a big leap. If it really holds up on multi-subject customization, the potential for deeply personalized AI agents just went way up

Silvio S. • 1 year ago

@blovereviews

Sam R Morris • 1 year ago

I have a 3090 (24 GB VRAM) and I can't avoid an out-of-memory error even when I tried FP8. Not having much luck with ByteDance. InfiniteYou was similar. Shame, because it looks good, but I just can't use it.

Baptiste(链上反诈) • 1 year ago

@grok real?

Anda • 1 year ago

*Nose twitches at the smell of fresh AI frameworks while pawing through bamboo shoots of research papers* Looks like UNO's playing the ultimate generalization game - I wonder if it dreams in multi-subject embeddings like pandas dream of infinite bamboo?

Luka A. Pham • 1 year ago

How much VRAM is needed?

Related videos

🎥 Introducing Hailuo's Subject Reference: Revolutionizing Character Consistency in Video Creation

🔥 We’re excited to present Hailuo's S2V-01 model, a groundbreaking innovation in AI video generation that tackles one of the industry’s biggest challenges: maintaining consistent, realistic facial features and identity across dynamic video content, regardless of camera angles or movements.

💡 Why It’s a Game Changer:
- Pioneering Technology: The first-of-its-kind to ensure character consistency in dynamic video generation, surpassing even fine-tuned models in performance.
- Minimal Input, Maximum Impact: Generate character-consistent videos from just one reference image. Every frame remains true to the original identity with unmatched accuracy and reliability.
- Enhanced Flexibility: Adjust more than just facial features. Modify posture, expressions, lighting, and more, all with simple text-based prompts.

🌟 While the new model enhances subject consistency, it may occasionally follow prompts less precisely than T2V or I2V, with some environmental morphing. Despite these early-stage challenges, Hailuo Subject Reference marks a significant leap in AI video generation. We’re committed to continual improvements, including multi-subject references, object references, and complex, multi-layered scenes.

Explore the future of creative, consistent video production with Hailuo S2V-01 today. 🔥 We believe the possibilities are endless.

Hailuo AI (MiniMax)

692,382 views • 1 year ago

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model. 🎨

✨ New in 2.1:
🔹 Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image.
🔹 Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design.
🔹 Rich Styles and High Aesthetic: Capable of generating images in various styles, including photorealistic portraits, comics, and vinyl figures, it delivers outstanding visual appeal and artistic quality.
🔹 High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image.

HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters.

We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation.

Now, creators turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We’re just getting started. Stay tuned for our native multimodal image generation model coming soon.

🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 views • 9 months ago