
Meituan LongCat
@Meituan_LongCat • 5,720 subscribers

Meet LongCat-Video-Avatar 1.5 🐱, our upgraded, open-source digital human framework. Built for real production, not just short demos.
What's New:
🔹 Upgraded Audio Encoder: Replaces Wav2Vec2 with Whisper-Large, yielding significantly smoother and more natural lip dynamics.
🔹 Production-Ready Stability: Achieves accurate lip-synchronization, full-body temporal stability, and robust long-video generation with strict identity consistency.
🔹 Stylized Domain Generalization: Robustly generalizes to anime, animals, and complex real-world conditions such as multi-person interactions and object handling.
🔹 Efficient 8-Step Inference: Advanced step distillation accelerates inference to 8 NFE, balancing cost-effective serving with exceptional visual fidelity.
📊 LongCat-Video-Avatar 1.5 performs strongly in realism, naturalness, and stability, outperforming leading open-source models and closed systems.
🐱 Avatar 1.5 framework is now open source:
🔗 Weights & Code:
🔗 Hugging Face:
🔗 Tech Report:
🔗 Project Page:
Meituan LongCat • 29,944 views • 14 days ago

🚀 Introducing LongCat-Flash-Thinking-2601: a version built for deep and general agentic thinking.
✨ Highlights:
🤖 Top-Tier Agent Capabilities
🔹 Performance: Top-tier benchmark results (TIR / Agentic Search / Agentic Tool Use); superb generalization ability, outperforming Claude in complex, random tasks
🔹 Env Scaling: Multiple automatically constructed high-quality environments; dense dependency graph
🔹 Multi-Env RL: Extended DORA (our RL infra), supporting large-scale multi-environment agentic training
🛡️ Real-World Robustness
🔹 Performance: Solid performance in messy, uncertain scenarios (Vita-Noise & Tau^2-Noise)
🔹 Noise Analysis: Systematically analyzed real-world noise in agentic scenarios
🔹 Curriculum RL: Increasing noise type & intensity during training
🎯 Heavy Thinking Mode
🔹 Parallel Thinking: Expands breadth via multiple independent reasoning tracks
🔹 Iterative Summarization: Enhances depth by using a summary model to synthesize outputs, supporting iterative reasoning loops
📅 One more thing: 1M-token context via Zigzag Attention is coming soon.
🔍 Try it now:
✅ API access for this version is also available.
Hugging Face:
GitHub:
Meituan LongCat • 82,449 views • 4 months ago
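
The Heavy Thinking Mode bullets above describe a loop: several independent reasoning tracks run in parallel, a summary model synthesizes their outputs, and the result can feed another round. A minimal sketch of that control flow, assuming nothing about the actual LongCat implementation: `heavy_think`, `reasoner`, `summarizer`, `n_tracks`, and `n_rounds` are hypothetical names, and the toy callables stand in for real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def heavy_think(
    question: str,
    reasoner: Callable[[str, int], str],
    summarizer: Callable[[str, List[str]], str],
    n_tracks: int = 4,
    n_rounds: int = 2,
) -> str:
    """Parallel thinking followed by iterative summarization (illustrative only)."""
    prompt = question
    summary = ""
    for _ in range(n_rounds):
        # Breadth: run independent reasoning tracks in parallel.
        with ThreadPoolExecutor(max_workers=n_tracks) as pool:
            tracks = list(pool.map(lambda i: reasoner(prompt, i), range(n_tracks)))
        # Depth: a summary model synthesizes the tracks into one answer.
        summary = summarizer(question, tracks)
        # Feed the summary back so the next round can refine it.
        prompt = f"{question}\n\nPrevious summary:\n{summary}"
    return summary


# Hypothetical stand-ins so the sketch runs without any model.
def toy_reasoner(prompt: str, track_id: int) -> str:
    return f"track-{track_id} saw {len(prompt)} chars"


def toy_summarizer(question: str, tracks: List[str]) -> str:
    return " | ".join(tracks)


result = heavy_think("What is 2 + 2?", toy_reasoner, toy_summarizer, n_tracks=3, n_rounds=2)
print(result)
```

In a real setup the two callables would wrap model API calls; how the released model samples its tracks or prompts its summary model is not specified in the post.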

🚀 LongCat-Video Now Open-Source: Text/Image-to-Video + Video Continuation in One Model
🏆 Text/Image-to-Video Performance Hits Open-Source SOTA
🎬 Minutes-Long High-Quality Videos: No Color Drift/Quality Loss (Industry-Standout)
⚙ 13.6B Params | Strong Open-Source DiT-Based Unified Multitask Video Base Model
⚡ C2F Pipeline + Block Sparse Attention: 720p/30fps Video in Minutes
🤗 Open-Source Links:
GitHub:
Hugging Face:
Project Page:
Meituan LongCat • 43,711 views • 7 months ago

Meet LongCat-Video-Avatar: a robust audio-driven avatar model that pushes the boundaries of long-form video generation. Compared with the previous InfiniteTalk, LongCat-Video-Avatar delivers far better long-sequence stability and realism.
New highlights:
⚙ Built on the LongCat-Video architecture, now supporting Audio-Text-to-Video (AT2V), Audio-Text-Image-to-Video (ATI2V), and Video Continuation modes.
🎭 Open-source SOTA Realism: Ranked 1st in overall anthropomorphism scores for both single and multi-subject scenarios in EvalTalker evaluations (492 participants, 3 independent raters per video).
♾ High-quality long videos: Cross-Chunk Latent Stitching prevents pixel degradation and error accumulation over time, ensuring seamless stitching quality.
🔒 Long-term consistency: Reference Skip Attention maintains ID consistency while eliminating rigid copy-paste artifacts.
🪄 Supports multi-person and infinite-length video generation.
🔗 Open-sourced Code:
Hugging Face:
Project:
Paper:
Meituan LongCat • 27,969 views • 5 months ago