
ModelScope
@ModelScope2022 • 8,866 subscribers
Driving innovations with open communities. 💬 Join our Discord: https://t.co/4M9AHVh5qa
Videos

MOSS-TTS v1.5 is here, an upgrade to v1.0 from @OpenMOSS. (demo👇)
🤖 Key improvements:
⏸️ Inline pause control: [pause 3.2s] now supported mid-sentence
🌍 31 languages, up from 20, now including Cantonese, Hindi, Thai, Vietnamese, Tagalog, Swahili and more
🎙️ More stable voice cloning with reduced variance across repeated generations
📝 Better long-reference, short-text cloning
All v1.0 capabilities preserved: zero-shot cloning, long-form speech, Pinyin/IPA control, code-switching.
ModelScope • 13,552 views • 11 days ago

PrismAudio is open source 👏👏👏: a 518M V2A model accepted at ICLR 2026, achieving SOTA across all four perceptual dimensions on both VGGSound and the new AudioCanvas benchmark.
👀 Demo video below ⬇️
Model: Demo: Paper: GitHub:
🧠 Decomposes V2A reasoning into four specialized CoT modules (Semantic, Temporal, Aesthetic, and Spatial), each with targeted reward functions
🎯 First framework to integrate RL into V2A generation via decomposed CoT planning
⚡ Fast-GRPO: hybrid ODE-SDE sampling that dramatically reduces RL training overhead
🏆 VGGSound: tops all baselines on CLAP, DeSync, PQ, and subjective MOS scores, at 0.63s inference, faster than MMAudio (1.30s) and ThinkSound (1.07s)
🌍 AudioCanvas (out-of-domain): CLAP 0.52, MOS-Q 4.12, beats HunyuanVideo-Foley, MMAudio, ThinkSound
📊 AudioCanvas benchmark released: 300 single-event classes + 501 multi-event samples
ModelScope • 25,669 views • 2 months ago

🤖 Introducing InternVLA-A1, now fully open-sourced!
Many VLA models follow instructions well in static scenes… but struggle in dynamic environments (conveyor belts, rotating platforms, multi-robot setups). Why? They see the present but can’t imagine the future.
InternVLA-A1’s solution: unify perception, imagination, and action in one model.
✅ Scene understanding: Image + text → task parsing
✅ Task imagination: Predict future frames → reason about dynamics
✅ Guided control: Execute actions steered by visual foresight
Powered by InternData-A1, a large-scale, high-quality simulated dataset, InternVLA-A1 stays robust under complex backgrounds, lighting, and distractions.
🔥 See it in action:
1️⃣ High-speed conveyor: track, predict, and stably grasp or flip packages
2️⃣ Rotating platform: task-aware recognition & precise pick-up of diverse items
📊 Outperforms π0 and Gr00t N1.5 on general manipulation benchmarks!
✨ Model, data, and code are all open!
Models: Datasets: GitHub:
ModelScope • 38,016 views • 5 months ago

Introducing LingBot-World: An open-source world simulator pushing the boundaries of video generation. 🚀
🌍 High-Fidelity: Realistic, scientific, & stylized.
🧠 Long-Term Memory: Minute-level consistency.
⚡ Real-Time: <1s latency at 16 FPS.
📜 Apache 2.0 Licensed.
Model: GitHub:
ModelScope • 28,809 views • 4 months ago

🚀 New on ModelScope: MiniMax M2.1 is open-source!
✅ SOTA in 8+ languages (Rust, Go, Java, C++, TS, Kotlin, Obj-C, JS)
✅ Full-stack Web & mobile dev: Android/iOS, 3D visuals, vibe coding that actually ships
✅ Smarter, faster, 30% fewer tokens, with lightning mode (M2.1-lightning) for high-TPS workflows
✅ Top-tier on SWE-bench, VIBE, and custom coding/review benchmarks
✅ Works flawlessly in Cursor, Cline, Droid, BlackBox, and more
It’s not just “better code”; it’s AI-native development, end to end.
🔗 Model:
ModelScope • 16,939 views • 5 months ago