
Tongyi Lab
@Ali_TongyiLab • 23,939 subscribers
We advance the development of ASI and foster open source collaboration towards a smarter future. Discord: https://t.co/BtsFsAUsvT
Videos

How to create digital fireworks on your computer? Choose Qwen3.7-Max and try it out!
Tongyi Lab • 2,709,095 views • 11 days ago

1/6 Introducing Qwen3.5-LiveTranslate: Next-gen real-time interpretation is here. 🌍 We’re breaking down language barriers with 3,500+ language pairs, ultra-low latency, visual context, real-time voice cloning, and hotword customization. Engineered to help you ship native, frictionless real-time translation experiences to a global audience.
Tongyi Lab • 4,110,247 views • 16 days ago

We have officially launched Fun-ASR1.5, a major update to our end-to-end speech recognition model. This release focuses on three core pillars: broader language coverage, language switching, and production-ready text output. Key Features: • Multilingual Support: Supports high-accuracy recognition for 30 languages across Asia, Europe, and the Middle East within a single model. • Language Switching: Handles mixed-language speech (Code-Switching) natively, automatically detecting and transcribing language shifts without the need for manual tagging. • Professional Text Output: Delivers "ready-to-use" text with smart punctuation and automatic formatting for dates, numbers, and currencies. Fun-ASR1.5 bridges the gap between raw audio and professional documentation, providing a reliable engine for global communication.
Tongyi Lab • 3,917,307 views • 1 month ago

🚀 Qwen-Image-2512: Finer Details, Greater Realism. We are thrilled to announce the Qwen-Image-2512 open-source release! This December update pushes the boundaries of our text-to-image foundational model, moving from "AI-generated" looks to true photorealism. What makes 2512 exceptional? · Enhanced Human Realism: We’ve eliminated the artificial "AI look" by capturing intricate facial details, like wrinkles and pores, and ensuring better adherence to body postures. · Finer Natural Detail: Experience notably more detailed rendering of landscapes, misty waterfalls, and animal fur with distinct, individual strands. · Advanced Text Rendering: Achieve professional-grade layout for complex infographics and PPT slides with unprecedented textual accuracy.
Tongyi Lab • 1,688,246 views • 5 months ago

Found a hardcore LoRA workflow! Generate a PLY point cloud --> Adjust angle in editor --> Refine with Qwen-Image-Edit-2511 Gaussian Splash LoRA. It accurately restores complex perspective shifts. As shown in the demo, it handles 3D rotation and can even restore high-def details from blurry close-ups. Within a 45° range, the consistency is unmatched. Huge thanks to 大雄 for this contribution! 🫡
Tongyi Lab • 1,493,312 views • 4 months ago

We released Qwen3-Omni-Flash (2025-12-01 version) API Service. Smarter interaction, more human expression: · A/V Interaction: Significant boost in instruction following. Solves "dumbing down" in casual chats with rock-solid stability. · Precise Control: Enhanced System Prompt adherence for specific personas, styles, and lengths. · Multilingual Mastery: Solved language switching instability. Now supports 119 text languages, 19 for speech understanding, and 10 for speech generation. · Human-like Speech: Adaptive speed and prosody. No more drag—sounds just like a real person.
Tongyi Lab • 1,636,707 views • 5 months ago

1/4 We’re releasing MAI-UI—a family of foundation GUI agents. It natively integrates MCP tool use, agent user interaction, device–cloud collaboration, and online RL, establishing state-of-the-art results in general GUI grounding and mobile GUI navigation, surpassing Gemini-2.5-Pro, Seed1.8, and UI-Tars-2 on AndroidWorld. To meet real-world deployment constraints, MAI-UI includes a full spectrum of sizes, including 2B, 8B, 32B and 235B-A22B variants. We are publicly releasing two models: MAI-UI-2B and MAI-UI-8B.
Tongyi Lab • 855,581 views • 5 months ago

Complex instruction following is critical for LLM agents and applications. IOPO is proposed to consider both input and output preference pairs, not only aligning with response preferences but also meticulously exploring instruction preferences, and it yields notable improvements.
Tongyi Lab • 1,762,791 views • 10 months ago

Curious about Wan2.5-Preview? Here's everything you need to know in 2 minutes! ⏱️ Wan2.5-Preview natively supports audio-visual synchronization, with massive upgrades to video, image generation, and editing for commercial-grade content. Watch our video below to see it all in action!
Tongyi Lab • 1,072,819 views • 8 months ago

Introducing Qwen3-TTS! 🗣️ Our new text-to-speech model is designed to be multi-timbre, multi-lingual, and multi-dialect for natural, expressive audio. It delivers strong performance in English & Chinese, and we're excited for you to hear it for yourself!
Tongyi Lab • 1,014,717 views • 8 months ago

1/4 How to get the most from a few high‑quality, trusted examples—not by piling up data? Tongyi Lab shifts from data‑centric to sample‑centric and presents LPPO, a Progressive Optimization RL framework to break reasoning bottlenecks: master each problem, not just add more.
Tongyi Lab • 1,079,954 views • 9 months ago

We introduce HumanOmniV2, an omni-modal model designed to address two core problems in multimodal reasoning: insufficient global context understanding and the shortcut problem. By analyzing visual, auditory, and textual signals, the model performs deep reasoning on complex human intentions, emotions, and social interactions.
Tongyi Lab • 1,221,672 views • 10 months ago

1/4 Introducing Qwen3-Coder-Next: Our latest open-weights model designed specifically to power the next generation of autonomous Coding Agents. Built on Qwen3-Next, this model is engineered to handle complex, long-horizon programming tasks with unprecedented efficiency. High-performance agentic intelligence is now in your hands.
Tongyi Lab • 212,645 views • 4 months ago

Your AI Voice Partner: Smart, Empathetic & Useful. Open-sourced now!!! Introducing Fun-Audio-Chat — a new end-to-end voice model, more than just chat. · An empathetic companion that understands tone and emotion. · A productivity helper that follows voice commands to get things done. · Leader in multiple benchmarks (OpenAudioBench, MMAU, etc.). · End-to-end S2S architecture — lower latency, higher efficiency. · Dual-resolution design — reduces GPU cost by ~50%. · Supports voice function calling — just speak to complete tasks.
Tongyi Lab • 256,132 views • 5 months ago

1/4 🚀 We are launching Qwen-Image-2.0, a next-generation foundational image generation model. The key highlights of Qwen-Image-2.0 include: · Professional Typography Rendering: Supports 1k-token instructions for direct generation of professional infographics, including PPTs, posters, comics, and more. · Stronger Semantic Adherence: Native 2K resolution support for finely detailed realistic scenes, including people, nature, and architecture. · Improved Text Rendering: Integrated understanding and generation capabilities, unifying image generation and editing in a single mode. · Lighter Model Architecture: Smaller model size with faster inference speed.
Tongyi Lab • 164,097 views • 3 months ago

Introducing Tongyi Fun – Alibaba’s enterprise-grade audio foundation model. Powered by Fun-ASR and Fun-CosyVoice, it advances voice AI beyond “hearing and speaking” to truly understand context, transcribe with high accuracy, and speak with natural expressiveness—even in complex enterprise environments. ✅ Understands deeply: Trained on tens of millions of hours of real-world audio, with industry-specific terminology across finance, tech, manufacturing, and more. ✅ Transcribes accurately: A context-enhanced architecture with RAG minimizes errors, hallucinations, and cross-language interference. ✅ Speaks expressively: Delivers natural, stable, multilingual speech synthesis with cross-lingual voice cloning.
Tongyi Lab • 280,542 views • 8 months ago

If you missed the live broadcast, here is everything you should know about Wan2.2🤩
Tongyi Lab • 86,842 views • 10 months ago