Tongyi Lab

@Ali_TongyiLab • 25,636 subscribers

We advance the development of ASI and foster open source collaboration towards a smarter future. Discord: https://t.co/BtsFsAUsvT

Videos

Part 2 of our AI Buzzword series is officially LIVE! 🎬 This episode explains exactly how AI evolves from a "chatbot" into a "digital worker". Hit that follow button to unlock more simple AI breakdowns and tech insights!

10,549,127 views • 1 month ago

AI excels in the virtual world, but acting in the physical world is a whole different challenge. Why is it so hard to get a robot to fetch an egg without freezing? 🤔 Check out our new explainer video on Embodied Intelligence. We discuss the gap between robot "Thinking" and "Acting,"

5,630,641 views • 28 days ago

AI buzzwords are hitting us wave after wave... 🌊 LLM, Token, Prompt, RAG, Agent, Harness... Where did they come from, and what real-world problems do they actually solve? We are breaking them all down in a 2-part video series. Here is Part 1!

9,085,200 views • 1 month ago

Agents should never be a black box. With AgentScope 2.0, we are pushing transparency into the system level. Check out what’s new👇

4,106,368 views • 1 month ago

1/6 Introducing Qwen3.5-LiveTranslate: Next-gen real-time interpretation is here. 🌍 We’re breaking down language barriers with 3,500+ language pairs, ultra-low latency, visual context, real-time voice cloning, and hotword customization. Engineered to help you ship native, frictionless real-time translation experiences to a global audience.

4,112,581 views • 2 months ago

How do you create digital fireworks on your computer? Choose Qwen3.7-Max and try it out!

2,709,095 views • 1 month ago

We have officially launched Fun-ASR1.5, a major update to our end-to-end speech recognition model. This release focuses on three core pillars: broader language coverage, language switching, and production-ready text output.
Key Features:
• Multilingual Support: Supports high-accuracy recognition for 30 languages across Asia, Europe, and the Middle East within a single model.
• Language Switching: Handles mixed-language speech (Code-Switching) natively, automatically detecting and transcribing language shifts without the need for manual tagging.
• Professional Text Output: Delivers "ready-to-use" text with smart punctuation and automatic formatting for dates, numbers, and currencies.
Fun-ASR1.5 bridges the gap between raw audio and professional documentation, providing a reliable engine for global communication.

3,917,307 views • 3 months ago

🚀 Qwen-Image-2512: Finer Details, Greater Realism
We are thrilled to announce the Qwen-Image-2512 open-source release! This December update pushes the boundaries of our text-to-image foundational model, moving from "AI-generated" looks to true photorealism.
What makes 2512 exceptional?
· Enhanced Human Realism: We’ve eliminated the artificial "AI look" by capturing intricate facial details (like wrinkles and pores) and ensuring better adherence to body postures.
· Finer Natural Detail: Experience notably more detailed rendering of landscapes, misty waterfalls, and animal fur with distinct, individual strands.
· Advanced Text Rendering: Achieve professional-grade layout for complex infographics and PPT slides with unprecedented textual accuracy.

1,688,571 views • 6 months ago

Found a hardcore LoRA workflow! Generate a PLY point cloud --> Adjust angle in editor --> Refine with Qwen-Image-Edit-2511 Gaussian Splash LoRA. It accurately restores complex perspective shifts. As shown in the demo, it handles 3D rotation and can even restore high-def details from blurry close-ups. Within a 45° range, the consistency is unmatched. Huge thanks to 大雄 for this contribution! 🫡

1,493,358 views • 6 months ago

We released Qwen3-Omni-Flash (2025-12-01 version) API Service. Smarter interaction, more human expression:
· A/V Interaction: Significant boost in instruction following. Solves "dumbing down" in casual chats with rock-solid stability.
· Precise Control: Enhanced System Prompt adherence for specific personas, styles, and lengths.
· Multilingual Mastery: Solved language switching instability. Now supports 119 text languages, 19 for speech understanding, and 10 for speech generation.
· Human-like Speech: Adaptive speed and prosody. No more drag; it sounds just like a real person.

1,636,707 views • 7 months ago

Complex instruction following is critical for LLM agents and applications. We propose IOPO, which considers both input and output preference pairs: it aligns with response preferences while also carefully exploring instruction preferences, and it delivers notable improvements.

1,762,955 views • 1 year ago

1/4 We’re releasing MAI-UI, a family of foundation GUI agents. It natively integrates MCP tool use, agent user interaction, device–cloud collaboration, and online RL, establishing state-of-the-art results in general GUI grounding and mobile GUI navigation, surpassing Gemini-2.5-Pro, Seed1.8, and UI-Tars-2 on AndroidWorld. To meet real-world deployment constraints, MAI-UI comes in a full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants. We are publicly releasing two models: MAI-UI-2B and MAI-UI-8B.

855,637 views • 6 months ago

Curious about Wan2.5-Preview? Here's everything you need to know in 2 minutes! ⏱️ Wan2.5-Preview natively supports audio-visual synchronization, with massive upgrades to video, image generation, and editing for commercial-grade content. Watch our video below to see it all in action!

1,072,916 views • 10 months ago

1/4 How do you get the most from a few high-quality, trusted examples rather than piling up data? Tongyi Lab shifts from a data-centric to a sample-centric approach and presents LPPO, a Progressive Optimization RL framework to break reasoning bottlenecks: master each problem, not just add more.

1,079,954 views • 10 months ago

Introducing Qwen3-TTS! 🗣️ Our new text-to-speech model is designed to be multi-timbre, multi-lingual, and multi-dialect for natural, expressive audio. It delivers strong performance in English & Chinese, and we're excited for you to hear it for yourself!

1,014,717 views • 10 months ago

We introduce HumanOmniV2, an omni-modal model designed to address two core problems in multimodal reasoning: insufficient global context understanding and the shortcut problem. By analyzing visual, auditory, and textual signals, the model performs deep reasoning on complex human intentions, emotions, and social interactions.

1,221,672 views • 1 year ago

1/4 Introducing Qwen3-Coder-Next: Our latest open-weights model designed specifically to power the next generation of autonomous Coding Agents. Built on Qwen3-Next, this model is engineered to handle complex, long-horizon programming tasks with unprecedented efficiency. High-performance agentic intelligence is now in your hands.

212,645 views • 5 months ago

Your AI Voice Partner: Smart, Empathetic & Useful
Open-sourced now!!!
Introducing Fun-Audio-Chat, a new end-to-end voice model, more than just chat.
· An empathetic companion that understands tone and emotion.
· A productivity helper that follows voice commands to get things done.
· Leader in multiple benchmarks (OpenAudioBench, MMAU, etc.).
· End-to-end S2S architecture: lower latency, higher efficiency.
· Dual-resolution design: reduces GPU cost by ~50%.
· Supports voice function calling: just speak to complete tasks.

256,132 views • 7 months ago

1/4 🚀 We are launching Qwen-Image-2.0, a next-generation foundational image generation model. The key highlights of Qwen-Image-2.0 include:
· Professional Typography Rendering: Supports 1k-token instructions for direct generation of professional infographics, including PPTs, posters, comics, and more.
· Stronger Semantic Adherence: Native 2K resolution support for finely detailed realistic scenes, including people, nature, and architecture.
· Improved Text Rendering: Integrated understanding and generation capabilities, unifying image generation and editing in a single mode.
· Lighter Model Architecture: Smaller model size with faster inference speed.

164,097 views • 5 months ago

Introducing Tongyi Fun, Alibaba’s enterprise-grade audio foundation model. Powered by Fun-ASR and Fun-CosyVoice, it advances voice AI beyond “hearing and speaking” to truly understand context, transcribe with high accuracy, and speak with natural expressiveness, even in complex enterprise environments.
✅ Understands deeply: Trained on tens of millions of hours of real-world audio, with industry-specific terminology across finance, tech, manufacturing, and more.
✅ Transcribes accurately: A context-enhanced architecture with RAG minimizes errors, hallucinations, and cross-language interference.
✅ Speaks expressively: Delivers natural, stable, multilingual speech synthesis with cross-lingual voice cloning.

280,542 views • 9 months ago