Qwen3.5 is here 🚀 397B params, just 17B active. Native multimodal agents for coding, reasoning, GUI + video. 200+ languages. Open weights. Real scale. The next frontier is open. 🔗

107,871 views • 3 months ago • via X (Twitter)

Similar Videos

Alibaba just dropped Qwen3.5-397B-A17B and there's a lot to unpack. 397B params, 17B active per forward pass. Sparse MoE done right. But the real story isn't the size; it's the architecture choices.

The MoE Design

Most MoE models feel like bolt-ons. Qwen3.5's sparse activation is native: only 4.3% of parameters fire per token. That's how you get trillion-parameter-class performance without trillion-parameter inference costs. The 0.8 RMB/million tokens pricing isn't subsidized; it's structurally earned.

Native Multimodal, Not Glued-On

This is a vision-language model from the ground up. Heterogeneous architecture: separate processing pipelines for text, image, video that fuse early. Not a vision encoder slapped onto an LLM. The result: 90.8 on OmniDocBench, 79.0 on MMMU-Pro. Document understanding and visual reasoning without the usual brittleness.

The Context Window Reality

Qwen3.5-Plus (the hosted version) ships with 1M tokens by default. That's not a marketing number; they're actually positioning it for long-document workflows. With built-in adaptive tool use, it's clearly aimed at agentic automation, not just chat.

What Actually Impressed Me

• FP8 native pipeline: ~50% activation memory reduction
• Async RL framework for continuous refinement, with training and inference workloads separated
• 201 languages (up from 119), 250k vocab for better low-resource encoding
• Apache 2.0 license. Full weights on HuggingFace and ModelScope.

The Benchmark Context

76.4 on SWE-bench Verified puts it in the range where it can handle real debugging workflows. 72.9 on BFCL v4 for agentic tool use. 88.4 on GPQA Diamond. These aren't SOTA in isolation, but the breadth is unusual: strong across reasoning, coding, multimodal, and agentic tasks.

The Honest Caveat

I haven't stress-tested the 1M context for needle-in-haystack retrieval yet. And "native multimodal" claims need real-world torture testing: PDFs with tables, charts, mixed layouts. Benchmarks are benchmarks.

Bottom Line

This isn't just another model release. It's a bet on efficient scale: big model capabilities, small active compute, open weights. At 1/18th the cost of Gemini 3 Pro, it's going to force pricing conversations across the board.
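For anyone who wants to sanity-check the sparse-activation numbers in the post, here is a rough sketch. The first function just reproduces the 17B / 397B arithmetic behind the "4.3%" figure. The second is a generic top-k gating routine of the kind commonly used in MoE layers; it is an illustration only, not Qwen3.5's actual router, and the function names, the four-expert example, and k=2 are all hypothetical.

```python
import math


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token, as a fraction of the total."""
    return active_params_b / total_params_b


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Generic top-k gating (an assumption, not Qwen3.5's published router).

    Picks the k highest-scoring experts and returns their softmax weights,
    renormalized over the chosen experts so they sum to 1.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits, highest first.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Softmax over the chosen logits only; subtract the max for stability.
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    # Figures from the post: 397B total parameters, 17B active per token.
    print(f"{active_fraction(397, 17):.1%}")  # prints 4.3%
    # Hypothetical router scores for four experts; only two are used.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.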

Bo Wang

13,221 views • 3 months ago