MiniMax M3: Opus-level coding at DeepSeek pricing. On Terminal-Bench 2.1, it scores 66.0, only 0.1 behind Opus 4.7. I gave it a quick try on a few frontend tasks, and the output quality genuinely feels close to Opus 4.7.

Kai

8,707 subscribers

25,730 views • 2 months ago • via X (Twitter)

Related videos

My first time building with Opus 4.7 and I love it! I quickly built this Base AI Ecosystem map with names, links logos in the form of a galaxy that you can navigate through 🛰️ With some rigorous prompting, Opus 4.7 did everything for me including descriptions and logos!

Youssef

17,112 views • 3 months ago

We compared Gemini 3.1 Pro, Opus 4.7, and GPT 5.5 to Kimi K2.6, Xiaomi MiMo v2.5, and Qwen 3.6 Max. On average, closed source models are faster. GPT 5.5 and Opus 4.7 are the fastest. Kimi K2.6 comes after. It keeps up thanks to its native INT4 quantization. MiMo took the longest, but its refinement and aesthetic rank second only to Gemini 3.1, as if a mini Gemini. It's slow because it's trained for long-horizon agentic work. Claude Opus 4.7 is blooming in a very different style, probably because Anthropic trains for taste, not just accuracy.

GMI Cloud

37,446 views • 2 months ago

We’re excited to add support for Opus 4.6. With adaptive thinking, Opus 4.6 will decide the right amount of reasoning for your query, reducing latency and improving quality. Internally, we've already resolved bugs that Opus 4.5 couldn't tackle. Try it out on the latest!

Warp

12,404 views • 5 months ago

Kimi K3 ranks #6 on ReactBench - It performs better than Opus 4.8 while being over 2x cheaper (!!) - Scores 33%, jump of +10% compared to K2.7 - Lags behind the frontier (Fable and Sol) in frontend production-readiness:

Aiden Bai

73,851 views • 13 days ago

Opus 5 appears to be a top performing model at agentic CAD design! It even outperforms Fable on a number of mechanical design tasks

adam

101,428 views • 9 days ago

BREAKING: Anthropic just dropped Opus 4.8—and it is a MONSTER We've been testing for about a week Every 📧 and our verdict is they could've just called it Opus 5, it's that good. Here's our vibe check: - Beats GPT-5.5 on Senior Engineer bench. On our toughest benchmark Opus 4.8 scores a 63—a hair higher than GPT-5.5's score of 62, and a full 30 points higher than Opus 4.7. It tackled a ground-up rewrite of a production codebase, and actually built something that works. HOWEVER: Coding performance varied a lot at different reasoning levels. We recommend using it on xhigh for best results. - Incredibly good writer. Opus 4.8 scored a 79.6 on our writing benchmark—measuring models on real-world writing tasks we do all of the time like essay writing, promo email writing, and more. It beats GPT-5.5 by 6 points. It produces well-written prose with fewer "AI-isms". It's also very good at writing in your voice given the right context. HOWEVER: Writing performance also varied with reasoning levels. Medium reasoning had higher incidence of AI-isms—we found best results with high. - Beast at knowledge work. Opus 4.8 is very good at general knowledge work tasks like report creation, research and more. It produced the best PowerPoint one-shot we've ever seen on our deck generation benchmark. - Emotionally intelligent, willing to question the frame. I've also found it to be quite good at talking through psychological or interpersonal issues. It has a high EQ, and it's also good at not glazing and helping to expand your perspective. Its thought process feels extremely rich and dynamic. THE BAD: These days a model is only as good as its harness, and Codex is still a far superior harness to the Claude Desktop app. This has kept me using Codex + GPT-5.5 as my daily driver, but I am flipping back and forth a lot more between Codex and Claude. Anthropic is back baby! Read the rest on Every 📧:

Dan Shipper 📧

354,033 views • 2 months ago

An OpenAI engineer stopped me at a hackathon in Hayes Valley I had my terminal open on a table. Three panels. Live trades scrolling. He was walking past and froze. "That's not a demo. That's a live scoring engine. What model is that" I told him. Claude Opus 4.7. Four repos. $25 a month. He pulled up a chair without asking. "We benchmarked Opus 4.7 internally. It beat o3 on structured reasoning across every eval we ran. And you're telling me you're using it to trade" I told him it does more than trade. It reads 86 million trades and finds who wins and why. No fine-tuning. No prompting chains. Just raw context. He leaned back. "Show me the data source" I opened one link. 86 million trades. Every wallet. Every entry. Every exit. "You point Opus 4.7 at this and it reverse-engineers the strategy. It finds the wallets that win. Then it finds why they win. Then it copies the pattern" His team spent 14 months building something similar. 10 engineers. Custom infra. Still in staging. "The part that killed us was exit timing. Every model we trained nailed entries. But the best traders exit before the crowd. We never figured out the threshold" I told him my bot cuts at 85% of expected move. Or on a 3x volume spike. Whichever comes first. He stopped talking. "How did you find that" Opus 4.7 found it in poly_data. Top wallets exit before resolution 86% of the time. Losers hold to 58%. Exits are the entire game. I opened another tab. "Three commands. 500 markets. Opus scores them in 20 minutes" "That's our internal eval pipeline. Except it took us a year and you did it in a weekend with our competitor's model" My setup: Claude Opus 4.7 - $20/mo VPS - $5/mo poly_data - free polymarket-cli - free 214 trades. 74% win rate. +$9,400 in 19 days. Copytrade here: I showed him the article where I broke down every repo and every command. He read it twice. Then looked up. "You just published what we've been trying to ship for six months. 
Using the other team's model" He texted me the next day. "My manager found your thread. Delete it" Too late.

Lunar

136,549 views • 3 months ago

DeepSeek V4 Flash 0731 is now 90% off on Nous Portal for the next 7 days, in partnership with Novita AI. At this discounted price, it is over 1000x cheaper than Fable 5 on comparable tasks while still beating it on Terminal-Bench 2.1. Try it today at

Nous Research

660,019 views • 19 hours ago

I tested Kimi K3 vs Claude Opus 4.8 Same prompt, an armory bay with lighting, props, and detail. Top is Kimi K3, bottom is Opus 4.8. It's not even close. Kimi K3 built a full scene with textures, proper lighting, ammo crates, weapon racks, working detail everywhere. Opus 4.8 gave me a near empty room with a couple of floating tables. No doubt it beats Opus 4.8. Kimi K3 is Fable 5 level, and it's clearly better than GPT-5.6 Sol at 3D and games. An open weight model just matched the best closed models on the market. Let that sink in.

Bhavy☄️

467,303 views • 17 days ago

Opus 4.7 can build Lottie Animations. One prompt via Lottie Creator MCP → 500 particles, each with its own path, easing, and arrival frame. I didn't touch a keyframe. What should I ask it to build next? Best reply, I'll make it.

Nattu

186,829 views • 3 months ago

the real reason why SF is 6 months ahead on AI is just physics: shipping a SOTA model from the bay area to NYC is a logistical nightmare imagine what it took to bring Opus 4.7 to Europe (we don't have trains here) 😢

fabian

1,850,659 views • 2 months ago

Salute to the Qwen team 🫡 We tested Qwen 3.7-Max, Gemini 3.5 Flash, GPT-5.5, and Claude Opus 4.7. The biggest shock came from Qwen. In less than a month (3.6 Max dropped April 20), Qwen went from the worst multimodal output on our sakura tree test, barely keeping up with Gemini, GPT, and Claude , to matching Gemini 3.5 Flash frame for frame on this soccer test, and outperforming GPT-5.5 and Claude Opus 4.7. It rendered a perfectly proportioned soccer player and the most lifelike ball in the entire test. Remarkable spatial reasoning. Also: Gemini 3.5 Flash is now faster than GPT-5.5, which used to be the fastest in our past tests.

GMI Cloud

75,785 views • 2 months ago

LongCat performed Opus 4.8 and GPT 5.5 level on real physics tasks for $0! We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics Prompts: - A cannon demolishing a brick wall - A bowling ball knocking down the pins - A tornado that sucks in random objects Outputs: LongCat: 18,015 tokens, $0.00 Opus 4.8: 18,872 tokens, $0.48 GPT 5.5: 32,588 tokens, $0.98 GLM 5.2: 31,062 tokens, $0.09 On the physics LongCat came out ahead of Opus 4.8 and GLM 5.2 - cleaner collisions, nothing clipping or falling through. On detail and rendering it matched GPT 5.5, the best looking of the four. Getting this quality for free is wild!

atomic.chat

105,513 views • 1 month ago

We put grok 4.5 and opus 4.8 head to head on the same browser task, on Hyperbrowser Sandboxes. We then asked it to open a page in a real sandboxed browser and pull the title. Grok build on Grok 4.5, Claude code on Opus 4.8, identical setup. Grok 4.5 came out ahead ↓

Hyperbrowser

386,220 views • 25 days ago

Fable 5, first Mythos-class model is live on AI/ML API! We ran a fun test: Opus 4.8 vs Fable 5 are generating a 3D Pokemon. Verdict? Fable 5 is brilliant, fast, and rare as Mew… but Opus is still that nice little guy who does great stuff. 💛 Fable 5 Important bits: SOTA on nearly every benchmark, and the lead only grows on longer, complex tasks. • 1M Context • $10 / 1M input • $50 / 1M output Built for: long-horizon agentic coding, big migrations, vision-to-code, deep research.

AI/ML API

560,634 views • 1 month ago

I don't think there's a single terminal ux that handles agent swarms well With slate, you can literally use Opus 4.6 and GPT 5.4 at the exact same time But making it intuitive took a ton of work So heres a thread on how it works and how to actually use it 🧵

akira

139,114 views • 4 months ago

claude design is my all time favorite anthropic drop this year, even more than opus 4.7 its actually really really good at design and the "handoff to claude code" makes it basically copy/paste ui to the actual working product claude design made the entire ui for merl (full video on merl soon)

ashen

368,004 views • 3 months ago

1-bit Kimi K3 performs at Opus 5 level on 3D physics! We ran our Atomic Chat quant of Kimi K3 locally on 4x B200 against three cloud models and gave them all the same task, to build a giant anvil drop test as a single HTML file with real physics Outputs: K3 1bit (local): 15.8K tokens, $0 API cost Kimi K3 (API): 15.3K tokens, $0.30 API cost Opus 5: 22.8K tokens, $0.77 API cost GPT 5.6: 14.5K tokens, $0.72 API cost All four got the physics right. But only Kimi made a working winch. The drum turns and the chain drags the flat car off the pad. Opus 5 drew the most detail, road markings and sparks on the hit. And you can run a model at this level on your own box now. That still feels insane to us

atomic.chat

51,542 views • 2 days ago

claude opus 4.7 just dropped so i automated my ENTIRE cold email system that books 80 calls/month just put together 30 pages on the full AI playbook - AI writes my scripts (better than i can) - AI researches and personalizes leads - AI replies to interested leads 24/7 - AI optimizes campaigns from the data - every prompt i use for everything this wouldve saved me 2 years of trial and error like + comment "OPUS" and i'll send it over (must follow + RT for priority access)

James Shields

25,513 views • 3 months ago

We partnered with Prime Intellect to build Fast Ask, a small RL-trained subagent that helps our Sheets agent find answers in spreadsheets. It scores +4% over Opus on exact match accuracy at Haiku latency.

Ramp Labs

331,815 views • 2 months ago