
Introducing ml-intern, the agent that just automated the post-training team at Hugging Face. It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed...

1,260,736 views • 1 month ago • via X (Twitter)


Related videos

🚨 JUST IN: CHINA just released an AI EMPLOYEE that works 24/7 on its own. 100% OPEN SOURCE.

It researches, codes, builds websites, creates slide decks, and generates videos. All by itself. All on your computer.

It's called DeerFlow. You give it a task. It makes a plan, spins up its own team of sub-agents, and gets to work. You come back and there's a finished deliverable waiting. Not a draft. Not a summary. The actual thing.

Not a chatbot. Not a research assistant. An AI with its own computer that works while you sleep.

Here's what it does on its own:
→ Spawns multiple sub-agents in parallel, each tackling a different piece of your task, then combines everything into one finished output
→ Writes real code, runs it, reads the results, and fixes its own mistakes without asking you once
→ Builds slide decks, websites, full research reports, and data dashboards from scratch
→ Remembers you across sessions. Your writing style. Your tech stack. Your preferences. Gets better every time.
→ Reads files you upload, works with them inside its own filesystem, hands you clean finished outputs
→ Searches the web, runs commands, calls any tool you plug in

Here's how it thinks: You give one instruction. The lead agent makes a plan. Sub-agents fan out and work in parallel. Results come back. Everything gets synthesized. You get a deliverable. A single research task might split into a dozen sub-agents, each exploring a different angle, then converge into one finished website with generated visuals.

Here's the wildest part: DeerFlow 2.0 launched on February 28th 2026 and hit number 1 on all of GitHub Trending the same day. Version 2.0 was a complete rewrite. Zero shared code with version 1. Because users kept using it for things the team never intended. Data pipelines. Dashboards. Entire content workflows. The community told them what it needed to become. So they burned it down and rebuilt it.

22.7K GitHub stars. 2.7K forks. Built by ByteDance. 100% Open Source. MIT License.

Kanika

735,266 views • 2 months ago
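
The plan, fan-out, and synthesize loop described in the DeerFlow post above can be illustrated with a minimal sketch. This is not DeerFlow's code or API: `make_plan`, `run_subagent`, and `lead_agent` are hypothetical names, the sub-agents are stubs that return a string, and "synthesis" is reduced to joining the results.

```python
import asyncio


def make_plan(instruction: str) -> list[str]:
    """Hypothetical planner: split one instruction into fixed sub-tasks."""
    return [f"{instruction}: {angle}" for angle in ("research", "code", "visuals")]


async def run_subagent(subtask: str) -> str:
    """Hypothetical sub-agent; a real one would call a model or a tool."""
    await asyncio.sleep(0)  # stand-in for real asynchronous work
    return f"result for {subtask}"


async def lead_agent(instruction: str) -> str:
    subtasks = make_plan(instruction)
    # Fan out: run every sub-agent concurrently. gather() returns the
    # results in the same order as the sub-tasks were passed in.
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    # Synthesize: here simply a join; a real system would merge the pieces.
    return "\n".join(results)


deliverable = asyncio.run(lead_agent("market report"))
print(deliverable)
```

Running the sub-tasks through `asyncio.gather` is one simple way to get the "work in parallel, then converge" shape; a real agent framework would add error handling, retries, and shared state.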

BREAKING: Anthropic just dropped Opus 4.8, and it is a MONSTER

We've been testing it for about a week at Every 📧, and our verdict is they could've just called it Opus 5, it's that good.

Here's our vibe check:

- Beats GPT-5.5 on Senior Engineer bench. On our toughest benchmark Opus 4.8 scores a 63, a hair higher than GPT-5.5's score of 62 and a full 30 points higher than Opus 4.7. It tackled a ground-up rewrite of a production codebase, and actually built something that works. HOWEVER: Coding performance varied a lot at different reasoning levels. We recommend using it on xhigh for best results.
- Incredibly good writer. Opus 4.8 scored a 79.6 on our writing benchmark, which measures models on real-world writing tasks we do all of the time, like essay writing, promo email writing, and more. It beats GPT-5.5 by 6 points. It produces well-written prose with fewer "AI-isms". It's also very good at writing in your voice given the right context. HOWEVER: Writing performance also varied with reasoning levels. Medium reasoning had a higher incidence of AI-isms; we found best results with high.
- Beast at knowledge work. Opus 4.8 is very good at general knowledge work tasks like report creation, research and more. It produced the best PowerPoint one-shot we've ever seen on our deck generation benchmark.
- Emotionally intelligent, willing to question the frame. I've also found it to be quite good at talking through psychological or interpersonal issues. It has a high EQ, and it's also good at not glazing and helping to expand your perspective. Its thought process feels extremely rich and dynamic.

THE BAD: These days a model is only as good as its harness, and Codex is still a far superior harness to the Claude Desktop app. This has kept me using Codex + GPT-5.5 as my daily driver, but I am flipping back and forth a lot more between Codex and Claude.

Anthropic is back baby!

Read the rest on Every 📧:

Dan Shipper 📧

350,939 views • 22 days ago

I just built a Meta Ads diagnostic in Claude Code that tells you WHY your account broke, not just what changed 🤯

It spins up a team of agents that each investigate a different reason performance dropped, then argue against each other to kill the wrong answer before it ever reaches you. All inside Claude Code.

Perfect for DTC brands and agencies who panic-kill creative the second CPA spikes.

If you've watched ROAS fall off a cliff and opened Ads Manager with ten tabs going, you already know what happens next. Your gut says "creative fatigue." You kill your best-performing ad. A week later performance is still broken, because that was never the problem. Guessing wrong is the most expensive move in paid social.

This workflow ends the guessing:
→ One agent investigates each competing theory: creative fatigue, budget and delivery changes, traffic quality, offer and seasonality
→ Each one is blind to the others, reasoning only from its own slice of the data so they can't bias each other
→ A refuter agent then attacks every surviving theory and tries to kill it
→ A theory only stands if the data can't disprove it
→ You get a ranked diagnosis: the real cause, the evidence for and against it, and the one move to make this week

No anchoring on the first obvious answer. No killing winning creative on a hunch. No "here's what happened" reports that never tell you why.

What you get:
→ Every theory tested in parallel instead of one biased guess
→ An adversarial pass that kills the wrong answer before you act on it
→ A ranked diagnosis with confidence levels and evidence both ways
→ A reusable workflow you drop next month's export into and re-run

Built 100% in Claude Code with the new dynamic workflows.

The first account I ran it on looked like textbook creative fatigue. The workflow disagreed, and traced the real cause to a budget change that had doubled spend and flooded delivery with junk traffic.

I put together a full playbook with the exact workflow, the prompt, and how to run it on your own account. Want it for free?
> Like this post
> Comment "META"
And I'll send it over (must be following so I can DM)

Mike Futia

12,371 views • 16 days ago
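
The investigate, refute, and rank steps from the Meta Ads post above can be sketched roughly as follows. This is not the author's workflow or prompt: `Finding`, `investigate`, `refute`, and `diagnose` are hypothetical names, the scores are invented placeholders rather than real account data, and the investigators run one after another here instead of in parallel.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    theory: str
    support: float        # strength of supporting evidence (made-up 0..1 scale)
    refuted: bool = False


def investigate(theory: str, data_slice: dict) -> Finding:
    """Hypothetical investigator: sees only its own slice of the data."""
    return Finding(theory, support=data_slice["signal"])


def refute(finding: Finding, data_slice: dict) -> Finding:
    """Hypothetical refuter: a theory falls if counter-evidence outweighs support."""
    finding.refuted = data_slice["counter"] > finding.support
    return finding


def diagnose(slices: dict[str, dict]) -> list[Finding]:
    # Each theory is investigated in isolation, then attacked by the refuter.
    findings = [refute(investigate(t, s), s) for t, s in slices.items()]
    survivors = [f for f in findings if not f.refuted]
    # Rank surviving theories by supporting evidence, strongest first.
    return sorted(survivors, key=lambda f: f.support, reverse=True)


# Invented placeholder numbers, for illustration only.
slices = {
    "creative fatigue": {"signal": 0.4, "counter": 0.7},
    "budget change": {"signal": 0.8, "counter": 0.2},
    "traffic quality": {"signal": 0.6, "counter": 0.3},
}
ranked = diagnose(slices)
print([f.theory for f in ranked])
```

With these placeholder scores, the "creative fatigue" theory is refuted and dropped, and the remaining theories are ranked by support.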