Introducing ml-intern, the agent that just automated the post-training team at Hugging Face.
It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, and it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates, and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.
It can pull off crazy things:
We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper, found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the GPQA score from 10% to 32% in under 10h. Claude Code's best: 22.99%.
In healthcare settings, it inspected the available datasets, concluded they were too low-quality, and wrote a script to generate 1,100 synthetic data points from scratch covering emergencies, hedging, multilingual cases, etc. Then it upsampled them 50x for training. It beat Codex on HealthBench by 60%.
For competitive mathematics, it wrote a full GRPO script, launched training on A100 GPUs, watched rewards climb and then collapse, and ran ablations until it succeeded.
All fully backed by papers, autonomously.
How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arXiv and reads them fully, walks citation graphs, pulls datasets referenced in methodology sections, and so on
- browses the Hub, reads recent docs, inspects datasets, and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, and retrains
ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.
Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI:
Web + mobile:
And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.

Aksel
1,262,289 views • 2 months ago
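The ml-intern post mentions a GRPO script for competitive mathematics, but the script itself isn't shown. For orientation, here is a minimal sketch of the group-relative advantage step that GRPO (introduced in the DeepSeekMath paper) is built around. It assumes one scalar reward per sampled completion and uses the population standard deviation. The function name is hypothetical, and this is not ml-intern's code. The sketch also shows one way a run can stall: if every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against its own sampling group.

    GRPO samples several completions for the same prompt and scores each one;
    the advantage is the reward's z-score within that group, so no learned
    value function (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one math prompt: two correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # Every completion wrong: all advantages are 0, so this group gives no gradient signal.
    print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
```

The small `eps` only guards against division by zero when a group's rewards are identical. In a full trainer these advantages would weight a clipped policy-gradient loss, which is omitted here.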
OpenAI's Deep Research is getting a run for its money. Deep Lake was just released, and it's a different take: an AI system that can do deep research on your own data. You can use Deep Lake to build AI search with reasoning on your private and public data. (Look at the attached videos to get an idea of how it works.) If you want to research proprietary and sensitive data, Deep Research won't help you because it's limited to public data. Deep Lake, however, will allow you to use your private data. On top of that, Deep Lake supports multi-modal retrieval from the ground up. It uses vision language models for data ingestion and retrieval so that you can connect any data (PDFs, images, videos, structured data, etc.). You can even use mixed-data queries! Deep Lake can search your data from S3, Dropbox, and GCP. It learns from your queries over time, making the results as relevant to your work as possible!

Santiago
171,340 views • 1 year ago
Chat with papers: for any arXiv link to an HF paper, you can now chat using HuggingChat. All Hugging Face Papers now include a built-in assistant, powered by HuggingChat and the Hugging Face MCP server. It helps you quickly understand papers by answering questions, summarizing key ideas, and providing context as you browse the latest research.

AK
39,895 views • 5 months ago
🚨 JUST IN: CHINA just released an AI EMPLOYEE that works 24/7 on its own. 100% OPEN SOURCE.
It researches, codes, builds websites, creates slide decks, and generates videos. All by itself. All on your computer.
It's called DeerFlow.
You give it a task. It makes a plan, spins up its own team of sub-agents, and gets to work. You come back and there's a finished deliverable waiting. Not a draft. Not a summary. The actual thing.
Not a chatbot. Not a research assistant. An AI with its own computer that works while you sleep.
Here's what it does on its own:
→ Spawns multiple sub-agents in parallel, each tackling a different piece of your task, then combines everything into one finished output
→ Writes real code, runs it, reads the results, and fixes its own mistakes without asking you once
→ Builds slide decks, websites, full research reports, and data dashboards from scratch
→ Remembers you across sessions: your writing style, your tech stack, your preferences. Gets better every time.
→ Reads files you upload, works with them inside its own filesystem, and hands you clean finished outputs
→ Searches the web, runs commands, and calls any tool you plug in
Here's how it thinks:
You give one instruction. The lead agent makes a plan. Sub-agents fan out and work in parallel. Results come back. Everything gets synthesized. You get a deliverable.
A single research task might split into a dozen sub-agents, each exploring a different angle, then converge into one finished website with generated visuals.
Here's the wildest part:
DeerFlow 2.0 launched on February 28th 2026 and hit #1 on GitHub Trending the same day.
Version 2.0 was a complete rewrite, with zero code shared with version 1, because users kept using it for things the team never intended: data pipelines, dashboards, entire content workflows. The community told them what it needed to become. So they burned it down and rebuilt it.
22.7K GitHub stars. 2.7K forks.
Built by ByteDance.
100% open source.
MIT License.

Kanika
736,129 views • 3 months ago
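The DeerFlow post describes a plan, fan out, fan in loop: the lead agent plans, sub-agents work in parallel, and the results are synthesized. DeerFlow's own code isn't shown in the post. Here is a minimal asyncio sketch of that control flow under stated assumptions. `plan`, `sub_agent`, and `synthesize` are hypothetical placeholders for what would really be LLM calls and tool use. The sub-agents run concurrently rather than on separate CPU cores, which is the usual choice when each one mostly waits on network calls.

```python
import asyncio


def plan(task: str, n_subtasks: int = 3) -> list[str]:
    # Hypothetical planner: a real lead agent would ask an LLM to decompose the task.
    return [f"{task} / angle {i}" for i in range(1, n_subtasks + 1)]


async def sub_agent(subtask: str) -> str:
    # Placeholder for real work (LLM calls, web search, running code).
    await asyncio.sleep(0.01)
    return f"findings for {subtask}"


def synthesize(task: str, results: list[str]) -> str:
    # Hypothetical merge step: a real system would have an LLM write the final deliverable.
    return task + "\n" + "\n".join(f"- {r}" for r in results)


async def run(task: str) -> str:
    subtasks = plan(task)                                               # 1. lead agent plans
    results = await asyncio.gather(*(sub_agent(s) for s in subtasks))  # 2. fan out concurrently
    return synthesize(task, list(results))                              # 3. fan back in


if __name__ == "__main__":
    print(asyncio.run(run("market research report")))
```

`asyncio.gather` returns results in the same order as the subtasks it was given. That keeps the synthesis step deterministic even when sub-agents finish out of order.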
You can now make your own shadcn/ui and use it in v0. Create it on open it in v0, and use it as the foundation for your app.

v0
56,019 views • 6 months ago
I decided to share part of a prompt you can use to research any protocol in seconds using INFINIT Intelligence by INFINIT. As an example, I used Silo Labs, where I currently farm most of my stablecoin yields.
🔖 Bookmark this + read until the end for a bonus.
Just replace [PROTOCOL NAME] with any protocol you want to research.
Prompt:
"Conduct thorough research on [PROTOCOL NAME] and answer the following questions:
- What is the project building?
- What problem does it solve, and for whom?
- What makes it different from others?
- What blockchain is it built on?
- Is there a token? What’s its purpose?
- How does the protocol work technically?
- How does it make money or sustain itself?
- What’s the staking model, emissions, burns, and treasury?
- Who are the founders, and is the team public and credible?
- Who funded it? Who holds most tokens?
- Are users, TVL, and volume growing?
- How strong is the ecosystem around it?
- Who are the main competitors, and how does it compare?
- Is the protocol audited? Any past hacks?
- What are the best ways to use the protocol (strategies)?
- What’s coming next? Key milestones or launches?
- What are the biggest risks?
- How strong is the community and social traction?"
Bonus: Quote this post with a reason why you like INFINIT, and I’ll DM you the full version of the prompt within the next 24 hours.

Keno
15,876 views • 11 months ago
3 weeks since ml-intern launched and we just hit 1M messages exchanged.
that's 3.3 agent-years of ML research in 21 days. 2 months' worth of research every day. 17,383 training jobs total. talk about AI acceleration.
here's some of what people built:
Carlos Miguel Patiño replicated the full DeepSeek v4 architecture and pre- and post-trained a 100M MoE from scratch. → it landed a third-place submission in Keller Jordan's optimizer competition. autoresearch in SOTA territory.
Lewis Tunstall got the intern to convert Alec Radford's cool new talkie-lm 1930 model to work with transformers. tokenizer, chat template, model conversion, etc., all one-shotted by ml-intern.
someone drafted an entire PhD dissertation chapter on context-aware agentic cyber defense with 16 research subagents.
and someone used it to crack an Paul Jankura kernel optimization take-home. (we don't know how to feel about this one 👀)
just getting started →

Aksel
35,091 views • 1 month ago
Dexter vs. Claude Code
I ran tests overnight, and Dexter came out ahead on complex financial tasks that required deep research.
Dexter won on:
• speed (by 92%)
• cost (by 26%)
• correctness (by 31%)
I use Claude Code often, so this was fun to see.
A key challenge for CC is that it relies on web search for financial data. Most of what it finds comes from news sites, blogs, and other secondary sources.
Dexter uses primary source data from Financial Datasets, so the performance gap makes sense.
Plenty of room to improve on Dexter. The gap will only grow from here.
Evals from vals. Report coming next.

virat
26,638 views • 6 months ago
🐯 as soon as we finished our [training] completion ceremony, i talked a lot with jihoon [on the phone], i think we talked for almost an hour 🐯 we were like, "you went through that too?" "i did too," and such... it also kinda differs a bit depending on the training camp

🌌
14,863 views • 6 months ago
We are entering an extremely exciting era for open-weight models.
Kimi K2.6 now feels like a top agentic model. I took it for a spin via Fireworks AI's fast inference APIs.
Kimi K2.6 has impressive agentic capabilities, design skills, and the ability to synthesize large amounts of information.
I built a little Skill that produces survey papers on any AI research topic you want (see the example in the clip). You can use the skill to tell your agent to generate a survey on whatever topic and watch it go to work.
The artifact was fully generated by Kimi.ai's Kimi K2.6. It's cheap and fast.
My next step is to explore ways to keep integrating the capabilities of these models into use cases like automating my LLM knowledge bases and augmenting my agent memory capabilities. Stay tuned for more.

elvis
47,678 views • 2 months ago
ANTHROPIC JUST TURNED AI AGENTS INTO GIT REPOS
Anthropic shipped "ant", a CLI that runs every Claude API endpoint straight from your terminal.
The headline isn't the terminal access. It's that you can now version-control an AI agent as YAML in Git and have CI sync it to the Claude Platform, the same way you ship code.
- Every API resource is a subcommand: messages, models, files, agents, sessions
- Define an agent in a YAML file, check it into your repo, and keep it in sync with one update command
- Spin up a session, send it an event, then pull every event and tool call back from the same CLI
- Claude Code knows how to drive ant out of the box: it shells out and reads the results with no glue code
Agents just stopped being prompts you babysit and became infrastructure you deploy.

BuBBliK
199,917 views • 23 days ago
Placing objects sounds simple… until robots have to do it. This method makes it simple, fast & reliable. [Github ⬇️]
Robotic object placement is tough, especially with stacking, hanging, or insertion.
AnyPlace is a new two-stage method that uses only synthetic data and a vision-language model to teach robots where and how to place objects, even in the real world.
Why this works
✅ Finds the right spot with help from vision-language models
✅ Handles stacking, insertion, and hanging with no real-world training
✅ Trained on synthetic data using Blender and IsaacSim
✅ Works in the real world without fine-tuning
It shows that smart use of simulation and language models can make robotic placement tasks easier, faster, and more reliable.
Github:
Paper:
Thank you for sharing, Animesh Garg!

Ilir Aliu - eu/acc
22,843 views • 1 year ago
BREAKING: Anthropic just dropped Opus 4.8, and it is a MONSTER.
We've been testing it for about a week at Every 📧, and our verdict is that they could've just called it Opus 5; it's that good.
Here's our vibe check:
- Beats GPT-5.5 on Senior Engineer bench. On our toughest benchmark, Opus 4.8 scores 63, a hair higher than GPT-5.5's 62 and a full 30 points higher than Opus 4.7. It tackled a ground-up rewrite of a production codebase and actually built something that works. HOWEVER: coding performance varied a lot across reasoning levels. We recommend using it on xhigh for best results.
- Incredibly good writer. Opus 4.8 scored 79.6 on our writing benchmark, which measures models on real-world writing tasks we do all the time, like essay writing, promo email writing, and more. It beats GPT-5.5 by 6 points. It produces well-written prose with fewer "AI-isms". It's also very good at writing in your voice given the right context. HOWEVER: writing performance also varied with reasoning level. Medium reasoning had a higher incidence of AI-isms; we found the best results with high.
- Beast at knowledge work. Opus 4.8 is very good at general knowledge work tasks like report creation, research, and more. It produced the best PowerPoint one-shot we've ever seen on our deck generation benchmark.
- Emotionally intelligent, willing to question the frame. I've also found it to be quite good at talking through psychological or interpersonal issues. It has high EQ, and it's also good at not glazing and at helping expand your perspective. Its thought process feels extremely rich and dynamic.
THE BAD: These days a model is only as good as its harness, and Codex is still a far superior harness to the Claude Desktop app. This has kept me using Codex + GPT-5.5 as my daily driver, but I am flipping back and forth a lot more between Codex and Claude.
Anthropic is back, baby!
Read the rest on Every 📧:

Dan Shipper 📧
351,625 views • 28 days ago
Fine-tune DeepSeek-OCR on your own language! (100% local)
DeepSeek-OCR is a 3B-parameter vision model that achieves about 97% decoding precision while using up to 10× fewer vision tokens than the text tokens the same content would need. It handles tables, papers, and handwriting without killing your GPU or budget.
Why it matters:
Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow. DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling efficient processing of complex documents.
The best part? You can easily fine-tune it for your specific use case on a single GPU.
I used Unsloth to run this experiment on Persian text and saw an 88.26-percentage-point drop in character error rate:
↳ Base model: 149% character error rate (CER)
↳ Fine-tuned model: 60% CER (57% more accurate)
↳ Training time: 60 steps on a single GPU
Persian was just the test case. You can swap in your own dataset for any language, document type, or specific domain you're working with.
I've shared the complete guide in the next tweet: all the code, notebooks, and environment setup, ready to run with a single click.
Everything is 100% open-source!

Akshay 🚀
126,036 views • 7 months ago
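The DeepSeek-OCR post reports a base-model character error rate of 149%. That can look impossible, but it follows from how CER is usually defined. CER is the character-level edit distance (substitutions + deletions + insertions) divided by the number of reference characters, so a model that emits many extra characters can exceed 1.0. Here is a minimal pure-Python sketch of that standard definition. It is not the evaluation code from the guide mentioned in the post, which isn't shown, and the function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions
    needed to turn `ref` into `hyp` (classic dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, rc in enumerate(ref, start=1):
        cur = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, hc in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,               # delete rc from the reference
                cur[j - 1] + 1,            # insert hc into the hypothesis
                prev[j - 1] + (rc != hc),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters
    print(cer("ab", "xxxxab"))       # 4 inserted characters over 2: CER above 100%
```

Because insertions are unbounded, CER has no upper limit. A lower score is better, and 0.0 means an exact transcription.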
MiniMax is the James Bond of AI agents.
It uses the world's first open-weight, large-scale hybrid-attention reasoning model (MiniMax-M1), and it squeezes every bit of power from it.
The agent takes a prompt and does more than any other agent on the market right now:
1. It can do Deep Research
2. It can write code
3. It can design web pages
4. It can build 3D models
I built 5 different experiences using MiniMax and recorded them for you:

Santiago
44,730 views • 1 year ago
I just built a Meta Ads diagnostic in Claude Code that tells you WHY your account broke, not just what changed 🤯
It spins up a team of agents that each investigate a different reason performance dropped, then argue against each other to kill the wrong answer before it ever reaches you. All inside Claude Code.
Perfect for DTC brands and agencies who panic-kill creative the second CPA spikes.
If you've watched ROAS fall off a cliff and opened Ads Manager with ten tabs going, you already know what happens next.
Your gut says "creative fatigue." You kill your best-performing ad. A week later performance is still broken, because that was never the problem.
Guessing wrong is the most expensive move in paid social.
This workflow ends the guessing:
→ One agent investigates each competing theory: creative fatigue, budget and delivery changes, traffic quality, offer and seasonality
→ Each one is blind to the others, reasoning only from its own slice of the data, so they can't bias each other
→ A refuter agent then attacks every surviving theory and tries to kill it
→ A theory only stands if the data can't disprove it
→ You get a ranked diagnosis: the real cause, the evidence for and against it, and the one move to make this week
No anchoring on the first obvious answer. No killing winning creative on a hunch. No "here's what happened" reports that never tell you why.
What you get:
→ Every theory tested in parallel instead of one biased guess
→ An adversarial pass that kills the wrong answer before you act on it
→ A ranked diagnosis with confidence levels and evidence both ways
→ A reusable workflow you drop next month's export into and re-run
Built 100% in Claude Code with the new dynamic workflows.
The first account I ran it on looked like textbook creative fatigue. The workflow disagreed and traced the real cause to a budget change that had doubled spend and flooded delivery with junk traffic.
I put together a full playbook with the exact workflow, the prompt, and how to run it on your own account.
Want it for free?
> Like this post
> Comment "META"
And I'll send it over (must be following so I can DM)

Mike Futia
12,472 views • 21 days ago
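The Meta Ads post describes an investigate-then-refute protocol but does not share its prompt or code. Here is a minimal sketch of the same control flow under loud assumptions. The metric names, the numbers (invented, loosely echoing the post's budget anecdote), the 10% change threshold, and every function are hypothetical. Each "agent" is a plain function rather than an LLM, and the refuter is a single rule: any contradicting evidence eliminates a theory.

```python
from dataclasses import dataclass

# Hypothetical before/after account export; metric names and values are invented.
METRICS = {
    "ctr": (0.021, 0.020),
    "frequency": (1.8, 1.9),
    "spend": (500.0, 1000.0),
    "conversion_rate": (0.030, 0.012),
}


@dataclass
class Theory:
    name: str
    predicts: dict[str, str]  # metric -> direction the theory expects: "up" or "down"


THEORIES = [
    Theory("creative fatigue", {"ctr": "down", "frequency": "up"}),
    Theory("budget/delivery change", {"spend": "up", "conversion_rate": "down"}),
]


def direction(before: float, after: float, tol: float = 0.10) -> str:
    # Classify a relative change; moves within +/-10% count as "flat".
    change = (after - before) / before
    if change > tol:
        return "up"
    if change < -tol:
        return "down"
    return "flat"


def investigate(theory: Theory, metrics: dict) -> dict:
    # Blind pass: each investigator only sees the metrics its own theory names.
    view = {k: metrics[k] for k in theory.predicts}
    evidence_for, evidence_against = [], []
    for metric, expected in theory.predicts.items():
        observed = direction(*view[metric])
        (evidence_for if observed == expected else evidence_against).append(f"{metric} {observed}")
    return {"theory": theory.name,
            "confidence": len(evidence_for) / len(theory.predicts),
            "for": evidence_for, "against": evidence_against}


def refute(finding: dict) -> bool:
    # Adversarial pass: a theory survives only if no evidence contradicts it.
    return bool(finding["against"])


def diagnose(theories: list[Theory], metrics: dict) -> list[dict]:
    findings = [investigate(t, metrics) for t in theories]
    survivors = [f for f in findings if not refute(f)]
    return sorted(survivors, key=lambda f: f["confidence"], reverse=True)


if __name__ == "__main__":
    for f in diagnose(THEORIES, METRICS):
        print(f["theory"], f["confidence"], f["for"], f["against"])
```

With these invented numbers, CTR and frequency barely move, so "creative fatigue" is refuted. Spend doubling and conversion rate falling leave "budget/delivery change" as the only surviving theory. A real version would replace the direction rule with agents reasoning over the full export.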
You can now use Qwen3-VL in Jan. Find the GGUF model on Hugging Face, click "Use this model" and select Jan, or copy the model link and paste it into Jan Hub. Thanks Qwen 🧡

👋 Jan
44,150 views • 7 months ago
this AI agent builds and sells info products on full autopilot.
here's how:
- scan subreddits like r/anxiety, r/solotravel, r/socialskills, r/overthinking every few hours
- find the fears people keep posting about over and over
- generate a short PDF guide that actually helps them through it
- spin up a landing page with payments built in
- scan Reddit 24/7 for people posting about that exact problem and drop helpful comments pointing them to the guide
- run completely hands-off
it finds the pain, builds the product, and finds the customers. fully automated
reply "AGENT" + RT and I'll send you a free guide so you can set it up too (must be following so I can DM)

Chris
24,235 views • 2 months ago
Meet Stable Audio 3.0, the open-weight model family built for artistic experimentation.
This is our open invitation to experiment with generative audio. We believe the best innovations are still waiting to be built.
The 4-1-1 on 3.0:
📣 You own your outputs, and can distribute and commercialize them under the Stability AI Community License (up to $1 million in revenue).
🎵 New and improved capabilities include variable-length generation up to six minutes, and full song composition on portable devices, no GPU required.
✅ Trained on a fully licensed dataset.
🎨 You can customize the models on your own library with support for LoRA training, which we’ve documented for the first time.
More on the models 👇

Stability AI
154,029 views • 1 month ago
3/ Notebooks: With Web + Work + Pages, you can ideate with AI and collaborate with other people. It has entirely changed my workflow. And now with Notebooks, I can organize all of my heterogeneous data for a project, whether it’s Pages, docs, websites, or team meetings, and Copilot will ground itself just on that content. And this might be the best part: I can turn it all into a new modality like an audio overview. For example, I can collect all the latest things I’m reading about agents and agent frameworks, and then I can listen to it.

Satya Nadella
266,360 views • 1 year ago