Introducing SYNTHETIC-1: Collaboratively generating the largest synthetic dataset of verified reasoning traces for math, coding and science using DeepSeek-R1. Join us to contribute compute towards state-of-the-art open reasoning models.

Today, we release:
- SYNTHETIC-1: 1.4 million high-quality tasks & verifiers
- Public synthetic data run, allowing anyone to contribute compute
- GENESYS: open, extendable synthetic data generation framework + call for crowdsourcing tasks & verifiers

Our open reproduction & scaling of R1 will proceed in two steps, mirroring the DeepSeek-R1 approach:
1. Generate verified reasoning data & train an SFT model on this cold-start data
2. Globally distributed reinforcement learning with verifiable rewards
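As a rough illustration of step 1, the sketch below keeps only reasoning traces that passed verification and turns them into prompt/completion pairs for SFT. The field names (`prompt`, `response`, `score`) and the `min_score` threshold are assumptions made for this example, not the actual SYNTHETIC-1 schema.

```python
from typing import Dict, List


def build_sft_dataset(samples: List[Dict], min_score: float = 1.0) -> List[Dict]:
    """Keep verified traces only and map them to SFT training pairs.

    Hypothetical schema: each sample has 'prompt', 'response' and a
    verifier 'score' in [0, 1]. None of these names come from SYNTHETIC-1.
    """
    return [
        {"prompt": s["prompt"], "completion": s["response"]}
        for s in samples
        if s["score"] >= min_score
    ]


def demo_sft_dataset() -> List[Dict]:
    samples = [
        {"prompt": "2+2?", "response": "<think>2+2=4</think> 4", "score": 1.0},
        {"prompt": "3*3?", "response": "<think>3*3=6</think> 6", "score": 0.0},
    ]
    return build_sft_dataset(samples)
```

Here only the first sample survives the filter, since the second one failed verification.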

SYNTHETIC-1 Tasks & Verifiers
- Math Problems with Symbolic Verifiers (777k tasks)
- Coding Problems with Unit Tests (144k)
- Open-Ended STEM Questions with LLM Judge (313k)
- Real World GitHub Commit Instructions with LLM Judge (70k)
- Code Output Prediction with Ground Truth String Matching (61k)
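To make the verifier idea concrete, here is a minimal sketch of two of the simpler checks: ground-truth string matching for code output prediction, and an exact rational comparison as a stand-in for a symbolic math verifier. Both function names are hypothetical, and a real symbolic verifier would handle far more than plain numbers.

```python
from fractions import Fraction


def verify_output_prediction(predicted: str, ground_truth: str) -> bool:
    """Ground-truth string matching, ignoring surrounding whitespace."""
    return predicted.strip() == ground_truth.strip()


def verify_numeric_answer(predicted: str, ground_truth: str) -> bool:
    """Simplified stand-in for a symbolic verifier (assumption):
    treats both answers as exact rationals, so '0.5' matches '1/2'.
    Anything that is not a plain number is rejected.
    """
    try:
        return Fraction(predicted.strip()) == Fraction(ground_truth.strip())
    except (ValueError, ZeroDivisionError):
        return False
```

For example, `verify_numeric_answer("0.5", "1/2")` accepts the answer even though the two strings differ, which plain string matching would not.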

GENESYS
- Open-source library for synthetic data generation & verification
- Asynchronous verifiers (LLM judges, containerized code tests)
- GitHub:
- Easily extendable, enabling developers to contribute tasks & verifiers and collectively build an RL gym, as inspired by @karpathy
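The sketch below shows one way asynchronous, pluggable verifiers could be organized: a registry that contributors add to, and a runner that scores many samples concurrently. It does not use the GENESYS API; `register`, `VERIFIERS`, `verify_all`, and the sample fields are all hypothetical, and the "slow judge" only simulates latency in place of an LLM judge or containerized test run.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Verifier = Callable[[dict], Awaitable[float]]
VERIFIERS: Dict[str, Verifier] = {}  # hypothetical registry, not GENESYS


def register(name: str):
    """Decorator that adds a verifier coroutine to the registry."""
    def wrap(fn: Verifier) -> Verifier:
        VERIFIERS[name] = fn
        return fn
    return wrap


@register("string_match")
async def string_match(sample: dict) -> float:
    ok = sample["response"].strip() == sample["expected"].strip()
    return 1.0 if ok else 0.0


@register("slow_judge")
async def slow_judge(sample: dict) -> float:
    # Placeholder: simulates the latency of an LLM judge or container run.
    await asyncio.sleep(0.01)
    return 1.0 if sample["response"] else 0.0


async def verify_all(samples: List[dict]) -> List[float]:
    """Run every sample's verifier concurrently; results keep input order."""
    tasks = (VERIFIERS[s["verifier"]](s) for s in samples)
    return list(await asyncio.gather(*tasks))


def run_demo() -> List[float]:
    samples = [
        {"verifier": "string_match", "response": "4", "expected": "4"},
        {"verifier": "string_match", "response": "5", "expected": "4"},
        {"verifier": "slow_judge", "response": "some answer"},
    ]
    return asyncio.run(verify_all(samples))
```

With this layout, contributing a new verifier amounts to writing one coroutine and decorating it, which is the kind of extension point the list above describes.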

Contribute compute
- Anyone can now contribute H200 nodes to help generate verified reasoning data
- Real-time run dashboard:

We’ll soon open up for trustless compute contributors

Protocol Testnet (Preview)
- Decentralized, trustless protocol for globally distributed peer-to-peer compute and intelligence
- Foundation for an ecosystem of frontier open models, agents and their compute base layer

More details and announcements coming soon

Next Steps
- Scale globally distributed RL with verifiable rewards
- Open protocol testnet for permissionless contributions
- Crowdsourcing datasets & verifiers for frontier-level reasoning

Join us in building fully open-source AGI through code, data, and compute.

Links
- Blog:
- SYNTHETIC-1 Dataset:
- GENESYS - Synthetic Data Generation Framework:
- Dashboard:


one step closer towards open source agi 🫡


