1/ Introducing RL Swarm’s new backend: GenRL. A modular reinforcement learning library built for distributed, fault-tolerant training - now powering RL Swarm from the ground up. 🧵

gensyn

94,707 subscribers

82,166 views • 1 year ago • via X (Twitter)

Science & Technology

10 Comments

gensyn • 1 year ago

2/ Each worker runs its own environment instance, contributes asynchronously to a shared rollout buffer, and updates its model weights independently, so no central controller is required.
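
A minimal single-machine sketch of the pattern described in 2/, assuming Python threads stand in for distributed workers and a single float stands in for model weights. `SharedRolloutBuffer` and `Worker` are hypothetical names, not GenRL's actual API. Each worker rolls out in its own (toy) environment, appends to the shared buffer asynchronously, and updates only its own weights; there is no coordinator thread.

```python
import random
import threading
from collections import deque


class SharedRolloutBuffer:
    """Thread-safe buffer that any worker can append to or sample from."""

    def __init__(self, capacity: int = 1000):
        self._items = deque(maxlen=capacity)  # oldest rollouts are evicted first
        self._lock = threading.Lock()

    def add(self, rollout: dict) -> None:
        with self._lock:
            self._items.append(rollout)

    def sample(self, k: int) -> list:
        with self._lock:
            items = list(self._items)
        return random.sample(items, min(k, len(items)))

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


class Worker(threading.Thread):
    """Hypothetical worker: owns its environment instance and its own weights."""

    def __init__(self, worker_id: int, buffer: SharedRolloutBuffer, steps: int, seed: int):
        super().__init__()
        self.worker_id = worker_id
        self.buffer = buffer
        self.steps = steps
        self.rng = random.Random(seed)  # per-worker RNG = per-worker toy environment
        self.weight = 0.0               # stand-in for this worker's model parameters
        self.updates = 0

    def run(self) -> None:
        for _ in range(self.steps):
            # 1. Roll out locally (toy environment: reward ~ N(1.0, 0.1)).
            reward = self.rng.gauss(1.0, 0.1)
            self.buffer.add({"worker": self.worker_id, "reward": reward})
            # 2. Update local weights from a sample of everyone's rollouts.
            batch = self.buffer.sample(8)
            if batch:
                mean_reward = sum(r["reward"] for r in batch) / len(batch)
                self.weight += 0.1 * (mean_reward - self.weight)
                self.updates += 1


buffer = SharedRolloutBuffer(capacity=500)
workers = [Worker(i, buffer, steps=50, seed=i) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(buffer))  # 200
```

A real deployment would replace the lock-protected deque with a networked store or peer-to-peer exchange; that part is an assumption here, not something the thread specifies.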

gensyn • 1 year ago

3/ GenRL allows RL Swarm to work with any environment, described intuitively through code. This launch incorporates Reasoning Gym out-of-the-box, giving access to >100 community-created environments with no extra configuration required.
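
A minimal sketch of what "an environment described through code" can look like, assuming an environment only has to generate a task and score a response with a verifiable reward. `Environment`, `AdditionEnv`, `ReverseWordEnv` and `ENVIRONMENTS` are hypothetical names for illustration; this is not GenRL's or Reasoning Gym's actual API.

```python
import random
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass
class Task:
    question: str
    answer: str


class Environment(Protocol):
    """Hypothetical interface: generate a task, then score a model's response."""

    def sample(self, rng: random.Random) -> Task: ...

    def score(self, task: Task, response: str) -> float: ...


class AdditionEnv:
    """Toy arithmetic task with a verifiable answer."""

    def sample(self, rng: random.Random) -> Task:
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        return Task(f"What is {a} + {b}?", str(a + b))

    def score(self, task: Task, response: str) -> float:
        return 1.0 if response.strip() == task.answer else 0.0


class ReverseWordEnv:
    """Toy string task: reverse a word."""

    WORDS = ["swarm", "gradient", "policy", "reward"]

    def sample(self, rng: random.Random) -> Task:
        word = rng.choice(self.WORDS)
        return Task(f"Reverse the word '{word}'.", word[::-1])

    def score(self, task: Task, response: str) -> float:
        return 1.0 if response.strip().lower() == task.answer else 0.0


# Registry: supporting a new environment is just adding an entry here.
ENVIRONMENTS: Dict[str, Environment] = {
    "addition": AdditionEnv(),
    "reverse_word": ReverseWordEnv(),
}

rng = random.Random(0)
task = ENVIRONMENTS["addition"].sample(rng)
print(task.question, "->", task.answer)
print(ENVIRONMENTS["addition"].score(task, task.answer))     # 1.0
print(ENVIRONMENTS["addition"].score(task, "not a number"))  # 0.0
```

Using a structural `Protocol` means any class with `sample` and `score` methods qualifies, without inheriting from a base class.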

gensyn • 1 year ago

4/ What’s new:
– Modular GenRL backend
– Expanded configuration surface
– Prebuilt Docker image for easy deployment
– Reasoning Gym environment to enhance model reasoning capabilities
– New multi-task swarm

gensyn • 1 year ago

5/ Now live on the Gensyn testnet. You can run RL-Swarm with GenRL today. Full code + setup:

gensyn • 1 year ago

6/ A node update is required for GenRL. Please visit the support-discussion channel in the Discord if you have any questions.

Gautamgg 🕵 • 1 year ago

I want to ask one question: what about the previously trained model data, rewards & participants? It's not showing. Is that data saved in your database? @fenbielding @_jamico @_grieve Waiting for an answer 💙

Mintair | One Click Node🪄 • 1 year ago

Looks really interesting, we gotta set up our own custom environment.

AJDominic (🐱,🐐) • 1 year ago

What gensyn is cooking is unmatched!

Bitduke • 1 year ago

Cool, cool - more modularity

lior.eth (Lior Messika) • 1 year ago

These retro vibes are everything I ever wanted from an AI lab

Similar Videos

The network for machine intelligence

Two years ago, we laid out our vision for a machine learning compute protocol: one that connects every device in the world into an open network for machine intelligence, with no gatekeepers or artificial boundaries. This week, we’ll be sharing some of our early progress, beginning with RL Swarm, a peer-to-peer system for collaborative reinforcement learning over the internet. Next month, we’ll open our Testnet, allowing anyone to contribute to the frontier of open machine intelligence.

Introducing RL Swarm

RL Swarm is a fully open-source system for collaborative reinforcement learning over the internet. It is a live demo of our research findings, which show that models trained with RL learn faster when they train as a collective swarm than when they train on their own. Join our swarm now to see this in practice. You can participate with consumer hardware at home or a powerful GPU in the cloud. You can follow along with the swarm’s progress by following the links below.

gensyn

228,703 views • 1 year ago

We asked Sholto Douglas from Anthropic about the costs of RL (Reinforcement Learning) runs. "In Dario Amodei's essay, he said that RL runs cost only $1M back in December." "RL is more naively parallelizable and scalable than pre-training." "With pre-training, you need everything in one big data center ideally. For RL, in theory, you could scale all over the world."

TBPN

76,634 views • 1 year ago

RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning

AK

15,364 views • 8 months ago

Our work, "A Primer on SO(3) Action Representations in Deep Reinforcement Learning," was accepted to #ICLR2026! We provide a systematic study of action representation choices in RL, showing that they fundamentally impact training stability and performance. #Robotics #AI #RL

Learning Systems and Robotics Lab (is hiring!)

49,655 views • 4 months ago

Introducing Repo2RLEnv: turn any repository into runnable, verifiable coding environments built from real PRs and commits, for coding-agent evaluation or RL training.
> uv pip install repo2rlenv

Adithya S K

66,475 views • 1 month ago

Reinforcement Learning is the future tense of intelligence. Echo is how it scales. Echo is Gradient’s distributed RL framework, running on everyday consumer devices. From its early experiments, Echo powered a 30B Sokoban model that outperformed DeepSeek-R1 and GPT-OSS-120B.

Gradient

279,022 views • 10 months ago

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning investigated the application of Deep Reinforcement Learning (Deep RL) for low-cost, miniature humanoid hardware in a dynamic environment, showing the method can synthesize sophisticated and safe movement skills making up complex behavioral strategies in a simplified one-versus-one (1v1) soccer game abs: project page:

AK

293,335 views • 3 years ago

We teamed up with NVIDIA and Matthew Berman to teach you how to do Reinforcement Learning! Learn about:
- RL environments, reward functions & reward hacking
- Training OpenAI gpt-oss to automatically solve 2048
- Local Windows training with NVIDIA_AI_PC RTX GPUs
- How RLVR (verifiable rewards) works
- How to interpret RL metrics like KL Divergence
Full video tutorial:
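
For the KL Divergence item above: a minimal sketch of the quantity itself, assuming two categorical next-token distributions over the same vocabulary. In RL fine-tuning it is common to track the KL of the current policy from a reference (pre-RL) policy; a value near zero means the policy has barely moved, while a steadily growing value means it is drifting away from the reference. The helper names and toy logits are hypothetical and not taken from the tutorial.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Exact KL(p || q) in nats for two categorical distributions over the same outcomes."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


# Toy next-token distributions: reference (pre-RL) policy vs. current policy.
reference = softmax([2.0, 1.0, 0.5, 0.1])
current = softmax([2.5, 0.8, 0.4, 0.1])

print(kl_divergence(current, current))    # 0.0
print(kl_divergence(current, reference))  # small positive number
```

Training frameworks usually estimate this per token from sampled log-probabilities rather than summing over the whole vocabulary; the exact sum here just makes the definition explicit.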

Unsloth AI

54,179 views • 6 months ago

"We have maybe the same size cake, and we just want to crush it with a giant reinforcement learning cherry." OpenAI's Dan Roberts discussed the controversial shift to prioritizing RL over pre-training at AI Ascent.

Sequoia Capital

18,285 views • 1 year ago

We open-sourced QeRL: Quantization-enhanced Reinforcement Learning!
🧠 4-bit quantized RL training
💪 Train a 32B LLM on a single H100 GPU
⚙️ 1.7× faster overall training
🎯 Accuracy on par with bfloat16
🔥 Supports NVFP4 quantization format
Moreover, we show that quantization helps exploration in RL training.
Paper:
Code:
#NVIDIA #AIResearch #ReinforcementLearning #Quantization #LLM #EfficientAI
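
To make "4-bit quantized" concrete: a minimal sketch of generic symmetric absmax quantization to signed 4-bit integers. This textbook scheme is chosen only for illustration; it is not QeRL's implementation and not NVFP4 (a 4-bit floating-point format with block-level scale factors). The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric absmax quantization to signed 4-bit codes in [-7, 7] with one per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the 4-bit range.
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.0, 0.7, -0.04]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(max_error <= scale / 2 + 1e-12)  # True: rounding error is at most half a step
```

Only 4 bits per weight plus one shared scale need to be stored, which is where the memory savings come from; finer-grained (per-block) scales reduce the error when weights vary widely in magnitude.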

Yukang Chen

69,747 views • 8 months ago

🤔 Want a principled way to RL your diffusion model? Check out Data-regularized Reinforcement Learning (DDRL)! Post-train NVIDIA #Cosmos World Foundation models with a million GPU hours! 🤯
Novel formulation
➡️ Theoretically integrates SFT into RL
➡️ Robust to Reward Hacking 🛑
Details:
#DDRL #Diffusion #RL #NVIDIA #Cosmos

Haotian Ye

77,626 views • 6 months ago

The best way to get robust, high-quality robot performance is through reinforcement learning; but RL in either the real world or a traditional simulation has lots of limitations. Instead, Jiazhi Yang in RISE does RL in a compositional world model. Learn more ->

Chris Paxton

33,766 views • 20 days ago

Introducing Muscle v0: infinite degrees of freedom, from Daxo Robotics. A different mountain to climb, with a far more beautiful peak. We built this from the ground up:
- Ultra-dexterous
- Built for machine learning
- Durable and robust
More below (1/n)

Tom Zhang

273,408 views • 1 year ago

Using our brain simulator, we’ve trained a reinforcement learning agent to maximize bits per second. Here is the RL policy converting brain data to cursor control in simulation:

Neuralink

18,410 views • 1 year ago

🚨 RL for LLMs is finally accessible.

Introducing OpenTinker: the first community-driven, open-source framework designed to democratize Reinforcement Learning for LLMs. Inspired by Thinking Machines's amazing Tinker, we realized the biggest bottleneck in agentic LLM research isn’t the math; it’s the setup. Current RL pipelines are messy, and configuring VeRL for every single experiment is a productivity killer. OpenTinker fixes that.

🛠 How OpenTinker Works: Decoupled Design of Server and Client
- Setup Once, Run Forever: configure the OpenTinker backend on your GPU cluster once.
- Develop Locally: define your RL environments directly on your laptop.
- Train on the Cloud: simply point your local client to the backend. The cluster handles the compute; you handle the science.

📉 The 10x Development Efficiency
Thanks to our elegant architectural decomposition, OpenTinker reduces the time to develop a new RL training pipeline by at least an order of magnitude.

⚡ Turn Idle GPU Compute into Gold
Small labs often have underutilized hardware. OpenTinker turns your idle GPUs into an internal/external API service for:
- RL Training
- SFT
- Inference

🎯 Who needs OpenTinker?
- Researchers tired of infrastructure hell.
- Labs needing to standardize workflows.
- Teams wanting to maximize hardware ROI.

Thanks to my amazing PhD student Siqi Zhu for leading the project. We are building the future of open RL infra. Be the first to build with us.

👇 Start Building with OpenTinker Now
🚀 Repo:
🌐 Blog:

If you believe RL should be accessible to everyone, give us a star, repost this 🔄 post, and let us know what agents you plan to build!

Jiaxuan You

58,120 views • 6 months ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,248 views • 7 months ago

Introducing INTELLECT-3: Scaling RL to a 100B+ MoE model on our end-to-end stack. Achieving state-of-the-art performance for its size across math, code and reasoning. Built using the same tools we put in your hands, from environments & evals, RL frameworks, sandboxes & more.

Prime Intellect

1,137,660 views • 7 months ago

RL Grime. ISOxo. Jewel. The 2nd drop hardstyle fakeout sent me straight into the ground RL GRIME x ISOxo

Dancing Astronaut

47,331 views • 2 years ago

OpenAI shows how gpt-oss can autonomously beat 2048 using reinforcement learning (RL). Training was done locally with Unsloth on NVIDIA DGX Spark. You can also do it free on Colab. 🦥 OpenAI DevDay notebook:
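
A sketch of what a verifiable reward for 2048 could look like, assuming the standard 2048 scoring rule (merging two equal tiles adds the new tile's value to the score). `merge_row_left`, `move_left` and `step_reward` are hypothetical helpers, not code from the OpenAI DevDay notebook or Unsloth, and the -1.0 penalty for a move that changes nothing is an illustrative choice.

```python
from typing import List, Tuple

Board = List[List[int]]


def merge_row_left(row: List[int]) -> Tuple[List[int], int]:
    """Slide one row left, merging each pair of equal neighbours at most once.
    Returns the new row and the points gained (sum of the merged tile values)."""
    tiles = [t for t in row if t != 0]
    merged: List[int] = []
    reward, i = 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            reward += tiles[i] * 2
            i += 2  # both tiles consumed by the merge
        else:
            merged.append(tiles[i])
            i += 1
    merged += [0] * (len(row) - len(merged))  # pad with empty cells
    return merged, reward


def move_left(board: Board) -> Tuple[Board, int, bool]:
    """Apply a left move to every row; report points gained and whether anything moved."""
    new_board, total = [], 0
    for row in board:
        new_row, r = merge_row_left(row)
        new_board.append(new_row)
        total += r
    return new_board, total, new_board != board


def step_reward(board: Board) -> float:
    """Hypothetical reward: points gained, or -1.0 for a move that changes nothing."""
    _, gained, changed = move_left(board)
    return float(gained) if changed else -1.0


print(merge_row_left([2, 2, 2, 2]))  # ([4, 4, 0, 0], 8)
```

Only the left move is shown; the other three directions can be obtained by rotating or mirroring the board before and after the call. Because the reward is computed by deterministic game rules, it can be checked without a learned reward model.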

Unsloth AI

98,847 views • 8 months ago

What if you could train AI agents on a laptop as easily as on a GPU cluster? Researchers from UIUC's U Lab, led by Prof. Jiaxuan You, just open-sourced OpenTinker. It's a new "Reinforcement-Learning-as-a-Service" (RLaaS) system that decouples the complex training pipeline into simple, distributed services with friendly APIs. The result? It breaks down the major engineering barriers to RL, outperforming traditional frameworks in accessibility and ease of deployment, finally making agent training viable for more developers and teams. Project: Code: U Lab: Our report: 📬 #PapersAccepted by Jiqizhixin

机器之心 JIQIZHIXIN

15,893 views • 5 months ago