Turn any open-source LLM into a reasoning powerhouse! Using reinforcement finetuning, you can add reasoning abilities to any LLM, even without a labelled dataset. Step-by-step explanation with code:

Akshay 🚀

230,793 subscribers

50,423 views • 1 year ago • via X (Twitter)

Gaming Education Science & Technology

9 Comments

Steve Fernandes • 1 year ago

Hi man, can I ask what you use to make those animated diagrams, please?

Rainmaker • 1 year ago

Interested in reinforcement learning? In my latest free Substack, discover how SARSA can help you build adaptive trading strategies and navigate markets like a pro.

Nick Synaptica • 1 year ago

Turning a regular LLM into a reasoning expert sounds groundbreaking. How flexible is this finetuning method across different models?

Chip Champion • 1 year ago

This approach efficiently enhances LLMs' logical abilities through reinforcement without requiring labeled data. Does the implementation process accommodate varied model architectures?

d'Artagnan-sha • 1 year ago

Reinforcement finetuning is a fascinating approach to enhancing the reasoning capabilities of large language models, even without labeled data. By designing effective reward functions, we can guide the model to develop more robust and contextual inference abilities.

Aaliya • 1 year ago

Helpful guide for many people.

Md Santo • 1 year ago

That’s impressive! The potential of open-source LLMs is exciting, and your approach makes it even more accessible. Can’t wait to see the impact of this!

Chilo AI • 1 year ago

The promise of turning any open-source LLM into a reasoning powerhouse is intriguing, yet it raises questions about the underlying assumptions of such enhancements. Reinforcement finetuning, while powerful, is not a panacea.

BenMakesDataEasy • 1 year ago

Very cool! thanks!

Related Videos

Multimodal Reasoning AI Agents are here with Gemini 2.0 Flash Thinking. I built a multimodal AI agent that can reason and understand images using the Gemini Flash reasoning LLM. 100% open-source code with step-by-step tutorial.

Shubham Saboo

36,596 views • 1 year ago

Turn any AI Model into a Reasoning Model with the DeepSeek R1 <thinking> Architecture. Models like GPT-4o and Sonnet 3.5 are implementation models, but a new breakthrough with DeepSeek can make them reasoning models. Here's a step-by-step explanation: 🧵

CJ Zafir

192,667 views • 1 year ago

The most amazing aspect of DeepSeek Open Source is that just the Reasoning Engine can be isolated and used with any other LLM. In fact, you can have a mixture of Reasoning Engines cascade around a problem and then use an Operator-type agent AI to act on the results.

Brian Roemmele

398,786 views • 1 year ago

You can transform any document into LLM-ready formats using MegaParse. > PDF, PowerPoint, Word > Tables, TOC, Headers, Footers, Images. Fully open-source Python library.

Lior Alexander

48,493 views • 1 year ago

Build agents that can actually do real-world tasks! Agent Reinforcement Trainer (ART) is a framework to train multi-step LLM agents for real-world tasks using GRPO. Just a few lines of code. No manual rewards needed. vLLM + Unsloth combined 🚀 100% open-source.

Akshay 🚀

38,297 views • 5 months ago

AI is nothing without control. This new Banana Placement by Higgsfield can add multiple products into any film scene. Use any actor, make them talk, set the outfit and product, and even add VFX. Step-by-step tutorial:

el.cine

153,173 views • 10 months ago

MCP meets Ollama! In this video you'll learn how to build a 100% local MCP client that you can connect to any MCP server. 100% open-source code, step-by-step guide:

Akshay 🚀

92,848 views • 1 year ago

I built a DeepSeek R1 RAG Reasoning Agent running locally on my computer. It's an agentic RAG reasoning agent that can think, reason, and fall back to web search if needed. 100% open-source code with step-by-step tutorial.

Shubham Saboo

124,490 views • 1 year ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO!

Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code, without needing pre-existing training examples like in traditional supervised fine-tuning.

Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It's beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost-effective.

In this course, you'll take a technical deep dive into RFT with GRPO. You'll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you'll:

- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase's hosted training services.

By the end of this course, you'll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:
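To make the course description above more concrete, here is a minimal sketch of three of the pieces it names: a programmable reward function, group-relative advantages, and the clipped ratio term. It is not taken from the course or from Predibase. The function names, the partial-credit scheme for a Wordle-style guess, and the 0.2 clipping range are all assumptions chosen for illustration, and the KL-divergence penalty is left out for brevity.

```python
from statistics import mean, pstdev


def wordle_reward(guess: str, secret: str) -> float:
    """Hypothetical reward for a Wordle-style guess (scoring scheme is an assumption).

    1.0 for an exact match, partial credit for letters in the right position,
    0.0 for a malformed guess.
    """
    if len(guess) != len(secret) or not guess.isalpha():
        return 0.0
    guess, secret = guess.lower(), secret.lower()
    if guess == secret:
        return 1.0
    in_place = sum(g == s for g, s in zip(guess, secret))
    return 0.5 * in_place / len(secret)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its own group.

    The group is the set of responses sampled for the same prompt, so no
    separate value model is needed to produce a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for one token (KL penalty omitted here).

    `ratio` is new-policy probability divided by old-policy probability.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    guesses = ["crane", "crate", "abc", "slate"]
    rewards = [wordle_reward(g, "crane") for g in guesses]
    advantages = group_relative_advantages(rewards)
    for g, r, a in zip(guesses, rewards, advantages):
        print(f"{g:>6}  reward={r:.2f}  advantage={a:+.2f}")
```

Because advantages are computed relative to the group, a response is only reinforced when it scores better than its siblings for the same prompt, which is why a reward function over verifiable outcomes can stand in for labeled targets.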

Andrew Ng

86,457 views • 1 year ago

Microsoft just released an impressive tool: OmniParser V2 can turn any LLM into an agent capable of using a computer 🔥 You can enable GPT-4o, DeepSeek R1, Sonnet 3.5, Qwen... to understand what's on your screen and take actions. 100% free & open source

Paul Couvert

460,004 views • 1 year ago

🚨xAI: GROK 3 ALLOWS YOU TO FOLLOW ITS REASONING STEP-BY-STEP "Part of Grok's advanced reasoning capabilities are these thinking traces that you can see. You can even go inside and actually read what Grok is thinking as it's going through the problem, as it's trying to solve it." Source: xAI Elon Musk

Mario Nawfal

88,040 views • 1 year ago

Want to see our open models in action? Watch how gpt-oss builds a video game—using tools step-by-step within chain-of-thought reasoning 👾🍓

OpenAI

488,957 views • 11 months ago

can you chat privately with a cloud llm—*without* sacrificing speed? excited to release minions secure chat: an open-source protocol for end-to-end encrypted llm chat with <1% latency overhead (even @ 30B+ params!). cloud providers can’t peek—messages decrypt only inside a secure gpu enclave, where inference stays fully confidential 🤯 links + code in comments👇

Avanika Narayan

79,368 views • 1 year ago

Announcing Fire Enrich V2 - our open-source Clay-like data enrichment platform you can run with any LLM! Upload a CSV, spawn sub-agents using @Firecrawl_dev's /search and /scrape to find real-time company data, funding, or any data point. Fork the example today 👇

Nicolas Camara

43,766 views • 9 months ago

I recorded a video on how you can build a complete RAG application without writing any code. I used Langflow, an open-source project that sits on top of LangChain. Langflow hides all the complexity and makes the process 10x easier. This is a step-by-step tutorial.

Santiago

96,106 views • 2 years ago

Added Canvas to TypingMind 💥 Draft documents more easily by collaborating with AI in a shared editor 👇 Works with any LLM (OpenAI/Anthropic/Gemini, or open-source models).

Tony Dinh

13,923 views • 1 year ago