A new way to fine-tune: Preference Fine-Tuning. Customize models by comparing responses rather than fixed targets, ideal for subjective tasks where tone, style, and creativity matter.

OpenAI Developers

228,323 subscribers

199,296 views • 1 year ago • via X (Twitter)

Education • News & Politics • Science & Technology
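The announcement doesn't spell out the training objective. OpenAI's fine-tuning documentation describes preference fine-tuning as Direct Preference Optimization (DPO), which learns from pairs of preferred and non-preferred responses instead of a single fixed target. Below is a minimal sketch of the per-pair DPO loss. It assumes you already have the summed token log-probabilities of each response under the model being trained and under a frozen reference model. The function name and the `beta=0.1` default are illustrative and are not part of any OpenAI API.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin)))."""
    # How much more (in log space) the trained model likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Trained model and reference agree exactly: the loss is log(2).
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
# Trained model has shifted probability toward the preferred response: lower loss.
print(dpo_loss(-9.0, -18.0, -12.0, -15.0))
```

The loss drops below log 2 only when the model favors the preferred response more than the reference does. `beta` sets how strongly the model is held to the reference.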

11 Comments

Steven Heidel • 1 year ago

great work @andrwpng !

Finance Wizard • 1 year ago

Unban Finance Wizard. You removed a lot of top GPTs by accident two days ago, and we all had to file appeals to get back up, yet I was denied.

Vincent • 1 year ago

Huge disappointment again. We were waiting for new, smarter models like Gemini 2.0 Flash.

AI Leaks and News • 1 year ago

Can’t wait to try it!

Itay Bachman • 1 year ago

when o1 for tier 4?

Callum • 1 year ago

This is great, will definitely be testing it over the next couple days

Rjay • 1 year ago

@OpenAI Making progress!

Renee Bush • 1 year ago

@OpenAI Amazing

2awake • 1 year ago

@OpenAI 🤝⚡️

OxyCoffin • 1 year ago

You should totally implement this in custom GPTs and Projects.

Related Videos

Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. [1/N]

Max Fu

40,435 views • 1 year ago

Fine-tune 100+ LLMs directly from a UI! LLaMA-Factory lets you train and fine-tune open-source LLMs and VLMs without writing any code. Supports 100+ models, multimodal fine-tuning, PPO, DPO, experiment tracking, and much more! 100% open-source with 50k stars!

Avi Chawla

557,355 views • 1 year ago

Today we previewed Reinforcement Fine-Tuning, a new model customization technique that enables organizations to build expert models for specific, complex tasks in domains such as coding, scientific research, or finance.

OpenAI

1,072,512 views • 1 year ago

Fine-tune 100+ LLMs directly from a UI! LLaMA-Factory lets you train and fine-tune open-source LLMs and VLMs without writing any code. Supports 100+ models, multimodal fine-tuning, PPO, DPO, experiment tracking, and much more! 100% open-source, 51k+ stars 🌟

Akshay 🚀

60,163 views • 1 year ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO!

Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks, such as solving math problems and debugging code, without needing the pre-existing training examples that traditional supervised fine-tuning requires.

Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost-effective.

In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.

By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback.

Please sign up here:

Andrew Ng

86,442 views • 1 year ago
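The course names four parts of the GRPO loss: token probability ratios, advantages, clipping, and KL divergence. Below is a minimal, framework-free sketch of how those parts combine for a single token. It follows the GRPO formulation from DeepSeek's DeepSeekMath paper as we understand it and is not Predibase's implementation. The reward function `exact_answer_reward`, `clip_eps=0.2`, and `kl_beta=0.04` are illustrative assumptions. A real trainer would compute these quantities over batched tensors of log-probabilities.

```python
import math
from statistics import mean, pstdev

def exact_answer_reward(response: str, target: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the response's last line equals the target."""
    lines = response.strip().splitlines()
    return 1.0 if lines and lines[-1].strip() == target.strip() else 0.0

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled response relative to the other samples for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def grpo_token_loss(logp_new: float, logp_old: float, logp_ref: float,
                    advantage: float, clip_eps: float = 0.2,
                    kl_beta: float = 0.04) -> float:
    """Loss for one token of one response; average it over tokens and over the group."""
    # 1) Probability ratio between the current policy and the policy that sampled the response.
    ratio = math.exp(logp_new - logp_old)
    # 2) + 3) Advantage-weighted surrogate, with the ratio clipped to [1 - eps, 1 + eps].
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    # 4) KL penalty toward the frozen reference model (non-negative estimator used in DeepSeekMath).
    log_ref_ratio = logp_ref - logp_new
    kl = math.exp(log_ref_ratio) - log_ref_ratio - 1
    # The objective is maximized, so negate it to get a loss to minimize.
    return -(surrogate - kl_beta * kl)

samples = ["...so the answer is\n42", "I think\n41", "42", "no idea"]
rewards = [exact_answer_reward(s, "42") for s in samples]
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in group_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

Advantages are normalized within each group of samples for the same prompt, so this formulation needs no separately learned value model. Responses that score better than their siblings are reinforced, and the rest are discouraged.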

IN: video fine-tuning support for AI at Meta's V-JEPA 2 in HF transformers 🔥 it comes with
> fine-tuning notebook
> four models fine-tuned on Diving48 and SSv2 datasets
> FastRTC demo on V-JEPA2 SSv2 (see below)
we're looking forward to seeing fine-tuned V-JEPA2 models on Hub ⏯️

merve

15,625 views • 1 year ago

Our course recommendation of the day is “Post-training of LLMs,” where you’ll learn how to customize pre-trained language models using Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL). You'll learn when to use each method, how to curate training data, and how to implement them in code to shape model behavior effectively. Enroll here:

DeepLearning.AI

29,369 views • 8 months ago

Remember reinforcement fine-tuning? We’ve been working away at it since last December, and it’s available today with OpenAI o4-mini! RFT uses chain-of-thought reasoning and task-specific grading to improve model performance, which is especially useful for complex domains. Take Accordance, which used RFT to fine-tune a model that’s SOTA for their tax and accounting purposes. And in supervised fine-tuning news: you can now fine-tune GPT-4.1 nano. Get even more from our fastest, cheapest model by training it specifically for your use case.

OpenAI Developers

663,651 views • 1 year ago

intensity, precision, conditioning and passion.. fine tuning the fine tuning 🤼‍♀️

Nattie

55,627 views • 1 year ago

Revolutionizing Move Programming with OpenLedger. In this demo, we showcase how Move datasets contributed by data providers to OpenLedger’s datanets are used to fine-tune specialized models with LoRA fine-tuning. As seen in the video, we show an example of how builders can deploy a Move-specialized model that powers Co-pilot agents using our no-code model fine-tuning platform. This is the future of AI and Web3 innovation. Watch this space to see more specialised models and data feeds being built for next-generation agents on top of OpenLedger #Move

OpenLedger

61,662 views • 1 year ago
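The OpenLedger post doesn't describe its training setup, but the LoRA idea it mentions is easy to sketch: keep the pretrained weight matrix frozen and learn only a low-rank correction. The NumPy class below is a minimal illustration of one LoRA-adapted linear layer. It assumes the common zero initialization of `B` and `alpha / r` scaling. The class name and hyperparameters are illustrative, not OpenLedger's API, and only the forward pass is shown, not training.

```python
import numpy as np

class LoRALinear:
    """A frozen linear weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                # pretrained, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(r, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, r))                     # trainable, zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, in_dim); the low-rank path never builds a full-size update matrix.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # After training, the update can be folded into a single matrix for deployment.
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
layer = LoRALinear(rng.normal(size=(64, 128)), r=4)
print(layer.trainable_params(), layer.weight.size)  # 768 8192
```

At rank 4, this 64×128 layer trains 768 parameters instead of 8,192. That reduction is the source of LoRA's memory savings, and the zero-initialized `B` means the adapted layer starts out identical to the pretrained one.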

This is a research preview. It requires more prompt engineering than previous models - but the generations are breathtaking. We’ll continue fine-tuning to improve reliability and control.

ElevenLabs

40,483 views • 1 year ago

Before Claude Fable 5 got banned, I turned all my fine-tuning research and experiments into a product: (A CLI for data generation to fine-tune AI models). Finetuner uses Codex 5.5 or Opus 4.8 as the orchestrator and Chinese models (DeepSeek v4 Pro, Kimi K2.7, MiMo v2.5, etc.) to generate dataset rows. This system is unique because these are not Python-script-paraphrased datasets. Each and every row is handcrafted by Chinese models based on the orchestrator model's instructions. Now anyone can generate datasets to fine-tune small language models (1B to 30B models). I achieved 10x lower costs and 5x better dataset quality. Releasing the product in a few days. (I'll open-source the skills.)

CJ Zafir

38,383 views • 9 days ago

We’ve added a new, faster way to fine-tune belt tension on your Prusa CORE One. ✨ The new stroboscope-based Manual Belt Tuning helps you find the optimal setting for both belts quickly and precisely - resulting in smoother motion and cleaner prints. Available from firmware 6.4.0. 🦾 Try it out now in Settings → Manual Belt Tuning.

Prusa3D

29,789 views • 8 months ago

🍿 You can now fine-tune open-source video models. We wrote a guide that shows you how to fine-tune Tencent's HunyuanVideo using Kohya Tech's Musubi Tuner.

Replicate

20,168 views • 1 year ago

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

Physical Intelligence

430,406 views • 3 months ago

A while back Benjie Holson described a set of "Robot Olympics" challenge tasks -- washing a pan, making a peanut butter sandwich, and more. We tried to fine-tune our models at PI to these tasks, and found that we could do most of them. A few highlights below.

Sergey Levine

81,265 views • 6 months ago

3/4 Devin can train and fine-tune its own AI models.

Cognition

685,735 views • 2 years ago

Unsloth AI and NVIDIA are Revolutionizing Local LLM Fine-Tuning: From RTX Desktops to DGX Spark

Fine-tune popular AI models faster with Unsloth on NVIDIA RTX AI PCs, from GeForce RTX desktops and laptops to RTX PRO workstations and the new DGX Spark, to build personalized assistants for coding, creative work, and complex agentic workflows.

The landscape of modern AI is shifting. We are moving away from a total reliance on massive, generalized cloud models and entering the era of local, agentic AI. Whether it is tuning a chatbot to handle hyper-specific product support or building a personal assistant that manages intricate schedules, the potential for generative AI on local hardware is boundless.

However, developers face a persistent bottleneck: how do you get a Small Language Model (SLM) to punch above its weight class and respond with high accuracy on specialized tasks? The answer is fine-tuning, and the tool of choice is Unsloth. Unsloth provides an easy, high-speed way to customize models. Optimized for efficient, low-memory training on NVIDIA GPUs, Unsloth scales effortlessly from GeForce RTX desktops and laptops all the way to the DGX Spark, the world’s smallest AI supercomputer...

Full analysis: NVIDIA NVIDIA AI NVIDIA AIDev NVIDIAnewsroom Unsloth AI Unsloth

Marktechpost AI Dev News ⚡

31,551 views • 6 months ago

Small Language Models (SLMs) are the future of AI: "small" instead of "large" (LLMs). These small models are highly specialized, with superhuman abilities on specific tasks.

Here are two techniques to build these models:
• Spectrum
• Model Merging

I give you a short introduction in the attached video, but here is a quick summary:

Spectrum helps us identify the layers most relevant to one specific task. We can ignore everything else and focus on fine-tuning those layers. Using Spectrum, we can fine-tune models in a heartbeat.

Model Merging combines multiple models into a single model that is much better than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities.

This is the state of the art of productizing models. It's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it.

There are three main steps to produce a model for your particular use case:
1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You deploy that model to any environment you want.

Three important notes:
• The training process is 2x faster and 2x cheaper than regular fine-tuning.
• The resulting models are smaller and have higher accuracy.
• The specialized models are created from open-source models.

Check this site so you can fully appreciate how this works:

If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.

Santiago

164,162 views • 1 year ago
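The post doesn't say which merging method Arcee uses, and Spectrum's layer selection is not reproduced here. The simplest form of model merging is a weighted average of the parameters of several fine-tuned models that share one base architecture. Tools such as mergekit also offer more elaborate methods, for example SLERP and TIES. Below is a minimal NumPy sketch of linear merging. Toy parameter dictionaries stand in for real checkpoints, and all names are hypothetical.

```python
import numpy as np

def linear_merge(models: list[dict[str, np.ndarray]],
                 weights: list[float]) -> dict[str, np.ndarray]:
    """Weighted average of same-named parameters from models with one shared architecture."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    reference = models[0]
    for m in models[1:]:
        if m.keys() != reference.keys() or any(m[k].shape != reference[k].shape for k in reference):
            raise ValueError("all models must have identical parameter names and shapes")
    total = sum(weights)
    # Normalize by the weight total so the merged parameters stay on the original scale.
    return {
        name: sum(w * m[name] for m, w in zip(models, weights)) / total
        for name in reference
    }

# Two toy "fine-tuned variants" of the same tiny model (hypothetical).
math_model = {"layer.w": np.array([1.0, 2.0]), "layer.b": np.array([0.0])}
code_model = {"layer.w": np.array([3.0, 4.0]), "layer.b": np.array([1.0])}
merged = linear_merge([math_model, code_model], [0.5, 0.5])
print(merged["layer.w"], merged["layer.b"])  # [2. 3.] [0.5]
```

Plain averaging tends to work only when the models were fine-tuned from the same base checkpoint. Averaging independently trained models parameter by parameter usually produces a much worse model.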

LLM post-training used to mean fine-tuning to a downstream task. Robotics has been stuck in this setting, needing task-specific fine-tuning for best performance. π07 changes this: it works out of the box & outperforms fine-tuned specialists. Details:

Chelsea Finn

61,139 views • 2 months ago