LLM post-training used to mean fine-tuning to a downstream task. Robotics has been stuck in this setting, needing task-specific fine-tuning for best performance. π07 changes this: it works out of the box & outperforms fine-tuned specialists. Details:

Chelsea Finn

96,660 subscribers

61,139 views • 2 months ago • via X (Twitter)

Science & Technology

Related Videos

We pushed MolmoAct 2 into a completely out-of-distribution studio setting to test its robustness under extreme environmental changes. Despite only 10 minutes of task-specific fine-tuning data collected elsewhere, it adapted surprisingly well to the new environment.

Jiafei Duan

11,381 views • 2 months ago

Remember reinforcement fine-tuning? We’ve been working away at it since last December, and it’s available today with OpenAI o4-mini! RFT uses chain-of-thought reasoning and task-specific grading to improve model performance—especially useful for complex domains. Take Accordance, which used RFT to fine-tune a model that’s SOTA for their tax and accounting purposes. And in supervised fine-tuning news: you can now fine-tune GPT-4.1 nano. Get even more from our fastest, cheapest model by training it specifically for your use-case.

OpenAI Developers

663,794 views • 1 year ago

New short course on LLMOps! LLMOps (large language model operations) is a rapidly developing field that takes ideas from MLOps (machine learning operations) and specializes them to building and deploying LLM-based applications. In this course, taught by Google Cloud's Erwin Huizenga, you'll learn to use automation to make building, tuning and deploying an LLM-based application less manual and more efficient. You'll learn how to:
- Apply supervised fine-tuning to tune an LLM to a specific task
- Automate and orchestrate LLM-tuning and deployment by customizing a pre-built tuning pipeline
- Apply best practices for preparing training data for supervised fine-tuning of an LLM
- Create an LLMOps workflow you can adapt to other LLM-tuning jobs
This course doesn't assume any prior MLOps or LLMOps experience. Sign up here to learn about this emerging field!

Andrew Ng

221,787 views • 2 years ago

Fine-tuning the details.

Arsenal Women

195,677 views • 6 months ago

Introducing PorTAL: Portable Task Adapters for LLMs. A novel recipe to cheaply port fine-tuning between models. It matches per task LoRA accuracy at half the cost, lowering the switching overhead of adapting tasks across LLMs. At Ramp, every new model release used to mean retraining our fine-tunes from scratch. PorTAL learns the task once, then efficiently refits it onto any new base model, even across model families.

Ramp Labs

147,836 views • 4 days ago

Fine-tuning details for the show in Chile #FiestaGrado3

A*Teens

29,925 views • 1 year ago

1/ Au revoir, RLVR. New work: EBFT (Energy-Based Fine-Tuning), a post-training method that directly optimizes the long-horizon behavior of model generations, addressing SFT’s deployment-time error amplification without relying on sparse, task-specific rewards.

Sham Kakade

266,585 views • 3 months ago

intensity, precision, conditioning and passion.. fine tuning the fine tuning 🤼‍♀️

Nattie

55,627 views • 1 year ago

New short course on Fine-tuning LLMs! Many developers are moving beyond only prompting, to also fine-tuning LLMs - that is, taking a pre-trained model and training it further on your own data, which can deliver superior results inexpensively. In this course, Sharon Zhou, CEO of Lamini (disclosure: I’m a minor shareholder) shows you how to recognize when fine-tuning can help, and how to train an open-source LLM on your own data. I hope you enjoy the course!

Andrew Ng

502,781 views • 2 years ago

IN: video fine-tuning support for AI at Meta's V-JEPA 2 in HF transformers 🔥
it comes with
> fine-tuning notebook
> four models fine-tuned on Diving48 and SSv2 datasets
> FastRTC demo on V-JEPA2 SSv2 (see below)
we're looking forward to seeing fine-tuned V-JEPA2 models on Hub ⏯️

merve

15,625 views • 1 year ago

MolmoAct 2 builds on MolmoAct, our first Action Reasoning Model (ARM). Like its predecessor, MolmoAct 2 reasons about the world in 3D before taking actions. It now runs up to 37x faster & handles two-armed tasks out of the box without per-task fine-tuning.

Ai2

25,563 views • 2 months ago

HRM-Text 101 is here. This tutorial takes you from zero to one: from setup to fine-tuning to evaluation. Download the base checkpoint. Fine-tune it on a real task. Evaluate the results. End to end, on a single GPU. Watch the tutorial and start building with HRM-Text.

Sapient Intelligence

187,377 views • 1 month ago

just fine tuning a few things and sending this out today

A$AP BERG

11,462 views • 1 year ago

An exciting new course: Fine-tuning and Reinforcement Learning for LLMs: Intro to Post-training, taught by Sharon Zhou, VP of AI at AMD. Available now at
Post-training is the key technique used by frontier labs to turn a base LLM--a model trained on massive unlabeled text to predict the next word/token--into a helpful, reliable assistant that can follow instructions. I've also seen many applications where post-training is what turns a demo application that works only 80% of the time into a reliable system that consistently performs. This course will teach you the most important post-training techniques!
In this 5 module course, Sharon walks you through the complete post-training pipeline: supervised fine-tuning, reward modeling, RLHF, and techniques like PPO and GRPO. You'll also learn to use LoRA for efficient training, and to design evals that catch problems before and after deployment.
Skills you'll gain:
- Apply supervised fine-tuning and reinforcement learning (RLHF, PPO, GRPO) to align models to desired behaviors
- Use LoRA for efficient fine-tuning without retraining entire models
- Prepare datasets and generate synthetic data for post-training
- Understand how to operate LLM production pipelines, with go/no-go decision points and feedback loops
These advanced methods aren’t limited to frontier AI labs anymore, and you can now use them in your own applications. Learn here:

Andrew Ng

132,304 views • 8 months ago
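
The post above mentions LoRA as a way to fine-tune without retraining entire models. As a rough illustration only (not taken from the course), the usual LoRA formulation keeps a weight matrix W frozen and learns a low-rank update, giving an effective weight W + (alpha / r) * B @ A. The function names, the tiny matrices, and the 4096 x 4096 layer size below are assumptions chosen for the example.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha: float, rank: int):
    """Compute W + (alpha / rank) * B @ A, with B: d_out x rank and A: rank x d_in."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical layer size: only the two small factors are trained.
full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, full // lora)

# With B initialized to zeros, the adapted layer starts identical to the base layer.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, -0.5]]
b_zero = [[0.0], [0.0]]
print(lora_effective_weight(w, b_zero, a, alpha=2.0, rank=1))
```

Starting B at zero is the common initialization, so training begins from the base model's behavior and only the small factors A and B receive gradient updates.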

6/ ASCII plays DOOM
Fine-tuning Mistral 7B to play DOOM based on ASCII frame representations. Yes, it actually works.
Umut Hope YILDIRIM, Sammy Aťman, Paul Chu
🥇 First place, fine tuning track

Alex Reibman 🖇️

18,717 views • 2 years ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.
Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning.
Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective.
In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.
By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,457 views • 1 year ago
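
The post above lists four components of the GRPO loss (probability ratios, advantages, clipping, KL divergence) and a Python reward function for Wordle. The sketch below shows how those pieces could fit together for a single token. It is an illustration under assumptions, not the course's code: the reward rule, the function names, the `clip_eps` and `beta` values, and the particular KL estimator (one commonly used non-negative form) are all hypothetical choices.

```python
import math
from statistics import mean, pstdev


def wordle_reward(guess: str, secret: str) -> float:
    """Hypothetical reward: fraction of letters in the correct position."""
    if len(guess) != len(secret):
        return 0.0  # malformed guesses earn nothing
    return sum(g == s for g, s in zip(guess, secret)) / len(secret)


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each response relative to its own group's mean and spread."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for one token."""
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def kl_estimate(logp_policy: float, logp_ref: float) -> float:
    """Non-negative per-token estimate of KL(policy || reference)."""
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0


def token_objective(logp_new: float, logp_old: float, logp_ref: float,
                    advantage: float, clip_eps: float = 0.2,
                    beta: float = 0.04) -> float:
    """Per-token objective to maximize: clipped surrogate minus a KL penalty."""
    ratio = math.exp(logp_new - logp_old)  # token probability ratio
    return clipped_term(ratio, advantage, clip_eps) - beta * kl_estimate(logp_new, logp_ref)


# One group of sampled guesses for the same prompt (secret word assumed known to the grader).
guesses = ["crane", "crate", "slate", "zzzzz"]
rewards = [wordle_reward(g, "crate") for g in guesses]
advantages = group_advantages(rewards)
for g, r, a in zip(guesses, rewards, advantages):
    print(g, r, round(a, 3))
```

Because advantages are computed within the group, a guess is rewarded for being better than its siblings rather than for its absolute score, which is why no separate value model is needed in this setup.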

Fine tuning

Detroit Lions

32,730 views • 8 months ago

Our Hyperbolic agent can now perform fine-tuning tasks! This is a step forward for our self-evolving agent vision. Kudos to Zile from Blockchain Capital! This is how it works:
1. sync files to remote machine
2. install relevant dependencies
3. run an initial Unsloth AI fine-tune task
4. do an inference call to test
This is the beauty of open source. Everyone can contribute towards a shared vision. Accelerate 🚀

Jasper

20,693 views • 1 year ago

This Advanced Animation tuning section is seriously wild. You could spend hours in here fine tuning things if you wanted. Can't wait for others to figure out the best settings to make OneUI feel as buttery smooth as possible!

Sam Beckman

23,461 views • 1 year ago

Fine tuning‼️

Bozeman Quarterbacks

38,839 views • 1 year ago