
New short course on Reinforcement Learning from Human Feedback, built in collaboration with @GoogleCloud! In this course, you’ll explore this key technique used to align LLMs with human values and make them more helpful, honest, and safe. Join now:

19,192 views • 2 years ago • via X (Twitter)

6 Comments

BhyluCom • 2 years ago

@googlecloud The 1st step towards AI is

Matt Braun • 2 years ago

@googlecloud Can't wait to get home and get started!

Lenin Falconí • 2 years ago

@googlecloud Link is down... I am really keen to check it out. It seems RL is a great place to mix computer science and control theory.

Daoist • 2 years ago

@googlecloud Hello, we found your tweet very good, and it has a lot of in-depth content about AI. I would like to get in touch with you to discuss a cooperation.

Mariia Gonchar / Iddi de Mooij • 2 years ago

@googlecloud The development of artificial intelligence and computer vision will go towards identifying scars on the faces.

Adam✌️☮ • 2 years ago

@googlecloud Sounds good. Q: Are there any plans to revamp the deep learning specialisation? Almost overnight, a literal revolution has taken place in LLMs and Gen AI. Or will the short courses remain the primary avenue to bridge that gap?

Related Videos

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code, without needing pre-existing training examples like in traditional supervised fine-tuning.

Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective.

In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks.

In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.

By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,381 views • 1 year ago
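
For readers who want a concrete picture of what the GRPO post above describes, here is a minimal sketch of a Python reward function and the four loss components it names (probability ratios, advantages, clipping, and KL-divergence). This is not the course's or Predibase's code. The function names, the Wordle scoring scheme, and the default values for `clip_eps` and `beta` are all assumptions chosen for illustration, and the loss is shown for a single token with plain floats rather than tensors.

```python
import math
from statistics import mean, pstdev


def wordle_reward(guess: str, secret: str) -> float:
    """Hypothetical reward: partial credit per correctly placed letter."""
    guess = guess.strip().lower()
    if len(guess) != len(secret) or not guess.isalpha():
        return -1.0  # penalty term discouraging malformed guesses
    if guess == secret:
        return 1.0
    greens = sum(g == s for g, s in zip(guess, secret))
    return 0.1 * greens


def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each response relative to its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_term(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate using the new/old token probability ratio."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def kl_estimate(logp_new, logp_ref):
    """Non-negative per-token estimate of KL(new || reference)."""
    d = logp_ref - logp_new
    return math.exp(d) - d - 1.0


def grpo_token_loss(logp_new, logp_old, logp_ref, advantage,
                    clip_eps=0.2, beta=0.04):
    """Loss for one token: negative clipped surrogate plus a KL penalty."""
    surrogate = clipped_term(logp_new, logp_old, advantage, clip_eps)
    return -(surrogate - beta * kl_estimate(logp_new, logp_ref))


# Score a group of sampled guesses for one prompt, then normalize.
guesses = ["crate", "crane", "hello!", "slate"]
rewards = [wordle_reward(g, "crate") for g in guesses]
advantages = group_relative_advantages(rewards)
```

In a real training run the per-token losses would be averaged over every token of every response in the group, and the log-probabilities would come from the current, old, and frozen reference models.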

New short course: Practical Multi AI Agents and Advanced Use Cases with crewAI. Learn to build and deploy advanced agent-based systems in real applications in this course, created with CrewAI and taught by its founder, João Moura! (Disclosure: I've made a small seed investment in CrewAI.)

In this course, you’ll learn how to create advanced agent-based apps that use external tools, do performance testing, can be trained with human feedback, and perform multiple tasks with different large language models. You will build several practical agentic apps that provide real business value, such as an automated project planning system, lead scoring and engagement pipeline, customer support data analysis, and a robust content creation system.

In detail, you will learn how to:
- Create these multi-agent systems with the building blocks of tasks, agents, and crews, along with the different things that make them work, such as caching, memory, and guardrails.
- Integrate your multi-agent application with internal and external systems.
- Connect multiple agents in complex setups, including parallel, sequential, and hybrid configurations, and create flows involving multiple agentic applications working together.
- Test your agentic workflow and train it using human feedback to optimize its performance for better and more consistent results.
- Work with multiple LLMs in your multi-agent system, using the appropriate model sizes and providers to fit each agent’s specific task.
- Start a project from scratch in your environment and prepare it for deployment.

You’ll also learn from an interview between João and Jacob Wilson, the Commercial GenAI Principal at PwC, in which they discuss deploying agentic workflows in real industry use cases. By the end of this course, you will be equipped to start building custom multi-agentic systems for your work. Please sign up here!

Andrew Ng

340,724 views • 1 year ago