New short course on Reinforcement Learning from Human Feedback, built in collaboration with @GoogleCloud! In this course, you’ll explore this key technique used to align LLMs with human values and make them more helpful, honest, and safe. Join now:

DeepLearning.AI

293,680 subscribers

19,192 views • 2 years ago • via X (Twitter)

Comments: 6

BhyluCom • 2 years ago

@googlecloud The 1st step towards AI is

Matt Braun • 2 years ago

@googlecloud Can't wait to get home and get started!

Lenin Falconí • 2 years ago

@googlecloud Link is down....I am really into checking it seems RL is a great place to mix computer science and control theory

Daoist • 2 years ago

@googlecloud Hello, we found your tweet very good, and there are many in-depth contents about AI. I want to get in touch with you and reach a cooperation.

Mariia Gonchar / Iddi de Mooij • 2 years ago

@googlecloud The development of artificial intelligence and computer vision will go towards identifying scars on the faces.

Adam✌️☮ • 2 years ago

@googlecloud Sounds good. Q. Is there any plans to revamp the deep learning specialisation? Almost overnight, a literal revolution has taken place in LLM and Gen AI. Or will the short courses remain the primary avenue to bridge that gap?

Related videos

New short course on Reinforcement Learning from Human Feedback! RLHF is one of the key techniques that led to the rise of modern LLMs. It is used to align LLMs with human preferences, to make them more honest, helpful and harmless, by (i) learning a reward function that mimics human preferences, as expressed in human-provided labels, then, (ii) tuning an LLM to generate outputs that receive a high reward. In this course, taught by Nikita Namjoshi, Developer Advocate for GenAI at Google Cloud, you'll learn the details of how RLHF works, including how to apply it to tune an LLM for your own applications. You'll also use an open source library to tune a base LLM to align with human preferences expressed in a training set, and evaluate the tuned model by comparing its responses before and after RLHF-tuning. Please sign up here!

Andrew Ng

205,542 views • 2 years ago
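Step (i) of the RLHF recipe in the post above, learning a reward function from human preference labels, can be illustrated with a toy example. This is a minimal sketch, not the course's method or any library's API: it assumes each response is already reduced to a small feature vector, that the reward is linear in those features, and that preferences are fit with a Bradley-Terry style pairwise loss (a common choice, but an assumption here). The names `preference_loss`, `reward`, and `train_reward_model` are hypothetical. Step (ii), tuning the LLM against the learned reward, is not shown.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = r_chosen - r_rejected
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def reward(weights, features) -> float:
    """Assumed toy reward model: a dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Fit weights so human-preferred responses score higher than rejected ones.

    `pairs` is a list of (chosen_features, rejected_features) tuples.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Derivative of the loss with respect to the margin.
            grad_scale = sigmoid(margin) - 1.0
            for i in range(dim):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights


if __name__ == "__main__":
    # Made-up features for illustration: [helpfulness cue, rudeness cue].
    labelled_pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
    w = train_reward_model(labelled_pairs, dim=2)
    print("learned weights:", w)
```

In a real system the reward model is itself a neural network scoring full prompt and response text; the linear model here only keeps the example self-contained.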

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead. Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning. Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective. In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.
By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,457 views • 1 year ago
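Three of the ingredients the GRPO post above mentions (a Python reward function, advantages, and clipping) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the course's or Predibase's code: `exact_match_reward` is a hypothetical verifiable reward, advantages are assumed to be each reward's deviation from its group mean divided by the group's population standard deviation, and the clipping range of 0.2 is an arbitrary illustrative value. The KL-divergence term and the token-level averaging of the full loss are omitted.

```python
import statistics


def exact_match_reward(response: str, target: str) -> float:
    """Hypothetical verifiable reward: 1.0 for a correct guess, else 0.0."""
    return 1.0 if response.strip().lower() == target.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to the other responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Clipped surrogate term for one token.

    `ratio` is the new policy's token probability divided by the old policy's.
    Clipping limits how much a single update can move the policy.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled responses to one prompt whose correct answer is "crane".
    group = ["crane", "slate", "CRANE ", "trace"]
    rewards = [exact_match_reward(g, "crane") for g in group]
    advantages = group_relative_advantages(rewards)
    for guess, adv in zip(group, advantages):
        print(repr(guess), round(adv, 3))
```

Because advantages are computed within a group, a response is rewarded only for beating its siblings; if every response in the group earns the same reward, all advantages are zero and that prompt contributes no learning signal.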

New course! Generative AI with Large Language Models, created with Amazon Web Services and hosted on Coursera. This course goes deep into the technical foundations of LLMs and how to use them. You can sign up here: You’ll work through the full life-cycle of a generative AI project, and learn specific techniques like RLHF; zero-shot, one-shot, and few-shot learning with LLMs; advanced prompting frameworks like ReAct; even fine-tuning LLMs, and gain hands-on practice with all of these techniques. Instructors Antje Barth, Chris Fregly, Shelbee Eigenbrode, and Mike G Chambers all do incredible Generative AI work at AWS, and have supported many companies to build creative LLM applications. They bring tremendous practical LLM expertise to this course. I'm confident you’ll finish this course with a deeper understanding of how LLMs work, and how to use them. I hope you enjoy the course!

Andrew Ng

467,912 views • 3 years ago

New course 🚨 Learn to build AI apps that can process very long documents with the Jamba model in this course, built in partnership with AI21 Labs and taught by Chen Wang and Chen Almagor. Learn more and join for free:

DeepLearning.AI

10,882 views • 1 year ago

A new short course, Claude Code: A Highly Agentic Coding Assistant, is live! Claude Code is currently one of the most capable coding assistants. It can explore your codebase, plan features, write tests, refactor code, and even collaborate across multiple sessions, with surprisingly minimal input. In this course, you’ll learn how to guide Claude Code effectively: from setting up context and memory to integrating with GitHub and MCP servers. You’ll use it to extend a RAG chatbot, refactor a Jupyter notebook for e-commerce data analysis, build a web app from a Figma design, and more. Taught by Elie Schoppik and built in collaboration with Anthropic, this course is a must for AI builders. 👉 Enroll now:

DeepLearning.AI

32,513 views • 11 months ago

BYD just rolled out its new God’s Eye 5.0 update in China, which now builds on the end-to-end architecture and “reinforcement learning”. BYD says this software is now intended to feel more “human-like” than previous software versions.

Nic Cruz Patane

36,622 views • 5 months ago

📢 New short course in collaboration with Google: Build and Train an LLM with JAX. In this course, you’ll implement and train a 20M-parameter MiniGPT-style language model from scratch using JAX, the open-source library behind Gemini. You’ll build the model architecture, load and preprocess training data, implement the training loop, save checkpoints, and generate text through a chat interface. Taught by Chris Achard, Developer Relations Engineer on Google’s TPU Software team. Enroll now:

DeepLearning.AI

46,065 views • 4 months ago

Dr. James Orr’s eight-hour course: The Philosophy of Mind, is available now. In this course, James Orr traces how thinkers from ancient Greece to today have wrestled with the relationship between consciousness and the physical world. We explore key approaches—from Platonic dualism and Aristotelian hylomorphism to Cartesian dualism and 20th-century physicalist theories like behaviorism, identity theory, and functionalism—highlighting why reducing mind to matter remains a challenge. The course concludes by connecting these ideas to contemporary issues, including AI, mental health, and human enhancement, asking what it truly means to be human in a rapidly changing world.

Peterson Academy

41,541 views • 6 months ago

Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL 2. Distributional Successor Features Enable Zero-Shot Policy Optimization 3. Learning to Cooperate with Humans using Generative Agents 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Find more details on each paper and where to find us in this thread (1/6)

Abhishek Gupta

10,803 views • 1 year ago

We can teach LLMs to write better robot code through natural language feedback. But can LLMs remember what they were taught and improve their teachability over time? Introducing our latest work, Learning to Learn Faster from Human Feedback with Language Model Predictive Control

Jacky Liang

86,680 views • 2 years ago

An exciting new course: Fine-tuning and Reinforcement Learning for LLMs: Intro to Post-training, taught by Sharon Zhou, VP of AI at AMD. Available now at Post-training is the key technique used by frontier labs to turn a base LLM--a model trained on massive unlabeled text to predict the next word/token--into a helpful, reliable assistant that can follow instructions. I've also seen many applications where post-training is what turns a demo application that works only 80% of the time into a reliable system that consistently performs. This course will teach you the most important post-training techniques! In this 5 module course, Sharon walks you through the complete post-training pipeline: supervised fine-tuning, reward modeling, RLHF, and techniques like PPO and GRPO. You'll also learn to use LoRA for efficient training, and to design evals that catch problems before and after deployment. Skills you'll gain: - Apply supervised fine-tuning and reinforcement learning (RLHF, PPO, GRPO) to align models to desired behaviors - Use LoRA for efficient fine-tuning without retraining entire models - Prepare datasets and generate synthetic data for post-training - Understand how to operate LLM production pipelines, with go/no-go decision points and feedback loops These advanced methods aren’t limited to frontier AI labs anymore, and you can now use them in your own applications. Learn here:

Andrew Ng

132,304 views • 8 months ago

Super excited to launch a new AI course! 🚀 Fine-Tuning & Reinforcement Learning for LLMs: Intro to Post-Training A collaboration between AMD 🤝 Andrew Ng’s DeepLearning.AI to give every developer the tools & compute to work with the same post-training techniques, used across today’s leading AI labs. 🎓 Learn for free → 🧵

Sharon Zhou

20,386 views • 8 months ago

New short course: Practical Multi AI Agents and Advanced Use Cases with crewAI. Learn to build and deploy advanced agent-based systems in real applications in this course, created with CrewAI and taught by its founder, João Moura! (Disclosure: I've made a small seed investment in CrewAI.) In this course, you’ll learn how to create advanced agent-based apps that use external tools, do performance testing, can be trained with human feedback, and perform multiple tasks with different large language models. You will build several practical agentic apps that provide real business value, such as an automated project planning system, lead scoring and engagement pipeline, customer support data analysis, and a robust content creation system. In detail, you will learn how to: - Create these multi-agent systems with the building blocks of tasks, agents, and crews, along with the different things that make them work, such as caching, memory, and guardrails. - Integrate your multi-agent application with internal and external systems. - Connect multiple agents in complex setups, including parallel, sequential, and hybrid configurations, and create flows involving multiple agentic applications working together. - Test your agentic workflow and train it using human feedback to optimize its performance for better and more consistent results. - Work with multiple LLMs in your multi-agent system, using the appropriate model sizes and providers to fit each agent’s specific task. - Start a project from scratch in your environment and prepare it for deployment. You’ll also learn from an interview between João and Jacob Wilson, the Commercial GenAI Principal at PwC , in which they discuss deploying agentic workflows in real industry use cases. By the end of this course, you will be equipped to start building custom multi-agentic systems for your work. Please sign up here!

Andrew Ng

341,204 views • 1 year ago

David Silver says human data is AI's fossil fuel. "We mine it and burn it in our LLMs." It gave us a head start -- but it will run out. The sustainable path is reinforcement learning: systems that learn from experience, generate more, and keep learning.

vitrupo

20,191 views • 1 year ago

Join Cobra Code (@cobracodedev) in their newest course and master the art of creating 2D/3D hybrid games in Unreal Engine 5! 💡 This FREE course is now live on ArtStation Learning. Get started now:

ArtStation.com

26,704 views • 1 year ago

Have you used this tool? Clip Studio Paint has a new Puppet Warp tool that lets you pose your drawings with a rig! Interested in learning more about CSP? A CSP course is coming to Schoolism soon! Learn with Pernille in her course on Schoolism(dot)com Art by Pernille Ørum

Schoolism

13,962 views • 1 year ago

I’m excited to announce a new course with DeepLearning.AI - Building Agentic RAG 💫 In this course, you’ll learn how to build a research assistant that can reason over multiple documents and answer complex questions. You’ll also learn how to step through the execution of the agent and steer it with human feedback. This represents a big step beyond any standard RAG pipeline, which is mostly good for simple questions over a small set of documents. Learn the layers first and then put them together: ✅ Routing ✅ Tool Use ✅ Multi-step reasoning with Memory ✅ Tool retrieval ✅ Debugging + user input Check it out!

Jerry Liu

76,293 views • 2 years ago

New AI Agentic course! Learn to use LangGraph to build single and multi-agent LLM applications in AI Agents in LangGraph. This short course, taught by LangChain founder Harrison Chase and Tavily founder @weiss_rotem, shows how to integrate agentic search to enhance an agent's knowledge with query-focused answers in predictable formats. Also learn to implement agentic memory to save state for reasoning and debugging, and see how human-in-the-loop input can guide agents at key junctures. You'll build an agent from scratch, then reconstruct it with LangGraph to thoroughly understand the framework. Finally, you'll build a sophisticated essay-writing agent that incorporates all the learnings from the course. Sign up here!

Andrew Ng

152,597 views • 2 years ago