Video yükleniyor...

Video Yüklenemedi

Ana Sayfaya Dön

New short course on LLMOps! LLMOps (large language model operations) is a rapidly developing field that takes ideas from MLOps (machine learning operations) and specializes them to building and deploying LLM-based applications. In this course, taught by Google Cloud's Erwin Huizenga, you'll learn to use automation to make building,...

221,760 görüntüleme • 2 yıl önce •via X (Twitter)

10 Yorum

jeffw profil fotoğrafı
jeffw2 yıl önce

@googlecloud Wow, amazing! But Andrew how are you producing these courses even faster than I can consume them (still working on the last RAG one!) My weekends and evening are filling up quick! 🤓

Artur profil fotoğrafı
Artur2 yıl önce

@googlecloud LLMOps is the new new devops

Lyne Tchapmi profil fotoğrafı
Lyne Tchapmi2 yıl önce

@googlecloud Highly needed. Can’t wait to go through this!

Artur profil fotoğrafı
Artur2 yıl önce

@googlecloud Superb! Got to check it out for sure

tetteh profil fotoğrafı
tetteh2 yıl önce

@googlecloud Finally 😎😎… I still won’t finish but

Amer Amayreh profil fotoğrafı
Amer Amayreh2 yıl önce

@googlecloud Short courses are great approach to get in technical topics. Thanks

Key profil fotoğrafı
Key2 yıl önce

@googlecloud Fantastic news, @AndrewYNg! LLMOps is an exciting field that brings together the best of MLOps and LLM-based applications, and I'm eager to learn from @googlecloud's Erwin in this new course. Thanks for sharing!

Jay Ghiya profil fotoğrafı
Jay Ghiya2 yıl önce

@googlecloud Does this course use open source self hosted frameworks or it is just built on top of Google cloud services @AndrewYNg

Mostakim Jihad profil fotoğrafı
Mostakim Jihad2 yıl önce

@googlecloud Excited to watch

Peter Retief profil fotoğrafı
Peter Retief2 yıl önce

@googlecloud I am exhausted already, so much to know!!

Benzer Videolar

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead. Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning. Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective. In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll: - Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data. - Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO. - Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time. - Design reward functions that power the reinforcement fine-tuning process. - Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge. - Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors. - Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence. - Launch reinforcement fine-tuning jobs using Predibase’s hosted training services. By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,381 görüntüleme • 1 yıl önce

New Course: Post-training of LLMs Learn to post-train and customize an LLM in this short course, taught by Banghua Zhu, Assistant Professor at the University of Washington University of Washington, and co-founder of @NexusflowX. Training an LLM to follow instructions or answer questions has two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, tool use, and reasoning. Post-training transforms a general-purpose token predictor—trained on trillions of unlabeled text tokens—into an assistant that follows instructions and performs specific tasks. Because it is much cheaper than pre-training, it is practical for many more teams to incorporate post-training methods into their workflows than pre-training. In this course, you’ll learn three common post-training methods—Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL)—and how to use each one effectively. With SFT, you train the model on pairs of input and ideal output responses. With DPO, you provide both a preferred (chosen) and a less preferred (rejected) response and train the model to favor the preferred output. With RL, the model generates an output, receives a reward score based on human or automated feedback, and updates the model to improve performance. You’ll learn the basic concepts, common use cases, and principles for curating high-quality data for effective training. Through hands-on labs, you’ll download a pre-trained model from Hugging Face and post-train it using SFT, DPO, and RL to see how each technique shapes model behavior. In detail, you’ll: - Understand what post-training is, when to use it, and how it differs from pre-training. - Build an SFT pipeline to turn a base model into an instruct model. - Explore how DPO reshapes behavior by minimizing contrastive loss—penalizing poor responses and reinforcing preferred ones. - Implement a DPO pipeline to change the identity of a chat assistant. - Learn online RL methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), and how to design reward functions. - Train a model with GRPO to improve its math capabilities using a verifiable reward. Post-training is one of the most rapidly developing areas of LLM training. Whether you’re building a high-accuracy context-specific assistant, fine-tuning a model's tone, or improving task-specific accuracy, this course will give you experience with the most important techniques shaping how LLMs are post-trained today. Please sign up here:

Andrew Ng

125,146 görüntüleme • 11 ay önce

New short course: Evaluating AI Agents! Evals are important for driving AI system improvements, and in this course you'll learn to systematically assess and improve an AI agent’s performance. This is built in partnership with Arize AI and taught by John Gilhuly, Head of Developer Relations, and , Director of Product. I've often found evals to be a critical tool in the agent development process - they can be the difference between picking the right thing to work on vs. wasting weeks of effort. Whether you’re building a shopping assistant, coding agent, or research assistant, having a structured evaluation process helps you refine its performance systematically, rather than relying on random trial and error. This course shows you how to structure your evals to assess the performance of each component of an agent and its end-to-end performance. For each component, you select the appropriate evaluators, test examples, and performance metrics. This helps you identify areas for improvement both during development and in production. (If you're familiar with error analysis in supervised learning, think of this as adapting those ideas to agentic workflows.) In this course, you'll build an AI agent, and add observability to visualize and debug its steps. You’ll learn about code-based evals, in which you write code explicitly to test a certain step, as well as LLM-as-a-Judge evals, in which you prompt an LLM to efficiently come up with ways to evaluate more open-ended outputs. In detail, you’ll: - Understand key differences between evaluating LLM-based systems and traditional software testing. - Add observability to an agent by collecting traces of the steps taken by the agent and visualizing them - Choose the appropriate evaluator - code-based, LLM-as-a-Judge, human-annotation based - for each component. - Compute a convergence score to evaluate if your agent can respond to a query in an efficient number of steps. - Run structured experiments to improve the agent’s performance by exploring changes to the prompt, LLM model, or the agent’s logic. - Understand how to deploy these evaluation techniques to monitor the agent’s performance in production. By the end of this course, you’ll know how to trace AI agents, systematically evaluate them, and improve their performance. Please sign up here:

Andrew Ng

126,355 görüntüleme • 1 yıl önce

New short course: Safe and Reliable AI via Guardrails! Learn to create production-ready, reliable LLM applications with guardrails in this new course, built in collaboration with Guardrails AI and taught by its CEO and co-founder, shreya rajpal. I see many companies worry about the reliability of LLM-based systems -- will they hallucinate a catastrophically bad response? -- which slows down investing in building them and transitioning prototypes to deployment. That LLMs generate probabilistic outputs has made them particularly hard to deploy in highly regulated industries or in safety-critical environments. Fortunately, there are good guardrail tools that give a significant new layer of control and reliability/safety. They act as a protective framework that can prevent your application from revealing incorrect, irrelevant, or confidential information, and they are an important part of what it takes to actually get prototypes to deployment. This course will walk you through common failure modes of LLM-powered applications (like hallucinations or revealing personally identifiable information). It will show you how to build guardrails from scratch to mitigate them. You’ll also learn how to access a variety of pre-built guardrails on the GuardrailsAI hub that are ready to integrate into your projects. You'll implement these guardrails in the context of a RAG-powered customer service chatbot for a small pizzeria. Specifically, you'll: - Explore common failure modes like hallucinations, going off-topic, revealing sensitive information, or responses that can harm the pizzeria's reputation. - Learn to mitigate these failure modes with input and output guards that check inputs and/or outputs - Create a guardrail to prevent the chatbot from discussing sensitive topics, such as a confidential project at the pizza shop - Detect hallucinations by ensuring responses are grounded in trusted documents - Add a Personal Identifiable Information (PII) guardrail to detect and redact sensitive information in user prompts and in LLM outputs - Set up a guardrail to limit the chatbot’s responses to topics relevant to the pizza shop, keeping interactions on-topic - Configure a guardrail that prevents your chatbot from mentioning any competitors using a name detection pipeline consisting of conditional logic that routes to an exact match or a threshold check with named entity recognition Guardrails are an important part of the practical building and deployment of LLM-based applications today. This course will show you how to make your applications more reliable and more ready for real-world deployment. Please sign up here:

Andrew Ng

106,642 görüntüleme • 1 yıl önce