Learn to train an LLM with distributed data while ensuring privacy using federated learning in a new two-part short course, Intro to Federated Learning and Federated Fine-tuning of LLMs with Private Data, created with Flower and taught by Daniel J. Beutel and Nic Lane. Federated learning allows a single...

64,517 views • 1 year ago • via X (Twitter)
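The description is cut off before it explains the mechanism. As a rough illustration only, here is a minimal Python sketch of federated averaging (FedAvg), a common aggregation scheme that is not necessarily what the course or Flower uses. Each hypothetical client fits a toy one-parameter model `y ≈ w * x` on data that never leaves it, and the server only averages the returned weights, weighted by each client's dataset size. The model, learning rate, epoch and round counts, and the function names `local_train` and `fed_avg` are all assumptions made for this sketch.

```python
from typing import List, Tuple

# Each client's private dataset: (x, y) pairs that never leave the client.
Dataset = List[Tuple[float, float]]


def local_train(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few full-batch gradient-descent steps on one client's data for the toy model y ≈ w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(global_w: float, clients: List[Dataset], rounds: int = 20) -> float:
    """Hypothetical FedAvg loop: clients train locally, the server averages only the weights."""
    for _ in range(rounds):
        # Every client starts from the current global model and trains on its own data.
        local_ws = [local_train(global_w, data) for data in clients]
        sizes = [len(data) for data in clients]
        # The server sees parameters only, and weights each client by its dataset size.
        global_w = sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)
    return global_w


if __name__ == "__main__":
    clients = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(3.0, 6.0)],
        [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
    ]
    # All three clients' data follow y = 2x, so the global weight should approach 2.0.
    print(fed_avg(0.0, clients))
```

In a real deployment the model would be a neural network and the clients separate devices or organizations; the sketch only shows the core property that raw data stays local and just parameters travel to the server.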

7 Comments

Nemesis • 1 year ago

This course by @AndrewYNg, @daniel_janes, and @niclane7 on federated learning is a game-changer for those looking to leverage distributed data while ensuring privacy. The practical insights into privacy-enhancing technologies and fine-tuning LLMs across devices or organizations without central data sharing are crucial for modern AI applications. Excited to dive into the nuances of differential privacy and efficient bandwidth usage. A must-learn for anyone in AI and data science!

Rufus • 1 year ago

@flwrlabs @daniel_janes @niclane7 Nice one, I am looking forward to the training.

Vincent Valentine (CEO of UnOpen.ai) • 1 year ago

@flwrlabs @daniel_janes @niclane7 Impressive approach to democratize AI training while safeguarding data privacy. How would federated fine-tuning impact model performance and generalization?

Andrew Suther • 1 year ago

@flwrlabs @daniel_janes @niclane7 Hi Andrew, thank you for sharing amazing opportunities to learn! How do you prevent vulnerabilities such as: 1. Inference attacks: patterns or statistical information in model updates could be exploited to infer private data by comparing them against public datasets. Are there foolproof methods?

Mohit Jain • 1 year ago

@flwrlabs @daniel_janes @niclane7 Data privacy is really important! Federated learning seems promising

Garth Miblish • 1 year ago

@flwrlabs @daniel_janes @niclane7 I've been disappointed with some of your courses, because the speakers are not good communicators, but people who are the CEO or CTO or Chief Scientist somewhere. Often they don't speak well or have thick accents. And I want ONE speaker, not 3.

OpenAgentic.com • 1 year ago

@flwrlabs @daniel_janes @niclane7 Why send data to a central server when you can keep it in your own digital bubble?

Related Videos

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code, without needing pre-existing training examples as in traditional supervised fine-tuning.

Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost-effective.

In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll:

- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.

By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,381 views • 1 year ago
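To make the reward functions and the four GRPO loss components named in the course description concrete, here is a minimal pure-Python sketch of the loss for a single group of sampled responses. It is written from the published GRPO formulation, not from Predibase’s service or the course labs. It assumes per-token log-probabilities have already been computed under the current policy, the old (sampling) policy, and a frozen reference model. `correctness_reward` is a hypothetical verifiable reward. The clipping range 0.2, the KL weight 0.04, the population standard deviation (some implementations use the sample one), and the `exp(x) - x - 1` KL estimator are all assumptions of this sketch.

```python
import math
from statistics import mean, pstdev
from typing import List


def correctness_reward(response: str, target: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the response's final token equals the target answer."""
    tokens = response.strip().split()
    return 1.0 if tokens and tokens[-1] == target else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Advantages: each reward normalized against its own group's mean and standard deviation."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss(new_logps, old_logps, ref_logps, advantages, clip_eps=0.2, beta=0.04):
    """Loss for one group; each *_logps item is a list of per-token log-probs for one response."""
    objective = 0.0
    for new, old, ref, adv in zip(new_logps, old_logps, ref_logps, advantages):
        token_terms = []
        for lp, lp_old, lp_ref in zip(new, old, ref):
            ratio = math.exp(lp - lp_old)                           # token probability ratio
            clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)   # clipping
            surrogate = min(ratio * adv, clipped * adv)             # same advantage for every token
            log_ref_ratio = lp_ref - lp
            kl = math.exp(log_ref_ratio) - log_ref_ratio - 1        # KL-divergence estimate vs reference
            token_terms.append(surrogate - beta * kl)
        objective += sum(token_terms) / len(token_terms)
    # Average over the group and negate: maximizing the objective = minimizing the loss.
    return -objective / len(advantages)


if __name__ == "__main__":
    responses = ["The answer is 42", "I think 41", "So it is 42", "no idea"]
    rewards = [correctness_reward(r, "42") for r in responses]
    advantages = group_advantages(rewards)
    # Identical policies: ratios are 1 and the KL term is 0, so the loss reduces to -mean(advantages).
    logps = [[-1.0, -2.0]] * len(responses)
    print(rewards, advantages, grpo_loss(logps, logps, logps, advantages))
```

Because advantages are computed relative to the group, no learned value model is needed; responses that beat their siblings are pushed up and the rest are pushed down.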

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results.

Let's talk about three techniques, in order of complexity, starting with the easiest one:

• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning

In-Context Learning

The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give the model specific knowledge that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video.

Indexing + In-Context Learning

Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer questions about your story. What happens if your book is longer than the context?

That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a specialized database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the passages most similar to that query. Instead of giving the model the entire book, you can now use the relevant passages as in-context information, effectively working around the context size limitation.

Fine-tuning

Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model.

Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches first.

I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me (Santiago) so you don't miss what comes next.

Santiago

384,028 views • 3 years ago
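The indexing step in Santiago's post (embed every passage, embed the question, retrieve the most similar passages, and put only those in the prompt) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: a real system would call an embedding model and store vectors in a vector database, whereas here `embed` is a toy word-count vector, `PassageIndex` is a hypothetical in-memory stand-in for the database, and `build_prompt` uses an invented prompt template. Only the shape of the pipeline carries over, not the retrieval quality.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PassageIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self, passages: List[str]):
        self.passages = passages
        self.vectors = [embed(p) for p in passages]  # embed every passage once, up front

    def search(self, query: str, k: int = 2) -> List[str]:
        """Return the k passages whose vectors are most similar to the query's vector."""
        q = embed(query)
        ranked = sorted(range(len(self.passages)),
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.passages[i] for i in ranked[:k]]


def build_prompt(question: str, index: PassageIndex, k: int = 2) -> str:
    """Put only the most relevant passages into the prompt instead of the whole book."""
    context = "\n".join(f"- {p}" for p in index.search(question, k))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    book = [
        "The dragon guarded the northern pass.",
        "Mira sailed to the southern islands.",
        "The king feared the dragon.",
    ]
    print(build_prompt("Who guarded the northern pass?", PassageIndex(book), k=1))
```

The design choice that matters is that passages are embedded once at indexing time, so each question costs one embedding plus a similarity search, and the prompt stays within the context size no matter how long the book is.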

New Course: Post-training of LLMs. Learn to post-train and customize an LLM in this short course, taught by Banghua Zhu, Assistant Professor at the University of Washington and co-founder of @NexusflowX.

Training an LLM to follow instructions or answer questions has two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, tool use, and reasoning.

Post-training transforms a general-purpose token predictor, trained on trillions of unlabeled text tokens, into an assistant that follows instructions and performs specific tasks. Because it is much cheaper than pre-training, many more teams can practically incorporate post-training methods into their workflows.

In this course, you’ll learn three common post-training methods, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL), and how to use each one effectively. With SFT, you train the model on pairs of input and ideal output responses. With DPO, you provide both a preferred (chosen) and a less preferred (rejected) response and train the model to favor the preferred output. With RL, the model generates an output, receives a reward score based on human or automated feedback, and updates the model to improve performance.

You’ll learn the basic concepts, common use cases, and principles for curating high-quality data for effective training. Through hands-on labs, you’ll download a pre-trained model from Hugging Face and post-train it using SFT, DPO, and RL to see how each technique shapes model behavior. In detail, you’ll:

- Understand what post-training is, when to use it, and how it differs from pre-training.
- Build an SFT pipeline to turn a base model into an instruct model.
- Explore how DPO reshapes behavior by minimizing a contrastive loss that penalizes poor responses and reinforces preferred ones.
- Implement a DPO pipeline to change the identity of a chat assistant.
- Learn online RL methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), and how to design reward functions.
- Train a model with GRPO to improve its math capabilities using a verifiable reward.

Post-training is one of the most rapidly developing areas of LLM training. Whether you’re building a high-accuracy context-specific assistant, fine-tuning a model's tone, or improving task-specific accuracy, this course will give you experience with the most important techniques shaping how LLMs are post-trained today. Please sign up here:

Andrew Ng

125,146 views • 11 months ago
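The DPO step in the course description (train the model to favor the chosen response over the rejected one) has a compact closed form. Below is a minimal Python sketch of the standard DPO loss for one preference pair. It is not the course's lab code, and it assumes the summed token log-probabilities of each full response have already been computed under the model being trained and under a frozen reference model. `beta = 0.1` is an assumed default, and the names `dpo_loss` and `softplus` are chosen for this sketch.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed log-probabilities of each response."""
    chosen_shift = policy_chosen - ref_chosen        # how far the policy moved toward the chosen response
    rejected_shift = policy_rejected - ref_rejected  # and toward the rejected one, relative to the reference
    z = beta * (chosen_shift - rejected_shift)
    return softplus(-z)  # equals -log(sigmoid(z))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy raised the chosen response's log-probability: the loss drops below log(2).
    print(dpo_loss(-8.0, -12.0, -10.0, -12.0))
```

Minimizing this loss raises the chosen response's likelihood relative to the rejected one, measured against the reference model, which is what keeps the trained model from drifting arbitrarily far from where it started.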