As I promised yesterday, I'll briefly explain LoRA training and share a workflow I made so you can do it quickly.

First, let me answer a very common question: 'Why train LoRAs when we have such advanced models?'

Even though we have incredibly advanced models now (like NBP), we still can't always get them to do the specific things we want. Simplest example: the spritesheet LoRA I made the other day. I generated 1000 images with Nano Banana and only 100 were what I wanted. The LoRA I trained on those 100 images gives me nearly 100% consistent results.

The second point is cost and speed. With a LoRA, we can cut costs by 4-5x, and we generate 4-5x faster while doing it.

How many images do you need for a good LoRA?

This depends on your LoRA's complexity. For example, when I trained the spritesheet LoRA, even though I used 100 images, I didn't include buildings in the training data, so this LoRA doesn't work for buildings. So think about your LoRA's use cases and add examples for as many of them as possible to improve quality.

What are paired images and how do you train LoRAs for image editing?

When training LoRAs for image editing on fal, we call each edit example paired images: one with the _start suffix, one with the _end suffix. For example, if you're training a background-removal LoRA, the unedited original photo will be your '_start' image. The image with the background removed will be the '_end' image. Simply put: images we want to edit or use as reference get '_start', target images we want to achieve get '_end'.

Important: save both images with the same base name, like image332_start.jpg and image332_end.jpg. This way the system knows which images pair together.

What about training LoRAs for models with multiple image inputs?

Same logic. We still use the _start and _end suffixes, but with one difference. Since there are multiple input images, we number them: _start, _start1, _start2.
Example:
1st start image = Woman portrait (image35_start.jpg)
2nd start image = Glasses photo (image35_start1.jpg)
3rd start image = Hat photo (image35_start2.jpg)
Output image = Portrait of the woman wearing the glasses and hat (image35_end.jpg)

Can we do more detailed captioning?

Yes. You can improve training quality by creating a txt file for each set with the caption inside. Example: create image35.txt and write: 'Recreate the image by putting the glasses from the second image and the hat from the third image on the woman in the first image.'

What are Steps? How many should I use? What's Learning Rate?

Steps determine how many times the model sees and processes your training data (your images). With each step, the model learns a bit more. But as steps increase, so does the risk of overfitting. So there's no real default. But for a simpler LoRA with 20 paired images, 1000 steps is ideal.

Here's a metaphor for the relationship between Steps and Learning Rate. Imagine you have a balloon. Our goal is to inflate it to the optimal size.

Steps = How many times we blow into the balloon
Learning rate = How hard we blow each time

If we blow too softly, we need to blow many more times. If we blow too hard, we risk popping it quickly and never reach the optimal size. Of course training won't explode, but the LoRA won't work as intended because it wasn't trained optimally.

Training's done, now what?

Once training's complete, you'll have a safetensors file. Every model you train on fal has a LoRA inference endpoint. In that endpoint, add your safetensors file link to the LoRA URL input, and you can use your LoRA.

Thanks for the read!

The workflow in the video:

If I forgot anything, let me know in the replies.
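
One extra for the naming convention above: if you want to double-check your file names before uploading, here's a minimal Python sketch. It is not a fal tool, just a local pre-check I'm assuming you'd run on a list of file names. The function names (group_training_files, find_problems) and the list of image extensions are made up for illustration. It groups files by their shared base name, orders the _start, _start1, _start2 inputs, picks up an optional caption txt, and reports sets that are missing a start or end image.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches stems like "image35_start", "image35_start2", "image35_end".
_ROLE = re.compile(r"^(?P<base>.+)_(?P<role>start\d*|end)$")
# Assumption: which extensions count as images. Adjust to your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def group_training_files(filenames):
    """Group file names into training sets keyed by their shared base name."""
    sets = defaultdict(lambda: {"starts": [], "end": None, "caption": None})
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # Caption file, e.g. image35.txt for the image35 set.
            sets[path.stem]["caption"] = name
            continue
        match = _ROLE.match(path.stem)
        if suffix not in IMAGE_EXTS or match is None:
            continue  # Not part of the naming convention, so skip it.
        entry = sets[match["base"]]
        if match["role"] == "end":
            entry["end"] = name
        else:
            # "_start" has no number, so treat it as index 0.
            index = int(match["role"][len("start"):] or 0)
            entry["starts"].append((index, name))
    for entry in sets.values():
        # Order inputs as _start, _start1, _start2, ...
        entry["starts"] = [name for _, name in sorted(entry["starts"])]
    return dict(sets)


def find_problems(sets):
    """Return a readable message for every incomplete set."""
    problems = []
    for base, entry in sorted(sets.items()):
        if not entry["starts"]:
            problems.append(f"{base}: missing _start image")
        if entry["end"] is None:
            problems.append(f"{base}: missing _end image")
    return problems


example = [
    "image35_start.jpg", "image35_start1.jpg", "image35_start2.jpg",
    "image35_end.jpg", "image35.txt",
    "image36_start.jpg",  # this set has no _end image yet
]
for message in find_problems(group_training_files(example)):
    print(message)
```

Captions are treated as optional here, so a set without a txt file is not reported as a problem.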
