Apple built a large foundation model and fine-tuned it on multiple tasks. But they are doing something very clever: they load a single model in memory and use different adapters to specialize the model on the fly.

I recorded a video to show you how to write the code to do the same thing Apple is doing. I explain everything step by step.

Here is what I'll show you in the video:

1. We'll load two datasets
2. Then load a large model
3. Then, we'll fine-tune the model on both datasets

I'll use LoRA to fine-tune the model. This process creates two small adapters, each specializing in one of the datasets. The base model's original parameters remain unchanged.

From here:

4. We'll generate a list of tasks
5. We'll load the correct adapter to solve each task

The large model I'm using needs 346 MB of memory, but I only need to load it once and can pair it with any of the fine-tuned adapters. Each adapter is only 2.7 MB.

The result is a minimal memory footprint, and I can still solve multiple tasks.

Hope this helps!
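
The adapter-swapping idea in the post can be sketched in plain Python. This is not the code from the video: it is a minimal illustration that assumes a single linear layer, hypothetical adapter names (`task_a`, `task_b`), and random numbers standing in for weights that LoRA fine-tuning would actually learn. It shows one frozen base weight matrix shared by two low-rank adapters, with the active adapter switched per task. The memory arithmetic at the end uses the 346 MB and 2.7 MB figures from the post and assumes a fully fine-tuned copy would be the same size as the base model.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRAAdapter:
    """Low-rank update delta_W = (alpha / rank) * B @ A, kept as two small matrices."""

    def __init__(self, d_in, d_out, rank, alpha, seed):
        rng = random.Random(seed)
        self.scale = alpha / rank
        # Assumption: random values stand in for weights learned by fine-tuning.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(d_out)]

    def delta(self, x):
        """Compute scale * B @ (A @ x) without ever building the full d_out x d_in matrix."""
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]

    def num_params(self):
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


class AdaptedLayer:
    """One frozen base weight matrix plus any number of named adapters."""

    def __init__(self, base_weight):
        self.W = base_weight      # loaded once, never modified
        self.adapters = {}
        self.active = None

    def add_adapter(self, name, adapter):
        self.adapters[name] = adapter

    def set_adapter(self, name):
        if name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def forward(self, x):
        y = matvec(self.W, x)
        if self.active is not None:
            update = self.adapters[self.active].delta(x)
            y = [a + b for a, b in zip(y, update)]
        return y


D = 64
rng = random.Random(0)
base_weight = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]
snapshot = [row[:] for row in base_weight]  # used to check the base stays frozen

layer = AdaptedLayer(base_weight)
layer.add_adapter("task_a", LoRAAdapter(D, D, rank=2, alpha=4, seed=1))
layer.add_adapter("task_b", LoRAAdapter(D, D, rank=2, alpha=4, seed=2))

# A list of (task, input) pairs; the matching adapter is activated for each one.
tasks = [("task_a", [1.0] * D), ("task_b", [1.0] * D), ("task_a", [0.5] * D)]
outputs = []
for name, x in tasks:
    layer.set_adapter(name)
    outputs.append(layer.forward(x))

base_params = D * D
adapter_params = layer.adapters["task_a"].num_params()

# Memory figures from the post: one shared base plus two adapters,
# compared with two full copies (assumed to be the same size as the base).
shared_mb = 346 + 2 * 2.7
separate_mb = 2 * 346
print(f"base params: {base_params}, params per adapter: {adapter_params}")
print(f"shared base + 2 adapters: {shared_mb:.1f} MB, two full copies: {separate_mb} MB")
```

In practice a library such as Hugging Face PEFT handles the adapter bookkeeping across every layer of a real model; the sketch only shows why switching tasks costs a small low-rank update rather than a second copy of the base weights.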

Santiago

433,890 subscribers

84,747 views • 1 year ago • via X (Twitter)

10 Comments

Santiago • 1 year ago

Here is the link to the code:

Santiago • 1 year ago

And here is the YouTube link: How to fine-tune a model using LoRA (step by step)

Hesam • 1 year ago

This is actually the genius idea that LoRA makes possible, and Apple put it to great use. 👏

Brijesh Sankhavara • 1 year ago

Well, I know LoRA, but I didn't know Apple was using it.

The Monk Dev • 1 year ago

Great video and great explanation. Apple has been doing clever things since its inception, doesn't matter whether we like it or not.

deter3 • 1 year ago

The LoRA authors already mentioned this method a long time ago.

Santiago • 1 year ago

Yes, this is LoRA. That’s what the video is about.

J. Walu, OGW • 1 year ago

Great session @Mutuvi @teacherkaris

Abdel Latrache • 1 year ago

Great explanation of LoRA! Thank you for the video!

Jimi V. (Bitswired) • 1 year ago

Interesting, thanks! This approach is brilliant for edge applications. The cool thing is that you can ship new applications just by shipping new adapters (light updates), with less frequent heavy updates to the foundation model. LoRA truly was a smart invention!

Related Videos

You can now fine-tune Llama 3 without writing a single line of code! We are moving at breakneck speed.

I recorded a video to show you how to fine-tune any open-source model in a few minutes. I'm using a GPT capable of taking a problem and turning it into a fine-tuned model that will solve it. You don't have to write any code. You only need to explain to a GPT what problem you want to solve and tell it you want to use Llama 3. For example, "fine-tune Llama 3" or "deploy zephyr." It feels like magic. The system will recommend a dataset and fine-tune the model for you.

I'm using Monster API, a platform that specializes in making fine-tuning and deploying open-source models easy and fast. Their stack is well-optimized to maximize fine-tuning efficiency using techniques like QLoRA and vLLM. They are behind the GPT.

Here is what you need to do:

1. Create an account at
2. Load the GPT with the link below

This is as simple as it gets. When you are done, you can click a button to deploy the model and start using it.

I have 10,000 free credits for anyone using the code "SANTIAGO" in the dashboard. You can use these credits to access, fine-tune, and deploy these open-source models. You can also keep up with their latest updates, and get free credits and special offers on their Discord server:

Santiago

324,578 views • 2 years ago

Small Language Models (SLM) are the future of AI. "Small" (SLM) instead of "Large" (LLM). These small models are highly specialized models with superhuman abilities on specific tasks.

Here are two techniques to build these models:

• Spectrum
• Model Merging

I give you a short introduction in the attached video, but here is a quick summary:

Spectrum helps us identify the most relevant layers to solve one specific task. We can ignore everything else and focus on fine-tuning these layers. Using Spectrum, we can fine-tune models in a heartbeat.

Model Merging combines multiple models into a single model that is much better than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities.

This is the state of the art of productizing models. It's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it.

There are three main steps to produce a model for your particular use case:

1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You can deploy that model to any environment you want.

Three important notes:

• The training process is 2x faster and 2x cheaper than regular fine-tuning.
• The resulting models are smaller and have higher accuracy.
• They create these specialized models from open-source models.

Check this site so you can fully appreciate how this works:

If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.

Santiago

164,162 views • 1 year ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results.

Let's talk about three techniques, in order of complexity, starting with the easiest one:

• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning

In-Context Learning

The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video.

You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video.

Indexing + In-Context Learning

Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that.

Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in.

Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query.

Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation.

Fine-tuning

Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list.

There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier.

Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model.

Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches.

I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me (Santiago) so you don't miss what comes next.
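
The "Indexing + In-Context Learning" idea described in this post can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the passages and prompt wording are invented. It only shows the flow: embed each passage, embed the question, retrieve the most similar passages, and place them in the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical book passages; a real application would split the book into chunks.
passages = [
    "The lighthouse keeper found a map hidden under the floorboards.",
    "Maria sailed north for three days before the storm hit.",
    "The map led to a cave behind the waterfall.",
]

# The "index": each passage stored next to its embedding.
index = [(passage, embed(passage)) for passage in passages]


def retrieve(question, k=2):
    """Return the k passages most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(question, k=2):
    """Put only the retrieved passages into the prompt as in-context information."""
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("Where was the map hidden?"))
```

The prompt now contains a couple of relevant passages instead of the whole book, which is how this approach works around the context size limit.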

Santiago

384,482 views • 3 years ago

This is a pretty wild model! You can use it to turn an image into a 3D object with texture. The quality is out of this world! I'm not even a designer, and I've been using this nonstop for the last 2 hours. The model is Hunyuan 3D 2.1. It's open source. You'll find model weights, training/inference code, data pipelines, and architecture on their repository. You can even fine-tune it if you want! GitHub Repository: By the way, the model runs on consumer-grade GPUs. You don't need a datacenter for this! I've been using the model from the HuggingFace demo page: To use it, go to the link and upload an image. That's it! Check out the video I recorded for a couple of examples.

Santiago

44,783 views • 1 year ago

Our universe is a model with twenty or so carefully fine-tuned parameters that generate all the content inside. Using these parameters and a reduced-scale model you can simulate the history of the cosmos. With the full-scale model, you get to be part of the simulation.

Andrew Côté

90,707 views • 2 years ago

This is from Apple's State of the Union. The local model is a 3B parameter SLM that uses adapters trained for each specific feature. The diffusion model does the same thing, with an adapter for each style. Anything running locally or on Apple's Secure Cloud is an Apple model, not OpenAI.

Max Weinbach

2,648,076 views • 2 years ago

Google presents Still-Moving: Customized Video Generation without Customized Video Data

Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a text-to-image (T2I) model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth or StyleDrop). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight Spatial Adapters that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on "frozen videos" (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel Motion Adapter module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.

AK

40,467 views • 1 year ago

You can now try Llama 3.1 405B for free (link below)! This is the largest open-source model out there, and for the first time, an open model is competitive with closed models. This time around, Meta did something new: Llama 3.1 has a license that allows developers to use it to enhance other models. For the first time, you can distill Llama 3.1 405B's capabilities into a smaller, more practical model for your use case. First, here is the link where you can play with Llama 3.1 for free: The model is hosted in Tune Studio, an end-to-end platform for developing applications using Large Language Models. They are sponsoring this post. Take a look at the attached video. It will show you how you can fine-tune a simple model using Llama 3.1 without leaving the platform: 1. You can create an empty dataset 2. Use the playground to generate and record interactions with Llama 3.1 3. Modify the dataset directly using the playground 4. Export the data and fine-tune a smaller model Fast and easy! As long as you have a web browser, you can start experimenting with fine-tuning and Llama 3.1. That's all it takes!

Santiago

55,609 views • 1 year ago

QVAC SDK 0.9.0 (to be released in ~10 days) will support LoRA fine-tuning directly on-device, letting developers customize LLMs with their own data without sending anything to the cloud. You just load a base model, point it at your training dataset, and get a lightweight LoRA adapter back, all running locally. The fine-tuned model can then be used for inference immediately, with no extra setup.

Why it matters: LoRA (Low-Rank Adaptation) fine-tuning lets you specialize a general-purpose language model for your specific use case (matching a brand's tone, mastering domain terminology, or following a particular output format) using a fraction of the compute a full fine-tune would require. QVAC handles the entire workflow locally: dataset preparation, training with configurable hyperparameters, checkpoint saving, and seamless inference with the resulting adapter. Your data never leaves the device.

The developer experience: Fine-tuning with QVAC is as simple as calling "sdk.finetune()" with your dataset and a few hyperparameters. Training runs entirely on your local hardware, produces a compact LoRA adapter file, and supports pause/resume so you can stop a job and pick it back up without losing progress. The result plugs straight into QVAC's inference pipeline: no model conversion, no deployment step, just immediate local completions with your fine-tuned model.

Paolo Ardoino 🤖

42,271 views • 2 months ago

How to match the complexity of the problem you want to solve with the proper model. You want an inference router. In the video, I show you how simple and powerful this is. After this, you'll never talk directly to a model ever again.

Santiago

11,861 views • 26 days ago

Many people are flat wrong about DeepSeek. You might think everyone freaked out about DeepSeek because the model is really good—which it is—or because it was from China, which is also true. But there's a more subtle reason: DeepSeek showed they could achieve those results using reinforcement learning instead of supervised fine-tuning. In other words, they didn't need an expensive phase where a bunch of people tuned the model by answering questions. They were able to do all of this faster and cheaper using reinforcement fine-tuning. This is a huge deal! I recorded the attached video with the following goals in mind: 1. Explain how reinforcement fine-tuning works 2. Show you how you can fine-tune your own models 3. Walk you through a complete code example

Santiago

105,545 views • 1 year ago

Tune Studio is an end-to-end platform for developing applications using Large Language Models. So far, I haven't seen any other platform like this one. You can do everything here: 1. You can curate your data. 2. Use the playground to play with different models and try your ideas. 3. Fine-tune an open-source model on your data. 4. Deploy the model when you are done. This is awesome for anyone building generative AI applications. You can use Tune Studio to work with any of the open-source models out there. They were one of the few companies to host Llama 2 and Llama 3 before anyone else. Here is a link to check it out: One of their main selling points is that Tune Studio scales! You don't have to worry about serving your model to lots of users. They also have built-in user management, authentication, on-prem support, user context management, and pretty much everything you need to build generative AI applications. Thanks to the Tune team for collaborating with me on this post. We are living through the best years of development tools for AI developers. The field is unstoppable.

Santiago

39,101 views • 2 years ago

Revolutionizing Move Programming with OpenLedger

In this demo, we showcase how Move datasets contributed by data providers to OpenLedger’s datanets are used to fine-tune specialized models with LoRA fine-tuning. As seen in the video, we show an example of how builders can deploy a Move-specialized model that powers Co-pilot agents using our no-code model fine-tuning platform. This is the future of AI and Web3 innovation. Watch this space to see more specialised models and data feeds being built for next-generation agents on top of OpenLedger #Move

OpenLedger

61,662 views • 1 year ago

NVIDIA has published a paper on DREAMGEN – a powerful 4-step pipeline for generating synthetic data for humanoids that enables task and environment generalization.

- Step 1: Fine-tune a video generation model using a small number of human teleoperation videos
- Step 2: Prompt the fine-tuned model to turn a single real image into new AI-imagined videos
- Step 3: Automatically label actions in the generated videos
- Step 4: Train a robot AI model with the labeled synthetic dataset

This enabled humanoid robots to perform 22 novel behaviors – such as pouring, opening/closing articulated objects, and manipulating a variety of tools. The original teleoperation dataset only included pick-and-place tasks. This takes task extensibility to another level without requiring human teleoperation for every single task.

The pipeline will be made open-source soon.

Project page:

The Humanoid Hub

12,074 views • 1 year ago

The best *code embedding* model in the market right now was just released: Qodo-Embed-1 — There are two flavors: A lite model with 1.5B parameters and a medium model with 7B parameters (Hugging Face links below). If you want to index a large codebase (supports 10M+ lines of code), this is the model you want. 1. Index your repositories 2. Ask anything (including test and code generation) The models are optimized to answer natural language questions or code-to-code questions. The video here shows the model indexing 90 repositories (!!!!!) and letting the user ask questions about them. The simplest way to use the model is through the Qodo Gen AI extension in Visual Studio Code, Cursor, or JetBrains (see link below).

Santiago

56,584 views • 1 year ago

Choose a model (any model) and build your application with it. Do not spend time swapping models early on. Do not try to optimize before you have a working system. This is one of the first recommendations I make to every new team I consult with. Eventually, it will be time to optimize the model.
• You may need a cheaper model
• You may need a faster model
• You might need a smarter model
Good luck if you stitched together 12 different APIs and SDKs from 7 different vendors. Over half of the companies I consult for run on Microsoft software and have access to Microsoft Foundry. Microsoft Foundry is a complete agentic ecosystem. If you're in that world and building AI applications, Microsoft Foundry is where everything lives:
• Models (largest selection in the market)
• Agentic SDK (Python, C#, JavaScript/TypeScript)
• Tools
• Evaluations
• Monitoring
They are fully integrated with GitHub and Visual Studio Code. The best part: Their agentic platform is fully agnostic of the models you use. You can integrate with any model using the same OpenAI-style API. Swapping one model for another takes 1 second.

Santiago

12,014 views • 4 months ago

I made this $700,000,000 music video in an afternoon on part of my weed budget. No, this is not 100% AI. I wrote the song, Through The Tempest, and recorded the samples which were then layered and mixed into the song. I curated the custom styling by running thousands of images of my own art to make a custom model. That model was then iterated dozens of times to fine tune details into new models. Then from the final model I ran dozens of images to animate with the help of AI. All of this to create something I'm going to be projecting later tonight downtown bringing all of this work into the real world. Enjoy.

BLVCKL!GHT

24,034 views • 3 months ago

Dreamina Seedance 2.0 is by far the best video model I've tried, and Dreamina makes it even better. You can try it in the link below. It's not just about the quality of the output, but about how much control you have over your video's look. Watch my video here. I'm attaching reference images and asking the model to generate a video using them. I can reference each image using the @ symbol to instruct the model which image to use. You can even upload a clip and use its camera movement, styles from an image, and audio vibe from a track. By the way, you can take an existing video and replace, remove, or add elements to it while the model preserves everything else. This is the closest we've gotten to "editing videos like photos".

Santiago

45,143 views • 2 months ago

I have secured an accurate, high-resolution copy of the Voynich Manuscript. I will of course use DeepSeek-OCR on this, but I will also be fine-tuning the model on the details of each entire page. Of course it is online to some degree, but this affords something extra…

Brian Roemmele

87,914 views • 7 months ago