We made a thing! Very happy to announce sqlcoder-pro and the Defog Alignment Platform. Available to use immediately without a waitlist; weights will be open-sourced very soon. The video does a quick show-and-tell comparison against ChatGPT (with gpt-4o). Read on for more details!

TLDR
💪 equal (or... better) performance on text-to-SQL compared with the most capable Claude-3.5 or GPT-4 models
🤝 you can use it today on a free plan/free trial, without a waitlist
🪽 self-hostable on a single RTX4090, with 2 second median generation times for SQL queries
🔁 exactly the same output every time, given the same prompt
👨🏻‍🏫 teachable and steerable: show the model what you want it to do
🛞 debuggable: you can understand WTF is going on inside the model, instead of treating it like a black box

Let's dig into each of these one-by-one!

Performance
SQLCoder-8b-pro significantly exceeds the performance of our previous sqlcoder-8b model on Postgres text-to-SQL (from 88.2% to 90.2% accuracy; gpt-4o is at 87.6%, for reference). It is also better at following instructions. This was done via self-merges, hand-crafted fine-tuning data, and adapting the training data to fit our tokenizer.

Cost
You can host the model on a single $3,500 RTX4090 and support ~5 requests/second via vLLM. If you're looking to host on the cloud instead, you can run it on a single L4 GPU that costs $300/mo on GCP.

Repeatability
We have a dense 8b model with no MoE shenanigans. For the same prompt with temperature=0, you'll always get the same answer, which is critical in BI.

Teachable
In our alignment and feedback modes, you can give the model feedback on how it answered certain questions, and it will automatically adapt to the feedback.

Debuggable
You can use logprobs and attention scores to determine exactly where the model is paying attention inside a prompt, and what it's getting confused by when generating outputs.

Available today
You can use Defog on the cloud today by going to docs[dot]defog[dot]ai and getting an API key. Excited to hear what you think!
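To make the "Debuggable" point a bit more concrete, here is a minimal sketch of one way per-token logprobs could be used to spot where a model was unsure while writing a query. The function name, the 0.8 threshold, and the sample tokens and logprob values are all assumptions made up for illustration; this is not Defog's actual tooling or output.

```python
import math
from typing import List, Tuple


def flag_uncertain_tokens(
    tokens: List[str], logprobs: List[float], min_prob: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (token, probability) pairs whose probability is below min_prob."""
    if len(tokens) != len(logprobs):
        raise ValueError("tokens and logprobs must have the same length")
    flagged = []
    for token, logprob in zip(tokens, logprobs):
        prob = math.exp(logprob)  # a logprob is the natural log of the token probability
        if prob < min_prob:
            flagged.append((token, round(prob, 3)))
    return flagged


# Hypothetical generated tokens and logprobs (illustrative values only).
tokens = ["SELECT", " SUM", "(", "amount", ")", " FROM", " orders"]
logprobs = [-0.01, -0.9, -0.02, -1.6, -0.01, -0.03, -0.4]

uncertain = flag_uncertain_tokens(tokens, logprobs)
for token, prob in uncertain:
    print(f"low confidence: {token!r} (p={prob})")
```

Tokens that come back flagged (here the aggregate function, the column name, and the table name) are the places where extra instructions or feedback are most likely to help.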

Rishabh Srivastava

12,553 subscribers

13,460 views • 1 year ago • via X (Twitter)

Science & Technology

10 Comments

Alok Bishoyi • 1 year ago

🚀🙏

Dennis • 1 year ago

sick

Aditya Mandke • 1 year ago

awesome! does this work for multiple tables too? like joining tables and performing aggregation etc on it

Rishabh Srivastava • 1 year ago

Yup! You can connect your DB schema to it at – works pretty well for up to 100 tables if given adequate instructions about join hints!
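As a rough illustration of what "adequate instructions about join hints" might look like in practice, here is a minimal sketch that assembles a text-to-SQL prompt from a schema plus join hints. The prompt layout, the `build_prompt` helper, and the example tables are hypothetical; the actual format Defog expects is not described in this thread.

```python
from typing import Dict, List


def build_prompt(question: str, schema: Dict[str, List[str]], join_hints: List[str]) -> str:
    """Assemble a plain-text prompt: schema DDL, then join hints, then the question."""
    lines = ["-- Database schema"]
    for table, columns in schema.items():
        lines.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    if join_hints:
        lines.append("-- Join hints")
        lines.extend(f"-- {hint}" for hint in join_hints)
    lines.append(f"-- Question: {question}")
    return "\n".join(lines)


# Hypothetical two-table schema used only for illustration.
schema = {
    "customers": ["id INT", "name TEXT"],
    "orders": ["id INT", "customer_id INT", "amount NUMERIC"],
}
join_hints = ["orders.customer_id can be joined with customers.id"]

prompt = build_prompt("What is the total order amount per customer?", schema, join_hints)
print(prompt)
```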

Ilia Sazonov • 1 year ago

You rock 🔥🔥🔥

Abhi Shah • 1 year ago

Awesomeness. Does it work on SQL Server equally well, and what sort of prompt engineering does it require? I'll trawl through the docs as well.

Rishabh Srivastava • 1 year ago

Aye, works quite well for SQL Server! For this defog-desktop app, basically none: just ask a question, give feedback on what works and what doesn't, and you're good to go.

Kyle Corbitt • 1 year ago

Very nice work @rishdotblog!

Amit ⚡ • 1 year ago

Love it! Happy to trial and provide feedback. Just took it for a spin on your site. The SQL seems correct, but maybe there's something up with execution, as it returns a null response.

Rishabh Srivastava • 1 year ago

Thanks for the feedback! We need to fix the display when no data is returned (which often happens for queries that return a NULL response) – will fix that in a few minutes
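For context on why a correct query can still come back as "null": an aggregate such as SUM over zero matching rows returns a single row containing NULL rather than no rows at all. Below is a minimal sketch of how a display layer might tell those cases apart. It uses Python's built-in sqlite3 as a stand-in database (an assumption for illustration; the same SUM behaviour applies in Postgres), and `describe_result` is a hypothetical helper, not Defog's code.

```python
import sqlite3
from typing import List, Tuple


def describe_result(rows: List[Tuple]) -> str:
    """Produce a user-facing message that separates 'no rows' from 'only NULLs'."""
    if not rows:
        return "Query ran successfully but returned no rows."
    if all(value is None for row in rows for value in row):
        return "Query ran successfully but every value is NULL (no matching data)."
    return f"{len(rows)} row(s) returned."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")  # deliberately left empty

# An aggregate over an empty table yields one row holding NULL.
agg_rows = conn.execute("SELECT SUM(amount) FROM orders").fetchall()
# A plain SELECT over an empty table yields no rows.
plain_rows = conn.execute("SELECT id FROM orders").fetchall()

print(describe_result(agg_rows))
print(describe_result(plain_rows))
```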

Related Videos

Imagine an AI application that can type anywhere you can and use the full context of what's on your screen. This is the application we all deserve (at least if you have macOS).

Check out Omnipilot. It's an app that works with every other macOS application and uses Claude 3.5 Sonnet in the background; it also supports Gemini and GPT-4o.

Here is the idea: You can use the tool to ask questions about anything on your screen. Or you can use it to autocomplete the text you are typing. You don't need to copy and paste anymore or waste your time providing context to a model. It sees what you see. It works right where you are. That's pretty cool.

Here are a couple of cool examples:
• Use it to reply to an email
• Use it in the terminal to autocomplete a command
• Use it to finish a document
• Use it to send a message on Slack

AI at the system level is bonkers.

You can read a ton more on their Product Hunt launch page:

Thanks to the Omnipilot team for collaborating with me on this post!

Santiago

72,306 views • 2 years ago

You can now fine-tune Llama 3 without writing a single line of code! We are moving at breakneck speed.

I recorded a video to show you how to fine-tune any open-source model in a few minutes. I'm using a GPT capable of taking a problem and turning it into a fine-tuned model that will solve it.

You don't have to write any code. You only need to explain to a GPT what problem you want to solve and tell it you want to use Llama 3. For example, "fine-tune Llama 3" or "deploy zephyr." It feels like magic. The system will recommend a dataset and fine-tune the model for you.

I'm using Monster API, a platform that specializes in making fine-tuning and deploying open-source models easy and fast. Their stack is well-optimized to maximize fine-tuning efficiency using techniques like QLoRA and vLLM. They are behind the GPT.

Here is what you need to do:
1. Create an account at
2. Load the GPT with the link below

This is as simple as it gets. When you are done, you can click a button to deploy the model and start using it.

I have 10,000 free credits for anyone using the code "SANTIAGO" in the dashboard. You can use these credits to access, fine-tune, and deploy these open-source models.

You can also keep up with their latest updates, and get free credits and special offers on their Discord server:

Santiago

324,602 views • 2 years ago

This is a pretty wild model! You can use it to turn an image into a 3D object with texture. The quality is out of this world! I'm not even a designer, and I've been using this nonstop for the last 2 hours. The model is Hunyuan 3D 2.1. It's open source. You'll find model weights, training/inference code, data pipelines, and architecture on their repository. You can even fine-tune it if you want! GitHub Repository: By the way, the model runs on consumer-grade GPUs. You don't need a datacenter for this! I've been using the model from the HuggingFace demo page: To use it, go to the link and upload an image. That's it! Check out the video I recorded for a couple of examples.

Santiago

44,783 views • 1 year ago

Building AI agents and tools is getting ridiculously easy! Here is a pretty impressive FREE and Open-Source AI agent builder that runs on your computer. It's a desktop app you can install on Mac or Windows. Link: • Free • 100% no-code: You can type what you want, and the tool will build everything for you. • You can use local models like DeepSeek or Qwen, or connect to cloud models like GPT-4o, Claude, or Gemini with your API key. • You can turn the tools you build into MCP Servers. I recorded a quick video to show you how easy it is to build something from scratch.

Santiago

56,815 views • 1 year ago

I asked Garry Tan how to use meta prompting to get better at AI:

"My partners at YC Jared Friedman and Pete Koomen showed me how to do this. You can take almost anything that you do all the time and just drop it into a context window. And then say, "Here's a bunch of inputs and outputs." And maybe you also add a bunch of notes. And then you tell it, "Write me a prompt that can act as an agent that takes this input and makes this output over here."

You can do this for almost any type of knowledge work. And you can even introspect. "What are things you notice that I did to convert this from the input to the output?" And then you can just start using the prompt. Initially, it's going to suck. Because it's just not that smart yet.

But what's funny is now, I also use it to iterate my writing. You can be very direct, "I would never say that", "Don't say it like this", or "Oh, you used the long word there, use the short word". Just speak to it conversationally. And then when you're happy with the output, you can use that new output to make a new prompt. "Based on this conversation, give me a better initial prompt that incorporates all the things we talked about."

And you can do this with literally everything. And in theory, there's so much it applies to that people do day-to-day. You could use it for tweets. You could use it for editing podcasts. You can use it for pretty much everything. I have a folder of prompts that I use all the time. My YouTube prompt is on v27 or something.

I'll go through this process with all the different max models. I'll use GPT 5.2 Pro. I'll use Grok. I'll use Claude. Then, I'll take all the outputs from all the models and put them into Claude and say "Here's my prompt, here's the output from four LLMs, including yourself. Rate each response and tell me what the pros and cons of each approach are." And I usually say "give it to me in numbered form". And then you can agree with one, disagree with two, tell it three is this or that. And then after that, you say "given all of this, synthesize it.""

The Peel

51,632 views • 5 months ago

How can you solve complex tasks using a Large Language Model?

Here is a 2-minute introduction to everything you need to know to 10x the quality of your results.

Let's talk about three techniques, in order of complexity, starting with the easiest one:
• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning

In-Context Learning
The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video.

Indexing + In-Context Learning
Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation.

Fine-tuning
Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model.

Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches.

I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.
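The indexing idea described above can be sketched in a few lines. This is a toy version under stated assumptions: word-count vectors stand in for a learned embedding model, a Python list stands in for a vector database, and the `embed`, `cosine`, and `top_passages` names and the sample passages are invented for illustration.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_passages(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the question."""
    query = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(query, embed(p)), reverse=True)
    return ranked[:k]


passages = [
    "the dragon guarded the mountain pass",
    "maria opened a bakery in the village",
    "the bakery sold bread every morning",
]

# Only the retrieved passage, not the whole book, goes into the prompt.
context = top_passages("who opened the bakery", passages, k=1)
prompt = f"Context: {context[0]}\nQuestion: who opened the bakery"
print(prompt)
```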

Santiago

384,510 views • 3 years ago

How to use 50+ API keys (models) for FREE on OpenClaw API???
- go to
- login or register your account
- click on "more models"
- click on "use case" and select what you need it for
- choose the model and open it
- click on "view code" → "Generate API key"

many models don't allow direct deploy, so use the "view code" button to generate API access

basically Nvidia NIM gives you the ability to test almost any model from their list for FREE

some of them are no worse than GPT 5.2 or Claude Opus 4.6, some might even perform better depending on the task

how to understand if a model is efficient and compare it with others???
- go to
- type the model name in search
- click on "benchmarks"
- you'll see performance tests and rankings

this way you can easily compare free models with paid ones

of course there are RPM limits, on many models it's around ~40 requests per minute

each model is different, after generating the API key, RPM limits are shown in the top-right corner

nothing stops you from using them, many work perfectly fine, super solid option for first tests and for learning OpenClaw or any other system where you need an AI API model

Ronin

58,789 views • 5 months ago

🚨You can now use the new upcoming OpenAI model GPT 5.2 inside Cursor. Here is the full walkthrough. - Open the editor, go to settings and then the model tab. Add a custom model and enter the text "gpt-5.2-high" and "gpt-5.2". - After that you can select the model and ask questions. To verify, I started my test on the usage page which had zero gpt-5.2-high requests and consumption. After the test I could see the details in usage and the cost incurred while using it. Enjoy

AshutoshShrivastava

424,035 views • 7 months ago

Larry Ellison—owner of Oracle, CBS, and now TikTok—tells Tony Blair about his plan to use digital ID to "unify" all data on each country's citizens "so it can be consumed and used by" his AI models. "We have to take all of this data... and move it into a single, if you will, unified data platform." "When we want to ask a question, we've provided that AI model with all the data they need to understand our country." "We need to unify all of the national data, put it into a database where it's easily consumable by the AI model, and then ask whatever question you like."

Wide Awake Media

144,324 views • 6 months ago

Tony Blair and Oracle co-founder Larry Ellison plan to use digital ID to "unify" all data on each country's citizens "so it can be consumed and used by" their AI models. "We have to take all of this data... and move it into a single... unified data platform." "When we want to ask a question, we've provided that AI model with all the data they need to understand our country." "We need to unify all of the national data, put it into a database where it's easily consumable by the AI model, and then ask whatever question you like."

Wide Awake Media

54,084 views • 8 months ago

MIT PhD student Alex Zhang reveals the scaling result where a model trained on short tasks generalizes to problems 100x longer for free: "If you're very clever about the design of your harness or how you use the language model, you can almost get scaling gains for free." "If you train a model naively, there's no tricks. It's just the same way you train a model on these RL environments. You just roll it out, and then you just get some reward." "If you train it on only short tasks, like only tasks that are 10,000 tokens long, and then you were to run it on a similar domain, but at a million tokens, or 10 million tokens, or 100,000 tokens, it generalizes really, really well. If you look at it compared to even the base transformer, you get way better generalization properties." "When the model uses an RLM (Recursive Language Model) after it's trained on these short tasks, it will see some kind of trajectory of actions that it does. Between these two problems of different lengths, the RLM learns to see them as almost the same problem." "Token for token, they're almost the same. You can describe it in code. In one code setting, maybe the for loop is a little bigger, but it's the same kind of code and it derives the constants from the data. There's no hard coding, so they literally look the same." alex zhang

MTS

99,784 views • 6 days ago

Small Language Models (SLM) are the future of AI. "Small" (SLM) instead of "Large" (LLM). These small models are highly specialized models with superhuman abilities on specific tasks.

Here are two techniques to build these models:
• Spectrum
• Model Merging

I give you a short introduction in the attached video, but here is a quick summary:

Spectrum helps us identify the most relevant layers to solve one specific task. We can ignore everything else and focus on fine-tuning these layers. Using Spectrum, we can fine-tune models in a heartbeat.

Model Merging combines multiple models into a unique, much better model than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities.

This is the state of the art of productizing models. It's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it.

There are three main steps to produce a model for your particular use case:
1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You can deploy that model to any environment you want.

Three important notes:
• Training process is 2x faster and 2x cheaper than regular fine-tuning.
• Resultant models are smaller and have higher accuracy.
• They create these specialized models from open-source models.

Check this site so you can fully appreciate how this works:

If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.
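To give a feel for what merging can mean mechanically, here is a minimal sketch of the simplest variant: a weighted average of matching parameters from models that share an architecture. This is an assumption chosen for illustration, not a description of what Arcee's platform does; the `merge_weights` helper and the tiny weight dictionaries are hypothetical.

```python
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]


def merge_weights(models: List[Weights], coefficients: Optional[List[float]] = None) -> Weights:
    """Linearly combine same-shaped parameters; defaults to a uniform average."""
    if not models:
        raise ValueError("need at least one model")
    if coefficients is None:
        coefficients = [1.0 / len(models)] * len(models)
    if len(coefficients) != len(models):
        raise ValueError("one coefficient per model is required")
    names = models[0].keys()
    for model in models:
        if model.keys() != names:
            raise ValueError("all models must have the same parameter names")
    merged: Weights = {}
    for name in names:
        length = len(models[0][name])
        merged[name] = [
            sum(c * model[name][i] for c, model in zip(coefficients, models))
            for i in range(length)
        ]
    return merged


# Two hypothetical fine-tunes of the same base model, with one tiny parameter each.
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 4.0]}

merged = merge_weights([model_a, model_b])
print(merged)
```

Real merging methods are usually more selective about which parameters to combine and how, but the input and output have this same shape: several sets of weights in, one set out.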

Santiago

164,162 views • 2 years ago

You can now try Llama 3.1 405B for free (link below)!

This is the largest open-source model out there, and for the first time, an open model is competitive with closed models.

This time around, Meta did something new: Llama 3.1 has a license that allows developers to use it to enhance other models. For the first time, you can distill Llama 3.1 405B's capabilities into a smaller, more practical model for your use case.

First, here is the link where you can play with Llama 3.1 for free: The model is hosted in Tune Studio, an end-to-end platform for developing applications using Large Language Models. They are sponsoring this post.

Take a look at the attached video. It will show you how you can fine-tune a simple model using Llama 3.1 without leaving the platform:
1. You can create an empty dataset
2. Use the playground to generate and record interactions with Llama 3.1
3. Modify the dataset directly using the playground
4. Export the data and fine-tune a smaller model

Fast and easy! As long as you have a web browser, you can start experimenting with fine-tuning and Llama 3.1. That's all it takes!
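The four steps above can be sketched in a few lines. This is a rough illustration, not Tune Studio's API: `record`, `to_jsonl`, and `fake_teacher` are hypothetical names, the teacher is a stub standing in for a real call to a large model, and the prompt/completion JSONL layout is an assumed export format.

```python
import json
from typing import Callable, Dict, List

Row = Dict[str, str]


def record(dataset: List[Row], prompt: str, teacher: Callable[[str], str]) -> None:
    """Ask the teacher model and store the interaction as a training row."""
    dataset.append({"prompt": prompt, "completion": teacher(prompt)})


def to_jsonl(dataset: List[Row]) -> str:
    """Serialize the dataset as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in dataset)


def fake_teacher(prompt: str) -> str:
    # Stand-in for a real request to a large teacher model.
    return f"answer to: {prompt}"


dataset: List[Row] = []                                  # 1. empty dataset
record(dataset, "What is LoRA?", fake_teacher)           # 2. record interactions
record(dataset, "Define distillation.", fake_teacher)
dataset[1]["completion"] = (                             # 3. edit a row by hand
    "Training a small model on a large model's outputs."
)
jsonl = to_jsonl(dataset)                                # 4. export for fine-tuning
print(jsonl)
```

The exported rows are what the smaller model would be fine-tuned on, so the quality of the hand edits in step 3 carries straight through to the student.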

Santiago

55,609 views • 2 years ago

What. Copilot Tasks is already on mobile. So you basically have a cloud computer/Claude Cowork working for you on the go. Same features as on desktop.

With one prompt it can:
- Use a browser to navigate
- Interact with the page
- Scrape the relevant info
- Generate an Excel file

And it still has access to all the Office tools, emails, ability to schedule tasks, etc.

Paul Couvert

38,425 views • 4 months ago

Chinese AI models are wiping billions off Big Tech right now. Google just lost $200 billion in a single day, and the model it needed to fight back still isn't ready. Gemini 3.5 Pro, Google's most powerful model, is months behind schedule. Alphabet stock dropped 4.4% that same day.

The DeepSeek moment is happening again, and the new model is FAR bigger. On the same day Google's delay leaked, a Beijing lab called Moonshot released Kimi K3. It is the largest open model ever built, with 2.8 trillion parameters. It took the number one spot on the Frontend Code Arena, a live coding leaderboard, passing Anthropic's best model. And Moonshot is giving it away for free on July 27.

The genius part: Anyone with enough computers can download it and run a frontier-level AI without paying a cent to a US company. A single task on Kimi K3 costs about 94 cents. The same work on some American models costs nearly double. So why would a company keep paying premium prices for a model it can now get for free?

The entire US AI business is built on selling access to models that cost billions to train. If a free Chinese version does most of the same work, that pricing power starts to crack. And Kimi is close to the best. On one closely watched intelligence ranking it scored 57, just behind the top American models GPT-5.6 Sol and Fable 5, and ahead of Claude Opus 4.8. Bank of America told clients that Kimi proves Chinese labs can keep making big leaps even with limited chips.

And the founder of Moonshot, Yang Zhilin, learned to build AI as a researcher INSIDE Google. Google literally wrote the 2017 paper that made all of these models possible. Now the people who studied its work are using it to destroy Google, and handing it out for free.

What happens next: Kimi K3's weights go public on July 27. Google reports earnings on July 22, and everyone will be asking the same question about Gemini. If free models keep topping the charts, every valuation built on paid AI access has to be rewritten.

What do you think?

Ricardo

47,202 views • 6 days ago

For the first time we are fundamentally changing how humans can collaborate with ChatGPT since it launched two years ago. We’re introducing canvas, a new interface for working with ChatGPT on writing and coding projects that go beyond simple chat.

Product and model features:

1/ Ask for in-line feedback. With canvas, ChatGPT can better understand the context of what you’re trying to accomplish. You can highlight specific sections to indicate exactly what you want ChatGPT to focus on. Like a copy editor or code reviewer, it can give in-line feedback and suggestions with the entire project in mind.

2/ Directly edit the model's output and select a specific area for targeted editing. You control your creative work on canvas. You can directly edit text or code.

3/ Menu of shortcuts. There’s a menu of shortcuts for you to ask ChatGPT to adjust writing length, debug your code, and quickly perform other useful actions. You can also restore previous versions of your work by using the back button in canvas.

4/ Use search with canvas for research writing! As we are moving towards the new paradigm of reasoning we are fundamentally evolving the chat interface into a more collaborative human-AI interaction. Today you can say “browse / use browsing to find XYZ on the internet and write a report in canvas”

Karina

845,627 views • 1 year ago

Apple built a large foundation model and fine-tuned it on multiple tasks. But they are doing something very clever: They load a single model in memory and use different adapters to specialize the model on the fly.

I recorded a video to show you how to write the code to do the same thing Apple is doing. I explain everything step by step.

Here is what I'll show you in the video:
1. We'll load two datasets
2. Then load a large model
3. Then, we'll fine-tune the model on both datasets

I'll use LoRA to fine-tune the model. This process creates two small adapters, each specialized for one of the datasets. The base model's original parameters will remain unchanged.

From here:
4. We'll generate a list of tasks
5. We'll load the correct adapter to solve each task

The large model I'm using needs 346 MB of memory, and each adapter is only 2.7 MB. I only need to load the base model once and pair it with any of the fine-tuned adapters. That gives me a minimal memory footprint while still solving multiple tasks.

Hope this helps!
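Here is a toy sketch of the adapter idea in plain Python, not the code from the video. It assumes the standard LoRA form, where the output is the frozen base weight applied to the input plus a low-rank update B(Ax), and it leaves out the usual scaling factor. `AdaptedModel`, the adapter names, and all the numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class AdaptedModel:
    """One frozen base weight shared by several small low-rank adapters."""

    def __init__(self, base: Matrix) -> None:
        self.base = base  # loaded once, never modified
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix) -> None:
        # a has shape (rank x d_in), b has shape (d_out x rank).
        self.adapters[name] = (a, b)

    def forward(self, x: Vector, adapter: str) -> Vector:
        a, b = self.adapters[adapter]
        base_out = matvec(self.base, x)      # frozen path: W x
        delta = matvec(b, matvec(a, x))      # adapter path: B (A x)
        return [y + d for y, d in zip(base_out, delta)]


model = AdaptedModel(base=[[1.0, 0.0], [0.0, 1.0]])
model.add_adapter("sentiment", a=[[1.0, 1.0]], b=[[1.0], [0.0]])
model.add_adapter("summary", a=[[1.0, -1.0]], b=[[0.0], [2.0]])

# Route each task to its adapter without reloading the base weights.
tasks = [("sentiment", [1.0, 2.0]), ("summary", [1.0, 2.0])]
results = [model.forward(x, name) for name, x in tasks]
print(results)
```

Each rank-1 adapter here stores 4 numbers against the base's 4, so the savings only show up at real sizes, where the base is hundreds of megabytes and the adapter is a few.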

Santiago

84,747 views • 1 year ago

A voice agent powered by gpt-oss. Running locally on my MacBook. Demo recorded in a Waymo with WiFi turned off.

I'm still on my space game voice AI kick, obviously. Code link below.

For conversational voice AI, you want to set the gpt-oss reasoning behavior to "low". (The default is "medium".) Notes on how to do that and a Jinja template you can use are in the repo.

The LLM in the demo video is the big, 120B version of gpt-oss. You can use the smaller, 20B model for this, of course. But OpenAI really did a cool thing here designing the 120B model to run in "just" 80GB of VRAM. And the llama.cpp mlx inference is fast: ~250ms TTFT.

Running a big model on-device feels like a time warp into the future of AI.
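TTFT (time to first token) is the delay between sending a request and receiving the first streamed token, which is what a voice agent's perceived latency depends on. A minimal sketch of measuring it over any token stream follows. `measure_ttft` and `fake_stream` are hypothetical names, and the stream is a stand-in with a simulated delay, not a real llama.cpp client.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, str]:
    """Return (seconds until the first token arrives, full generated text)."""
    start = time.perf_counter()
    iterator = iter(stream)
    first = next(iterator)                  # blocks until the first token
    ttft = time.perf_counter() - start
    rest = "".join(iterator)                # drain the remaining tokens
    return ttft, first + rest


def fake_stream() -> Iterator[str]:
    # Stand-in for a streaming LLM response.
    time.sleep(0.05)                        # simulated prompt processing
    yield "Hello"
    yield ", pilot."


ttft_seconds, text = measure_ttft(fake_stream())
print(f"TTFT: {ttft_seconds * 1000:.0f} ms, text: {text!r}")
```

With a real client, the timer would start when the request is sent, so network and prompt-processing time are both included.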

kwindla

202,147 views • 11 months ago

This is how you can get programmatic access to any website.

You need two things:
1. The URL of the website
2. A prompt specifying what you want to do on the site

Mino is a neat platform that uses browser automation and AI-powered navigation to understand your prompt, open the website, and extract the information you want. This opens the doors to unlimited potential automations you could build using data from everyday websites that don't give you an API.

Mino uses AI to go way further than you could get by using a regular web scraping tool:
• It can handle dynamic JavaScript content
• It can handle login walls
• It can navigate through interactive booking flows
• It adapts to different interfaces and layout changes
• It can fill out forms automatically
• It supports stealth browser mode

In my experience, running Mino on relatively straightforward sites takes about 30-60 seconds to complete. The more times you run it, the faster it gets.

I recorded a quick video to show you the platform. You can check their site here: Thanks to the team for the support, onboarding me on their platform, and the collaboration on this post.

Santiago

31,742 views • 7 months ago