Introducing DoctorGPT! After applying fine-tuning, reinforcement learning, & compilation techniques to Meta's Llama2 model, I got amazing results:
- Passes the US Medical Licensing Exam
- Offline
- iOS & Android
- Open Source
Code:
Full video tutorial:

Siraj Raval

61,651 subscribers

450,230 views • 2 years ago • via X (Twitter)

Education • Health & Wellness • Science & Technology

9 Comments

Rowan Cheung • 2 years ago

Great work

Dwayne • 2 years ago

Finally, I can achieve my dreams of becoming a doctor and charging $300 for a band aid.

biccs👨🏽‍💻{3.LAND} • 2 years ago

Me: my head hurts DoctorGPT: you have cancer 💀

Abhinav Das • 2 years ago

A Doctor in my pocket!

Manish Patel • 2 years ago

Can't wait for @DrHughHarvey to see this 😬😱

Linus Ekenstam – eu/acc • 2 years ago

This is amazing work Siraj!!!

Lachlan Phillips exo/acc 👾 • 2 years ago

.@elonmusk @lindayaX @X The features that make YouTube the best are: 1. Full screen button makes the video horizontal mode. 2. Video plays as audio in the background so you can do other things and listen. Please implement these! Xx

Izzy Brooks • 2 years ago

@wagieeacc 👀

AJ Keller • 2 years ago

Giving me so many thoughts!

Related Videos

Open-source SUNO is here! Introducing HeartMula: free & offline AI music generator Full tutorial:

⚡AI Search⚡

77,489 views • 4 months ago

Video tutorial of how to connect your 9 proxies !!! On iOS, Android & PC !!! 🪶

Street Update 247

145,671 views • 3 months ago

Many people are flat wrong about DeepSeek. You might think everyone freaked out about DeepSeek because the model is really good, which it is, or because it was from China, which is also true. But there's a more subtle reason: DeepSeek showed they could achieve those results using reinforcement learning instead of supervised fine-tuning. In other words, they didn't need an expensive phase where a bunch of people tuned the model by answering questions. They were able to do all of this faster and cheaper using reinforcement fine-tuning. This is a huge deal! I recorded the attached video with the following goals in mind:
1. Explain how reinforcement fine-tuning works
2. Show you how you can fine-tune your own models
3. Walk you through a complete code example

Santiago

105,497 views • 1 year ago

Introducing Cohere's first open-source coding model: North Mini Code Small & efficient, designed for agentic performance and built for community input.

Cohere

441,132 views • 1 day ago

Introducing Open TTS Tracker! 🗣️ *sound on* A one-stop shop to track all open access/source TTS models! Ranging from XTTS to Pheme, OpenVoice to VITS, and more... ⚡ For each model, we compile:
1. Source code
2. Checkpoints
3. License
4. Fine-tuning code
5. Languages supported
6. Paper
7. Demo
Help us make it more complete! Let's make 2024 the year of open TTS models! ❤️

Vaibhav (VB) Srivastav

68,037 views • 2 years ago

YC S23's MediSearch just launched MediSearch Pro—your personal medical research assistant. It's currently the most accurate publicly available medical search engine, scoring 94.2% on the US Medical Licensing Examination (USMLE). Now on web, iOS, and Android.

Y Combinator

14,024 views • 1 year ago

It's time to become the legend. Tomb Raider is out now for iOS & Android! iOS: Android:

Tomb Raider

120,344 views • 3 months ago

Just added Qwen 2.5 32B Coder to LlamaCoder – it's an amazing open source coding model. Going to run some evals between it and Llama 3.1 405B & will share my results soon.

Hassan

52,565 views • 1 year ago

Introducing the Open Deep Research app! Generate detailed reports on any topic with open source LLMs. Free & fully open source. We’re releasing everything: evaluation dataset, code, app, and blog.🔥

Together AI

28,338 views • 11 months ago

New short course on Fine-tuning LLMs! Many developers are moving beyond only prompting, to also fine-tuning LLMs - that is, taking a pre-trained model and training it further on your own data, which can deliver superior results inexpensively. In this course, Sharon Zhou, CEO of Lamini (disclosure: I’m a minor shareholder) shows you how to recognize when fine-tuning can help, and how to train an open-source LLM on your own data. I hope you enjoy the course!

Andrew Ng

502,757 views • 2 years ago

Introducing the Send Mobile by SEND ecosystem 📱 An open-source React Native kit to build iOS and Android mobile apps on Solana in ~15 minutes Ft. 18+ protocol integrations 🧵

Solana

905,942 views • 1 year ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.
Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning.
Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective.
In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.
By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,381 views • 1 year ago

reinforcement learning expanding the range of techniques of the Ultra Mobile Vehicle (UMV)

Science girl

19,399 views • 9 months ago

Fine-tuning in 2026 has never been easier. You can make any open-source model 10x more powerful. And thanks to Unsloth Studio, creating custom datasets takes just a few mins. Here is the full course:

David Ondrej

34,503 views • 12 days ago

🚀 LTXV 0.9.5 is here! Our latest AI video model brings: ✅ Commercial licensing 🎬 Keyframe conditioning 🔍 Higher resolution & improved quality 📈 Longer sequences, fewer artifacts 🖥️ Native ComfyUI support Bringing open-source AI video creation to the next level.

Lightricks

31,740 views • 1 year ago

Fine-tune 100+ LLMs directly from a UI! LLaMA-Factory lets you train and fine-tune open-source LLMs and VLMs without writing any code. Supports 100+ models, multimodal fine-tuning, PPO, DPO, experiment tracking, and much more! 100% open-source with 50k stars!

Avi Chawla

557,355 views • 1 year ago

Fine-tune 100+ LLMs directly from a UI! LLaMA-Factory lets you train and fine-tune open-source LLMs and VLMs without writing any code. Supports 100+ models, multimodal fine-tuning, PPO, DPO, experiment tracking, and much more! 100% open-source, 51k+ stars 🌟

Akshay 🚀

60,163 views • 1 year ago

🚀 LTXV 0.9.5 is here! Our latest AI video model brings: ✅ Commercial licensing 🎬 Keyframe conditioning 🔍 Higher resolution & improved quality 📈 Longer sequences, fewer artifacts 🖥️ Native ComfyUI support Bringing open-source AI video creation to the next level.

LTX Studio

65,576 views • 1 year ago

Super excited to launch a new AI course! 🚀 Fine-Tuning & Reinforcement Learning for LLMs: Intro to Post-Training A collaboration between AMD 🤝 Andrew Ng’s DeepLearning.AI to give every developer the tools & compute to work with the same post-training techniques, used across today’s leading AI labs. 🎓 Learn for free → 🧵

Sharon Zhou

20,386 views • 7 months ago