Sebastian Raschka

@rasbt • 464,298 subscribers

ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)

Videos

Updated & turned my Big LLM Architecture Comparison article into a narrated video lecture. The 11 LLM architectures covered in this video: 1. DeepSeek V3/R1 2. OLMo 2 3. Gemma 3 4. Mistral Small 3.1 5. Llama 4 6. Qwen3 7. SmolLM3 8. Kimi 2 9. GPT-OSS 10. Grok 2.5 11. GLM-4.5

Sebastian Raschka

199,207 views • 9 months ago

Claude distillation has been a big topic this week while I am (coincidentally) writing Chapter 8 on model distillation. In that context, I shared some utilities to generate distillation data from all sorts of open-weight models via OpenRouter and Ollama:

Sebastian Raschka

62,458 views • 3 months ago

Excited to be launching a series of short videos (5 min or less) to optimize the training performance of your PyTorch models! Covering:
- mixed-precision training
- multi-GPU training strategies
- compiling PyTorch models
- finding good batch sizes

Sebastian Raschka

120,786 views • 3 years ago

When doing machine learning and AI research (or writing books), making the code reproducible is usually desirable. Often, that's easier said than done! So, I recorded a video illustrating and dealing with 6 sources of randomness that occur when training deep neural networks and LLMs: 1. Model weight initialization 2. Dataset sampling and shuffling 3. Nondeterministic algorithms 4. Different runtime algorithms 5. Hardware and drivers 6. Randomness in generative AI models
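
The first two sources in that list (weight initialization and dataset shuffling) can be illustrated with a minimal sketch. This is not the code from the video; it uses only Python's standard `random` module, and the helper names `init_weights` and `shuffled_indices` are hypothetical. In PyTorch, the analogous first step would typically be a call to `torch.manual_seed`.

```python
import random


def init_weights(n, seed):
    """Draw n small Gaussian 'weights' from a dedicated, seeded generator.

    Hypothetical helper for illustration; a private random.Random instance
    keeps the result independent of any other code touching global state.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(n)]


def shuffled_indices(n, seed, epoch):
    """Return a reproducible permutation of range(n) for a given epoch.

    Deriving the generator from (seed + epoch) gives a different order
    each epoch while keeping every epoch repeatable across runs.
    """
    rng = random.Random(seed + epoch)
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


if __name__ == "__main__":
    # Same seed -> identical initial weights across runs.
    print(init_weights(3, seed=123) == init_weights(3, seed=123))
    # Same seed and epoch -> identical data order across runs.
    print(shuffled_indices(8, seed=123, epoch=0) == shuffled_indices(8, seed=123, epoch=0))
```

Seeding like this only addresses the first two items; nondeterministic algorithms, runtime algorithm selection, and hardware or driver differences need separate handling.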

Sebastian Raschka

81,670 views • 2 years ago

I just added the new Llama 3.2 1B and 3B models to LitGPT, the open-source LLM library I help develop (focused on efficiency and code readability). LitGPT allows you to fine-tune and use these models on the cloud or a laptop. So, if you are looking for something to play with this weekend:

# 1) Finetune the model
litgpt finetune_lora meta-llama/Llama-3.2-1B \
  --data JSON \
  --data.json_path my_custom_dataset.json \
  --train.epochs 1 \
  --out_dir out/llama-3.2-finetuned \
  --precision bf16-true

# 2) Chat with the model
litgpt chat out/llama-3.2-finetuned/final

# 3) Serve the model via an API endpoint
litgpt serve out/llama-3.2-finetuned/final

Sebastian Raschka

65,529 views • 1 year ago

One of the Lightning 2.0 highlights is the open source Fabric library. Fabric lets you scale your PyTorch code with only a few lines of code. Used the pre-release for my research & it's amazing! Here's a quick demo + code:

Sebastian Raschka

74,833 views • 3 years ago

Excited to share Unit 10, the finale of my latest course! 🎉 My aim is to provide a modern take on learning deep learning & AI using open-source libraries. And I hope you enjoyed the journey from backpropagation to implementing LLMs in PyTorch!

Sebastian Raschka

69,415 views • 3 years ago
