
Sebastian Raschka
@rasbt • 464,298 subscribers
ML/AI research engineer. Ex-stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)
Videos

Updated & turned my Big LLM Architecture Comparison article into a narrated video lecture. The 11 LLM architectures covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi 2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5
Sebastian Raschka • 199,207 views • 9 months ago

Claude distillation has been a big topic this week while I have (coincidentally) been writing Chapter 8 on model distillation. In that context, I shared some utilities to generate distillation data from all sorts of open-weight models via OpenRouter and Ollama.
Sebastian Raschka • 62,458 views • 3 months ago
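The post above refers to utilities that are not included here. As a rough illustration of the idea only (not the author's code), here is a minimal Python sketch that collects teacher answers from a locally running Ollama server through its `/api/generate` REST endpoint (assuming the default port 11434 and non-streamed responses) and serializes prompt/answer pairs as JSON Lines. The model tag `llama3.2`, the `instruction`/`output` record keys, and the helper names are illustrative assumptions.

```python
import json
import urllib.request
from typing import Callable, Iterable

# Assumes a local Ollama server on its default port (started with `ollama serve`)
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(prompt: str, model: str = "llama3.2") -> str:
    """Send one prompt to Ollama and return the complete (non-streamed) response text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )  # passing data= makes this a POST request
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def build_distillation_records(
    prompts: Iterable[str], generate: Callable[[str], str]
) -> list[dict]:
    """Pair each prompt with the teacher model's answer (hypothetical record layout)."""
    return [{"instruction": prompt, "output": generate(prompt)} for prompt in prompts]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"
```

With a server running, `to_jsonl(build_distillation_records(prompts, ollama_generate))` yields a file-ready string. Because the teacher is just a `generate` callable, a function calling a hosted API (for example, OpenRouter's OpenAI-compatible chat completions endpoint) could be swapped in without changing the rest.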

Excited to be launching a series of short videos (5 min or less) on optimizing the training performance of your PyTorch models! Covering:
- mixed-precision training
- multi-GPU training strategies
- compiling PyTorch models
- finding good batch sizes
Sebastian Raschka • 120,786 views • 3 years ago
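One common heuristic for the batch-size topic, sketched here under stated assumptions and not necessarily the approach shown in the videos: double the batch size until it no longer fits, then binary-search between the last size that worked and the first that failed. `fits` is a hypothetical callback; in PyTorch it might run a single forward/backward pass and return `False` when `torch.cuda.OutOfMemoryError` is raised. The sketch assumes fitting is monotonic (if a size fits, every smaller size fits too).

```python
from typing import Callable


def find_max_batch_size(fits: Callable[[int], bool], start: int = 1, limit: int = 4096) -> int:
    """Return the largest batch size in [start, limit] for which fits(size) is True.

    Returns 0 if even `start` does not fit.
    """
    if not fits(start):
        return 0
    low, high = start, None  # low: largest size known to fit; high: first size known to fail
    size = start
    # Phase 1: keep doubling (capped at `limit`) until a size fails
    while size < limit:
        size = min(size * 2, limit)
        if fits(size):
            low = size
        else:
            high = size
            break
    if high is None:
        return low  # reached the limit without a failure
    # Phase 2: binary search between low (fits) and high (fails)
    while high - low > 1:
        mid = (low + high) // 2
        if fits(mid):
            low = mid
        else:
            high = mid
    return low
```

For example, `find_max_batch_size(lambda b: b <= 300)` probes 2, 4, ..., 512 and then narrows the range to return 300. In practice, the largest batch that fits is only an upper bound; leaving some memory headroom, or preferring a nearby power of two, is often a sensible choice.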

When doing machine learning and AI research (or writing books), making the code reproducible is usually desirable. Often, that's easier said than done! So, I recorded a video that illustrates and addresses 6 sources of randomness in training deep neural networks and LLMs:
1. Model weight initialization
2. Dataset sampling and shuffling
3. Nondeterministic algorithms
4. Different runtime algorithms
5. Hardware and drivers
6. Randomness in generative AI models
Sebastian Raschka • 81,670 views • 2 years ago
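As a minimal sketch of how several of these sources are typically controlled in Python (an assumption-based illustration, not the code from the video), the helper below seeds Python's built-in RNG and, only if they are installed, NumPy and PyTorch. Optionally it enables PyTorch's deterministic mode and disables cuDNN autotuning. The function names `set_seed` and `shuffled_indices` are hypothetical.

```python
import os
import random


def set_seed(seed: int, deterministic_torch: bool = False) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch if they are installed."""
    random.seed(seed)  # built-in RNG, e.g., used by random.shuffle
    try:
        import numpy as np

        np.random.seed(seed)  # NumPy's legacy global RNG
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # seeds the RNGs on all devices (weight init, dropout, ...)
    if deterministic_torch:
        # Needed by some cuBLAS ops under deterministic mode; it must be set
        # before the first cuBLAS call to take effect
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.use_deterministic_algorithms(True)  # raise on known nondeterministic ops
        torch.backends.cudnn.benchmark = False  # disable runtime autotuning of cuDNN algorithms


def shuffled_indices(n: int, seed: int) -> list[int]:
    """Shuffle dataset indices with a dedicated, seeded RNG (independent of global state)."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return indices
```

Seeding addresses weight initialization and data shuffling, and the deterministic flags target nondeterministic and autotuned kernels. Differences across hardware and driver versions cannot be removed by seeding. For generative models, the sampling step needs its own seeded generator to produce repeatable outputs.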

I just added the new Llama 3.2 1B and 3B models to LitGPT, the open-source LLM library I help develop (focused on efficiency and code readability). LitGPT allows you to fine-tune and use these models in the cloud or on a laptop. So, if you are looking for something to play with this weekend:

# 1) Finetune the model
litgpt finetune_lora meta-llama/Llama-3.2-1B \
  --data JSON \
  --data.json_path my_custom_dataset.json \
  --train.epochs 1 \
  --out_dir out/llama-3.2-finetuned \
  --precision bf16-true

# 2) Chat with the model
litgpt chat out/llama-3.2-finetuned/final

# 3) Serve the model via an API endpoint
litgpt serve out/llama-3.2-finetuned/final
Sebastian Raschka • 65,529 views • 1 year ago
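The fine-tuning command in the LitGPT post reads `my_custom_dataset.json`. Here is a small Python sketch for producing such a file, assuming the Alpaca-style layout (a JSON list of objects with `instruction`, optional `input`, and `output` keys) that LitGPT's JSON data option expects; please verify against the LitGPT data docs for your version. The helper name and the toy records are hypothetical.

```python
import json

# Keys assumed to be required by LitGPT's JSON data format ("input" is optional)
REQUIRED_KEYS = {"instruction", "output"}


def to_litgpt_json(records: list[dict]) -> str:
    """Validate instruction/output records and serialize them as one JSON list."""
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    return json.dumps(records, indent=2, ensure_ascii=False)


# Hypothetical toy examples
examples = [
    {"instruction": "Translate to German.", "input": "Good morning", "output": "Guten Morgen"},
    {"instruction": "What is the capital of France?", "input": "", "output": "Paris"},
]
```

The resulting string can be written with, e.g., `pathlib.Path("my_custom_dataset.json").write_text(to_litgpt_json(examples), encoding="utf-8")` and then passed via `--data.json_path`.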