DeepSeek's latest (MLA, Multi-Token Prediction, 256 Experts, FP8 block quantization) shines with vLLM. Catch the office hours session where we discuss all the DeepSeek goodies and explore their integration and benchmarks with #vLLM.

Red Hat AI

6,642 subscribers

14,093 views • 1 year ago • via X (Twitter)

Science & Technology #vLLM

Comments: 2

Neural Magic (Acquired by Red Hat) • 1 year ago

@vllm_project You can see the session slides here:

Lab4crypto • 1 year ago

🚨 New Weekly Quant Analysis! 🚨 Dive into my free, in-depth crypto market analysis, which is delivered straight to your inbox! 📊📬 🔍 Quantitative insights, growth trends, and risk management. 👀 Check out the preview below 👇 and subscribe now for early access!

Related videos

InferenceMAX, vLLM TPU, compressed-tensors, MoE support via transformers, DeepSeek-OCR, and more. Here’s what’s new in the vLLM community over the past two weeks:

Red Hat AI

24,429 views • 7 months ago

"I look at [DeepSeek's] work, and I'm like, 'Ah, a kindred soul'." Anthropic researcher Sholto Douglas and Trenton Bricken explain how DeepSeek papers reveal good research taste. "They very clearly understand this dance between hardware systems and algorithms." More technical discussion on MLA, multi-token prediction, MoE, and other DeepSeek innovations in the full episode.

Dwarkesh Patel

36,630 views • 1 year ago

DeepSeek-V4-Flash-Spark and Spark-Mini were born today. 1 command setup to Pi / vllm-studio / docker deployment. All this is running local on a single DGX Spark.

0xSero

20,019 views • 17 days ago

Dozens of teams have asked my advice on running LLMs. How fast is DeepSeek V3 with vLLM on 8 GPUs? What's the max throughput of Qwen 2.5 Coder with SGLang on one H100? Running & sharing benchmarks ad hoc was too slow, so we built a tiny app, the LLM Engine Advisor.

Charles 🎉 Frye

86,348 views • 1 year ago

Among the fastest DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B inference in the market, per Artificial Analysis benchmarks (April 2026). ⚡️🤖 Sub-1-second TTFT. 230 tokens per second. Co-designed every layer of the stack with Inferact, performance optimized vLLM, all on NVIDIA HGX B300. Live on DigitalOcean Serverless Inference now. Full breakdown in the comments. ⬇️

DigitalOcean

39,063 views • 1 month ago

Completed a first hour side-by-side comparison between Qwen3.5 27b and Qwen3.6 27b on the same 4 canvas coding tests. Running the Qwen3.6 27b FP8, vLLM. What do you think?

stevibe

114,070 views • 1 month ago

Testing DeepSeek R1 in VSCode with CodeGPT Step-by-step guide to connect: ✨ Select "LLMs Cloud model" ✨ Choose DeepSeek as the provider ✨ Pick the "deepseek-reasoner" model ✨ Select code and/or files from your project and send them to the model That’s it! You’re all set to use this amazing DeepSeek model... 🚀 P.S.: Update to the latest version of CodeGPT to start using it!

Daniel San

34,346 views • 1 year ago

DeepSeek's R1 Model has shocked the World. Here is how it works. #DeepSeek #R1 #AI

Gaurav Sen

87,427 views • 1 year ago

Introducing DeepSeek based AI Agents! We built an Executive AI Assistant that can, - Listen to incoming emails - Fetch all your unread emails and summarize the important ones - Draft and Send reply emails Connect with Gmail, Calendar, Slack and more with Composio Frontend designed with Vercel v0 Use Deepseek and other LLMs with Groq Inc with AI SDK code:

Karan Vaidya

79,976 views • 1 year ago

Jaya and Didi discuss #DeepSeek and AI 😄😄

Danish Sait

249,425 views • 1 year ago

For many, DeepSeek's rise was unexpected. But what can we learn from prior internet waves about what might happen next? martin_casado and Steven Sinofsky joined the a16z Podcast to discuss what drove the DeepSeek frenzy and more importantly, what we should take away, through the lens of Internet history. Full episode here:

a16z

85,460 views • 1 year ago

I trained a 100 million parameter DeepSeek V3 LLM from scratch. Here's what you need to know.
Previously I trained traditional GPT-2 architecture which has become obsolete with recent LLM advancements. Most recent models like Llama, Mistral, DeepSeek, and GPT-4 use latest architectures.
✦ Model Configuration of my SLM DeepSeek V3
- Parameters: 109,032,032
- Embedding Dimension: 512
- Layers: 8
- Heads: 8
- Experts (MoE): 8
- Experts per token: 2
✦ DeepSeek brings major architectural changes:
- Multi Head Latent Attention
- Mixture of Experts
- RMS Norm
- Multi Token Prediction
✦ Dataset Challenge
- TinyStories is great for learning SLMs. I trained GPT-2 on it previously with good results.
- But I needed a more challenging dataset.
- If I use TinyStories again on DeepSeek, how would I know MHLA, MoE or MTP works better than old architecture?
- The old architecture can handle it, so new DeepSeek would too without utilizing latest advancements.
That's why I moved to FineWeb-Edu dataset. Thanks Yuvraj Singh (smolhub.com) for the suggestion for this dataset.
✦ Training Journey
- Rented A100 PCIe GPU and trained the model.
- Did test runs. During final run, model was 65% trained but stopped due to glitch after 4 hours.
- Fixed all edge cases and ran training again with increased config parameters.
- Final training: 7 hours, 20,000 epochs
Total GPU cost: $17
- $9.53 for main 7-hour run
- $7.42 for experiments and demos
✦ Reflection
Amazing long project that taught me latest architectural advancements. I'll reimplement and revisit after a few weeks because there's too much complexity, mostly in Multi Head Latent Attention part. Need to make concepts stronger.
Code Final trained Model Dataset Resources
Huge shoutout to Raj Dandekar again for creating one of the most detailed video series about DeepSeek - this was my primary resource for the implementation. Playlist
Blogs by Maarten Grootendorst: These are excellent visual blogs to understand MoE in detail. Thanks Maarten for your amazing contributions to the community through your books and blogs. Blogs on MoE
Implementation of MoE from scratch by @aviTwit3: One of the most detailed blogs on implementing Mixture of Experts. Thanks Avinash for this blog - it helped me understand Mixture of Experts much better.
If you're someone in the ML & LLM space, would love to connect and discuss this field in general, so give a follow up for that.

Mayank Pratap Singh

48,005 views • 11 months ago

introducing simple-llm: a ~950 line, powerful & extensible inference engine that performs on par with vllm. enjoy :) performance (gpt-oss-120b, on an h100): - batch=1: 135 tok/s (vllm: 138) - batch=64: 4,041 tok/s (vllm: 3,846)

naklecha

59,730 views • 5 months ago

DeepSeek-R1-0528 is now live on Hyperbolic’s Serverless Inference! We also are the first to serve the latest DeepSeek model on Hugging Face. 🟣 Run it instantly:

Hyperbolic

33,228 views • 1 year ago

Boy, have we got news for you! 🧑‍🍳 The @openwork_ai team is happy to announce: - Amazon Bedrock integration, contributed by our friends at Amazon Web Services (thanks guys!) - Native DeepSeek integration - Integration with and LiteLLM (YC W23)! Here's what it looks like >>

Or Hiltch

41,115 views • 4 months ago

I’ve tested Perplexity + DeepSeek r1 and it works like a charm. Amazing integration

Chubby♨️

19,182 views • 1 year ago

JUST IN: BLACK-OWNED STARTUP AI LAB NIGGACHAIN LABS UNVEILS THEIR LATEST AI MODEL. OUTPERFORMS LATEST OpenAI AND DeepSeek MODELS

Niggachain AI Layer 2 🧪⛓️

21,024 views • 1 year ago