Loading video...

Video Failed to Load

There was a problem loading this video. This could be due to a temporary network issue or the video might be unavailable.

PSA: DeepSeek R1 Distill Llama 70B speculative decoding version is now live on Groq Inc for Dev Tier. We just made fast even faster for instant reasoning. 🏁

Hatice Ozen

5,760 subscribers

44,675 views • 1 year ago •via X (Twitter)

Science & Technology Education

Anya Rossi• Live Now

Private livecam show

11 Comments

Hatice Ozen1 year ago

1/5 What is speculative decoding? It's a technique that uses a smaller, faster model to predict a sequence of tokens, which are then verified by the main, more powerful model in parallel. The main model evaluates these predictions and determines which tokens to keep or reject.

Hatice Ozen1 year ago

2/5 Speculative decoding achieves faster inference because the main model can verify multiple tokens in parallel rather than generating them one-by-one. This parallel verification is significantly faster than traditional sequential token generation.

Hatice Ozen1 year ago

3/5 Think of it like pair programming where your junior dev (small model) writes the first draft of code, and the senior dev (large model) reviews and corrects it. When the junior gets it right and the draft aligns, you save a lot of time.

Hatice Ozen1 year ago

4/5 The efficiency comes from parallel verification - while the main model still verifies each token, it can do this simultaneously for many tokens. When wrong? No problem, the main model corrects course. This means much faster inference without having to compromise on quality.

Hatice Ozen1 year ago

5/5 Really excited for you all to try it. Will get around to doc updates, but you can just use the `deepseek-r1-distill-llama-70b-specdec` model ID to try. Let us know what else you'd like to see below and have fun building with instant reasoning! 💪

Ben Everman1 year ago

@GroqInc Any plans for 670B?

Jasper1 year ago

@GroqInc The speed is insane! Would love to have your machines and models on our platform

Mike Sulka1 year ago

@GroqInc Great stuff!

Charlie Greenman1 year ago

@GroqInc cool

Rish e/acc1 year ago

@GroqInc This is fkin awesome, I hadn’t heard of speculative deckding. Can you recommend any literature etc on the subject?

Hatice Ozen1 year ago

@GroqInc 100% agree and recommend this white paper to learn more:

Related Videos

deepseek-r1-distill-llama-70b on groq

deepseek-r1-distill-llama-70b on groq

Developers Digest

143,277 views • 1 year ago

HOLY FUNIONS! Groq makes DeepSeek R1 feel instant! Cooked up a chatbot using their Distill Llama 70B. Releasing the repo tomorrow. You control your data flow.

HOLY FUNIONS! Groq makes DeepSeek R1 feel instant! Cooked up a chatbot using their Distill Llama 70B. Releasing the repo tomorrow. You control your data flow.

Ray Fernando

54,028 views • 1 year ago

DeepSeek R1 on Groq is insanely fast Deepseek R1 70B Distill + Groq LPU its now available in ai-gradio pip install --upgrade "ai-gradio[groq]" import gradio as gr import ai_gradio gr.load( name='groq:deepseek-r1-distill-llama-70b', src=ai_gradio.registry, coder=True # remove coder for regular chat ).launch() prompt: write a script for a bouncing yellow ball within a square, make sure to handle collision detection properly. make the square slowly rotate. implement it in p5.js. make sure ball stays within the square

DeepSeek R1 on Groq is insanely fast Deepseek R1 70B Distill + Groq LPU its now available in ai-gradio pip install --upgrade "ai-gradio[groq]" import gradio as gr import ai_gradio gr.load( name='groq:deepseek-r1-distill-llama-70b', src=ai_gradio.registry, coder=True # remove coder for regular chat ).launch() prompt: write a script for a bouncing yellow ball within a square, make sure to handle collision detection properly. make the square slowly rotate. implement it in p5.js. make sure ball stays within the square

AK

69,829 views • 1 year ago

Stagehand is getting really fast ... See how Llama 3 70b on Groq Inc outperforms OpenAI 4o in speed on this form fill 54.48 seconds for groq VS 62.75 for 4o🤯

Stagehand is getting really fast ... See how Llama 3 70b on Groq Inc outperforms OpenAI 4o in speed on this form fill 54.48 seconds for groq VS 62.75 for 4o🤯

Kyle Jeong

26,095 views • 1 year ago

HUGE PSA: Meta's Llama 4 Scout (17Bx16E MoE) is now live on Groq Inc for all users via console playground and Groq API. This conversational beast with native multimodality just dropped today and we're excited to offer Day 0 support so you can build fast.

HUGE PSA: Meta's Llama 4 Scout (17Bx16E MoE) is now live on Groq Inc for all users via console playground and Groq API. This conversational beast with native multimodality just dropped today and we're excited to offer Day 0 support so you can build fast.

Hatice Ozen

15,221 views • 1 year ago

PSA: Qwen's Qwen-2.5-Coder-32B-Instruct is now live on Groq Inc for insanely fast (and smart) code generation. See below for instructions to add to Cursor.

PSA: Qwen's Qwen-2.5-Coder-32B-Instruct is now live on Groq Inc for insanely fast (and smart) code generation. See below for instructions to add to Cursor.

Hatice Ozen

36,643 views • 1 year ago

Today, we're building a Streamlit app to compare MetaAI's Llama 4 against DeepSeek-R1 using RAG. Tech stack: - @Llama_Index workflows for orchestration - Comet Opik for evaluation - Groq Inc for blazing-fast inference (FREE) Let's go! 🚀

Today, we're building a Streamlit app to compare MetaAI's Llama 4 against DeepSeek-R1 using RAG. Tech stack: - @Llama_Index workflows for orchestration - Comet Opik for evaluation - Groq Inc for blazing-fast inference (FREE) Let's go! 🚀

Akshay 🚀

23,338 views • 1 year ago

Deepseek 70B Distill is now on for those of you wanting to try it out and it's available in for developers!

Deepseek 70B Distill is now on for those of you wanting to try it out and it's available in for developers!

Groq Inc

40,817 views • 1 year ago

PSA: Prompt caching is now live for moonshotai/kimi-k2-instruct on Groq Inc: → 50% discount for cached tokens → Lower latency → Automatic prefix matching Excited to roll this out starting with Kimi K2 and even more excited for inexpensive, fast vibe coding for all. ⚡️

PSA: Prompt caching is now live for moonshotai/kimi-k2-instruct on Groq Inc: → 50% discount for cached tokens → Lower latency → Automatic prefix matching Excited to roll this out starting with Kimi K2 and even more excited for inexpensive, fast vibe coding for all. ⚡️

Hatice Ozen

67,965 views • 10 months ago

PSA: Kimi.ai just launched a huge update to Kimi-K2-0905 and it's now live on Groq Inc for instant inference. Highlights? 256K context window + improvements to coding/tool calling capabilities that outperform Claude Sonnet 4. 🚀

PSA: Kimi.ai just launched a huge update to Kimi-K2-0905 and it's now live on Groq Inc for instant inference. Highlights? 256K context window + improvements to coding/tool calling capabilities that outperform Claude Sonnet 4. 🚀

Hatice Ozen

21,842 views • 9 months ago

Using LLaMA 3 70B on Groq Inc to instantly refactor and document code. The implications for software engineering are wild. Gone are the days of waiting on an LLM for suggestions or code changes. Now, it's an instant feedback loop. Demo link in the comments:

Using LLaMA 3 70B on Groq Inc to instantly refactor and document code. The implications for software engineering are wild. Gone are the days of waiting on an LLM for suggestions or code changes. Now, it's an instant feedback loop. Demo link in the comments:

Matt Shumer

442,655 views • 2 years ago

PSA: OpenAI is putting the Open back in OpenAI and Groq Inc has Day 0 support. 🤗 GPT-OSS 20B and 120B, hybrid-reasoning models with built-in browser search and code execution are now live for instant inference. P.S. We've also launched OpenAI Responses API compatibility.

PSA: OpenAI is putting the Open back in OpenAI and Groq Inc has Day 0 support. 🤗 GPT-OSS 20B and 120B, hybrid-reasoning models with built-in browser search and code execution are now live for instant inference. P.S. We've also launched OpenAI Responses API compatibility.

Hatice Ozen

80,584 views • 10 months ago

What can you do with Llama quality and Groq speed? Instant. That's what. 3 months back: Llama 8B running at 750 Tokens/sec Now: Llama 70B model running at 3,200 Tokens/sec We're still going to get a liiiiiiitle bit faster, but this is our V1 14nm LPU - how fast will V2 be? 😉

What can you do with Llama quality and Groq speed? Instant. That's what. 3 months back: Llama 8B running at 750 Tokens/sec Now: Llama 70B model running at 3,200 Tokens/sec We're still going to get a liiiiiiitle bit faster, but this is our V1 14nm LPU - how fast will V2 be? 😉

Jonathan Ross

179,815 views • 1 year ago

Lovable + DeepSeek R1 = 🏃💨 😉 This will significantly improve prompt curation and PRD development @lovable_dev Add-Ons v1.1.3 Download: @lovable_dev Groq Inc DeepSeek #Lovable #aicode #DeepSeekAI #groq

Lovable + DeepSeek R1 = 🏃💨 😉 This will significantly improve prompt curation and PRD development @lovable_dev Add-Ons v1.1.3 Download: @lovable_dev Groq Inc DeepSeek #Lovable #aicode #DeepSeekAI #groq

Rezaul Arif

18,818 views • 1 year ago

Llama2 70B - blazing fast, low wattage, low cost Groq Inc LPU (language processing unit) running on Groq racks put together in days. High T/s, low T/watt, low $/T specifically for inference.

Llama2 70B - blazing fast, low wattage, low cost Groq Inc LPU (language processing unit) running on Groq racks put together in days. High T/s, low T/watt, low $/T specifically for inference.

Jay Z

15,799 views • 2 years ago

Running multi agents on open source models has always been a hit or miss until... Llama 3.1 on Groq Inc with 405b and 70b

Running multi agents on open source models has always been a hit or miss until... Llama 3.1 on Groq Inc with 405b and 70b

FlowiseAI

48,208 views • 1 year ago

🚨 BREAKING: NVIDIA JUST announced roadmap for physical AI, robotics and national-scale AI factories. Here’s a breakdown of the top important announcements: 🧵👇 1. DeepSeek R1 is now 4x faster, setting the standard for AI in inference and reasoning.

🚨 BREAKING: NVIDIA JUST announced roadmap for physical AI, robotics and national-scale AI factories. Here’s a breakdown of the top important announcements: 🧵👇 1. DeepSeek R1 is now 4x faster, setting the standard for AI in inference and reasoning.

Shruti

1,722,831 views • 1 year ago