LM Studio 0.3.10 is here with 🔮 Speculative Decoding! This provides inference speedups, in some cases 2x or more, with no degradation in quality.
- Works for both GGUF/llama.cpp and MLX models!
- Easily experiment with different draft models
- Visualize accepted draft token % rate
- Works in...

LM Studio

50,007 subscribers

73,791 views • 1 year ago • via X (Twitter)

Gaming Science & Technology

Related Videos

New Llama.cpp UI is a blessing for the local AI world 🌎 - Blazing fast, beautiful, and private (ofc) - Use 150,000+ GGUF models in a super slick UI - Drop in PDFs, images, or text documents - Branch and edit conversations anytime - Parallel chats and image processing - Math and code rendering - Constrained generation with JSON schema supported Easy setup + open-source + community-built 🔥

Victor M

161,109 views • 7 months ago

Batching for vision models is now available in Beta with our latest MLX engine update 👾 The updated engine also brings major improvements to caching for faster inference overall. Turn on Developer Mode, choose the beta runtime channel, and select LM Studio MLX v1.8.1.

LM Studio

46,015 views • 1 month ago

Jan Desktop v0.7.7 is live 💛 This update brings native MLX support on macOS, a broader UX and UI refresh across the app, and better support for developer workflows. You can now upload files in Projects, use the local API server with both local and remote models, and work more smoothly with tools like Claude Code and other CLIs. Update your Jan or download the latest version at

👋 Jan

28,378 views • 4 months ago

THIS AI IS WILD chatgpt, claude and gemini and 3 more models in one app, you can chat with them at the same time, one shared brain that knows you. no more switching models. no more losing context. try here:

Farhan

20,548 views • 15 days ago

chat app with the new mlx foundation models

xavier

58,451 views • 1 year ago

What I have in the works for future Minecraft and Roblox avatars. Thicc OC models are in the works as well. This one is a WIP.

DangerDrip

188,675 views • 2 years ago

Introducing Parallel Requests for MLX! Multiple requests to the same model can now be processed simultaneously ✨🚄⚡️ Works both in the API and in Split View chats. See it in action 👇🕺

LM Studio

40,763 views • 4 months ago

HOW TO CREATE ANIME EYEBALL EFFECT IN 3D - thread This assumes you already know how to model in Blender and set up materials in Unity! This method works with .vrm models. 1/?

刻矛 MAO 🔆🪱🔆 3D Modeler

64,540 views • 3 years ago

Devin for Terminal is a local agent that works with all frontier models, including Opus 4.7, GPT 5.5, and SWE-1.6. You can switch models mid-session, or hand off to Devin in the cloud.

Cognition

10,129,242 views • 1 month ago

LM Studio 0.3.4 ships with Apple MLX 🚢🍎 Run on-device LLMs super fast, 100% locally and offline on your Apple Silicon Mac! Includes: > run Llama 3.2 1B at ~250 tok/sec (!) on M3 > enforce structured JSON responses > use via chat UI, or from your own code > run multiple models simultaneously > download any model from Hugging Face Video at 1x speed.

LM Studio

171,577 views • 1 year ago

Jan v0.6.9 is here: Chat with images now. Highlights: - Multimodal models can see images - MCP Server is now a stable feature - Tool calling for gpt-oss (upstream llama.cpp upgrade) - Auto-detect tools & vision - no more manual setup Update your Jan or download the latest.

👋 Jan

22,604 views • 9 months ago

After months of work, and with the help of our awesome community, we're excited to finally share LM Studio 0.3.0! 🎉 🔥 What's new: - Built-in Chat with Documents, 100% offline - OpenAI-like 'Structured Outputs' API with any local model - Total UI revamp (with dark/light/sepia themes) - Load & serve multiple LLMs *on the local network* - Available in 7 languages! 🌎🌍🌏 - Download any supported model from Hugging Face - Update LLM runtimes (llama.cpp) separately from the app ... and tons more goodies! Let us know how you like it! 👾🤝

LM Studio

142,578 views • 1 year ago

We made it easy to run local models on your computer and use them from your phone. A secure, end-to-end encrypted connection in just a few clicks. Available now with LM Studio and Locally AI - Local AI Chat.

Adrien Grondin

12,045 views • 15 days ago

This symmetric diffusion paper at ICLR is nice (simple idea in retrospect): SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models We'd actually implemented this idea internally at Orbital, and it works nicely even for very large crystal structures:

Mark Neumann

18,188 views • 1 year ago

Google Places API meets shadcn ui. ◆ Highlight matching text in suggestions ◆ Simple loading state ◆ Fetch best-matching places and by ID ◆ Works with any country code Inspired by Stripe checkout.

Maximilian Kaske 🏓

80,294 views • 1 year ago

Working with multiple models in Chat? The model picker in VS Code is now organized by provider, making it easier to browse, search, and switch between models. You'll also see provider names next to your recent models for quicker recognition. 💡 Tip: Use /models for quick access.

Visual Studio Code

32,599 views • 1 month ago

we sped up distributed inference by up to 5x with decentralized speculative decoding (dsd). many don't realize that AI models normally generate text one single word at a time, waiting for the network after every word. speculative decoding changes this by using a "guess & confirm" system, similar to autocomplete. how it's done:
1. draft locally (the guess): a tiny, fast model on your device guesses the next few words instantly, without waiting for the network.
2. confirm remotely (the check): the massive remote model doesn't generate from scratch; it just checks the draft. it looks at the guesses in a batch and says "yes, yes, no." you get multiple words in the time it usually takes to get one.
3. adaptive logic: dsd is smart. if the topic is creative, it lets the draft flow loose. if the topic is math or code, it checks more strictly. it balances speed and precision automatically so your inference feels almost instant.
find out more: paper: blog:

Parallax

45,129 views • 5 months ago
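
The "guess & confirm" loop described in the Parallax post above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the implementation used by Parallax, LM Studio, or llama.cpp: both "models" are hypothetical stand-in functions that return the next token for a prefix, decoding is greedy, and the verification loop runs position by position where a real engine would score all drafted positions in one batched pass. Implementations that sample from the model typically use a probabilistic acceptance rule instead of the exact-match check shown here, and the adaptive strictness mentioned in the post is not modeled.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical model interface: maps a token prefix to the next token (greedy).
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> Tuple[List[Token], int]:
    """Draft k tokens, keep the longest run the target agrees with.

    Returns (tokens to append, number of drafted tokens accepted).
    """
    # 1. Draft (the guess): the small model proposes k tokens in a row.
    guesses: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        guesses.append(tok)
        ctx.append(tok)

    # 2. Confirm (the check): compare each guess with the target's own choice.
    out: List[Token] = []
    ctx = list(prefix)
    accepted = 0
    for guess in guesses:
        expected = target(ctx)
        if expected != guess:
            # First mismatch: use the target's token and discard the rest.
            out.append(expected)
            return out, accepted
        out.append(guess)
        ctx.append(guess)
        accepted += 1

    # Every guess matched, so the target contributes one extra token.
    out.append(target(ctx))
    return out, accepted


def generate(prompt: List[Token], draft: NextToken, target: NextToken,
             max_new: int, k: int = 4) -> Tuple[List[Token], float]:
    """Generate max_new tokens; also return the accepted-draft-token rate."""
    tokens = list(prompt)
    drafted = accepted = 0
    while len(tokens) - len(prompt) < max_new:
        new, ok = speculative_step(tokens, draft, target, k)
        tokens.extend(new)
        drafted += k
        accepted += ok
    return tokens[:len(prompt) + max_new], accepted / drafted


def greedy(prompt: List[Token], target: NextToken, max_new: int) -> List[Token]:
    """Baseline: the target model alone, one token per step."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


# Toy models: the target counts upward modulo 10; the draft agrees with it
# except after a 4, where it guesses wrong.
def target_model(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    out, rate = generate([0], draft_model, target_model, max_new=12, k=3)
    print(out, f"accepted draft tokens: {rate:.0%}")
```

Because a drafted token is kept only when it equals the target's own greedy choice, the output is identical to what the target would produce alone; the draft model affects only how many target steps are needed. The returned rate corresponds to the "accepted draft token %" idea mentioned in the LM Studio post at the top of this page.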

Qwen 3.6 models are now 2.5x faster on Atomic Chat with new MTP speedups. > MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on the memory moved per pass. Users can run Qwen 3.6 models locally via the open-source Atomic Chat to test them!

🚨 AI News | TestingCatalog

46,013 views • 1 month ago

Screenshotsaturday This week I made 36 draft animations for the Kimmerians Hockey game prototype. Some are more drafty, some less. The discount for non-pixel-art animations still applies)

Andrey Gogiya

12,026 views • 1 year ago