Step 4 to achieve truly serverless GPUs for AI inference: skip over unserializable inference engine setup steps like CUDA graph capture and Torch compilation by stacking GPU snapshots and CPU snapshots.

Charles 🎉 Frye

18,786 subscribers

17,452 views • 1 month ago • via X (Twitter)
