
TensorBlock
@tensorblock_aoi • 1,737 subscribers
Making AI accessible and democratic for all. https://t.co/5CODh8MUDk
Videos

QwQ-32B (Qwen) model sharding between an M1 MacBook Pro (16GB) and an RTX 4060 Ti. Enables efficient inference through model quantization and cross-device parallel computing, demonstrating production-ready performance on consumer hardware. Technical benchmarks coming soon.
TensorBlock • 23,874 views • 1 year ago

Successfully deployed DeepSeek R1 Distilled 70B (AWQ) across 8x NVIDIA RTX 3080 10G GPUs, achieving 60 tokens/s with full tensor parallelism over PCIe. Total hardware cost: $6,400. This demonstrates that consumer GPUs can deliver substantial ML inference capabilities at a fraction of the cost of datacenter hardware. For perspective, a single A100 80G costs $17,550, while an H100 80G runs $25,000. Our testing validates an often-overlooked opportunity: millions of idle crypto mining rigs could be repurposed into a powerful, distributed AI infrastructure. The performance-to-cost ratio of consumer GPUs, especially when properly optimized for tensor operations, presents a compelling case for decentralized AI compute. We're excited to share more insights as we continue pushing the boundaries of consumer hardware in AI workloads.
TensorBlock • 15,896 views • 1 year ago
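
The cost comparison in the post above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the post (8 GPUs, 10 GB each, $6,400 total, 60 tokens/s, and the two datacenter card prices). The function names are illustrative, and because the post gives no throughput for the A100 or H100, the sketch compares purchase price and pooled VRAM only, not performance.

```python
# Figures quoted in the post; everything else is derived arithmetic.
RIG_GPU_COUNT = 8
RIG_VRAM_PER_GPU_GB = 10
RIG_TOTAL_COST_USD = 6_400      # stated as total hardware cost, not GPU-only
RIG_TOKENS_PER_SECOND = 60

DATACENTER_CARD_PRICES_USD = {
    "A100 80G": 17_550,
    "H100 80G": 25_000,
}


def rig_summary():
    """Derive per-unit figures for the 8x RTX 3080 rig."""
    return {
        # Upper bound per GPU: assumes the whole budget went to GPUs.
        "max_cost_per_gpu_usd": RIG_TOTAL_COST_USD / RIG_GPU_COUNT,
        # Pooled VRAM across all cards (matches one 80G datacenter card).
        "total_vram_gb": RIG_GPU_COUNT * RIG_VRAM_PER_GPU_GB,
        # Hardware dollars spent per token/s of measured throughput.
        "usd_per_token_per_second": RIG_TOTAL_COST_USD / RIG_TOKENS_PER_SECOND,
    }


def price_ratio(card_price_usd):
    """How many times the whole rig's cost a single card costs."""
    return card_price_usd / RIG_TOTAL_COST_USD


if __name__ == "__main__":
    for key, value in rig_summary().items():
        print(f"{key}: {value:.2f}")
    for name, price in DATACENTER_CARD_PRICES_USD.items():
        print(f"{name}: {price_ratio(price):.2f}x the rig's total cost")
```

On these numbers the rig pools the same 80 GB of VRAM as a single datacenter card at roughly a third to a quarter of the price. Pooled VRAM split across eight cards over PCIe is not equivalent to 80 GB on one device, so this is a price comparison rather than a like-for-like benchmark.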