Apple FastVLM-7B: Efficient Vision Encoding for Vision Language Models. Larger variants using the Qwen2-7B LLM outperform recent works like Cambrian-1-8B while using a single image encoder with a 7.9x faster TTFT. Vibe coding a video captioning app with it in anycoder.

AK

433,003 subscribers

60,588 views • 9 months ago • via X (Twitter)

Science & Technology, Education

Related Videos

Apple released FastVLM, so I tried vibe coding a video captioning AI app with it. It took 5 prompts to get a working app in anycoder, and I deployed it on Hugging Face. 85x faster and 3.4x smaller than comparable-sized VLMs. The deployed app works 100% locally in your browser, powered by transformers.js and WebGPU.

AK

42,677 views • 9 months ago

Apple just released and open-sourced FastVLM! FastVLM is a lightning-fast vision-language model that combines rapid image and text understanding with efficient on-device performance. 100% Open Source

Sumanth

43,685 views • 9 months ago

Day 88 vibe coding my AI video editor with SwiftUI. It's insane how Apple provides everything, like removing the background of a video without using a single token.

Meng To

14,152 views • 24 days ago

Structured Output from Multipage PDF with Sparrow (Qwen2 Vision LLM and MLX). I explain how multipage PDFs are handled in Sparrow to extract structured data in a single call.

Andrej Baranovskij

30,645 views • 1 year ago

Introducing Meta Perception Encoder: a vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models. Learn more about Meta Perception Encoder, read the research paper, and download the code and dataset

AI at Meta

74,531 views • 1 year ago

Inference on your head: Mistral 7B (4-bit quantized) running locally on Apple Vision Pro.

Joseph Semrai

239,718 views • 2 years ago

Instant AutoGPT Launcher. I wrote a launcher for AutoGPT today. Play with LLM agents using a ComfyUI-like node-based interface. It even works with local LLMs via ollama (shown in the video). Works on all platforms (Windows, Linux, Mac). Install with 1 click. Use with 0 clicks.

cocktail peanut

11,405 views • 1 year ago

Planning with Reasoning using Vision Language World Model

AK

26,274 views • 9 months ago

Reasoning on Apple Vision Pro with Apple MLX and DeepSeek R1 Qwen 7B 4bit! 🔥 14 tokens per sec! 🔥 Note: sorry for the shaking footage, I was excited to see it running 😂

Ivan Fioravanti ᯅ

24,324 views • 1 year ago

#M5StackNew 🎊 The LLM630 Compute Kit is an #AI large language model (#LLM) inference development kit. Powered by the #Axera #AX630C SoC with a 3.2 TOPS NPU, it delivers efficient AI inference for tasks like computer vision (CV) and LLM processing.

M5Stack

16,383 views • 1 year ago

Why pay $3500 for the Apple Vision Pro? Control a robot with your hands using the phospho dev kit

Pierre-Louis Biojout (PLB)

39,794 views • 1 year ago

NEW VIDEO - What Using Apple Vision Pro is Actually Like! Full 35 minutes in-depth video:

Marques Brownlee

2,563,944 views • 2 years ago

Drawing on flat surfaces with Logitech Muse on Apple Vision Pro. And yes, it has haptic feedback whenever you start drawing or touching a window (using the TouchDesk app).

Brad Lynch

16,712 views • 7 months ago

SpatialGen just announced Zeus, a dedicated hardware system for live Apple Immersive Video streaming. It supports live 16K immersive video encoding with 90 FPS streaming. This is a huge step toward more live immersive experiences on Apple Vision Pro.

Spatial Insider

13,487 views • 29 days ago

I just built a RAG Agent with web search using Cohere's ⌘R 7B model. 100% Opensource Code with step-by-step tutorial.

Shubham Saboo

47,032 views • 1 year ago

MiniCPM-V 4.5 achieves an average score of 77.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-latest and Gemini-2.0 Pro, and strong open-source models like Qwen2.5-VL 72B. Powered by a new unified 3D-Resampler over images and videos, MiniCPM-V 4.5 can now achieve a 96x compression rate for video tokens, where six 448x448 video frames can be jointly compressed into 64 video tokens (normally 1,536 tokens for most MLLMs). Vibe coding a Video Chat AI app with MiniCPM-V-4.5 in anycoder.

AK

19,012 views • 9 months ago

I built a multimodal AI Coding Agent team with multi-agents. It has 3 AI agents working together as a team to generate and execute the code: (1) a Coding Agent using o3-mini, (2) a Vision Agent using Gemini, and (3) a Code Execution Agent using o3-mini and E2B. 100% Opensource Code.

Shubham Saboo

42,261 views • 1 year ago

Designing an Encoder for Fast Personalization of Text-to-Image Models. TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps.

AK

165,158 views • 3 years ago

I turned my living room into a full-on Blockbuster with Apple Vision Pro! Posters. Shelves. Trailers. The works. The app is called ReelRoom and I love it! If you miss the Friday night video store vibe, this brings it all back.

Justin Ryan ᯅ

52,920 views • 1 year ago