Apple FastVLM-7B: Efficient Vision Encoding for Vision Language Models. Larger variants using the Qwen2-7B LLM outperform recent works like Cambrian-1-8B while using a single image encoder, with a 7.9x faster time-to-first-token (TTFT). Vibe coding a video captioning app with it in anycoder.

AK

433,003 subscribers

60,588 views • 9 months ago • via X (Twitter)

Science, Technology, Education

Related videos

Apple released FastVLM, so I tried vibe coding a video captioning AI app with it. It took 5 prompts to get a working app in anycoder, and I deployed it on Hugging Face. FastVLM is 85x faster and 3.4x smaller than comparably sized VLMs. The deployed app works 100% locally in your browser, powered by transformers.js and WebGPU.

AK

42,677 views • 9 months ago

Apple just released and open-sourced FastVLM! FastVLM is a lightning-fast vision-language model that combines rapid image and text understanding with efficient on-device performance. 100% Open Source

Sumanth

43,685 views • 9 months ago

Day 88: vibe coding my AI video editor with SwiftUI. It's insane how Apple provides everything, like removing the background of a video without using a single token.

Meng To

14,152 views • 27 days ago

Structured Output from Multipage PDF with Sparrow (Qwen2 Vision LLM and MLX). I explain how multipage PDFs are handled in Sparrow to extract structured data in a single call.

Andrej Baranovskij

30,645 views • 1 year ago

Introducing Meta Perception Encoder: a vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models. Learn more about Meta Perception Encoder, read the research paper, and download the code and dataset

AI at Meta

74,531 views • 1 year ago

Inference on your head: Mistral 7B (4-bit quantized) running locally on Apple Vision Pro

Joseph Semrai

239,718 views • 2 years ago

Planning with Reasoning using Vision Language World Model

AK

26,274 views • 9 months ago

Instant AutoGPT Launcher: I wrote a launcher for AutoGPT today. Play with LLM agents using a ComfyUI-like node-based interface. It even works with a local LLM via ollama (shown in the video). Works on all platforms (Windows, Linux, Mac). Install with 1 click. Use with 0 clicks.

cocktail peanut

11,405 views • 1 year ago

Reasoning on Apple Vision Pro with Apple MLX and DeepSeek R1 Qwen 7B 4bit! 🔥 14 tokens per sec! 🔥 Note: sorry for the shaking footage, I was excited to see it running 😂

Ivan Fioravanti ᯅ

24,324 views • 1 year ago

#M5StackNew 🎊 The LLM630 Compute Kit is an #AI large language model (#LLM) inference development kit. Powered by the #Axera #AX630C SoC with a 3.2 TOPS NPU, it delivers efficient AI inference for tasks like computer vision (CV) and LLM processing.

M5Stack

16,383 views • 1 year ago

Why pay $3500 for the Apple Vision Pro? Control a robot with your hands using the phospho dev kit

Pierre-Louis Biojout (PLB)

39,794 views • 1 year ago

NEW VIDEO - What Using Apple Vision Pro is Actually Like! Full 35 minutes in-depth video:

Marques Brownlee

2,563,944 views • 2 years ago

Drawing on flat surfaces with Logitech Muse on Apple Vision Pro. And yes, it has haptic feedback whenever you start drawing or touching a window (using the TouchDesk app).

Brad Lynch

16,712 views • 7 months ago

SpatialGen just announced Zeus, a dedicated hardware system for live Apple Immersive Video streaming. It supports live 16K immersive video encoding with 90 FPS streaming. This is a huge step toward more live immersive experiences on Apple Vision Pro.

Spatial Insider

13,487 views • 1 month ago

I just built a RAG Agent with web search using Cohere's ⌘R 7B model. 100% Opensource Code with step-by-step tutorial.

Shubham Saboo

47,032 views • 1 year ago

MiniCPM-V 4.5 achieves an average score of 77.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-latest and Gemini-2.0 Pro, as well as strong open-source models like Qwen2.5-VL 72B. Powered by a new unified 3D-Resampler over images and videos, MiniCPM-V 4.5 can now achieve a 96x compression rate for video tokens, where six 448x448 video frames can be jointly compressed into 64 video tokens (normally 1,536 tokens for most MLLMs). Vibe coding a Video Chat AI app with MiniCPM-V-4.5 in anycoder.

AK

19,012 views • 9 months ago

I built a multimodal AI Coding Agent team with multi-agents. It has 3 AI agents working together as a team to generate and execute the code: (1) Coding Agent using o-3 mini, (2) Vision Agent using Gemini, (3) Code Execution Agent using o-3 mini and E2B. 100% Opensource Code.

Shubham Saboo

42,269 views • 1 year ago

Designing an Encoder for Fast Personalization of Text-to-Image Models. TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps. abs: project page:

AK

165,158 views • 3 years ago

I turned my living room into a full-on Blockbuster with Apple Vision Pro! Posters. Shelves. Trailers. The works. The app is called ReelRoom and I love it! If you miss the Friday night video store vibe, this brings it all back.

Justin Ryan ᯅ

52,920 views • 1 year ago