Apple FastVLM-7B: Efficient Vision Encoding for Vision Language Models. Larger variants using the Qwen2-7B LLM outperform recent works like Cambrian-1-8B while using a single image encoder, with a 7.9x faster TTFT (time to first token). Vibe coding a video captioning app with it in anycoder.

AK

433,003 subscribers

60,588 views • 9 months ago • via X (Twitter)

Science & Technology, Education

Related videos

Apple released FastVLM, so I tried vibe coding a video captioning AI app with it. It took 5 prompts to get a working app in anycoder, and I deployed it on Hugging Face. FastVLM is 85x faster and 3.4x smaller than comparably sized VLMs. The deployed app runs 100% locally in your browser, powered by transformers.js and WebGPU.

AK

42,676 views • 9 months ago

Apple just released and open-sourced FastVLM! FastVLM is a lightning-fast vision-language model that combines rapid image and text understanding with efficient on-device performance. 100% Open Source

Sumanth

43,685 views • 9 months ago

Day 88 of vibe coding my AI video editor with SwiftUI. It's insane how much Apple provides, like removing the background of a video without using a single token.

Meng To

14,152 views • 16 days ago

Structured Output from Multipage PDF with Sparrow (Qwen2 Vision LLM and MLX). I explain how multipage PDFs are handled in Sparrow to extract structured data in a single call.

Andrej Baranovskij

30,645 views • 1 year ago

Introducing Meta Perception Encoder: a vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models. Learn more about Meta Perception Encoder, read the research paper, and download the code and dataset

AI at Meta

74,392 views • 1 year ago

Inference on your head: Mistral 7B (4-bit quantized) running locally on Apple Vision Pro

Joseph Semrai

239,697 views • 2 years ago

Instant AutoGPT Launcher. I wrote a launcher for AutoGPT today. Play with LLM agents using a ComfyUI-like node-based interface. It even works with local LLMs via Ollama (shown in the video) and on all platforms (Windows, Linux, Mac). Install with 1 click. Use with 0 clicks.

cocktail peanut

11,405 views • 1 year ago

Planning with Reasoning using Vision Language World Model

AK

26,274 views • 9 months ago

Reasoning on Apple Vision Pro with Apple MLX and DeepSeek R1 Qwen 7B 4bit! 🔥 14 tokens per sec! 🔥 Note: sorry for the shaking footage, I was excited to see it running 😂

Ivan Fioravanti ᯅ

24,324 views • 1 year ago

Why pay $3500 for the Apple Vision Pro? Control a robot with your hands using the phospho dev kit

Pierre-Louis Biojout (PLB)

39,794 views • 1 year ago

#M5StackNew 🎊 The LLM630 Compute Kit is an #AI large language model (#LLM) inference development kit powered by the #Axera #AX630C SoC with a 3.2 TOPS NPU. It delivers efficient AI inference for tasks like computer vision (CV) and LLM processing.

M5Stack

16,383 views • 1 year ago

NEW VIDEO - What Using Apple Vision Pro is Actually Like! Full 35 minutes in-depth video:

Marques Brownlee

2,563,944 views • 2 years ago

Drawing on flat surfaces with Logitech Muse on Apple Vision Pro. And yes, it has haptic feedback whenever you start drawing or touching a window (using the TouchDesk app).

Brad Lynch

16,712 views • 7 months ago

SpatialGen just announced Zeus, a dedicated hardware system for live Apple Immersive Video streaming. It supports live 16K immersive video encoding with 90 FPS streaming. This is a huge step toward more live immersive experiences on Apple Vision Pro.

Spatial Insider

13,398 views • 22 days ago

I just built a RAG Agent with web search using Cohere's ⌘R 7B model. 100% open-source code with a step-by-step tutorial.

Shubham Saboo

47,032 views • 1 year ago

MiniCPM-V 4.5 achieves an average score of 77.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-latest and Gemini-2.0 Pro, as well as strong open-source models like Qwen2.5-VL 72B. Powered by a new unified 3D-Resampler over images and videos, MiniCPM-V 4.5 can now achieve a 96x compression rate for video tokens: 6 video frames at 448x448 can be jointly compressed into 64 video tokens (most MLLMs would normally need 1,536 tokens). Vibe coding a Video Chat AI app with MiniCPM-V-4.5 in anycoder.

AK

19,012 views • 9 months ago

Designing an Encoder for Fast Personalization of Text-to-Image Models TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps abs: project page:

AK

165,151 views • 3 years ago

I built a multimodal AI Coding Agent team with multiple agents. It has 3 AI agents working together as a team to generate and execute the code:
1. Coding Agent using o3-mini
2. Vision Agent using Gemini
3. Code Execution Agent using o3-mini and E2B
100% open-source code.

Shubham Saboo

42,261 views • 1 year ago

I turned my living room into a full-on Blockbuster with Apple Vision Pro! Posters. Shelves. Trailers. The works. The app is called ReelRoom and I love it! If you miss the Friday night video store vibe, this brings it all back.

Justin Ryan ᯅ

52,920 views • 1 year ago