Apple FastVLM-7B: Efficient Vision Encoding for Vision Language Models. Larger variants using Qwen2-7B LLM outperform recent works like Cambrian-1-8B while using a single image encoder with a 7.9x faster TTFT. Vibe coding a video captioning app with it in anycoder.

AK

433,003 subscribers

60,588 views • 9 months ago • via X (Twitter)

Science & Technology, Education

Related videos

Apple released FastVLM, so I tried vibe coding a video captioning AI app with it. It took 5 prompts to get a working app in anycoder, and I deployed it on Hugging Face. 85x faster and 3.4x smaller than comparable sized VLMs. The deployed app works 100% locally in your browser, powered by transformers.js and WebGPU.

AK

42,677 views • 9 months ago

Apple just released and open-sourced FastVLM! FastVLM is a lightning-fast vision-language model that combines rapid image and text understanding with efficient on-device performance. 100% Open Source

Sumanth

43,685 views • 9 months ago

Day 88 vibe coding my AI video editor with SwiftUI It's insane how Apple provides everything, like removing the background of a video without using a single token.

Meng To

14,152 views • 23 days ago

Structured Output from Multipage PDF with Sparrow (Qwen2 Vision LLM and MLX) I explain how multipage PDFs are handled in Sparrow to extract structured data in a single call.

Andrej Baranovskij

30,645 views • 1 year ago

Introducing Meta Perception Encoder: a vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models. Learn more about Meta Perception Encoder, read the research paper, and download the code and dataset

AI at Meta

74,392 views • 1 year ago

inference on your head mistral 7b (4bit quantized) running locally on apple vision pro

Joseph Semrai

239,718 views • 2 years ago

Planning with Reasoning using Vision Language World Model

AK

26,274 views • 9 months ago

Instant AutoGPT Launcher I wrote a launcher for AutoGPT today. Play with LLM agents using a ComfyUI-like node based interface. Even works with local LLM via ollama (shown in the video) Works on all platforms (windows, linux, mac). Install with 1 click. Use with 0 click.

cocktail peanut

11,405 views • 1 year ago

Reasoning on Apple Vision Pro with Apple MLX and DeepSeek R1 Qwen 7B 4bit! 🔥 14 tokens per sec! 🔥 Note: sorry for the shaking footage, I was excited to see it running 😂

Ivan Fioravanti ᯅ

24,324 views • 1 year ago

Why pay $3500 for the Apple Vision Pro? Control a robot with your hands using the phospho dev kit

Pierre-Louis Biojout (PLB)

39,794 views • 1 year ago

#M5StackNew 🎊 The LLM630 Compute Kit is an #AI large language model (#LLM) inference development kit, powered by the #Axera #AX630C SoC with a 3.2 TOPs NPU, it delivers efficient AI inference for tasks like computer vision (CV) and LLM processing.

M5Stack

16,383 views • 1 year ago

NEW VIDEO - What Using Apple Vision Pro is Actually Like! Full 35 minutes in-depth video:

Marques Brownlee

2,563,944 views • 2 years ago

Drawing on flat surfaces with Logitech Muse on Apple Vision Pro And yes, it has haptic feedback whenever you start drawing or touching a window (using the TouchDesk app)

Brad Lynch

16,712 views • 7 months ago

SpatialGen just announced Zeus, a dedicated hardware system for live Apple Immersive Video streaming. It supports live 16K immersive video encoding with 90 FPS streaming. This is a huge step toward more live immersive experiences on Apple Vision Pro.

Spatial Insider

13,487 views • 28 days ago

I just built a RAG Agent with web search using Cohere's ⌘R 7B model. 100% Opensource Code with step-by-step tutorial.

Shubham Saboo

47,032 views • 1 year ago

MiniCPM-V 4.5 achieves an average score of 77.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-latest and Gemini-2.0 Pro, and strong open-source models like Qwen2.5-VL 72B. Powered by a new unified 3D-Resampler over images and videos, MiniCPM-V 4.5 can now achieve a 96x compression rate for video tokens, where 6 448x448 video frames can be jointly compressed into 64 video tokens (normally 1,536 tokens for most MLLMs). Vibe coding a Video Chat AI app with MiniCPM-V-4.5 in anycoder.

AK

19,012 views • 9 months ago

Designing an Encoder for Fast Personalization of Text-to-Image Models TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps abs: project page:

AK

165,158 views • 3 years ago

I built a multimodal AI Coding Agent team with multi-agents. It has 3 AI agents working together as a team to generate and execute the code:
1. Coding Agent using o-3 mini
2. Vision Agent using Gemini
3. Code Execution Agent using o-3 mini and E2B
100% Opensource Code.

Shubham Saboo

42,261 views • 1 year ago

I turned my living room into a full-on Blockbuster with Apple Vision Pro! Posters. Shelves. Trailers. The works. The app is called ReelRoom and I love it! If you miss the Friday night video store vibe, this brings it all back.

Justin Ryan ᯅ

52,920 views • 1 year ago