🚀Introducing LLaVA-NeXT Interleave: Now AI can understand and reason with multiple images at once
- This opens up multi-image scenarios like multi-frame videos, multi-view 3D, and multiple interleaved images.
- An all-round LMM that can understand videos, images, and 3D
More⬇️
27,655 views • 1 year ago • via X (Twitter)
8 comments

LLaVA-NeXT-Interleave🔥
- Interleave data format unifies different tasks.
- New datasets on 🤗Hub:
1️⃣ M4-Instruct: high-quality dataset, 1.1M samples from domains: multi-image, video, 3D & single-image
2️⃣ LLaVA-Interleave Bench: set of tasks to evaluate multi-image capabilities
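
A rough sketch of what "interleave data format unifies different tasks" could look like in practice: every task becomes a list of images plus a prompt with one placeholder per image. The `<image>` placeholder string, the `InterleavedSample` class, the `make_sample` helper, and all file names are assumptions for illustration, not the actual M4-Instruct schema.

```python
from dataclasses import dataclass
from typing import List

# Assumed placeholder string; the real training data may use a different token.
IMAGE_TOKEN = "<image>"


@dataclass
class InterleavedSample:
    """Hypothetical record: a task label, an ordered image list, and a prompt."""
    task: str
    images: List[str]
    prompt: str


def make_sample(task: str, image_paths: List[str], question: str) -> InterleavedSample:
    # One placeholder per image, in the same order as the image list,
    # followed by the text question.
    placeholders = "\n".join(IMAGE_TOKEN for _ in image_paths)
    return InterleavedSample(
        task=task,
        images=list(image_paths),
        prompt=f"{placeholders}\n{question}",
    )


# The four domains from the post, all expressed in the same shape.
samples = [
    make_sample("single-image", ["photo.jpg"], "What is shown here?"),
    make_sample("multi-image", ["a.jpg", "b.jpg"], "What changed between the two images?"),
    make_sample("video", [f"frame_{i}.jpg" for i in range(4)], "Describe what happens."),
    make_sample("3d", ["view_front.jpg", "view_left.jpg", "view_back.jpg"],
                "How many chairs are in the room?"),
]

for s in samples:
    # Every sample carries exactly as many placeholders as images.
    assert s.prompt.count(IMAGE_TOKEN) == len(s.images)
```

In this sketch a video is just its sampled frames and a 3D scene is just its views, so a single-image sample is the special case with one placeholder.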

LLaVA-NeXT-Interleave💪
- Attached videos show how it can explain jokes and understand content spread across multiple images and videos 🤯
- SoTA performance on both multi-image and single-image tasks
- Matches LLaVA-NeXT in performance
- Improved performance on video tasks

Gradio Multimodal Demo for LLaVA-NeXT-Interleave😍 : Models and Datasets are on 🤗 Hub:

How to finetune?

How can you refer to the order of the images in the prompt? Is simply saying "first image" enough? Like "is the object in the first image shown in the second image"
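
The thread does not say whether "first image" alone is enough. One way to make the order explicit is to label each image placeholder in the prompt, as in this minimal sketch. The `<image>` placeholder and the `label_images` helper are assumptions for illustration, not a documented LLaVA-NeXT-Interleave API.

```python
# Assumed placeholder string; check the model's own prompt template before use.
IMAGE_TOKEN = "<image>"


def label_images(num_images: int, question: str) -> str:
    """Build a prompt where each placeholder carries an explicit 1-based label."""
    lines = [f"Image {i}: {IMAGE_TOKEN}" for i in range(1, num_images + 1)]
    lines.append(question)
    return "\n".join(lines)


prompt = label_images(2, "Is the object in Image 1 also shown in Image 2?")
print(prompt)
```

The images would then be passed to the model in the same order as the labels, so "Image 1" in the question and the first placeholder refer to the same input.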

How does it understand 3D?

Different views as multiple image input

Love this! 💡
