🚀 Introducing LLaVA-NeXT Interleave: now AI can understand and reason over multiple images at once.
- This opens up multi-image scenarios such as multi-frame videos, multi-view 3D, and multiple interleaved images.
- An all-round LMM that can understand videos, images, and 3D.
More ⬇️

27,655 views • 1 year ago • via X (Twitter)

8 comments

Gradio • 1 year ago

LLaVA-NeXT-Interleave 🔥
- The interleaved data format unifies different tasks.
- New datasets on the 🤗 Hub:
  1️⃣ M4-Instruct: a high-quality dataset of 1.1M samples spanning the multi-image, video, 3D, and single-image domains
  2️⃣ LLaVA-Interleave Bench: a set of tasks for evaluating multi-image capabilities

Gradio • 1 year ago

LLaVA-NeXT-Interleave 💪
- The attached videos show how it can explain jokes and understand content spread across multiple images and videos 🤯
- SoTA performance in both multi-image and single-image settings
- Matches LLaVA-NeXT in performance
- Improved performance on video tasks

Gradio • 1 year ago

Gradio Multimodal Demo for LLaVA-NeXT-Interleave😍 : Models and Datasets are on 🤗 Hub:

Stark • 1 year ago

How do I fine-tune it?

Omri Kaduri • 1 year ago

How can you refer to the order of the images in the prompt? Is simply saying "first image" enough? For example: "Is the object in the first image shown in the second image?"

Lily Zhang • 1 year ago

How does it understand 3D?

Gradio • 1 year ago

Different views of the scene are passed in as multiple image inputs.
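
A rough sketch of what the reply above could look like on the prompt side: one image placeholder per camera view, followed by the question. This is not taken from the LLaVA-NeXT code. The `<image>` placeholder string, the "View N" labels, and the helper `build_multiview_prompt` are all assumptions made for illustration; the actual placeholder and chat template depend on the model's processor.

```python
# Hypothetical helper: builds a text prompt with one image placeholder per view.
# IMAGE_TOKEN is an assumed placeholder; check the model's processor for the real one.
IMAGE_TOKEN = "<image>"


def build_multiview_prompt(num_views: int, question: str) -> str:
    """Return a prompt with one numbered placeholder line per view, then the question."""
    if num_views < 1:
        raise ValueError("need at least one view")
    lines = [f"View {i}: {IMAGE_TOKEN}" for i in range(1, num_views + 1)]
    lines.append(question)
    return "\n".join(lines)


prompt = build_multiview_prompt(3, "What object is shown across these views?")
print(prompt)
```

The images themselves would be supplied separately, in the same order as the placeholders, so that each view lines up with its slot in the text.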

Gradio • 1 year ago

Love this! 💡
