🚀 Introducing LLaVA-NeXT Interleave: AI can now understand and reason over multiple images at once.
- This opens up multi-image scenarios such as multi-frame videos, multi-view 3D, and multiple interleaved images.
- An all-round LMM that can understand videos, images, and 3D.
More ⬇️

Gradio

56,347 subscribers

27,655 views • 1 year ago • via X (Twitter)

8 comments

Gradio • 1 year ago

LLaVA-NeXT-Interleave 🔥
- The interleaved data format unifies different tasks.
- New datasets on 🤗 Hub:
  1️⃣ M4-Instruct: a high-quality dataset with 1.1M samples from four domains: multi-image, video, 3D, and single-image
  2️⃣ LLaVA-Interleave Bench: a set of tasks to evaluate multi-image capabilities
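To make the idea of one interleaved format covering several tasks concrete, here is a minimal sketch, not the project's actual data loader. It assumes a sample is an ordered list of text pieces and image references, and that each image is marked in the prompt by a placeholder string. The `<image>` token, the `ImageRef` class, and the `flatten` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumed placeholder string; the post does not specify the real token.
IMAGE_TOKEN = "<image>"


@dataclass(frozen=True)
class ImageRef:
    """Hypothetical handle for one image, video frame, or rendered 3D view."""
    path: str


Segment = Union[str, ImageRef]


def flatten(segments: List[Segment]) -> Tuple[str, List[str]]:
    """Turn interleaved segments into (prompt text, ordered image paths)."""
    parts: List[str] = []
    images: List[str] = []
    for seg in segments:
        if isinstance(seg, ImageRef):
            parts.append(IMAGE_TOKEN)   # mark where the image sits in the text
            images.append(seg.path)     # keep images in prompt order
        else:
            parts.append(seg)
    return " ".join(parts), images


# Multi-image: text and images alternate freely.
multi_image = [
    "Is the object in", ImageRef("a.jpg"),
    "also shown in", ImageRef("b.jpg"), "?",
]

# Video: sampled frames are just a run of images followed by a question.
video = [ImageRef(f"frame_{i}.jpg") for i in range(3)] + ["What happens in this clip?"]

# 3D: different views of one scene, again passed as several images.
multi_view = [ImageRef("front.jpg"), ImageRef("side.jpg"), "Describe the object's shape."]

for sample in (multi_image, video, multi_view):
    prompt, paths = flatten(sample)
    print(prompt, paths)
```

Under these assumptions, video and 3D inputs differ from the multi-image case only in where the images come from, which is one way to read the claim that a single format unifies the tasks.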

Gradio • 1 year ago

LLaVA-NeXT-Interleave 💪
- The attached videos show how it can explain jokes and understand content spread across multiple images and videos 🤯
- SoTA performance on both multi-image and single-image tasks
- Matches LLaVA-NeXT in performance
- Improved performance on video tasks

Gradio • 1 year ago

Gradio Multimodal Demo for LLaVA-NeXT-Interleave😍 : Models and Datasets are on 🤗 Hub:

Stark • 1 year ago

How do you fine-tune it?

Omri Kaduri • 1 year ago

How can you refer to the order of the images in the prompt? Is simply saying "first image" enough? For example: "Is the object in the first image shown in the second image?"

Lily Zhang • 1 year ago

How does it understand 3D?

Gradio • 1 year ago

Different views are passed in as multiple image inputs.

Gradio • 1 year ago

Love this! 💡
