🔥 Excited to introduce CoDi-2! It follows complex multimodal-interleaved in-context instructions to generate any modality (text, vision, audio) in a zero/few-shot interactive way! Ziyi Yang, Yang Liu, Chenguang Zhu, Mohit Bansal 🧵👇

97,533 views • 2 years ago • via X (Twitter)

10 comments

Zineng Tang • 2 years ago

By aligning modalities with language for encoding and generation, CoDi-2 empowers Large Language Models (LLMs) to understand complex modality-interleaved instructions and in-context examples and conduct zero-shot/few-shot multimodal generation.

Zineng Tang • 2 years ago

Trained on a large-scale generation dataset encompassing in-context multi-modal instructions across text, vision, and audio, CoDi-2 can follow interleaved in-context text-audio-vision prompts and can zero-shot/few-shot jointly generate multiple modalities.

Zineng Tang • 2 years ago

CoDi-2 also demonstrates a wide range of zero-shot abilities for image generation, such as reasoning, compositionality, instruction editing, exemplar learning, and subject-driven generation.

Zineng Tang • 2 years ago

CoDi-2 also demonstrates zero-shot/few-shot abilities for audio generation with intricate prompting like instruction editing and exemplar learning.

Zineng Tang • 2 years ago

CoDi-2 surpasses previous domain-specific models on tasks such as subject-driven image generation, vision transformation, and audio editing.

Zineng Tang • 2 years ago

Overall, CoDi-2 signifies a substantial breakthrough in developing a comprehensive multimodal foundation model adept at interpreting in-context language-vision-audio interleaved instructions and producing multimodal outputs (in a zero/few-shot way). @berkeley_ai @uncnlp @MSFTResearch

Zineng Tang • 2 years ago

As a reminder, CoDi-1 will be presented at #NeurIPS2023. Happy to chat about CoDi-1 and CoDi-2 in New Orleans! ->

Connor Shorten • 2 years ago

@yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 generative-CoDi Weaviate module coming soon? @ZainHasan6 @antas_marcin 👀🔥

Wenmeng Zhou • 2 years ago

@yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 Cool! Multi-modality in and out is the future, I believe, but it seems the future is coming now.

SGM • 2 years ago

@yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 @ZinengTang Will this product have a version that people can use too, i.e. through a web UI etc.? It could be very helpful to a lot of people for sound effects. Perhaps you guys can release a version of it with freemium + paid plans.
