🔥Excited to introduce CoDi-2! It follows complex multimodal-interleaved in-context instructions to generate any modality (text, vision, audio) in a zero/few-shot interactive way! With Ziyi Yang, Yang Liu, Chenguang Zhu, Mohit Bansal 🧵👇
97,533 views • 2 years ago • via X (Twitter)
10 comments

By aligning modalities with language for encoding and generation, CoDi-2 empowers Large Language Models (LLMs) to understand complex modality-interleaved instructions and in-context examples and conduct zero-shot/few-shot multimodal generation.

Trained on a large-scale generation dataset encompassing in-context multi-modal instructions across text, vision, and audio, CoDi-2 can follow interleaved in-context text-audio-vision prompts and can zero-shot/few-shot jointly generate multiple modalities.

CoDi-2 also demonstrates a wide range of zero-shot abilities for image generation, such as reasoning, compositionality, instruction editing, exemplar learning, and subject-driven generation.

CoDi-2 also demonstrates zero-shot/few-shot abilities for audio generation with intricate prompting like instruction editing and exemplar learning.

CoDi-2 surpasses previous domain-specific models on tasks such as subject-driven image generation, vision transformation, and audio editing.

Overall, CoDi-2 signifies a substantial breakthrough in developing a comprehensive multimodal foundation model adept at interpreting in-context language-vision-audio interleaved instructions & producing multimodal outputs (in a zero/few-shot way). @berkeley_ai @uncnlp @MSFTResearch

As a reminder, CoDi-1 will be presented at #NeurIPS2023. Happy to chat about CoDi-1 and CoDi-2 in New Orleans!

@yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 generative-CoDi Weaviate module coming soon? @ZainHasan6 @antas_marcin 👀🔥

@yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 cool! multi-modality in and out is the future, I believe, but it seems the future is coming now

@yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 @ZinengTang Will this product have a version that people can use too, e.g. through a web UI? It could be very helpful to a lot of people for sound effects. Perhaps you guys can release a version of it with freemium + paid plans.


