
Humphrey Shi
@humphrey_shi • 4,591 subscribers
Vice President @NVIDIA · Professor @GeorgiaTech | Engineer–Researcher building next-generation high-performance, multimodal, and creative AI systems

While frontier closed models like Google’s Nano Banana can autonomously produce rich interleaved content (e.g., illustrated tutorials), open-source models still lag in both task coverage and generation quality. We introduce DuoGen, a dual transformer–diffusion framework that narrows this gap through an efficient decoupled design that avoids costly mixed-modality pretraining. A pretrained Multimodal LLM performs semantic reasoning and decides when to generate images, while a Video Diffusion Transformer ensures high-fidelity, consistent visuals. DuoGen is enabled by a new large-scale interleaved instruction-tuning dataset.

Code & data will be open-sourced.

Paper: Project:

Work led by Min Shi 🐝, originating from his summer internship at NVIDIA Research and continuing beyond, in collaboration with Ming-Yu Liu, Xiaohui Zeng, Jiannan Huang, Yin Cui, Jialuo Li, Tsung-Yi Lin, Max Li 李赵硕, Francesco Ferroni, Xiao Fu, Yogesh Balaji, Chieh-Yun Chen, and other colleagues 🚀
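To make the decoupled split concrete, here is a toy control loop. It is a minimal sketch under stated assumptions and is not DuoGen's code. `plan_step` stands in for the Multimodal LLM and `render` stands in for the Video Diffusion Transformer. The `<img>` marker, the function names, and the choice to pass earlier images back to the generator for consistency are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical marker the reasoning model emits when it decides an image is needed.
IMG_TOKEN = "<img>"


@dataclass
class Image:
    prompt: str  # conditioning text handed to the (stub) image generator
    index: int   # position of this image among the images generated so far


Segment = Union[str, Image]


def interleaved_generate(
    plan_step: Callable[[List[Segment]], str],
    render: Callable[[str, List[Image]], Image],
    max_steps: int = 16,
) -> List[Segment]:
    """Alternate between a text planner (MLLM stand-in) and an image generator."""
    doc: List[Segment] = []
    for _ in range(max_steps):
        out = plan_step(doc)  # planner sees the whole document so far
        if out == "":         # empty output signals the end of the document
            break
        if out.startswith(IMG_TOKEN):
            prompt = out[len(IMG_TOKEN):].strip()
            prior = [s for s in doc if isinstance(s, Image)]
            # Assumption: earlier images are passed along to keep visuals consistent.
            doc.append(render(prompt, prior))
        else:
            doc.append(out)   # plain text segment
    return doc


# Usage with hypothetical stubs in place of the real models.
script = iter([
    "Step 1: crack two eggs.",
    "<img> eggs cracked into a bowl",
    "Step 2: whisk.",
    "<img> whisked eggs",
    "",
])


def toy_planner(doc: List[Segment]) -> str:
    return next(script)


def toy_render(prompt: str, prior: List[Image]) -> Image:
    return Image(prompt=prompt, index=len(prior))


doc = interleaved_generate(toy_planner, toy_render)
```

The point of the split is that the planner never produces pixels. It only decides where an image goes and what it should show, so each side can be a separately pretrained model.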
Humphrey Shi • 16,692 views • 4 months ago