🚀Introducing LLaVA Lightning: Train a lite, multimodal GPT-4 with just $40 in 3 hours! With our newly introduced datasets and the efficient design of LLaVA, you can now turbocharge your language model with image reasoning capabilities, in an incredibly affordable way.🧵

302,319 views • 3 years ago • via X (Twitter)

Comments: 10

Haotian Liu • 3 years ago

(2/5) Excited to release a 558K concept-balanced subset of LAION/CC/SBU & an 80K high-quality subset of LLaVA-Instruct-158K. The concept-balanced subset ensures a broad concept coverage, and the high-quality visual instruct tuning data enables models' visual reasoning capability.

Haotian Liu • 3 years ago

(3/5) Upgrade your Vicuna-7B to LLaVA-Lightning in just 3 hrs: 2 hrs pretraining + 1 hr visual instruct tuning. Train on 8x A100s using cloud spot instances for just $40. Let's make this research more accessible to researchers, academia, and millions of AI enthusiasts today!
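A quick back-of-the-envelope check of the figures quoted in (3/5): 2 hours of pretraining plus 1 hour of visual instruct tuning on 8x A100s for about $40. The sketch below only derives the hourly prices those numbers imply. It assumes the $40 covers the full 3 hours on all 8 GPUs and nothing else; it does not reflect any particular cloud provider's actual pricing.

```python
# Figures taken from the thread; the derived rates are implied, not quoted.
PRETRAIN_HOURS = 2.0        # stage 1: pretraining
INSTRUCT_TUNE_HOURS = 1.0   # stage 2: visual instruct tuning
NUM_GPUS = 8                # 8x A100
TOTAL_COST_USD = 40.0       # assumed to cover the whole 3-hour run


def implied_rates(total_cost, hours, num_gpus):
    """Return (price per node-hour, price per GPU-hour) implied by a total cost."""
    node_hour = total_cost / hours
    gpu_hour = node_hour / num_gpus
    return node_hour, gpu_hour


total_hours = PRETRAIN_HOURS + INSTRUCT_TUNE_HOURS
gpu_hours = total_hours * NUM_GPUS
node_rate, gpu_rate = implied_rates(TOTAL_COST_USD, total_hours, NUM_GPUS)

print(f"{total_hours:.0f} h on {NUM_GPUS} GPUs = {gpu_hours:.0f} GPU-hours")
print(f"implied node price: ${node_rate:.2f}/h, per GPU: ${gpu_rate:.2f}/h")
# 3 h on 8 GPUs = 24 GPU-hours
# implied node price: $13.33/h, per GPU: $1.67/h
```

In other words, the quoted budget works out to roughly $1.67 per A100-hour, which is only plausible at spot-instance prices, as the thread says.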

Haotian Liu • 3 years ago

(4/5) We're also upgrading LLaVA to support Vicuna v0 & v1 weights, with more checkpoints arriving this week! Plus, we're working to support more hardware – stay tuned!

Haotian Liu • 3 years ago

(5/5) 🤗 Demo: 🌐 Project page: 📄 Paper: Embark on your LLaVA-Lightning journey today, and stay tuned for more models and support for more hardware in the following weeks!

iamrobotbear (bk) • 3 years ago

Any way to easily swap LLaMA out for OpenAI or Dolly 2.0?

Haotian Liu • 3 years ago

Yes, it is definitely possible, and even easier with the introduction of LLaVA Lightning. MPT-7B just joined the LLaVA family today!

Zongheng Yang • 3 years ago

Congrats on the work @imhaotian. Glad to see SkyPilot was of help!

Chris • 3 years ago

The recent progress of open-source projects is super fast; it definitely surpasses my expectations.

web3工作坊 • 3 years ago

@_akhaliq Thank you for sharing. @savetonotion #tweet #AI

Jake Harrison • 3 years ago

Good job!

Related videos

New short course Multimodal RAG: Chat with Videos, developed with Intel and taught by vasudevlal!

In this course, you’ll work with LLaVA (Large Language and Vision Assistant), a Large Vision Language Model (LVLM) that can process both images and text. For example, given an image of a person doing a handstand on a skateboard at the beach, LLaVA doesn't just caption the scene, it’s able to predict possible outcomes, like the person losing balance or falling off. By understanding not just what's in a video frame, but what might happen next, your application can provide more insightful answers to questions about video.

You'll build a full multimodal RAG pipeline that can chat about video content:

- Use the BridgeTower model to create joint text-image embeddings in a 512-dimensional multimodal semantic space.
- Learn video processing techniques to extract keyframes, generate transcripts using Whisper, and create captions.
- Use the LanceDB vector database to store and retrieve high-dimensional multimodal embeddings.
- Integrate the LLaVA model, combining CLIP's (Contrastive Language Image Pretraining) vision transformer with Llama, for advanced visual-textual reasoning.

Your final system will ingest video data, generate embeddings for frames and text, perform similarity searches for relevant content, and use the retrieved multimodal context to inform LVLM-based response generation. The result is a system capable of answering nuanced questions about video content, effectively chatting about the video it has processed.

Please sign up here!

Andrew Ng

107,548 views • 1 year ago
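The similarity-search step mentioned in the course announcement above can be illustrated with a minimal sketch. This is not the course's code and it does not call BridgeTower or LanceDB: the 3-dimensional vectors, the frame names, and the `top_k` helper are all made up for illustration, standing in for the 512-dimensional joint embeddings and the vector database a real pipeline would use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k keys in `index` whose vectors are most similar to `query`."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical toy index: one embedding per extracted keyframe.
frame_index = {
    "frame_000": [0.9, 0.1, 0.0],
    "frame_120": [0.1, 0.9, 0.1],
    "frame_240": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's question.
query_embedding = [0.2, 0.8, 0.1]

retrieved = top_k(query_embedding, frame_index, k=2)
print(retrieved)  # ['frame_120', 'frame_000']
```

In a full pipeline, the retrieved frames and their transcripts or captions would then be passed to the LVLM as context for generating the answer.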