Experimenting with the magic of open-source! Whisper for text translation, XTTS for audio, and Video-retalker for seamless mouth sync in a short video. It's not perfect, but I think we're close with open source. #buildinpublic #opensource

80,941 views • 2 years ago • via X (Twitter)

10 comments

Luis C • 2 years ago

For reference, these were my runs on @replicate: Curious to see what you all come up with!

Michael Aubry — BasedLabs.ai • 2 years ago

Do you know how D-ID or HeyGen work? Wondering if there is a good OSS model.

Bilawal Sidhu • 2 years ago

this is awesome, but gosh i wish it worked better for beards

Luis C • 2 years ago

Likewise. There seem to be limits when using the video-retalking model.

AP • 2 years ago

Really cool! @artificialguybr

Luis C • 2 years ago

@artificialguybr Thanks! It was cool enough that I just had to share it

⟁ndrew V • 2 years ago

Pretty damn good for a first pass, impressive!

Luis C • 2 years ago

Right? All these models have been open source for a while too, just needed to chain them together
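The chaining mentioned here can be pictured roughly as three stages run in order: translate the speech, synthesize new audio, then re-sync the mouth. The sketch below is only an illustration of that idea, not the code behind the video. The class name `DubbingPipeline`, its field names, and the stub functions are all hypothetical; real calls to Whisper, XTTS, and video-retalking (for example through Replicate) would replace the stubs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical three-stage chain; each stage is a plain callable."""

    # video path -> translated text (the role Whisper plays in the post)
    transcribe_and_translate: Callable[[str], str]
    # (text, reference video path) -> audio path (the role XTTS plays)
    synthesize_speech: Callable[[str, str], str]
    # (video path, audio path) -> output video path (the role of video-retalking)
    sync_lips: Callable[[str, str], str]

    def run(self, video_path: str) -> str:
        # Stage 1: get translated text from the source video.
        text = self.transcribe_and_translate(video_path)
        # Stage 2: generate speech for that text, using the source as voice reference.
        audio_path = self.synthesize_speech(text, video_path)
        # Stage 3: re-render the video so the mouth matches the new audio.
        return self.sync_lips(video_path, audio_path)


# Stub stages (assumptions for illustration only; no model is called here).
def fake_translate(video_path: str) -> str:
    return "Hola mundo"


def fake_tts(text: str, reference_path: str) -> str:
    return "dubbed.wav"


def fake_sync(video_path: str, audio_path: str) -> str:
    return f"synced:{video_path}:{audio_path}"


pipeline = DubbingPipeline(fake_translate, fake_tts, fake_sync)
result = pipeline.run("clip.mp4")
print(result)  # synced:clip.mp4:dubbed.wav
```

Keeping each stage as a separate callable means any one model can be swapped out (for instance, to try a different lip-sync model) without touching the other two.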

Crypto Industry • 2 years ago

The magic of open source is real, and you're the tech wizard making it happen! 🧙‍♀️

Luis C • 2 years ago

Thanks! I'm just testing out ideas tbh
