Experimenting with the magic of open source! Whisper for text translation, XTTS for audio, and Video-retalker for seamless mouth sync in a short video. It's not perfect, but I think we're close with open source. #buildinpublic #opensource

Luis Catacora

8,813 subscribers

80,941 views • 2 years ago • via X (Twitter)

Science & Technology, Art #buildinpublic #opensource
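The chain described in the post (speech to translated text, text to dubbed audio, audio to lip-synced video) can be sketched as three stages that pass a shared job record along. This is only an illustration of the chaining idea, not the author's code: the stage functions below are hypothetical placeholders that do no real model inference, and the names `DubbingJob`, `translate_speech`, `synthesize_voice`, `sync_lips`, and `run_pipeline` are assumptions made for this sketch. In a real setup each placeholder body would be swapped for a call to the corresponding model.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Callable, Sequence


@dataclass(frozen=True)
class DubbingJob:
    """State handed from one stage to the next (hypothetical structure)."""
    video_path: str
    translated_text: str = ""
    dubbed_audio_path: str = ""
    synced_video_path: str = ""


Stage = Callable[[DubbingJob], DubbingJob]


def translate_speech(job: DubbingJob) -> DubbingJob:
    # Placeholder for the Whisper step: a real version would transcribe and
    # translate the speech in the video. Here we only record a marker string.
    text = f"[translated speech from {Path(job.video_path).name}]"
    return replace(job, translated_text=text)


def synthesize_voice(job: DubbingJob) -> DubbingJob:
    # Placeholder for the XTTS step: a real version would write an audio file.
    if not job.translated_text:
        raise ValueError("synthesize_voice needs translated_text from the previous stage")
    source = Path(job.video_path)
    audio = source.with_name(source.stem + "_dubbed.wav")
    return replace(job, dubbed_audio_path=str(audio))


def sync_lips(job: DubbingJob) -> DubbingJob:
    # Placeholder for the lip-sync step (the video-retalking model mentioned
    # in the comments): a real version would render a new video file.
    if not job.dubbed_audio_path:
        raise ValueError("sync_lips needs dubbed_audio_path from the previous stage")
    source = Path(job.video_path)
    video = source.with_name(source.stem + "_synced.mp4")
    return replace(job, synced_video_path=str(video))


DEFAULT_STAGES: Sequence[Stage] = (translate_speech, synthesize_voice, sync_lips)


def run_pipeline(job: DubbingJob, stages: Sequence[Stage] = DEFAULT_STAGES) -> DubbingJob:
    """Run each stage in order, feeding its output into the next one."""
    for stage in stages:
        job = stage(job)
    return job


if __name__ == "__main__":
    print(run_pipeline(DubbingJob("clip.mp4")))
```

Keeping each stage behind the same `DubbingJob -> DubbingJob` signature means a single model can be replaced (for example, a different lip-sync model) without touching the rest of the chain.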

10 comments

Luis C • 2 years ago

For reference, these were my runs on @replicate: Curious to see what you all come up with!

Michael Aubry — BasedLabs.ai • 2 years ago

Do you know how d-id or heygen work? Wondering if there is a good OSS model

Bilawal Sidhu • 2 years ago

this is awesome, but gosh i wish it worked better for beards

Luis C • 2 years ago

Likewise. There seem to be limits when using the video-retalking model

AP • 2 years ago

Really cool! @artificialguybr

Luis C • 2 years ago

@artificialguybr Thanks! It was cool enough that I just had to share it

⟁ndrew V • 2 years ago

Pretty damn good for a first pass, impressive!

Luis C • 2 years ago

Right? All these models have been open source for a while too, just needed to chain them together

Crypto Industry • 2 years ago

The magic of open source is real, and you're the tech wizard making it happen! 🧙‍♀️

Luis C • 2 years ago

Thanks! I'm just testing out ideas tbh

Related videos

🎧 IAMF: The Future of Audio with Open Standards! 🎧 At CES, #VLC with #FFMPEG showcases #IAMF redefining sound: 🎵 Spatial audio for next-level immersion 🌌 Cinema-quality sound 📱 Seamless playback across devices 💻 Powered by open-source tech #VLC #CES2025 #OPENSOURCE

VideoLAN

13,299 views • 1 year ago

The first truly open-source audio-video model. LTX-2 is a DiT-based foundation model with all core video generation capabilities in one unified model. Designed to run locally on consumer GPUs. - text-to-video - image-to-video - and video-to-video modes 100% open-source.

Akshay 🚀

66,042 views • 5 months ago

New open-source LoRA built in-house: LTX2.3 Audio Reactive LoRA. Built for music-driven video generation: stronger beat sync, graphic texture, color separation, light pulses, particles, and shapes that move with the track. Model:

fal

16,472 views • 21 days ago

The future of AI is open-source. And ollama is the easiest way to build AI applications with open-source LLMs. Here's how to build a free, private RAG app using open-source tools. We'll use: - Ollama for LLMs and embedding models - PostgreSQL for data storage and retrieval - pgai Vectorizer for embedding creation and sync (I use Nomic for embeddings and tinyllama as my LLM but you can substitute them for any models on Ollama)

Avthar

34,261 views • 1 year ago

Announcing SubStudio! Generate subtitles for any video in seconds with AI. 100% free & open source! Powered by Whisper on Together AI and FFmpeg via fluent-ffmpeg.

Hassan

133,261 views • 2 months ago

We are thrilled to announce the open-source release of Hallo2: a new avatar video generative model capable of generating stunning 4K resolution videos for up to 1 hour! 🎥✨ Explore the project for the open source code: Let's push the boundaries of avatar video generative model! #OpenSource #AI #FaceAnimation #GenerativeArt

Siyu ZHU

58,323 views • 1 year ago

Elon Musk played a key role in the creation of OpenAI, even coming up with the name “OpenAI” to reflect its commitment to being open source. But Sam Altman transformed this open-source, non-profit company into a closed-source, for-profit company.

DogeDesigner

56,245 views • 10 months ago

Elon Musk played a key role in the creation of OpenAI, even coming up with the name “OpenAI” to reflect its commitment to being open source. But Sam Altman transformed this open-source, non-profit company into a closed-source, for-profit company.

DogeDesigner

2,360,335 views • 1 year ago

Build an open-source AI video studio with this Next.js template using Veo 3 and Imagen 4 in the Gemini API. Create text-to-video, image-to-video, and edit videos in the browser for a specific time range.

Google AI Developers

41,423 views • 10 months ago

🚀 Introducing HunyuanCustom: An open-source, multimodal-driven architecture for customized video generation, powered by HunyuanVideo-13B. Outperforming existing open-source models, it rivals top closed-source solutions! 🎥 Highlights: ✅Subject Consistency: Maintains identity across single & multi-subject video generation. ✅Multimodal Inputs: Supports text, images, audio, and video for highly controlled outputs. ✅High-Quality Output: Industry-leading detail, motion smoothness, and lighting realism. Key Features: 1️⃣Single-Subject Video: Upload an image + text (e.g., “He’s walking a dog”) to create coherent videos with new actions, outfits, and scenes. 2️⃣Multi-Subject Video: Generate videos with multiple subjects (e.g., a man drinking coffee in a cozy room) from separate image inputs. 3️⃣Audio-Driven Video: Sync audio with visuals for talking or singing in any scene—perfect for digital avatars, virtual customer service, and more. 4️⃣Video-Driven Video: Seamlessly insert or replace subjects into any video for creative enhancements. Dive Deeper: 🌐Project: 📝Technical Details: 💻Code: The single-subject video capability is open-sourced and live on the Hunyuan website, with more features to be released soon! 👉 Try it now: #HunyuanCustom #VideoGeneration #AI #TencentHunyuan

Tencent HY

191,908 views • 1 year ago

a short video about "gorge music", a japanese open-source genre for rock climbers and mountaineers. exploring its mythology, IDM influence and how it solves artist ego

RamonPang

17,317 views • 3 months ago

Japan reports strong growth in open source value and new expectations for support, security, and governance. Watch the highlights and explore the full findings from the 2025 State of Open Source Japan report. Read the full report: #OpenSource #LFResearch #WorldOfOpenSource

The Linux Foundation

28,543 views • 6 months ago

CODEX SKILL TO MAKE SHORTS FROM ANY VIDEO! Just give Codex a local video and it creates a short for you -> Removes dead air -> Cuts pauses -> Exports 1080x1920 vertical MP4 -> Adds local Whisper captions -> Gives an edit report No OpenAI API key required. Install: npx --yes codex-video-short-maker-skill@latest --with-captions Result shown in the video. 100% open source.

Kappaemme

22,689 views • 2 months ago

VITA Towards Open-Source Interactive Omni Multimodal LLM discuss: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still lots of work to be done on VITA to get close to close-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 views • 1 year ago

Siri not using Whisper is one of the strangest things in tech. Whisper: open-source, years old, near-perfect speech-to-text. Siri: still mishears half of what you say. Hard to believe the engineers actually dogfood the product.

AI Breakfast

42,020 views • 3 months ago

Holy smokes! AI video just had its DeepSeek moment. LTX Video just got a massive update in both quality & speed. 13B parameters! It is a game-changing moment for AI video and... ...it’s open-source! See for yourself: 1. Fading Glory, a short I made in LTX Studio

Alex Patrascu

90,171 views • 1 year ago

LTX-2 is natively supported in ComfyUI on Day 0 🎬🔊 The next chapter in controllability for open-source video generation. - Open-source audio-video foundation model - Generates motion, dialogue, SFX, and music together - Canny, Depth & Pose video-to-video control - Keyframe-driven generation - Native upscaling and prompt enhancement Everything in a single pass.

ComfyUI

108,130 views • 5 months ago

Here are 10 AI video editor GitHub repos worth bookmarking: 1. Shotcut Most actively maintained open source video editor in 2026. 14K stars. Cross-platform with AI-assisted features. Just shipped a new release April 30, 2026. 2. Kdenlive The closest open source alternative to Adobe Premiere Pro. Multi-track editing, proxy editing, VST audio, and customizable workspace. Best for professional workflows. 3. OpenShot The easiest entry point for beginners. Drag and drop, 400+ transitions, 3D titles, and AI-assisted trimming. 5,700 stars. 4. Blender Not just 3D. Blender's video sequence editor and compositing pipeline is used in professional film production. 18,300 stars. Unmatched for VFX. 5. Recordly Screen recorder with auto-zoom, cursor polish, webcam overlays, and styled frames built in. Built for demo videos and walkthroughs. 6. Wan2.1 Alibaba's open source text-to-video model. Cinema-grade 1080p generation. Apache 2.0. The gold standard for open source video generation in 2026. 7. HunyuanVideo Tencent's 13B parameter open source video model. 11.9K stars. Handles 720p and 1080p with high temporal coherence. 8. CogVideoX Apache 2.0 licensed. Loads natively via Hugging Face Diffusers. Strong prompt following and smooth frame transitions. Needs 16GB VRAM minimum. 12.5K stars. 9. Open-Sora Most starred open source video generation project at 24K stars. Full training pipeline for $200K. Production-level output quality. 10. Mochi 1 Focused entirely on motion quality. The most natural-looking physics of any open source video model. Water, fabric, and human gestures without AI jitter. Apache 2.0.

Kanika

17,309 views • 20 days ago

Run state-of-the-art RAG applications locally on your computer with ollama and use all the fantastic open-source models like llama3, msk's awesome models, or Command R from cohere With Verba 1.0, we put it all in your hands 🙌 Get on board for a wild open-source ride, we're bridging any moat as open-source is here to win

Philip Vollet

41,375 views • 2 years ago

Introducing LTX-2: the most complete open-source AI creative engine. - Synchronized audio and video generation - Native 4K fidelity, up to 50 fps and 10 s+ sequences - API-first design for seamless integration into creative pipelines - Runs efficiently on consumer GPUs - Fully open and accessible, with weights releasing later this year

LTX

1,700,457 views • 8 months ago