🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and...

1,248,952 views • 6 months ago • via X (Twitter)

Related Videos

🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to-date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike.

More details and examples of what Movie Gen can do ➡️

🛠️ Movie Gen models and capabilities

Movie Gen Video: 30B parameter transformer model that can generate high-quality and high-definition images and videos from a single text prompt.

Movie Gen Audio: A 13B parameter transformer model that can take a video input along with optional text prompts for controllability to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound, delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.

Precise video editing: Using a generated or existing video and accompanying text instructions as an input it can perform localized edits such as adding, removing or replacing elements, or global changes like background or style changes.

Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement in video.

We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.

AI at Meta

2,264,113 views • 1 year ago

Type a sentence, get any sound - from talking cats to singing saxophones. Brilliant release by NVIDIA ✨

NVIDIA just unveiled Fugatto, a groundbreaking 2.5B parameter audio AI model that can generate and transform any combination of music, voices, and sounds using text prompts and audio inputs. Fugatto could ultimately allow developers and creators to bring sounds to life simply by inputting text prompts.

→ The model demonstrates unique capabilities like creating hybrid sounds (trumpet barking), changing accents/emotions in voices, and allowing fine-grained control over sound transitions - trained on millions of audio samples using 32 NVIDIA H100 GPUs

👨‍🔧 Architecture

Built as a foundational generative transformer model leveraging NVIDIA's previous work in speech modeling and audio understanding. The training process involved creating a specialized blended dataset containing millions of audio samples

→ ComposableART's Innovation in Audio Control: Introduces a novel technique allowing combination of instructions that were only seen separately during training. Users can blend different audio attributes and control their intensity

→ Temporal Interpolation Capabilities: Enables generation of evolving soundscapes with precise control over transitions. Can create dynamic audio sequences like rainstorms fading into birdsong at dawn

→ Processes both text and audio inputs flexibly, enabling tasks like removing instruments from songs or modifying specific audio characteristics while preserving others

→ Shows capabilities beyond its training data, creating entirely new sound combinations through interaction between different trained abilities

🔍 Real-world Applications

→ Allows rapid prototyping of musical ideas, style experimentation, and real-time sound creation during studio sessions

→ Enables dynamic audio asset generation matching gameplay situations, reducing pre-recorded audio requirements

→ Can modify voice characteristics for language learning applications, allowing content delivery in familiar voices

NVIDIA AI Developer

Rohan Paul

96,354 views • 1 year ago