Are you using OpenAI's Whisper for speech recognition and finding the timestamps are out of sync? Just dropped: WhisperX with word-level timestamp accuracy by force aligning whisper with wav2vec2.0 🧵 [1/n]

Max Bain

2,139 subscribers

78,290 views • 3 years ago • via X (Twitter)

11 Comments

Max Bain • 3 years ago

🧵[2/n] @openAI’s Whisper shows impressive transcription performance, but often the corresponding timestamps are out of sync by several seconds.

Rainmaker • 1 year ago

Heightened volatility got you on edge? In my latest free Substack post, discover how a Hidden Markov Model (HMM) can help you navigate market corrections and safeguard your investments.

Max Bain • 3 years ago

🧵[3/n] However, phoneme-based models such as Wav2Vec2.0 produce much more accurate timestamps. WhisperX leverages these models using forced alignment on the whisper transcription to generate word-level timestamps.
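To make the idea of forced alignment concrete, here is a minimal sketch in plain Python. It is not WhisperX's code and it skips CTC blank handling: it assumes we already have per-frame log-probabilities from some acoustic model (a toy matrix stands in for Wav2Vec2.0 output here) and a known transcript, and it finds the best monotonic assignment of frames to transcript tokens with a Viterbi-style dynamic program. The names `forced_align` and `word_times`, the toy numbers, and the 20 ms frame duration are illustrative assumptions.

```python
import math


def forced_align(log_probs, tokens):
    """Assign every frame to one transcript token, in order.

    log_probs[t][k] is the log-probability of symbol k at frame t.
    tokens is the known transcript as a list of symbol ids.
    Returns a list of (token, first_frame, last_frame) spans.
    Simplification: no CTC blank symbol, every frame belongs to a token.
    """
    n_frames, n_tokens = len(log_probs), len(tokens)
    if n_tokens == 0 or n_frames < n_tokens:
        raise ValueError("need at least one frame per token")

    neg_inf = -math.inf
    # best[t][j]: best total score with frame t assigned to token j.
    best = [[neg_inf] * n_tokens for _ in range(n_frames)]
    # advanced[t][j]: True if frame t-1 was assigned to token j-1.
    advanced = [[False] * n_tokens for _ in range(n_frames)]
    best[0][0] = log_probs[0][tokens[0]]

    for t in range(1, n_frames):
        for j in range(min(t + 1, n_tokens)):
            stay = best[t - 1][j]
            move = best[t - 1][j - 1] if j > 0 else neg_inf
            advanced[t][j] = move > stay
            best[t][j] = max(stay, move) + log_probs[t][tokens[j]]

    # Backtrack from the last frame, which must belong to the last token.
    assignment = [0] * n_frames
    j = n_tokens - 1
    for t in range(n_frames - 1, -1, -1):
        assignment[t] = j
        if advanced[t][j]:
            j -= 1

    spans = []
    for j, token in enumerate(tokens):
        frames = [t for t, a in enumerate(assignment) if a == j]
        spans.append((token, frames[0], frames[-1]))
    return spans


def word_times(spans, frame_seconds=0.02):
    """Start/end time of a word whose characters are the given spans.

    frame_seconds is an assumed frame duration (hypothetical default).
    """
    start = spans[0][1] * frame_seconds
    end = (spans[-1][2] + 1) * frame_seconds
    return round(start, 3), round(end, 3)


# Toy example: symbols 0 = "h", 1 = "i"; five frames of made-up probabilities.
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.2, 0.8]]
toy_log_probs = [[math.log(p) for p in frame] for frame in toy_probs]
toy_spans = forced_align(toy_log_probs, [0, 1])
toy_word = word_times(toy_spans)
```

In this toy case "h" is assigned frames 0 to 1 and "i" frames 2 to 4, so the word "hi" spans 0.0 s to 0.1 s under the assumed frame duration. A real pipeline would take the transcript from Whisper and the frame probabilities from a phoneme or character model, and would also need to handle blanks and repeated symbols.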

Max Bain • 3 years ago

🧵[4/n] The result is word-level timestamp output. See more examples and try it yourself at

Max Bain • 3 years ago

🧵[5/n] Of course, it would be better if a single model did everything. One way would be teacher-student, where whisper is learning to output wav2vec's aligned timestamps. If @OpenAI open-sourced the training data and script, it would be cool to try this :)

Benjamin Warberg • 3 years ago

@philipvollet @OpenAI Awesome!

august kamp • 3 years ago

@OpenAI any way to get this working for a musician who struggles aligning vocals to videos ?

Max Bain • 3 years ago

@OpenAI do you mean aligning lyrics to the audio? You can feed the lyrics to the align function in the code, although aligning over such a long sequence could be tricky.

Hen³ • 3 years ago

@OpenAI Big bro killing it 🤝

hot-pocket.usd • 3 years ago

@OpenAI @memdotai mem it

neb • 3 years ago

@OpenAI Thanks !
