steven

@Tu7uruu • 2,578 subscribers

whispering to neural networks @Huggingface

Shorts

Just dropped on HF: NeuTTS Air. Next-gen on-device TTS that matches cloud-level quality while staying fully open source.
> Real-time speech synthesis on CPU/GPU
> 3-second voice cloning, no cloud or data upload
> Compact: under 200 MB, runs on mobile and edge devices
> Multilingual and expressive
> Developed by Neuphonic, optimized for speed and fidelity

72,273 views

Just dropped on HF: kani-tts-370m. A lightweight open-source text-to-speech model that sounds great and runs fast!
> 370M parameters: efficient and deployable on consumer GPUs
> NanoCodec + LFM2-350M
> Natural & expressive voice trained with modern neural TTS techniques
> Fast inference: real-time on a single RTX 3060

34,558 views

A tutorial on training LLaSA (a LLaMA-based TTS model) with GRPO in TRL to improve prosody, rhythm, and expressiveness in synthesized speech!

15,488 views

Videos

Meet the new Qwen3-TTS (2025-11-27): a major step forward in lifelike voice generation!
> 49+ distinctive voices ranging from playful to authoritative, giving creators precise control over personality and style.
> Global language coverage with 10 languages and multiple authentic dialects, including Minnan, Wu, Cantonese, Sichuan, Beijing, Nanjing, Tianjin, and Shaanxi.
> Human-level delivery with adaptive rhythm, pacing, and intonation that makes speech sound genuinely performed.

45,722 views • 7 months ago

Just released on Hugging Face: Vui, a 100M open-source NotebookLM! 3 models:
> Vui.BASE is the base checkpoint trained on 40k hours of audio conversations.
> Vui.ABRAHAM is a single-speaker model that can reply with context awareness.
> Vui.COHOST is a checkpoint with two speakers that can talk to each other.
It clones voices, breathes, uhs, [laughs], and even produces non-speech sounds. Human-like TTS is here!

43,645 views • 1 year ago

Just dropped on HF: Supertonic TTS, a blazing fast speech model. 🤯
> RTF as low as 0.001 on RTX4090
> Runs on-device (no latency, full privacy)
> 66M params + 8+ language SDKs
> Browser, mobile, edge: it just works.
> You can adjust inference steps, batch processing, and other parameters to match your specific needs!
> Open-source. Ready to build with!

17,784 views • 8 months ago
