
Vaibhav (VB) Srivastav
@reach_vb • 47,727 subscribers
Bringing Codex to developers @OpenAI | ex @huggingface | F1 fan | Here for @at_sofdog’s wisdom | *opinions my own

Starting today, you can use Codex in Claude Code 👀
/plugin marketplace add openai/codex-plugin-cc
Try it out today with:
/codex:review for a normal read-only Codex review
/codex:adversarial-review for a steerable challenge review
/codex:rescue to let Codex rescue your code
Enjoy Codex-ing!
Vaibhav (VB) Srivastav • 951,518 views • 2 months ago

The Audio LM scene is heating up! 🔥 Fixie.ai 🦊 Ultravox 0.4.1 - 8B model approaching GPT-4o level: pick any LLM, train an adapter with Whisper as the audio encoder, profit 💥 Bonus: MIT licensed checkpoints
> Pre-trained on Llama 3.1 8B/70B backbones as well as the encoder part of whisper-large-v3-turbo
> Only the multi-modal adapter is trained, while the Whisper encoder and the LLM are kept frozen
> Uses a knowledge-distillation loss where Ultravox tries to match the logits of the LLM backbone
GG Fixie.ai 🦊 - Play with it directly on the Space and check out the models on the Hub 🤗
Vaibhav (VB) Srivastav • 964,051 views • 1 year ago
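
A minimal sketch of the kind of knowledge-distillation loss the Ultravox post describes, where the audio-conditioned model is pushed to match the logits of the frozen LLM backbone. The post does not give the exact formulation, so the KL-divergence form, the `temperature` parameter, and the function names here are assumptions for illustration, not Ultravox's actual training code.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) for one token position.

    Assumption: the teacher is the frozen LLM backbone and the student is
    the same LLM fed through the trainable audio adapter.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a 3-token vocabulary.
matching = kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the two distributions agree and grows as they diverge, which is what would let the adapter alone be trained while both large models stay frozen.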

idk what your AGI definition is but subagents & computer use in codex is pretty close!! *video in realtime
Vaibhav (VB) Srivastav • 47,920 views • 1 month ago

This tweet was sent by Codex via Computer Use
Vaibhav (VB) Srivastav • 42,527 views • 1 month ago

NEW: Kokoro 82M - APACHE 2.0 licensed, Text to Speech model, trained on < 100 hours of audio 🔥
Vaibhav (VB) Srivastav • 330,034 views • 1 year ago

HOLY FUCK! Zyphra just dropped Zonos - Apache 2.0 licensed, Multilingual, Text to Speech model with INSTANT voice cloning! 🔥
> Zero-shot TTS with Voice Cloning: Input text and a 10-30 second speaker sample to generate high-quality text-to-speech output
> Audio Prefix Inputs: Enhance speaker matching by adding an audio prefix to the text, enabling behaviors like whispering that are hard to achieve with voice cloning alone
> Multilingual Support: Supports English, Japanese, Chinese, French, and German
> Audio Quality & Emotion Control: Fine-tune speaking rate, pitch, frequency, audio quality, and emotions (e.g., happiness, anger, sadness, fear)
> Fast Performance: Runs at ~2x real-time speed on an RTX 4090
> Available on the Hugging Face Hub 🤗
Vaibhav (VB) Srivastav • 298,739 views • 1 year ago

Fuck it! You can now run *any* GGUF on the Hugging Face Hub directly with ollama 🔥 This has been a constant ask from the community. Starting today, you can point to any of the 45,000 GGUF repos on the Hub*
*Without any changes whatsoever! ⚡
All you need to do is:
ollama run hf.co/{username}/{reponame}:latest
For example, to run Llama 3.2 1B, you can run:
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:latest
If you want to run a specific quant, all you need to do is specify the quant type:
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q8_0
That's it! We'll work closely with Ollama to continue developing this further! ⚡
Vaibhav (VB) Srivastav • 317,372 views • 1 year ago
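
The naming pattern in the ollama post, `hf.co/{username}/{reponame}:{tag}`, can be sketched as a small helper. The function names `hf_ollama_ref` and `ollama_run_command` are hypothetical and exist only for this illustration; the sketch builds the command as a list of arguments and does not run ollama.

```python
def hf_ollama_ref(username, reponame, quant="latest"):
    """Build the model reference in the form shown in the post."""
    return f"hf.co/{username}/{reponame}:{quant}"


def ollama_run_command(username, reponame, quant="latest"):
    """Return the argument list for `ollama run`, suitable for subprocess.run."""
    return ["ollama", "run", hf_ollama_ref(username, reponame, quant)]


# The two examples from the post: default tag and a specific quant.
default_ref = hf_ollama_ref("bartowski", "Llama-3.2-1B-Instruct-GGUF")
q8_cmd = ollama_run_command("bartowski", "Llama-3.2-1B-Instruct-GGUF", "Q8_0")
```

Joining `q8_cmd` with spaces gives the same command line as the Q8_0 example in the post.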

Excited to announce the Codex App: run multiple projects and threads in one focused app! 🔥 The app natively packs a lot of features, making it easier to maximise your productivity:
> Worktree mode keeps changes isolated - parallel tasks without touching your checkout
> Automations run in background worktrees and drop findings into your inbox
> Built‑in Git review: diff, stage/revert hunks, inline comments
> Integrated terminal for test, lint, git - no need to switch apps
> Voice dictation: hold Ctrl+M and speak your prompt
> Skills + slash commands for faster workflows
> IDE sync with auto context - ask about files you’re viewing
> Local / Worktree / Cloud modes - choose where tasks run
> Shared MCP config across app/CLI/IDE
Bonus: For a limited time, we've doubled the rate limits across the tiers from Free all the way to Enterprise! Enjoy! 🤗
Vaibhav (VB) Srivastav • 72,678 views • 4 months ago

Fuck yeah! Llama 3.2 3B running on your browser! 100% local, powered by WebGPU & MLC 🦙
Vaibhav (VB) Srivastav • 281,977 views • 1 year ago

Fuck it, 685B parameter DeepSeek V3 0324 running locally on M3 Ultra, fully private 🔥 Powered by llama.cpp & dynamic quants from Unsloth AI ⚡
Step 1: brew install llama.cpp
Step 2: llama-cli -hf unsloth/DeepSeek-V3-0324-GGUF:Q2_K_XL
That's it! 🤗 Honestly a bit surreal to be able to chat with such a chunky model at the touch of the keyboard - the future is going to be wild!!
Vaibhav (VB) Srivastav • 168,141 views • 1 year ago

Kyutai released their Streaming Text to Speech model, ~2B param model, ultra low latency (220ms), CC-BY-4.0 license 🔥 Trained on 2.5 Million Hours of audio, it can serve up to 32 users w/ less than 350ms latency on a SINGLE L40 🤯 Incredible release by kyutai folks, go check out their hugging face page now!
Vaibhav (VB) Srivastav • 93,512 views • 11 months ago

MARS5 TTS: Open Source Text to Speech with insane prosodic control! 🔥
> Voice cloning with less than 5 seconds of audio
> Two-stage Auto-Regressive (750M) + Non-Auto-Regressive (450M) model architecture
> Uses a BPE tokenizer to enable control over punctuation, pauses, stops, etc.
> The AR model predicts L0 coarse tokens, refined further by the NAR DDPM model, followed by the vocoder
Great job Camb AI team! Kudos for open sourcing the artifacts - looking forward to what comes next ;)
Vaibhav (VB) Srivastav • 162,180 views • 1 year ago

WOW! DeepMind *just* dropped Magenta Real-time - Apache 2.0 licensed 🔥
> 800M params transformer, trained on ~190K hours of instrumental stock music
> adapts MusicLM for real-time generation via 2s audio chunks (conditioned on prior 10s context)
> 48 kHz Stereo
> MusicCoCa: New joint music-text embedding model, blending MuLan and CoCa approaches
> 1.25s generation time for 2s audio on free-tier Colab TPUs
> style embeddings (from text/audio prompts) allow real-time morphing of genres/instruments
> on-device inference, personal fine-tuning - coming soon!
> model weights on Hugging Face 🤗
Vaibhav (VB) Srivastav • 90,219 views • 11 months ago
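
The Magenta Real-time numbers in the post (2s chunks, 10s of prior context, 1.25s to generate each chunk) can be turned into a quick back-of-the-envelope check. This is only arithmetic on the figures quoted above; the function names are made up for this sketch and are not part of any Magenta API.

```python
def real_time_factor(audio_seconds, generation_seconds):
    """Seconds of audio produced per second of compute; above 1.0 keeps up with playback."""
    return audio_seconds / generation_seconds


def context_chunks(context_seconds, chunk_seconds):
    """How many previously generated chunks fit in the conditioning window."""
    return int(context_seconds // chunk_seconds)


CHUNK_SECONDS = 2.0        # audio generated per step (from the post)
GENERATION_SECONDS = 1.25  # compute time per step on free-tier Colab TPUs (from the post)
CONTEXT_SECONDS = 10.0     # prior audio each chunk is conditioned on (from the post)

rtf = real_time_factor(CHUNK_SECONDS, GENERATION_SECONDS)
headroom = CHUNK_SECONDS - GENERATION_SECONDS
window = context_chunks(CONTEXT_SECONDS, CHUNK_SECONDS)
```

With these figures, `rtf` comes out to 1.6, `headroom` to 0.75 seconds of slack per chunk, and `window` to 5 chunks of context.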

NEW: Higgs Audio V2 from Boson AI: an open, unified TTS model w/ voice cloning that beats GPT-4o-mini-tts and ElevenLabs v2 🔥
> Trained on 10M hours (speech, music, events)
> Built on top of Llama 3.2 3B
> Works in real time and on edge
> Beats GPT-4o-mini-tts and ElevenLabs v2 in prosody & emotion
> Multi-speaker dialog
> Zero-shot voice cloning 🤩
> Available on Hugging Face
Kudos to the folks at Boson AI for releasing such brilliant work and all the details around the model! 🤗
Vaibhav (VB) Srivastav • 79,556 views • 10 months ago

LETS GOO! Parler TTS 🔥 A fully open-source, Apache 2.0 licensed Text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more!
> Trained on 10K hours of permissive data.
> Offers control over the generations.
> Training + Inference code released.
> The processed dataset and tagging scripts were released for further research.
> English only for now.
Next, we're scaling the training to 50K hours and even better dataset processing! Want to help us out? DMs open! 🤗
Vaibhav (VB) Srivastav • 156,386 views • 2 years ago