
Lior Alexander
@LiorOnAI • 115,524 subscribers
Founder @AlphaSignalAI → the Intelligence layer of AI (300k users) • MIT Lecturer • ex-MILA researcher • In ML since GANs
Shorts

Claude can make 3Blue1Brown-style animations in minutes. Education is about to explode.
Lior Alexander • 1,411,702 views • 4 months ago

AI applied to boxing will change the sport forever. DeepStrike is an AI-based solution to corruption and cheating. It measures millions of data points during a fight and funnels them into 50 metrics for each boxer: punches thrown, punches landed, footwork, balance, stance, etc.
Lior Alexander • 928,139 views • 3 years ago

AutoGPT might be the next big step in AI. Here's why Karpathy recently said "AutoGPT is the next frontier of prompt engineering": AutoGPT is the equivalent of giving GPT-based models a memory and a body. You can now give a task to an AI agent and have it autonomously come up with a plan, execute it, browse the web, and use new data to revise the strategy until the task is completed. It can analyze the market and come up with a trading strategy, or handle customer service, marketing, finance, and other tasks that require continuous updates. There are four components to it:
1. Architecture: It leverages GPT-4 and GPT-3.5 via API.
2. Autonomous Iterations: AutoGPT can refine its outputs by self-critical review, building on its previous work and integrating prompt history for more accurate results.
3. Memory Management: Integration with Pinecone allows for long-term memory storage, enabling context preservation and improved decision-making.
4. Multi-functionality: Capabilities include file manipulation, web browsing, and data retrieval, distinguishing AutoGPT from previous AI advancements by broadening its application scope.
Lior Alexander • 808,070 views • 3 years ago
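The plan, execute, review, and remember cycle described in the AutoGPT post can be sketched as a small loop. This is only an illustration, not AutoGPT's actual code: the `Agent` class, `fake_think`, and `fake_act` are hypothetical names, the model call and the tools are replaced by stub functions, and a plain Python list stands in for the Pinecone-backed memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """Toy plan-act-review loop; every name here is hypothetical."""
    think: Callable[[str, List[str]], str]  # stand-in for a GPT-4 / GPT-3.5 API call
    act: Callable[[str], str]               # stand-in for tools (web, files, ...)
    is_done: Callable[[str], bool]          # stand-in for the self-critical review
    memory: List[str] = field(default_factory=list)  # stand-in for a vector store

    def run(self, task: str, max_steps: int = 5) -> List[str]:
        for _ in range(max_steps):
            step = self.think(task, self.memory)        # plan from task plus history
            result = self.act(step)                     # execute the planned step
            self.memory.append(f"{step} -> {result}")   # keep context for later steps
            if self.is_done(result):                    # review: stop or keep revising
                break
        return self.memory


# Stub components so the sketch runs without any API key.
def fake_think(task: str, memory: List[str]) -> str:
    return f"step {len(memory) + 1} for {task!r}"


def fake_act(step: str) -> str:
    return "done" if step.startswith("step 3") else "partial"


agent = Agent(think=fake_think, act=fake_act, is_done=lambda r: r == "done")
history = agent.run("summarize the market")
for entry in history:
    print(entry)
```

The `max_steps` cap is a design choice of this sketch: an autonomous loop needs some bound so that a task the review step never accepts cannot run forever.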

OpenAI just announced "GPT-4o". It can reason with voice, vision, and text. The model is 2x faster, 50% cheaper, and has 5x higher rate limits than GPT-4 Turbo. It will be available to free users and via the API. The voice model can even pick up on emotion and generate emotive voice.
Lior Alexander • 485,014 views • 2 years ago

You can now transcribe 2.5 hours of audio in 98 seconds, locally. A new implementation called insanely-fast-whisper is blowing up on GitHub. It works on Mac or Nvidia GPUs and uses the Whisper and Pyannote libraries to speed up transcription and speaker segmentation. Here's how you can use it:
pip install insanely-fast-whisper
insanely-fast-whisper --file-name --batch-size 2 --device-id mps --hf_token
Lior Alexander • 344,787 views • 2 years ago
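To put the headline numbers from the transcription post in perspective, the implied speedup over real time can be worked out directly. This is plain arithmetic on the two figures quoted there (2.5 hours of audio, 98 seconds of processing); it does not run or benchmark the tool, and actual throughput will depend on hardware and settings.

```python
# Figures quoted in the post.
AUDIO_HOURS = 2.5
WALL_CLOCK_SECONDS = 98

audio_seconds = AUDIO_HOURS * 3600                         # 2.5 h expressed in seconds
speedup = audio_seconds / WALL_CLOCK_SECONDS               # seconds of audio per second of compute
seconds_per_audio_hour = WALL_CLOCK_SECONDS / AUDIO_HOURS  # compute time per hour of audio

print(f"Audio length: {audio_seconds:.0f} s")
print(f"Speedup over real time: about {speedup:.1f}x")
print(f"Compute per hour of audio: about {seconds_per_audio_hour:.1f} s")
```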

You can now generate real-time speech that sounds conversational. Microsoft just open-sourced VibeVoice, a real-time text-to-speech system with ~300 ms first audio latency and streaming input. It handles long conversations without falling apart.
This model generates long, multi-speaker speech. It produces up to 90 minutes of audio. It supports up to four distinct speakers. Turn-taking stays consistent over long sessions.
It works by reducing time resolution. Audio compresses into semantic and acoustic tokens. They run at 7.5 Hz instead of frame-level audio. A language model predicts structure. A diffusion head restores acoustic detail.
It allows low-latency streaming audio. The real-time variant streams text incrementally. First speech arrives in ~300 ms. A WebSocket demo shows live generation.
The code is MIT-licensed and research-only. The repo already passed 20k GitHub stars.
Lior Alexander • 60,568 views • 4 months ago
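A quick back-of-the-envelope calculation shows why the 7.5 Hz token rate in the VibeVoice post matters for long audio. The token rate and the 90-minute limit come from the post; the 24 kHz sample rate used as the raw-audio baseline is an assumption made here for comparison and is not stated in the post.

```python
# Figures quoted in the post.
TOKEN_RATE_HZ = 7.5        # token frames per second
MAX_AUDIO_MINUTES = 90     # longest generation mentioned

# Assumption (not from the post): a 24 kHz waveform as the raw-audio baseline.
ASSUMED_SAMPLE_RATE_HZ = 24_000

frames_for_max_audio = TOKEN_RATE_HZ * MAX_AUDIO_MINUTES * 60
samples_per_frame = ASSUMED_SAMPLE_RATE_HZ / TOKEN_RATE_HZ

print(f"{frames_for_max_audio:,.0f} token frames for {MAX_AUDIO_MINUTES} minutes")
print(f"{samples_per_frame:,.0f} audio samples summarized per frame (assumed 24 kHz)")
```

Under these assumptions, 90 minutes of speech is a sequence of roughly forty thousand frames, which is short enough for a language model to treat as one long context.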

This is a big day. Meta is open-sourcing AudioCraft. You can now generate incredible music and sounds with a single prompt. It includes the most performant generative AI audio model on the market, the "Llama" of audio. The research framework contains the weights and code of these models:
▸ MusicGen: controllable text-to-music model.
▸ AudioGen: text-to-sound model.
▸ EnCodec: high-fidelity neural audio codec.
▸ Multi Band Diffusion: an EnCodec-compatible decoder using diffusion.
This is going to tremendously speed up audio research 👏
Lior Alexander • 231,551 views • 2 years ago

Microsoft's new Florence-2 is big for computer vision. It merges text and vision. With a single prompt you can instruct the model to do CV tasks like captioning, object detection, grounding, and segmentation. The best part: it uses a single backbone to handle everything.
▸ Excels in zero-shot performance
▸ Unified model for detection, captioning, etc.
▸ FLD-5B dataset: 5B+ annotations, 126M images
▸ New benchmarks (>5.5+) on COCO, ADE20K
Lior Alexander • 186,544 views • 2 years ago
