
Lior Alexander
@LiorOnAI • 115,524 subscribers
Founder @AlphaSignalAI → the Intelligence layer of AI (300k users) • MIT Lecturer • ex-MILA researcher • In ML since GANs
Shorts
Videos

Claude can make 3Blue1Brown-style animations in minutes. Education is about to explode.
Lior Alexander • 1,411,702 views • 4 months ago

AI applied to boxing will change the sport forever. DeepStrike is an AI-based solution to corruption and cheating. It measures millions of data points during a fight and funnels them into 50 metrics for each boxer: punches thrown, punches landed, footwork, balance, stance, etc.
Lior Alexander • 928,139 views • 3 years ago

AutoGPT might be the next big step in AI. Here's why Karpathy recently said "AutoGPT is the next frontier of prompt engineering": AutoGPT is the equivalent of giving GPT-based models a memory and a body. You can now give a task to an AI agent and have it autonomously come up with a plan, execute it, browse the web, and use new data to revise the strategy until the task is completed. It can analyze the market and come up with a trading strategy, or handle customer service, marketing, finance, and other tasks that require continuous updates. There are four components to it: 1. Architecture: It leverages GPT-4 and GPT-3.5 via API. 2. Autonomous Iterations: AutoGPT can refine its outputs through self-critical review, building on its previous work and integrating prompt history for more accurate results. 3. Memory Management: Integration with Pinecone allows for long-term memory storage, enabling context preservation and improved decision-making. 4. Multi-functionality: Capabilities include file manipulation, web browsing, and data retrieval, which broaden its application scope compared with previous AI systems.
Lior Alexander • 808,070 views • 3 years ago
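
The plan, act, critique, and remember cycle described in the AutoGPT post can be sketched in a few lines of Python. This is a toy illustration, not AutoGPT's actual code: `Agent`, `fake_llm`, and the prompt strings are hypothetical names made up for this example, the "LLM" is a stub function rather than a GPT-4 API call, and a plain list stands in for a vector store such as Pinecone.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """Toy autonomous loop: act, self-critique, store the step, repeat."""
    llm: Callable[[str], str]                          # hypothetical stand-in for a GPT API call
    memory: List[str] = field(default_factory=list)    # stand-in for long-term memory
    max_steps: int = 5

    def run(self, task: str) -> str:
        result = ""
        for step in range(self.max_steps):
            # Recall the most recent history so each draft builds on earlier work.
            context = "\n".join(self.memory[-3:])
            draft = self.llm(f"TASK: {task}\nHISTORY:\n{context}\nACT")
            # Self-critical review of the draft.
            critique = self.llm(f"CRITIQUE: {draft}")
            self.memory.append(f"step {step}: {draft} | {critique}")
            result = draft
            if critique.strip().upper().startswith("DONE"):
                break
        return result


def fake_llm(prompt: str) -> str:
    """Deterministic stub: accepts the third draft, asks for a revision before that."""
    if prompt.startswith("CRITIQUE:"):
        return "DONE" if "v3" in prompt else "REVISE"
    previous_steps = prompt.count("step ")
    return f"plan v{previous_steps + 1}"


agent = Agent(llm=fake_llm)
final = agent.run("write a summary")
print(final)               # plan v3
print(len(agent.memory))   # 3
```

Swapping `fake_llm` for a real model call and the list for a persistent store would keep the same control flow; the stopping rule and the amount of history fed back are the main design choices.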

OpenAI just announced "GPT-4o". It can reason across voice, vision, and text. The model is 2x faster, 50% cheaper, and has 5x higher rate limits than GPT-4 Turbo. It will be available to free users and via the API. The voice model can even pick up on emotion and generate emotive speech.
Lior Alexander • 485,014 views • 2 years ago

You can now transcribe 2.5 hours of audio in 98 seconds, locally. A new implementation called insanely-fast-whisper is blowing up on GitHub. It works on Mac or Nvidia GPUs and uses the Whisper and Pyannote libraries to speed up transcription and speaker segmentation. Here's how you can use it: run pip install insanely-fast-whisper, then insanely-fast-whisper --file-name <audio file> --batch-size 2 --device-id mps --hf_token <token>
Lior Alexander • 344,787 views • 2 years ago
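
To put the headline numbers in perspective, the speed-up over real time can be worked out directly from the two figures in the post (2.5 hours of audio, 98 seconds of processing). The helper name `real_time_factor` is made up for this example.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / wall_seconds


audio_seconds = 2.5 * 3600          # 2.5 hours of audio = 9000 seconds
wall_seconds = 98                   # transcription time quoted in the post

rtf = real_time_factor(audio_seconds, wall_seconds)
print(f"{rtf:.1f}x real time")      # 91.8x real time
```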

You can now generate real-time speech that sounds conversational. Microsoft just open-sourced VibeVoice, a real-time text-to-speech system with ~300 ms first audio latency and streaming input. It handles long conversations without falling apart. 𝗧𝗵𝗶𝘀 𝗺𝗼𝗱𝗲𝗹 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗲𝘀 𝗹𝗼𝗻𝗴, 𝗺𝘂𝗹𝘁𝗶-𝘀𝗽𝗲𝗮𝗸𝗲𝗿 𝘀𝗽𝗲𝗲𝗰𝗵. It produces up to 90 minutes of audio. It supports up to four distinct speakers. Turn-taking stays consistent over long sessions. 𝗜𝘁 𝘄𝗼𝗿𝗸𝘀 𝗯𝘆 𝗿𝗲𝗱𝘂𝗰𝗶𝗻𝗴 𝘁𝗶𝗺𝗲 𝗿𝗲𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻. Audio compresses into semantic and acoustic tokens. They run at 7.5 Hz instead of frame-level audio. A language model predicts structure. A diffusion head restores acoustic detail. 𝗜𝘁 𝗮𝗹𝗹𝗼𝘄𝘀 𝗹𝗼𝘄-𝗹𝗮𝘁𝗲𝗻𝗰𝘆 𝘀𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴 𝗮𝘂𝗱𝗶𝗼. The real-time variant streams text incrementally. First speech arrives in ~300 ms. A WebSocket demo shows live generation. The code is MIT-licensed and research-only. The repo already passed 20k GitHub stars.
Lior Alexander • 60,568 views • 4 months ago
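
A quick back-of-the-envelope calculation shows why the 7.5 Hz token rate in the VibeVoice post matters for long sessions. The 7.5 Hz rate and the 90-minute limit come from the post; the 24 kHz sample rate used for comparison is an assumption introduced here for illustration and is not stated in the source.

```python
TOKEN_RATE_HZ = 7.5            # token frames per second, from the post
MAX_MINUTES = 90               # maximum audio length, from the post
SAMPLE_RATE_HZ = 24_000        # ASSUMPTION: illustrative raw-audio sample rate

duration_s = MAX_MINUTES * 60                      # 5400 seconds
token_frames = int(duration_s * TOKEN_RATE_HZ)     # frames the language model must cover
raw_samples = duration_s * SAMPLE_RATE_HZ          # raw samples for the same audio
reduction = SAMPLE_RATE_HZ / TOKEN_RATE_HZ         # time-resolution reduction factor

print(token_frames)   # 40500
print(raw_samples)    # 129600000
print(reduction)      # 3200.0
```

Under that assumption, a full 90-minute session is about 40,500 token frames, which is the sequence length the language model has to keep consistent, while the diffusion head is left to restore the fine acoustic detail.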

This is a big day. Meta is open-sourcing AudioCraft. You can now generate incredible music and sounds with a single prompt. It includes the most performant generative AI model for audio on the market, the "Llama" of audio. The research framework contains the weights and code of these models: ▸ MusicGen: controllable text-to-music model. ▸ AudioGen: text-to-sound model. ▸ EnCodec: high-fidelity neural audio codec. ▸ Multi Band Diffusion: an EnCodec-compatible decoder using diffusion. This is going to tremendously speed up audio research 👏
Lior Alexander • 231,551 views • 2 years ago

Microsoft's new Florence-2 is big for computer vision. It merges text and vision. With a single prompt you can instruct the model to do CV tasks like captioning, object detection, grounding, and segmentation. The best part: it uses a single backbone to handle everything. ▸ Excels in zero-shot performance ▸ Unified model for detection, captioning, etc. ▸ FLD-5B dataset: 5B+ annotations, 126M images ▸ New benchmarks (>5.5+) on COCO, ADE20K
Lior Alexander • 186,544 views • 2 years ago
