Lior Alexander

@LiorOnAI • 116,247 subscribers

Founder @AlphaSignalAI (300k devs) • Ex-MILA researcher focusing on solving the explosion of information in AI.

Shorts

With only one line of code, you can get access to Google Open Buildings, the largest building dataset, for any country.

501,087 views

A team just made OpenAI Whisper 6x faster and 49% smaller while keeping 99% of the accuracy. The model is already available in the Hugging Face Transformers library: model_id = "distil-whisper/distil-large-v2". You can also use their web UI to transcribe from URLs, files, or audio recordings. Model: Demo: Paper: Sasha Rush

500,721 views
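A minimal sketch of loading the checkpoint named in the post through the Transformers `pipeline` API. The `transcribe` helper is hypothetical and not from the post; the import is deferred so the snippet still loads where `transformers` is not installed.

```python
MODEL_ID = "distil-whisper/distil-large-v2"  # model id quoted in the post


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file (hypothetical helper, not from the post)."""
    # Deferred import: assumes `pip install transformers torch` and ffmpeg.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # chunk_length_s splits long recordings into chunks before decoding.
    result = asr(audio_path, chunk_length_s=15)
    return result["text"]
```

Calling `transcribe("audio.mp3")` with a real file path (the name here is a placeholder) downloads the weights on first use and returns the transcript as a string.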

NVIDIA finally released Neuralangelo's source code! The model can turn videos from any device into detailed 3D structures, fully replicating buildings, sculptures, or other real-world objects or spaces virtually. Here's how it works: The model takes a 2D video that shows an object or scene from multiple angles. It selects frames from different viewpoints to understand depth, size, and shape. The AI creates an initial 3D representation, similar to a sculptor shaping a subject. The render is optimized to enhance details, like a sculptor refining texture. The outcome is a 3D object or scene suitable for virtual reality, digital twins, or robotics.

478,025 views

Anthropic might've just solved Prompt Engineering. Their new "Prompt Generator" tool can turn simple descriptions into advanced prompts optimized for LLMs.

261,614 views

This is a sneak peek into the future of medicine. GlassAI launched an LLM-based tool capable of generating a diagnosis or clinical plan based on symptoms. Also, ChatGPT recently passed the US Medical Licensing Exam. Demo: Glass Health

256,860 views

NVIDIA just made Pandas 150x faster with zero code changes. All you have to do is:
%load_ext cudf.pandas
import pandas as pd
Their RAPIDS library will automatically know if you're running on GPU or CPU and speed up your processing. You can try it here: Repo:

194,216 views
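The `%load_ext` line in the post is a Jupyter magic. In a plain Python script, the same accelerator can be switched on with `cudf.pandas.install()` before pandas is imported (or by running `python -m cudf.pandas script.py`). The sketch below assumes that setup; the `enable_gpu_pandas` helper is a hypothetical name, and it only handles the case where cuDF is not installed at all.

```python
def enable_gpu_pandas() -> bool:
    """Try to turn on the cudf.pandas accelerator; report whether it worked."""
    try:
        import cudf.pandas  # assumes RAPIDS cuDF and an NVIDIA GPU
    except ImportError:
        return False  # cuDF not installed: plain CPU pandas will be used
    cudf.pandas.install()  # must run before `import pandas`
    return True


accelerated = enable_gpu_pandas()
print("cudf.pandas active:", accelerated)
# From here on, `import pandas as pd` is written the same way in both cases.
```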

Meta just announced that Code Llama is now free for both research and commercial use. This might be the strongest competitor to ChatGPT:
▸ Can generate, explain, and debug your code
▸ Handles inputs of up to 100,000 tokens
▸ Free for research + commercial use
▸ Outperforms most open models
▸ Comes in 7B, 13B, and 34B
▸ Supports Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash
Available in:
▸ Foundation base models (Code Llama)
▸ Python specializations (Code Llama - Python)
▸ Instruction-following models (Code Llama - Instruct)

196,384 views

NVIDIA just released a very impressive text-to-video paper. Video Latent Diffusion Models (Video LDMs) use a diffusion model in a compressed latent space to generate high-resolution videos. Here's a brief overview of how it works:
1. Pre-train image LDM on a dataset of images.
2. Turn the image LDM into a Video LDM by adding temporal layers to model video frames.
3. Fine-tune the Video LDM on encoded video sequences to create a video generator.
4. Temporally align diffusion model upsamplers to generate high-resolution videos.
5. Validate Video LDM on real driving videos of 512x1024 resolution, achieving state-of-the-art performance.
6. Apply the approach in creative content creation with text-to-video modeling.
Paper: Project:

158,558 views

You can run full browser automations for AI agents without worrying about Chrome, Puppeteer, or infrastructure. Steel is an open-source browser API that wraps Chrome, manages sessions, handles proxies, and exposes everything via a REST API or SDKs.

67,928 views

JUST IN: Bard, Google's ChatGPT, is now available in the US and UK, with more countries to come. Waitlist:

101,809 views

Quick tip: you can use pip-chill instead of pip freeze to get the packages you are actually using. I always wondered why my requirements.txt was so long.
pip install pip-chill
pip-chill >> requirements.txt

97,946 views

Videos

Claude can make 3Blue1Brown-style animations in minutes. Education is about to explode.

1,412,795 views • 5 months ago

Ilya on LLMs understanding the world: "predicting the next token well means that you understand the underlying reality that led to the creation of that token". Seems like the opposite view of Yann.

1,222,086 views • 2 years ago

AI applied to boxing will change the sport forever. DeepStrike is an AI-based solution to corruption/cheating. It measures millions of data points during a fight and funnels them into 50 metrics for each boxer: punches thrown, punches landed, footwork, balance, stance, etc.

928,209 views • 3 years ago

AutoGPT might be the next big step in AI. Here's why Karpathy recently said "AutoGPT is the next frontier of prompt engineering". AutoGPT is the equivalent of giving GPT-based models a memory and a body. You can now give a task to an AI agent and have it autonomously come up with a plan, execute on it, browse the web, and use new data to revise the strategy until the task is completed. It can analyze the market and come up with a trading strategy, or handle customer service, marketing, finance, or other tasks that require continuous updates. There are four components to it:
1. Architecture: It leverages GPT-4 and GPT-3.5 via API.
2. Autonomous Iterations: AutoGPT can refine its outputs by self-critical review, building on its previous work and integrating prompt history for more accurate results.
3. Memory Management: Integration with Pinecone allows for long-term memory storage, enabling context preservation and improved decision-making.
4. Multi-functionality: Capabilities include file manipulation, web browsing, and data retrieval, distinguishing AutoGPT from previous AI advancements by broadening its application scope.

808,102 views • 3 years ago
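The plan, execute, and revise cycle described in the post can be sketched as a small loop. This is a toy illustration and not AutoGPT's code: every name below is hypothetical, and the "model" and "tools" are stubbed with plain functions so the loop runs without any API access.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Toy plan/act/review loop; every name here is hypothetical."""
    plan: Callable[[str, list[str]], str]      # (goal, memory) -> next action
    act: Callable[[str], str]                  # action -> observation
    is_done: Callable[[str, list[str]], bool]  # (goal, memory) -> finished?
    memory: list[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 10) -> list[str]:
        for _ in range(max_steps):
            if self.is_done(goal, self.memory):
                break
            action = self.plan(goal, self.memory)
            observation = self.act(action)
            # Storing results lets later planning steps build on earlier work.
            self.memory.append(f"{action} -> {observation}")
        return self.memory


# Stub "model" and "tools" stand in for LLM calls and web browsing.
agent = Agent(
    plan=lambda goal, mem: f"step {len(mem) + 1} toward {goal!r}",
    act=lambda action: "ok",
    is_done=lambda goal, mem: len(mem) >= 3,
)
history = agent.run("summarize a report")
print(len(history))  # 3
```

The `max_steps` cap is a design choice for the sketch: an autonomous loop needs some bound so it stops even if the completion check never succeeds.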

I just came across the most realistic text-to-audio model I've ever seen. You can even clone your voice. The audiobook industry is about to change forever. Demo: from ElevenLabs

749,777 views • 3 years ago

OpenAI just announced "GPT-4o". It can reason with voice, vision, and text. The model is 2x faster, 50% cheaper, and has 5x higher rate limit than GPT-4 Turbo. It will be available for free users and via the API. The voice model can even pick up on emotion and generate emotive voice.

485,074 views • 2 years ago

One of the most impressive AI demos I've seen. This is the future of customer service. Agents that can understand text, speech, images and even live video. Soon to be all open-source.

270,967 views • 1 year ago

3. Code understanding/debugging via voice commands

414,816 views • 2 years ago

GoogleAI just released "Muse", a text-to-image generation/editing model via Masked Generative Transformers: - Achieves new SOTA - Zero-shot, Mask-free editing - Zero-shot Inpainting/Outpainting - 900M params 📄 Paper: ⚙️ Project:

630,453 views • 3 years ago

This is the most impressive feature of the new Bing. The GPT browser can understand and summarize a 15-page PDF in seconds. You can now ask for the key takeaways of each page and chat about the content of the document.

600,029 views • 3 years ago

Google AI just announced the PaLM API! It will be released with a new tool called MakerSuite, which lets you prototype ideas, do prompt engineering, synthetic data generation and custom-model tuning. Waitlist available soon.

525,932 views • 3 years ago

You can now transcribe 2.5 hours of audio in 98 seconds, locally. A new implementation called insanely-fast-whisper is blowing up on GitHub. It works on Mac or Nvidia GPUs and uses the Whisper + Pyannote libraries to speed up transcriptions and speaker segmentations. Here's how you can use it:
pip install insanely-fast-whisper
insanely-fast-whisper --file-name --batch-size 2 --device-id mps --hf_token

344,801 views • 2 years ago
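As a quick check on the numbers quoted in the post, the implied speed relative to real-time playback can be worked out directly. Only the two figures from the post are used; the variable names are illustrative.

```python
audio_seconds = 2.5 * 3600   # 2.5 hours of audio, from the post
wall_seconds = 98            # processing time, from the post

speedup = audio_seconds / wall_seconds
print(f"{speedup:.1f}x faster than real time")  # 91.8x faster than real time
```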

Ilya Sutskever has a bold take. LLMs are doing much more than predicting the next word. They are learning our world model. Text is a projection of the world.

326,434 views • 2 years ago

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

290,190 views • 3 years ago

You can now generate real-time speech that sounds conversational. Microsoft just open-sourced VibeVoice, a real-time text-to-speech system with ~300 ms first audio latency and streaming input. It handles long conversations without falling apart.
This model generates long, multi-speaker speech. It produces up to 90 minutes of audio. It supports up to four distinct speakers. Turn-taking stays consistent over long sessions.
It works by reducing time resolution. Audio compresses into semantic and acoustic tokens. They run at 7.5 Hz instead of frame-level audio. A language model predicts structure. A diffusion head restores acoustic detail.
It allows low-latency streaming audio. The real-time variant streams text incrementally. First speech arrives in ~300 ms. A WebSocket demo shows live generation.
The code is MIT-licensed and research-only. The repo already passed 20k GitHub stars.

61,122 views • 6 months ago
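A short worked calculation shows why the 7.5 Hz rate matters for 90-minute outputs: it keeps the number of time steps the language model must handle small. The 50 Hz baseline below is an assumption chosen purely for comparison and does not come from the post; the count is of time steps, not necessarily of individual tokens.

```python
minutes = 90          # maximum audio length quoted in the post
frame_rate_hz = 7.5   # token frame rate quoted in the post
baseline_hz = 50      # ASSUMPTION: illustrative higher-rate codec for comparison

frames = int(minutes * 60 * frame_rate_hz)
baseline_frames = int(minutes * 60 * baseline_hz)

print(frames)                              # 40500
print(round(baseline_frames / frames, 1))  # 6.7
```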

This is a big day. Meta is open-sourcing AudioCraft. You can now generate incredible music and sounds with a single prompt. It includes the most performant Generative AI Model (audio) on the market, the "Llama" of Audio. The research framework contains the weights and code of these models: ▸ MusicGen: controllable text-to-music model. ▸ AudioGen: text-to-sound model. ▸ EnCodec: high fidelity neural audio codec. ▸ Multi Band Diffusion: An EnCodec compatible decoder using diffusion. This is going to tremendously speed up audio research 👏

231,608 views • 3 years ago

Microsoft is moving FAST. You can now vibe code with GitHub Copilot. They're rolling out Agent mode and MCP support to all VS Code users.

114,592 views • 1 year ago

Adobe just added their first Generative AI tool to Photoshop! Big milestone. Generative Fill allows you to extend images as well as add and remove objects using simple text prompts.

221,290 views • 3 years ago

This is big. The Retrieval Plugin allows ChatGPT to have a memory! The model can now remember information from conversations and store it in the retrieval plugin for later use. This feature is a must if you want to develop GPT-based tools.

227,586 views • 3 years ago

Microsoft's new Florence 2 is big for Computer Vision. It's a merge between Text and Vision. With a single prompt you can instruct the model to do CV tasks like captioning, object detection, grounding, and segmentation. The best part, it only uses a single backbone to handle everything. ▸ Excels in zero-shot performance ▸ Unified model for detection, captioning, etc. ▸ FLD-5B dataset: 5B+ annotations, 126M images ▸ New benchmarks (>5.5+) on COCO, ADE20K

186,560 views • 2 years ago