Lior Alexander

@LiorOnAI • 116,247 subscribers

Founder @AlphaSignalAI (300k devs) • Ex-MILA researcher focusing on solving the explosion of information in AI.

Shorts

With only one line of code, you can get access to Google Open Buildings, the largest building dataset, for any country.

501,087 views

A team just made OpenAI Whisper 6x faster and 49% smaller while keeping 99% of the accuracy. The model is already available in the Hugging Face Transformers library: model_id = "distil-whisper/distil-large-v2". You can also use their web UI to transcribe from URLs, files, or audio recordings. Model: Demo: Paper: Sasha Rush

500,721 views
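
For readers who want to try the checkpoint named in the post above, here is a minimal sketch. It assumes the transformers package and a PyTorch backend are installed. The helper names, the 15-second chunk length, and the file name in the usage note are illustrative choices, not taken from the post.

```python
# Identifier quoted in the post.
MODEL_ID = "distil-whisper/distil-large-v2"


def build_transcriber(device: str = "cpu"):
    """Create a speech-recognition pipeline for the distilled checkpoint."""
    # Imported lazily so this module still loads where transformers is absent.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID, device=device)


def transcribe(path: str, device: str = "cpu") -> str:
    """Transcribe one audio file and return the recognized text."""
    asr = build_transcriber(device)
    # chunk_length_s splits long recordings into fixed-size windows;
    # 15 seconds is an assumed value, not a recommendation from the post.
    result = asr(path, chunk_length_s=15)
    return result["text"]
```

Calling transcribe("sample.wav") with a hypothetical local file downloads the weights on first use and returns the transcript as a string.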

NVIDIA finally released Neuralangelo's source code! The model can turn videos from any device into detailed 3D structures, fully replicating buildings, sculptures, or other real-world objects or spaces virtually. Here's how it works: The model takes a 2D video showing an object or scene from multiple angles. It selects frames from different viewpoints to understand depth, size, and shape. The AI creates an initial 3D representation, similar to a sculptor shaping a subject. The render is optimized to enhance details, like a sculptor refining texture. The outcome is a 3D object or scene suitable for virtual reality, digital twins, or robotics.

478,025 views

Anthropic might've just solved Prompt Engineering. Their new "Prompt Generator" tool can turn simple descriptions into advanced prompts optimized for LLMs.

261,614 views

This is a sneak peek into the future of medicine. GlassAI launched an LLM-based tool capable of generating a diagnosis or clinical plan based on symptoms. Also, ChatGPT recently passed the US Medical Licensing Exam. Demo: Glass Health

256,860 views

NVIDIA just made Pandas 150x faster with zero code changes. All you have to do is run %load_ext cudf.pandas and then import pandas as pd. Their RAPIDS library will automatically know if you're running on GPU or CPU and speed up your processing. You can try it here: Repo:

194,216 views
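
The %load_ext line in the post above is a notebook magic. In a plain Python script, a rough equivalent is to call cudf.pandas.install() before pandas is imported. The sketch below assumes a RAPIDS cuDF installation may or may not be present. The helper names and the fallback to ordinary pandas are illustrative, not part of the post.

```python
def enable_gpu_pandas() -> bool:
    """Try to switch on the cudf.pandas accelerator; report whether it worked."""
    try:
        import cudf.pandas  # only available with a RAPIDS installation
    except Exception:
        # No cuDF (or no usable GPU stack): plain pandas will be used instead.
        return False
    cudf.pandas.install()  # must run before pandas is first imported
    return True


def demo_groupby() -> dict:
    """Run an ordinary pandas group-by; the code is the same on CPU or GPU."""
    import pandas as pd

    df = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})
    return df.groupby("key")["value"].sum().to_dict()


if __name__ == "__main__":
    print("accelerated:", enable_gpu_pandas())
    print(demo_groupby())
```

Scripts can also be launched unchanged with python -m cudf.pandas script.py, which applies the accelerator without editing the file.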

Meta just announced that Code Llama is now free for both research and commercial use. This might be the strongest competitor to ChatGPT: ▸ Can generate, explain, and debug your code ▸ Handles inputs of up to 100,000 tokens ▸ Free for research + commercial use ▸ Outperforms most open models ▸ Comes in 7B, 13B, and 34B ▸ Supports Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash Available in: ▸ Foundation base models (Code Llama) ▸ Python specializations (Code Llama - Python) ▸ Instruction-following models (Code Llama - Instruct)

196,384 views

NVIDIA just released a very impressive text-to-video paper. Video Latent Diffusion Models (Video LDMs) use a diffusion model in a compressed latent space to generate high-resolution videos. Here's a brief overview of how it works: 1. Pre-train image LDM on a dataset of images. 2. Turn the image LDM into a Video LDM by adding temporal layers to model video frames. 3. Fine-tune the Video LDM on encoded video sequences to create a video generator. 4. Temporally align diffusion model upsamplers to generate high-resolution videos. 5. Validate Video LDM on real driving videos of 512x1024 resolution, achieving state-of-the-art performance. 6. Apply the approach in creative content creation with text-to-video modeling. Paper: Project:

158,558 views

You can run full browser automations for AI agents without worrying about Chrome, Puppeteer, or infrastructure. Steel is an open-source browser API that wraps Chrome, manages sessions, handles proxies, and exposes everything via a REST API or SDKs.

67,928 views

JUST IN: Bard, Google's answer to ChatGPT, is now available in the US and UK, with more countries to come. Waitlist:

101,809 views

Quick tip: you can use pip-chill instead of pip freeze to list only the packages you are actually using. I always wondered why my requirements.txt was so long. pip install pip-chill, then pip-chill >> requirements.txt

97,946 views
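
To show why the list gets shorter, here is a rough sketch of the idea behind the tip above: keep only installed packages that no other installed package depends on. This is an approximation written for illustration with the standard library, not pip-chill's actual implementation, and all function names are hypothetical.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def top_level(requires: dict[str, list[str]]) -> list[str]:
    """Return the packages that nothing else in `requires` depends on."""
    needed = {dep for deps in requires.values() for dep in deps}
    return sorted(name for name in requires if name not in needed)


def installed_requirements() -> dict[str, list[str]]:
    """Map each installed distribution to the names of its dependencies."""
    table: dict[str, list[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        deps = []
        for req in dist.requires or []:
            if re.search(r"extra\s*==", req):
                continue  # heuristic: ignore optional extras
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
            if match:
                deps.append(normalize(match.group(0)))
        table[normalize(name)] = deps
    return table


if __name__ == "__main__":
    for package in top_level(installed_requirements()):
        print(package)
```

With a toy table such as {"requests": ["urllib3", "idna"], "urllib3": [], "idna": []}, top_level returns only ["requests"], whereas a freeze-style listing would include all three.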

Videos

Claude can make 3Blue1Brown animations in minutes. Education is about to explode.

1,412,795 views • 5 months ago

Ilya on LLMs understanding the world: "predicting the next token well, means that you understand the underlying reality that led to the creation of that token" Seems like the opposite view of Yann.

1,222,086 views • 2 years ago

AI applied to Boxing will change the sport forever. DeepStrike is an AI-based solution to corruption/cheating. It measures millions of data points during a fight that it funnels into 50 metrics for each boxer: punches thrown, landed, footwork, balance, stance, etc.

928,209 views • 3 years ago

AutoGPT might be the next big step in AI. Here's why Karpathy recently said "AutoGPT is the next frontier of prompt engineering". AutoGPT is the equivalent of giving GPT-based models a memory and a body. You can now give a task to an AI agent and have it autonomously come up with a plan, execute on it, browse the web, and use new data to revise the strategy until the task is completed. It can analyze the market and come up with a trading strategy, or handle customer service, marketing, finance, or other tasks that require continuous updates. There are four components to it: 1. Architecture: It leverages GPT-4 and GPT-3.5 via API. 2. Autonomous Iterations: AutoGPT can refine its outputs by self-critical review, building on its previous work and integrating prompt history for more accurate results. 3. Memory Management: Integration with Pinecone allows for long-term memory storage, enabling context preservation and improved decision-making. 4. Multi-functionality: Capabilities include file manipulation, web browsing, and data retrieval, distinguishing AutoGPT from previous AI advancements by broadening its application scope.

808,102 views • 3 years ago
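
The plan, act, and revise cycle described in the post above can be sketched in a few lines. This is a toy illustration under stated assumptions, not AutoGPT's code: the Agent class, the scripted_think and echo_tool stand-ins, and the "DONE" stop signal are all hypothetical, and a real agent would call a language model and real tools in their place.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Stand-in for a language-model call: given the task and memory, pick a step.
    think: Callable[[str, list[str]], str]
    # Stand-in for a tool call (web search, file access, ...).
    act: Callable[[str], str]
    # Simple in-process memory; the post mentions an external store instead.
    memory: list[str] = field(default_factory=list)

    def run(self, task: str, max_steps: int = 5) -> list[str]:
        """Loop: choose a step, execute it, remember the result, repeat."""
        for _ in range(max_steps):
            step = self.think(task, self.memory)
            if step == "DONE":  # hypothetical stop signal
                break
            observation = self.act(step)
            self.memory.append(f"{step} -> {observation}")
        return self.memory


def scripted_think(task: str, memory: list[str]) -> str:
    """Fake planner: follow a fixed script based on how much has been done."""
    script = ["search", "summarize"]
    return script[len(memory)] if len(memory) < len(script) else "DONE"


def echo_tool(step: str) -> str:
    """Fake tool: pretend the step produced its own name in upper case."""
    return step.upper()


if __name__ == "__main__":
    agent = Agent(think=scripted_think, act=echo_tool)
    print(agent.run("research a topic"))
```

The max_steps cap is a design choice worth keeping in any real version, since an agent that never emits its stop signal would otherwise loop indefinitely.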

I just came across the most realistic text-to-audio model I've ever seen. You can even clone your voice. The audiobook industry is about to change forever. Demo: from ElevenLabs

749,777 views • 3 years ago

OpenAI just announced "GPT-4o". It can reason with voice, vision, and text. The model is 2x faster, 50% cheaper, and has 5x higher rate limits than GPT-4 Turbo. It will be available to free users and via the API. The voice model can even pick up on emotion and generate emotive voice.

485,074 views • 2 years ago

One of the most impressive AI demos I've seen. This is the future of customer service. Agents that can understand text, speech, images and even live video. Soon to be all open-source.

270,967 views • 1 year ago

3. Code understanding/debugging via voice commands

414,816 views • 2 years ago

GoogleAI just released "Muse", a text-to-image generation/editing model via Masked Generative Transformers: - Achieves new SOTA - Zero-shot, Mask-free editing - Zero-shot Inpainting/Outpainting - 900M params 📄 Paper: ⚙️ Project:

630,453 views • 3 years ago

This is the most impressive feature of the new Bing. The GPT browser can understand and summarize a 15-page PDF in seconds. You can now ask for the key takeaways of each page and chat about the content of the document.

600,029 views • 3 years ago

Google AI just announced the PaLM API! It will be released with a new tool called MakerSuite, which lets you prototype ideas, do prompt engineering, synthetic data generation and custom-model tuning. Waitlist available soon.

525,932 views • 3 years ago

You can now transcribe 2.5 hours of audio in 98 seconds, locally. A new implementation called insanely-fast-whisper is blowing up on GitHub. It works on Mac or Nvidia GPUs and uses the Whisper + Pyannote libraries to speed up transcription and speaker segmentation. Here's how you can use it: pip install insanely-fast-whisper, then insanely-fast-whisper --file-name --batch-size 2 --device-id mps --hf_token

344,801 views • 2 years ago

Ilya Sutskever has a bold take. LLMs are doing much more than predicting the next word. They are learning our world model. Text is a projection of the world.

326,434 views • 2 years ago

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

290,190 views • 3 years ago

You can now generate real-time speech that sounds conversational. Microsoft just open-sourced VibeVoice, a real-time text-to-speech system with ~300 ms first-audio latency and streaming input. It handles long conversations without falling apart. This model generates long, multi-speaker speech. It produces up to 90 minutes of audio. It supports up to four distinct speakers. Turn-taking stays consistent over long sessions. It works by reducing time resolution. Audio compresses into semantic and acoustic tokens. They run at 7.5 Hz instead of frame-level audio. A language model predicts structure. A diffusion head restores acoustic detail. It allows low-latency streaming audio. The real-time variant streams text incrementally. First speech arrives in ~300 ms. A WebSocket demo shows live generation. The code is MIT-licensed and research-only. The repo already passed 20k GitHub stars.

61,122 views • 6 months ago
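
A quick back-of-the-envelope calculation shows why the low token rate in the post above matters for long audio. The 7.5 Hz rate and the 90-minute length come from the post. The 50 Hz comparison rate is an assumed baseline chosen only for illustration, and the count is per token stream.

```python
def frame_count(duration_s: float, rate_hz: float) -> int:
    """Number of token frames needed to cover `duration_s` at `rate_hz`."""
    return round(duration_s * rate_hz)


NINETY_MINUTES_S = 90 * 60  # 5,400 seconds, the maximum length quoted in the post

low_rate = frame_count(NINETY_MINUTES_S, 7.5)    # rate quoted in the post
baseline = frame_count(NINETY_MINUTES_S, 50.0)   # assumed comparison rate

if __name__ == "__main__":
    print(low_rate)             # 40500
    print(baseline)             # 270000
    print(baseline / low_rate)  # sequence is roughly 6.7x shorter at 7.5 Hz
```

Under these assumptions a 90-minute session needs about 40,500 frames instead of 270,000, which is what makes it plausible for a language model to keep a whole conversation in context.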

This is a big day. Meta is open-sourcing AudioCraft. You can now generate incredible music and sounds with a single prompt. It includes the most performant Generative AI Model (audio) on the market, the "Llama" of Audio. The research framework contains the weights and code of these models: ▸ MusicGen: controllable text-to-music model. ▸ AudioGen: text-to-sound model. ▸ EnCodec: high fidelity neural audio codec. ▸ Multi Band Diffusion: An EnCodec compatible decoder using diffusion. This is going to tremendously speed up audio research 👏

231,608 views • 3 years ago

Microsoft is moving FAST. You can now vibe code with GitHub Copilot. They're rolling out Agent mode and MCP support to all VS Code users.

114,592 views • 1 year ago

Adobe just added their first Generative AI tool to Photoshop! Big milestone. Generative Fill allows you to extend images as well as add and remove objects using simple text prompts.

221,290 views • 3 years ago

This is big. The Retrieval Plugin allows ChatGPT to have a memory! The model can now remember information from conversations and store it in the retrieval plugin for later use. This feature is a must if you want to develop GPT-based tools.

227,586 views • 3 years ago

Microsoft's new Florence 2 is big for Computer Vision. It merges Text and Vision. With a single prompt you can instruct the model to do CV tasks like captioning, object detection, grounding, and segmentation. The best part: it uses only a single backbone to handle everything. ▸ Excels in zero-shot performance ▸ Unified model for detection, captioning, etc. ▸ FLD-5B dataset: 5B+ annotations, 126M images ▸ New benchmarks (>5.5+) on COCO, ADE20K

186,560 views • 2 years ago