Introducing Nova-2, our next-gen model for superhuman speech-to-text. TL;DR Nova-2 delivers: 💥 Next-level accuracy: 18% more accurate than Nova-1 and over 36% more accurate than OpenAI Whisper large 💥 Up to 40x faster 💥 Same low cost: 3-7x cheaper 🧵👇

Deepgram

10,743 subscribers

2,184,459 views • 2 years ago • via X (Twitter)

Science & Technology

Comments: 10

Deepgram • 2 years ago

Building on Nova's groundbreaking training, which spanned 100+ domains and 47 billion tokens, Nova-2 continues to be the deepest-trained ASR model in the world.

Deepgram • 2 years ago

Nova-2 was trained in a 2-stage curriculum starting from the largest, most diverse dataset in Deepgram’s history: nearly 6M resources and an extensive library of high-quality human transcriptions. The result? 👇

Deepgram • 2 years ago

A new state-of-the-art model capable of superhuman transcription performance that consistently outperforms any other STT model in the market today across a wide range of speech application domains. Onto the benchmark results…

Deepgram • 2 years ago

In our benchmarking, Nova-2 has an overall WER of 8.4% for the median files tested, representing a 16.8% relative error rate improvement compared to the closest provider. Nova-2 surpassed all tested competitors by an average of 30% and outperformed OpenAI Whisper large by 36%.
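The figures in this comment can be related with a little arithmetic. The sketch below is illustrative only: it assumes WER is the standard word-level edit distance divided by the reference length, and that the 16.8% figure is a relative WER reduction measured against the closest provider. All function names are hypothetical, and the actual benchmark methodology (text normalization, file selection) is not described in the thread.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline_wer(new_wer: float, reduction: float) -> float:
    """Baseline WER implied by a new WER and a relative reduction (assumed definition)."""
    return new_wer / (1.0 - reduction)


if __name__ == "__main__":
    print(f"{wer('the cat sat on the mat', 'the cat sat on mat'):.3f}")
    print(f"{implied_baseline_wer(0.084, 0.168):.3f}")
```

Under those assumptions, an 8.4% WER combined with a 16.8% relative reduction implies the closest provider scored roughly 10.1% on the same files.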

Deepgram • 2 years ago

Modern speech apps are increasingly used to automate real-time interactions with end users for use cases like agent assist and live captioning. But there are limited options for true real-time STT and several providers like OpenAI lack native streaming models...

Deepgram • 2 years ago

...However, in our real-time accuracy benchmarking, Nova-2 handily outperforms the field with an average relative reduction in WER of 28.6% across all domains.

Deepgram • 2 years ago

Regarding speed, our benchmarks reveal that Nova-2 surpasses all other STT models, achieving a median inference time of 29.8 seconds per hour of diarized audio. This represents a significant speed advantage ranging from 5-40x faster than comparable vendors offering diarization.
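To put the speed figure in more familiar terms, the sketch below converts "29.8 seconds per hour of audio" into a speed-up over real time. It also estimates the processing times implied for other vendors, assuming "N times faster" means the other vendor needs N times as long. That reading and the function names are assumptions, not something stated in the thread.

```python
SECONDS_PER_HOUR = 3600.0
NOVA2_SECONDS_PER_AUDIO_HOUR = 29.8  # median figure quoted in the thread


def speedup_over_real_time(processing_seconds: float,
                           audio_seconds: float = SECONDS_PER_HOUR) -> float:
    """How many seconds of audio are processed per second of compute time."""
    return audio_seconds / processing_seconds


def implied_vendor_seconds(own_seconds: float, times_faster: float) -> float:
    """Processing time of a vendor that is `times_faster` slower (assumed reading)."""
    return own_seconds * times_faster


if __name__ == "__main__":
    print(f"{speedup_over_real_time(NOVA2_SECONDS_PER_AUDIO_HOUR):.1f}x real time")
    for factor in (5, 40):
        seconds = implied_vendor_seconds(NOVA2_SECONDS_PER_AUDIO_HOUR, factor)
        print(f"{factor}x slower vendor: {seconds / 60:.1f} min per audio hour")
```

On that reading, 29.8 seconds per hour is roughly 121 times faster than real time, and the 5-40x range corresponds to about 2.5 to 20 minutes of processing per hour of audio for the other vendors.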

Deepgram • 2 years ago

In terms of cost, Nova-2 maintains the same starting price as Nova at just $0.0043 per minute of pre-recorded audio, nearly 3-5x more affordable than any other full-functionality provider (based on currently listed pricing) in the market.
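A quick way to make the per-minute price concrete is to scale it to longer recordings. The sketch below uses only the $0.0043 starting price quoted above. The helper names and the $100 budget are hypothetical, and the competitor prices are simply the quoted price multiplied by 3 and 5, not listed prices from any vendor.

```python
from decimal import Decimal

PRICE_PER_MINUTE = Decimal("0.0043")  # USD, starting price quoted in the thread


def cost_usd(audio_minutes: int) -> Decimal:
    """Cost of transcribing the given number of minutes at the quoted price."""
    return PRICE_PER_MINUTE * audio_minutes


def minutes_for_budget(budget_usd: Decimal) -> int:
    """Whole minutes of audio a budget covers at the quoted price."""
    return int(budget_usd // PRICE_PER_MINUTE)


def implied_price(multiplier: int) -> Decimal:
    """Per-minute price that would be `multiplier` times the quoted price."""
    return PRICE_PER_MINUTE * multiplier


if __name__ == "__main__":
    print(cost_usd(60))                        # one hour of audio
    print(minutes_for_budget(Decimal("100")))  # hypothetical budget
    print(implied_price(3), implied_price(5))
```

At the quoted price, one hour of pre-recorded audio costs about $0.26, and the 3-5x range would put other providers at roughly $0.013 to $0.022 per minute.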

Deepgram • 2 years ago

Since launching Nova-1 this year, we have also released new features encompassing improved speaker diarization, smart formatting, filler words support, and our inaugural domain-specific language model for summarization.

Deepgram • 2 years ago

You can dive deeper into our approach to model development and the benchmarks in the full announcement. Plus, get started with Nova-2 by requesting early access. Link to announcement:

Related videos

MAJOR NEWS🚨 We're excited to introduce Nova⚡️ our most powerful speech-to-text model. TL;DR it surpasses competitors on: 💥Speed: 23x faster 💥Accuracy: 22% fewer errors 💥Cost: 3-7x cheaper Let’s dig in on how Nova outperforms every ASR model… 🧵

Deepgram

1,512,443 views • 3 years ago

Nova-3, our most advanced speech-to-text, is here! ⚡️ ✅ Leads in accuracy by 54.2% (streaming) and 47.4% (pre-recorded) vs. next-best ASR ✅ New real-time multilingual capabilities ✅ Self-serve customization (vocabulary adaptation without retraining) See the benchmarks (🧵)

Deepgram

495,166 views • 1 year ago

The wait is over! With 60M+ minutes transcribed, our next-gen speech-to-text model Nova-2 is now available. What's new? ✅ Expanded languages: Spanish, Hindi, German, French, Portuguese ✅ Custom model training ✅ On-prem deployment Let's dive in...🧵

Deepgram

1,385,276 views • 2 years ago

Introducing Octave 2: our next-generation multilingual text-to-speech model What’s new: - Fluent in 11+ languages - 40% faster (<200ms latency⁠⁠) & 50% cheaper than Octave 1 - Multi-speaker conversation - More reliable pronunciation - New voice conversion & phoneme editing capabilities For the month of October, we’re offering 50% off our Creator plan - use code OCTAVE2 at checkout!

Hume AI

7,077,358 views • 9 months ago

📢 Big release! Introducing Starlight Precise 2, a new video enhancement model that delivers next-gen REALISM. Say goodbye to plastic & artificial look. Get natural faces, skin textures, and details. Like & comment ASTRA to try now for free. More details in 🧵👇

Topaz Labs

50,237 views • 7 months ago

💥 New Weapon Alert – Meet the Antimatter Gun! 🚀 Get ready to level up your arsenal with the Antimatter Gun, now live in BOSS FIGHTERS! This powerful weapon delivers devastating damage when used with precision. 🔥 🎯 Fire from the hip or go for accuracy with the scope 💣 Hold down the trigger for maximum impact Time to take the fight to the next level—who’s ready to dominate with the Antimatter Gun? 🎮👇

BOSS FIGHTERS ⚡️ $BFTOKEN

26,743 views • 1 year ago

AI transcribing 1 hour of audio in less than 30 seconds! Think about that! That's lightning-fast, especially when it can do that with very few mistakes. The team Deepgram released their new Nova 2 model. It's another leap forward: • 36% more accurate than OpenAI Whisper • 5-40x faster than every other alternative • 30% fewer errors than the competitors in real-time transcriptions I like to travel and can't wait for headphones with integrated live transcription and translation. We are getting close! By the way, this is the cheapest transcription service out in the market now (if you discount Whisper, which is open-source.) It costs $0.0043 to transcribe a minute of audio. When you sign up to access their API, you get $200 of credit for free, which will let you transcribe around 45,000 minutes of audio! Here is the link: I wanted to record a quick video showing their live transcription in Spanish and English. You can see it attached here. This post was sponsored by Deepgram.

Santiago

354,266 views • 2 years ago

Are you ready for the Next-Gen Fantasy Experience!🔥 The release is closer than ever... Register for the Alpha now and have the chance to win your favorite EuroLeague signed jersey🏀 To participate:👇 1. Like & Retweet 2. Follow Ultimate Champions Basketball 3. Tag your favourite player

EuroLeague

28,249 views • 3 years ago

💥Introducing FACTR 2, learning external force sensing on commodity robot arms without needing dedicated sensors. We show that learned force signals enable force-feedback teleop on low-cost arms and improve BC policies. FACTR 2 consists of: 1. Neural External Torque (NEXT): learns external forces without needing dedicated force sensors. 2. Force-Informed Re-Sampling Training (FIRST): uses the learned force signal to identify task-critical regions and upsample them during training. w/ Steven Oh Tony Tao 🧵(1/N)

Jason Liu

108,976 views • 1 month ago

In feature film VFX, our lighting teams spend a lot of time shaving off as much render time as possible. Even 1 minute saved per frame adds up. For a 5 second shot, 1 minute less per frame saves you 2 hours of rendering per shot. Our shots are usually a lot heavier than this and our teams usually get a lot more optimization for render times than 1 minute per frame. Some shots can be rendering for up to 30 hours per frame depending on complexity. Large studios have a plethora of optimization tools which shave off a lot of render time from our shots. Rendering technology is always improving, bringing render times down and increasing physical accuracy. Video is of Hyperion with interactive rendering with Pixar's RenderMan XPU renderer, which is up to 5x faster than their older RIS architecture.

Rassoul Edji

16,286 views • 1 year ago

Introducing ultra-fast voice typing for Mac. Not only is Almond 3x faster than the #1 cloud alternative, but it runs 100% locally on your Mac, making it infinitely more private. This is because Almond is architecturally different. Rather than sending your speech to a cloud LLM to "think" and return cleaned-up text (which is slow, unpredictable, and a privacy nightmare), Almond uses rule-based linguistic processing, enabling unparalleled speed and accuracy. Almond adapts to your writing style, supports custom vocabulary, personal shortcuts, name tagging in Slack, and file tagging for developers. Whether you've never tried voice typing or you're already using a cloud alternative, give Almond a try. You'll feel the difference immediately. Try it free for Mac today:

Caleb

165,915 views • 5 months ago

PRODUCT UPDATE🎙 We love watching sports! 🥊 Who doesn’t? We understand that the way people consume sports has fundamentally shifted. While legacy players are stuck bickering over archaic broadcast models, they've completely missed the ball on reimagining the sports entertainment experience for the future. That's where we come in. 👊 Who's going to revolutionize how our devices interact with our viewing habits and transform the thrill of being a fan? You're looking at them. Forget incremental change – we're swinging for the fences to redefine sports engagement as you know it. Get ready to have your mind blown as we bring fans and the sports they obsess over closer than ever before. Our latest version showcases the experience within our live app across….. 💥Total strikes 💥Landed strikes 💥Accuracy of strikes 💥Force per strike 💥Fighter body map - colored by strike force All tracked in real-time using our powerhouse suite of proprietary computer vision models operating in concert to transform complex live action into clear, actionable insights. Download our app today to experience the next generation of in-fight sports entertainment!

FX1

40,095 views • 2 years ago

Introducing NovaToon: Astra Nova’s Webtoon dApp Is Now Live! 🔥 We’re thrilled to announce the official launch of NovaToon, Astra Nova’s webtoon dApp, built for the next generation of immersive storytelling and decentralized fandom. 💥 Deviants: Shards of the Forsaken is now live on the SKALE blockchain! Dive into a universe of action, lore, and cosmic mystery - where the Astra Nova saga begins its visual journey. ⚡ The NovaToon provides a decentralized access powered by SKALE, ensuring a fast, gas-free experience for comic readers. This is more than a Webtoon. It’s the future of digital storytelling in Web3.📖 Start your journey today! For more details check out our medium article: Explore the NovaToon on SKALE and be part of the Astra Nova legacy at :

Astra Nova

71,626 views • 1 year ago

💥 HeyElsa New Superboard campaign is live ⚡Earn min 0.15X points multiplier ✅Do this if u r already farming Points in Elsa 👇Steps 🔗 ✅Sign up if new 🔗 ✅Login to campaign with same wallet ✅Complete all 3 quests ✅Mint turbo pass ⛽ $2 fee ✅Do Social task ✅Do Bridge ✅Do swap 👇Rewards as below 🥇Top 10: x0.75 🏅Next 200: x0.5 🎖️Next 500: x0.25 ✨All: x0.15 🏆Ranking based on total volume 💥Done 🪂Reminder 0.6% supply is for Wallchain Quacks quacks 🎉Elsa recently received a trophy from OpenAI for passing 10 Billion Tokens 💙Like 🔁RT

CryptoTelugu

42,846 views • 8 months ago

Revolutionizing Bladder Cancer Diagnosis with ZayaAI We heard you Zayans, From our last videos we understood you all want a much easier to understand explanation of our AI models. So let's dive into the latest breakthrough in AI-powered healthcare, our Bladder Cancer Detection Model. This isn't just another AI tool, it's a game-changer for pathologists. Precision, Speed and Seamless Integration: Traditional diagnosis of urothelial carcinoma (bladder cancer) is often slow and prone to errors. Our AI model automatically detects low and high-grade tumors, along with lymphovascular invasion, with unmatched accuracy and speed. That means faster results, fewer mistakes and a more efficient workflow for medical professionals. Why This Matters: 1.Faster Diagnoses → AI speeds up the process, allowing pathologists to focus on critical cases. 2.Higher Accuracy → Reduces human error and reliance on costly, time-consuming tests. 3.Seamless Integration → No need for complex system overhauls. ZayaAI fits right into existing workflows. The Future of AI in Healthcare: Embracing our AI models isn’t just about using new technology, it’s an investment in a future where healthcare is more precise, efficient and patient-focused. Pathologists get to make faster, more informed decisions and ultimately, patient outcomes improve. This is how AI is redefining medical diagnostics and can help save lives. DeSci + AI = $ZAI ! 🔬⚕️

ZayaAI

18,015 views • 1 year ago

Introducing LifeGPT, showing that LLMs can simulate complex, Turing-complete systems like Conway's Game of Life with near-perfect accuracy—no prior topology needed.🌐This unlocks new potential for AI in modeling self-organizing systems in biology, materials science, & beyond.🔬🤖 #AI #LifeGPT. Cellular Automata (CA), like Conway's Game of Life ("Life"), are computationally irreducible, meaning their evolution is difficult to predict without an a-priori understanding of the rules of the game, including the topology on which it is played. LifeGPT is a topology-agnostic generative model that learns the rules of Life without prior knowledge of its grid structure or boundary conditions, from only a tiny number of game states. The success in simulating Life suggests promising avenues for scientific discovery, particularly in bridging the gap between AI, artificial life, and real-world biological systems, for both forward and inverse problems. The potential for universal computation within generative AI, including LLMs, through approaches like LifeGPT, represents an exciting area for future research, especially when combined with reinforcement learning. Model Convergence: LifeGPT exhibits rapid convergence during training, achieving high accuracy in predicting next-game-states. We attribute the non-zero cross-entropy loss to the lack of causal relationships within randomly generated ICs. Accuracy & Temperature: LifeGPT achieves near-perfect accuracy, particularly at lower sampling temperatures, but can be continually tuned towards higher creativity to discover patterns that the original ruleset would not be able to produce. This finding highlights the trade-off between model creativity (higher temperature) and accuracy in deterministic predictions, with high relevance to model real-world dynamical systems for which no closed-form rulesets exist. 
Zero/Few-Shot Learning: Trained on a small fraction of possible initial conditions, LifeGPT demonstrates strong zero/few-shot learning, accurately simulating Life for unseen initial conditions. However, rare prediction errors highlight that LifeGPT approximates rather than perfectly replicates the Life algorithm. Autoregressive Autoregressor: A recursive implementation of LifeGPT demonstrates the model's ability to simulate Life over multiple timesteps. LifeGPT is topology-agnostic with respect to its training data and our results show that a GPT model is capable of capturing the deterministic rules of a Turing-complete system with near-perfect accuracy, given sufficiently diverse training data. The work showcases the possibility for future models to synthesize stochastic generative capabilities with deterministic computational capabilities. Link to code, paper, etc. below. Podcast generated using #NotebookLM. LAMM@MIT DMSE at MIT

Markus J. Buehler

114,194 views • 1 year ago

Gemini + π0 = actually useful robots! (Similar to what Physical Intelligence did with "Hi Robot") I can now verbally tell the robot that I'm building a red Lego wall or wooden tower, and it will infer the next steps by itself and pass me the necessary pieces, tools, or materials, ha! You can also just ask it to bring you things! The pipeline works as follows: - OpenAI Whisper (local) → speech to text - Gemini → makes sense of user requests, converts to robot tasks, bounding boxes, grasping points, etc. (System 2 thinking FTW!) - π0 → robotic actions The π0 was finetuned just for pick-and-place Lego bricks only, and it generalizes beautifully to all kinds of tasks. However, there's lots of room for improvement when it comes to grasping & accuracy. Things that could help: - Conditioning on grasping points - Better data collection (I'm not that great at teleop) - Lots more synthetic data and simulations

Shreyas Gite

25,485 views • 1 year ago

The web was never meant to be flattened into text. Yet most web RAG systems start by parsing HTML --- a complex and lossy process. 🔥 Introducing PixelRAG: the first RAG system that retrieves and reads 30M+ web pages as pixels. Instead of extracting text, PixelRAG retrieves screenshots and lets a VLM read them directly. PixelRAG not only preserves visual information, but also outperforms text-based RAG on text-only QA benchmarks by +18.1%. Why? (1) HTML-to-text conversion often discards layout, structure, tables, and other useful signals. (2) We continued pretraining a VLM on web page screenshots and turned it into a surprisingly strong visual retriever. (3) Recent VLMs are remarkably good at understanding web pages, often with better accuracy and token efficiency than text-only pipelines. Takeaway: HTML parsing may be one of the biggest self-inflicted bottlenecks in web RAG. Demo below 👇 Code: Paper: Playground:

Yichuan Wang

89,492 views • 1 month ago

LEVEL 2 FROM FIRST WAVE? WTF? New INSANE cheese strategy to GUARANTEE priority on top lane. Introducing the TrundleDaddy Deluxe™️ (Video below) 1️⃣ Sit in mid bush and leech 1 minion 2️⃣Run straight to top lane and play around your level up timer (level 2 on first wave) 3️⃣Fight aggressively with that level up advantage to gain an advantage (kill/summoner spells/priority) 4️⃣Slow push second wave, crash third with level advantage 5️⃣Cheater reset or roam mid on this free timer 6️⃣Your mid laner loses essentially NOTHING, they will get the level 2 & 3 timer the SAME as the enemy mid laner. (Tested and verified) Will likely only work in higher elos where the enemy actually thinks about level up timers rather than in low elo where both top laners don't care eitherway. TLDR: Get level 2 on first wave and cheese your opponent into a free win. Probably works the BEST on champions with actual CC like Sett, Riven, Darius, Pantheon but can be done with ANY champion. How long before we see both top laners contesting mid experience level 1 in competitive? All credit to trundledaddy for the strategy.

Kagaroo ☀️

3,407,570 views • 5 months ago

Learn more about Boltz-2, the new open source AI model from MIT and Recursion capable of predicting protein binding affinity with unprecedented speed, scale and accuracy.

▪️ Boltz-2 is the first model to combine structure and binding affinity prediction, approaching the accuracy of physics-based free energy perturbation (FEP) calculations while being over 1,000 times faster and less computationally expensive. 💥 The model addresses a critical bottleneck in small molecule drug discovery.

▪️ A powerful tool built on novel machine learning: “By predicting both molecular structure and binding affinity simultaneously with unprecedented speed and scale, Boltz-2 gives R&D teams a powerful tool to triage more effectively and focus resources on the most promising compounds,” says Najat Khan, PhD, Chief R&D Officer and Chief Commercial Officer at Recursion.

▪️ The added power of open source: “Because Boltz-2 is open-source, including its training code, scientists can easily adapt it for specific types of molecules, making it even more powerful as a tool to accelerate discovery,” says Regina Barzilay, PhD, MIT School of Engineering Distinguished Professor for AI and Health, AI faculty lead at MIT Jameel Clinic for AI & Health and MIT CSAIL principal investigator.

👉 Learn more, read the preprint, and access Boltz-2 here:

📅 Join us for live presentations, demos and discussions:
📎 MIT Presentation (Cambridge) – Monday, June 9.
📎 NVIDIA #GTC25 (Paris) – Wednesday, June 11.
📎 Molecular Machine Learning Conference - MoML (Montreal) – Tuesday, June 17.

Recursion

15,593 views • 1 year ago