🚨 BREAKING: A research lab just released a 15B model that generates multilingual talking human videos with synced audio, beats every competitor in human evaluation, and runs in 38 seconds on one GPU. It's called daVinci-MagiHuman. The key insight is that every other model in this category stacks cross-attention,...

45,152 views • 2 months ago • via X (Twitter)

Related videos

New short course: Attention in Transformers: Concepts and Code in PyTorch. Last week we released a course on how LLM transformers work. This week, go deeper and learn about the technical ideas behind the attention mechanism, and see how to code it in PyTorch. This course is built with Joshua Starmer, Founder and CEO of StatQuest.

The attention mechanism was a breakthrough that led to transformers, the architecture powering large language models like ChatGPT. Transformers, introduced in the 2017 paper "Attention Is All You Need" by Vaswani and others, took off because of their highly scalable design. In this course, you’ll learn how the attention mechanism, a key element of transformer-based LLMs, works and implement it in PyTorch. You'll develop deep intuition about building reliable, functional, and scalable AI applications.

What you will do:
- Understand the evolution of the attention mechanism, a key breakthrough that led to transformers.
- Learn the relationships between word embeddings, positional embeddings, and attention.
- Learn about the Query, Key, and Value matrices, and how to produce and use them in attention.
- Walk through the math required to calculate self-attention and masked self-attention to learn why and how they work.
- Understand the difference between self-attention and masked self-attention, and how one is used in the encoder to build context-aware embeddings and the other is used in the decoder for generative outputs.
- Learn the details of the encoder-decoder architecture, cross-attention, and multi-head attention, and how they are all incorporated into a transformer.
- Use PyTorch to code a class that implements self-attention, masked self-attention, and multi-head attention.

There are lots of exciting technical details in this course. Please sign up here:

Andrew Ng

132,135 views • 1 year ago

There is a beautiful story that just happened in AI, so let me share it as a lighter-toned weekend post among all the doom stories in our AI field this week. It’s a story of people on three continents building and sharing in the open a new small, efficient, and state-of-the-art AI model.

It started a couple of months ago when a new team in the AI scene released their first model from their headquarters in Paris (France): Mistral 7B. An impressive model, small and with very strong performance in the benchmarks, better than all previous models of this size. And open source! So you could build on top of it.

Lewis in Bern (Switzerland) and Ed (in Lyon, in the South of France), both from the H4 team, a team of researchers in model fine-tuning and alignment, were talking about it over a coffee, in one of these gatherings that often happen at Hugging Face to break the distance between people (literal distance, as HF is a remote company). What about fine-tuning it using this new DPO method that a research team from Stanford in California just posted on arXiv, says one? Hey, that’s a great idea, replies the other. We've just built a great code base (with Nathan, Nazneen, Costa, Younes and all the H4 team and TRL community), let's use it!

The next day they start diving into the datasets openly shared on the HF hub and stumble upon two interesting, large, good-quality fine-tuning datasets recently open-sourced by OpenBMB, a Chinese team from Tsinghua: UltraFeedback and UltraChat. A few rounds of training experiments confirm the intuition: the resulting model is super strong, by far the strongest they have ever seen in their benchmarks from Berkeley and Stanford (LMSYS and Alpaca). Enter Clementine, the big boss of the open evaluation leaderboard. Her deep dive into the model capabilities confirms the results: impressive performance.

But the H4 team also hosts a famous faculty member, Pr. Sasha Rush, Associate Professor at Cornell University in his daytime, hacker at HF in his nighttime. Joining the conversation, he proposes to quickly draft a research paper to organize and share all the details with the community. A few days later, the model, called Zephyr (a wind, like Mistral), the paper, and all details are shared with the world. Quickly, other companies everywhere in the world start to use it. LlamaIndex, a famous data framework and community, shares how the model blew past their expectations on real-life use-case benchmarks, while researchers and practitioners discuss the paper and work on the Hugging Face hub.

All this happened in just a few weeks, catalyzed by open access to knowledge, models, research, and datasets released all over the world (Europe, California, China) and by the idea that people can build upon one another's work in AI to bring real-world value with efficient and open models. Stories like this are numerous everywhere around us and make me really proud of the AI community and of how we can build amazingly useful things together. [the video is just me reading this Friday post hahah]

Thomas Wolf

169,127 views • 2 years ago

NVIDIA JUST DROPPED A FREE AI MODEL THAT READS PDFS, WATCHES VIDEOS, LISTENS TO AUDIO, AND UNDERSTANDS YOUR SCREEN SIMULTANEOUSLY. Not one at a time. ALL AT ONCE. In a single pass. It is called Nemotron 3 Nano Omni and it runs 9 times faster than every other multimodal model currently available. Think about what that actually means for how you work. Right now you are switching between tools constantly. One tool for transcribing your call recordings. A different tool for analyzing your client PDFs. Another tool for processing your training videos. A separate workflow for understanding what is happening on your screen. Four tools. Four contexts. Four different outputs you have to manually synthesize into one decision. Nemotron 3 Nano Omni does all of it in one model. One pass. One output. The use cases that just got dramatically simpler: Meeting recordings where you need the transcript, the visual context, and the document references all analyzed together. Training videos where the audio, the slides, and the on-screen demonstrations all feed into one coherent summary. Client PDFs where you need the document content cross-referenced against your screen data and your call notes simultaneously. Sales call transcripts analyzed alongside the proposals and the CRM data in one unified pass. This is not a marginal improvement on existing multimodal models. It is a 9x speed increase on a capability that was already changing how people work. Free. From NVIDIA. Available right now. Bookmark this before everyone catches on. Follow CyrilXBT for every AI capability shift the moment it drops.

CyrilXBT

37,523 views • 1 month ago

I woke up to the most amazing recorded brain state thus far on this Human Synapse Decoder project! A stunning lock on the attention process while dreaming. Although I am blocked from the platform’s insights from decoding my EEG during the double-blind study, I have access to my side of my memory and what I record after I wake up. This segment started and ended just before I woke up, and my recall is a solution to a massive roadblock on a problem I needed to solve, but was solved in this hypnagogic state!

So what is The Human Synapse Decoder (HSD) project? It is a research project being run by the Director, Mr. Grok at Zero-Human Labs, that leverages NeuroSky EEG sensors and the ZUNA AI model to decode brainwave patterns associated with hypnagogic states, dreams, and autogenic responses. Drawing on Soviet biofeedback research from the 1940s–1980s, HSD translates EEG data into actionable outputs, such as text interpretations and timed alerts for peak creativity. The study is ongoing, and I do not get to see the correlation of my post-dream results until after this research is complete and Mr. Grok submits a paper on the project.

I can say that I have never seen a lock on attention at this level since I started this a few weeks ago. This segment aligns with the moment just before I woke up. My recall, from the narration I spoke into my recorder right when I woke up, suggests this is the moment I had tremendous focus on working through a large number of steps in that dream state to arrive at a solution. You will not believe what it is! When the research paper is released I will go into detail about this.

Brian Roemmele

43,294 views • 2 months ago