Large Language Diffusion with Masking (LLaDA) models are here - and their generation looks so fucking dope! 🤯 True to Yann LeCun's vision: ditch the auto-regressive bits and approximate the language distribution via Maximum Likelihood Estimation! So cool to watch the model denoise text from tokens in real time! -...

21,394 views • 1 year ago • via X (Twitter)
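
For readers who want a feel for what "denoising text from tokens" can mean, here is a toy sketch of confidence-based unmasking. It is not LLaDA's implementation: `toy_predictor`, `denoise`, and the `[MASK]` string are hypothetical names, and the "model" simply reads from a fixed target sentence and assigns random confidences. Only the loop structure is the point: predict every masked position at once, commit the most confident guesses, leave the rest masked, and repeat.

```python
import random

MASK = "[MASK]"


def toy_predictor(tokens, target, rng):
    """Hypothetical stand-in for a trained mask predictor.

    Returns {position: (predicted_token, confidence)} for every masked
    position. A real model would derive both values from the whole
    sequence; here the token is read from a fixed target and the
    confidence is a random number.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def denoise(target, steps, seed=0):
    """Reveal a fully masked sequence in at most `steps` passes."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(steps):
        predictions = toy_predictor(tokens, target, rng)
        if not predictions:
            break
        # Spread the still-masked positions over the remaining steps.
        remaining_steps = steps - step
        k = -(-len(predictions) // remaining_steps)  # ceiling division
        # Commit the k most confident predictions; the rest stay masked.
        best = sorted(
            predictions, key=lambda i: predictions[i][1], reverse=True
        )[:k]
        for i in best:
            tokens[i] = predictions[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    sentence = "diffusion models fill in text out of order".split()
    final, history = denoise(sentence, steps=4)
    for snapshot in history:
        print(" ".join(snapshot))
```

Running the script prints one line per pass, with masked slots filling in out of order rather than left to right, which is roughly the effect visible in the video.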

12 comments

Vaibhav (VB) Srivastav • 1 year ago

Check out the demo here:

Vaibhav (VB) Srivastav • 1 year ago

Model checkpoints here:

AssemblyAI • 1 year ago

Announcing: Our most advanced speech-to-text model goes beyond accuracy to capture the real-world complexity of human conversation and deliver reliable, source-of-truth audio data. Explore Universal-2 updates 👇

Chaithanya Kumar • 1 year ago

@ylecun @Stardust_nds check this out, buddy, something that we have been discussing. Also LLaDA

HosseinAgha • 1 year ago

@ylecun This is interesting! Finally a new architecture for LLMs. I don't think this solves any of the @ylecun concerns with transformer-based autoregressive LMs. No world model. No video understanding, etc.

luis • 1 year ago

@ylecun OMG, I think this idea is perfect for making the answers more precise 🥵 hello, 100 precision

marko. • 1 year ago

@ylecun Since you have to compute the whole maximum possible response length every time, what does this mean for VRAM requirements when deploying these models?

Futurist Avenue • 1 year ago

@ylecun How does this stack up with Inception?

AI at Meta • 1 year ago

Llama has now been downloaded over 1 Billion times! A note to: The researchers at Meta training these models — and those building on the research in other labs. The developers and enthusiasts on r/LocalLlama, @huggingface and more; experimenting with new models and creating derivatives. The small startups and big enterprises alike who are creating a new wave of AI-powered products, built with Llama. The global AI community. Your actions speak louder than words, thank you for making it abundantly clear — a billion times over — that open source AI is how we'll create the next wave of world changing technologies, together. 🦙❤️

Hunyuan • 1 year ago

Coming soon: HunYuan-T1, the first ultra-large Mamba-powered reasoning model! Stay tuned! 🚀

AK • 1 year ago

Bytedance just dropped DAPO on Hugging Face: An Open-Source LLM Reinforcement Learning System at Scale

Jeremy Howard • 1 year ago

Announcing fasttransform: a Python lib that makes data transformations reversible/extensible. No more writing inverse functions to see what your model sees. Debug pipelines by actually looking at your data. Built on multi-dispatch. Work w/ @R_Dimm

Related videos

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model.🎨

✨ New in 2.1:
🔹 Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image.
🔹 Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design to meet the needs of various fields.
🔹 Rich Styles and High Aesthetic: Capable of generating images in various styles (including photorealistic portraits, comics, and vinyl figures), it delivers outstanding visual appeal and artistic quality.
🔹 High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image.

HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters.

We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation.

Now, creators turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We're just getting started. Stay tuned for our native multimodal image generation model coming soon.

🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 views • 9 months ago

Auto-regressive LLMs are officially on notice. Run Gemma 4 26B diffusion GGUF with llama.cpp.

Google just dropped DiffusionGemma-26B, and it completely flips how we generate text. Instead of predicting words one by one, it generates 256 tokens in parallel using bi-directional attention. It's like Stable Diffusion, but for language. The model starts with random text "noise" and iteratively refines and self-corrects the entire block in real time to fix formatting and reasoning errors on the fly.

Since it's a Mixture of Experts (MoE) that only activates 3.8B parameters during inference, it fits perfectly on consumer hardware. You can run the Q4_K_M quant with an 18GB VRAM budget on a single RTX 3090 or RTX 4090 with exceptional throughput.

Tested on Ubuntu 22 with CUDA 13.1 using the cutting-edge experimental llama.cpp branch. Here is how to compile and run it with the live terminal denoising visualizer:

# 1. Clone & check out the experimental PR (#24423)
    git clone [URL missing in source] && cd llama.cpp
    git fetch origin pull/24423/head:diffusiongemma && git checkout diffusiongemma

# 2. Build with CUDA support
    cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native
    cmake --build build -j $(nproc) --config Release --target llama-diffusion-cli

# 3. Run with live visual denoising (llama.cpp flags)
    ./build/bin/llama-diffusion-cli \
      -m /path/to/diffusiongemma-26B-A4B-it-Q4_K_M.gguf \
      -ngl 99 -cnv -n 2048 --diffusion-visual

Watch the video below to see the live --diffusion-visual canvas iteratively denoising the prompt output in real time. The guide and unsloth's Hugging Face GGUF model links are in the comments below!

Is auto-regressive generation officially legacy tech? Let me know what you think.

Alok

52,656 views • 10 days ago
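
As a rough sanity check on the 18GB figure in the post above, the sketch below estimates how much memory the quantized weights alone would need. The 26B parameter count and the 18GB budget come from the post. The figure of about 4.85 bits per weight for Q4_K_M is an assumption (the effective rate varies by model and tensor mix), and `weight_memory_gb` is a hypothetical helper. Note that with every layer offloaded to the GPU, weight memory follows the total parameter count, not the 3.8B active parameters, which mainly affect compute per token.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Approximate size of the quantized weights in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 26e9             # "26B" from the post (total, not active)
ASSUMED_BITS_PER_WEIGHT = 4.85  # assumption: rough average for Q4_K_M
VRAM_BUDGET_GB = 18             # budget quoted in the post

weights_gb = weight_memory_gb(TOTAL_PARAMS, ASSUMED_BITS_PER_WEIGHT)
# Whatever is left must cover the KV cache, activations, and CUDA overhead.
headroom_gb = VRAM_BUDGET_GB - weights_gb

print(f"weights: {weights_gb:.2f} GB, headroom: {headroom_gb:.2f} GB")
```

Under these assumptions the weights take a little under 16 GB, leaving a couple of gigabytes of headroom, so the quoted budget is plausible but tight at long context lengths.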