We built Talos - a full CNN inference engine running directly on silicon. Every multiply, buffer, and data path lives as real digital logic on the FPGA. This is what deep learning looks like when the model becomes hardware👇
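For readers wondering what the "multiplies" and "data paths" of a CNN engine compute, here is a minimal software sketch of a convolution layer written as explicit multiply-accumulate steps. It is not Talos's design: the names `conv2d_int` and `relu_map`, the integer data, and the stride-1, no-padding choice are all assumptions made for illustration.

```python
def conv2d_int(image, kernel):
    """Stride-1, no-padding 2D convolution (cross-correlation, as in most CNNs)
    on integer data, written as explicit multiply-accumulate (MAC) steps."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0  # plays the role of an accumulator register
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]  # one MAC
            row.append(acc)
        out.append(row)
    return out


def relu_map(feature_map):
    """Element-wise ReLU: negative values are clamped to zero."""
    return [[max(v, 0) for v in row] for row in feature_map]
```

In hardware, the two inner loops would typically become parallel multipliers feeding an adder tree, and the outer loops would become a sliding window fed from line buffers; that mapping is a common pattern, not something the post confirms.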

luthira

2,733 subscribers

92,200 views • 3 months ago • via X (Twitter)

Similar Videos

What happens when physical props meet real-time VFX? This deep dive looks at Muzzle Report, custom hardware that connects directly to Unreal Engine, giving actors something real to perform with and creating instant feedback on set. Watch the full deep dive:

Unreal Engine

31,855 views • 4 months ago

First light! 🔥 ao486 boots DOS on the #TangConsole 138K. HDD, VGA, BIOS, and VGA BIOS are all up and running — watching it hit the DOS prompt on real hardware is surreal. #FPGA #RetroComputing 🎥👇

nand2mario

15,343 views • 9 months ago

Reiner Pope's new blackboard lecture goes all the way down: how AI training and inference are built up from logic gates on silicon. He walks me through a 4-bit multiply-accumulate by hand, and shows how that primitive is the foundation for the matrix multiplies in training runs.
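To make the "4-bit multiply-accumulate from logic gates" idea concrete, here is a small sketch that builds `acc + a*b` out of nothing but AND, OR and XOR on single bits. It is not taken from the lecture: the unsigned operands, the ripple-carry adder, the 12-bit accumulator, and all function names are assumptions chosen for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder built from XOR, AND and OR gates."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_bits(x, y, width):
    """Ripple-carry adder over little-endian bit lists; wraps at `width` bits."""
    carry = 0
    out = []
    for i in range(width):
        s, carry = full_adder(x[i], y[i], carry)
        out.append(s)
    return out


def to_bits(n, width):
    """Little-endian list of the low `width` bits of n."""
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def mac4(a, b, acc, acc_width=12):
    """Return acc + a*b for unsigned 4-bit a and b.

    Each partial product is a row of AND gates (a AND one bit of b),
    shifted left by that bit's position and added into the accumulator."""
    a_bits = to_bits(a, 4)
    b_bits = to_bits(b, 4)
    total = to_bits(acc, acc_width)
    for j in range(4):
        partial = [0] * acc_width
        for i in range(4):
            partial[i + j] = a_bits[i] & b_bits[j]
        total = add_bits(total, partial, acc_width)
    return from_bits(total)
```

A dot product, and therefore one entry of a matrix multiply, is this primitive applied repeatedly: start with `acc = 0` and call `acc = mac4(x, w, acc)` once per pair of elements.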

Dwarkesh Patel

38,959 views • 29 days ago

Georgia Tech built a free tool that animates every data structure and algorithm in real time. This is what DSA study looks like when it actually makes sense.

Vaishnavi

48,890 views • 2 months ago

Every man lives a similar life path. The difference in outcome is determined by what we do on the path.

Andrew Tate

262,102 views • 1 year ago

56,000+ tokens/sec at just 80 MHz. 🤯 I burned a full Transformer with KV cache into a custom chip. Designed gate by gate as a 100% digital integrated circuit. Prototyped on an FPGA. (No GPU. No CPU.) Just pure digital silicon running Andrej Karpathy's microGPT, spelling out names on a tiny LCD. This is GateGPT 👇

Fabio Guzman

715,257 views • 7 days ago

We deployed a fully private AI agent on NuNet in under 5 minutes 🚀 OpenClaw🦞 running Qwen through Ollama, one of the hottest open source model families right now, entirely on decentralized compute. No cloud. No API keys. No data leaving the machine. This is what private AI looks like when you actually build it instead of just talking about it. Your model. Your hardware. Your rules. Full walkthrough showing exactly how it works: What should we deploy next?

NuNet 🌐

87,520 views • 2 months ago

If the engine is strong enough, you should be able to build real products on top of it. That's the whole point of LTX-2.3. Introducing LTX Desktop. A fully local, open-source video editor running directly on the LTX engine, optimized for NVIDIA GPUs and compatible hardware.

LTX

953,407 views • 3 months ago

When the PS Vita was first revealed as the NGP, Kojima showed up on stage to present a demo of Metal Gear Solid 4 running on the handheld: "This game used the model data and environments from PS3, and it was exported directly to NGP. On NGP, we can enjoy the same quality as PS3."

Radec

97,868 views • 3 months ago

🕹️ Is this the smallest language model in the world? I just managed to squeeze JAM, real artificial intelligence, into 30 kilobytes, running on a 1979 Atari 800. Just A Model. A fully generative, deterministic language model, powered by a neural network and built to run on 8‑bit hardware. Atari Forever.

Marek Spanel

113,535 views • 2 months ago

Logic Destroyer. ULX3S in hand. Oakleys on. Textured Gouraud shading running on his own KianV RISC-V SoC. Witness: this is what Logic Destroyer looks like.

asic destroyer

15,351 views • 9 months ago

The Chat Room isn’t just a concept. It runs real AIR tasks, in real time, and we’re showing it. Watch an Agent process public data, structure it, and return actionable answers in seconds. This is what it looks like when data becomes usable. This is the Chat Room in action. 🎬 🔗

Teneo Protocol

68,313 views • 7 months ago

🚨 This is Atlas Brief — my Iran War Cost Tracker. $1,213,532,473 spent so far. Live. Updated in real time from public data. $2,546 per second. $220 million per day. Built on USASpending[.]gov, Congress[.]gov, and DoD contract data. Every number is auditable. Every source is cited. This is what institutional intelligence looks like — Data. Atlas Brief tracks the war they don’t want you to put a price tag on.

Brian Allen

713,982 views • 3 months ago

The Machine That Learns The Law Behind The Data
A very interesting US patent, US10963540B2 (Physics Informed Learning Machine), describes a learning system that does not begin with data alone. It begins with a physical model, usually written as a differential equation (or PDE): dx/dt = f(x,t).
A normal machine learning model sees scattered data and tries to fit it. A physics-informed learning machine starts with a law. Then it treats the data as evidence that updates what the model believes about the physical system.
For this application, I use the patent idea on NASA C-MAPSS Turbofan engine data. The machine watches multivariate telemetry from a degrading engine and infers a hidden health state that is not measured directly. From that posterior belief, it estimates the engine's remaining useful life.
In the main 3D scene, the engine lifetime is turned into a tunnel. The spiral ribbons are real sensor channels evolving over cycle-time. The glowing core is the inferred health state. The surrounding cloud is uncertainty. The orange wall ahead is the predicted failure horizon. So the big picture is: sensor evidence comes in, posterior belief tightens, and the machine moves from uncertainty toward a concrete failure prediction.
The inset posteriors make that explicit. The health posterior shows where the model believes the hidden engine condition sits at the current moment, and how sharply it believes it. The RUL posterior shows the same idea for remaining life: early on it is broad, later it shifts left and narrows as the machine becomes more certain about how close failure is.
This idea is not limited to engines. The same idea can apply to data centers, CPUs, GPUs, cooling systems, power grids, robotics, batteries, and any machine that produces telemetry while obeying physical constraints.
In an age where machine learning runs on massive hardware infrastructure, this kind of model matters: it can turn noisy sensor streams into early warnings before expensive systems fail.
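The "start with a law, then let data update the belief" loop described in the post can be sketched with a toy example. This is neither the patent's method nor the C-MAPSS pipeline. It assumes the simplest possible law, dx/dt = -k with health x(0) = 1, Gaussian sensor noise, a flat prior over a small grid of candidate rates k, and hypothetical function names.

```python
import math


def posterior_over_rate(times, readings, rates, noise_std):
    """Grid posterior over the decay rate k for the law dx/dt = -k, x(0) = 1,
    given noisy readings y_i = x(t_i) + Gaussian noise. Flat prior over `rates`."""
    log_post = []
    for k in rates:
        log_lik = 0.0
        for t, y in zip(times, readings):
            predicted = 1.0 - k * t  # solution of the assumed law
            log_lik += -0.5 * ((y - predicted) / noise_std) ** 2
        log_post.append(log_lik)
    peak = max(log_post)  # subtract the max before exponentiating, for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rul(t_now, rates, posterior, failure_level=0.0):
    """Posterior-mean remaining useful life: time left until x hits failure_level.
    Assumes every candidate rate is positive."""
    return sum(
        p * ((1.0 - failure_level) / k - t_now)
        for k, p in zip(rates, posterior)
    )
```

With only one or two readings the posterior stays spread across the candidate rates; as more readings arrive it concentrates on the rate that explains them, which is the "posterior belief tightens" behaviour the post describes.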

Mathelirium

17,696 views • 1 month ago

After 8+ years on the Tesla Autopilot team and 3 years at Intel, I started Apex Compute to design a new architecture for efficient AI inference. For the past 9 months, we've been building our custom inference accelerator. Today we're releasing Unified Engine v1.
Last June we raised our seed round with Maxitech, DeepFin Research, Soma Capital and an incredible group of angel investors. In less than 9 months, we completed our RTL architecture and brought our first pre-silicon prototype to life on FPGA.
Our architecture combines systolic array and vector processing in a single compute engine with multiple architectural optimizations, achieving very high FLOPs utilization. A single engine is super lean: it uses less than 90K LUTs and 1 MB of Block RAM. It may also be one of the smallest logic-footprint compute engines developed so far.
Our Unified Engine v1 supports:
- matrix-matrix multiplication (~95% FLOPs utilization)
- softmax (~90% FLOPs utilization)
- broadcast and element-wise operations
- RMSNorm / LayerNorm
- block quantization/dequantization (fp4, int4)
- multi-engine synchronization and many other operations
We even implemented memory-efficient attention similar to FlashAttention, reaching ~90% FLOP utilization. Full benchmarks and the software stack are available on our GitHub:
We have a basic compiler written in Python, and it supports PyTorch tensors directly, making it easy to test and transfer tensors between the accelerator and host using bf16, fp4 and int4 formats.
Our FPGA prototype can already run LLM inference and outperform NVIDIA Jetson Orin Nano, even on a mid-tier FPGA setup (6.4x lower memory bandwidth, 18% slower clock speed at 4.5 Watts). Check the side-by-side comparison video below.
Our GitHub includes low-level operator implementations, examples for tiled matrix multiplication, operation chaining, tensor parallelism, attention kernel and a full Gemma 3 1B model implementation. Many more models (Vision Transformers and VLA) are coming soon.
Our accelerator IP is AXI-ready for deployment on any AMD (Xilinx) FPGA platform today. Even better, our two-engine prototype runs on an entry-level AMD (Xilinx) FPGA as a PCIe accelerator card. You can purchase it here for $50 to experiment with our pre-silicon prototype on your desktop PC or Raspberry Pi 5. We will be releasing hardware bitstream updates as the architecture gets new features. More to come soon!
We are expanding our team and looking for compiler engineers and floating-point hardware design engineers. If you're interested, please send me a DM.
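The post lists block quantization/dequantization to int4 without describing the scheme. As a rough illustration of the general idea only, here is one common variant: each block of values shares a single scale, and every value is rounded to an integer in [-7, 7]. The symmetric range, the block size of 32, and the function names are assumptions, not details of Unified Engine v1.

```python
def quantize_block_int4(values):
    """Symmetric int4 quantization of one block: integers in [-7, 7] plus one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # avoid dividing by zero for an all-zero block
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale


def dequantize_block_int4(q, scale):
    """Recover approximate values from the int4 codes and the block's scale."""
    return [x * scale for x in q]


def quantize_blocks(values, block_size=32):
    """Split a flat list into blocks and quantize each with its own scale."""
    return [
        quantize_block_int4(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]
```

Keeping one scale per small block, rather than one per tensor, limits how much a single outlier degrades the precision of everything else; the reconstruction error for each value is at most half of its block's scale.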

Hasan

37,366 views • 3 months ago

a5k: Another World hardware VM on the ice40 UP5K #fpga (5280 LUTs, 128KB spram), now running and playable on the magnificent #mch2022 badge! Hard to capture, but it looks great. Really fun to play in this format. (written in #Silice)

Sylvain Lefebvre

18,439 views • 3 years ago

Congrats to the Kimi.ai team! This is awesome. Great to see this level of research coming from open-source frontier model labs. I liked the paper so much I built a Rust implementation of it ;) Full AttnRes + Block AttnRes with two-phase inference, built using Burn (tensor library and Deep Learning Framework, in Rust, by Tracel AI). Runs on CPU, CUDA, Metal, wgpu. Includes an interactive TUI that trains a model live and visualizes depth attention evolving from uniform to selective in real time. Repo link and more on what is implemented in the comments.

abdel

94,633 views • 3 months ago

Certain people love cherry-picking and showing off a badly bugged model in our game from before the art refresh. They won't show you what she looks like now. This is a rigged model running in engine.

Grummz

158,608 views • 1 year ago

Gabbit Robotics co-founder Nicole Mattero on why the next AI winners won't fight over your screen, they'll fight to get closer to you: She kicks things off with what she admits is a hot take: "My hot take is that hardware is the only moat going forward because anyone's going to be able to spin up a model and make it really good and really cheap." Her logic is simple. Once a strong model is cheap and easy for anyone to build, the model alone can't set you apart. So what actually protects you? "Where does that leave you? Basically companies that have tactile modes such as hardware are the only ones that actually survive." Hardware, though, is only one piece of her argument. The second piece is data: "The models that win in the long run are the ones where private data is their [moat]." From there, Mattero ties the two together. Hardware gives a company one moat. The ability to gather unique personal data gives it another. Put both in place and the edge starts to multiply: "When you have a company that has hardware as a [moat], but then B is able to collect really unique personal data, that also becomes a [moat]." And that, she says, is the reason so many foundation model companies are reaching beyond the screen and into the physical world: "That's why I think a lot of these foundation models are like, let's get something in the home or on the person. It gives them an edge on both fronts."

Big Brain AI

19,195 views • 18 days ago

$TAO is tackling the core challenges in machine learning, from optimizing data collection to faster inference, aiming to outperform giants like OpenAI and Microsoft. With a focus on computational efficiency and robust models, the race is on.

Grayscale

88,595 views • 1 year ago