We've released the code for LegoGPT. This autoregressive model generates physically stable and buildable designs from text prompts, by integrating physics laws and assembly constraints into LLM training and inference. This work is led by PhD students Ava Pun, Kangle Deng, Ruixuan Liu, and in collaboration with CMU faculty...

Jun-Yan Zhu

13,503 subscribers

38,607 views • 1 year ago • via X (Twitter)

Art • Science & Technology • Education

7 comments

Or Patashnik • 1 year ago

@AvaLovelace0 @kangle_deng Wow, really cool!

Rainmaker • 2 years ago

Here I share an XGBoost model that delivers a 25% CAGR with minimal drawdown on Visa stock. In this free Substack post I share code and commentary for a powerful Machine Learning strategy that delivers powerful returns.

Jason Liu • 1 year ago

@AvaLovelace0 @kangle_deng Awesome project 👍🏼. Some designs may require other orientation than from the ground up. I’m excited to learn about this!

Redcrown • 1 year ago

@AvaLovelace0 @kangle_deng woah, this is soo cool

Ant A • 1 year ago

@AvaLovelace0 @kangle_deng So just GenAI every step/layer?

Aiden • 1 year ago

@AvaLovelace0 @kangle_deng Super interesting project! We're also big believers in using natural language to create. With jenova ai, anyone can build their own custom AI apps just by describing what they need.

Max Zhaoshuo Li 李赵硕 • 1 year ago

@AvaLovelace0 @kangle_deng Very interesting work! Congrats!

Related videos

LegoGPT, an LLM-based system that generates physically stable LEGO structures from text prompts, backed by a new 47,000+ sample dataset and physics-aware filtering during inference. → LegoGPT is trained on a custom dataset, StableText2Lego, which includes 47,000+ 3D LEGO models mapped to text, spanning 28,000+ unique objects. → The model predicts LEGO bricks sequentially like tokens, using next-token prediction in a transformer setup. → To ensure physical stability, LegoGPT integrates physics-aware rollback and validity filtering, pruning out structurally invalid brick placements. → The generated designs are aesthetically aligned with prompts, physically buildable, and tested both with human manual assembly and robotic arms. → The team also introduced a text-driven LEGO coloring/texturing pipeline, enabling more expressive and customized outputs. → The dataset, code, and models are all publicly released under an open-access license.

Rohan Paul

75,268 views • 1 year ago

[1/2] We’ve released the code for #pix2pixturbo and #CycleGANTurbo. These conditional GANs are able to adapt a text-to-image model such as SD-Turbo for both paired and unpaired image translation with a single step (0.11 sec on A100 and 0.29 sec on A6000). Try our code and the Gradio demo. Paper: Code: Demo: This is a joint work with Gaurav Parmar (the leading author), Taesung Park, and Srinivasa Narasimhan. This work shows that a pre-trained one-step model can be easily adapted to conditional GANs frameworks for downstream image editing and synthesis tasks. #Edges2Cats

Jun-Yan Zhu

36,488 views • 2 years ago

Today, we are releasing Stable Video Diffusion, our first foundation model for generative AI video based on the image model, Stable Diffusion. As part of this research preview, the code, weights, and research paper are now available. Additionally, today you can sign up for our waitlist to access a new upcoming web experience featuring a Text-To-Video interface. To access the model & sign up for our waitlist, visit our website here:

Stability AI

1,024,498 views • 2 years ago

Wow! AI ASSISTED GARAGE MANUFACTURING IS ABOUT TO EXPLODE! CAD Drawings From Just A Picture! MIT just released something profound for creators and engineers alike. Picture this. You take a photo of an object, upload it, and an AI delivers a fully parametric CAD model, complete with editable code and construction history. This is open source GenCAD, from MIT's Decode Lab. It uses autoregressive transformers and diffusion models, trained on hundreds of thousands of images and CAD files. Input a 2D photo or sketch. Output valid CadQuery Python code that beats models like GPT-4.5 in accuracy. Why does this matter? It speeds up reverse engineering, prototyping, and part searches in vast databases. No more hours spent modeling from scratch. Field repairs, custom designs, education, all transformed. It even retrieves similar parts from libraries of thousands. For industries like manufacturing and aerospace, it cuts costs and boosts innovation. Hobbyists gain pro tools without the steep curve. I am testing it now on random objects and can not believe how much of a super power this is. I can start dozens of companies just on this AI model. This open-source gem is here: The future of building stuff arrives in a snapshot.

Brian Roemmele

121,948 views • 4 months ago

This is a pretty wild model! You can use it to turn an image into a 3D object with texture. The quality is out of this world! I'm not even a designer, and I've been using this nonstop for the last 2 hours. The model is Hunyuan 3D 2.1. It's open source. You'll find model weights, training/inference code, data pipelines, and architecture on their repository. You can even fine-tune it if you want! GitHub Repository: By the way, the model runs on consumer-grade GPUs. You don't need a datacenter for this! I've been using the model from the HuggingFace demo page: To use it, go to the link and upload an image. That's it! Check out the video I recorded for a couple of examples.

Santiago

44,783 views • 1 year ago

This is THE moment of Physical AI! We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀 - Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions. - It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies—enabling models to truly begin to “touch the world.” - Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy across many benchmarks. Huge thanks to every teammate who fought side by side on this journey—from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate. The future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act”—and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point for this direction, and I’m excited to push Physical AI into its next stage together with the open-source community. Welcome to the era of Physical AI. HuggingFace: Project Website: Code:

Max Zhaoshuo Li 李赵硕

1,078,049 views • 1 month ago

We've been ramping up usage of AI tools on our design team at Coinbase 🛡️ Two examples: 1. Write a text prompt and get a figma mockup as a starting point for your design 2. Click a button and turn any figma design into front-end code The front-end code it generates adheres to our design system, and is trained on our library of UIs Shout out to 🛡️ tali krakowsky apel🛡️ Blair McKee and the team pushing this forward.

Brian Armstrong

520,436 views • 2 years ago

An interactive world model developed by NVIDIA in collaboration with academic partners. - DreamDojo turns egocentric human video data into physical intelligence. - Human data is more scalable than robotics data but lacks action labels. - To solve this, a dedicated action model extracts latent actions by identifying physics and motion deltas between frames. Training - A massive 44k hours of video data are used for pre-training. - Post-training on small-scale robot datasets maps human physics to specific robot embodiments. - An additional distillation stage converts the model into an autoregressive, few-step diffusion model, enabling real-time, action-controllable simulation. Primary Use Cases - Live Teleoperation: Controlling a robot inside a world simulation in real-time. - Model-based Planning: Previewing and curating the best actions for improved success. - Policy Evaluation: Testing robot policies in realistic, out-of-distribution scenarios. Everything that's open-sourced: weights, code, post-training dataset, eval set, and details to reproduce.

The Humanoid Hub

11,575 views • 5 months ago

AN OXFORD STUDENT IS RUNNING A PARTICLE SIMULATION WITH REAL PEOPLE'S NAMES AND CLAIMS CERN IS TAUNTING HIM THROUGH THE CODE Thousands of particles on a black screen - each one labeled with a real person's name - moving according to the laws of physics in real time and he is completely convinced this is not a simulation but a personal message from CERN directed at him specifically. Particle simulation with collision detection, velocity vectors and brownian motion - technically flawless code that tracks every particle individually and renders trajectories at 60 fps. CERN operates a 17km collider that accelerates protons to 99.9999991% the speed of light and generates a petabyte of data every single day - and apparently found the time to encode Oxford student names into a simulation. The code is real. The physics is correct. The conclusions are a separate conversation.

Noisy

5,715,181 views • 2 months ago

NVIDIA just released a new open source transcription model, Nemotron Speech ASR, designed from the ground up for low-latency use cases like voice agents. Here's a voice agent built with this new model. 24ms transcription finalization and total voice-to-voice inference time under 500ms. This agent actually uses *three* NVIDIA open source models: - Nemotron Speech ASR - Nemotron 3 Nano 30GB in a 4-bit quant (released in December) - A preview checkpoint of the upcoming Magpie text-to-speech model These models are all truly open source: weights, training data, training code, and inference code. This is a big deal! Jensen said in the CES keynote yesterday that he expects open source models to catch up to proprietary models this year in a number of categories. NVIDIA is putting their weight behind making this happen. (As Alan Kay said, the best way to predict the future is to invent it.) The code for this agent is open source too, of course. You can deploy it to production with Modal and Pipecat AI cloud, or run locally on an NVIDIA DGX Spark or RTX 5090.

kwindla

274,474 views • 6 months ago

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI). What we’re releasing: • Meta Spirit LM: An open source language model for seamless speech and text integration. • Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2. • Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance. • SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography. • Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale. • Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials. • MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages. • Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations. Access to state-of-the-art AI creates opportunities for everyone. We’re excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️

AI at Meta

150,222 views • 1 year ago

Super clean and efficient meshes by an AI? YES! The typical 3D Generative AI solutions produce lots of artifacts and usually way too many polygons due to volumetric approaches. In comparison “MeshGPT creates triangle meshes by autoregressively sampling from a transformer model that has been trained to produce tokens from a learned geometric vocabulary. These tokens can then be decoded into the faces of a triangle mesh. This method generates clean, coherent, and compact meshes, characterized by sharp edges and high fidelity.” Surely it is limited by the trained vocabulary but various versions can be trained for specific sets to create generative model libraries for certain object groups. Very promising approach with the high quality.

René Schulte

20,772 views • 2 years ago

You say it. A robot builds it. 🗣️ MIT researchers just showed a system where spoken language turns directly into physical objects. Say “I want a simple stool,” and a robotic arm designs and assembles it in minutes. 🪑 The pipeline is wild but elegant: speech → language model → 3D generative design → voxelized structure → robotic assembly. No CAD, no programming, no manufacturing expertise needed. Unlike 3D printing, this is fast, modular, and reversible. Objects are built from reusable parts, meaning they can be disassembled and turned into something else later. This feels like an early glimpse of “physical AI” becoming an interface: humans describe intent, machines handle geometry, planning, and fabrication. Read more about the paper here: ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

32,926 views • 7 months ago

You might have seen the WuBOT performing at the 2026 Spring Festival Gala; however, most high-dynamic extreme motions you see are executed by overfitted tracking policies. Until now, training a unified policy capable of performing various extreme motions with a high success rate remained an unsolved challenge. We spent an entire year digging into the barrier between general tracking and extreme physical behaviors. After burning through dozens of G1 robots, we finally identified the bottleneck of learning and physical executability. With these discoveries, we developed OmniXtreme: the first general policy that can execute diverse extreme motions, including consecutive flips, extreme balancing, and even breakdancing with rapid contact switches! This capability is achieved by pre-training a flow-based generative control policy and then post-training with actuation-aware residual RL for complex physical dynamics—a step we found critical for successful real-world transfer. This work is a joint collaboration with Unitree. Together, we are pushing the physical limits of humanoid robots. It is incredibly exciting to see a general "robot gymnast" and "robot breakdancer" come to life! It was also our first time publishing a paper with XingXing, which was an enlightening experience. The model checkpoints are now released—we welcome you to play with them! 📦 📄 Paper: 🌐 Project: 💻 Code:

Siyuan Huang

107,008 views • 4 months ago

Building and prototyping games is easier than ever with Sidekick. Our designer built a 3D version of this OpenAI GPT-o1 game demo in <1 hour. Used Autodesk Maya for assets, Unity for engine, @bezi_3d Sidekick for scripting & implementation. Sidekick analyzes your project for real-time context of the asset folder, scene objects, components, and more. This allows it to implement simple mechanics—like movement controls and basic physics—for your characters and objects in seconds. It's the AI tool built by game developers, for game developers.

Bezi

15,295 views • 1 year ago

Very well said here by Shervin Hajipour: "Just don't forget that we all want one thing, and our goal is one thing, and that is for this country to turn into a better place to live. And we are all trying in some way. Maybe these small ways have differences with each other, and we have small differences of opinion. But we are not supposed to tear ourselves apart on social media and fight with each other because of this difference of opinion. Because there is a minority that is taking maximum advantage of this lack of solidarity among us, and we really shouldn't give them this opportunity."

Sina Toossi

14,877 views • 7 months ago

I am testing the amazing Enoch by Mike HealthRanger and team and it is absolutely amazing. A free local AI built on a far wider field of wisdom. My goal: get it to run offline on an iPhone! I am quantizing the model to an aggressive 2-bit level using Q2_K or Q4_K_M. We lose some resolution but we gain portability to any device. The full free version is still small but requires more hefty hardware. But look at this emergency nuclear Fallout response. The keeper-of-the-status-quo Wikipedia types would have you suffer with limited crude ways to react. This response is stellar and will save your life. I will be working on the quantized version and if it is worthy release it for free on the MIT license like the larger Enoch. I will also incorporate this model with all of my other free models as it is a goldmine of astonishing work. Thank you Mike amazing work sir.

Brian Roemmele

66,592 views • 9 months ago

🚀 Excited to release LongLive 2.0! 🎬 An end-to-end infrastructure for long video generation, with FP4 and parallelism at the core of both training and inference. ⚡45.7 FPS generation speed on 5B model⚡ ✨ LongLive 2.0 supports real-video training, few-step distillation, multi-shot training/inference, sequence-parallel acceleration, NVFP4 KV cache, and async VAE decoding deployment. 🧩 To our knowledge, this is the first open-source 4-bit long video generation infra that covers both training and inference. 🙌 Welcome to check it out, try it, and share feedback! 🔗 Code: 📰 Paper: 🎥 Demo: #LongVideoGeneration #VideoGeneration #Realtime #AIInfra #EfficientAI #FP4 #Parallel #NVIDIA

Yukang Chen

59,030 views • 2 months ago

🇺🇦 Zelensky: Today in Rivne. I started my working trip by communicating with our youth – students of a vocational college. It is very important to develop this direction, and today it is a priority for Ukraine. We discussed with students and heads of enterprises that help them find jobs after graduation, training specialists, and popularizing vocational education. The issue of increasing scholarships was also raised.

MAKS 26 🇺🇦👀

11,663 views • 2 months ago