NVIDIA has published a paper on DREAMGEN, a powerful 4-step pipeline for generating synthetic data for humanoid robots that enables generalization to new tasks and environments.
- Step 1: Fine-tune a video generation model on a small number of human teleoperation videos.
- Step 2: Prompt the fine-tuned model to turn...

12,074 views • 1 year ago • via X (Twitter)

4 comments

J⏩ • 1 year ago

The complexities and sheer dirty randomness of the real world are going to eat all these bots for lunch. The 'training' is so far from reality. *Extremely* early days yet, not even remotely close to ready for the real world.

The Humanoid Hub • 1 year ago

Success rate of about 45% with just 7,000 synthetic neural trajectories; it's still early days. Scaling, refinements and combining other data modalities will accelerate the march of 9s.

VistaShares • 1 year ago

Discover the future of AI investing. AIS delivers exposure to the companies driving the next wave of innovation—semiconductors, data centers, and AI applications. Explore the supercycle today.

VentureMind AI • 1 year ago

Love these steps!

Related videos

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots!

We are on a mission to democratize Physical AI. The power of a general robot brain, in the palm of your hand: with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight:
- Real humanoid teleoperation data.
- Large-scale simulation data: we are open-sourcing 300K+ trajectories!
- Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. In Jensen’s words, “systematically infinite data”!
- Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neurally generated videos.

GR00T N1 is a single end-to-end neural net, from photons to actions:
- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.

We deploy N1 on the GR1 robot, the 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to a +30% boost in diverse manipulation tasks for household and industrial settings.

While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right.

Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

Jim Fan

465,704 views • 1 year ago
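The GR00T N1 post above describes a slow System 2 (vision-language model) that plans and a fast System 1 (diffusion transformer) that turns the latest plan into motor actions at 120 Hz. A minimal sketch of such a dual-rate control loop follows. It assumes a 10 Hz replanning rate for System 2, which the post does not state; `system2_plan` and `system1_act` are hypothetical stand-ins for the two networks, not GR00T N1's API, and the sketch emits one action per control tick for simplicity.

```python
from dataclasses import dataclass

CONTROL_HZ = 120  # System 1 action rate quoted in the post
PLANNER_HZ = 10   # assumption: System 2 replanning rate (not given in the post)


@dataclass
class Plan:
    created_at_tick: int  # control tick at which System 2 produced this plan
    instruction: str


def system2_plan(observation, instruction, tick):
    """Hypothetical stand-in for the slow vision-language model (System 2)."""
    return Plan(created_at_tick=tick, instruction=instruction)


def system1_act(observation, plan, tick):
    """Hypothetical stand-in for the fast action model (System 1): one command per tick."""
    return {"tick": tick, "plan_tick": plan.created_at_tick}


def run(seconds, instruction):
    """Simulate the dual-rate loop; return the actions and the number of replans."""
    ticks_per_plan = CONTROL_HZ // PLANNER_HZ  # 12 control ticks per plan
    plan, actions, n_plans = None, [], 0
    for tick in range(seconds * CONTROL_HZ):
        observation = None  # placeholder for camera images and robot state
        if tick % ticks_per_plan == 0:
            plan = system2_plan(observation, instruction, tick)
            n_plans += 1
        actions.append(system1_act(observation, plan, tick))
    return actions, n_plans


if __name__ == "__main__":
    actions, n_plans = run(1, "pick up the cup")
    print(len(actions), n_plans)  # 120 10
```

Under these assumptions, one simulated second yields 120 actions and 10 plans, and each action reuses the most recent plan until System 2 replans. Decoupling the two rates lets the slower model run less often while the faster one keeps producing motor commands.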

Small Language Models (SLMs) are the future of AI. "Small" (SLM) instead of "Large" (LLM).

These small models are highly specialized models with superhuman abilities on specific tasks.

Here are two techniques to build these models:
• Spectrum
• Model Merging

I give you a short introduction in the attached video, but here is a quick summary:

Spectrum helps us identify the most relevant layers to solve one specific task. We can ignore everything else and focus on fine-tuning these layers. Using Spectrum, we can fine-tune models in a heartbeat.

Model Merging combines multiple models into a single model that is much better than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities.

This is the state of the art of productizing models. It's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it.

There are three main steps to produce a model for your particular use case:
1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You can deploy that model to any environment you want.

Three important notes:
• The training process is 2x faster and 2x cheaper than regular fine-tuning.
• The resulting models are smaller and have higher accuracy.
• They create these specialized models from open-source models.

Check this site so you can fully appreciate how this works:

If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.

Santiago

164,162 views • 1 year ago
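The two ideas in Santiago's post, fine-tuning only a few selected layers and merging several models into one, can be illustrated with a toy sketch over NumPy weight dictionaries. This is neither Spectrum's layer-scoring method nor Arcee's implementation: the per-layer relevance `scores` are a hypothetical input (Spectrum computes its own metric), and the merge is plain weighted averaging of parameters, the simplest merging technique. `select_layers`, `sgd_step`, and `linear_merge` are illustrative names.

```python
import numpy as np

# A "model" here is a dict mapping layer name -> NumPy weight array.


def select_layers(scores, k):
    """Return the names of the k layers with the highest relevance score.

    `scores` is a hypothetical input; Spectrum derives its own per-layer metric.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sgd_step(weights, grads, trainable, lr=0.1):
    """Apply one SGD update to the trainable layers only; every other layer stays frozen."""
    return {
        name: (w - lr * grads[name]) if name in trainable else w
        for name, w in weights.items()
    }


def linear_merge(models, coeffs):
    """Merge same-architecture models by a weighted average of each layer's weights."""
    if not np.isclose(sum(coeffs), 1.0):
        raise ValueError("merge coefficients must sum to 1")
    return {
        name: sum(c * m[name] for c, m in zip(coeffs, models))
        for name in models[0]
    }


if __name__ == "__main__":
    base = {f"layer{i}": np.zeros((2, 2)) for i in range(4)}
    grads = {name: np.ones((2, 2)) for name in base}

    # Hypothetical relevance scores: fine-tune only the top-2 layers.
    scores = {"layer0": 0.1, "layer1": 0.9, "layer2": 0.5, "layer3": 0.2}
    print(select_layers(scores, k=2))  # ['layer1', 'layer2']

    # Two "task" models, each tuned on a different layer, then merged 50/50.
    task_a = sgd_step(base, grads, ["layer1"])
    task_b = sgd_step(base, grads, ["layer2"])
    merged = linear_merge([task_a, task_b], [0.5, 0.5])
    print(merged["layer1"][0, 0], merged["layer2"][0, 0])  # -0.05 -0.05
```

In the example, each tuned layer keeps half of its update after the 50/50 merge while untouched layers stay at the base values. Averaging only makes sense when the models share an architecture and a common starting point, which is why merging is usually applied to fine-tunes of the same base model.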