NVIDIA has published a paper on DREAMGEN – a powerful 4-step pipeline for generating synthetic data for humanoids that enables task and environment generalization.

- Step 1: Fine-tune a video generation model using a small number of human teleoperation videos
- Step 2: Prompt the fine-tuned model to turn a single real image into new AI-imagined videos
- Step 3: Automatically label actions in the generated videos
- Step 4: Train a robot AI model with the labeled synthetic dataset

This enabled humanoid robots to perform 22 novel behaviors – such as pouring, opening/closing articulated objects, and manipulating a variety of tools. The original teleoperation dataset only included pick-and-place tasks. This takes task extensibility to another level without requiring human teleoperation for every single task.

The pipeline will be made open-source soon.

Project page:

The Humanoid Hub

57,483 subscribers

12,074 views • 1 year ago • via X (Twitter)

Comments: 4

J⏩ • 1 year ago

The complexities and sheer dirty randomness of the real world are going to eat all these bots for lunch. The 'training' is so far from reality. *Extremely* early days yet, not even remotely close to ready for the real world.

The Humanoid Hub • 1 year ago

Success rate of about 45% with just 7,000 synthetic neural trajectories – it's just early days. Scaling, refinements and combining other data modalities will accelerate the march of 9s.

VistaShares • 1 year ago

Discover the future of AI investing. AIS delivers exposure to the companies driving the next wave of innovation—semiconductors, data centers, and AI applications. Explore the supercycle today.

VentureMind AI • 1 year ago

Love these steps!

Related videos

A humanoid robot policy trained solely on synthetic data generated by a world model. Research Scientist Joel Jang presents NVIDIA's DreamGen pipeline:

⦿ Post-train the world model Cosmos-Predict2 with a small set of real teleoperation demos.
⦿ Prompt the world model to generate synthetic video data with verbs and scenarios not used in the world model’s post-training.
⦿ Auto-label synthetic video data with action sequences.
⦿ Train robot policies using only synthetic data.

That's it. Deploy zero-shot to a real humanoid robot.

The Humanoid Hub

20,968 views • 1 year ago

The problem with humanoid teleoperation is that it is expensive and difficult to scale. Enter NVIDIA's EgoScale:

- A VLA model pretrained on thousands of hours of egocentric human videos.
- Mid-trained via 50 hours of human + 4 hours of robot "play" data for human-robot alignment.
- Fine-tuned with very few examples of task-specific robot teleoperation (100 or fewer per task).
- Successfully transfers across 5-finger (Sharpa) and 3-finger (Unitree G1) robot hands.
- Performance scales predictably as data increases.

The Humanoid Hub

44,441 views • 5 months ago

The World Model as NEO's Cognitive Core

1X has revealed a major AI development where the NEO humanoid can translate any natural language prompt into robotic action. It demonstrates this capability even for novel tasks, objects, and environments not found in its robot dataset.

- the 1X World Model is trained on internet-scale human interaction videos and fine-tuned with robot data to ground its understanding in physics and in NEO's embodiment
- from a simple voice or text prompt, the world model generates a visualization of future actions
- a built-in inverse dynamics model then translates these into precise motor movements for NEO

The Humanoid Hub

68,453 views • 6 months ago

Apple built a large foundation model and fine-tuned it on multiple tasks. But they are doing something very clever: They load a single model in memory and use different adapters to specialize the model on the fly.

I recorded a video to show you how to write the code to do the same thing Apple is doing. I explain everything step by step. Here is what I'll show you in the video:

1. We'll load two datasets
2. Then load a large model
3. Then, we'll fine-tune the model on both datasets

I'll use LoRA to fine-tune the model. This process creates two small adapters, each specializing in solving one of the datasets. The base model's original parameters will remain unchanged. From here:

4. We'll generate a list of tasks
5. We'll load the correct adapter to solve each task

The large model I'm using needs 346 MB of memory, but I only need to load it once. Each adapter is only 2.7 MB. I only need to load the base model once and pair it with any of the fine-tuned adapters. Minimum memory footprint and I can solve multiple tasks. Hope this helps!
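
As a rough illustration of the memory argument in this post, the sketch below compares one shared base model plus per-task adapters against a separate full copy of the model per task. The 346 MB and 2.7 MB figures come from the post; the function names and the task counts are assumptions made for this example, and real memory use would also include activations and runtime overhead.

```python
# Sizes quoted in the post; everything else here is an illustrative assumption.
BASE_MODEL_MB = 346.0
ADAPTER_MB = 2.7


def shared_base_footprint_mb(num_tasks: int) -> float:
    """Weights in memory for one base model plus one LoRA adapter per task."""
    return BASE_MODEL_MB + num_tasks * ADAPTER_MB


def separate_models_footprint_mb(num_tasks: int) -> float:
    """Weights in memory if every task kept its own full copy of the model."""
    return num_tasks * BASE_MODEL_MB


# Hypothetical task counts, chosen only to show how the gap grows.
for n in (2, 10):
    shared = shared_base_footprint_mb(n)
    separate = separate_models_footprint_mb(n)
    print(f"{n} tasks: shared base {shared:.1f} MB, separate copies {separate:.1f} MB")
```

With the two adapters described in the post, the shared setup adds only 5.4 MB on top of the base model, while two full copies would double the footprint.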

Santiago

84,747 views • 1 year ago

Did you know you can create captivating motion sync videos on Yapper? Follow the steps below and push your creativity boundaries.

Step 1: Go to on your browser
Step 2: Click on videos and select Motion Sync
Step 3: Select Base Video. The video's movement will be used to animate the reference image.
Step 4: Add Reference Image. Upload an image of a character to be animated by the base video.
Step 5: Describe Your Vision. Write a prompt that guides the background of the output video.
Step 6: Generate. Select your model and the output video will be generated in real-time.

Kuria | AI

46,344 views • 3 months ago

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI.

The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight:

- Real humanoid teleoperation data.
- Large-scale simulation data: we are open-sourcing 300K+ trajectories!
- Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”!
- Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos.

GR00T N1 is a single end-to-end neural net, from photons to actions:

- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.

We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings.

While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right.

Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

Jim Fan

466,261 views • 1 year ago

Today, we're introducing SimFoundry, our real2sim2real framework at NVIDIA GEAR that automatically turns real-world scenes into simulation-ready worlds from a single image or video. Website: Paper: This work marks a major step for our team toward leveraging simulations and synthetic data for foundation model training and systematic policy evaluation at scale. Code will be open-sourced soon. Stay tuned!

Yuke Zhu

48,530 views • 1 month ago

SkildAI, which is developing a foundational AI model for robots, has released a new video of their humanoid robot walking over objects and exploring environments it had never seen before. The robot uses vision and adapts in real time.

SkildAI: "From raw images and joint feedback, the model directly outputs low-level motor commands. This single neural network enables humanoid robots to seamlessly walk across flat ground, climb stairs, and step over obstacles without any planning, mapping, or manual switching between behaviors. For all of our testing in this video, we deployed the robot in each new environment without any prior planning or mapping. We took off the humanoid’s shoes so you can hear it."

Sawyer Merritt

69,297 views • 11 months ago

An interactive world model developed by NVIDIA in collaboration with academic partners.

- DreamDojo turns egocentric human video data into physical intelligence.
- Human data is more scalable than robotics data but lacks action labels.
- To solve this, a dedicated action model extracts latent actions by identifying physics and motion deltas between frames.

Training
- A massive 44k hours of video data are used for pre-training.
- Post-training on small-scale robot datasets maps human physics to specific robot embodiments.
- An additional distillation stage converts the model into an autoregressive, few-step diffusion model, enabling real-time, action-controllable simulation.

Primary Use Cases
- Live Teleoperation: Controlling a robot inside a world simulation in real-time.
- Model-based Planning: Previewing and curating the best actions for improved success.
- Policy Evaluation: Testing robot policies in realistic, out-of-distribution scenarios.

Everything that's open-sourced: weights, code, post-training dataset, eval set, and details to reproduce.

The Humanoid Hub

11,575 views • 5 months ago

NVIDIA just introduced Cosmos, a platform for world foundation models designed for robotics.

⦿ It features advanced tokenizers, an AI-accelerated data pipeline, and integration with NVIDIA Omniverse. Humanoid makers 1X, Figure, and Agility are among the first to adopt Cosmos.
⦿ Cosmos generates synthetic, physics-based data, accelerating model training and customization.
⦿ It also features a CUDA-accelerated data processing pipeline that enables developers to process, curate, and label 20 million hours of videos in 14 days using the NVIDIA Blackwell platform.
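
To put the quoted pipeline figure in perspective, here is a small back-of-the-envelope calculation. It uses only the two numbers from the post (20 million hours of video, 14 days); the variable names are made up for this sketch, and it says nothing about how many GPUs or what configuration the claim assumes, since the post does not state that.

```python
# Both figures are quoted in the post; the derived rates are plain arithmetic.
VIDEO_HOURS = 20_000_000
WALL_CLOCK_DAYS = 14

wall_clock_hours = WALL_CLOCK_DAYS * 24

# Hours of video processed for every hour of wall-clock time.
video_hours_per_hour = VIDEO_HOURS / wall_clock_hours

# The same footage expressed as years of continuous playback.
video_years = VIDEO_HOURS / (24 * 365)

print(f"Wall-clock time: {wall_clock_hours} hours")
print(f"Throughput: {video_hours_per_hour:,.0f} video-hours per wall-clock hour")
print(f"Footage length: {video_years:,.0f} years of continuous video")
```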

The Humanoid Hub

129,481 views • 1 year ago

NVIDIA just announced EgoScale 🤖🧠

NVIDIA Research has uncovered a log-linear scaling law for robot dexterity by pretraining VLA models on over 20,000 hours of egocentric human video.

This massive dataset is 20 times larger than previous efforts and proves that robot intelligence follows a predictable path: the more human data, the lower the loss.

The secret is a simple recipe combining large-scale human pretraining with a small amount of aligned human-robot mid-training to bridge the gap.

In testing, this method boosted the average success rate by 54% on a 22-DoF robotic hand compared to policies built without pretraining.

EgoScale also enables one-shot task adaptation and works across different hardware, suggesting that human motion is a universal motor prior for robots.

Website: Paper: Source: NVIDIA Research

#Robot #Humanoid #Robotics #AI #EmbodiedAI #PhysicalAI #NVIDIA #EgoScale #GR00T

RoboHub🤖

43,752 views • 5 months ago

What happens when robot world models learn from human experience at scale? 🤔 DreamDojo from NVIDIA Research is a generalist robot world model pretrained on 44K hours of egocentric human videos and then post-trained on robot data to generalize across new objects and environments. After distillation, it runs at 10 FPS for live teleoperation, policy evaluation, and model-based planning. Read the ICML paper to learn more 📄

NVIDIA Robotics

22,413 views • 26 days ago

Small Language Models (SLM) are the future of AI. "Small" (SLM) instead of "Large" (LLM).

These small models are highly specialized models with superhuman abilities on specific tasks. Here are two techniques to build these models:

• Spectrum
• Model Merging

I give you a short introduction in the attached video, but here is a quick summary:

Spectrum helps us identify the most relevant layers to solve one specific task. We can ignore everything else and focus on fine-tuning these layers. Using Spectrum, we can fine-tune models in a heartbeat.

Model Merging combines multiple models into a unique, much better model than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities.

This is the state of the art of productizing models. It's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it.

There are three main steps to produce a model for your particular use case:

1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You can deploy that model to any environment you want.

Three important notes:

• Training process is 2x faster and 2x cheaper than regular fine-tuning.
• Resultant models are smaller and have higher accuracy.
• They create these specialized models from open-source models.

Check this site so you can fully appreciate how this works:

If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.

Santiago

164,162 views • 2 years ago

🤔 Ever wondered if simulation-based animation/avatar learnings can be applied to real humanoid in real-time?

🤖 Introducing H2O (Human2HumanOid):

- 🧠 An RL-based human-to-humanoid real-time whole-body teleoperation framework
- 💃 Scalable retargeting and training using large human motion dataset
- 🎥 With just an RGB camera, everyone can teleoperate a full-sized humanoid to perform actions like pick and place, walking, kicking, boxing, etc
- 💡 Unleash the potential of humanoids with human cognitive skills and adaptability

🔗: 📄: 🎬:

H2O proposes:

- A scalable retargeting framework for obtaining large-scale humanoid motion dataset, intelligently filtering out infeasible motion for the humanoid embodiment.
- We train a full-body motion imitator (similar to PHC) and deploy to the real world zero-shot.
- Using this framework, we enable real-time teleoperation of a humanoid via a human operator and webcam, performing skills such as pick and place, kicking, walking strollers, etc.

Team: Tairan He, Zhengyi “Zen” Luo, Wenli Xiao, @ChongZitaZhang, Kris Kitani, Changliu Liu, Guanya Shi

Zhengyi “Zen” Luo

47,305 views • 2 years ago

[1/2] We’ve released the code for #pix2pixturbo and #CycleGANTurbo. These conditional GANs are able to adapt a text-to-image model such as SD-Turbo for both paired and unpaired image translation with a single step (0.11 sec on A100 and 0.29 sec on A6000). Try our code and the Gradio demo. Paper: Code: Demo: This is a joint work with Gaurav Parmar (the leading author), Taesung Park, and Srinivasa Narasimhan. This work shows that a pre-trained one-step model can be easily adapted to conditional GANs frameworks for downstream image editing and synthesis tasks. #Edges2Cats

Jun-Yan Zhu

36,488 views • 2 years ago

For decades, we’ve dreamed of robots that can seamlessly step into our world and lend a hand. Today, we take a major stride toward making that dream a reality: Introducing Gemini Robotics 2 from Google DeepMind, the intelligence layer powering the next generation of truly adaptable robots.

This major advance unlocks intelligent whole-body control, advanced dexterity, and even multi-robot collaboration 🤯.

Ok but... how does a robot actually "think"? Real-world tasks take time and planning. To manage that complexity, our new embodied reasoning model, Gemini Robotics ER 2, acts as the robot’s high-level brain, enhancing the robot’s capabilities to:

- Observe the environment
- Reason about the actions needed to complete the task
- Coordinate with the vision-language-action model to carry out actions
- Track progress until the job is done

This setup allows robots to execute complex multi-step workflows, self-correct if a step fails, and adapt to completely novel situations.

Learn more about Gemini Robotics ER 2 (and our two other brand new models) here:

Google AI

340,549 views • 2 days ago

Training humanoid robots on teleoperation alone won’t scale, says Rhoda AI CEO Jagdeep Singh. Unlike self-driving cars - which basically have four actuators (left, right, speed up, and slow down) and operate in a single environment (the road) - humanoid robots are completely different: "You're dealing with the full dexterity of a human hand - 20 degrees of freedom per hand. Every object is different. Every type of task is different." The problem isn't just the quantity of data - the bigger issue is diversity in the data, and it's why many humanoid robot demos will struggle to adapt to the real world: "If all the data you have is data that you've intentionally collected, then you almost by definition haven't seen the corner cases. You haven't seen all those edge scenarios that cause failure." Jagdeep says teleoperation is useful for fine-tuning robot behavior, but for pretraining, it's completely inadequate.

TBPN

19,218 views • 4 months ago

A 19-euro tool just replaced 8,000 Meta employees. Here’s the exact workflow a TikTok creator used to build AI influencer videos from scratch — no camera, no team, no budget. Step 1. Open Pinterest. Search “Girl Self.” Save the image you like. Open Nanobanana Pro. Step 2. Swap the face using a single prompt. (Link at the bottom.) Step 3. Take your generated photo into Crust. Currently the easiest video-from-image tool on the market. Step 4. Create 3 nodes: reference image, text, video generator. Step 5. Drop your photo into the image node. Write a video prompt into the text field. This is what makes the photo move. Step 6. Connect the nodes. Text to text-in. Image to image-in. Select Kling 2.6. Set duration. Toggle sound. Step 7. Hit generate. 2 prompts. 1 AI girl. 0 employees. Meta spent $14B on Reality Labs last year to build digital humans. This creator built one in 7 minutes for less than a dinner. The prompts are free in the reply.

RGK

186,524 views • 2 months ago

Let's build a dashboard to evaluate and monitor your Agentic and RAG apps!

In this video, I'll guide you through creating an evaluation and observability pipeline for your AI apps using a 100% open-source tool!

Tech Stack:
- Comet's Opik to eval and monitor
- LlamaIndex to build a RAG pipeline
- Ragas for synthetic datagen

You'll learn:
- Setting up Opik
- Building a RAG pipeline
- Creating an eval dataset
- Evaluating the RAG pipeline
- Monitoring all activities during the process

It's a hands-on demo with code and a step-by-step guide to do everything listed above.

CometML's Opik is fully open-source, offering the most Pythonic and easiest way to monitor LLM apps. I have shared a link to their repo in the next tweet!

Akshay 🚀

20,058 views • 1 year ago