Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.

Andrew Ng

1,627,212 subscribers

397,732 views • 1 year ago • via X (Twitter)

Science & Technology, Education, Health & Wellness

11 comments

Andrew Ng • 1 year ago

You can also play with the demo here:

Breadcrumb • 1 year ago

Looking to automate reporting? Use AI agents to turn spreadsheets to reports in minutes without any coding.

Edrick🕗 • 1 year ago

Agentic workflows for computer vision make so much sense

Inforida • 1 year ago

Agentic Object Detection sounds fascinating. The ability to reason without labeled data is a game changer. Imagine applying this to educational tools to enhance learning experiences, making AI-powered learning more intuitive. Keep pushing the boundaries of innovation.

jc_stack • 1 year ago

Have you tested this against more complex scenarios like partially occluded objects or under varied lighting conditions? Really curious about edge cases and performance degradation patterns.

Marian Veteanu • 1 year ago

Super cool! This has lots of applications!

Lets go Seahawks 🇺🇦 • 1 year ago

I asked it to detect a rectangle in the batsman picture and it can't find it.

Lets go Seahawks 🇺🇦 • 1 year ago

And also, what's #23 wearing? Isn't it a hat?

NEXUS AI Solutions • 1 year ago

That's fascinating! Using agentic workflows to detect objects without labeled data could revolutionize how we approach image recognition tasks. How do you think this technology could be adapted for real-time applications like autonomous vehicles?

Nimaano • 1 year ago

It's amazing

Andrew Ng • 1 year ago

Thanks!

Related videos

In Prompt Engineering for Vision Models, taught by Abby Jacques Verre and Caleb Kaiser of Comet, you’ll learn how to prompt and fine-tune vision models for personalized image generation, image editing, object detection and segmentation. The prompts you'll use for vision models could be text, point coordinates, or bounding boxes, depending on the model. You'll also learn to tune hyperparameters to shape the output. Models you'll use include Segment-Anything Model (SAM), OWL-ViT, and Stable Diffusion. You'll also learn to fine-tune Stable Diffusion to generate personalized images (say, an image of a specific person), using a handful of images for training. As an example of a multi-step workflow, you'll use OWL-ViT to detect an object based on a text prompt, then pass the bounding box to SAM to create a segmentation mask, and input that mask into Stable Diffusion to replace the original object with a new one based on a text prompt. Controlling vision models can be tricky; this course will teach prompting and fine-tuning techniques to get precise control over their output. Get started here:

Andrew Ng

151,198 views • 2 years ago

2. Use the generated image as an image prompt for Runway or any video model that supports End-Frames. Use the image as the final/last frame and create a prompt that reveals the image in an ink-style animation. Example prompt:

Halim Alrasihi

14,596 views • 1 year ago

Announcing: Agentic Document Extraction! PDF files represent information visually - via layout, charts, graphs, etc. - and are more than just text. Unlike traditional OCR and most PDF-to-text approaches, which focus on extracting the text, an agentic approach lets us break a document down into components and reason about them, resulting in more accurate extraction of the underlying meaning for RAG and other applications. Watch the video for details.

Andrew Ng

689,126 views • 1 year ago

Goodbye NotebookLM. Introducing the World’s First Full Agentic AI Pods - Generate Any Professional Podcasts with One Prompt! Free to Use for Everyone!

Eric Jing

8,093,173 views • 11 months ago

So what does Razorpay look like in the agentic era? We started with the basics at FTX’26. • onboarding handled by an AI agent (~3 mins) • integration created from a prompt (~2 mins) • d̶a̶s̶h̶b̶o̶a̶r̶d̶ just talk to an agent for everything The first agentic fintech platform. And we timed it live. Live Demo ↓ (More to come)

Harshil Mathur

81,617 views • 3 months ago

It's Pikachu time. I Built an agentic AI workflow to tell me exactly what trending video to make for the day. It scrapes viral trending data and automatically crafts the prompts to use. Here's what it created Today 🐭🟡 See below an example of the prompts used 👇

Solo 👑

4,035,082 views • 6 months ago

📢WorldAgents: 3D worlds only from 2D image models - without any training! We propose an agentic approach with a Director (VLM) to plan the scene, a Generator (Flux or NanoBanana) for new views, and a Verifier (VLM) for selection / 3D consistency. -> High-fidelity 3D worlds from a single text prompt. What's remarkable: our agents find consistent views from 2D image models to obtain 3D-consistent worlds; this shows that image models contain world priors - agents just need to find them! Great work by Ziya Erkoç Angela Dai

Matthias Niessner

18,886 views • 2 months ago

It's Minion Time. I Built an agentic AI workflow to automatically generate prompts for a trending AI video. It scrapes viral trending data and automatically crafts the full video from A-to-Z. Here's what it created Today 👇

Solo 👑

6,647,465 views • 6 months ago

I just created an agentic-workflow to automatically write and publish content for me! It's powered by CrewAI Flows and Llama 3.2, running 100% locally. Tech stack: - CrewAI to build an agentic workflow - FireCrawl for web scraping - Typefully for scheduling Here's how it works: - You provide a link to a website. - It scrapes and saves the data as markdown. - A router triggers the desired Crew of agents. - The Crew prepares a ready-to-publish draft. - Finally, use Typefully to post it to your socials. Totally hands-off and 100% automated! In this video, I provide a deep dive into how it actually works! Find the link to all the code in the next tweet! Enjoy the video! 🥂

Akshay 🚀

98,126 views • 1 year ago

INTRODUCING Notte Building the agentic internet with the strongest web browser for LLM agents. We transform ANY webpage into structured text, enabling better web understanding and navigation. Plug in any LLM to build your own AI agent

Notte

225,195 views • 1 year ago

🔥 INTRODUCING: ACE >>> The Agentic Context Engineer ACE is an AI Agent that improves context for any AI Agent. Context gets better every time you use ACE. No fine-tuning. No manual prompt editing. Just use ACE and your AI Agents get smarter. 🧵 👇

Dan McAteer

21,233 views • 4 months ago

introducing astra: an agentic browser for your phone this is an early beta release, it includes: > sign in with anthropic > no data collection > a full computer use agent astra is free to use. link below. ps: 10% of profit goes to potocki

kyon

31,237 views • 6 months ago

Higgsfield Mod for Minecraft is live. > prompt any building or city, even the Statue of Liberty > create paintings with text-to-image > snap a view and restyle it with image-to-image > make videos from a prompt with text-to-video. > animate in-game photos with image-to-video

Higgsfield AI 🧩

449,513 views • 14 days ago

🌎Neural AI's Agentic World Building $NEURAL offers advanced 3D asset generation from text or image inputs, alongside material and texture generation tools. Soon, we’ll release a complete character pipeline for NFTs and other applications. Our ultimate goal is agentic world building, creating fully realized virtual levels with minimal input. Starting from a heightmap, the process includes automated landscape painting, biome-based vegetation, and area segmentation for objects and NPCs. Users can provide feedback at key steps, shaping elements like NPCs, equipment, and ambient sounds. The first phase of this revolutionary workflow is launching soon. 🖥 Discover the full details in the video!

NeuralAI

14,504 views • 1 year ago

Object Cutter Create high-quality HD background removal for ANY object in your image with a text prompt or bounding boxes!

Gradio

135,968 views • 1 year ago

Never write a smart contract again. Introducing the first agentic workflow protocol.

Halliday

359,249 views • 1 year ago

New I2V workflow for car videos. Uses image aware prompt enhancer to tune a specialized contact sheet prompt. Load in any car image and the workflow spits out hype. Tips, full prompts, workflow links in thread.

willie

58,950 views • 4 months ago

Introducing the Multi-Shot App. An easy way to go from a simple prompt to a thoughtfully crafted scene. All with dialogue, sound effects, intentional cuts, pacing and cinematic framing. Start from an image or go purely Text to Video for total creative exploration. Available now on the web app. See examples in the thread below.

Runway

191,345 views • 2 months ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago