Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.

Andrew Ng

1,650,573 subscribers

397,772 views • 1 year ago • via X (Twitter)

Science & Technology • Education • Health & Wellness

11 comments

Andrew Ng • 1 year ago

You can also play with the demo here:

Breadcrumb • 1 year ago

Looking to automate reporting? Use AI agents to turn spreadsheets to reports in minutes without any coding.

Edrick🕗 • 1 year ago

Agentic workflows for computer vision make so much sense

Inforida • 1 year ago

Agentic Object Detection sounds fascinating. The ability to reason without labeled data is a game changer. Imagine applying this to educational tools to enhance learning experiences, making AI-powered learning more intuitive. Keep pushing the boundaries of innovation.

jc_stack • 1 year ago

Have you tested this against more complex scenarios like partially occluded objects or under varied lighting conditions? Really curious about edge cases and performance degradation patterns.

Marian Veteanu • 1 year ago

Super cool! This has lots of applications!

Lets go Seahawks 🇺🇦 • 1 year ago

I asked it to detect a rectangle in the batsman picture and it can't find it.

Lets go Seahawks 🇺🇦 • 1 year ago

And also, what's #23 wearing? Isn't it a hat?

NEXUS AI Solutions • 1 year ago

That's fascinating! Using agentic workflows to detect objects without labeled data could revolutionize how we approach image recognition tasks. How do you think this technology could be adapted for real-time applications like autonomous vehicles?

Nimaano • 1 year ago

It's amazing

Andrew Ng • 1 year ago

Thanks!

Related videos

In Prompt Engineering for Vision Models, taught by Abby Jacques Verre and Caleb Kaiser of Comet, you’ll learn how to prompt and fine-tune vision models for personalized image generation, image editing, object detection and segmentation. The prompts you'll use for vision models could be text, point coordinates, or bounding boxes, depending on the model. You'll also learn to tune hyperparameters to shape the output. Models you'll use include Segment-Anything Model (SAM), OWL-ViT, and Stable Diffusion. You'll also learn to fine-tune Stable Diffusion to generate personalized images (say, an image of a specific person), using a handful of images for training. As an example of a multi-step workflow, you'll use OWL-ViT to detect an object based on a text prompt, then pass the bounding box to SAM to create a segmentation mask, and input that mask into Stable Diffusion to replace the original object with a new one based on a text prompt. Controlling vision models can be tricky; this course will teach prompting and fine-tuning techniques to get precise control over their output. Get started here:

Andrew Ng

151,198 views • 2 years ago

2. Use the generated image as an image prompt for Runway or any video model that supports End-Frames. Use the image as the final/last frame and create a prompt that reveals the image in an ink-style animation. Example prompt:

Halim Alrasihi

14,596 views • 1 year ago

Announcing: Agentic Document Extraction! PDF files represent information visually - via layout, charts, graphs, etc. - and are more than just text. Unlike traditional OCR and most PDF-to-text approaches, which focus on extracting the text, an agentic approach lets us break a document down into components and reason about them, resulting in more accurate extraction of the underlying meaning for RAG and other applications. Watch the video for details.

Andrew Ng

689,160 views • 1 year ago

Goodbye NotebookLM. Introducing the World’s First Full Agentic AI Pods - Generate Any Professional Podcasts with One Prompt! Free to Use for Everyone!

Eric Jing

8,093,558 views • 11 months ago

So what does Razorpay look like in the agentic era? We started with the basics at FTX’26. • onboarding handled by an AI agent (~3 mins) • integration created from a prompt (~2 mins) • d̶a̶s̶h̶b̶o̶a̶r̶d̶ just talk to an agent for everything The first agentic fintech platform. And we timed it live. Live Demo ↓ (More to come)

Harshil Mathur

81,735 views • 3 months ago

It's Pikachu time. I built an agentic AI workflow to tell me exactly what trending video to make for the day. It scrapes viral trending data and automatically crafts the prompts to use. Here's what it created today 🐭🟡 See below an example of the prompts used 👇

Solo 👑

4,036,742 views • 7 months ago

📢WorldAgents: 3D worlds only from 2D image models - without any training! We propose an agentic approach with a Director (VLM) to plan the scene, a Generator (Flux or NanoBanana) for new views, and a Verifier (VLM) for selection / 3D consistency. -> High-fidelity 3D worlds from a single text prompt. What's remarkable: our agents find consistent views from 2D image models to obtain 3D-consistent worlds; this shows that image models contain world priors - agents just need to find them! Great work by Ziya Erkoç and Angela Dai

Matthias Niessner

18,886 views • 3 months ago

It's Minion Time. I built an agentic AI workflow to automatically generate prompts for a trending AI video. It scrapes viral trending data and automatically crafts the full video from A to Z. Here's what it created today 👇

Solo 👑

6,647,924 views • 6 months ago

I just created an agentic workflow to automatically write and publish content for me! It's powered by CrewAI Flows and Llama 3.2, running 100% locally.

Tech stack:
- CrewAI to build an agentic workflow
- FireCrawl for web scraping
- Typefully for scheduling

Here's how it works:
- You provide a link to a website.
- It scrapes and saves the data as markdown.
- A router triggers the desired Crew of agents.
- The Crew prepares a ready-to-publish draft.
- Finally, use Typefully to post it to your socials.

Totally hands-off and 100% automated! In this video, I provide a deep dive into how it actually works! Find the link to all the code in the next tweet! Enjoy the video! 🥂

Akshay 🚀

98,126 views • 1 year ago

INTRODUCING Notte: building the agentic internet with the strongest web browser for LLM agents. We transform ANY webpage into structured text, enabling better web understanding and navigation. Plug in any LLM to build your own AI agent.

Notte

225,195 views • 1 year ago

🔥 INTRODUCING: ACE >>> The Agentic Context Engineer ACE is an AI Agent that improves context for any AI Agent. Context gets better every time you use ACE. No fine-tuning. No manual prompt editing. Just use ACE and your AI Agents get smarter. 🧵 👇

Dan McAteer

21,233 views • 4 months ago

introducing astra: an agentic browser for your phone this is an early beta release, it includes: > sign in with anthropic > no data collection > a full computer use agent astra is free to use. link below. ps: 10% of profit goes to potocki

kyon

31,237 views • 6 months ago

Higgsfield Mod for Minecraft is live. > prompt any building or city, even the Statue of Liberty > create paintings with text-to-image > snap a view and restyle it with image-to-image > make videos from a prompt with text-to-video. > animate in-game photos with image-to-video

Higgsfield AI 🧩

453,351 views • 23 days ago

🌎Neural AI's Agentic World Building $NEURAL offers advanced 3D asset generation from text or image inputs, alongside material and texture generation tools. Soon, we’ll release a complete character pipeline for NFTs and other applications. Our ultimate goal is agentic world building, creating fully realized virtual levels with minimal input. Starting from a heightmap, the process includes automated landscape painting, biome-based vegetation, and area segmentation for objects and NPCs. Users can provide feedback at key steps, shaping elements like NPCs, equipment, and ambient sounds. The first phase of this revolutionary workflow is launching soon. 🖥 Discover the full details in the video!

NeuralAI

14,504 views • 1 year ago

Object Cutter: high-quality HD background removal for ANY object in your image, using a text prompt or bounding boxes!

Gradio

135,988 views • 1 year ago

New I2V workflow for car videos. It uses an image-aware prompt enhancer to tune a specialized contact-sheet prompt. Load in any car image and the workflow spits out hype. Tips, full prompts, and workflow links in the thread.

willie

58,962 views • 5 months ago

Never write a smart contract again. Introducing the first agentic workflow protocol.

Halliday

359,301 views • 1 year ago

Introducing the Multi-Shot App. An easy way to go from a simple prompt to a thoughtfully crafted scene. All with dialogue, sound effects, intentional cuts, pacing and cinematic framing. Start from an image or go purely Text to Video for total creative exploration. Available now on the web app. See examples in the thread below.

Runway

191,345 views • 3 months ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago