Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.

Andrew Ng

1,635,862 subscribers

397,751 views • 1 year ago • via X (Twitter)

Science and Technology • Education • Health and Wellness

Comments: 11

Andrew Ng • 1 year ago

You can also play with the demo here:

Breadcrumb • 1 year ago

Looking to automate reporting? Use AI agents to turn spreadsheets to reports in minutes without any coding.

Edrick🕗 • 1 year ago

Agentic workflows for computer vision make so much sense

Inforida • 1 year ago

Agentic Object Detection sounds fascinating. The ability to reason without labeled data is a game changer. Imagine applying this to educational tools to enhance learning experiences, making AI-powered learning more intuitive. Keep pushing the boundaries of innovation.

jc_stack • 1 year ago

Have you tested this against more complex scenarios like partially occluded objects or under varied lighting conditions? Really curious about edge cases and performance degradation patterns.

Marian Veteanu • 1 year ago

Super cool! This has lots of applications!

Lets go Seahawks 🇺🇦 • 1 year ago

I asked it to detect a rectangle in the batsman picture and it can't find it.

Lets go Seahawks 🇺🇦 • 1 year ago

And also, what's #23 wearing? Isn't it a hat?

NEXUS AI Solutions • 1 year ago

That's fascinating! Using agentic workflows to detect objects without labeled data could revolutionize how we approach image recognition tasks. How do you think this technology could be adapted for real-time applications like autonomous vehicles?

Nimaano • 1 year ago

It's amazing

Andrew Ng • 1 year ago

Thanks!

Related videos

In Prompt Engineering for Vision Models, taught by Abby Morgan, Jacques Verre, and Caleb Kaiser of Comet, you’ll learn how to prompt and fine-tune vision models for personalized image generation, image editing, object detection, and segmentation. The prompts you'll use for vision models could be text, point coordinates, or bounding boxes, depending on the model. You'll also learn to tune hyperparameters to shape the output. Models you'll use include Segment-Anything Model (SAM), OWL-ViT, and Stable Diffusion. You'll also learn to fine-tune Stable Diffusion to generate personalized images (say, an image of a specific person), using a handful of images for training. As an example of a multi-step workflow, you'll use OWL-ViT to detect an object based on a text prompt, then pass the bounding box to SAM to create a segmentation mask, and input that mask into Stable Diffusion to replace the original object with a new one based on a text prompt. Controlling vision models can be tricky; this course will teach prompting and fine-tuning techniques to get precise control over their output. Get started here:

Andrew Ng

151,198 views • 2 years ago

2. Use the generated image as an image prompt for Runway or any video model that supports End-Frames. Use the image as the final/last frame and create a prompt that reveals the image in an ink-style animation. Example prompt:

Halim Alrasihi

14,596 views • 1 year ago

Announcing: Agentic Document Extraction! PDF files represent information visually - via layout, charts, graphs, etc. - and are more than just text. Unlike traditional OCR and most PDF-to-text approaches, which focus on extracting the text, an agentic approach lets us break a document down into components and reason about them, resulting in more accurate extraction of the underlying meaning for RAG and other applications. Watch the video for details.

Andrew Ng

689,130 views • 1 year ago

Goodbye NotebookLM. Introducing the World’s First Full Agentic AI Pods - Generate Any Professional Podcasts with One Prompt! Free to Use for Everyone!

Eric Jing

8,093,173 views • 11 months ago

So what does Razorpay look like in the agentic era? We started with the basics at FTX’26.
• onboarding handled by an AI agent (~3 mins)
• integration created from a prompt (~2 mins)
• d̶a̶s̶h̶b̶o̶a̶r̶d̶ just talk to an agent for everything
The first agentic fintech platform. And we timed it live. Live Demo ↓ (More to come)

Harshil Mathur

81,617 views • 3 months ago

It's Pikachu time. I built an agentic AI workflow to tell me exactly what trending video to make for the day. It scrapes viral trending data and automatically crafts the prompts to use. Here's what it created today 🐭🟡 See below an example of the prompts used 👇

Solo 👑

4,035,697 views • 6 months ago

📢WorldAgents: 3D worlds only from 2D image models - without any training! We propose an agentic approach with a Director (VLM) to plan the scene, a Generator (Flux or NanoBanana) for new views, and a Verifier (VLM) for selection / 3D consistency. -> High-fidelity 3D worlds from a single text prompt. What's remarkable: our agents find consistent views from 2D image models to obtain 3D-consistent worlds; this shows that image models contain world priors - agents just need to find them! Great work by Ziya Erkoç Angela Dai

Matthias Niessner

18,886 views • 3 months ago

It's Minion Time. I built an agentic AI workflow to automatically generate prompts for a trending AI video. It scrapes viral trending data and automatically crafts the full video from A to Z. Here's what it created today 👇

Solo 👑

6,647,686 views • 6 months ago

I just created an agentic workflow to automatically write and publish content for me! It's powered by CrewAI Flows and Llama 3.2, running 100% locally.

Tech stack:
- CrewAI to build an agentic workflow
- FireCrawl for web scraping
- Typefully for scheduling

Here's how it works:
- You provide a link to a website.
- It scrapes and saves the data as markdown.
- A router triggers the desired Crew of agents.
- The Crew prepares a ready-to-publish draft.
- Finally, use Typefully to post it to your socials.

Totally hands-off and 100% automated! In this video, I provide a deep dive into how it actually works! Find the link to all the code in the next tweet! Enjoy the video! 🥂

Akshay 🚀

98,126 views • 1 year ago

INTRODUCING Notte. Building the agentic internet with the strongest web browser for LLM agents. We transform ANY webpage into structured text, enabling better web understanding and navigation. Plug in any LLM to build your own AI agent.

Notte

225,195 views • 1 year ago

🔥 INTRODUCING: ACE >>> The Agentic Context Engineer ACE is an AI Agent that improves context for any AI Agent. Context gets better every time you use ACE. No fine-tuning. No manual prompt editing. Just use ACE and your AI Agents get smarter. 🧵 👇

Dan McAteer

21,233 views • 4 months ago

introducing astra: an agentic browser for your phone
this is an early beta release, it includes:
> sign in with anthropic
> no data collection
> a full computer use agent
astra is free to use. link below.
ps: 10% of profit goes to potocki

kyon

31,237 views • 6 months ago

Higgsfield Mod for Minecraft is live.
> prompt any building or city, even the Statue of Liberty
> create paintings with text-to-image
> snap a view and restyle it with image-to-image
> make videos from a prompt with text-to-video
> animate in-game photos with image-to-video

Higgsfield AI 🧩

450,394 views • 18 days ago

🌎Neural AI's Agentic World Building $NEURAL offers advanced 3D asset generation from text or image inputs, alongside material and texture generation tools. Soon, we’ll release a complete character pipeline for NFTs and other applications. Our ultimate goal is agentic world building, creating fully realized virtual levels with minimal input. Starting from a heightmap, the process includes automated landscape painting, biome-based vegetation, and area segmentation for objects and NPCs. Users can provide feedback at key steps, shaping elements like NPCs, equipment, and ambient sounds. The first phase of this revolutionary workflow is launching soon. 🖥 Discover the full details in the video!

NeuralAI

14,504 views • 1 year ago

Object Cutter Create high-quality HD background removal for ANY object in your image with a text prompt or bounding boxes!

Gradio

135,968 views • 1 year ago

Never write a smart contract again. Introducing the first agentic workflow protocol.

Halliday

359,301 views • 1 year ago

New I2V workflow for car videos. Uses image aware prompt enhancer to tune a specialized contact sheet prompt. Load in any car image and the workflow spits out hype. Tips, full prompts, workflow links in thread.

willie

58,962 views • 5 months ago

Introducing the Multi-Shot App. An easy way to go from a simple prompt to a thoughtfully crafted scene. All with dialogue, sound effects, intentional cuts, pacing and cinematic framing. Start from an image or go purely Text to Video for total creative exploration. Available now on the web app. See examples in the thread below.

Runway

191,345 views • 2 months ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago