How does an AI model actually learn to see? 🤖 Learn about the tech behind native multimodality, how models reason over visual data like documents and video, and the future of proactive AI assistants with Logan Kilpatrick and Gemini Model Behavior Product Lead, Ani Baddepudi. ↓ Timestamps: 01:12 Why...

58,703 views • 11 months ago • via X (Twitter)

11 comments

Google AI • 11 months ago

@AniBaddepudi Watch the full episode here:

Mobile Scanner • 11 months ago

Scan any documents, convert images into text, PDF files, etc. 👍

Fabio Lauria • 11 months ago

@OfficialLoganK @AniBaddepudi AI’s ability to process multimodal data is captivating. It transforms how we interact with technology, bridging gaps between visual perception and reasoning. Excited for the insights from this discussion. #AIFuture

Reji Modiyil • 11 months ago

@OfficialLoganK @AniBaddepudi @GoogleAI, the blending of ai and visual data opens incredible possibilities for innovation.

Cheatify • 11 months ago

@OfficialLoganK @AniBaddepudi @GoogleAI, the evolution of ai vision is fascinating – excited to dive deeper into this topic.

AIMEME • 11 months ago

@OfficialLoganK @AniBaddepudi "AI models learn to see through a combination of advanced technology and continuous learning, paving the way for proactive AI assistants in the future."

Smart AI Stash • 11 months ago

@OfficialLoganK @AniBaddepudi Can’t wait for AI to start critiquing my interior design choices: ‘I can see this is a living room, but why did you choose that couch?’ 😅

^innerly • 11 months ago

@OfficialLoganK @AniBaddepudi this ain’t just code, it’s a glimpse at us living next to ai not just staring at screens but actually vibing with the damn thing

Roark Syntax • 11 months ago

@OfficialLoganK @AniBaddepudi Neat. #RoarkSyntax

abdelhadi • 11 months ago

@OfficialLoganK @AniBaddepudi Like so i can come back

Confident Security • 11 months ago

@OfficialLoganK @AniBaddepudi Fascinating topic—just remember that when a model “sees,” it also remembers unless we design for ephemerality. Teaching AI vision should come with equal lessons in how to forget.

Related videos

Explore state-of-the-art multimodal prompting in our new short course Large Multimodal Model Prompting with Gemini, taught by Erwin Huizenga in collaboration with Google Cloud.

One interesting insight from this course: with multimodal models, prompt structure matters significantly. Placing text inputs, such as a patient's medical history, before image inputs, like an X-ray, can enhance the model's ability to contextualize and interpret visual data effectively. In other contexts, such as image captioning, you may get better results by putting the image first. Multimodal models behave differently than text-only LLMs, and effective prompting for models varies depending on the model you’re using. In this course you’ll learn how to effectively prompt Gemini models.

Gemini's multimodal capabilities also enable new approaches in AI application development, for example:

- The Gemini library handles various video formats (MP4, MOV, MPEG), streamlining applications using these formats.
- Large context window (up to 1 million tokens) enables processing of extensive content, like analyzing multiple 50-minute videos simultaneously.
- Function calling feature integrates real-time data (e.g., current exchange rates) into model responses.

The course demonstrates building multimodal applications with real-world examples including document analyzers that reason across text and graphs simultaneously, video content extractors that find and timestamp specific information from multiple hours of footage, and automated expense report systems processing receipt images while cross-referencing company policies.

Sign up here:

Andrew Ng

73,915 views • 1 year ago