📲 Gemma 3n delivers advanced on-device AI with optimized performance and multimodal understanding. Take a closer look at the early preview text capabilities available in Google AI Studio, GenAI SDK, and text/image capabilities via MediaPipe.

Google AI Developers

115,791 subscribers

16,639 views • 1 year ago • via X (Twitter)

Science & Technology, Gaming, Education

9 comments

Google AI Developers • 1 year ago

Check out the blog for more details ↓

ksminnovation • 1 year ago

AI is transforming healthcare! A KSM-led study shows AI can detect Celiac disease 4 years earlier @TalPatalon @MedPredict

Albert@PepinoCapital • 1 year ago

Related but can you add support for Google login (since it's a Google app?) so I'm logged in already when I try to download a model from @huggingface? Thank you

Sam Hocking • 1 year ago

No video/live video available yet. Seems really good!

Tsukuyomi • 1 year ago

on-device AI that gets smarter? sounds like a plot twist waiting to happen. let’s see how well it handles the chaos of reality.

Nicolas • 1 year ago

it does not seem to work with liteRT for web

KeySS Inc • 1 year ago

Sounds cool! Can't wait to see how this AI performs better than my coffee maker on a Monday morning.

Vishal • 1 year ago

On-device AI with multimodal skills is a smart move.

hamhamtar0xOwO • 1 year ago

@MirraTerminal good update

Related videos

Let's go hands-on with #GeminiAI. Our newest AI model can reason across different types of inputs and outputs — like images and text. See Gemini's multimodal reasoning capabilities in action ↓

Google

1,005,910 views • 2 years ago

Build edge AI applications with Google tools using Hybrid LLMs. Learn how to combine the on-device capabilities of Gemma and cloud-based Gemini models for optimal performance and privacy.

Google AI Developers

15,365 views • 1 year ago

✨ Introducing Gemma 3n, available in early preview today. The model uses a cutting-edge architecture optimized for mobile on-device usage. It brings multimodality, super fast inference, and more.

Google AI Developers

125,226 views • 1 year ago

Key features include:
- Expanded multimodal understanding with video and audio input, alongside text and images
- Developer-friendly sizes: 4B and 2B (and many in between!)
- Optimized on-device efficiency for 1.5x faster response on mobile compared to Gemma 3 4B

Google AI Developers

11,841 views • 1 year ago

Unlock local, agentic workflows with Gemma 4 12B and Google AI Edge, directly on your laptop. Experience 100% on-device AI:
• Generate code in AI Edge Gallery (new to Mac)
• Dictate and edit text via AI Edge Eloquent (new to Mac)
• Serve Gemma 4 12B locally with LiteRT-LM
Dive in:

Google for Developers

139,821 views • 1 month ago

Gemma 4 31B is now available in Public Preview on Cerebras. Our first multimodal model runs at over 1,800 tokens/s for ultra-fast image and text workflows. Give it a try:

Cerebras

269,017 views • 19 days ago

Biggest news from Google I/O - Gemma 3n is out 🔥
> True multimodal - Audio, Image, Video AND Text in
> Can be used in 4B and 2B sizes
> Supports 140 languages
> Even works on CPU w/ LiteRT (on Hugging Face)
> Optimised for on-device 😍
BONUS: Coming to other open source libraries near you soon ;)

Vaibhav (VB) Srivastav

62,751 views • 1 year ago

Introducing Nano Banana Pro (Gemini 3 Pro Image), our new state-of-the-art image generation and editing model from Google DeepMind. It improves on the original model while adding new advanced capabilities, enhanced world knowledge and text rendering, allowing you to create and edit studio-quality, production-ready visuals.

Google

1,897,285 views • 8 months ago

Friendly reminder that Google has an official app to run Gemma 4 on your phone.
- 100% open source
- Fully offline and private
- Multimodal with text/audio/image
- Works with Gemma E4B and E2B
And the app is available on both iOS and Android. Steps and download below

Paul Couvert

725,268 views • 3 months ago

Gemma 4 12B dropped today. Apache 2.0, multimodal: text, image, audio, and video. 256K context, built-in thinking, native tool calling. Running on Red Hat OpenShift AI with vLLM on Day 0:

Red Hat AI

15,968 views • 1 month ago

At Google I/O, I sat down with Omar Sanseviero and 👩‍💻 Paige Bailey from Google DeepMind to talk about Gemma, open models, AI Studio, on-device AI, sovereign AI and the future of AI development. A great conversation on how building with AI is becoming more open, local and accessible.

Chubby♨️

13,741 views • 1 month ago

Introducing Veo 3.1 Lite, now available via the Gemini API and Google AI Studio. Veo 3.1 Lite supports both Text-to-Video and Image-to-Video, and is less than half the cost of Veo 3.1 Fast.

Google AI Developers

150,179 views • 3 months ago

Image captioning spots objects, but can it spot 𝘵𝘩𝘦 𝘷𝘪𝘣𝘦? 🤔 The "Caption This" demo in Google AI Studio showcases 2.5 Pro's upgraded creative capabilities and deep visual understanding to decode images and create contextual, witty captions.

Google AI Developers

96,545 views • 1 year ago

In this demo, you’ll see Gemini 3 Flash’s frontier-level reasoning and multimodal capabilities on display. The model is able to simultaneously conduct complex geometric calculations while processing complex inputs (video and image). You can play around with the slingshot in Google AI Studio here and share your favorite examples below:

Google AI

55,575 views • 7 months ago

We’re shipping two major updates to streamline your creative workflow, allowing you to generate high-speed images with one model and then instantly animate them with the other, all at a fraction of the cost 🍌⚡️
1️⃣ Introducing Nano Banana 2 Lite: Our fastest and most cost-efficient Gemini Image model yet delivers text-to-image outputs in under 4 seconds. Now available via the Gemini API and Google AI Studio, and rolling out soon across @NotebookLM, Google Flow, Google Gemini, Stitch by Google, Google Search and Google Photos.
2️⃣ Gemini Omni Flash in Public Preview: Our natively multimodal model for cost-efficient video generation and conversational editing. Now available via the Gemini API, Google AI Studio, and Gemini Enterprise Agent Platform so you can integrate the model into your workflow.
While exciting on their own, the real magic happens when you build using these models together. Watch how our interior design demo integrates Nano Banana 2 Lite and Omni to instantly reimagine any space. Upload a photo, swipe through tailored design concepts, and see Omni bring the details to life in cinematic motion. Try out the demo app in AI Studio:

Google AI

119,113 views • 18 days ago

🔥 Google Gemini 2.0 Flash is crazy good at pointing. I was over engineering before but now I'm just gonna bet on model capabilities. This is a demo of an AI cursor explaining a diagram on tldraw with just a prompt and an image. Streaming is also simple with Vercel AI SDK.

Sriraam

187,185 views • 1 year ago

See Native Audio in action 🤠🦊 Our "Mumble Jumble" demo in Google AI Studio showcases the Live API's advanced voice capabilities: natural flow, distinct tone, emotion, and multilingual support.

Google AI Developers

22,835 views • 1 year ago

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model.🎨
✨ New in 2.1:
🔹Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image.
🔹Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design to meet the needs of various fields.
🔹Rich Styles and High Aesthetic: Capable of generating images in various styles, including photorealistic portraits, comics, and vinyl figures, it delivers outstanding visual appeal and artistic quality.
🔹High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image.
HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters.
We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation.
Now, creators turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We’re just getting started. Stay tuned for our native multimodal image generation model coming soon.
🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 views • 10 months ago

Apple just released and open-sourced FastVLM! FastVLM is a lightning-fast vision-language model that combines rapid image and text understanding with efficient on-device performance. 100% Open Source

Sumanth

43,704 views • 10 months ago