This assistant has 169 lines of code:

• Gemini Flash
• OpenAI Whisper
• OpenAI TTS API
• OpenCV

GPT-4o is slower than Flash, more expensive, chatty, and very stubborn (it doesn't like to stick to my prompts).

Next week, I'll post a step-by-step video on how to build this.
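One way the four pieces listed above could fit together is sketched below. This is not the author's code, which the post does not show. The `Assistant` class and its `grab_frame`, `transcribe`, `answer`, and `speak` callables are hypothetical stand-ins for an OpenCV webcam capture, Whisper, a text+image model such as Gemini Flash, and the TTS API. The real calls are left out on purpose, so the sketch runs with simple stubs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assistant:
    """Hypothetical glue for one turn: hear, look, think, speak."""
    grab_frame: Callable[[], bytes]      # stand-in for an OpenCV webcam capture
    transcribe: Callable[[bytes], str]   # stand-in for Whisper (speech to text)
    answer: Callable[[str, bytes], str]  # stand-in for a text+image model call
    speak: Callable[[str], None]         # stand-in for a text-to-speech call

    def handle_turn(self, audio: bytes) -> str:
        prompt = self.transcribe(audio)
        if not prompt.strip():
            return ""  # nothing was said, so skip the model call
        frame = self.grab_frame()  # latest image to send along with the prompt
        reply = self.answer(prompt, frame)
        self.speak(reply)
        return reply


# Usage with stubs in place of the real services.
spoken = []
demo = Assistant(
    grab_frame=lambda: b"jpeg-bytes",
    transcribe=lambda audio: audio.decode("utf-8"),
    answer=lambda prompt, frame: f"You asked: {prompt} ({len(frame)} image bytes)",
    speak=spoken.append,
)
demo_reply = demo.handle_turn(b"what is this?")
```

Passing the services in as callables keeps the loop itself tiny and makes it easy to swap one model for another, which is the comparison the post is making between Flash and GPT-4o.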

Santiago

416,758 subscribers

90,296 views • 2 years ago • via X (Twitter)

Education • Science and technology

Comments: 10

Santiago • 2 years ago

The first request takes longer (warming up), but things work faster from that point. A few opportunities to improve this:

1. Stream answers from the model (instead of waiting for the full answer).
2. Add the ability to interrupt the assistant.
3. Run Whisper on a GPU.
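The first two ideas in that comment can be sketched together. The snippet below is only an illustration under stated assumptions: `sentences_from_stream` is a hypothetical helper, the chunks are plain strings standing in for whatever a streaming model API returns, and a `threading.Event` stands in for the "user interrupted" signal. It yields each sentence as soon as it is complete, so speech synthesis could start before the full answer arrives, and it stops early once the event is set.

```python
import re
import threading
from typing import Iterable, Iterator

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str],
                          stop: threading.Event) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a chunked stream."""
    buffer = ""
    for chunk in chunks:
        if stop.is_set():
            return  # interrupted: drop whatever has not been yielded yet
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip() and not stop.is_set():
        yield buffer.strip()  # flush the trailing sentence


# Usage: chunk boundaries do not need to line up with sentence boundaries.
stop_flag = threading.Event()
stream = ["Hel", "lo there. Ho", "w are you? I am", " fine."]
streamed = list(sentences_from_stream(stream, stop_flag))
```

In a real assistant each yielded sentence would be handed to the speech step, and a microphone or key listener would set the event when the user starts talking.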

Santiago • 2 years ago

Unfortunately, no local model supports text+images (as far as I know), so I'm stuck running online models. The TTS API (synthesizing text to audio) can also be replaced by a local version. I tried, but the available voices suck (too robotic), so I kept OpenAI's.

Santiago • 2 years ago

I wonder if OpenAI's assistant uses the people's API or if they have a special, secret, much faster version powering their app. I wouldn't be surprised if they have VIP access. They can have ++ bandwidth with the model for faster responses.

Hesam • 2 years ago

169 lines of code is what we used to need just to get started coding 😁 Great job 👌🏻

Santiago • 2 years ago

We've come far!

BluFor • 2 years ago

Time to put this into a plastic gadget and raise $100 million

Santiago • 2 years ago

I'm posting it for free online for those who want to raise the money. Remember me when you make it!

Nicolaj B Andersen • 2 years ago

Out of curiosity, how did it know which “small text” you were referring to? There is also some small text above the small figure.

Santiago • 2 years ago

No idea. It picked one and read it. On a different test, I pointed at a line of text (the top one) and asked it to read that one, and it worked.

ZAZO • 2 years ago

Amazing work @svpino
