This assistant has 169 lines of code:
• Gemini Flash
• OpenAI Whisper
• OpenAI TTS API
• OpenCV

GPT-4o is slower than Flash, more expensive, chatty, and very stubborn (it doesn't like to stick to my prompts).

Next week, I'll post a step-by-step video on how to build this.
90,296 views • 2 years ago • via X (Twitter)
Comments: 10

The first request takes longer (warming up), but things work faster from that point on. A few opportunities to improve this:
1. Stream answers from the model (instead of waiting for the full answer).
2. Add the ability to interrupt the assistant.
3. Run Whisper on a GPU.
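
To illustrate the first idea (streaming), here is a minimal sketch of how streamed text could be cut into sentences so that speech synthesis can start before the full answer arrives. It is not the assistant's actual code. The function name `sentences_from_stream` is hypothetical, and the input is assumed to be any iterable of text chunks, such as the pieces a streaming model response would produce. Sentence detection is a simple punctuation heuristic.

```python
import re
from typing import Iterable, Iterator

# Heuristic sentence boundary: whitespace that follows ".", "!" or "?".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks.

    Hypothetical helper: `chunks` stands in for the text pieces of a
    streaming model response (an assumption, not a specific API).
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one ends at a sentence boundary.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it in the buffer.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    fake_stream = ["I see a ", "book. It has ", "small text", " on the cover."]
    for sentence in sentences_from_stream(fake_stream):
        # In a real assistant, each sentence would be sent to the TTS step here.
        print(sentence)
```

Each yielded sentence could be handed to the text-to-speech step right away, so the assistant starts talking after the first sentence instead of after the whole answer.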

Unfortunately, no local model supports text+images (as far as I know), so I'm stuck running online models. The TTS API (synthesizing text to audio) can also be replaced by a local version. I tried, but the available voices suck (too robotic), so I kept OpenAI's.

I wonder if OpenAI's assistant uses the people's API or if they have a special, secret, much faster version powering their app. I wouldn't be surprised if they have VIP access. They can have ++ bandwidth with the model for faster responses.

169 lines of code is what we used to need just to get started coding 😁 Great job 👌🏻

We've come far!

Time to put this into a plastic gadget and raise $100 million

I'm posting it for free online for those who want to raise the money. Remember me when you make it!

Out of curiosity, how did it know what “small text” you were referring to? There is also some small text above the small figure

No idea. It picked one and read it. On a different test, I pointed at a line of text (the top one) and asked it to read that one, and it worked.

Amazing work @svpino


