Introducing ambientGPT: an open-source and multimodal macOS foundation model GUI

Run GPT-4o and open-source models with full ambient knowledge of your screen. Foundation models have long been confined to the browser. With ambientGPT, your screen context is directly inferred as part of the query, ensuring you never need to...

192,766 views • 2 years ago • via X (Twitter)

Comments: 10

Siddharth Sharma • 2 years ago

Unlike OpenAI’s desktop app, where you must provide a screenshot or upload a file, the context from your screen is automatically parsed. We also provide the ability to run secure local models like Gemma and Phi-3 multimodal from our interface. Due to the local model sizes, at least 16 GB of RAM is recommended. This was possible via the Apple MLX library - shoutout to @awnihannun, @reach_vb

Siddharth Sharma • 2 years ago

ambientGPT is open-source, and we plan to integrate vLLM and Ollama to provide more extensive inference hosting abilities with our multimodal GUI. We also aim to release ambientGPT on the Apple App Store soon.

Siddharth Sharma • 2 years ago

thanks to @mihiranan (mr. clutch) for his help with the demo once again!

Sambhav Gupta • 2 years ago

Very cool, but the ChatGPT Mac app will have the ability to see everything on screen when they release the update.

Ishan Khare • 2 years ago

so what does the “full ambient knowledge of your screen” entail privacy-wise? why would someone trust this if unwanted data is being captured forever and sent to OpenAI?

Soham Konar • 2 years ago

Great work! Does it work across multiple monitors? Could be a game changer if so

whistle • 2 years ago

was waiting for something exactly like this!! what’s the token usage look like?

Saïd Aitmbarek • 2 years ago

looks awesome, would love to showcase ambient on

Salman • 2 years ago

Looks super useful. How does it determine context if you have multiple screens?

Rahul bansal 👀 • 2 years ago

This is cool. I built a tool for talking to a model via voice

Related videos

VITA: Towards Open-Source Interactive Omni Multimodal LLM

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, which also offers an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary, followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities in multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in an MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still a lot of work to be done on VITA to get close to its closed-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 views • 1 year ago