Video yükleniyor...

Video Yüklenemedi

Ana Sayfaya Dön

Introducing ambientGPT: an open-source and multimodal MacOS foundation model GUI Run GPT-4o and open-source models with full ambient knowledge of your screen. Foundation models have long been confined to the browser. With ambientGPT, your screen context is directly inferred as part of the query, ensuring you never need to...

192,766 görüntüleme • 2 yıl önce •via X (Twitter)

10 Yorum

Siddharth Sharma profil fotoğrafı
Siddharth Sharma2 yıl önce

Unlike OpenAI’s desktop app where you must provide a screenshot or upload a file, the context from your screen is automatically parsed. We also provide the ability to run secure local models like Gemma and Phi-3 multimodal from our interface. Due to the local model sizes, at least 16 GB RAM would be preferred. This was possible via the apple MLX library - shoutout to @awnihannun, @reach_vb

Siddharth Sharma profil fotoğrafı
Siddharth Sharma2 yıl önce

ambientGPT is open-source and we plan to integrate vllm and ollama to provide more extensive inference hosting abilities with our multimodal GUI. We also aim to release ambientGPT on the apple app store soon.

Siddharth Sharma profil fotoğrafı
Siddharth Sharma2 yıl önce

thanks to @mihiranan (mr. clutch) for his help with the demo once again!

Sambhav Gupta profil fotoğrafı
Sambhav Gupta2 yıl önce

Very cool but chatgpt mac app will have the ability to see everything on screen when they release the update ..

Ishan Khare profil fotoğrafı
Ishan Khare2 yıl önce

so what does the “full ambient knowledge of your screen” entail privacy wise?? why would someone trust this if unwanted data is being captured forever and sent to openai?

Soham Konar profil fotoğrafı
Soham Konar2 yıl önce

Great work! Does it work across multiple monitors? Could be a game changer if so

whistle profil fotoğrafı
whistle2 yıl önce

was waiting for something exactly like this!! what’s the token usage look like?

Saïd Aitmbarek profil fotoğrafı
Saïd Aitmbarek2 yıl önce

looks awesome, would love to showcase ambient on

Salman profil fotoğrafı
Salman2 yıl önce

Looks super useful . How does it determine context if you have multiple screens?

Rahul bansal 👀 profil fotoğrafı
Rahul bansal 👀2 yıl önce

This is cool. I built a tool for taking to model via voice

Benzer Videolar

VITA Towards Open-Source Interactive Omni Multimodal LLM discuss: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still lots of work to be done on VITA to get close to close-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 görüntüleme • 1 yıl önce