Introducing ambientGPT: an open-source and multimodal macOS foundation model GUI

Run GPT-4o and open-source models with full ambient knowledge of your screen. Foundation models have long been confined to the browser. With ambientGPT, your screen context is directly inferred as part of the query, ensuring you never need to...

192,766 views • 2 years ago • via X (Twitter)

10 comments

Siddharth Sharma • 2 years ago

Unlike OpenAI’s desktop app, where you must provide a screenshot or upload a file, the context from your screen is parsed automatically. We also provide the ability to run secure local models like Gemma and Phi-3 multimodal from our interface. Because of the local model sizes, at least 16 GB of RAM is recommended. This was possible thanks to the Apple MLX library. Shoutout to @awnihannun, @reach_vb

Siddharth Sharma • 2 years ago

ambientGPT is open-source, and we plan to integrate vLLM and Ollama to provide more extensive inference hosting abilities with our multimodal GUI. We also aim to release ambientGPT on the Apple App Store soon.

Siddharth Sharma • 2 years ago

thanks to @mihiranan (mr. clutch) for his help with the demo once again!

Sambhav Gupta • 2 years ago

Very cool, but the ChatGPT Mac app will have the ability to see everything on screen when they release the update.

Ishan Khare • 2 years ago

So what does the “full ambient knowledge of your screen” entail privacy-wise? Why would someone trust this if unwanted data is being captured forever and sent to OpenAI?

Soham Konar • 2 years ago

Great work! Does it work across multiple monitors? Could be a game changer if so

whistle • 2 years ago

was waiting for something exactly like this!! what’s the token usage look like?

Saïd Aitmbarek • 2 years ago

looks awesome, would love to showcase ambient on

Salman • 2 years ago

Looks super useful. How does it determine context if you have multiple screens?

Rahul bansal 👀 • 2 years ago

This is cool. I built a tool for talking to a model via voice.

Related videos

VITA: Towards Open-Source Interactive Omni Multimodal LLM

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still a lot of work to be done on VITA to get close to its closed-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 views • 1 year ago