Introducing ambientGPT: an open-source and multimodal macOS foundation model GUI. Run GPT-4o and open-source models with full ambient knowledge of your screen. Foundation models have long been confined to the browser. With ambientGPT, your screen context is directly inferred as part of the query, ensuring you never need to...

Siddharth Sharma

2,703 subscribers

192,766 views • 2 years ago • via X (Twitter)

Education • Science & Technology

10 comments

Siddharth Sharma • 2 years ago

Unlike OpenAI’s desktop app, where you must provide a screenshot or upload a file, the context from your screen is automatically parsed. We also provide the ability to run secure local models like Gemma and Phi-3 multimodal from our interface. Due to the local model sizes, at least 16 GB of RAM is preferred. This was made possible by the Apple MLX library. Shoutout to @awnihannun, @reach_vb

Siddharth Sharma • 2 years ago

ambientGPT is open-source, and we plan to integrate vLLM and Ollama to provide more extensive inference hosting abilities with our multimodal GUI. We also aim to release ambientGPT on the Apple App Store soon.

Siddharth Sharma • 2 years ago

thanks to @mihiranan (mr. clutch) for his help with the demo once again!

Sambhav Gupta • 2 years ago

Very cool, but the ChatGPT Mac app will have the ability to see everything on screen when they release the update.

Ishan Khare • 2 years ago

So what does the “full ambient knowledge of your screen” entail privacy-wise? Why would someone trust this if unwanted data is being captured forever and sent to OpenAI?

Soham Konar • 2 years ago

Great work! Does it work across multiple monitors? Could be a game changer if so

whistle • 2 years ago

was waiting for something exactly like this!! what’s the token usage look like?

Saïd Aitmbarek • 2 years ago

looks awesome, would love to showcase ambient on

Salman • 2 years ago

Looks super useful. How does it determine context if you have multiple screens?

Rahul bansal 👀 • 2 years ago

This is cool. I built a tool for talking to a model via voice

Related videos

Introducing Ambient Context Aware AI on your computer. I have just scraped the whole Discord with it and made a summary of pain points ppl have Features: - Sees the whole screen ( even parts you need to scroll to) - long context window - Has memory Link:

Robert Lukoszko

15,383 views • 1 year ago

Introducing Mentat - an open source, GPT-4 powered coding assistant! Mentat runs in your command line, giving it the context of your projects and allowing it to coordinate edits across multiple files! More videos and a link to github below:

Scott Swingle

292,702 views • 2 years ago

Private AI browser with the OpenClaw agent on free local models Run your agent on Qwen, Gemma, or Nemotron directly in the browser Open source. Private. Runs on your local device

Sigma Browser

113,142 views • 1 month ago

Run state-of-the-art RAG applications locally on your computer with ollama and use all the fantastic open-source models like llama3, msk's awesome models, or Command R from cohere With Verba 1.0, we put it all in your hands 🙌 Get on board for a wild open-source ride, we're bridging any moat as open-source is here to win

Philip Vollet

41,375 views • 2 years ago

LM Studio is the most popular way to run open-source LLMs on your own hardware. Your Hermes Agent now runs natively on LM Studio: auto-discovering your models, loading them on demand with the right context size, and using the right reasoning level for each model.

Nous Research

184,759 views • 1 month ago

This Cursor Extension is awesome Accurate tweaking of UI was always a struggle, But stagewise allows you to bring full context to Cursor, just point and command: 1. Directly choose specific elements in browser 2. Send to Cursor with full context And it's open source

Jason Zhou

93,273 views • 1 year ago

VITA Towards Open-Source Interactive Omni Multimodal LLM discuss: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still lots of work to be done on VITA to get close to close-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 views • 1 year ago

The classic sovereign model puzzle is well solved here. 👌🏼 INHO: Get open-source models, optimize and upgrade them to Indian level, and voila! You got foundation models owned and controlled by Indian entities.

Vijay Shekhar Sharma

49,736 views • 4 months ago

It started as a small idea to connect AI models to developer workflows. It turned into one of the fastest-growing open standards in the industry. 🚀 Now, the Model Context Protocol is officially joining The Linux Foundation. Hear from the engineers and maintainers of GitHub, Microsoft, Anthropic, and OpenAI on the journey from day zero to now. 👇

GitHub

44,912 views • 6 months ago

Thrilled to see Amazon Web Services making a major contribution to the open source AI community with the launch of the Strands Agents, an open source AI agents SDK! The core of Strands is the simple agentic loop that connects the model and tools together, like the two strands of DNA. This model-driven approach to agent building eliminates the need for complex agent orchestration by embracing the capabilities of state-of-the-art models to plan, chain thoughts, call tools, and reflect. Providing open source tools and interoperability with open source protocols is an important part of our strategy to enable an agentic future. Can't wait to see what you build with Strands!

Swami Sivasubramanian

32,185 views • 1 year ago

Today, we’re launching Orpheus, an open-source TTS model that exceeds the capabilities of both open and closed-source models such as ElevenLabs and OpenAI! (1/6)

Elias

629,456 views • 1 year ago

Just built an infinite contextual radio for programming. Think lo-fi girl; but endless, and dynamically changes based on what’s on your screen. All locally run models (MagentaRT + InternVL3) Open source, on Github now.

LaurieWired

39,762 views • 10 months ago

Open AI released Operator, an agent that can use the browser to perform and automate tasks for you! I have built an Open Source version of Operator using Browser Use, running locally on your computer. 100% Open Source

Sumanth

62,485 views • 1 year ago

Qwen just published the 'Thinking' variant of this model 🔥 So you can run a model EVEN MORE powerful than GPT-4o locally! - Still only 3B active parameters - Open source license - 256k context window extendable to 1M - Strong in math, science and coding Details below ↓

Paul Couvert

106,261 views • 10 months ago

We’re excited to partner with OpenAI to launch their new open source models natively on Databricks! gpt-oss sets a new standard of quality for open language models, supporting advanced reasoning with the transparency, flexibility and control enterprises need. Running on Databricks, the gpt-oss models connect securely to your data and scale with built-in governance, and expand what you can build and do with GenAI. Try both the 20B and 120B today in the Mosaic AI Playground.

Databricks

10,068 views • 10 months ago

People are overlooking Google Gemini Realtime models for computer use It gave me sub 100ms latency with computer use. It has a much larger context window and is much cheaper as well Combine that with local OCR and local screen detection model based on Omniparser by Microsoft it works under 100ms action taking when combined with Cua I also put in a harness for Nous Research Hermes with it. You can access it all at your tip of your cursor. You can draw on your screen to give your agents a context And I am making it Open Source! Link in the Comments Sundar Pichai Min-Liang Tan Mojtaba Seyedhosseini

Milind S

37,681 views • 14 days ago

DeepSeek-V3 is live in AkashChat. This is the most capable open-source AI model available today — directly rivaling the benchmark performance of GPT-4o and 3.5 Sonnet.

Akash Network

47,412 views • 1 year ago

Open-source LLMs are getting really good. They’re not as powerful as GPT-4, but they’re improving quickly and worth experimenting with. If you want to run AI models like Mistral-7B on your laptop this is the easiest way to do it.

Mckay Wrigley

575,555 views • 2 years ago

We built open-source Rewind Records your screen and: - Keeps you focused - Auto-adds tasks to your TODOs - Helps proactively Github live ⬇️

Nik Shevchenko

52,065 views • 4 months ago

You can literally have a full computer running right in your browser! Puter is an open-source internet OS, which lets you have the full computer inside the browser. - super fast - privacy-first personal cloud - Open-source - 30K+ GitHub stars 🌟 Link 🔗 🧵 👇

Python Space

131,330 views • 1 year ago