Introducing ambientGPT: an open-source and multimodal MacOS foundation model GUI. Run GPT-4o and open-source models with full ambient knowledge of your screen. Foundation models have long been confined to the browser. With ambientGPT, your screen context is directly inferred as part of the query, ensuring you never need to...

Siddharth Sharma

2,703 subscribers

192,766 views • 2 years ago • via X (Twitter)

Education • Science and Technology

Comments: 10

Siddharth Sharma • 2 years ago

Unlike OpenAI’s desktop app, where you must provide a screenshot or upload a file, the context from your screen is automatically parsed. We also provide the ability to run secure local models like Gemma and Phi-3 multimodal from our interface. Due to the local model sizes, at least 16 GB RAM would be preferred. This was possible via the Apple MLX library - shoutout to @awnihannun, @reach_vb

Siddharth Sharma • 2 years ago

ambientGPT is open-source and we plan to integrate vLLM and Ollama to provide more extensive inference hosting abilities with our multimodal GUI. We also aim to release ambientGPT on the Apple App Store soon.

Siddharth Sharma • 2 years ago

thanks to @mihiranan (mr. clutch) for his help with the demo once again!

Sambhav Gupta • 2 years ago

Very cool but chatgpt mac app will have the ability to see everything on screen when they release the update ..

Ishan Khare • 2 years ago

so what does the “full ambient knowledge of your screen” entail privacy wise?? why would someone trust this if unwanted data is being captured forever and sent to openai?

Soham Konar • 2 years ago

Great work! Does it work across multiple monitors? Could be a game changer if so

whistle • 2 years ago

was waiting for something exactly like this!! what’s the token usage look like?

Saïd Aitmbarek • 2 years ago

looks awesome, would love to showcase ambient on

Salman • 2 years ago

Looks super useful. How does it determine context if you have multiple screens?

Rahul bansal 👀 • 2 years ago

This is cool. I built a tool for talking to a model via voice

Related videos

Introducing Ambient Context Aware AI on your computer. I have just scraped the whole Discord with it and made a summary of pain points ppl have Features: - Sees the whole screen ( even parts you need to scroll to) - long context window - Has memory Link:

Robert Lukoszko

15,383 views • 1 year ago

Introducing Mentat - an open source, GPT-4 powered coding assistant! Mentat runs in your command line, giving it the context of your projects and allowing it to coordinate edits across multiple files! More videos and a link to github below:

Scott Swingle

292,702 views • 2 years ago

Private AI browser with the OpenClaw agent on free local models Run your agent on Qwen, Gemma, or Nemotron directly in the browser Open source. Private. Runs on your local device

Sigma Browser

113,142 views • 1 month ago

Run state-of-the-art RAG applications locally on your computer with ollama and use all the fantastic open-source models like llama3, msk's awesome models, or Command R from cohere With Verba 1.0, we put it all in your hands 🙌 Get on board for a wild open-source ride, we're bridging any moat as open-source is here to win

Philip Vollet

41,375 views • 2 years ago

LM Studio is the most popular way to run open-source LLMs on your own hardware. Your Hermes Agent now runs natively on LM Studio: auto-discovering your models, loading them on demand with the right context size, and using the right reasoning level for each model.

Nous Research

184,759 views • 1 month ago

This Cursor Extension is awesome Accurate tweaking of UI was always a struggle, But stagewise allows you to bring full context to Cursor, just point and command: 1. Directly choose specific elements in browser 2. Send to Cursor with full context And it's open source

Jason Zhou

93,273 views • 1 year ago

VITA Towards Open-Source Interactive Omni Multimodal LLM discuss: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still lots of work to be done on VITA to get close to close-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 views • 1 year ago

The classic sovereign model puzzle is well solved here. 👌🏼 INHO: Get open-source models, optimize and upgrade them to Indian level, and voila! You got foundation models owned and controlled by Indian entities.

Vijay Shekhar Sharma

49,736 views • 4 months ago

It started as a small idea to connect AI models to developer workflows. It turned into one of the fastest-growing open standards in the industry. 🚀 Now, the Model Context Protocol is officially joining the Linux Foundation. Hear from the engineers and maintainers of GitHub, Microsoft, Anthropic, and OpenAI on the journey from day zero to now. 👇

GitHub

44,912 views • 6 months ago

Thrilled to see Amazon Web Services making a major contribution to the open source AI community with the launch of the Strands Agents, an open source AI agents SDK! The core of Strands is the simple agentic loop that connects the model and tools together, like the two strands of DNA. This model-driven approach to agent building eliminates the need for complex agent orchestration by embracing the capabilities of state-of-the-art models to plan, chain thoughts, call tools, and reflect. Providing open source tools and interoperability with open source protocols is an important part of our strategy to enable an agentic future. Can't wait to see what you build with Strands!

Swami Sivasubramanian

32,185 views • 1 year ago

Today, we’re launching Orpheus, an open-source TTS model that exceeds the capabilities of both open and closed-source models such as ElevenLabs and OpenAI! (1/6)

Elias

629,495 views • 1 year ago

Just built an infinite contextual radio for programming. Think lo-fi girl; but endless, and dynamically changes based on what’s on your screen. All locally run models (MagentaRT + InternVL3) Open source, on Github now.

LaurieWired

39,762 views • 10 months ago

Open AI released Operator, an agent that can use the browser to perform and automate tasks for you! I have built an Open Source version of Operator using Browser Use, running locally on your computer. 100% Open Source

Sumanth

62,485 views • 1 year ago

Qwen just published the 'Thinking' variant of this model 🔥 So you can run a model EVEN MORE powerful than GPT-4o locally! - Still only 3B active parameters - Open source license - 256k context window extendable to 1M - Strong in math, science and coding Details below ↓

Paul Couvert

106,261 views • 10 months ago

We’re excited to partner with OpenAI to launch their new open source models natively on Databricks! gpt-oss sets a new standard of quality for open language models, supporting advanced reasoning with the transparency, flexibility and control enterprises need. Running on Databricks, the gpt-oss models connect securely to your data and scale with built-in governance, and expand what you can build and do with GenAI. Try both the 20B and 120B today in the Mosaic AI Playground.

Databricks

10,086 views • 10 months ago

People are overlooking Google Gemini Realtime models for computer use It gave me sub 100ms latency with computer use. It has a much larger context window and is much cheaper as well Combine that with local OCR and local screen detection model based on Omniparser by Microsoft it works under 100ms action taking when combined with Cua I also put in a harness for Nous Research Hermes with it. You can access it all at your tip of your cursor. You can draw on your screen to give your agents a context And I am making it Open Source! Link in the Comments Sundar Pichai Min-Liang Tan Mojtaba Seyedhosseini

Milind S

37,839 views • 17 days ago

DeepSeek-V3 is live in AkashChat. This is the most capable open-source AI model available today — directly rivaling the benchmark performance of GPT-4o and 3.5 Sonnet.

Akash Network

47,412 views • 1 year ago

Open-source LLMs are getting really good. They’re not as powerful as GPT-4, but they’re improving quickly and worth experimenting with. If you want to run AI models like Mistral-7B on your laptop this is the easiest way to do it.

Mckay Wrigley

575,555 views • 2 years ago

We built open-source Rewind Records your screen and: - Keeps you focused - Auto-adds tasks to your TODOs - Helps proactively Github live ⬇️

Nik Shevchenko

52,065 views • 4 months ago

You can literally have a full computer running right in your browser! Puter is an open-source internet OS, which lets you have the full computer inside the browser. - super fast - privacy-first personal cloud - Open-source - 30K+ GitHub stars 🌟 Link 🔗 🧵 👇

Python Space

131,330 views • 1 year ago