Introducing ambientGPT: an open-source and multimodal macOS foundation model GUI. Run GPT-4o and open-source models with full ambient knowledge of your screen. Foundation models have long been confined to the browser. With ambientGPT, your screen context is directly inferred as part of the query, ensuring you never need to...

Siddharth Sharma

2,703 subscribers

192,766 views • 2 years ago • via X (Twitter)

Education, Science & Technology

10 Comments

Siddharth Sharma • 2 years ago

Unlike OpenAI’s desktop app, where you must provide a screenshot or upload a file, the context from your screen is automatically parsed. We also provide the ability to run secure local models like Gemma and Phi-3 multimodal from our interface. Due to the local model sizes, at least 16 GB of RAM is preferred. This was possible via the Apple MLX library. Shoutout to @awnihannun, @reach_vb

Siddharth Sharma • 2 years ago

ambientGPT is open-source, and we plan to integrate vLLM and Ollama to provide more extensive inference hosting abilities with our multimodal GUI. We also aim to release ambientGPT on the Apple App Store soon.

Siddharth Sharma • 2 years ago

thanks to @mihiranan (mr. clutch) for his help with the demo once again!

Sambhav Gupta • 2 years ago

Very cool, but the ChatGPT Mac app will have the ability to see everything on screen when they release the update.

Ishan Khare • 2 years ago

So what does the “full ambient knowledge of your screen” entail privacy-wise? Why would someone trust this if unwanted data is being captured forever and sent to OpenAI?

Soham Konar • 2 years ago

Great work! Does it work across multiple monitors? Could be a game changer if so

whistle • 2 years ago

was waiting for something exactly like this!! what’s the token usage look like?

Saïd Aitmbarek • 2 years ago

looks awesome, would love to showcase ambient on

Salman • 2 years ago

Looks super useful. How does it determine context if you have multiple screens?

Rahul bansal 👀 • 2 years ago

This is cool. I built a tool for talking to a model via voice

Related Videos

Introducing Ambient Context Aware AI on your computer. I have just scraped the whole Discord with it and made a summary of pain points ppl have Features: - Sees the whole screen ( even parts you need to scroll to) - long context window - Has memory Link:

Robert Lukoszko

15,383 views • 1 year ago

Introducing Mentat - an open source, GPT-4 powered coding assistant! Mentat runs in your command line, giving it the context of your projects and allowing it to coordinate edits across multiple files! More videos and a link to github below:

Scott Swingle

292,702 views • 2 years ago

Private AI browser with the OpenClaw agent on free local models Run your agent on Qwen, Gemma, or Nemotron directly in the browser Open source. Private. Runs on your local device

Sigma Browser

113,142 views • 1 month ago

Run state-of-the-art RAG applications locally on your computer with ollama and use all the fantastic open-source models like llama3, msk's awesome models, or Command R from cohere With Verba 1.0, we put it all in your hands 🙌 Get on board for a wild open-source ride, we're bridging any moat as open-source is here to win

Philip Vollet

41,375 views • 2 years ago

LM Studio is the most popular way to run open-source LLMs on your own hardware. Your Hermes Agent now runs natively on LM Studio: auto-discovering your models, loading them on demand with the right context size, and using the right reasoning level for each model.

Nous Research

184,759 views • 1 month ago

This Cursor Extension is awesome Accurate tweaking of UI was always a struggle, But stagewise allows you to bring full context to Cursor, just point and command: 1. Directly choose specific elements in browser 2. Send to Cursor with full context And it's open source

Jason Zhou

93,273 views • 1 year ago

VITA Towards Open-Source Interactive Omni Multimodal LLM discuss: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still lots of work to be done on VITA to get close to close-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 views • 1 year ago

The classic sovereign model puzzle is well solved here. 👌🏼 INHO: Get open-source models, optimize and upgrade them to Indian level, and voila! You got foundation models owned and controlled by Indian entities.

Vijay Shekhar Sharma

49,736 views • 4 months ago

It started as a small idea to connect AI models to developer workflows. It turned into one of the fastest-growing open standards in the industry. 🚀 Now, the Model Context Protocol is officially joining the Linux Foundation. Hear from the engineers and maintainers of GitHub, Microsoft, Anthropic, and OpenAI on the journey from day zero to now. 👇

GitHub

44,912 views • 6 months ago

Thrilled to see Amazon Web Services making a major contribution to the open source AI community with the launch of the Strands Agents, an open source AI agents SDK! The core of Strands is the simple agentic loop that connects the model and tools together, like the two strands of DNA. This model-driven approach to agent building eliminates the need for complex agent orchestration by embracing the capabilities of state-of-the-art models to plan, chain thoughts, call tools, and reflect. Providing open source tools and interoperability with open source protocols is an important part of our strategy to enable an agentic future. Can't wait to see what you build with Strands!

Swami Sivasubramanian

32,185 views • 1 year ago

Today, we’re launching Orpheus, an open-source TTS model that exceeds the capabilities of both open and closed-source models such as ElevenLabs and OpenAI! (1/6)

Elias

629,495 views • 1 year ago

Just built an infinite contextual radio for programming. Think lo-fi girl; but endless, and dynamically changes based on what’s on your screen. All locally run models (MagentaRT + InternVL3) Open source, on Github now.

LaurieWired

39,762 views • 10 months ago

OpenAI released Operator, an agent that can use the browser to perform and automate tasks for you! I have built an open-source version of Operator using Browser Use, running locally on your computer. 100% open source

Sumanth

62,485 views • 1 year ago

Qwen just published the 'Thinking' variant of this model 🔥 So you can run a model EVEN MORE powerful than GPT-4o locally! - Still only 3B active parameters - Open source license - 256k context window extendable to 1M - Strong in math, science and coding Details below ↓

Paul Couvert

106,261 views • 10 months ago

We’re excited to partner with OpenAI to launch their new open source models natively on Databricks! gpt-oss sets a new standard of quality for open language models, supporting advanced reasoning with the transparency, flexibility and control enterprises need. Running on Databricks, the gpt-oss models connect securely to your data and scale with built-in governance, and expand what you can build and do with GenAI. Try both the 20B and 120B today in the Mosaic AI Playground.

Databricks

10,086 views • 10 months ago

People are overlooking Google Gemini Realtime models for computer use It gave me sub 100ms latency with computer use. It has a much larger context window and is much cheaper as well Combine that with local OCR and local screen detection model based on Omniparser by Microsoft it works under 100ms action taking when combined with Cua I also put in a harness for Nous Research Hermes with it. You can access it all at your tip of your cursor. You can draw on your screen to give your agents a context And I am making it Open Source! Link in the Comments Sundar Pichai Min-Liang Tan Mojtaba Seyedhosseini

Milind S

37,681 views • 15 days ago

DeepSeek-V3 is live in AkashChat. This is the most capable open-source AI model available today — directly rivaling the benchmark performance of GPT-4o and 3.5 Sonnet.

Akash Network

47,412 views • 1 year ago

Open-source LLMs are getting really good. They’re not as powerful as GPT-4, but they’re improving quickly and worth experimenting with. If you want to run AI models like Mistral-7B on your laptop this is the easiest way to do it.

Mckay Wrigley

575,555 views • 2 years ago

We built open-source Rewind. Records your screen and: - Keeps you focused - Auto-adds tasks to your TODOs - Helps proactively. GitHub live ⬇️

Nik Shevchenko

52,065 views • 4 months ago

You can literally have a full computer running right in your browser! Puter is an open-source internet OS, which lets you have the full computer inside the browser. - super fast - privacy-first personal cloud - Open-source - 30K+ GitHub stars 🌟 Link 🔗 🧵 👇

Python Space

131,330 views • 1 year ago