Introducing ambientGPT: an open-source and multimodal macOS foundation model GUI

Run GPT-4o and open-source models with full ambient knowledge of your screen. Foundation models have long been confined to the browser. With ambientGPT, your screen context is directly inferred as part of the query, ensuring you never need to...

Siddharth Sharma

2,695 subscribers

192,851 views • 2 years ago • via X (Twitter)

Education • Science and Technology

Comments: 10

Siddharth Sharma • 2 years ago

Unlike OpenAI’s desktop app, where you must provide a screenshot or upload a file, the context from your screen is automatically parsed. We also provide the ability to run secure local models like Gemma and Phi-3 multimodal from our interface. Due to the local model sizes, at least 16 GB of RAM is preferred. This was possible via the Apple MLX library - shoutout to @awnihannun, @reach_vb

Siddharth Sharma • 2 years ago

ambientGPT is open-source, and we plan to integrate vLLM and Ollama to provide more extensive inference hosting abilities with our multimodal GUI. We also aim to release ambientGPT on the Apple App Store soon.

Siddharth Sharma • 2 years ago

thanks to @mihiranan (mr. clutch) for his help with the demo once again!

Sambhav Gupta • 2 years ago

Very cool, but the ChatGPT Mac app will have the ability to see everything on screen when they release the update.

Ishan Khare • 2 years ago

So what does the “full ambient knowledge of your screen” entail privacy-wise? Why would someone trust this if unwanted data is being captured forever and sent to OpenAI?

Soham Konar • 2 years ago

Great work! Does it work across multiple monitors? Could be a game changer if so

whistle • 2 years ago

was waiting for something exactly like this!! what’s the token usage look like?

Saïd Aitmbarek • 2 years ago

looks awesome, would love to showcase ambient on

Salman • 2 years ago

Looks super useful. How does it determine context if you have multiple screens?

Rahul bansal 👀 • 2 years ago

This is cool. I built a tool for talking to a model via voice

Related videos

Introducing Mentat - an open source, GPT-4 powered coding assistant! Mentat runs in your command line, giving it the context of your projects and allowing it to coordinate edits across multiple files! More videos and a link to github below:

Scott Swingle

292,711 views • 3 years ago

Run state-of-the-art RAG applications locally on your computer with ollama and use all the fantastic open-source models like llama3, msk's awesome models, or Command R from Cohere With Verba 1.0, we put it all in your hands 🙌 Get on board for a wild open-source ride, we're bridging any moat as open-source is here to win

Philip Vollet

41,450 views • 2 years ago

This Cursor Extension is awesome Accurate tweaking of UI was always a struggle, But stagewise allows you to bring full context to Cursor, just point and command: 1. Directly choose specific elements in browser 2. Send to Cursor with full context And it's open source

Jason Zhou

93,308 views • 1 year ago

VITA Towards Open-Source Interactive Omni Multimodal LLM discuss: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary followed by bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still lots of work to be done on VITA to get close to close-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research.

AK

23,958 views • 1 year ago

LM Studio is the most popular way to run open-source LLMs on your own hardware. Your Hermes Agent now runs natively on LM Studio: auto-discovering your models, loading them on demand with the right context size, and using the right reasoning level for each model.

Nous Research

186,036 views • 2 months ago

The classic sovereign model puzzle is well solved here. 👌🏼 INHO: Get open-source models, optimize and upgrade them to Indian level, and voila! You got foundation models owned and controlled by Indian entities.

Vijay Shekhar Sharma

49,736 views • 6 months ago

It started as a small idea to connect AI models to developer workflows. It turned into one of the fastest-growing open standards in the industry. 🚀 Now, the Model Context Protocol is officially joining The Linux Foundation. Hear from the engineers and maintainers of GitHub, Microsoft, Anthropic, and OpenAI on the journey from day zero to now. 👇

GitHub

44,938 views • 7 months ago

Thrilled to see Amazon Web Services making a major contribution to the open source AI community with the launch of the Strands Agents, an open source AI agents SDK! The core of Strands is the simple agentic loop that connects the model and tools together, like the two strands of DNA. This model-driven approach to agent building eliminates the need for complex agent orchestration by embracing the capabilities of state-of-the-art models to plan, chain thoughts, call tools, and reflect. Providing open source tools and interoperability with open source protocols is an important part of our strategy to enable an agentic future. Can't wait to see what you build with Strands!

Swami Sivasubramanian

32,185 views • 1 year ago

OpenAI released Operator, an agent that can use the browser to perform and automate tasks for you! I have built an open-source version of Operator using Browser Use, running locally on your computer. 100% open source

Sumanth

62,485 views • 1 year ago

Qwen just published the 'Thinking' variant of this model 🔥 So you can run a model EVEN MORE powerful than GPT-4o locally! - Still only 3B active parameters - Open source license - 256k context window extendable to 1M - Strong in math, science and coding Details below ↓

Paul Couvert

106,261 views • 11 months ago

People are overlooking Google Gemini Realtime models for computer use. It gave me sub-100 ms latency with computer use. It has a much larger context window and is much cheaper as well. Combined with local OCR and a local screen detection model based on OmniParser by Microsoft, it takes actions in under 100 ms when used with Cua. I also put in a harness for Nous Research Hermes with it. You can access it all at the tip of your cursor. You can draw on your screen to give your agents context. And I am making it open source! Link in the comments. Sundar Pichai, Min-Liang Tan, Mojtaba Seyedhosseini

Milind S

38,054 views • 1 month ago

We’re excited to partner with OpenAI to launch their new open source models natively on Databricks! gpt-oss sets a new standard of quality for open language models, supporting advanced reasoning with the transparency, flexibility and control enterprises need. Running on Databricks, the gpt-oss models connect securely to your data and scale with built-in governance, and expand what you can build and do with GenAI. Try both the 20B and 120B today in the Mosaic AI Playground.

Databricks

10,132 views • 11 months ago

SITUATION EXPLAINED: What happens to open source when the US restricts access to frontier AI models? We asked Adrian Dittmann: "China is releasing these models [open source] to gain international dominance, people will work on these things and say, 'Hey, maybe I can contribute to this if it's open source.' This is the thing that actually gets them talent." "If you're not allowed into the next GPT release or you're not allowed to use Fable, then what are you gonna do? You're stuck with the existing stuff, or you will have to use an open source model that might have subtly better capabilities." "The United States will catch up to open source in some form as well. They seem to just be maximizing product creation." "The problem with open source models is you have to be kind of a nerd in order to use them properly. Consumer facing, the effects will be minimal because the average person doesn't care about open source models. They only care whether or not they can do a quick search with Gemini or ChatGPT."

MTS

14,406 views • 28 days ago

Open-source LLMs are getting really good. They’re not as powerful as GPT-4, but they’re improving quickly and worth experimenting with. If you want to run AI models like Mistral-7B on your laptop this is the easiest way to do it.

Mckay Wrigley

575,594 views • 2 years ago

Introducing Stack Overflow searches with GPT-4o in VSCode. This new feature allows you to use the capabilities of /StackOverflow to find related posts, ensuring a far more accurate answer to your question. There's no waiting list! 🤩 You can use this today! All you have to do is install the CodeGPT extension in VSCode and use OpenAI models (GPT-4o)

Daniel San

20,105 views • 2 years ago

You can run local AI models up to 120B without a $10,000 Mac Studio This phone-sized device is your own server for open-source models. 100% local and private. And you can use it to: - Power an agent like OpenClaw 24/7 - Completely replace a chatbot - Literally anything that requires an API It’s called Tiiny, and, once again, you can run the latest open-source models on it. Link below

Paul Couvert

41,185 views • 4 months ago

With today’s launch of our Llama 3.1 collection of models we’re making history with the largest and most capable open source AI model ever released. 128K context length, multilingual support, and new safety tools. Download 405B and our improved 8B & 70B here.

Ahmad Al-Dahle

866,615 views • 2 years ago

AI that senses coding frustration, is this the future of learning? In our latest GitHub Podcast episode we talk to Angie Jones about Goose, Block’s open source AI agent and reference implementation of the Model Context Protocol (MCP).

GitHub

16,414 views • 7 months ago

The first truly open-source audio-video model. LTX-2 is a DiT-based foundation model with all core video generation capabilities in one unified model. Designed to run locally on consumer GPUs. - text-to-video - image-to-video - and video-to-video modes 100% open-source.

Akshay 🚀

66,088 views • 6 months ago