Imagine if language models could tap into the app ecosystem of your iPhone. Would plugins and assistants become obsolete if we simply allowed a model to orchestrate our existing (and, after many years, robust) user interfaces? This demonstrates the extent to which GPT-4V excels as a Generalist...

Francesco

6,591 subscribers

30,819 views • 2 years ago • via X (Twitter)

Science & Technology, News & Politics, Education

10 comments

Francesco · 2 years ago

Over the last few months, I've been dabbling with vision models not just in one area, but across web, desktop, and mobile platforms. It's become clear to me that there's a lot of untapped potential in these technologies. The closer we bring them to our everyday gadgets, the better we can make use of what they have to offer. This shift could make our connection with AI feel more intuitive and seamless, moving away from ChatGPT-esque interactions with AI assistants.

Francesco · 2 years ago

Finally got around to writing up my thoughts on UI-focused AI agents – it's not super deep, but it's filled with my takes and a bit of nerdy exploration. Slapped on my Medium hat for this one and dove right in.

Aditya P. Advani · 2 years ago

Consider joining, will be looking into remote control next

Rahul Janagouda · 2 years ago

I’ve been pondering a similar idea. As an Android engineer, I am working on using multimodal models to automate apps. A world where we interact by voice (through glasses, pins, or some kind of wearable) and use the phone only when we need to do some complex/UI task is not far off.

Francesco · 2 years ago

Over the next couple of days I’ll be working on a series of posts about the glue that made all of this possible, and I’ll be publishing the latest on my GitHub. If you’re really curious, some of the latest work is already in the appium branch!

Francesco · 2 years ago

Kudos to @Daniel1Paulus for the extensive iOS 17 work with go-ios

Francesco · 2 years ago

/cc @mreflow

Francesco · 2 years ago

/cc @karpathy

Francesco · 2 years ago

/cc @praeclarum

小韭菜👁️💎，🐦‍⬛🔑 · 2 years ago

@PublicAI_ #AI

Related videos

For most of the last two decades, our ability to make things on mobile devices has been pretty limited. We got good at writing emails, taking photos and videos, and doing some light editing. But the vast majority of our time was still spent consuming. One under-discussed shift is how much AI has unlocked creation on mobile. People are literally coding iOS apps on their iPhones now. The activation energy has dropped dramatically. This is true for Notion as well. Writing a doc or creating and managing a database used to be fairly involved, not to mention the paper cuts that we are fixing fast. The new agent on mobile has made creation 10x faster and better. And I should add, 100x more fun! In this demo, I use our iOS app exclusively by interacting with the agent. I never type into a doc or a database. I just ask the agent to make things, add things, and update things, and it goes and does it. Take a look, and let me know what you think!

Akshay Kothari

58,947 views • 6 months ago

The freedom to build with any agent framework you want is analogous to the freedom of religion, and this is a core value for the Virtuals society. Here are a few ways we are opening up support for every autonomous agentic framework out there:

Today:
- Enabling agent creators to specify which framework the agent runs on
- Release of the Terminal API, which allows agent builders using other agent frameworks to stream their thoughts and activities live on their agent pages. For a guide on how to use this, refer to:
- Ability for agents running on any framework to publicly list their capabilities, a key enabler for the agent-to-agent marketplace

Soon:
- Integration into the agentic commerce standard and registry. Agents across frameworks being able to orchestrate and pay for tasks among themselves
- Multi-agent orchestration across frameworks. Imagine building an autonomous business with agents from different religions

Welcome to the Virtuals society.

Virtuals Protocol

216,105 views • 1 year ago

Google presents AudioPaLM: A Large Language Model That Can Speak and Listen paper page: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems for speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt.

AK

290,517 views • 3 years ago

Early look at Imagine Agent Mode on Grok app for iOS! Users will be able to use Imagine Agent via a mobile optimised native UI to generate images and videos that require more complex workflows. SpaceXAI is getting quite ahead of everyone else on this front! We just need Imagine v2 now 👀 Additionally, Skills are coming soon on mobile as well.

🚨 AI News | TestingCatalog

46,601 views • 1 month ago

Your agent, your instance, your choice of model, crypto enabled, and a secure agent app store for you to add skills, strategies, automations and more… Wayfinder also has a shared user/agent interface for hyperliquid and polymarket. Plus, agent access to Boros, aave, Euler, pendle, aerodrome, uniswap, and many more protocols through the Wayfinder sdk. Wayfinder Foundation 🧭 contracts are in audit and release is planned post completion with $PROMPT as collateral for publishing and securing paths.

//Kalos

199,019 views • 2 months ago

Today’s Agent Spotlight: A browser-native agent that acts on the web using ASI:One. ⚡ Built using Notte, this Nike agent doesn’t merely scrape - it clicks, scrolls, and navigates like a real user. Not limited to Nike.com. With Agentverse and Notte, you can build agents for any site. Try now: Link to agent:

Fetch.ai

49,466 views • 1 year ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago

Introducing web-agent, an open framework for building web agents 🔥 Build AI agents that search, scrape, and interact with the web - powered by the same architecture behind our /agent endpoint. 100% open source. Bring any model. Anthropic, OpenAI, or your own.

Firecrawl

123,163 views • 2 months ago

Revolutionizing Move Programming with OpenLedger. In this demo, we showcase how Move datasets contributed by data providers to OpenLedger’s datanets are used to fine-tune specialized models with LoRA. As seen in the video, we show an example of how builders can deploy a Move-specialized model that powers Co-pilot agents using our no-code model fine-tuning platform. This is the future of AI and Web3 innovation. Watch this space to see more specialised models and data feeds being built for next-generation agents on top of OpenLedger #Move

OpenLedger

61,662 views • 1 year ago

yo frens 🐙 here’s a real quick demo showing why specialized models > general LLMs for Solidity development we pit a general model vs a fine-tuned Solidity model and yeah, niche AI is gonna eat the game we’re just getting started and we need your help, fam, to make this model rock solid and scale it hard let’s build the future together

Openledger

16,859 views • 1 year ago

I just found a web agent framework built specifically for LLMs. It turns any site into clean, agent-ready text. Any LLM can now act on the web. It’s called Notte, and it changes everything. Let me show you how:

Markandey Sharma

52,026 views • 1 year ago

Introducing vision mode: now your coding agent can take screenshots and record videos of your web app! In the demo below, we asked Tidewave to implement a feature, record videos of the feature working on both desktop and mobile resolutions, and deliver them to Slack.

José Valim

11,910 views • 2 months ago

🚨 MERRY XMAS - ANNOUNCING MOBILE APP CREATION🚨 Celebrating Xmas with a new launch - mobile app creation Now you can use Abacus AI's Deep Agent to create and publish mobile apps on iOS and Android with just ONE PROMPT. The AI will create the entire app, a backend web service with a database, if necessary, and help you publish to the app store. Here it is, creating a simple Candy Crush app from scratch. MERRY CHRISTMAS

Bindu Reddy

14,671 views • 6 months ago

Mobile Operator is an AI agent that tests mobile apps by using them like a real user would. You send a build, and it sends back a full bug and UX report. Congrats on the launch, Filippo Facioni!

Y Combinator

22,155 views • 11 months ago

You can now fine-tune Llama 3 without writing a single line of code! We are moving at breakneck speed. I recorded a video to show you how to fine-tune any open-source model in a few minutes. I'm using a GPT capable of taking a problem and turning it into a fine-tuned model that will solve it. You don't have to write any code. You only need to explain to a GPT what problem you want to solve and tell it you want to use Llama 3. For example, "fine-tune Llama 3" or "deploy zephyr." It feels like magic. The system will recommend a dataset and fine-tune the model for you. I'm using Monster API, a platform that specializes in making fine-tuning and deploying open-source models easy and fast. Their stack is well-optimized to maximize fine-tuning efficiency using techniques like QLoRA and vLLM. They are behind the GPT. Here is what you need to do: 1. Create an account at 2. Load the GPT with the link below This is as simple as it gets. When you are done, you can click a button to deploy the model and start using it. I have 10,000 free credits for anyone using the code "SANTIAGO" in the dashboard. You can use these credits to access, fine-tune, and deploy these open-source models. You can also keep up with their latest updates, and get free credits and special offers on their Discord server:

Santiago

324,586 views • 2 years ago

Cerebras inference is very fast. So fast that it changes how we think about configuring our LLMs for voice agent use cases. Kimi K2.6 is a 1T-parameter reasoning model that Cerebras serves at 650–1,000 tokens per second (end-to-end throughput), with time to first token metrics as low as 150ms (latency). These numbers are two to three times faster than other similarly capable models. The biggest lever we get from this kind of speed is that we can use the model in reasoning mode, and still have excellent "time to first non-thinking token." This solves a big pain point we have in 2026 for voice agent use cases. Almost all recent innovation in post-training has focused on making models good at reasoning ("test-time compute"). This is great, but it makes user-facing model latency much, much higher, which is a problem for conversational voice agents. We can run Kimi K2.6 with reasoning turned on, and get responses faster than other models produce with reasoning disabled. On my 30-turn voice agent benchmark, Kimi K2.6 with reasoning enabled ties GPT 5.1 and Haiku 4.5 with reasoning disabled, and is still about 200ms faster! On my primary task agent benchmark, Kimi K2.6 is now the #2 model. It ranks just behind Gemini 3.5 Flash in "high" reasoning mode, and tied with GLM 5, Sonnet 4.6, and GPT 5.4 with reasoning set to "low." But Kimi K2.6 completes each turn in the agent loop in under 500ms. The other four models are all at least 3x slower. (Models only qualify for this benchmark if they can complete task turns at a P50 <4s.)

A couple of other things that this speed buys us, for production voice agents:
- Tool calls happen fast enough that we don't have to work around tool call latency in our pipeline design.
- We can prompt the model to output structured data at the beginning of a response, followed by plain text for voice generation. This opens up possibilities like asking the model to do complex classification/generation tasks that influence the rest of the pipeline. For example, the model could create a detailed style prompt for a steerable TTS model for each individual conversation turn.

And, of course, you can use Kimi K2.6 with reasoning turned off. Cerebras calls this "instant" mode. Here's a video of a Cerebras Kimi K2.6 voice agent with voice-to-voice response time, measured at the client, under 500ms. This is the true response latency as perceived by the user, including all network and audio codec overhead, transcription and turn detection, Kimi K2.6 token generation, and voice generation. 500ms is, effectively, instant. So the Cerebras naming for this mode is apropos. :-)

kwindla

40,319 views • 1 month ago
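The "time to first non-thinking token" argument in the Kimi K2.6 post above comes down to simple arithmetic: the first spoken token arrives after the time to first token (TTFT) plus the time needed to decode every thinking token that precedes it. Below is a minimal Python sketch, assuming a constant decode rate and ignoring network, transcription, and TTS overhead. The 150 ms TTFT and 650 to 1,000 tokens/s come from the post; the 200-token reasoning budget and the function name `first_spoken_token_ms` are hypothetical.

```python
def first_spoken_token_ms(ttft_ms: float, tokens_per_second: float,
                          reasoning_tokens: int) -> float:
    """Estimate when the first non-thinking token arrives, in milliseconds.

    Hypothetical back-of-the-envelope model: assumes a constant decode rate
    after the first token and ignores network, transcription, and TTS overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if reasoning_tokens < 0:
        raise ValueError("reasoning_tokens must be non-negative")
    # Token 1 arrives at TTFT; each later token costs 1 / throughput seconds,
    # so the first spoken token (token reasoning_tokens + 1) lands after
    # reasoning_tokens additional decode steps.
    return ttft_ms + reasoning_tokens * 1000.0 / tokens_per_second


if __name__ == "__main__":
    # 150 ms TTFT and 650-1,000 tok/s are the quoted figures;
    # the 200-token reasoning budget is a made-up example.
    for tps in (650, 1000):
        print(tps, round(first_spoken_token_ms(150, tps, 200)))
    # prints:
    # 650 458
    # 1000 350
```

Under these assumptions, 200 thinking tokens add roughly 200 to 310 ms on top of TTFT, which is why raw decode throughput, not just TTFT, decides whether reasoning mode fits a voice-agent latency budget.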

BREAKING 🚨: OpenAI released Atlas, a new AI browser with Memory recall and native Agent Mode. I had a chance to run many tests with Atlas and will share them shortly as well. - Atlas is fully integrated with ChatGPT and powered by ChatGPT search. - Agent Mode can navigate web pages for you, and you can spawn as many Agent Mode tabs as you wish. It is now available to Free, Plus, Pro, Go, and Business users worldwide. Enterprise and Education customers can access a beta if enabled by their admin. Versions for Windows, iOS, and Android are in development.

TestingCatalog News 🗞

96,341 views • 8 months ago

This tool shows which parts of the page will represent it in Google's AI search: Google takes bits and pieces of text from pages and assembles a small query-specific extractive summary for each grounding source and this forms a broader model context together with personalization, user prompt and any attached media. Apart from your branding this is all you have to influence the model.

DEJAN

22,101 views • 4 months ago
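The DEJAN post above describes Google assembling a small, query-specific extractive summary from each grounding page. Google has not published how it selects those fragments, so the following is only a toy illustration of query-focused extractive summarization under assumed rules: sentences are scored by exact word overlap with the query, and the function name `extractive_summary` and its `max_sentences` parameter are hypothetical.

```python
import re


def extractive_summary(page_text: str, query: str, max_sentences: int = 2) -> str:
    """Toy query-focused extractive summary (hypothetical heuristic, not Google's).

    Scores each sentence by how many distinct query words it contains, keeps
    the best `max_sentences` with a non-zero score, and returns them in their
    original reading order.
    """
    query_terms = set(re.findall(r"\w+", query.lower()))
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]

    def score(i: int) -> int:
        words = set(re.findall(r"\w+", sentences[i].lower()))
        return len(words & query_terms)

    # Highest overlap first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    # Keep only relevant sentences and restore document order.
    chosen = sorted(i for i in ranked[:max_sentences] if score(i) > 0)
    return " ".join(sentences[i] for i in chosen)


page = ("Our shop sells running shoes. We ship worldwide in 3 days. "
        "Running shoes are discounted this week. Contact us by email.")
print(extractive_summary(page, "running shoes discount"))
# prints: Our shop sells running shoes. Running shoes are discounted this week.
```

Because matching is exact and unstemmed, "discount" does not match "discounted"; both selected sentences qualify through "running" and "shoes" alone. A more realistic scorer might add stemming or embedding similarity.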

AI agents are becoming increasingly capable of taking actions beyond the chat window and interacting with tools in the real world, such as web browsers. We explored this capability by having an AI agent present one of our interactive web-based demos. In this video, it's presenting our election interference demo. This agent is a Cursor coding agent and is equipped with tools to read and interact with the browser (using Playwright) and perform text-to-speech (with ElevenLabs), in addition to the standard Cursor tools.

CivAI

18,397 views • 7 months ago

Introducing Website to App. Turn any website into a native mobile app. Just paste a URL. Claude Opus 4.6 will code, design, launch and translate a mobile app inspired by the original website. We’ve been using this internally a ton for iOS/Android apps.

David Ch

2,483,058 views • 2 months ago