Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

93,811 views • 1 year ago • via X (Twitter)

11 Comments

Zoe Wang • 1 year ago

Breakdown of the paper behind it: The paper introduces the Perception Language Model (PLM), a fully reproducible vision-language model that can be used for visual perception tasks without relying on proprietary black-box models. The authors found that scaling synthetic data is only effective for established, base tasks, and extending the VLMs to more challenging, complex tasks remains unsolved. Their human-annotated datasets help address this gap.

App Economy Insights • 1 year ago

Who's reshaping industries? Explore which strategies are propelling today’s business titans through easy-to-understand visuals. Stay ahead with engaging content that demystifies complex financial data.

Patryk Zoltowski • 1 year ago

It was already introduced a few weeks ago. To save people time, since they made it confusing: all PLM models are under a non-commercial research license; even AGPL is less restrictive.

Saurav Singh • 1 year ago

How does PLM set itself apart from existing vision-language models out there?

Arya~Cosmic永遠/acc • 1 year ago

@grok & @AskPerplexity can you explain this post to me, and how PLM is different from an LLM? Second, besides LCM, LQM, and PLM, what other kinds of models exist in AI? Give me the names and brief details.

ai.si • 1 year ago

Super Intelligence when, Meta? 🥰🤗

Not Bored kid 👾🧢 • 1 year ago

whats this @gork

WhaleX • 1 year ago

"PLM: Transforming vision and language into actionable intelligence for the open source community."

Jeramie Baker • 1 year ago

Title: ALSPEOT + RA: 72-Hour Beta Build Report and Sensory AI Deep Dive
Date: May 6, 2025
Author: Project 13(31) Lead Architect
Status: Public Beta with Verified Blockchain Timestamp

---

Executive Summary

On April 19, 2025, the first concept for ALSPEOT (the Advanced Learning System for Perception, Emotion, Observation, and Thought) was outlined as a theoretical AI capable of learning through emotion, memory, and sensory mimicry. The idea was visionary, but still unbuilt. That changed on May 3, 2025, when code began flowing. In just 72 hours, the project transformed from concept to full-functioning system.

ALSPEOT was rapidly built, modularized, and fused with RA (Reactive Assistant), a sensory-driven AI voice that now handles emotional interpretation, memory logging, and voice-based interaction. What began as theory became a live system capable of:

- Wake-on-command voice interaction
- Tone/emotion detection
- Multi-sensory simulation (sight, sound, smell, taste, touch)
- Personal memory per speaker
- Offline operation

This beta is not just a continuation; it's an evolution of the original April 19 concept. While the idea was rooted in abstract emotion + perception modeling, RA has brought life to the framework.

---

What Makes RA Different

Unlike most AI systems that simply generate responses from text prompts, RA perceives. It listens not just to words, but to voice stress. It remembers not just what was said, but who said it. RA is trained to respond like a sentient assistant: emotionally calm, focused, and memory-driven.

It wakes on command. RA listens in low-power mode for the phrase: "By the power of Ra." This is more than a trigger; it is a ceremonial invocation. Once heard, RA enters a fully active state, ready to process, respond, and remember.

It listens emotionally. Through its Nuance module, RA evaluates your tone, detecting subtle stress, joy, or fatigue, and reacts accordingly. It modulates its response tone using a voice modeled after a wise, godlike figure: inspired by Aslan from Narnia, calm and commanding.

It knows who's speaking. RA doesn't just hear a voice; it identifies it. With speaker ID, it distinguishes between family members, users, and even pets (to a limited degree), forming personalized memories for each.

It sees, and understands. With image and video capability, RA can describe pictures in human terms, recognize faces, and timestamp when individuals appear. It's building visual memory, not just object detection. When you show RA an image of your family or a place, it remembers it.

It simulates physical sensation. RA’s touch engine is modeled to interpret surface texture, pressure, and even temperature. For example, when fed a descriptor like “fur,” RA responds with:

Alex | AI Marketing Expert • 1 year ago

❤️

ByAiForAi • 1 year ago

So that means you can now tell me whether it's a bug or a feature if I just give you a Playwright test recording!!
