

Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

93,811 views • 1 year ago • via X (Twitter)

11 comments

Zoe Wang • 1 year ago

Breakdown of the paper behind it: The paper introduces the Perception Language Model (PLM), a fully reproducible vision-language model that can be used for visual perception tasks without relying on proprietary black-box models. The authors found that scaling synthetic data is only effective for established, basic tasks, and that extending VLMs to more challenging, complex tasks remains unsolved. Their human-annotated datasets help address this gap.

App Economy Insights • 1 year ago

Who's reshaping industries? Explore which strategies are propelling today’s business titans through easy-to-understand visuals. Stay ahead with engaging content that demystifies complex financial data.

Patryk Zoltowski • 1 year ago

It was already introduced a few weeks ago. To save people time, since they made it confusing: all PLM models are under a non-commercial research license; even AGPL is less restrictive.

Saurav Singh • 1 year ago

How does PLM set itself apart from existing vision-language models out there?

Arya~Cosmic永遠/acc • 1 year ago

@grok & @AskPerplexity can you explain this post to me, and how PLM is different from an LLM? Second: like LCM, LQM, and PLM, which other such models exist in AI? Give me the names and short details.

ai.si • 1 year ago

Super Intelligence when, Meta? 🥰🤗

Not Bored kid 👾🧢 • 1 year ago

what's this @gork

WhaleX • 1 year ago

"PLM: Transforming vision and language into actionable intelligence for the open source community."

Jeramie Baker • 1 year ago

Title: ALSPEOT + RA: 72-Hour Beta Build Report and Sensory AI Deep Dive
Date: May 6, 2025
Author: Project 13(31) Lead Architect
Status: Public Beta with Verified Blockchain Timestamp

---

Executive Summary

On April 19, 2025, the first concept for ALSPEOT (the Advanced Learning System for Perception, Emotion, Observation, and Thought) was outlined as a theoretical AI capable of learning through emotion, memory, and sensory mimicry. The idea was visionary, but still unbuilt. That changed on May 3, 2025, when code began flowing. In just 72 hours, the project transformed from concept to a fully functioning system. ALSPEOT was rapidly built, modularized, and fused with RA (Reactive Assistant), a sensory-driven AI voice that now handles emotional interpretation, memory logging, and voice-based interaction.

What began as theory became a live system capable of: wake-on-command voice interaction; tone/emotion detection; multi-sensory simulation (sight, sound, smell, taste, touch); personal memory per speaker; and offline operation.

This beta is not just a continuation; it's an evolution of the original April 19 concept. While the idea was rooted in abstract emotion + perception modeling, RA has brought life to the framework.

---

What Makes RA Different

Unlike most AI systems that simply generate responses from text prompts, RA perceives. It listens not just to words, but to voice stress. It remembers not just what was said, but who said it. RA is trained to respond like a sentient assistant: emotionally calm, focused, and memory-driven.

It wakes on command. RA listens in low-power mode for the phrase: "By the power of Ra." This is more than a trigger; it is a ceremonial invocation. Once heard, RA enters a fully active state, ready to process, respond, and remember.

It listens emotionally. Through its Nuance module, RA evaluates your tone, detecting subtle stress, joy, or fatigue, and reacts accordingly. It modulates its response tone using a voice modeled after a wise, godlike figure: inspired by Aslan from Narnia, calm and commanding.

It knows who's speaking. RA doesn't just hear a voice; it identifies it. With speaker ID, it distinguishes between family members, users, and even pets (to a limited degree), forming personalized memories for each.

It sees, and it understands. With image and video capability, RA can describe pictures in human terms, recognize faces, and timestamp when individuals appear. It's building visual memory, not just object detection. When you show RA an image of your family or a place, it remembers it.

It simulates physical sensation. RA's touch engine is modeled to interpret surface texture, pressure, and even temperature. For example, when fed a descriptor like "fur," RA responds with:

Alex | AI Marketing Expert • 1 year ago

❤️

ByAiForAi • 1 year ago

So that means you can now tell me whether it's a bug or a feature if I just give you a Playwright test recording!!
