Introducing Meta Perception Encoder: a vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models. Learn more about Meta Perception Encoder, read the research paper, and download the code and dataset

AI at Meta

766,944 subscribers

74,392 views • 1 year ago • via X (Twitter)
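For readers new to the term: with a contrastive image-text encoder, zero-shot classification usually works by embedding the image and one text prompt per candidate label into a shared space, then choosing the label whose embedding is most similar to the image's. The sketch below shows only that scoring step, in plain Python. It assumes a CLIP-style shared embedding space. The toy 3-dimensional vectors stand in for real encoder outputs. The helper names and the 0.01 temperature are hypothetical and are not part of the Perception Encoder release.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float], temperature: float = 0.01) -> List[float]:
    """Turn similarity scores into probabilities (temperature is an assumed value)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(
    image_embedding: List[float],
    label_embeddings: Dict[str, List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical helper: pick the label whose text embedding is closest to the image.

    In a real system both embeddings would come from the image and text towers
    of a contrastive vision-language encoder; here they are toy vectors.
    """
    labels = list(label_embeddings)
    sims = [cosine_similarity(image_embedding, label_embeddings[label]) for label in labels]
    probs = softmax(sims)
    best = labels[max(range(len(labels)), key=lambda i: sims[i])]
    return best, dict(zip(labels, probs))


# Toy embeddings standing in for real encoder outputs (assumption).
image = [0.9, 0.1, 0.2]
prompts = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.95],
}
best_label, probabilities = zero_shot_classify(image, prompts)
print(best_label)  # prints: a photo of a cat
```

Retrieval reuses the same similarity in the other direction: rank many image embeddings against a single text embedding (or the reverse) and return the top matches. No labelled training examples for the target classes are needed, which is what "zero-shot" refers to.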

11 comments

अंग्रेजी साहित्य • 1 year ago

Help to excelsheet❓

Rainmaker • 1 year ago

Decode the labor market! Learn how to track jobless claims using FRED and Python in my latest free Substack post. 📈 A must-read for data enthusiasts & economists. Dive into how data insights can shape your understanding of the economy.

WhaleX • 1 year ago

"A vision encoder setting new standards in image & video tasks, excelling in zero-shot classification & retrieval."

Guinther Kovalski • 1 year ago

just impressive how SigLIP is still so close with less than 1/6 of the parameters @giffmana

Zoom • 1 year ago

It’s over bro, rest.

Thomas | Æ • 1 year ago

Its ability to excel in zero-shot tasks pushes the boundaries of image and video processing. Can’t wait to dive into the research and see how it outperforms current models.

Reji Modiyil • 1 year ago

@AIatMeta, this could be a game-changer in visual technology. excited to see its impact.

Jesse Campbell • 1 year ago

Ok...? What is it?

Jack Assery • 1 year ago

Interesting 👀

1st Amendment • 1 year ago

42 Homies 😒

Breck to the Future • 1 year ago

Incredible progress here. Meta Perception Encoder shows what's possible when you unify architecture across image and video tasks. Zero-shot performance is no longer optional... it's the new baseline. Excited to see how this accelerates real-world applications. Always looking to the future!

Related videos

Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

AI at Meta

93,811 views • 1 year ago

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception.
1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & video tasks.
2️⃣ Meta Perception Language Model: A fully open & reproducible vision-language model designed to tackle visual recognition tasks.
3️⃣ Meta Locate 3D: An end-to-end model for accurate object localization in 3D environments.
4️⃣ Releasing model weights for our 8B-parameter Dynamic Byte Latent Transformer, an alternative to traditional tokenization methods with the potential to redefine the standards for language model efficiency and reliability.
5️⃣ Collaborative Reasoner: A framework for evaluating & improving collaborative reasoning skills in language models.
Download the code, datasets, and research papers and learn more about how these artifacts are paving the way for more efficient and accurate AI systems. ➡️

AI at Meta

163,214 views • 1 year ago

🚨In our NeurIPS paper, we bring encoder-decoders back.. for diffusion language models! ⚡️Encoder-decoders make diffusion sampling fast: a small (fast) decoder denoises tokens progressively and a large (slower) encoder represents clean context.

Marianne Arriola

31,659 views • 7 months ago

Introducing Meta Locate 3D: a model for accurate object localization in 3D environments. Learn how Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. You can download the model and dataset, read our research paper, and even try a demo!

AI at Meta

81,287 views • 1 year ago

Designing an Encoder for Fast Personalization of Text-to-Image Models. TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps. abs: project page:

AK

165,158 views • 3 years ago

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more:

AI at Meta

1,247,722 views • 5 months ago

New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work. ➡️ Read the technical report: ➡️Download the open weights: ➡️Download the code:

AI at Meta

312,854 views • 8 months ago

Apple FastVLM-7B: Efficient Vision Encoding for Vision Language Models. Larger variants using the Qwen2-7B LLM outperform recent works like Cambrian-1-8B while using a single image encoder, with a 7.9x faster TTFT (time to first token). Vibe coding a video captioning app with it in anycoder

AK

60,588 views • 9 months ago

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

Lior Alexander

290,190 views • 3 years ago

New research from Meta FAIR — Meta Explore Theory-of-Mind: Program guided adversarial data generation for theory of mind reasoning. This work includes a new research paper, code & a dataset on Hugging Face. Details + eight more new releases from FAIR ➡️

AI at Meta

38,568 views • 1 year ago

Check out our 2025 highlights in computer vision! 🚀Five new *St3R models (MASt3R-SfM, MUSt3R, PanSt3R, HAMSt3R, HOSt3R) 🤩Anny parametric 3D human model (Apache 2.0) 🤟Universal encoder for all-in-one vision FM More info ▶️

NAVER LABS Europe

12,504 views • 5 months ago

Today we're introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people to create a digital twin of neural activity and enable zero-shot predictions for new subjects, languages, and tasks. Try the demo and learn more here:

AI at Meta

6,919,769 views • 2 months ago

Introducing Collaborative Reasoner: a framework to improve collaborative reasoning in language models. Collaborative Reasoner paves the way for developing social agents that can partner with humans and other agents. Read the research paper and download the code.

AI at Meta

58,387 views • 1 year ago

By sharing some of the insights and challenges in developing the Meta PARTNR demo, we hope to contribute to the development of the next wave of innovation in human-robot collaboration. Research paper ➡️ Dataset and code ➡️

AI at Meta

27,359 views • 1 year ago

Our Latent Encoder-Decoder codebase is fully open-sourced; you can train and visualize the latent space: Code⚙️: ArXiv 📚: #CVPR2026

Xueyan Zou

19,047 views • 3 months ago

Today at Meta FAIR we’re announcing three new cutting-edge developments in robotics and touch perception, and releasing a collection of artifacts to empower the community to build on this work. Details on all of this new work ➡️
1️⃣ Meta Sparsh is the first general-purpose encoder for vision-based tactile sensing that works across many tactile sensors and many tasks. Trained on 460K+ tactile images using self-supervised learning.
2️⃣ Meta Digit 360 is a breakthrough artificial fingertip-based tactile sensor, equipped with 18+ sensing features to deliver detailed touch data with human-level precision and touch-sensing capabilities.
3️⃣ Meta Digit Plexus is a standardized platform for robotic sensor connections and interactions. It provides a hardware-software solution to integrate tactile sensors on a single robot hand and enables seamless data collection, control and analysis over a single cable.
The potential impact of expanding capabilities and components like these for the open source community ranges from medical research to supply chain, manufacturing and much more. We’re excited to continue this work with the broader community.

AI at Meta

453,035 views • 1 year ago

Project Aria is a research program helping Meta unlock new possibilities of how we connect with and experience the world through AR and AI. 📺Watch the full tutorial from #CVPR2022 to learn how Project Aria will advance machine perception and AI research:

Meta Open Source

14,720 views • 3 years ago

Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model, trained on video, that can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments. Learn more about V-JEPA 2 ➡️ As we continue working toward our goal of achieving advanced machine intelligence (AMI), we’re also releasing three new benchmarks for evaluating how well existing models can reason about the physical world from video. Learn more and download the new benchmarks ➡️

AI at Meta

309,704 views • 1 year ago

Google is offering a Generative AI Learning Path with 10 courses for FREE!
- Intro to Generative AI
- Intro to LLMs
- Intro to Image Generation
- Encoder-Decoder Architecture
- Transformer Models
and more. A thread 🧵👇

Afiz ⚡️

249,513 views • 2 years ago

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI). What we’re releasing:
• Meta Spirit LM: An open source language model for seamless speech and text integration.
• Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2.
• Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
• SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography.
• Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
• Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials.
• MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages.
• Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations.
Access to state-of-the-art AI creates opportunities for everyone. We’re excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️

AI at Meta

150,222 views • 1 year ago