Video wird geladen...

Video konnte nicht geladen werden

Beim Laden dieses Videos ist ein Problem aufgetreten. Dies könnte an einem vorübergehenden Netzwerkproblem liegen oder das Video ist möglicherweise nicht verfügbar.

Introducing Meta Perception Encoder: a vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models. Learn more about Meta Perception Encoder, read the research paper, and download the code and dataset

AI at Meta

766,944 subscribers

74,392 Aufrufe • vor 1 Jahr •via X (Twitter)

Wissenschaft & Technologie Nachrichten & Politik Bildung

Anya Rossi• Live Now

Private livecam show

11 Kommentare

Profilbild von अंग्रेजी साहित्य

अंग्रेजी साहित्यvor 1 Jahr

Help to excelsheet❓

Profilbild von Rainmaker

Rainmakervor 1 Jahr

Decode the labor market! Learn how to track jobless claims using FRED and Python in my latest free Substack post. 📈 A must-read for data enthusiasts & economists. Dive into how data insights can shape your understanding of the economy.

Profilbild von WhaleX

WhaleXvor 1 Jahr

"A vision encoder setting new standards in image & video tasks, excelling in zero-shot classification & retrieval."

Profilbild von Guinther Kovalski

Guinther Kovalskivor 1 Jahr

just impressive how Siglip stills so close with less than 1/6 of the parameters @giffmana

Profilbild von Zoom

Zoomvor 1 Jahr

It’s over bro, rest.

Profilbild von Thomas | Æ

Thomas | Ævor 1 Jahr

Its ability to excel in zero-shot tasks pushes the boundaries of image and video processing. Can’t wait to dive into the research and see how it outperforms current models.

Profilbild von Reji Modiyil

Reji Modiyilvor 1 Jahr

@AIatMeta, this could be a game-changer in visual technology. excited to see its impact.

Profilbild von Jesse Campbell

Jesse Campbellvor 1 Jahr

Ok...? What is it?

Profilbild von Jack Assery

Jack Asseryvor 1 Jahr

Interesting 👀

Profilbild von 1st Amendment

1st Amendmentvor 1 Jahr

42 Homies 😒

Profilbild von Breck to the Future

Breck to the Futurevor 1 Jahr

Incredible progress here. Meta Perception Encoder shows what's possible when you unify architecture across image and video tasks. Zero-shot performance is no longer optional... it's the new baseline. Excited to see how this accelerates real-world applications. Always looking to the future!

Ähnliche Videos

Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

AI at Meta

93,811 Aufrufe • vor 1 Jahr

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & video tasks. 2️⃣ Meta Perception Language Model: A fully open & reproducible vision-language model designed to tackle visual recognition tasks. 3️⃣ Meta Locate 3D: An end-to-end model for accurate object localization in 3D environments. 4️⃣ Releasing model weights for our 8B-parameter Dynamic Byte Latent Transformer, an alternative to traditional tokenization methods with the potential to redefine the standards for language model efficiency and reliability. 5️⃣Collaborative Reasoner: A framework for evaluating & improving collaborative reasoning skills in language models. Download the code, datasets, and research papers and learn more about how these artifacts are paving the way for more efficient and accurate AI systems.➡️

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & video tasks. 2️⃣ Meta Perception Language Model: A fully open & reproducible vision-language model designed to tackle visual recognition tasks. 3️⃣ Meta Locate 3D: An end-to-end model for accurate object localization in 3D environments. 4️⃣ Releasing model weights for our 8B-parameter Dynamic Byte Latent Transformer, an alternative to traditional tokenization methods with the potential to redefine the standards for language model efficiency and reliability. 5️⃣Collaborative Reasoner: A framework for evaluating & improving collaborative reasoning skills in language models. Download the code, datasets, and research papers and learn more about how these artifacts are paving the way for more efficient and accurate AI systems.➡️

AI at Meta

163,214 Aufrufe • vor 1 Jahr

🚨In our NeurIPS paper, we bring encoder-decoders back.. for diffusion language models! ⚡️Encoder-decoders make diffusion sampling fast: a small (fast) decoder denoises tokens progressively and a large (slower) encoder represents clean context.

🚨In our NeurIPS paper, we bring encoder-decoders back.. for diffusion language models! ⚡️Encoder-decoders make diffusion sampling fast: a small (fast) decoder denoises tokens progressively and a large (slower) encoder represents clean context.

Marianne Arriola

31,659 Aufrufe • vor 7 Monaten

Introducing Meta Locate 3D: a model for accurate object localization in 3D environments. Learn how Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. You can download the model and dataset, read our research paper, and even try a demo!

Introducing Meta Locate 3D: a model for accurate object localization in 3D environments. Learn how Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. You can download the model and dataset, read our research paper, and even try a demo!

AI at Meta

81,287 Aufrufe • vor 1 Jahr

Designing an Encoder for Fast Personalization of Text-to-Image Models TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps abs: project page:

Designing an Encoder for Fast Personalization of Text-to-Image Models TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps abs: project page:

AK

165,151 Aufrufe • vor 3 Jahren

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more:

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more:

AI at Meta

1,247,722 Aufrufe • vor 5 Monaten

New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work. ➡️ Read the technical report: ➡️Download the open weights: ➡️Download the code:

New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work. ➡️ Read the technical report: ➡️Download the open weights: ➡️Download the code:

AI at Meta

312,785 Aufrufe • vor 8 Monaten

Apple FastVLM-7B Efficient Vision Encoding for Vision Language Models larger variants using Qwen2-7B LLM outperform recent works like Cambrian-1-8B while using a single image encoder with a 7.9x faster TTFT vibe coding a video captioning app with it in anycoder

Apple FastVLM-7B Efficient Vision Encoding for Vision Language Models larger variants using Qwen2-7B LLM outperform recent works like Cambrian-1-8B while using a single image encoder with a 7.9x faster TTFT vibe coding a video captioning app with it in anycoder

AK

60,588 Aufrufe • vor 9 Monaten

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

Lior Alexander

290,190 Aufrufe • vor 3 Jahren

New research from Meta FAIR — Meta Explore Theory-of-Mind: Program guided adversarial data generation for theory of mind reasoning. This work includes a new research paper, code & a dataset on Hugging Face. Details + eight more new releases from FAIR ➡️

New research from Meta FAIR — Meta Explore Theory-of-Mind: Program guided adversarial data generation for theory of mind reasoning. This work includes a new research paper, code & a dataset on Hugging Face. Details + eight more new releases from FAIR ➡️

AI at Meta

38,568 Aufrufe • vor 1 Jahr

Check out our 2025 highlights in computer vision! 🚀Five new *St3R models (MASt3R-SfM, MUSt3R, PanSt3R, HAMSt3R, HOSt3R ) 🤩Anny parametric 3D human model (Apache 2.0) 🤟Universal encoder for all-in-one vision FM More info ▶️

Check out our 2025 highlights in computer vision! 🚀Five new *St3R models (MASt3R-SfM, MUSt3R, PanSt3R, HAMSt3R, HOSt3R ) 🤩Anny parametric 3D human model (Apache 2.0) 🤟Universal encoder for all-in-one vision FM More info ▶️

NAVER LABS Europe

12,504 Aufrufe • vor 5 Monaten

Today we're introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people to create a digital twin of neural activity and enable zero-shot predictions for new subjects, languages, and tasks. Try the demo and learn more here:

Today we're introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people to create a digital twin of neural activity and enable zero-shot predictions for new subjects, languages, and tasks. Try the demo and learn more here:

AI at Meta

6,918,771 Aufrufe • vor 2 Monaten

Introducing Collaborative Reasoner: a framework to improve collaborative reasoning in language models. Collaborative Reasoner paves the way for developing social agents that can partner with humans and other agents. Read the research paper and download the code.

Introducing Collaborative Reasoner: a framework to improve collaborative reasoning in language models. Collaborative Reasoner paves the way for developing social agents that can partner with humans and other agents. Read the research paper and download the code.

AI at Meta

58,387 Aufrufe • vor 1 Jahr

By sharing some of the insights and challenges in developing the Meta PARTNR demo, we hope to contribute to the development of the next wave of innovation in human-robot collaboration. Research paper ➡️ Dataset and code ➡️

By sharing some of the insights and challenges in developing the Meta PARTNR demo, we hope to contribute to the development of the next wave of innovation in human-robot collaboration. Research paper ➡️ Dataset and code ➡️

AI at Meta

27,359 Aufrufe • vor 1 Jahr

Our Latent Encoder-Decoder code base is fully open sourced, you can train and visualize the latent space: Code⚙️: ArXiv 📚: #CVPR2026

Our Latent Encoder-Decoder code base is fully open sourced, you can train and visualize the latent space: Code⚙️: ArXiv 📚: #CVPR2026

Xueyan Zou

19,047 Aufrufe • vor 2 Monaten

Today at Meta FAIR we’re announcing three new cutting-edge developments in robotics and touch perception — and releasing a collection of artifacts to empower the community to build on this work. Details on all of this new work ➡️ 1️⃣ Meta Sparsh is the first general-purpose encoder for vision-based tactile sensing that works across many tactile sensors and many tasks. Trained on 460K+ tactile images using self-supervised learning. 2️⃣ Meta Digit 360 is a breakthrough artificial fingertip-based tactile sensor, equipped with 18+ sensing features to deliver detailed touch data with human-level precision and touch-sensing capabilities. 3️⃣ Meta Digit Plexus is a standardized platform for robotic sensor connections and interactions. It provides a hardware-software solution to integrate tactile sensors on a single robot hand and enables seamless data collection, control and analysis over a single cable. The potential impact of expanding capabilities and components like these for the open source community ranges from medical research to supply chain, manufacturing and much more. We’re excited to continue this work with the broader community.

Today at Meta FAIR we’re announcing three new cutting-edge developments in robotics and touch perception — and releasing a collection of artifacts to empower the community to build on this work. Details on all of this new work ➡️ 1️⃣ Meta Sparsh is the first general-purpose encoder for vision-based tactile sensing that works across many tactile sensors and many tasks. Trained on 460K+ tactile images using self-supervised learning. 2️⃣ Meta Digit 360 is a breakthrough artificial fingertip-based tactile sensor, equipped with 18+ sensing features to deliver detailed touch data with human-level precision and touch-sensing capabilities. 3️⃣ Meta Digit Plexus is a standardized platform for robotic sensor connections and interactions. It provides a hardware-software solution to integrate tactile sensors on a single robot hand and enables seamless data collection, control and analysis over a single cable. The potential impact of expanding capabilities and components like these for the open source community ranges from medical research to supply chain, manufacturing and much more. We’re excited to continue this work with the broader community.

AI at Meta

453,035 Aufrufe • vor 1 Jahr

Project Aria is a research program helping Meta unlock new possibilities of how we connect with and experience the world through AR and AI. 📺Watch the full tutorial from #CVPR2022 to learn how Project Aria will advance machine perception and AI research:

Project Aria is a research program helping Meta unlock new possibilities of how we connect with and experience the world through AR and AI. 📺Watch the full tutorial from #CVPR2022 to learn how Project Aria will advance machine perception and AI research:

Meta Open Source

14,720 Aufrufe • vor 3 Jahren

Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model, trained on video, that can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments. Learn more about V-JEPA 2 ➡️ As we continue working toward our goal of achieving advanced machine intelligence (AMI), we’re also releasing three new benchmarks for evaluating how well existing models can reason about the physical world from video. Learn more and download the new benchmarks ➡️

Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model, trained on video, that can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments. Learn more about V-JEPA 2 ➡️ As we continue working toward our goal of achieving advanced machine intelligence (AMI), we’re also releasing three new benchmarks for evaluating how well existing models can reason about the physical world from video. Learn more and download the new benchmarks ➡️

AI at Meta

309,704 Aufrufe • vor 1 Jahr

Google is offering a Generative AI Learning Path with 10 courses for FREE! - Intro to Generative AI - Intro to LLMs - Intro to Image Generation - Encoder-Decoder Architecture - Transformer Models and more A Thread 🧵👇

Google is offering a Generative AI Learning Path with 10 courses for FREE! - Intro to Generative AI - Intro to LLMs - Intro to Image Generation - Encoder-Decoder Architecture - Transformer Models and more A Thread 🧵👇

Afiz ⚡️

249,513 Aufrufe • vor 2 Jahren

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI). What we’re releasing: • Meta Spirit LM: An open source language model for seamless speech and text integration. • Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2. • Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance. • SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography. • Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale. • Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials. • MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages. • Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations. Access to state-of-the-art AI creates opportunities for everyone. We’re excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI). What we’re releasing: • Meta Spirit LM: An open source language model for seamless speech and text integration. • Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2. • Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance. • SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography. • Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale. • Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials. • MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages. • Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations. Access to state-of-the-art AI creates opportunities for everyone. We’re excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️

AI at Meta

150,222 Aufrufe • vor 1 Jahr