AI at Meta

@AIatMeta • 803,548 subscribers

Together with the AI community, we are pushing the boundaries of what’s possible through open science to create a more connected world.

Shorts

🏆 We're thrilled to announce that Meta FAIR’s Brain & AI team won 1st place at the prestigious Algonauts 2025 brain modeling competition. Their 1B parameter model, TRIBE (Trimodal Brain Encoder), is the first deep neural network trained to predict brain responses to stimuli across multiple modalities, cortical areas, and individuals. The approach combines pretrained representations of several foundational models from Meta – text (Llama 3.2), audio (Wav2Vec2-BERT from Seamless) and video (V-JEPA 2) – to predict a very large amount (80 hours per subject) of spatio-temporal fMRI brain responses to movies acquired by the Courtois NeuroMod project. Download the code, read the paper, learn about the challenge, and download the data:

1,092,731 views

Today we're releasing the Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation. SAM is capable of one-click segmentation of any object from any photo or video + zero-shot transfer to other segmentation tasks ➡️

3,570,083 views

Announced by Mark Zuckerberg this morning — today we're releasing DINOv2, the first method for training computer vision models that uses self-supervised learning to achieve results matching or exceeding industry standards. More on this new work ➡️

1,224,157 views

Today we’re releasing Code Llama, a large language model built on top of Llama 2, fine-tuned for coding & state-of-the-art among publicly available coding tools. In keeping with our open approach, Code Llama is publicly available now for both research & commercial use. More ⬇️

1,004,834 views

Today we're sharing details on AudioCraft, a new family of generative AI models built for generating high-quality, realistic audio & music from text. AudioCraft is a single code base that works for music, sound, compression & generation — all in the same place. More details ⬇️

677,704 views

We’re releasing model weights for our 8B-parameter Dynamic Byte Latent Transformer, an alternative to traditional tokenization methods with the potential to redefine the standards for language model efficiency and reliability. Learn more about how Dynamic Byte Latent Transformer is paving the way for groundbreaking developments in the field of language modeling. Read the research paper, and download the model and code.

195,454 views

Today we're releasing the Open Catalyst Demo to the public — this new service will allow researchers to accelerate work in material sciences by enabling them to simulate the reactivity of catalyst materials ~1000x faster than existing computational methods using AI. Demo ⬇️

442,458 views

Introducing ImageBind by Meta AI: the first AI model capable of binding data from six modalities at once. This breakthrough brings machines one step closer to the human ability to bind together information from many different senses. More on this new open source work ⬇️

333,326 views

Together with the Ego4D consortium, today we're releasing Ego-Exo4D, the largest ever public dataset of its kind to support research on video learning & multimodal perception — including 1,400+ hours of videos of skilled human activities. Download ➡️

260,250 views

Our Segment Anything Models are helping advance flood monitoring and disaster response. See how USRA and USGS have fine-tuned SAM to automate a key bottleneck in real-time river mapping, enabling faster, scalable, and more cost-effective disaster preparedness:

52,944 views

SeamlessExpressive, a new AI translation model by research teams at Meta, enables high-quality speech translation that maintains the speaker's vocal style, tone and unique expressions in translated outputs. Try the demo with your own voice ➡️

168,999 views

🤖 New robotics research from Meta AI & CMU Robotics Institute — RoboAgent can acquire a wide diversity of non-trivial skills + generalize them to hundreds of unseen scenarios — all w/ an order of magnitude less data than prior works in this space. More details ➡️

164,519 views

We’re continuing to see exciting results as we work with our Meta Movie Gen models. Here are some more examples of what they can do 🧵

101,997 views

Today, we're sharing two major advancements in our work toward general-purpose embodied AI agents: VC-1 & ASC. We're excited for how this work will help build toward a future where AI agents can assist humans in both the virtual & physical world. Details ⬇️

124,932 views

The Meta Llama 3 Hackathon is this weekend in SF with @Cerebral_Valley! Get on the list ➡️ What to expect • Two days of building alongside the best hackers in AI • Hands-on support from the Llama team • Talks from some of the top names in the industry

50,762 views

You can find more on Meta Sparsh, and more of our recently announced robotics AI research in this post ➡️

20,747 views

Videos

🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike. More details and examples of what Movie Gen can do ➡️

🛠️ Movie Gen models and capabilities

Movie Gen Video: A 30B parameter transformer model that can generate high-quality, high-definition images and videos from a single text prompt.

Movie Gen Audio: A 13B parameter transformer model that can take a video input, along with optional text prompts for controllability, to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound, delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.

Precise video editing: Using a generated or existing video and accompanying text instructions as input, the model can perform localized edits such as adding, removing or replacing elements, or global changes like background or style changes.

Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement in video.

We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.

AI at Meta

2,263,545 views • 1 year ago