Introducing Meta Perception Encoder: a vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models. Learn more about Meta Perception Encoder, read the research paper, and download the code and dataset

AI at Meta

825,176 subscribers

74,588 views • 1 year ago • via X (Twitter)

Science & Technology, News & Politics, Education

11 Comments

अंग्रेजी साहित्य • 1 year ago

Help to excelsheet❓

Rainmaker • 1 year ago

Decode the labor market! Learn how to track jobless claims using FRED and Python in my latest free Substack post. 📈 A must-read for data enthusiasts & economists. Dive into how data insights can shape your understanding of the economy.

WhaleX • 1 year ago

"A vision encoder setting new standards in image & video tasks, excelling in zero-shot classification & retrieval."

Guinther Kovalski • 1 year ago

Just impressive how SigLIP is still so close with less than 1/6 of the parameters @giffmana

Zoom • 1 year ago

It’s over bro, rest.

Thomas | Æ • 1 year ago

Its ability to excel in zero-shot tasks pushes the boundaries of image and video processing. Can’t wait to dive into the research and see how it outperforms current models.

Reji Modiyil • 1 year ago

@AIatMeta, this could be a game-changer in visual technology. Excited to see its impact.

Jesse Campbell • 1 year ago

Ok...? What is it?

Jack Assery • 1 year ago

Interesting 👀

1st Amendment • 1 year ago

42 Homies 😒

Breck to the Future • 1 year ago

Incredible progress here. Meta Perception Encoder shows what's possible when you unify architecture across image and video tasks. Zero-shot performance is no longer optional... it's the new baseline. Excited to see how this accelerates real-world applications. Always looking to the future!

Related Videos

Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

AI at Meta

94,389 views • 1 year ago

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception.

1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & video tasks.
2️⃣ Meta Perception Language Model: A fully open & reproducible vision-language model designed to tackle visual recognition tasks.
3️⃣ Meta Locate 3D: An end-to-end model for accurate object localization in 3D environments.
4️⃣ Releasing model weights for our 8B-parameter Dynamic Byte Latent Transformer, an alternative to traditional tokenization methods with the potential to redefine the standards for language model efficiency and reliability.
5️⃣ Collaborative Reasoner: A framework for evaluating & improving collaborative reasoning skills in language models.

Download the code, datasets, and research papers and learn more about how these artifacts are paving the way for more efficient and accurate AI systems. ➡️

AI at Meta

163,313 views • 1 year ago

Introducing Long Zhao, a Senior Research Scientist at Google, who worked to build VideoPrism: A Foundational Visual Encoder for Video Understanding. Read the blog to explore innovations in video understanding tasks and more →

Google AI

129,768 views • 2 years ago

Introducing Meta Locate 3D: a model for accurate object localization in 3D environments. Learn how Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. You can download the model and dataset, read our research paper, and even try a demo!

AI at Meta

81,406 views • 1 year ago

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more:

AI at Meta

1,250,657 views • 7 months ago

👀Humans compare images by looking back and forth. Many open-weight VLMs encode each image independently, and defer comparison to the LM. We introduce SVE: Stateful Visual Encoders for Vision-Language Models, where the visual encoder itself becomes change-aware. 🌐Project: 📰Paper: 💻Code: 1/n

Zirui "Colin" Wang

51,762 views • 1 month ago

New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work. ➡️ Read the technical report: ➡️Download the open weights: ➡️Download the code:

AI at Meta

313,765 views • 10 months ago

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

Lior Alexander

290,190 views • 3 years ago

Today we're introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people to create a digital twin of neural activity and enable zero-shot predictions for new subjects, languages, and tasks. Try the demo and learn more here:

AI at Meta

6,940,760 views • 3 months ago

Introducing Collaborative Reasoner: a framework to improve collaborative reasoning in language models. Collaborative Reasoner paves the way for developing social agents that can partner with humans and other agents. Read the research paper and download the code.

AI at Meta

58,510 views • 1 year ago

By sharing some of the insights and challenges in developing the Meta PARTNR demo, we hope to contribute to the development of the next wave of innovation in human-robot collaboration. Research paper ➡️ Dataset and code ➡️

AI at Meta

27,359 views • 1 year ago

Today at Meta FAIR we’re announcing three new cutting-edge developments in robotics and touch perception, and releasing a collection of artifacts to empower the community to build on this work. Details on all of this new work ➡️

1️⃣ Meta Sparsh is the first general-purpose encoder for vision-based tactile sensing that works across many tactile sensors and many tasks. Trained on 460K+ tactile images using self-supervised learning.
2️⃣ Meta Digit 360 is a breakthrough artificial fingertip-based tactile sensor, equipped with 18+ sensing features to deliver detailed touch data with human-level precision and touch-sensing capabilities.
3️⃣ Meta Digit Plexus is a standardized platform for robotic sensor connections and interactions. It provides a hardware-software solution to integrate tactile sensors on a single robot hand and enables seamless data collection, control and analysis over a single cable.

The potential impact of expanding capabilities and components like these for the open source community ranges from medical research to supply chain, manufacturing and much more. We’re excited to continue this work with the broader community.

AI at Meta

453,260 views • 1 year ago

Project Aria is a research program helping Meta unlock new possibilities of how we connect with and experience the world through AR and AI. 📺Watch the full tutorial from #CVPR2022 to learn how Project Aria will advance machine perception and AI research:

Meta Open Source

14,720 views • 3 years ago

Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model, trained on video, that can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments. Learn more about V-JEPA 2 ➡️ As we continue working toward our goal of achieving advanced machine intelligence (AMI), we’re also releasing three new benchmarks for evaluating how well existing models can reason about the physical world from video. Learn more and download the new benchmarks ➡️

AI at Meta

310,120 views • 1 year ago

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI).

What we’re releasing:
• Meta Spirit LM: An open source language model for seamless speech and text integration.
• Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2.
• Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
• SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography.
• Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
• Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials.
• MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages.
• Self-Taught Evaluator: A new method for generating synthetic preference data to train reward models without relying on human annotations.

Access to state-of-the-art AI creates opportunities for everyone. We’re excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️

AI at Meta

150,222 views • 1 year ago

Open Source has done it again. AI at Meta have released the code for their new Animated Drawings tool. AI can now automatically animate children's drawings of human-like figures. The demo is free, the research paper is available, and the code and dataset (nearly 180k annotated amateur drawings) are public. More below 👇

d@x

49,054 views • 3 years ago

📣 Microsoft Research releases Florence-VL, a new family of MLLMs powered by the generative vision foundation model Florence-2. Achieves significant improvements in general VQA, perception, hallucination, OCR, Chart, knowledge-intensive understanding, and more🔥Learn more👇

Gradio

14,454 views • 1 year ago

Muse Spark 1.1 also excels in perception and multimodal reasoning, inspecting visual and audio inputs, preserving details across long workflows, and acting on them in real execution environments. It shows particular strengths in visual-to-code generation, rich image/video captioning, and agentic computer use. In this demo, using video shot from a smartphone, Muse Spark 1.1 extracts useful photos and reasons about the product to operate a user's browser and make a Facebook Marketplace listing on the user's behalf.

AI at Meta

70,376 views • 13 days ago

Last week we released Meta Chameleon: a new mixed-modal research model from Meta FAIR. Get the models ➡️ The 7B & 34B safety tuned models we’ve released can take any combination of text and images as input and produce text outputs using a new early fusion approach. While some LLMs have separate image and text encoders or decoders, Chameleon is one of the first publicly released approaches using a single unified architecture. We’re releasing Chameleon models under a research license to help democratize access to foundational mixed-modal models & further research on early fusion. Approach & training details in the paper ➡️

AI at Meta

54,426 views • 2 years ago

📣 New research from GenAI at Meta, introducing Meta 3D Gen: A new system for end-to-end generation of 3D assets from text in <1min. Meta 3D Gen is a new combined AI system that can generate high-quality 3D assets, with both high-resolution textures and material maps end-to-end, producing results that are superior to existing solutions — at 3-10x the speed of existing work in this space. Details in the technical report ➡️

AI at Meta

408,783 views • 2 years ago