New AI research from Meta – CoTracker3 Simpler and Better Point Tracking by Pseudo-Labelling Real Videos. More details ➡️ Demo on Hugging Face ➡️ Building on our previous work on CoTracker, this new model demonstrates impressive tracking results where points can be tracked for a long time even when...

AI at Meta

824,199 subscribers

218,966 views • 1 year ago • via X (Twitter)

Science & Technology

10 comments

jmpy • 1 year ago

@huggingface

BensenHsu • 1 year ago

The paper introduces a new point tracking model called CoTracker3 that builds upon recent point tracking models like PIPs, TAPIR, and CoTracker. Point tracking is an important task in video analysis for applications like 3D reconstruction and video editing. The CoTracker3 model, when trained only on synthetic data, already outperforms state-of-the-art trackers on several benchmarks. When further fine-tuned on just 15,000 real-world unlabeled videos using the proposed protocol, it significantly surpasses the performance of BootsTAPIR, which was trained on 15 million real videos. CoTracker3 also shows better handling of occluded points compared to other models. full paper:

Yufan Zhuang • 1 year ago

@huggingface love how meta keeps open-sourcing these research

Moses and AI • 1 year ago

@huggingface Your things @Mbounge_

Yosi Frost • 1 year ago

@huggingface That’s awesome!

Alex Fridd • 1 year ago

@huggingface Exciting development! Meta's new CoTracker3 could revolutionize point tracking in real videos.

AI_TechnoKing • 1 year ago

@huggingface This is wild.

GPT.Biz • 1 year ago

@huggingface This new AI model from Meta sounds impressive! CoTracker3 could really push forward developments in point tracking technology

Bhack • 1 year ago

@huggingface When are you going to release co-tracker under an Open Source/OSI license? It is the 3rd release under Creative Commons.

Phil Gjørup • 1 year ago

@huggingface wow!

Related videos

We previously shared our research on Layer Skip, an end-to-end solution for accelerating LLMs from researchers at Meta FAIR. It achieves this by executing a subset of an LLM’s layers and utilizing subsequent layers for verification and correction. We’re now releasing inference code and fine-tuned checkpoints for this work. Model weights on Hugging Face ➡️ More details ➡️ We hope that releasing this work will open up new areas of experimentation and innovative new research in optimization and interpretability.

AI at Meta

156,598 views • 1 year ago

Last week we released Meta Chameleon: a new mixed-modal research model from Meta FAIR. Get the models ➡️ The 7B & 34B safety tuned models we’ve released can take any combination of text and images as input and produce text outputs using a new early fusion approach. While some LLMs have separate image and text encoders or decoders, Chameleon is one of the first publicly released approaches using a single unified architecture. We’re releasing Chameleon models under a research license to help democratize access to foundational mixed-modal models & further research on early fusion. Approach & training details in the paper ➡️

AI at Meta

54,410 views • 2 years ago

Today we're sharing the next milestone in our Seamless Communication research — a new family of AI translation models that preserve expression and deliver near-real time streaming translations. More on this new work ➡️ More on the individual models 🧵

AI at Meta

728,765 views • 2 years ago

Introducing Adjoint Sampling, a new learning algorithm that trains generative models based on scalar rewards. Based on theoretical foundations developed by FAIR, Adjoint Sampling leads to a highly scalable practical algorithm, and can become the foundation for further research into highly scalable sampling methods. Read our research paper on Adjoint Sampling and download the model, code, and benchmark ➡️

AI at Meta

36,987 views • 1 year ago

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,533 views • 3 years ago

I'm thrilled to announce the launch of ⚡️Flash Diffusion from Jasper! Earlier this year, with our acquisition of Clipdrop, we launched the Jasper AI Research Lab in Paris. Today, we are excited to release our first piece of groundbreaking research: the open-source distillation method, "Flash Diffusion". Flash Diffusion accelerates inference by 500%, reduces computing costs, and produces higher-quality image outputs. Dive into the details and discover how Flash Diffusion is set to revolutionize the field of AI and image synthesis. Read all about it here: Try a demo on Hugging Face:

Timothy Young

10,093 views • 2 years ago

New LLMs that control UIs! ByteDance Research releases UI-TARS, a fine-tuned GUI agent that integrates reasoning and action capabilities into a single vision-language model. Think of computer use but open. 👀
TL;DR:
3️⃣ Available in 3 sizes: 2B, 7B, and 72B parameters
🧠 Trained Qwen2-VL models with SFT & DPO
🥇 72B version achieves 82.8% on VisualWebBench (beating GPT-4 and Claude)
🏆 Achieves state-of-the-art results on 10+ GUI agent benchmarks
💡 Reasons before taking an action
🧑🏻‍💻 Can click, long press, type, scroll, open app, navigate back/home, wait
🤗 Released under Apache 2.0 on Hugging Face

Philipp Schmid

48,157 views • 1 year ago

There is a beautiful story that just happened in AI so let me share it for a lighter tone weekend post among all the doom stories in our AI field this week. It’s a story of people on three continents building and sharing in the open a new small efficient and state-of-the-art AI model. It started a couple of months ago when a new team in the AI scene released their first model from their headquarters in Paris (France): Mistral 7B. Impressive model, small and very strong performances in the benchmarks, better than all previous models of this size. And open source! So you could build on top of it. Lewis in Bern (Switzerland) and Ed (in Lyon, in the South of France) both from the H4 team, a team of researchers in model fine-tuning and alignment were talking about it over a coffee, in one of these gatherings that often happen at Hugging Face to break the distance between people (literal distance as HF is a remote company). What about fine-tuning it using this new DPO method that a research team from Stanford in California just posted on arXiv, says one? Hey, that’s a great idea, replies the other. We've just built a great code base (with Nathan, Nazneen, Costa, Younes and all the H4 team and TRL community) let's use it! The next day they start diving in the datasets openly shared on the HF hub and stumble upon two interesting large and good quality fine-tuning datasets recently open-sourced by OpenBMB, a Chinese team from Tsinghua: UltraFeedback and UltraChat. A few rounds of training experiments confirm the intuition, the resulting model is super strong, by far the strongest they have ever seen in their benchmarks from Berkeley and Stanford (LMSYS and Alpaca). Join Clementine, the big boss of the open evaluation leaderboard. Her deep dive into the model capabilities confirms the results: impressive performance. But the H4 team also hosts a famous faculty member, Pr. Sasha Rush, Associate Professor at Cornell University in his daytime, hacker at HF in his nighttime.
Joining the conversation, he proposes to quickly draft a research paper to organize and share all the details with the community. A few days later, the model, called Zephyr (a wind like Mistral), paper, and all details are shared with the world. Quickly, other companies everywhere in the world start to use it. LlamaIndex, a famous data framework and community, shares how the model blew their expectations on real-life use-case benchmarks, while researchers and practitioners discuss the paper and work on the Hugging Face hub. All this happened in just a few weeks catalyzed by open access to knowledge, models, research, and datasets released all over the world (Europe, California, China) and by the idea that people can build upon one another's work in AI to bring real-world value with efficient and open models. Stories like this are numerous everywhere around us and make me really proud of the AI community and see how we can build amazingly useful things together. [the video is just me reading this Friday post hahah]

Thomas Wolf

169,127 views • 2 years ago

Introducing SDXL Turbo: A real-time text-to-image generation model. SDXL Turbo achieves state-of-the-art performance with a new distillation technology, enabling single-step image generation with unprecedented quality, reducing the required step count from 50 to just one. The code, research paper, and weights for non-commercial use are now available on our website. You can test SDXL Turbo on Stability AI’s image editing platform Clipdrop, with a beta demonstration of the real-time text-to-image generation capabilities. Learn more:

Stability AI

976,344 views • 2 years ago

New release from Meta FAIR — Meta Motivo is a first-of-its-kind behavioral foundation model for controlling virtual physics-based humanoid agents for a wide range of complex whole-body tasks. The model is capable of expressing human-like behaviors and achieves performance competitive with task-specific methods and outperforms state-of-the-art unsupervised RL and model-based baselines. Try the demo ➡️ Get the model and code ➡️ We’re excited about how this research could pave the way for fully embodied agents, leading to more lifelike NPCs, democratization of character animation and new types of immersive experiences.

AI at Meta

129,166 views • 1 year ago

Time for a new Qubic #Science Demo to better understand some of the many parameters we are recreating at the #Neuraxon BioInspired #TrueAI Model, that will run in the Qubic network with #aigarth. Today a deep dive with the NeuroModulators, a fundamental piece of biochemistry that our brains need, and the #NeuraxonMoodMixer will help grasp the key concepts. You can play and learn with the demo here: Meanwhile we keep building the Science: And improving the Research Outcomes, all #OpenScience More updates on research coming soon, and more demos for other concepts the underlying AI model uses.

David Vivancos - e/acc

26,486 views • 6 months ago

Wrapping up the year and coinciding with #NeurIPS2024, today at Meta FAIR we’re releasing a collection of nine new open source AI research artifacts across our work in developing agents, robustness & safety and new architectures. More in the video from Joelle Pineau. All of this work is part of FAIR’s continued work towards the goal of achieving advanced machine intelligence.
A few highlights from what we’re releasing today:
• Meta Motivo: A first-of-its-kind behavioral foundation model that controls the movements of a virtual embodied humanoid agent to perform complex tasks.
• Meta Video Seal: A state-of-the-art comprehensive framework for neural video watermarking.
• Meta Explore Theory-of-Mind: Program-guided adversarial data generation for theory of mind reasoning.
• Meta Large Concept Models: A fundamentally different training paradigm for language modeling that decouples reasoning from language representation.
And much more! We’re excited to share this work with the research community and look forward to seeing how it inspires new innovation across the field. Details and access to everything released by FAIR today ➡️

AI at Meta

156,123 views • 1 year ago

Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and help advance AI in a responsible way. More in the video from Joelle Pineau.
What we’re releasing:
🦎 Meta Chameleon: 7B & 34B language models that support mixed-modal input and text-only outputs.
🪙 Meta Multi-Token Prediction: Pretrained Language Models for code completion using Multi-Token Prediction.
🎼 Meta JASCO: Generative text-to-music models capable of accepting various conditioning inputs for greater controllability. Paper available today with a pretrained model coming soon.
🗣️ Meta AudioSeal: An audio watermarking model that we believe is the first designed specifically for the localized detection of AI-generated speech, available under a commercial license.
📝 Additional RAI artifacts: Including research, data and code to measure and improve the representation of geographical and cultural preferences and diversity in AI systems.
We believe that access to state-of-the-art AI creates opportunities for everyone – not just a small handful of Big Tech companies. We’re excited to share this work and to see how the community learns, iterates and builds using this technology. Details and access to everything released by FAIR today ➡️

AI at Meta

380,751 views • 2 years ago

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI).
What we’re releasing:
• Meta Spirit LM: An open source language model for seamless speech and text integration.
• Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2.
• Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
• SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography.
• Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
• Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials.
• MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages.
• Self-Taught Evaluator: A new method for generating synthetic preference data to train reward models without relying on human annotations.
Access to state-of-the-art AI creates opportunities for everyone. We’re excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️

AI at Meta

150,222 views • 1 year ago

OmniParser, the new screen parsing tool from Microsoft (and #1 trending model on Hugging Face), can now run 100% locally in your browser with Transformers.js! 🤯 Who's going to be the first to turn this into a browser extension? 👀 Endless possibilities! Demo & code below! 👇

Xenova

64,560 views • 1 year ago

🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike. More details and examples of what Movie Gen can do ➡️
🛠️ Movie Gen models and capabilities
Movie Gen Video: A 30B parameter transformer model that can generate high-quality and high-definition images and videos from a single text prompt.
Movie Gen Audio: A 13B parameter transformer model that can take a video input along with optional text prompts for controllability to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound, delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.
Precise video editing: Using a generated or existing video and accompanying text instructions as an input, it can perform localized edits such as adding, removing or replacing elements, or global changes like background or style changes.
Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement in video.
We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.

AI at Meta

2,264,759 views • 1 year ago

A breakthrough in real-time video generation. As a research preview developed with NVIDIA and shared at NVIDIAGTC this week, we trained a new real-time video model running on Vera Rubin. HD videos generate instantly, with time-to-first-frame under 100ms. Unlocking an entirely new creative paradigm and bolstering the foundations of our General World Model, GWM-1. Real-time generation opens a fundamentally different design space for video models and world simulation. We're investing in co-designing our models alongside advances in hardware to keep pushing this frontier.

Runway

1,162,438 views • 4 months ago

Cosmos Policy just dropped for robotics. 🤖 Cutting edge research is turning a world foundation model into a unified robot brain that can see, predict, and act—no extra action heads, no complicated control stack. Read our blog on Hugging Face ➡️ Want to get hands-on with Cosmos (Reason, Predict, Policy, Cookbook)? Join the Cosmos Cookoff, sponsored by Nebius and Milestone Systems ➡️

NVIDIA Robotics

14,344 views • 5 months ago