Meta presents Sapiens: Foundation for Human Vision Models. We present Sapiens, a family of models for four fundamental human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual...
151,511 views • 1 year ago • via X (Twitter)
10 Comments

The paper presents Sapiens, a family of vision transformer models trained on a large dataset of human images. The goal is to develop models that can generalize well, be applicable to a wide range of tasks, and produce high-quality outputs. The results demonstrate the benefit of pretraining on a large, curated dataset of human images. The models are able to generalize well to various scenarios, including multi-person scenes, egocentric views, and challenging poses. The high-resolution (1024x1024) pretraining and the detailed annotation of the finetuning datasets also contribute to the models' strong performance. full paper:

normal map is mind blowing what the tech

"Sapiens: Foundation for Human Vision Models" PAPER SUMMARY

Nice one but these links are not working (will they open it soon?)

No code :(

Is it real-time or post-processing?

Want

can it spot a soldier and identify the head?

Why non commercial license @Meta 😵‍💫

Hope there will be some distilled model for realtime inference on mobile

