
AK
@_akhaliq • 449,077 subscribers
AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗). DM for promo, submit papers here: https://t.co/UzmYN5XOCi

Grok 3 builds an endless-runner-style game where a Hugging Face collects GPUs
AK • 8,664,253 views • 1 year ago

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold paper page:
AK • 3,396,886 views • 3 years ago

SpatialLM just dropped on Hugging Face: Large Language Model for Spatial Understanding
AK • 673,358 views • 1 year ago

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents paper:
AK • 63,685 views • 1 month ago

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model with Gradio demo local demo:
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion. Despite achieving reasonable results, these approaches struggle to maintain temporal consistency throughout the animation, due to the lack of temporal modeling and poor preservation of the reference identity. In this work, we introduce MagicAnimate, a diffusion-based framework that aims to enhance temporal consistency, faithfully preserve the reference image, and improve animation fidelity. To achieve this, we first develop a video diffusion model to encode temporal information. Second, to maintain appearance coherence across frames, we introduce a novel appearance encoder that retains the intricate details of the reference image. Leveraging these two innovations, we further employ a simple video fusion technique to encourage smooth transitions in long video animation. Empirical results demonstrate the superiority of our method over baseline approaches on two benchmarks. Notably, our approach outperforms the strongest baseline by over 38% in terms of video fidelity on the challenging TikTok dancing dataset. Code and model will be made available.
AK • 810,024 views • 2 years ago
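The MagicAnimate abstract only says that a "simple video fusion technique" smooths transitions in long animations. One common way to realize this, assumed here rather than taken from the MagicAnimate code, is to process the long frame sequence in overlapping temporal windows and average the results on frames where windows overlap. The names `window_starts`, `fuse_overlapping`, and the `process` callable (a stand-in for one pass of the video diffusion model over a window) are hypothetical.

```python
import numpy as np


def window_starts(num_frames: int, window: int, stride: int) -> list[int]:
    """Start indices of overlapping windows that together cover every frame."""
    if num_frames <= window:
        return [0]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        # Add a final window flush with the end so no frame is left uncovered.
        starts.append(num_frames - window)
    return starts


def fuse_overlapping(latents: np.ndarray, window: int, stride: int, process) -> np.ndarray:
    """Run `process` on each overlapping window of frames (axis 0) and average
    the outputs wherever windows overlap, so adjacent segments blend smoothly."""
    num_frames = latents.shape[0]
    out = np.zeros_like(latents, dtype=float)
    counts = np.zeros(num_frames, dtype=float)
    for s in window_starts(num_frames, window, stride):
        e = min(s + window, num_frames)
        out[s:e] += process(latents[s:e])
        counts[s:e] += 1.0
    # Broadcast the per-frame counts over the remaining feature dimensions.
    return out / counts.reshape((-1,) + (1,) * (latents.ndim - 1))
```

With an identity `process`, the fused output equals the input, which is a quick sanity check that the overlap averaging is weighted correctly; a real model would produce slightly different predictions per window, and the averaging blends them at window boundaries.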

Training AI to Play Pokemon with Reinforcement Learning by Peter Whidden github: youtube:
AK • 837,668 views • 2 years ago

Google presents Genie: Generative Interactive Environments
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
AK • 684,196 views • 2 years ago
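Genie learns controllable actions without action labels through its latent action model. The abstract gives no internals, so the sketch below assumes a vector-quantized bottleneck: each frame transition is encoded as a continuous vector and snapped to the nearest entry of a small codebook, yielding a discrete action index. The toy "encoder" (differencing consecutive frame embeddings), the codebook, and the names `quantize_actions` and `infer_latent_actions` are hypothetical illustrations, not Genie's actual architecture.

```python
import numpy as np


def quantize_actions(z: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each latent vector in z (shape [T, D]) to the index of its nearest
    codebook entry (codebook shape [K, D]) under squared L2 distance."""
    # Pairwise squared distances between every latent and every code: [T, K].
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]


def infer_latent_actions(frame_emb: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical encoder: treat the change between consecutive frame
    embeddings as the continuous action, then quantize it to a discrete code."""
    deltas = frame_emb[1:] - frame_emb[:-1]
    return quantize_actions(deltas, codebook)
```

For example, with a four-entry codebook of unit moves `[[1, 0], [-1, 0], [0, 1], [0, -1]]` and frame embeddings `[[0, 0], [1, 0], [1, 1], [0, 1]]`, the three transitions map to action indices `[0, 2, 1]`. Keeping the codebook small is what forces the model to describe each transition with a compact, reusable action vocabulary.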