Wildminder

@wildmindai · 10,137 subscribers

Physicist, Programmer, Designer

Shorts

More cool stuff from NVIDIA: LocateAnything, a high-speed visual search engine. You provide a text prompt and it instantly pinpoints that object's exact location in an image.
- 10x speedup for dense object detection
- Qwen2.5-3B + Moon-ViT
- Fast/Slow/Hybrid modes
- trained on 138M samples for UI, docs, and generic grounding

50,143 views

NVIDIA says: no more "brute force every pixel" video understanding. AutoGaze identifies and removes redundant video patches before they enter a Vision Transformer. Now we can process long 4K video in real time. Works with SigLIP2 and NVILA.

296,543 views

CogOmniControl by Tencent: reasoning-driven controllable video generation. Uses CogVLM + CogOmniDiT to translate sparse storyboards/sketches into production-quality video. Beats VINO and VACE-Wan2.1.

23,795 views

LTX2.3 ReStyle LoRA: transfers simpler styles (flat 2D, cel-shaded, monochrome line art) but struggles with complex styles (texture, intricate detail, strong material/lighting effects).

24,442 views

LGTM from Apple: 4K feed-forward 3D Gaussian Splatting. Instant 4K 3D scenes without massive GPUs.
- predicts a few lightweight 3D shapes and wraps them in ultra-high-res 2D textures
- low memory usage
You take two normal photos of a room and instantly walk around it in flawless 3D.

46,278 views

Video diffusion models are just overqualified depth estimators! Deterministic single-pass depth estimation based on WanV2.1.
- SOTA 5.5 AbsRel on ScanNet
- more data-efficient than baselines
- no temporal flicker + infinite-length estimation with zero scale drift

49,209 views

LTX 2.3 Creative Upscale IC-LoRA.
- generative second-pass refiner for soft or low-resolution video
- enhances detail and clarity without standard upscaling
- output varies based on workflow/settings

16,526 views

3D modeling entirely replaced by stick figures. SK-Adapter brings skeleton-based structural control for native 3D.
> Feed it a basic skeleton
> Type what you want to see
> Get a fully rendered 3D character in under 15 seconds
> Already rigged and ready for animation
> Zero Blender experience required
Game devs are happy.

26,931 views

Unsloth dropped new LTX-2.3 GGUFs. > Dev/distilled UD-Q2/Q5

24,502 views

ComfyUI-WanVideoWrapper now supports SteadyDancer, a human image animation framework similar to WanAnimate that produces high-fidelity, coherent motion.

42,246 views

Thanks to Kijai, One-to-All Animation has already been added to ComfyUI.

33,914 views

LightVAE + ComfyUI node: High-performance video VAE; runs 2–3x faster using 50% less memory; LightTAE offers a 10+x speedup on just ~0.4GB VRAM

38,092 views

As usual, Kijai has prepared Wan-Move, and it is available in ComfyUI.

25,558 views

Capybara? 14B model for T2V, T2I, TV2V, TI2I.
- based on HunyuanVideo1.5
- byt5-small, Glyph-SDXL-v2, SigLIP
- 480p-1080p; 16.7GB model, 5GB VAE
Mostly for video editing.

16,853 views

AnyDepth: lightweight zero-shot monocular depth estimation; surpasses DPT and nicely preserves detail.

18,152 views

Your LTX-2 performance boost has arrived. NVIDIA Studio Driver (591.74, January): optimizations for LTX2 + support for NVFP4/NVFP8 in ComfyUI.

13,803 views

Videos