Wildminder

@wildmindai • 10,320 subscribers

Physicist, Programmer, Designer

Shorts

NVIDIA says: no more "brute force every pixel" video understanding. AutoGaze identifies and removes redundant video patches before they enter a Vision Transformer, so long 4K video can be processed in real time. Works with SigLIP2 and NVILA.

296,789 views

Klein 9B Sun Direction LoRA. A nice LoRA that lets you move the sun to a specific spot in the sky just by pointing at a reference ball.

11,441 views

LGTM from Apple: 4K feed-forward 3D Gaussian Splatting. Instant 4K 3D scenes without massive GPUs.
- predicts a few lightweight 3D shapes and wraps them in ultra-high-res 2D textures
- low memory usage
Take two normal photos of a room and instantly walk around it in flawless 3D.

46,278 views

Video diffusion models are just overqualified depth estimators! Deterministic single-pass depth estimation based on Wan2.1.
- SOTA 5.5 AbsRel on ScanNet
- more data-efficient than baselines
- no temporal flicker, plus infinite-length estimation with zero scale drift

49,391 views

Is it time to delete DaVinci Resolve?

35,069 views

CogOmniControl by Tencent. Reasoning-driven controllable video generation: CogVLM + CogOmniDiT translate sparse storyboards/sketches into production-quality video. Beats VINO and VACE-Wan2.1.

24,123 views

LTX2.3 ReStyle LoRA. Transfers simpler styles (flat 2D, cel-shaded, monochrome line art); struggles with complex styles (texture, intricate detail, strong material/lighting effects).

24,442 views

3D modeling entirely replaced by stick figures. SK-Adapter brings skeleton-based structural control to native 3D.
> Feed it a basic skeleton
> Type what you want to see
> Get a fully rendered 3D character in under 15 seconds
> Already rigged and ready for animation
> Zero Blender experience required
Game devs are happy.

26,931 views

LTX 2.3 Creative Upscale IC-LoRA.
- generative second-pass refiner for soft or low-resolution video
- enhances detail and clarity without standard upscaling
- output varies based on workflow/settings

17,151 views

ComfyUI-WanVideoWrapper now supports SteadyDancer: a human image animation framework, like WanAnimate, that produces high-fidelity, coherent motion.

42,246 views

Unsloth dropped new LTX-2.3 GGUFs. > Dev/distilled UD-Q2/Q5

24,502 views

Thanks to Kijai, One-to-All Animation has already been added to ComfyUI.

33,914 views

LightVAE + ComfyUI node: high-performance video VAE that runs 2–3x faster using 50% less memory; LightTAE offers a 10x+ speedup on just ~0.4GB VRAM.

38,092 views

As usual, Kijai has prepared Wan-Move, and it is available in ComfyUI.

25,589 views

Capybara? A 14B model for T2V, T2I, TV2V, TI2I.
- based on HunyuanVideo1.5
- byt5-small, Glyph-SDXL-v2, SigLIP
- 480p-1080p; 16.7GB model, 5GB VAE
Mostly for video editing.

16,891 views

AnyDepth: lightweight zero-shot monocular depth estimation; surpasses DPT; nicely preserves detail.

18,610 views

Your LTX-2 performance boost has arrived. NVIDIA Studio Driver 591.74 (January): optimizations for LTX-2 plus support for NVFP4/NVFP8 in ComfyUI.

13,803 views

Videos

Netflix dropped some useful stuff. VOID: video object and interaction deletion.
- removes objects while realistically simulating the physical consequences
- beats Runway/ProPainter
- CogVideoX-5B + SAM 2
Looks good, no smudges/artifacts.

365,564 views • 3 months ago

llmfit. A useful tool that probes your hardware and tells you exactly which LLMs will actually run. It handles MoE expert offloading, picks the best quantization for your RAM, and estimates tokens/sec before you even pull the weights. Essential for local dev.

247,800 views • 4 months ago

LUNA: universal 3D human animation via LBS-free mapping.
- driving from images, sketches, keypoints
- 3DGS + Sapiens for high-fidelity cross-identity animation
- low temporal jitter
- captures loose clothing dynamics

25,026 views • 19 days ago

A totally new level of pose control in ComfyUI: VNCCS.
- full 3D Pose Studio for character posing & lighting
- multi-pose, body generator, pose gallery
- vision-guided QWEN Detailer + camera controls

164,255 views • 5 months ago

HOT! SCAIL-2 just dropped! End-to-end character animation via in-context conditioning.
- no skeleton middleman, it copies pixels directly
- no glitches, no messy hands
- 512p/704p
- unified architecture for character replacement and multi-character tasks
- zero-shot generalization to animal-driven and mesh-based control

45,052 views • 1 month ago

An impressive video model: LingBot-Video. Sparse MoE for physically consistent video generation; prioritizes physical realism and action-consequence logic.
- DiT + Qwen3-VL-4B conditioning + Wan2.1-VAE
- 120B params
- spatiotemporal geometry stability
- T2I, T2V, TI2V
- 1080p

12,211 views • 11 days ago

SUPIR upscaler is outdated. ASASR turns blurry, low-quality photos into sharp, high-res images and prevents fake hallucinated details.
- improves OCR
- high segmentation accuracy
- based on FLUX.1 dev
This looks sweet.

41,951 views • 1 month ago

LTX2.3 Obscura Remove LoRA. A kind of "digital X-ray" for video cleanup.
- Strips foreground junk: peels back smoke, haze, or clutter.
- Reconstructs hidden scenes: describe the background to fill holes.

56,257 views • 2 months ago

Sweet! Kijai has added standalone ComfyUI nodes for SCAIL pose processing.

119,613 views • 7 months ago

Soprano: an instant, ultra-lightweight TTS model for realistic speech; generates 10 hours of 32kHz audio in <20s; streams with <15ms latency using just 80M params & <1GB VRAM. Has some limitations and drawbacks.

111,542 views • 6 months ago

MotionBricks by NVIDIA: the all-in-one brain for character movement. No manual rigging/animation; you just let this AI handle the physics and style in real time.
- 15k FPS on RTX 5090! You could run an entire city of unique NPCs without breaking a sweat.
- Smart Primitives. The AI fills in the natural-looking motion automatically.
- Zero-Shot Skills.
- Glitch-Free. Keeps movements stable.

23,296 views • 1 month ago

CS:GO + Adobe + Wan2.1 = WorldCam. Interactive autoregressive 3D gaming worlds.
> AI now generates playable 3D worlds live
> You move the mouse, the AI builds the map instantly
> Turn around, and it remembers exactly what was there
No traditional game engines. Just neural networks hallucinating reality.

67,807 views • 4 months ago

LTX-2.3 Foley LoRA.
- adds Foley that actually matches the action on screen
- stops the model from defaulting to background scores when you just want real-world noise
- makes generations feel like actual raw footage

13,864 views • 20 days ago

HappyHorse-1.0 vs All.
- 720p, 24fps
- crisp
- pleasant colors, no AI-ish vibrancy
- nice textures, good skin
- dynamic
- good prompt following
If this model runs on consumer GPUs, it'll be an absolute nuke.

49,905 views • 3 months ago

No way. Adobe's new node editor is a straight-up ComfyUI clone! The killer feature is encapsulating an entire graph into a simple tool that can be dropped into Photoshop.

110,443 views • 8 months ago

Klein 9B Sun Direction LoRA. A nice LoRA that lets you move the sun to a specific spot in the sky just by pointing at a reference ball.

11,441 views • 17 days ago

Thinking in Boxes: 3D editing for images with a simple box interface.
- FLUX-Kontext + LoRA
- precise translation, rotation, scaling
- good object preservation
- geometric fidelity
Ideal for virtual staging and e-commerce.

15,451 views • 28 days ago

FLUX.2 Klein in pure pixel space! No VAE. AsymFlow generates hyper-realistic images by working directly in pixel space rather than using compressed latent representations.
- sharper textures, superior visual fidelity
- 40% faster
- low-rank noise parameterization to solve high-dimensional bottlenecks
ComfyUI support incoming.

27,210 views • 2 months ago

Awesome. NVIDIA dropped PiD: fast high-res latent decoding via pixel diffusion!
- replaces the VAE
- 4/8x upsampling
- 2k decoding in <1s on RTX 5090
- works with FLUX.1/SD3/Z
- rapid generation previews
Sharper details and much lower hardware lag compared to standard methods.

23,632 views • 1 month ago

InfiniDepth: fine-grained depth estimation at any resolution; models depth as a neural implicit field instead of a grid.
- tops DepthAnything
- recovers sharp details with a lightweight 15M decoder

59,172 views • 6 months ago