
Zhiting Hu
@ZhitingHu • 4,428 subscribers
Assist. Prof. at UC San Diego; Artificial Intelligence, Machine Learning, Natural Language Processing
Videos

🔥Really excited to see the release of the PAN world model, a project I have been working on over the past years. PAN is a general world model capable of simulating physical, agentic, and nested worlds, synthesizing infinite interactive experiences for training AI agents. Building on top of pretrained LLMs and video diffusion models, PAN connects language, perception, action, and latent thoughts for long-horizon simulation and reasoning. PAN shows overwhelming performance gains over JEPA-2, Cosmos-2, and other prior models. More in the thread👇 ... 1/
Zhiting Hu • 31,119 views • 7 months ago

Super excited to introduce Pandora, a generative video World Model interactively controllable by language. #Sora and #GPT4 are both powerful. How about fusing them in a single model? 💥 Pandora gives a preview:🔭
> Build a General World Model (GWM) super efficiently by integrating a pretrained autoregressive LLM and a diffusion video model (VM), aligning them in the representation space.
> Let the LLM control the VM on the fly. Instruction tuning maximizes the controllability.
> The autoregressive LLM empowers the VM to generate indefinitely long videos: starting with a VM for 2-second videos, Pandora extends it to 8-second videos. Would #Sora+#GPT4 under Pandora produce hours-long videos? 📽️
> A world model is more than just video generation. It is sensory-level information processing plus concept-level reasoning and reflection. Pandora bridges both, with the concept-/language-level backbone (LLM) managing and steering the sensory-level VM functionalities. 👁️🧠
Check it out for a bunch of interesting results:
Zhiting Hu • 62,689 views • 2 years ago

A humanoid robot dancing with agility and flair💃 ... in a world _interactively_ simulated by a world model. Here's the choreography we told the model to simulate, step by step:
💃Wave both arms and start jumping 👋
💃Dance dance dance‼️
💃Stand still and put left arm behind back
💃Grasp a rose🌹 from behind and show the rose to the audience; raise arm high in the air
💃Bend body slightly and raise arm high in the air🦿
💃Stand straight and raise both arms above head
💃Bend body together with hands
💃Stand up straight again; wave right hand
💃Turn right and walk away from the camera; wave right hand🚶
💃Stop walking; look around
💃Make a heart shape with hands💕
Zhiting Hu • 14,063 views • 1 year ago

🚨Do frontier VLMs (o3, Gemini 2.5, Claude 3.5, Qwen…) actually learn an internal world model🌍? Surprisingly, the answer appears to be a hard NO, as revealed by our WM Atomic Benchmark⚛️. Even o3 struggles with the most basic, atomic-level questions:
❌Confuses triangles📐 with circles⭕️
❌Believes 🟦blue objects move faster than 🟩green ones
❌Fails at compositional and transitive reasoning
While humans perform nearly perfectly, these frontier models often score at chance level‼️ 🔎 More details in the thread below 👇
Zhiting Hu • 12,925 views • 11 months ago