15/05/2026
Introducing ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
How can humanoid robots learn rich interactions with the world — without relying on massive task-specific robot datasets?
ExoActor explores a new direction:
using third-person video generation as a unified interface for humanoid control.
Given a task instruction and scene context, ExoActor generates plausible interaction videos that implicitly model:
→ robot behavior
→ object interaction
→ environmental dynamics
→ task intent
These generated videos are then transformed into executable humanoid motions through motion estimation and a general whole-body controller.
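To make the three stages concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not ExoActor's actual code: the `Pipeline` class, the stage names, and the stub functions are all hypothetical. The stubs return toy data in place of a real video model, motion estimator, and whole-body controller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Placeholder types (assumptions for illustration only).
Frame = List[List[float]]   # stands in for one video frame
Pose = List[float]          # stands in for a whole-body joint configuration
Command = List[float]       # stands in for low-level actuator targets


@dataclass
class Pipeline:
    """Hypothetical three-stage pipeline: video -> motion -> control."""
    generate_video: Callable[[str, Frame], List[Frame]]
    estimate_motion: Callable[[Sequence[Frame]], List[Pose]]
    track_pose: Callable[[Pose, Pose], Command]

    def run(self, instruction: str, scene: Frame, initial_pose: Pose) -> List[Command]:
        # Stage 1: generate a third-person video from instruction + scene.
        frames = self.generate_video(instruction, scene)
        # Stage 2: recover a reference pose trajectory from the video.
        reference = self.estimate_motion(frames)
        # Stage 3: let a general controller track each reference pose.
        commands: List[Command] = []
        current = initial_pose
        for target in reference:
            commands.append(self.track_pose(current, target))
            current = target  # toy assumption: the robot reaches each target
        return commands


def stub_generate_video(instruction: str, scene: Frame, num_frames: int = 4) -> List[Frame]:
    # Stub: a real system would call a video generation model here.
    return [scene for _ in range(num_frames)]


def stub_estimate_motion(frames: Sequence[Frame]) -> List[Pose]:
    # Stub: emits a made-up two-joint trajectory, one pose per frame.
    return [[0.1 * i, 0.0] for i, _ in enumerate(frames)]


def stub_track_pose(current: Pose, target: Pose) -> Command:
    # Stub: the "command" is simply the per-joint difference to the target.
    return [t - c for c, t in zip(current, target)]


pipeline = Pipeline(stub_generate_video, stub_estimate_motion, stub_track_pose)
commands = pipeline.run("pick up the cup", [[0.0]], [0.0, 0.0])
print(len(commands))
```

The point of the sketch is the interface: no stage receives robot action labels, so the only task-specific input is the instruction and the scene.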
Instead of directly supervising robot actions, ExoActor leverages the generative prior of large-scale video models to capture interaction-rich behaviors.
The result:
generalizable humanoid behaviors in unseen scenarios — without additional real-world data collection.
ExoActor suggests a scalable path toward interaction-centric humanoid intelligence, where video generation becomes part of the control pipeline itself.