Three complementary, self-contained stacks — a world-model-driven vision-language navigation model, semantics-rich language-driven indoor & outdoor navigation, and high-speed embodied marathon autonomy in the open.
Natural language drives navigation directly — no predefined path or map.
A keyframe history of past observations preserves critical decision points over long horizons.
ODE integration yields smooth, continuous trajectories beyond discrete action spaces.
The world model looks ahead implicitly in latent space for anticipatory, robust decisions.
One model jointly optimizes four complementary tasks with shared knowledge.
Three navigation stacks — a mapless VLN model, mapped semantic navigation, and outdoor long-range autonomy — running on real robots.
Go straight, turn right, turn left at the yellow wall painting, and stop beside the red sofa.
A hierarchical scene graph lets you name any place — the robot localizes it and leads the way.
A humanoid runs a full 21 km urban marathon on its own — start to finish, zero collisions.