HANDOFF

Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Lizhi Yang1, Junheng Li1, Nehar Poddar2, Yiling Hou1, Gio Huh1,
Robert Griffin2, Georgia Gkioxari1, Aaron D. Ames1

Abstract

The interface between task planning and whole-body control is what makes humanoids deployable, yet existing controllers demand dense kinematic references that planners struggle to produce. HANDOFF is a single whole-body controller built around a compact, explicit 10-D task-space command. It is distilled from three complementary specialists (whole-body motion tracking, locomotion, and fall recovery) into a mixture-of-experts student via context-conditioned multi-teacher KL. On the Unitree G1 it matches state-of-the-art velocity tracking while offering one of the largest robust manipulation workspaces, and it is driven end-to-end by a VLM agentic planner with no task-specific data or controller fine-tuning.

A compact, planner-friendly command space

Instead of a dense full-body kinematic stream, HANDOFF takes one small, explicit 10-D command:

$$ c_t = \bigl[\,v_x,\ v_y,\ \omega_z,\ z,\ p_L^P,\ p_R^P\,\bigr] $$

Here $(v_x, v_y, \omega_z)$ is the planar base velocity, $z$ is the root height, and $p_L^P, p_R^P$ are the left and right wrist targets expressed in the pelvis frame (three coordinates each, so $3 + 1 + 3 + 3 = 10$ dimensions). Each slot matches a planner family, and the same vector composes into coordinated whole-body behavior: for example, a low $z$ with forward wrist targets yields a squat-and-reach.
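As a minimal sketch of how a planner might assemble this command, the snippet below packs the six slots into a flat 10-element vector in the order given above. The class name `TaskCommand`, its field names, the units (assumed to be metres, m/s, and rad/s), and every numeric value are illustrative assumptions, not part of the HANDOFF release.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class TaskCommand:
    """Hypothetical container for the 10-D command c_t (names are illustrative)."""
    v_x: float       # forward base velocity
    v_y: float       # lateral base velocity
    omega_z: float   # yaw rate
    z: float         # root height
    p_left: Vec3     # left wrist target in the pelvis frame
    p_right: Vec3    # right wrist target in the pelvis frame

    def to_vector(self) -> List[float]:
        """Flatten in the order [v_x, v_y, omega_z, z, p_L^P, p_R^P]."""
        return [self.v_x, self.v_y, self.omega_z, self.z,
                *self.p_left, *self.p_right]


# Squat-and-reach: zero base velocity, low root height, wrists forward.
# The numbers are made-up placeholders, not values from the paper.
squat_and_reach = TaskCommand(
    v_x=0.0, v_y=0.0, omega_z=0.0, z=0.45,
    p_left=(0.40, 0.15, 0.10), p_right=(0.40, -0.15, 0.10),
)
vector = squat_and_reach.to_vector()
```

Keeping the command an immutable value object makes it easy for a human operator, a geometric planner, or a VLM to fill in only the slots it knows about and leave the rest at defaults.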

Intuitive: A human, a geometric planner, or a VLM can each produce a valid command.

General: One interface serves different loco-manipulation tasks.

Modular: Planner, perception, and controller decouple and swap independently.

Whole-body expressive: Compact commands still elicit coordinated full-body behavior.

Whole-body motion (29-D): A dense reference the controller must track.

Our 10-D command: The same behavior from a compact, planner-friendly command.

Distilling complementary teachers

System overview: three teachers distilled into a mixture-of-experts student under context-based action-sliced KL.

No single regime gives velocity tracking, whole-body manipulation, and fall recovery at once — so three specialists are distilled into one mixture-of-experts student under context-based action-sliced KL:

  • WBC — 29-DoF motion imitation on retargeted human clips with center-of-pressure correction; arm- and body-slice KL.
  • Locomotion — 15-DoF flat-terrain velocity control under arm perturbations; body-slice KL.
  • Fall-recovery — 29-DoF recovery prior on masked fall-and-recovery clips; full-body KL.
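The slicing idea in the list above can be sketched as a KL loss that is summed only over the action dimensions a given teacher supervises. This is a sketch under several assumptions that the text does not confirm: policies are diagonal Gaussians, the KL direction is teacher-to-student, the 29 joints are ordered as 15 "body" joints followed by 14 "arm" joints, all teacher outputs are padded to 29 dimensions, and the three context names and their teacher assignments in `CONTEXT_TO_TERMS` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

N_DOF = 29
# ASSUMPTION: hypothetical joint layout, 15 body joints then 14 arm joints.
BODY = list(range(0, 15))
ARM = list(range(15, N_DOF))
FULL = list(range(N_DOF))

Policy = Tuple[Sequence[float], Sequence[float]]  # (means, std-devs), 29 each


def gaussian_kl(mu_t: float, sd_t: float, mu_s: float, sd_s: float) -> float:
    """KL( N(mu_t, sd_t^2) || N(mu_s, sd_s^2) ) for one action dimension."""
    return (math.log(sd_s / sd_t)
            + (sd_t ** 2 + (mu_t - mu_s) ** 2) / (2.0 * sd_s ** 2)
            - 0.5)


def sliced_kl(teacher: Policy, student: Policy, idx: List[int]) -> float:
    """Sum the per-dimension KL over only the joints listed in idx."""
    (mu_t, sd_t), (mu_s, sd_s) = teacher, student
    return sum(gaussian_kl(mu_t[i], sd_t[i], mu_s[i], sd_s[i]) for i in idx)


# ASSUMPTION: illustrative mapping from context to (teacher, slice) terms.
CONTEXT_TO_TERMS: Dict[str, List[Tuple[str, List[int]]]] = {
    "manipulation": [("wbc", ARM), ("wbc", BODY)],
    "locomotion": [("wbc", ARM), ("locomotion", BODY)],
    "recovery": [("fall_recovery", FULL)],
}


def distill_loss(context: str, teachers: Dict[str, Policy],
                 student: Policy) -> float:
    """Context picks which teacher supervises which slice of the action."""
    return sum(sliced_kl(teachers[name], student, idx)
               for name, idx in CONTEXT_TO_TERMS[context])
```

With this structure, the locomotion teacher never influences arm joints, which is one plausible reading of "body-slice KL"; the actual context definition and weighting used by the authors may differ.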

A shared encoder and softmax gate blend the three experts into one 29-DoF action, smoothly shifting weight toward locomotion as commanded speed rises — no hard switching between regimes.

All fuse into one policy under the single 10-D interface, with no runtime switching; a new specialist plugs in as one teacher head and one context channel.
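The soft blending described above can be illustrated with a small softmax gate over three expert outputs. In the actual controller the gate logits come from a learned network on shared encoder features; the function `toy_gate_logits` below is a hand-written, hypothetical stand-in that only mimics the described trend of shifting weight toward locomotion as commanded speed rises, and the expert order (WBC, locomotion, fall recovery) is likewise an assumption.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_experts(gate_logits: Sequence[float],
                  expert_actions: Sequence[Sequence[float]]) -> List[float]:
    """Convex combination of expert actions, weighted by the softmax gate."""
    weights = softmax(gate_logits)
    n = len(expert_actions[0])
    return [sum(w * a[j] for w, a in zip(weights, expert_actions))
            for j in range(n)]


def toy_gate_logits(speed: float) -> List[float]:
    """HYPOTHETICAL gate: the locomotion logit grows with commanded speed.

    Order of logits: [wbc, locomotion, fall_recovery].
    """
    return [0.0, 4.0 * speed - 2.0, -2.0]


# Three placeholder 29-DoF expert actions (constant vectors for readability).
experts = [[0.0] * 29, [1.0] * 29, [2.0] * 29]
action_slow = blend_experts(toy_gate_logits(0.0), experts)
action_fast = blend_experts(toy_gate_logits(1.0), experts)
```

Because the weights vary continuously with the gate input, the blended action changes smoothly rather than jumping when the command crosses a threshold.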

An agentic planner

Agentic deployment pipeline: natural-language instruction decomposed into atomic tasks, a VLM emits pelvis-frame waypoints, and a skill selector produces the 10-D command stream.

A natural-language instruction is split into atomic tasks; a VLM emits pelvis-frame waypoints from RGB-D, a tracker converts them into $(v_x, v_y, \omega_z)$, and a skill selector sets $z$ and the wrist targets. The resulting 10-D stream drives the controller at 50 Hz on hardware, and any source that emits it works, whether a classical planner, an agentic planner, or a vision-language-action (VLA) model.
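To make the last two stages concrete, here is a minimal sketch of a waypoint tracker and skill selector that together emit one 10-D command per control tick. The proportional tracking law, the gains and limits, the skill names, and the skill presets are all assumptions chosen for illustration; only the 50 Hz rate and the command layout come from the text.

```python
import math
from typing import Dict, List, Tuple

CONTROL_HZ = 50            # controller rate stated in the text
DT = 1.0 / CONTROL_HZ      # seconds between consecutive commands

Vec3 = Tuple[float, float, float]


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def track_waypoint(wx: float, wy: float, k_lin: float = 1.0,
                   k_yaw: float = 1.5, v_max: float = 0.8,
                   w_max: float = 1.0) -> Tuple[float, float, float]:
    """HYPOTHETICAL proportional tracker: pelvis-frame waypoint -> velocity."""
    heading_error = math.atan2(wy, wx)  # angle to the waypoint, in radians
    v_x = clip(k_lin * wx, -v_max, v_max)
    v_y = clip(k_lin * wy, -v_max, v_max)
    omega_z = clip(k_yaw * heading_error, -w_max, w_max)
    return v_x, v_y, omega_z


# HYPOTHETICAL skill presets: (root height, left wrist, right wrist).
SKILLS: Dict[str, Tuple[float, Vec3, Vec3]] = {
    "walk": (0.75, (0.20, 0.20, 0.00), (0.20, -0.20, 0.00)),
    "squat_reach": (0.45, (0.40, 0.15, 0.10), (0.40, -0.15, 0.10)),
}


def make_command(waypoint: Tuple[float, float], skill: str) -> List[float]:
    """Assemble [v_x, v_y, omega_z, z, p_L^P, p_R^P] for one control tick."""
    v_x, v_y, omega_z = track_waypoint(*waypoint)
    z, p_left, p_right = SKILLS[skill]
    return [v_x, v_y, omega_z, z, *p_left, *p_right]


# A waypoint 2 m straight ahead while walking; called once every DT seconds.
command = make_command((2.0, 0.0), "walk")
```

Since the controller only sees the 10-D vector, swapping this tracker for a different planner requires no change on the controller side.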

Filtering teacher data for feasibility
A control barrier function (CBF) filter keeps each teacher reference feasible: the center of pressure stays inside the support region (filtered, left) instead of leaving the foot (unfiltered, right), so the student never learns infeasible motion.
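The general mechanism of a CBF safety filter can be shown on a deliberately simplified toy problem. The sketch below is not the authors' formulation: it assumes a 1-D center-of-pressure coordinate $x$ with single-integrator dynamics $\dot{x} = u$, a support interval $[-\ell, \ell]$, the barrier $h(x) = \ell^2 - x^2$, and the standard condition $\dot{h} + \alpha h \ge 0$. With one affine constraint, the minimally invasive correction has a closed form. All parameter values are placeholders.

```python
def cbf_filter(x: float, u_ref: float, half_len: float = 0.1,
               alpha: float = 5.0) -> float:
    """Return the velocity closest to u_ref that satisfies the CBF condition.

    Toy model (ASSUMPTION): x_dot = u, h(x) = half_len^2 - x^2.
    The condition h_dot + alpha * h >= 0 becomes a * u + b >= 0.
    """
    h = half_len ** 2 - x ** 2
    a = -2.0 * x          # dh/dx
    b = alpha * h
    if a * u_ref + b >= 0.0:
        return u_ref      # reference is already safe, pass it through
    # Constraint is active, so a != 0 here; project onto a * u + b = 0.
    return -b / a


def rollout(x0: float, u_ref: float, steps: int = 500,
            dt: float = 0.01) -> float:
    """Euler-integrate the filtered toy dynamics and return the final x."""
    x = x0
    for _ in range(steps):
        x += dt * cbf_filter(x, u_ref)
    return x


# A constant push of 1.0 for 5 s would carry an unfiltered x to 5.0;
# the filtered trajectory slows down as it approaches the boundary.
x_final = rollout(0.0, 1.0)
```

The filter leaves the reference untouched while it is safe and only intervenes near the boundary, which is why filtered references stay close to the original motion.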

Task rollouts

One controller, one 10-D interface, many tasks — each driven from a natural-language instruction.

No controller-side change, data collection, or model fine-tuning is required between tasks.

BibTeX

@article{yang2026handoff,
  title   = {HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers},
  author  = {Yang, Lizhi and Li, Junheng and Poddar, Nehar and Hou, Yiling and Huh, Gio and Griffin, Robert and Gkioxari, Georgia and Ames, Aaron D.},
  journal = {arXiv preprint arXiv:2606.06493},
  year    = {2026},
  url     = {https://arxiv.org/abs/2606.06493}
}