a2c2 — chunk correction

reflex serve --a2c2-checkpoint <path> plugs an A2C2 residual MLP onto the action chunk after the policy returns it. Auto-skips when latency is low or success is high — so you pay nothing on the easy cases and only get correction when it’s earning its keep.

Paper: arXiv:2509.23224.

```sh
# 1. Train on LeRobot LIBERO data (Modal A100, ~$3-5)
modal run scripts/train_a2c2_lerobot.py --output a2c2_lerobot_v1.npz

# 2. Serve with the checkpoint loaded
reflex serve ./my-export/ --a2c2-checkpoint ./a2c2_lerobot_v1.npz
```

When --a2c2-checkpoint is unset, the hook isn’t loaded; /act behavior is unchanged from baseline. Backward-compatible.

For each chunk of N actions, the hook computes a per-step correction:

```
correction[i] = A2C2Head(actions[i], observation, chunk_position=i, latency_estimate_ms)
actuated[i]   = actions[i] + correction[i]
```

The correction is bounded by the head’s training distribution; it doesn’t replace the policy’s action, it nudges it. Per the paper, this fixes RTC overshoot/undershoot on high-latency Jetson without retraining the base VLA.
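The per-step loop above can be sketched in NumPy. The head callable and argument names here are illustrative, not the actual reflex internals; the toy head stands in for a trained A2C2 checkpoint:

```python
import numpy as np

def apply_chunk_correction(actions, observation, latency_ms, head):
    """Apply a per-step residual correction to a chunk of actions.

    `head` is any callable mapping (action, observation, chunk_position,
    latency_ms) -> correction vector. Names are illustrative.
    """
    actuated = np.empty_like(actions)
    for i, action in enumerate(actions):
        correction = head(action, observation, chunk_position=i, latency_ms=latency_ms)
        actuated[i] = action + correction  # nudge the policy's action, don't replace it
    return actuated

# Toy head: shrink each action slightly toward zero.
toy_head = lambda a, obs, chunk_position, latency_ms: -0.1 * a
chunk = np.ones((50, 7))                                 # chunk of 50 actions, dim 7
out = apply_chunk_correction(chunk, np.zeros(256), 45.2, toy_head)
```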

The hook tracks two rolling windows from real /act traffic:

| Tracker | Window | Default threshold | Skip when |
| --- | --- | --- | --- |
| latency_p95_ms | last 100 acts | < 40 ms | Low latency — correction not needed |
| success_rate | last 50 acts | > 90% | Policy is doing fine on its own |
| cold_start | first 5 acts | n/a | Insufficient signal |

A2C2 only fires when both signals favor it: high latency AND low success. This matches the paper's guidance that A2C2 has positive marginal value only in high-latency, low-success regimes.
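The gating described above can be sketched with two rolling windows; the class name, method signatures, and p95 computation here are illustrative, with thresholds taken from the table:

```python
from collections import deque

class A2C2Gate:
    """Sketch of the skip logic: apply correction only when latency is high
    AND success is low. Not the actual reflex internals."""

    def __init__(self, latency_threshold_ms=40.0, success_threshold=0.90,
                 cold_start_acts=5):
        self.latency_threshold_ms = latency_threshold_ms
        self.success_threshold = success_threshold
        self.cold_start_acts = cold_start_acts
        self.latencies = deque(maxlen=100)   # rolling latency window (last 100 acts)
        self.successes = deque(maxlen=50)    # rolling success window (last 50 acts)
        self.acts_seen = 0

    def record(self, latency_ms, success):
        self.acts_seen += 1
        self.latencies.append(latency_ms)
        self.successes.append(1.0 if success else 0.0)

    def decide(self):
        if self.acts_seen < self.cold_start_acts:
            return "cold_start"                              # insufficient signal
        p95 = sorted(self.latencies)[int(0.95 * (len(self.latencies) - 1))]
        if p95 < self.latency_threshold_ms:
            return "low_latency"                             # correction not needed
        if sum(self.successes) / len(self.successes) > self.success_threshold:
            return "high_success"                            # policy doing fine
        return "applied"                                     # high latency AND low success
```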

```json
{
  "actions": [...],
  "latency_ms": 45.2,
  "a2c2_applied": true,
  "a2c2_reason": "applied",
  "a2c2_correction_magnitude": 0.073
}
```

a2c2_reason is a bounded enum: applied | cold_start | low_latency | high_success. a2c2_correction_magnitude is the L2 norm of the residual across the chunk. Use these for observability dashboards.

When the hook is not loaded, the a2c2_* fields are absent.
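Because the fields are absent (not null) when the hook is off, dashboard or logging code should read them defensively. A minimal sketch (function name is illustrative):

```python
def summarize_act_response(resp: dict) -> str:
    """Summarize an /act response for logging; tolerates a server
    running without the A2C2 hook (a2c2_* fields absent entirely)."""
    if "a2c2_applied" not in resp:
        return f"latency={resp['latency_ms']:.1f}ms a2c2=off"
    reason = resp["a2c2_reason"]  # applied | cold_start | low_latency | high_success
    mag = resp.get("a2c2_correction_magnitude", 0.0)
    return f"latency={resp['latency_ms']:.1f}ms a2c2={reason} |r|={mag:.3f}"
```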

| Metric | Type | Labels | Fires when |
| --- | --- | --- | --- |
| reflex_a2c2_applied_total | Counter | reason="applied" | Forward pass ran + correction added |
| reflex_a2c2_skipped_total | Counter | reason in {cold_start, low_latency, high_success} | A2C2 skipped |

Track applied / (applied + skipped). If skipped > 95% sustained, the head isn’t earning its keep on this hardware — consider disabling.
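The check above is a few lines of arithmetic; in practice you would pull the two counters from Prometheus, and the 1000-act floor here is an illustrative assumption to avoid reacting to a short window:

```python
def a2c2_apply_ratio(applied_total: int, skipped_total: int) -> float:
    """Fraction of acts where the correction actually fired."""
    total = applied_total + skipped_total
    return applied_total / total if total else 0.0

def should_consider_disabling(applied_total: int, skipped_total: int,
                              min_acts: int = 1000) -> bool:
    """Heuristic from the guidance above: sustained skip rate above 95%,
    after enough traffic to trust the signal."""
    total = applied_total + skipped_total
    return total >= min_acts and a2c2_apply_ratio(applied_total, skipped_total) < 0.05
```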

The default A2C2Config:

  • Action dim: 7 (Franka); customizable via training-time config
  • Observation dim: 256
  • Chunk size: 50
  • Hidden dim: 128, 3 hidden layers, GELU activation
  • Positional encoding dim: 32 (sinusoidal, sqrt-scale frequencies)
  • Total params: ~72K at FP32, ~280 KB
  • Paper-scaled (hidden_dim=64): ~25K params, ~96 KB — under the 150 KB ceiling
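The ~72K / ~280 KB figures can be sanity-checked from the dimensions above. This sketch assumes the input vector is the concatenation of action (7) + observation (256) + positional encoding (32) + a latency scalar (1) = 296, which is an interpretation of the config, not a confirmed layout:

```python
def mlp_param_count(input_dim, hidden_dim, n_hidden_layers, output_dim):
    """Weights + biases for a plain fully connected MLP."""
    params = input_dim * hidden_dim + hidden_dim                     # input -> hidden 1
    params += (n_hidden_layers - 1) * (hidden_dim * hidden_dim + hidden_dim)
    params += hidden_dim * output_dim + output_dim                   # hidden -> output
    return params

# Default config: 296 -> 128 -> 128 -> 128 -> 7
n = mlp_param_count(input_dim=296, hidden_dim=128, n_hidden_layers=3, output_dim=7)
size_kb = n * 4 / 1024  # FP32, 4 bytes per param
```

This lands at roughly 72K parameters and ~281 KB, consistent with the figures above.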

The hot-path forward pass is NumPy-only; there is no torch import in the serve runtime. CPU forward is sub-5 ms, and the projected time on Orin Nano is 1.5–3 ms with SIMD.
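A NumPy-only forward pass for this architecture fits in a few lines. This is a sketch, not the actual reflex code: the tanh-approximate GELU and the exact frequency schedule of the sinusoidal encoding are assumptions (sqrt-scale frequencies as named in the config above):

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU, NumPy-only (no torch on the hot path)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def positional_encoding(i, dim=32):
    """Sinusoidal encoding of chunk position i; sqrt-scale frequencies
    are an interpretation of the config, not a confirmed formula."""
    freqs = 1.0 / np.sqrt(np.arange(1, dim // 2 + 1))
    return np.concatenate([np.sin(i * freqs), np.cos(i * freqs)])

def a2c2_forward(x, weights, biases):
    """Plain MLP forward: GELU on hidden layers, linear output."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = gelu(x @ W + b)
    return x @ weights[-1] + biases[-1]
```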

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| applied counter is always 0 | Latency p95 stays under 40 ms (good!) OR success rate stays high (also good) | Nothing to fix — A2C2 is correctly recognizing it isn't needed |
| applied counter dominates | Latency consistently high + success low | Either real distribution shift (A2C2 is earning its keep), or thresholds are too aggressive |
| Correction magnitude > 1.0 sustained | Head trained on a very different distribution | Retrain on traces from THIS deployment via scripts/train_a2c2_customer.py |
| Bad checkpoint path | Misconfigured --a2c2-checkpoint | Nothing — the server handles this gracefully, logging ERROR and continuing with a2c2_hook=None (no crash) |
  • arXiv 2509.23224 (A2C2 paper)
  • ADR: 2026-04-29-a2c2-phase1-libero-smoke-modal.md (Phase 1 fix validated, N=10 LIBERO at --inject-latency-ms 100 = 8/10, matches OFF baseline exactly)