a2c2 — chunk correction
reflex serve --a2c2-checkpoint <path> plugs an A2C2 residual MLP onto the action chunk after the policy returns it. Auto-skips when latency is low or success is high — so you pay nothing on the easy cases and only get correction when it’s earning its keep.
Paper: arXiv:2509.23224.
Quick start
Section titled “Quick start”# 1. Train on LeRobot LIBERO data (Modal A100, ~$3-5)modal run scripts/train_a2c2_lerobot.py --output a2c2_lerobot_v1.npz
# 2. Serve with the checkpoint loadedreflex serve ./my-export/ --a2c2-checkpoint ./a2c2_lerobot_v1.npzWhen --a2c2-checkpoint is unset, the hook isn’t loaded; /act behavior is unchanged from baseline. Backward-compatible.
What it does
Section titled “What it does”For each chunk of N actions, the hook computes a per-step correction:
correction[i] = A2C2Head(actions[i], observation, chunk_position=i, latency_estimate_ms)actuated[i] = actions[i] + correction[i]The correction is bounded by the head’s training distribution; it doesn’t replace the policy’s action, it nudges it. Per the paper, this fixes RTC overshoot/undershoot on high-latency Jetson without retraining the base VLA.
Auto-skip semantics
Section titled “Auto-skip semantics”The hook tracks two rolling windows from real /act traffic:
| Tracker | Window | Default threshold | Skip when |
|---|---|---|---|
latency_p95_ms | last 100 acts | < 40 ms | Low latency — correction not needed |
success_rate | last 50 acts | > 90% | Policy is doing fine on its own |
cold_start | first 5 acts | — | Insufficient signal |
A2C2 only fires when both signals favor it: high latency AND low success. Matches the paper’s guidance that A2C2 has positive marginal value only at high-latency low-success regimes.
Response telemetry
Section titled “Response telemetry”{ "actions": [...], "latency_ms": 45.2, "a2c2_applied": true, "a2c2_reason": "applied", "a2c2_correction_magnitude": 0.073}a2c2_reason is a bounded enum: applied | cold_start | low_latency | high_success. a2c2_correction_magnitude is the L2 norm of the residual across the chunk. Use these for observability dashboards.
When the hook is not loaded, the a2c2_* fields are absent.
Metrics
Section titled “Metrics”| Metric | Type | Labels | Fires when |
|---|---|---|---|
reflex_a2c2_applied_total | Counter | reason="applied" | Forward pass ran + correction added |
reflex_a2c2_skipped_total | Counter | reason in {cold_start, low_latency, high_success} | A2C2 skipped |
Track applied / (applied + skipped). If skipped > 95% sustained, the head isn’t earning its keep on this hardware — consider disabling.
Architecture
Section titled “Architecture”The default A2C2Config:
- Action dim: 7 (Franka); customizable via training-time config
- Observation dim: 256
- Chunk size: 50
- Hidden dim: 128, 3 hidden layers, GELU activation
- Positional encoding dim: 32 (sinusoidal, sqrt-scale frequencies)
- Total params: ~72K at FP32, ~280 KB
- Paper-scaled (
hidden_dim=64): ~25K params, ~96 KB — under the 150 KB ceiling
NumPy-only forward pass on the hot path. No torch import in serve runtime; CPU forward is sub-5ms; Orin Nano projection is 1.5–3 ms via SIMD speedup.
Tuning
Section titled “Tuning”| Symptom | Likely cause | Fix |
|---|---|---|
applied counter is always 0 | Latency p95 stays under 40ms (good!) OR success rate stays high (also good) | Nothing to fix — A2C2 is correctly recognizing it isn’t needed |
applied counter dominates | Latency consistently high + success low | Either real distribution shift (A2C2 is earning its keep), or thresholds are too aggressive |
| Correction magnitude > 1.0 sustained | Head trained on a very different distribution | Retrain on traces from THIS deployment via scripts/train_a2c2_customer.py |
| Bad checkpoint path → server crashes | Misconfigured --a2c2-checkpoint | Server handles gracefully — logs ERROR and continues with a2c2_hook=None |
Source paper + ADR
Section titled “Source paper + ADR”- arXiv 2509.23224 (A2C2 paper)
- ADR:
2026-04-29-a2c2-phase1-libero-smoke-modal.md(Phase 1 fix validated, N=10 LIBERO at--inject-latency-ms 100= 8/10, matches OFF baseline exactly)