a2c2 — chunk correction

reflex serve --a2c2-checkpoint <path> plugs an A2C2 residual MLP onto the action chunk after the policy returns it. Auto-skips when latency is low or success is high — so you pay nothing on the easy cases and only get correction when it’s earning its keep.

Paper: arXiv:2509.23224.

```sh
# 1. Train on LeRobot LIBERO data (Modal A100, ~$3-5)
modal run scripts/train_a2c2_lerobot.py --output a2c2_lerobot_v1.npz

# 2. Serve with the checkpoint loaded
reflex serve ./my-export/ --a2c2-checkpoint ./a2c2_lerobot_v1.npz
```

When --a2c2-checkpoint is unset, the hook isn’t loaded; /act behavior is unchanged from baseline. Backward-compatible.

For each chunk of N actions, the hook computes a per-step correction:

```
correction[i] = A2C2Head(actions[i], observation, chunk_position=i, latency_estimate_ms)
actuated[i]   = actions[i] + correction[i]
```

The correction is bounded by the head’s training distribution; it doesn’t replace the policy’s action, it nudges it. Per the paper, this fixes RTC overshoot/undershoot on high-latency Jetson without retraining the base VLA.
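The per-step loop above can be sketched in NumPy. The head callable and argument names here are illustrative, not the actual reflex internals; the toy head stands in for a trained A2C2 checkpoint:

```python
import numpy as np

def apply_chunk_correction(actions, observation, latency_ms, head):
    """Apply a per-step residual correction to a chunk of actions.

    `head` is any callable mapping (action, observation, chunk_position,
    latency_ms) -> correction vector. Names are illustrative.
    """
    actuated = np.empty_like(actions)
    for i, action in enumerate(actions):
        correction = head(action, observation, chunk_position=i, latency_ms=latency_ms)
        actuated[i] = action + correction  # nudge the policy's action, don't replace it
    return actuated

# Toy head: shrink each action slightly toward zero.
toy_head = lambda a, obs, chunk_position, latency_ms: -0.1 * a
chunk = np.ones((50, 7))                                 # chunk of 50 actions, dim 7
out = apply_chunk_correction(chunk, np.zeros(256), 45.2, toy_head)
```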

The hook tracks two rolling windows from real /act traffic:

| Tracker | Window | Default threshold | Skip when |
| --- | --- | --- | --- |
| latency_p95_ms | last 100 acts | < 40 ms | Low latency — correction not needed |
| success_rate | last 50 acts | > 90% | Policy is doing fine on its own |
| cold_start | first 5 acts | n/a | Insufficient signal |

A2C2 only fires when both signals favor it: high latency AND low success. This matches the paper's guidance that A2C2 has positive marginal value only in high-latency, low-success regimes.
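The gating described above can be sketched with two rolling windows; the class name, method signatures, and p95 computation here are illustrative, with thresholds taken from the table:

```python
from collections import deque

class A2C2Gate:
    """Sketch of the skip logic: apply correction only when latency is high
    AND success is low. Not the actual reflex internals."""

    def __init__(self, latency_threshold_ms=40.0, success_threshold=0.90,
                 cold_start_acts=5):
        self.latency_threshold_ms = latency_threshold_ms
        self.success_threshold = success_threshold
        self.cold_start_acts = cold_start_acts
        self.latencies = deque(maxlen=100)   # rolling latency window (last 100 acts)
        self.successes = deque(maxlen=50)    # rolling success window (last 50 acts)
        self.acts_seen = 0

    def record(self, latency_ms, success):
        self.acts_seen += 1
        self.latencies.append(latency_ms)
        self.successes.append(1.0 if success else 0.0)

    def decide(self):
        if self.acts_seen < self.cold_start_acts:
            return "cold_start"                              # insufficient signal
        p95 = sorted(self.latencies)[int(0.95 * (len(self.latencies) - 1))]
        if p95 < self.latency_threshold_ms:
            return "low_latency"                             # correction not needed
        if sum(self.successes) / len(self.successes) > self.success_threshold:
            return "high_success"                            # policy doing fine
        return "applied"                                     # high latency AND low success
```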

```json
{
  "actions": [...],
  "latency_ms": 45.2,
  "a2c2_applied": true,
  "a2c2_reason": "applied",
  "a2c2_correction_magnitude": 0.073
}
```

a2c2_reason is a bounded enum: applied | cold_start | low_latency | high_success. a2c2_correction_magnitude is the L2 norm of the residual across the chunk. Use these for observability dashboards.

When the hook is not loaded, the a2c2_* fields are absent.
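Because the fields are absent (not null) when the hook is off, dashboard or logging code should read them defensively. A minimal sketch (function name is illustrative):

```python
def summarize_act_response(resp: dict) -> str:
    """Summarize an /act response for logging; tolerates a server
    running without the A2C2 hook (a2c2_* fields absent entirely)."""
    if "a2c2_applied" not in resp:
        return f"latency={resp['latency_ms']:.1f}ms a2c2=off"
    reason = resp["a2c2_reason"]  # applied | cold_start | low_latency | high_success
    mag = resp.get("a2c2_correction_magnitude", 0.0)
    return f"latency={resp['latency_ms']:.1f}ms a2c2={reason} |r|={mag:.3f}"
```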

| Metric | Type | Labels | Fires when |
| --- | --- | --- | --- |
| reflex_a2c2_applied_total | Counter | reason="applied" | Forward pass ran + correction added |
| reflex_a2c2_skipped_total | Counter | reason in {cold_start, low_latency, high_success} | A2C2 skipped |

Track applied / (applied + skipped). If skipped > 95% sustained, the head isn’t earning its keep on this hardware — consider disabling.
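The check above is a few lines of arithmetic; in practice you would pull the two counters from Prometheus, and the 1000-act floor here is an illustrative assumption to avoid reacting to a short window:

```python
def a2c2_apply_ratio(applied_total: int, skipped_total: int) -> float:
    """Fraction of acts where the correction actually fired."""
    total = applied_total + skipped_total
    return applied_total / total if total else 0.0

def should_consider_disabling(applied_total: int, skipped_total: int,
                              min_acts: int = 1000) -> bool:
    """Heuristic from the guidance above: sustained skip rate above 95%,
    after enough traffic to trust the signal."""
    total = applied_total + skipped_total
    return total >= min_acts and a2c2_apply_ratio(applied_total, skipped_total) < 0.05
```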

The default A2C2Config:

  • Action dim: 7 (Franka); customizable via training-time config
  • Observation dim: 256
  • Chunk size: 50
  • Hidden dim: 128, 3 hidden layers, GELU activation
  • Positional encoding dim: 32 (sinusoidal, sqrt-scale frequencies)
  • Total params: ~72K at FP32, ~280 KB
  • Paper-scaled (hidden_dim=64): ~25K params, ~96 KB — under the 150 KB ceiling
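The ~72K / ~280 KB figures can be sanity-checked from the dimensions above. This sketch assumes the input vector is the concatenation of action (7) + observation (256) + positional encoding (32) + a latency scalar (1) = 296, which is an interpretation of the config, not a confirmed layout:

```python
def mlp_param_count(input_dim, hidden_dim, n_hidden_layers, output_dim):
    """Weights + biases for a plain fully connected MLP."""
    params = input_dim * hidden_dim + hidden_dim                     # input -> hidden 1
    params += (n_hidden_layers - 1) * (hidden_dim * hidden_dim + hidden_dim)
    params += hidden_dim * output_dim + output_dim                   # hidden -> output
    return params

# Default config: 296 -> 128 -> 128 -> 128 -> 7
n = mlp_param_count(input_dim=296, hidden_dim=128, n_hidden_layers=3, output_dim=7)
size_kb = n * 4 / 1024  # FP32, 4 bytes per param
```

This lands at roughly 72K parameters and ~281 KB, consistent with the figures above.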

The hot-path forward pass is NumPy-only; there is no torch import in the serve runtime. CPU forward is sub-5 ms, and the projected time on Orin Nano is 1.5–3 ms with SIMD.
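A NumPy-only forward pass for this architecture fits in a few lines. This is a sketch, not the actual reflex code: the tanh-approximate GELU and the exact frequency schedule of the sinusoidal encoding are assumptions (sqrt-scale frequencies as named in the config above):

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU, NumPy-only (no torch on the hot path)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def positional_encoding(i, dim=32):
    """Sinusoidal encoding of chunk position i; sqrt-scale frequencies
    are an interpretation of the config, not a confirmed formula."""
    freqs = 1.0 / np.sqrt(np.arange(1, dim // 2 + 1))
    return np.concatenate([np.sin(i * freqs), np.cos(i * freqs)])

def a2c2_forward(x, weights, biases):
    """Plain MLP forward: GELU on hidden layers, linear output."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = gelu(x @ W + b)
    return x @ weights[-1] + biases[-1]
```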

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| applied counter is always 0 | Latency p95 stays under 40 ms (good!) OR success rate stays high (also good) | Nothing to fix — A2C2 is correctly recognizing it isn't needed |
| applied counter dominates | Latency consistently high + success low | Either real distribution shift (A2C2 is earning its keep), or thresholds are too aggressive |
| Correction magnitude > 1.0 sustained | Head trained on a very different distribution | Retrain on traces from THIS deployment via scripts/train_a2c2_customer.py |
| Bad checkpoint path | Misconfigured --a2c2-checkpoint | Nothing — the server handles this gracefully, logging ERROR and continuing with a2c2_hook=None (no crash) |
  • arXiv 2509.23224 (A2C2 paper)
  • ADR: 2026-04-29-a2c2-phase1-libero-smoke-modal.md (Phase 1 fix validated, N=10 LIBERO at --inject-latency-ms 100 = 8/10, matches OFF baseline exactly)