record & replay

Every /act request and response from reflex serve can be captured to a JSONL trace file so physical-robot bugs become reproducible on a dev laptop. Traces can be:

  • Replayed against the same model to verify determinism (cos ≈ 1.0)
  • Diffed against a different model to spot regressions
  • Fed to A2C2 as a training corpus
  • Handed to Reflex support as a reproduction artifact

Don’t confuse record/replay with OTel tracing:

| Layer | Purpose | Format | Sink | Use case |
|---|---|---|---|---|
| Record/replay | Bit-exact replay + cosine diff | Custom JSONL schema v1 | Local .jsonl.gz file | “Did model B reproduce model A’s actions on this trace?” |
| OTel tracing | Live observability + debug UI | OTel spans (gen_ai.* + reflex.*) | Phoenix / any OTLP backend | “Why did /act take 800 ms at 14:32 yesterday?” |

Both can run simultaneously. The /act hook emits reflex.record.seq on the OTel span so a single record can be grepped in either ledger by seq.
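To look up a single record by seq on the trace side, a short script is enough. The following is a minimal sketch, assuming only the `kind` and `seq` fields shown in the trace format; the helper names `open_trace` and `find_by_seq` are hypothetical and not part of Reflex.

```python
import gzip
import json
from pathlib import Path
from typing import Optional


def open_trace(path: Path):
    """Open a trace as text, whether gzipped (.jsonl.gz) or plain (.jsonl)."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def find_by_seq(path: Path, seq: int) -> Optional[dict]:
    """Return the first request record whose seq matches, or None."""
    with open_trace(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Header and footer records carry no seq, so filter on kind first.
            if record.get("kind") == "request" and record.get("seq") == seq:
                return record
    return None
```

The seq value found on an OTel span (attribute reflex.record.seq) can be passed straight to `find_by_seq` to pull the matching request and response from the trace file.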

```bash
reflex serve ./export/pi05 --record /var/log/reflex/traces --embodiment franka
```

One file per server session, named <YYYYMMDD>-<HHMMSS>-<model_hash>-<session_id>.jsonl.gz (UTC). Default: gzipped + image hashes only.
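Because the file name encodes the start time, model hash, and session, traces can be indexed without opening them. The sketch below parses the naming pattern described above. It assumes the model hash segment contains no hyphen; `parse_trace_name` and `TraceName` are hypothetical names used for illustration.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# <YYYYMMDD>-<HHMMSS>-<model_hash>-<session_id>.jsonl[.gz]
_TRACE_NAME = re.compile(
    r"^(?P<date>\d{8})-(?P<time>\d{6})-(?P<model_hash>[^-]+)-"
    r"(?P<session_id>.+?)\.jsonl(?P<gz>\.gz)?$"
)


class TraceName(NamedTuple):
    started_at: datetime  # timezone-aware, UTC
    model_hash: str
    session_id: str
    gzipped: bool


def parse_trace_name(filename: str) -> TraceName:
    """Split a trace file name into its components."""
    m = _TRACE_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a trace file name: {filename!r}")
    started = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    return TraceName(
        started_at=started.replace(tzinfo=timezone.utc),
        model_hash=m["model_hash"],
        session_id=m["session_id"],
        gzipped=m["gz"] is not None,
    )
```

Sorting a directory listing by `started_at` then gives sessions in chronological order.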

| Flag | Default | Notes |
|---|---|---|
| `--record <dir>` | disabled | Path that will hold the trace file. Auto-created. |
| `--record-images <mode>` | hash_only | full (base64 JPEG kept; ~40 MB/1k calls gzipped), hash_only (image_sha256 only; ~0.9 MB/1k calls gzipped), none (image field dropped entirely) |
| `--record-no-gzip` | off | Plain .jsonl instead of .jsonl.gz. Useful for grep during dev. |

Size guidance (per 1000 /act calls, pi0.5 decomposed, 50-action chunks, 7-dim, 224×224 RGB)

| Mode | Uncompressed | Gzipped |
|---|---|---|
| full | ~45 MB | ~40 MB |
| hash_only (default) | ~2.8 MB | ~0.9 MB |
| none | ~2.1 MB | ~0.7 MB |

hash_only is sufficient for replay against a fixed image corpus and is small enough for indefinite retention. Full images are only needed when you plan to replay without the original image source.
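For retention planning, the per-1000-call figures scale linearly with call volume. The sketch below does that arithmetic using the approximate numbers from the size guidance; they apply to the stated configuration only (pi0.5 decomposed, 50-action chunks, 7-dim, 224×224 RGB), and `estimate_trace_mb` is a hypothetical helper.

```python
# Approximate MB per 1000 /act calls, copied from the size guidance.
MB_PER_1K_CALLS = {
    "full": {"uncompressed": 45.0, "gzipped": 40.0},
    "hash_only": {"uncompressed": 2.8, "gzipped": 0.9},
    "none": {"uncompressed": 2.1, "gzipped": 0.7},
}


def estimate_trace_mb(mode: str, calls: int, gzipped: bool = True) -> float:
    """Linear estimate of trace size in MB for a given number of /act calls."""
    key = "gzipped" if gzipped else "uncompressed"
    return MB_PER_1K_CALLS[mode][key] * calls / 1000


if __name__ == "__main__":
    calls = 1_000_000  # illustrative volume, not a measured workload
    for mode in MB_PER_1K_CALLS:
        print(f"{mode:>9}: ~{estimate_trace_mb(mode, calls):,.0f} MB gzipped")
```

At one million calls, the estimate is roughly 900 MB for hash_only and roughly 40,000 MB for full, which is why full is better reserved for sessions you expect to replay.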

Image frames can capture people or proprietary workcells. Default hash_only keeps image content from leaving the robot.

  • Traces are filesystem-permissioned; wrap the recording dir in LUKS / ecryptfs if at-rest encryption is required
  • --record-images none drops all image data (even the hash)
  • Instruction text is kept as-is; redact client-side if it carries secrets

If the disk fills mid-session, the recorder catches OSError, logs slug=record-disk-full, and stops writing. The server continues serving /act — recording is never load-bearing for inference.
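The behavior described above, where a write failure disables recording without touching the request path, can be sketched as follows. This illustrates the pattern only and is not Reflex's recorder; the class name `BestEffortRecorder` and its `active` property are assumptions.

```python
import json
import logging
from typing import IO, Optional

log = logging.getLogger("recorder_sketch")


class BestEffortRecorder:
    """Writes JSONL records until the first OSError, then goes quiet."""

    def __init__(self, sink: IO[str]) -> None:
        self._sink: Optional[IO[str]] = sink

    @property
    def active(self) -> bool:
        return self._sink is not None

    def write(self, record: dict) -> None:
        if self._sink is None:
            return  # recording already disabled; never raise into the caller
        try:
            self._sink.write(json.dumps(record) + "\n")
            self._sink.flush()
        except OSError as exc:
            # A full disk surfaces as OSError (ENOSPC). Log once and stop.
            log.error("slug=record-disk-full err=%s", exc)
            self._sink = None
```

The key design choice is that `write` never propagates the error, so the code serving /act does not need its own try/except around recording.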

```bash
reflex replay ./traces/20260424-171305-7a8b3c1d-<session>.jsonl.gz \
  --model ./export/pi05 \
  --diff all
```

Loads the trace, re-invokes predict_from_base64 on the target model for each request, compares the result against the recorded response, and prints one line per request plus a summary footer.

| Flag | Default | Notes |
|---|---|---|
| `trace_file` | required | Path to a .jsonl or .jsonl.gz trace |
| `--model <dir>` | required unless `--no-replay` | Target export directory |
| `--diff <mode>` | actions | actions (cos + max_abs), latency (within ±5%), cache (status match), or all |
| `--n <int>` | 0 (all) | Replay first N records only |
| `--output <path>` | unset | Write machine-readable JSON report (CI-friendly) |
| `--fail-on <mode>` | unset | Exit 3 if any diff of that mode fails |
| `--no-replay` | off | Parse trace, skip model load. For inspecting traces. |

```text
Replay: traces/20260424-171305-7a8b3c1d-c9d4f0ea.jsonl.gz
reflex_version: 0.7.0
model_hash: 7a8b3c1d9f2e4a55
config_hash: e12f44c7b1a93802
model_type: pi0.5
export_kind: decomposed
embodiment: franka
Loading target model: ./export/pi05
Replaying requests (--n=all, --diff=all):
seq= 0 actions: cos=1.000000 max_abs=2.09e-07 [PASS] latency: 98→101ms (+3.1%) [PASS] cache: hit→hit [PASS]
seq= 1 actions: cos=1.000000 max_abs=2.11e-07 [PASS] latency: 103→104ms (+1.0%) [PASS] cache: miss→miss [PASS]
...
Summary:
replayed: 1843
diffed: 1843
actions: 1843/1843 pass (cos≥0.999, max_abs<1e-3)
latency: 1821/1843 pass (within ±5% of recorded total_ms)
cache: 1843/1843 pass (status match)
```
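The pass criteria in the summary (cos≥0.999, max_abs<1e-3, latency within ±5%) can be written out as plain checks. This is a sketch of the comparisons, not the tool's code. It assumes each action chunk has been flattened to a list of floats, and the handling of zero-norm vectors is a choice made here for illustration.

```python
import math
from typing import Sequence

COS_MIN = 0.999            # from the summary footer
MAX_ABS_LIMIT = 1e-3       # from the summary footer
LATENCY_TOLERANCE = 0.05   # within ±5% of recorded total_ms


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        # Assumption: two all-zero vectors count as identical.
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def actions_pass(recorded: Sequence[float], replayed: Sequence[float]) -> bool:
    if len(recorded) != len(replayed):
        return False
    return (cosine(recorded, replayed) >= COS_MIN
            and max_abs_diff(recorded, replayed) < MAX_ABS_LIMIT)


def latency_pass(recorded_ms: float, replayed_ms: float) -> bool:
    return abs(replayed_ms - recorded_ms) <= LATENCY_TOLERANCE * recorded_ms
```

Both action checks are needed: cosine similarity ignores magnitude, so a replayed chunk scaled by a constant factor scores 1.0 on cosine and is only caught by the max_abs bound.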

| Code | Meaning |
|---|---|
| 0 | All diffs passed (or `--no-replay` completed) |
| 1 | Trace file error (missing, malformed, unknown schema version) |
| 2 | Target model load failed |
| 3 | `--fail-on <mode>` triggered |

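In CI, the exit code is usually all a pipeline needs to tell a regression apart from a broken setup. The wrapper below is a sketch. It assumes `reflex` is on PATH and that `--fail-on` accepts the same mode names as `--diff` (for example `actions`), which the flag table suggests but does not state outright. The function names are hypothetical.

```python
import subprocess
from typing import List

# Meanings copied from the exit-code table.
EXIT_MEANINGS = {
    0: "All diffs passed (or --no-replay completed)",
    1: "Trace file error (missing, malformed, unknown schema version)",
    2: "Target model load failed",
    3: "--fail-on <mode> triggered",
}


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def build_replay_cmd(trace: str, model: str, fail_on: str = "actions") -> List[str]:
    """Assemble the replay command line using only documented flags."""
    return ["reflex", "replay", trace,
            "--model", model, "--diff", "all", "--fail-on", fail_on]


def run_replay(trace: str, model: str, fail_on: str = "actions") -> int:
    """Run the replay and report what the exit code means."""
    proc = subprocess.run(build_replay_cmd(trace, model, fail_on), check=False)
    print(describe_exit(proc.returncode))
    return proc.returncode
```

A pipeline might treat code 3 as a failed regression gate and codes 1 and 2 as infrastructure errors worth a retry or an alert.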
  • Images: replay requires --record-images full. hash_only traces can be inspected with --no-replay but can’t be re-invoked through the model.
  • Seeding: there is no --seed override for diffable determinism, so replay runs with whatever RNG state the target produces. Deterministic models (onnx_gpu with fixed inputs) don’t need one; flow-matching models that draw fresh noise on every call do.

```jsonl
{"kind":"header","schema_version":1,"reflex_version":"0.7.0","model_hash":"...","config_hash":"...","session_id":"...","started_at":"...", ...}
{"kind":"request","schema_version":1,"seq":0,"chunk_id":0,"timestamp":"...","request":{...},"response":{...},"latency":{...},"denoise":{...},"mode":"onnx_gpu"}
{"kind":"request","schema_version":1,"seq":1, ...}
...
{"kind":"footer","schema_version":1,"ended_at":"...","total_requests":1843, ...}
```

All records carry schema_version so readers can dispatch across versions. Additive fields don’t bump the version; readers ignore unknown fields.
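A reader that follows these rules dispatches on schema_version, rejects versions it does not know, and silently drops fields it does not recognize. The sketch below shows one way to do that. The field lists include only names visible in the sample records, and `read_records`, `PARSERS`, and `KNOWN_V1` are hypothetical names.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Fields visible in the sample records; the real schema may define more.
KNOWN_V1 = {
    "header": ("kind", "schema_version", "reflex_version", "model_hash",
               "config_hash", "session_id", "started_at"),
    "request": ("kind", "schema_version", "seq", "chunk_id", "timestamp",
                "request", "response", "latency", "denoise", "mode"),
    "footer": ("kind", "schema_version", "ended_at", "total_requests"),
}


def _parse_v1(raw: dict) -> dict:
    """Keep the fields this reader understands; ignore everything else."""
    fields = KNOWN_V1.get(raw.get("kind"), ("kind", "schema_version"))
    return {k: raw[k] for k in fields if k in raw}


# One parser per schema version; a future version gets its own entry.
PARSERS: Dict[int, Callable[[dict], dict]] = {1: _parse_v1}


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed records, failing loudly on an unknown schema_version."""
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        raw = json.loads(line)
        parser = PARSERS.get(raw.get("schema_version"))
        if parser is None:
            raise ValueError(
                f"line {n}: unknown schema_version {raw.get('schema_version')!r}"
            )
        yield parser(raw)
```

Raising on an unknown version mirrors the replay tool's exit code 1, while dropping unknown fields keeps older readers working when additive fields appear.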