record & replay
Every /act request and response from reflex serve can be captured to a JSONL trace file so physical-robot bugs become reproducible on a dev laptop. Traces can be:
- Replayed against the same model to verify determinism (cos ≈ 1.0)
- Diffed against a different model to spot regression
- Fed to A2C2 as a training corpus
- Handed to Reflex support as a reproduction artifact
Record vs OTel — two layers, two jobs
Don’t confuse them:
| Layer | Purpose | Format | Sink | Use case |
|---|---|---|---|---|
| Record/replay | Bit-exact replay + cosine diff | Custom JSONL schema v1 | Local .jsonl.gz file | “Did model B reproduce model A’s actions on this trace?” |
| OTel tracing | Live observability + debug UI | OTel spans (gen_ai.* + reflex.*) | Phoenix / any OTLP backend | “Why did /act take 800 ms at 14:32 yesterday?” |
Both can run simultaneously. The /act hook emits reflex.record.seq on the OTel span so a single record can be grepped in either ledger by seq.
Record

    reflex serve ./export/pi05 --record /var/log/reflex/traces --embodiment franka

One file per server session, named <YYYYMMDD>-<HHMMSS>-<model_hash>-<session_id>.jsonl.gz (UTC). Default: gzipped + image hashes only.
| Flag | Default | Notes |
|---|---|---|
| --record <dir> | disabled | Path that will hold the trace file. Auto-created. |
| --record-images <mode> | hash_only | full (base64 JPEG kept; ~40 MB/1k calls gzipped), hash_only (image_sha256 only; ~0.9 MB/1k calls gzipped), none (image field dropped entirely) |
| --record-no-gzip | off | Plain .jsonl instead of .jsonl.gz. Useful for grep during dev. |
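
The naming convention above is regular enough to parse when sorting or indexing a directory of traces. A minimal sketch follows; `parse_trace_name` and `TraceName` are hypothetical helpers, not part of Reflex, and the code assumes the model hash segment contains no hyphen.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import NamedTuple


class TraceName(NamedTuple):
    started_at: datetime  # UTC, per the naming convention
    model_hash: str
    session_id: str
    gzipped: bool


def parse_trace_name(path: str) -> TraceName:
    """Split <YYYYMMDD>-<HHMMSS>-<model_hash>-<session_id>.jsonl[.gz]."""
    name = Path(path).name
    if name.endswith(".jsonl.gz"):
        stem, gzipped = name[: -len(".jsonl.gz")], True
    elif name.endswith(".jsonl"):
        stem, gzipped = name[: -len(".jsonl")], False
    else:
        raise ValueError(f"not a trace file: {name}")
    # maxsplit=3 keeps any hyphens inside session_id intact
    # (assumption: model_hash itself contains no hyphen).
    date, time, model_hash, session_id = stem.split("-", 3)
    started_at = datetime.strptime(date + time, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return TraceName(started_at, model_hash, session_id, gzipped)
```

Sorting a list of trace paths by `parse_trace_name(p).started_at` then gives session order regardless of which model produced each file.
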
Size guidance (per 1000 /act calls, pi0.5 decomposed, 50-action chunks, 7-dim, 224×224 RGB)
| Mode | Uncompressed | Gzipped |
|---|---|---|
| full | ~45 MB | ~40 MB |
| hash_only (default) | ~2.8 MB | ~0.9 MB |
| none | ~2.1 MB | ~0.7 MB |
hash_only is sufficient for replay against a fixed image corpus and is small enough for indefinite retention. Full images are only needed when you plan to replay without the original image source.
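
To turn the per-1000-call figures into a retention budget, multiply by the expected call volume. The sketch below uses the gzipped column from the table above; the call rate and shift length are made-up illustration values, and `gzipped_mb_per_day` is a hypothetical helper.

```python
# Gzipped MB per 1000 /act calls, copied from the size table above.
GZIPPED_MB_PER_1K = {"full": 40.0, "hash_only": 0.9, "none": 0.7}


def gzipped_mb_per_day(
    mode: str, calls_per_second: float, hours_per_day: float = 24.0
) -> float:
    """Estimate daily gzipped trace volume in MB for one robot."""
    calls = calls_per_second * 3600.0 * hours_per_day
    return GZIPPED_MB_PER_1K[mode] * calls / 1000.0


if __name__ == "__main__":
    # 2 calls/s over an 8-hour shift is an assumed workload, not a measurement.
    for mode in GZIPPED_MB_PER_1K:
        print(f"{mode:>9}: {gzipped_mb_per_day(mode, 2.0, 8.0):,.1f} MB/day")
```

Under that assumed workload (57,600 calls per day), full mode comes to roughly 2.3 GB/day, while hash_only stays near 52 MB/day.
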
Privacy & compliance
Image frames can capture people or proprietary workcells. Default hash_only keeps image content from leaving the robot.
- Traces are filesystem-permissioned; wrap the recording dir in LUKS / ecryptfs if at-rest encryption is required
- --record-images none drops all image data (even the hash)
- Instruction text is kept as-is; redact client-side if it carries secrets
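
Because instruction text is recorded verbatim, any redaction has to happen before the request reaches /act. One possible client-side approach is sketched below; the patterns and the `redact_instruction` helper are illustrative assumptions, not something Reflex provides, and real deployments will need their own rules.

```python
import re

# Hypothetical patterns; adapt to whatever counts as a secret in your deployment.
_SECRET_PATTERNS = [
    # key=value style credentials, e.g. "token=abc123"
    re.compile(r"\b(api[_-]?key|token|password)\s*[:=]\s*\S+", re.IGNORECASE),
    # email addresses
    re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
]


def redact_instruction(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace secret-looking substrings before the instruction is sent."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Calling `redact_instruction` on the instruction string just before building the request keeps the secret out of both the trace file and the server logs.
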
Degraded mode
If the disk fills mid-session, the recorder catches OSError, logs slug=record-disk-full, and stops writing. The server continues serving /act; recording is never load-bearing for inference.
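
The fail-open behaviour described here can be illustrated with a small best-effort writer. This is a sketch of the general pattern, not Reflex's actual recorder; the `BestEffortRecorder` class is hypothetical, and only the OSError handling and the slug come from the description above.

```python
import json
import logging

log = logging.getLogger("recorder")


class BestEffortRecorder:
    """Appends JSON lines and disables itself on the first OSError."""

    def __init__(self, fileobj):
        self._file = fileobj
        self.enabled = True

    def write(self, record: dict) -> bool:
        """Return True if the record was written, False if recording is off."""
        if not self.enabled:
            return False
        try:
            self._file.write(json.dumps(record) + "\n")
            self._file.flush()
        except OSError as exc:
            # Disk-full (ENOSPC) and other I/O failures land here. The
            # exception is swallowed so the caller's request path is unaffected.
            log.error("slug=record-disk-full error=%s", exc)
            self.enabled = False
            return False
        return True
```

The caller never needs a try/except around `write`: a failed write only flips `enabled` off, so inference continues without recording.
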
Replay

    reflex replay ./traces/20260424-171305-7a8b3c1d-<session>.jsonl.gz \
      --model ./export/pi05 \
      --diff all

Loads the trace, re-invokes predict_from_base64 on the target model for each request, compares against the recorded response, prints a per-request line + summary footer.
| Flag | Default | Notes |
|---|---|---|
| trace_file | required | Path to a .jsonl or .jsonl.gz trace |
| --model <dir> | required unless --no-replay | Target export directory |
| --diff <mode> | actions | actions (cos + max_abs), latency (within ±5%), cache (status match), or all |
| --n <int> | 0 (all) | Replay first N records only |
| --output <path> | unset | Write machine-readable JSON report (CI-friendly) |
| --fail-on <mode> | unset | Exit 3 if any diff of that mode fails |
| --no-replay | off | Parse trace, skip model load. For inspecting traces. |
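
The actions and latency checks are simple enough to reproduce when post-processing a report. The sketch below shows one way they might be computed over flattened action chunks. The thresholds (cos ≥ 0.999, max_abs < 1e-3, ±5%) are the ones printed in the example summary on this page; the function names and the flattening convention are assumptions, not Reflex internals.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length flat vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        # Convention chosen here: two all-zero vectors count as identical.
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)


def max_abs(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def actions_pass(recorded, replayed, cos_min=0.999, max_abs_tol=1e-3) -> bool:
    """Both criteria must hold; a length mismatch is a failure."""
    if len(recorded) != len(replayed):
        return False
    return (
        cosine(recorded, replayed) >= cos_min
        and max_abs(recorded, replayed) < max_abs_tol
    )


def latency_pass(recorded_ms: float, replayed_ms: float, tol: float = 0.05) -> bool:
    """True when replayed latency is within ±tol of the recorded value."""
    return abs(replayed_ms - recorded_ms) <= tol * recorded_ms
```

A 50×7 action chunk would be flattened to a 350-element list before calling `actions_pass`. Checking both metrics matters: cosine alone ignores a uniform scale error, while max_abs alone can pass a small-magnitude chunk pointing in the wrong direction.
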
Example output

    Replay: traces/20260424-171305-7a8b3c1d-c9d4f0ea.jsonl.gz
      reflex_version: 0.7.0
      model_hash: 7a8b3c1d9f2e4a55
      config_hash: e12f44c7b1a93802
      model_type: pi0.5
      export_kind: decomposed
      embodiment: franka

    Loading target model: ./export/pi05

    Replaying requests (--n=all, --diff=all):
      seq= 0 actions: cos=1.000000 max_abs=2.09e-07 [PASS] latency: 98→101ms (+3.1%) [PASS] cache: hit→hit [PASS]
      seq= 1 actions: cos=1.000000 max_abs=2.11e-07 [PASS] latency: 103→104ms (+1.0%) [PASS] cache: miss→miss [PASS]
      ...

    Summary:
      replayed: 1843
      diffed: 1843
      actions: 1843/1843 pass (cos≥0.999, max_abs<1e-3)
      latency: 1821/1843 pass (within ±5% of recorded total_ms)
      cache: 1843/1843 pass (status match)

Exit codes
| Code | Meaning |
|---|---|
| 0 | All diffs passed (or --no-replay completed) |
| 1 | Trace file error (missing, malformed, unknown schema version) |
| 2 | Target model load failed |
| 3 | --fail-on <mode> triggered |
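
In CI it can help to translate the exit code into a readable message. The wrapper below is a sketch that uses only the flags and exit codes documented on this page; `build_replay_argv`, `describe_exit`, and `run_replay` are hypothetical helper names, and `run_replay` assumes the reflex executable is on PATH.

```python
import subprocess
from typing import List, Optional

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "all diffs passed (or --no-replay completed)",
    1: "trace file error (missing, malformed, unknown schema version)",
    2: "target model load failed",
    3: "--fail-on <mode> triggered",
}


def build_replay_argv(
    trace: str,
    model: str,
    diff: str = "all",
    fail_on: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble a reflex replay command line from documented flags."""
    argv = ["reflex", "replay", trace, "--model", model, "--diff", diff]
    if fail_on:
        argv += ["--fail-on", fail_on]
    if output:
        argv += ["--output", output]
    return argv


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_replay(argv: List[str]) -> int:
    """Run the CLI and print what its exit code means."""
    code = subprocess.run(argv).returncode
    print(f"reflex replay exited {code}: {describe_exit(code)}")
    return code
```

A CI step could call `run_replay(build_replay_argv(trace, model, fail_on="actions"))` and pass the returned code to `sys.exit`, so the job fails exactly when the replay does.
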
Known limitations
- Images: replay requires --record-images full. hash_only traces can be inspected with --no-replay but can’t be re-invoked through the model.
- No --seed override for diffable determinism: replay runs with whatever RNG state the target produces. Deterministic models (onnx_gpu with fixed inputs) don’t need this; flow-matching with fresh noise per call does.
JSONL format (schema v1)

    {"kind":"header","schema_version":1,"reflex_version":"0.7.0","model_hash":"...","config_hash":"...","session_id":"...","started_at":"...", ...}
    {"kind":"request","schema_version":1,"seq":0,"chunk_id":0,"timestamp":"...","request":{...},"response":{...},"latency":{...},"denoise":{...},"mode":"onnx_gpu"}
    {"kind":"request","schema_version":1,"seq":1, ...}
    ...
    {"kind":"footer","schema_version":1,"ended_at":"...","total_requests":1843, ...}

All records carry schema_version so readers can dispatch across versions. Additive fields don’t bump the version; readers ignore unknown fields.
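
A reader for this layout needs little more than gzip and json from the standard library. The sketch below checks schema_version on every record and dispatches on kind; `iter_trace` and `summarize` are hypothetical helpers, and skipping unknown record kinds is an assumption made by analogy with the ignore-unknown-fields rule.

```python
import gzip
import json
from typing import Iterator

SUPPORTED_SCHEMAS = {1}


def iter_trace(path: str) -> Iterator[dict]:
    """Yield one dict per record from a .jsonl or .jsonl.gz trace."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            version = record.get("schema_version")
            if version not in SUPPORTED_SCHEMAS:
                raise ValueError(
                    f"line {line_no}: unsupported schema_version {version!r}"
                )
            yield record


def summarize(path: str) -> dict:
    """Collect the header, the footer, and a count of request records."""
    header, footer, requests = None, None, 0
    for record in iter_trace(path):
        kind = record.get("kind")
        if kind == "header":
            header = record
        elif kind == "request":
            requests += 1
        elif kind == "footer":
            footer = record
        # Any other kind is skipped (assumption, see lead-in).
    return {"header": header, "footer": footer, "requests": requests}
```

Comparing `summarize(path)["requests"]` with the footer's total_requests is a quick way to detect a trace that was cut short; a missing footer suggests the session ended abnormally.
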