Record & replay

Every /act request and response from reflex serve can be captured to a JSONL trace file so physical-robot bugs become reproducible on a dev laptop. Traces can be:

Replayed against the same model to verify determinism (cos ≈ 1.0)
Diffed against a different model to spot regressions
Fed to A2C2 as a training corpus
Handed to Reflex support as a reproduction artifact

Record vs OTel — two layers, two jobs

Don’t confuse them:

Layer	Purpose	Format	Sink	Use case
Record/replay	Bit-exact replay + cosine diff	Custom JSONL schema v1	Local `.jsonl.gz` file	“Did model B reproduce model A’s actions on this trace?”
OTel tracing	Live observability + debug UI	OTel spans (`gen_ai.` + `reflex.`)	Phoenix / any OTLP backend	“Why did /act take 800 ms at 14:32 yesterday?”

Both can run simultaneously. The /act hook emits reflex.record.seq on the OTel span so a single record can be grepped in either ledger by seq.

Record

reflex serve ./export/pi05 --record /var/log/reflex/traces --embodiment franka

Each server session writes one file, named <YYYYMMDD>-<HHMMSS>-<model_hash>-<session_id>.jsonl.gz, with the timestamp in UTC. By default the file is gzipped and stores image hashes only.
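When sorting or filtering a directory of traces, it can help to split the file name back into its parts. The following is a minimal sketch, not part of Reflex: `TraceName` and `parse_trace_name` are hypothetical helpers, and the code assumes the model hash itself contains no hyphen.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import NamedTuple


class TraceName(NamedTuple):
    started_at: datetime  # UTC, per the naming scheme
    model_hash: str
    session_id: str
    gzipped: bool


def parse_trace_name(path: str) -> TraceName:
    """Split '<YYYYMMDD>-<HHMMSS>-<model_hash>-<session_id>.jsonl[.gz]'."""
    name = Path(path).name
    gzipped = name.endswith(".jsonl.gz")
    if gzipped:
        stem = name[: -len(".jsonl.gz")]
    elif name.endswith(".jsonl"):
        stem = name[: -len(".jsonl")]
    else:
        raise ValueError(f"not a trace file name: {name!r}")

    # maxsplit=3 keeps any hyphens inside the session id intact
    # (assumption: the model hash has none).
    parts = stem.split("-", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected trace file name: {name!r}")
    date, time, model_hash, session_id = parts

    started_at = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return TraceName(started_at.replace(tzinfo=timezone.utc),
                     model_hash, session_id, gzipped)


info = parse_trace_name("traces/20260424-171305-7a8b3c1d-c9d4f0ea.jsonl.gz")
print(info.started_at.isoformat(), info.model_hash, info.session_id)
```

Attaching `timezone.utc` explicitly avoids the parsed timestamp being mistaken for local time.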

Flags

Flag	Default	Notes
`--record <dir>`	disabled	Directory that will hold the trace files. Created if missing.
`--record-images <mode>`	`hash_only`	`full` (base64 JPEG kept; ~40 MB/1k calls gzipped), `hash_only` (image_sha256 only; ~0.9 MB/1k calls gzipped), `none` (image field dropped entirely)
`--record-no-gzip`	off	Plain `.jsonl` instead of `.jsonl.gz`. Useful for `grep` during dev.

Size guidance (per 1000 /act calls, pi0.5 decomposed, 50-action chunks, 7-dim, 224×224 RGB)

Mode	Uncompressed	Gzipped
`full`	~45 MB	~40 MB
`hash_only` (default)	~2.8 MB	~0.9 MB
`none`	~2.1 MB	~0.7 MB

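hash_only is small enough for indefinite retention, and its hashes identify each frame within a fixed image corpus. reflex replay currently re-invokes the model only from full traces (see Known limitations), so record with --record-images full whenever you plan to replay without the original image source.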
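To turn the per-1000-call figures into a retention budget, scale them by the expected call volume. The sketch below copies the numbers from the table above; the workload (2 /act calls per second over an 8-hour shift) is a hypothetical example, and real sizes will vary with chunk length, action dimension, and image resolution.

```python
# MB per 1000 /act calls, copied from the size guidance table.
MB_PER_1K = {
    "full": {"plain": 45.0, "gzip": 40.0},
    "hash_only": {"plain": 2.8, "gzip": 0.9},
    "none": {"plain": 2.1, "gzip": 0.7},
}


def estimate_mb(mode: str, calls: int, gzipped: bool = True) -> float:
    """Linear estimate of trace size in MB for a given number of calls."""
    per_1k = MB_PER_1K[mode]["gzip" if gzipped else "plain"]
    return per_1k * calls / 1000


# Hypothetical workload: 2 calls/s for 8 hours = 57,600 calls.
calls = 2 * 60 * 60 * 8
for mode in MB_PER_1K:
    print(f"{mode:>9}: {estimate_mb(mode, calls):8.1f} MB gzipped")
```

Under that assumed workload, hash_only comes to roughly 52 MB per shift, while full comes to roughly 2.3 GB.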

Privacy & compliance

Image frames can capture people or proprietary workcells. Default hash_only keeps image content from leaving the robot.

Traces are filesystem-permissioned; wrap the recording dir in LUKS / ecryptfs if at-rest encryption is required
--record-images none drops all image data (even the hash)
Instruction text is kept as-is; redact client-side if it carries secrets

Degraded mode

If the disk fills mid-session, the recorder catches OSError, logs slug=record-disk-full, and stops writing. The server continues serving /act — recording is never load-bearing for inference.
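The pattern described here, recording as a best-effort side channel, can be sketched as follows. This is an illustration of the idea rather than the Reflex recorder itself: `BestEffortRecorder` and the logger name are hypothetical, and only the slug string comes from the text above.

```python
import json
import logging

log = logging.getLogger("recorder_sketch")  # hypothetical logger name


class BestEffortRecorder:
    """Writes JSONL records until the first OSError, then stops quietly."""

    def __init__(self, fh):
        self._fh = fh
        self.enabled = True

    def write(self, record: dict) -> bool:
        """Return True if the record was written, False if recording is off."""
        if not self.enabled:
            return False
        try:
            self._fh.write(json.dumps(record) + "\n")
            self._fh.flush()
        except OSError as exc:
            # A full disk (ENOSPC) and similar I/O errors land here.
            log.error("slug=record-disk-full err=%s", exc)
            self.enabled = False  # stop recording, keep serving
            return False
        return True
```

Because `write` never raises on I/O failure, the request handler that calls it can return the inference result regardless of what happened to the trace file.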

Replay

reflex replay ./traces/20260424-171305-7a8b3c1d-<session>.jsonl.gz \
    --model ./export/pi05 \
    --diff all

Loads the trace, re-invokes predict_from_base64 on the target model for each request, compares the result against the recorded response, and prints one line per request plus a summary footer.
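The action comparison reports two numbers per request: cosine similarity and maximum absolute difference. A minimal sketch of such a comparison is shown below. It assumes the action chunk is flattened into a single vector before comparing, which may not match how Reflex computes it; `diff_actions` is a hypothetical name, and the thresholds are the ones shown in the example summary (cos≥0.999, max_abs<1e-3).

```python
import math
from typing import Sequence, Tuple

COS_MIN = 0.999        # threshold from the example summary
MAX_ABS_LIMIT = 1e-3   # threshold from the example summary

Chunk = Sequence[Sequence[float]]


def diff_actions(recorded: Chunk, replayed: Chunk) -> Tuple[float, float, bool]:
    """Return (cosine similarity, max abs difference, pass/fail)."""
    # Assumption: flatten the chunk (e.g. 50 x 7) into one long vector.
    a = [x for row in recorded for x in row]
    b = [x for row in replayed for x in row]
    if len(a) != len(b):
        raise ValueError("action chunks differ in size")

    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Two all-zero chunks count as identical; one zero chunk counts as a miss.
    cos = dot / norm if norm > 0 else (1.0 if a == b else 0.0)
    max_abs = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return cos, max_abs, (cos >= COS_MIN and max_abs < MAX_ABS_LIMIT)


recorded = [[0.10, 0.20], [0.30, 0.40]]
replayed = [[0.10, 0.20], [0.30, 0.40 + 2e-7]]
cos, max_abs, ok = diff_actions(recorded, replayed)
print(f"cos={cos:.6f} max_abs={max_abs:.2e} [{'PASS' if ok else 'FAIL'}]")
```

Checking both metrics matters: cosine similarity is insensitive to a uniform scale error, while max_abs catches a single joint that drifts even when the overall direction still agrees.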

Flags

Flag	Default	Notes
`trace_file`	required	Path to a `.jsonl` or `.jsonl.gz` trace
`--model <dir>`	required unless `--no-replay`	Target export directory
`--diff <mode>`	`actions`	`actions` (cos + max_abs), `latency` (within ±5%), `cache` (status match), or `all`
`--n <int>`	0 (all)	Replay first N records only
`--output <path>`	unset	Write machine-readable JSON report (CI-friendly)
`--fail-on <mode>`	unset	Exit 3 if any diff of that mode fails
`--no-replay`	off	Parse trace, skip model load. For inspecting traces.

Example output

Replay: traces/20260424-171305-7a8b3c1d-c9d4f0ea.jsonl.gz
  reflex_version: 0.7.0
  model_hash:     7a8b3c1d9f2e4a55
  config_hash:    e12f44c7b1a93802
  model_type:     pi0.5
  export_kind:    decomposed
  embodiment:     franka

Loading target model: ./export/pi05

Replaying requests (--n=all, --diff=all):
  seq=   0   actions: cos=1.000000 max_abs=2.09e-07 [PASS]   latency: 98→101ms (+3.1%) [PASS]   cache: hit→hit [PASS]
  seq=   1   actions: cos=1.000000 max_abs=2.11e-07 [PASS]   latency: 103→104ms (+1.0%) [PASS]   cache: miss→miss [PASS]
  ...

Summary:
  replayed: 1843
  diffed:   1843
  actions:  1843/1843 pass (cos≥0.999, max_abs<1e-3)
  latency:  1821/1843 pass (within ±5% of recorded total_ms)
  cache:    1843/1843 pass (status match)
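The latency column in the output above can be reproduced with a relative-difference check against the recorded value. This is a sketch under two assumptions: the tolerance is applied to total_ms relative to the recorded figure, as the summary line states, and a difference of exactly 5% counts as a pass. `diff_latency` is a hypothetical name.

```python
from typing import Tuple

LATENCY_TOLERANCE = 0.05  # ±5%, per the --diff latency description


def diff_latency(recorded_ms: float, replayed_ms: float) -> Tuple[float, bool]:
    """Return (signed percent change vs. recorded, pass/fail)."""
    if recorded_ms <= 0:
        raise ValueError("recorded latency must be positive")
    delta = (replayed_ms - recorded_ms) / recorded_ms
    return delta * 100.0, abs(delta) <= LATENCY_TOLERANCE


pct, ok = diff_latency(98, 101)
print(f"latency: 98→101ms ({pct:+.1f}%) [{'PASS' if ok else 'FAIL'}]")
# prints: latency: 98→101ms (+3.1%) [PASS]
```

Latency depends on the replay machine, so a trace recorded on robot hardware will often fail this check on a dev laptop even when the actions match.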

Exit codes

Code	Meaning
0	All diffs passed (or `--no-replay` completed)
1	Trace file error (missing, malformed, unknown schema version)
2	Target model load failed
3	`--fail-on <mode>` triggered
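In CI, the exit code is usually all a wrapper script needs. The sketch below builds a replay command from the flags documented above and maps the exit code to a message. The helper names are hypothetical, and passing `actions` to `--fail-on` assumes it accepts the same mode names as `--diff`.

```python
import subprocess
import sys
from typing import List

# Meanings copied from the exit code table.
EXIT_MEANINGS = {
    0: "all diffs passed (or --no-replay completed)",
    1: "trace file error (missing, malformed, unknown schema version)",
    2: "target model load failed",
    3: "--fail-on triggered: at least one diff of that mode failed",
}


def build_replay_cmd(trace: str, model: str, report: str) -> List[str]:
    """Assemble the CLI call; --fail-on actions is an assumed mode value."""
    return ["reflex", "replay", trace,
            "--model", model,
            "--diff", "all",
            "--fail-on", "actions",
            "--output", report]


def explain(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_replay_gate(trace: str, model: str, report: str) -> int:
    """Run the replay and report the outcome on stderr."""
    proc = subprocess.run(build_replay_cmd(trace, model, report), check=False)
    print(f"reflex replay: {explain(proc.returncode)}", file=sys.stderr)
    return proc.returncode
```

A CI job would call `sys.exit(run_replay_gate(...))` so the pipeline status mirrors the replay result, with the JSON report kept as a build artifact.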

Known limitations

Images: replay requires --record-images full. hash_only traces can be inspected with --no-replay but can’t be re-invoked through the model.
Seeding: there is no --seed override, so replay runs with whatever RNG state the target produces. Deterministic paths (onnx_gpu with fixed inputs) don’t need one; flow-matching models that sample fresh noise on every call do, so their action diffs can vary between runs.

JSONL format (schema v1)

{"kind":"header","schema_version":1,"reflex_version":"0.7.0","model_hash":"...","config_hash":"...","session_id":"...","started_at":"...", ...}
{"kind":"request","schema_version":1,"seq":0,"chunk_id":0,"timestamp":"...","request":{...},"response":{...},"latency":{...},"denoise":{...},"mode":"onnx_gpu"}
{"kind":"request","schema_version":1,"seq":1, ...}
...
{"kind":"footer","schema_version":1,"ended_at":"...","total_requests":1843, ...}

All records carry schema_version so readers can dispatch across versions. Additive fields don’t bump the version; readers ignore unknown fields.
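A reader that follows these rules can be quite small. The sketch below is not the Reflex parser: `read_trace` and `SUPPORTED_SCHEMAS` are hypothetical names. It opens either a `.jsonl` or a `.jsonl.gz` file, rejects schema versions it does not know, and dispatches on `kind`. Skipping unknown `kind` values and tolerating a missing footer (for example after an interrupted session) are assumptions on top of the documented behaviour.

```python
import gzip
import json
from typing import List, Optional, Tuple

SUPPORTED_SCHEMAS = {1}


def _open_trace(path: str):
    """Open a trace as text, transparently handling gzip."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def read_trace(path: str) -> Tuple[Optional[dict], List[dict], Optional[dict]]:
    """Return (header, request records, footer) from a schema v1 trace."""
    header, requests, footer = None, [], None
    with _open_trace(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            rec = json.loads(line)
            version = rec.get("schema_version")
            if version not in SUPPORTED_SCHEMAS:
                raise ValueError(f"line {lineno}: unknown schema_version {version!r}")
            kind = rec.get("kind")
            if kind == "header":
                header = rec
            elif kind == "request":
                requests.append(rec)
            elif kind == "footer":
                footer = rec
            # Other kinds are skipped (assumption); unknown fields inside
            # a record are simply left untouched in the dict.
    return header, requests, footer
```

To look up the record that matches an OTel span, filter the returned requests on `seq`, for example `[r for r in requests if r["seq"] == 42]`.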