Latency SLO

reflex serve --slo p99=150ms makes the server measure every /act response, track a rolling p99, and react when it exceeds your threshold.
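To make "rolling p99" concrete, here is a minimal sketch of a rolling percentile over a fixed-size window of recent latencies. The class name `RollingPercentile`, the sample-count window, and the nearest-rank method are all assumptions made for illustration; the server's actual windowing and percentile method are not specified here.

```python
import math
from collections import deque
from typing import Optional


class RollingPercentile:
    """Hypothetical helper: percentile over the most recent `window` samples."""

    def __init__(self, percentile: float, window: int = 1000):
        if not 0 < percentile <= 100:
            raise ValueError("percentile must be in (0, 100]")
        self.percentile = percentile
        # A bounded deque drops the oldest sample once the window is full.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms: float) -> None:
        """Add one response latency, in milliseconds."""
        self.samples.append(latency_ms)

    def value(self) -> Optional[float]:
        """Nearest-rank percentile of the current window, or None if empty."""
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        # 1-based nearest rank: smallest sample with at least p% at or below it.
        rank = math.ceil(self.percentile * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]
```

Sorting on every read is fine for a sketch; a production tracker would more likely recompute on a schedule or use a streaming estimator.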

Three modes

| Mode | Behavior when p99 > threshold |
| --- | --- |
| `log_only` | Emit the `reflex_slo_violations_total` Prometheus metric only |
| `503` | Metric + HTTP 503 with a JSON body (`error`, `p99_measured_ms`, `p99_slo_ms`, `retry_after_s`) + `Retry-After: 1` header |
| `degrade` (default) | Phase 1: same as `log_only`. Phase 1.5 (not yet landed): drops NFE 4→2, skips RTC eval, etc. |

Quick start

# 503 mode: failover-capable clients get a structured 503 while the SLO is violated
reflex serve ./my-export/ --slo p99=150ms --slo-mode 503

# log_only — monitor SLO without changing response behavior
reflex serve ./my-export/ --slo p99=200ms --slo-mode log_only

# default: --slo-mode degrade (telemetry only until the Phase 1.5 degradation knobs land)
reflex serve ./my-export/ --slo p99=150ms

Spec format

--slo p<N>=<X>ms, where N is the percentile (0-100, fractional allowed) and X is the threshold in milliseconds. Examples:

p99=150ms
p95=200ms
p99.9=500ms
p50=50ms

Only the ms unit is supported in Phase 1. The spec is case-insensitive, and whitespace around = is ignored.
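The format rules above can be sketched as a small parser. This is an illustration of the grammar, not the parser Reflex ships: the names `SloSpec` and `parse_slo` are hypothetical, and accepting a fractional threshold is an assumption.

```python
import re
from typing import NamedTuple


class SloSpec(NamedTuple):
    percentile: float
    threshold_ms: float


# p<N>=<X>ms, case-insensitive, optional whitespace around "=".
# Assumption: fractional thresholds (e.g. 12.5ms) are accepted.
_SLO_RE = re.compile(
    r"p(\d+(?:\.\d+)?)\s*=\s*(\d+(?:\.\d+)?)ms",
    re.IGNORECASE,
)


def parse_slo(spec: str) -> SloSpec:
    """Parse a spec such as 'p99=150ms' into (percentile, threshold_ms)."""
    match = _SLO_RE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"invalid SLO spec: {spec!r} (expected p<N>=<X>ms)")
    percentile = float(match.group(1))
    threshold_ms = float(match.group(2))
    if not 0 <= percentile <= 100:
        raise ValueError(f"percentile out of range 0-100: {percentile}")
    return SloSpec(percentile, threshold_ms)
```

For example, `parse_slo("P99.9 = 500MS")` yields a percentile of 99.9 and a threshold of 500.0 ms, while `parse_slo("p99=150s")` raises `ValueError` because only the ms unit is accepted.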

Recovery semantics

Once the tracker flags a violation, it clears only after the measured percentile stays below 0.8 × threshold (the default ratio) for 3 consecutive recomputations (the default recover_windows). This hysteresis prevents flapping on bursty workloads.
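The recovery rule is a hysteresis loop, which a minimal sketch can make explicit. The class `ViolationTracker` and its `update` method are hypothetical names; only the 0.8 ratio and the 3-window default come from the text above, and the assumption here is that a reading at or above the recovery level resets the streak.

```python
class ViolationTracker:
    """Hypothetical sketch of violation state with hysteresis on recovery."""

    def __init__(self, threshold_ms: float,
                 recover_ratio: float = 0.8, recover_windows: int = 3):
        self.threshold_ms = threshold_ms
        self.recover_ratio = recover_ratio
        self.recover_windows = recover_windows
        self.in_violation = False
        self._good_streak = 0  # consecutive recomputations below recovery level

    def update(self, measured_ms: float) -> bool:
        """Feed one recomputed percentile; return the new violation state."""
        if not self.in_violation:
            # Enter violation as soon as the percentile exceeds the threshold.
            if measured_ms > self.threshold_ms:
                self.in_violation = True
                self._good_streak = 0
        elif measured_ms < self.recover_ratio * self.threshold_ms:
            # Below the recovery level: count toward clearing the violation.
            self._good_streak += 1
            if self._good_streak >= self.recover_windows:
                self.in_violation = False
                self._good_streak = 0
        else:
            # Not low enough: the streak must start over (assumed behavior).
            self._good_streak = 0
        return self.in_violation
```

With a 150 ms threshold the recovery level is 120 ms, so a percentile hovering around 130 ms keeps the tracker in violation even though it is under the SLO itself.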

Metric

reflex_slo_violations_total{embodiment, kind="p{N}_exceeded"}

The counter increments on every request served while the tracker is in the violation state. Scrape it via /metrics or via the MCP resource metrics://prometheus.

503 response shape

HTTP/1.1 503 Service Unavailable
Retry-After: 1
Content-Type: application/json

{
  "error": "slo_violation",
  "p99_measured_ms": 187.43,
  "p99_slo_ms": 150.0,
  "retry_after_s": 1
}

Clients implementing failover should retry on a different instance (e.g., a cloud backup) rather than polling the same server. Clients without failover logic keep receiving 503 until the SLO recovers.
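A client-side failover loop might look like the sketch below. It is transport-agnostic on purpose: each backend is a hypothetical zero-argument callable returning `(status, headers, body)`, so you can wrap whatever HTTP library you use. The helper names `is_slo_rejection` and `act_with_failover` are assumptions; only the `"error": "slo_violation"` field and the 503 status come from the response shape above.

```python
import json
from typing import Callable, Dict, Iterable, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


def is_slo_rejection(status: int, body: bytes) -> bool:
    """True only for a 503 whose JSON body carries error == 'slo_violation'."""
    if status != 503:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("error") == "slo_violation"


def act_with_failover(backends: Iterable[Callable[[], Response]]) -> Response:
    """Try each backend in order, moving on when one reports an SLO violation."""
    last: Optional[Response] = None
    for call in backends:
        response = call()
        status, _headers, body = response
        if not is_slo_rejection(status, body):
            return response  # success, or an unrelated error for the caller
        last = response  # SLO rejection: move on instead of polling this server
    if last is None:
        raise ValueError("no backends given")
    return last  # every backend rejected; surface the final 503
```

Other failures, including a 503 without the `slo_violation` body, are returned to the caller unchanged, since they may need different handling than an SLO rejection.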

Recommended Grafana panels

# Violation rate per minute (rate() is per-second, so scale by 60)
60 * rate(reflex_slo_violations_total[5m])

# Approximate rolling p99 in ms (the histogram is in seconds; compare against your --slo)
1000 * histogram_quantile(0.99, rate(reflex_act_latency_seconds_bucket[5m]))

# 503 rate (only meaningful with --slo-mode 503)
rate(reflex_act_latency_seconds_count{status="503"}[5m])