Skip to content

latency SLO

reflex serve --slo p99=150ms makes the server measure every /act response, track a rolling p99, and react when it exceeds your threshold.

ModeBehavior when p99 > threshold
log_onlyEmit reflex_slo_violations_total Prometheus metric only
503Metric + return HTTP 503 with {"p99_measured_ms", "p99_slo_ms", "retry_after_s"} body + Retry-After: 1 header
degrade (default)Phase 1: same as log_only. Phase 1.5: drops NFE 4→2, skips RTC eval, etc.
Terminal window
# 503 mode — failover-capable clients get structured 503 when SLO is at risk
reflex serve ./my-export/ --slo p99=150ms --slo-mode 503
# log_only — monitor SLO without changing response behavior
reflex serve ./my-export/ --slo p99=200ms --slo-mode log_only
# default: --slo-mode degrade (telemetry only until P1.5 degradation knobs land)
reflex serve ./my-export/ --slo p99=150ms

--slo p<N>=<X>ms where N is percentile (0-100, fractional allowed) and X is threshold in ms:

  • p99=150ms
  • p95=200ms
  • p99.9=500ms
  • p50=50ms

Only the ms unit is supported in Phase 1. Case-insensitive; whitespace around = is ignored.

Once the tracker flags a violation, it clears only after:

  1. The measured percentile drops below 0.8 × threshold (default ratio)
  2. For 3 consecutive recomputations (default recover_windows)

Prevents flapping on bursty workloads.

reflex_slo_violations_total{embodiment, kind="p{N}_exceeded"}

Increments on every request where the tracker is in violation state. Scrape via /metrics or via the MCP resource metrics://prometheus.

HTTP/1.1 503 Service Unavailable
Retry-After: 1
Content-Type: application/json
{
"error": "slo_violation",
"p99_measured_ms": 187.43,
"p99_slo_ms": 150.0,
"retry_after_s": 1
}

Clients implementing failover should retry on a different instance (e.g., cloud backup) without polling the same server. Clients without failover logic see 503 until the SLO recovers.

# Violation rate per minute
rate(reflex_slo_violations_total[5m])
# Actual rolling p99 (compare against your --slo)
histogram_quantile(0.99, rate(reflex_act_latency_seconds_bucket[5m]))
# 503 rate (only meaningful with --slo-mode 503)
rate(reflex_act_latency_seconds_count{status="503"}[5m])