Latency SLO
`reflex serve --slo p99=150ms` makes the server measure the latency of every `/act` response, track a rolling p99, and react when that p99 exceeds your threshold.
Three modes
| Mode | Behavior while the SLO is in violation (p99 > threshold) |
|---|---|
| `log_only` | Increment the `reflex_slo_violations_total` Prometheus counter only. |
| `503` | Increment the counter and return HTTP 503 with a JSON body (`error`, `p99_measured_ms`, `p99_slo_ms`, `retry_after_s`) plus a `Retry-After: 1` header. |
| `degrade` (default) | Phase 1: same as `log_only`. Phase 1.5 (planned): drop NFE from 4 to 2, skip RTC eval, and similar. |
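To make the table concrete, here is a minimal Python sketch of the per-request decision. It is not reflex's implementation: `SloMode`, `slo_gate`, and the `count_violation` callback are hypothetical names, and the sketch assumes the violation state comes from a separate tracker. The 503 body mirrors the documented response shape.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class SloMode(Enum):
    # Hypothetical enum; values match the documented --slo-mode strings.
    LOG_ONLY = "log_only"
    REJECT_503 = "503"
    DEGRADE = "degrade"  # documented default; in Phase 1 it behaves like LOG_ONLY


def slo_gate(
    mode: SloMode,
    in_violation: bool,
    measured_p99_ms: float,
    slo_p99_ms: float,
    count_violation: Callable[[], None],
) -> Optional[Tuple[int, dict, dict]]:
    """Return None to serve /act normally, or (status, headers, body) to reject."""
    if not in_violation:
        return None
    count_violation()  # every mode records the violation metric
    if mode is SloMode.REJECT_503:
        body = {
            "error": "slo_violation",
            "p99_measured_ms": measured_p99_ms,
            "p99_slo_ms": slo_p99_ms,
            "retry_after_s": 1,
        }
        return 503, {"Retry-After": "1"}, body
    # LOG_ONLY, and DEGRADE until the Phase 1.5 knobs exist: serve unchanged.
    return None
```

Parsing the CLI string is then `SloMode("503")`, which yields `SloMode.REJECT_503`.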
Quick start
```shell
# 503 mode: failover-capable clients get a structured 503 when the SLO is at risk
reflex serve ./my-export/ --slo p99=150ms --slo-mode 503

# log_only: monitor the SLO without changing response behavior
reflex serve ./my-export/ --slo p99=200ms --slo-mode log_only

# default (--slo-mode degrade): telemetry only until the Phase 1.5 degradation knobs land
reflex serve ./my-export/ --slo p99=150ms
```

Spec format
`--slo p<N>=<X>ms`, where `N` is the percentile (0-100, fractional values allowed) and `X` is the threshold in milliseconds:
```text
p99=150ms
p95=200ms
p99.9=500ms
p50=50ms
```
Only the `ms` unit is supported in Phase 1. The spec is case-insensitive, and whitespace around `=` is ignored.
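The rules above are compact enough to express as a single regular expression. This is a hypothetical Python sketch, not reflex's parser: `parse_slo` and `SloSpec` are illustrative names, and accepting fractional thresholds and rejecting non-positive ones are assumptions the docs do not state.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of the documented rules: p<N>=<X>ms, case-insensitive,
# whitespace allowed around "=", fractional percentiles allowed.
_SLO_RE = re.compile(
    r"^\s*p(\d+(?:\.\d+)?)\s*=\s*(\d+(?:\.\d+)?)ms\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class SloSpec:
    percentile: float    # e.g. 99.0 or 99.9
    threshold_ms: float  # e.g. 150.0


def parse_slo(spec: str) -> SloSpec:
    """Parse an --slo value such as 'p99=150ms' into a SloSpec."""
    match = _SLO_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid --slo spec {spec!r}; expected p<N>=<X>ms")
    percentile = float(match.group(1))
    threshold_ms = float(match.group(2))
    if not 0 <= percentile <= 100:
        raise ValueError(f"percentile must be in [0, 100], got {percentile}")
    if threshold_ms <= 0:  # assumption: a zero threshold is rejected
        raise ValueError(f"threshold must be positive, got {threshold_ms}")
    return SloSpec(percentile, threshold_ms)
```

For example, `parse_slo("P99.9 = 500MS")` returns `SloSpec(percentile=99.9, threshold_ms=500.0)`, while `p99=1s` is rejected because only `ms` is accepted.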
Recovery semantics
Once the tracker flags a violation, it clears the violation only after:

- the measured percentile drops below `0.8 × threshold` (default ratio), and
- stays below that level for 3 consecutive recomputations (default `recover_windows`).

This hysteresis keeps the violation state from flapping on bursty workloads.
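The recovery rule is a hysteresis loop. Below is a minimal Python sketch of it, assuming a fixed-size sample window, nearest-rank percentiles, and one recomputation per recorded request; `SloTracker`, `window_size`, and `recover_ratio` are hypothetical names (only `recover_windows` appears in these docs), and reflex's real tracker may window and recompute differently.

```python
import math
from collections import deque


class SloTracker:
    """Hypothetical rolling-percentile tracker with recovery hysteresis.

    Assumptions: fixed-size sample window, nearest-rank percentile,
    one recomputation per recorded sample.
    """

    def __init__(self, percentile=99.0, threshold_ms=150.0,
                 recover_ratio=0.8, recover_windows=3, window_size=1000):
        self.percentile = percentile
        self.threshold_ms = threshold_ms
        self.recover_ratio = recover_ratio
        self.recover_windows = recover_windows
        self.samples = deque(maxlen=window_size)  # oldest samples fall off
        self.in_violation = False
        self._healthy_streak = 0  # consecutive recomputations below recovery level

    def _measured_ms(self):
        # Nearest-rank percentile over the current window.
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(self.percentile / 100 * len(ordered)))
        return ordered[rank - 1]

    def record(self, latency_ms):
        """Add one /act latency, recompute, and return the violation state."""
        self.samples.append(latency_ms)
        measured = self._measured_ms()
        if not self.in_violation:
            if measured > self.threshold_ms:
                self.in_violation = True
                self._healthy_streak = 0
        elif measured < self.recover_ratio * self.threshold_ms:
            self._healthy_streak += 1
            if self._healthy_streak >= self.recover_windows:
                self.in_violation = False
                self._healthy_streak = 0
        else:
            # Not fast enough to count toward recovery: the streak restarts.
            self._healthy_streak = 0
        return self.in_violation
```

With `SloTracker(threshold_ms=100, window_size=1)`, recording 150, 90, 70, 70, 70 yields `True, True, True, True, False`: the 90 ms sample is under the threshold but above `0.8 × 100`, so it does not count toward recovery.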
Metric
```text
reflex_slo_violations_total{embodiment, kind="p{N}_exceeded"}
```

This counter increments on every request handled while the tracker is in the violation state; with `--slo p99=150ms`, `kind` is `p99_exceeded`. Scrape it via `/metrics` or via the MCP resource `metrics://prometheus`.
503 response shape
```http
HTTP/1.1 503 Service Unavailable
Retry-After: 1
Content-Type: application/json

{
  "error": "slo_violation",
  "p99_measured_ms": 187.43,
  "p99_slo_ms": 150.0,
  "retry_after_s": 1
}
```

Clients implementing failover should retry on a different instance (e.g., a cloud backup) rather than polling the same server. Clients without failover logic keep receiving 503 until the SLO recovers.
Recommended Grafana panels
```promql
# Violations per minute (rate() returns a per-second value, so scale by 60)
60 * rate(reflex_slo_violations_total[5m])

# Actual rolling p99 in ms, aggregated across series (compare against your --slo)
1000 * histogram_quantile(0.99, sum by (le) (rate(reflex_act_latency_seconds_bucket[5m])))

# 503 responses per second (only meaningful with --slo-mode 503)
rate(reflex_act_latency_seconds_count{status="503"}[5m])
```