HTTP /act endpoint
reflex serve listens on :8000 (configurable) with four endpoints. /act is the inference path; the rest are observability.
POST /act
Send {instruction, state, image?, episode_id?}, get back a 50-step action chunk.
Request

    {
      "instruction": "pick up the red cup",
      "state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
      "image_b64": "<base64-encoded JPEG/PNG>",
      "episode_id": "ep-2026-05-01-001"
    }

| Field | Type | Required | Notes |
|---|---|---|---|
| instruction | string | Yes | Natural-language task spec. ≤ 512 chars (Phase 1). |
| state | array of float | Yes | Proprioceptive state vector. Length must match the embodiment’s mean_state. |
| image_b64 | string | Optional | Base64-encoded JPEG or PNG. Required when the model is multi-modal (most are). |
| image | array | Optional | Alternative to image_b64: a raw HxWxC pixel array. Use base64 for HTTP. |
| episode_id | string | Recommended | Stable identifier for the current task. Required for cache locality, RTC carry-over, and policy-versioning routing. |
| request_id | string | Auto | Set if you want client-side request tracing. Falls back to a UUID. |
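The request fields above can be assembled with the standard library alone. The following is a minimal sketch, not part of Reflex: `build_act_payload` is a hypothetical helper name, and rejecting an empty instruction is an assumption (the table only states the 512-character upper bound).

```python
import base64

MAX_INSTRUCTION_CHARS = 512  # Phase 1 limit from the request table


def build_act_payload(instruction, state, image_bytes=None, episode_id=None):
    """Return a JSON-serializable dict for POST /act (hypothetical helper)."""
    # Assumption: an empty instruction is treated as invalid client-side.
    if not instruction or len(instruction) > MAX_INSTRUCTION_CHARS:
        raise ValueError("instruction must be 1..512 characters")
    payload = {
        "instruction": instruction,
        "state": [float(x) for x in state],
    }
    if image_bytes is not None:
        # Encoded JPEG/PNG bytes travel as base64 text in image_b64.
        payload["image_b64"] = base64.b64encode(image_bytes).decode("ascii")
    if episode_id is not None:
        payload["episode_id"] = episode_id
    return payload
```

Passing the same `episode_id` on every call within a task keeps the requests eligible for the cache-locality and routing behavior described in the table.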
Response (success)

    {
      "actions": [[0.01, -0.02, 0.0, 0.0, 0.0, 0.0, 0.5], ...],
      "num_actions": 50,
      "action_dim": 7,
      "latency_ms": 11.9,
      "hz": 84.0,
      "denoising_steps": 10,
      "inference_mode": "onnx_trt_fp16",
      "guard_clamped": false,
      "guard_violations": []
    }

| Field | Type | Notes |
|---|---|---|
| actions | 2D array | num_actions × action_dim action chunk. Pre-clamped to embodiment ranges. |
| num_actions | int | Length of the chunk. Usually 50 for flow-matching models. |
| action_dim | int | Per-action dimensionality (7 for Franka 6-DOF + gripper). |
| latency_ms | float | Wall-clock from request receipt to response. |
| hz | float | 1000 / latency_ms. |
| denoising_steps | int | Actual denoise steps used (may be fewer than max if --adaptive-steps engaged). |
| inference_mode | string | One of onnx_trt_fp16, onnx_cuda, onnx_cpu, decomposed, monolithic. |
| guard_clamped | bool | True if ActionGuard clamped any action in the chunk. |
| guard_violations | array | Per-axis violations when guard_clamped is true. |
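A client may want to sanity-check a chunk before executing it. The sketch below checks only the relationships the table states (chunk shape and hz = 1000 / latency_ms); `check_act_response` is a hypothetical name, and the 1% tolerance is an assumption to allow for rounding in the payload.

```python
import math


def check_act_response(resp):
    """Raise ValueError if the chunk disagrees with its declared metadata."""
    actions = resp["actions"]
    if len(actions) != resp["num_actions"]:
        raise ValueError("num_actions does not match len(actions)")
    for i, action in enumerate(actions):
        if len(action) != resp["action_dim"]:
            raise ValueError(f"action {i} has the wrong dimensionality")
    # hz is documented as 1000 / latency_ms; the tolerance is an assumption.
    expected_hz = 1000.0 / resp["latency_ms"]
    if not math.isclose(resp["hz"], expected_hz, rel_tol=0.01):
        raise ValueError("hz is inconsistent with latency_ms")
    return actions
```

With the sample response above, 1000 / 11.9 is roughly 84.03, which is within the tolerance of the reported 84.0.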
Response (telemetry — when wedges are active)
When you enable wedges, additional fields appear:

    {
      "actions": [...],
      "latency_ms": 45.2,

      // a2c2 (when --a2c2-checkpoint is set)
      "a2c2_applied": true,
      "a2c2_reason": "applied",
      "a2c2_correction_magnitude": 0.073,

      // record (when --record is set)
      "record_seq": 1842,

      // policy-versioning (when --policy-a/--policy-b are set)
      "policy_slot": "a",

      // robot identity (when --robot-id is set)
      "robot_id": "warehouse-01"
    }

These fields are additive: clients that ignore unknown fields stay backwards-compatible.
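One way for a client to stay compatible with additive fields is to keep only the keys it knows and default the optional ones. This is a sketch under that assumption; `ActResult` is a hypothetical type, not something the Reflex SDK is known to expose, and it models only a subset of the fields.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ActResult:
    actions: List[List[float]]
    latency_ms: float
    # Wedge fields default to None because they appear only when enabled.
    a2c2_applied: Optional[bool] = None
    record_seq: Optional[int] = None
    policy_slot: Optional[str] = None
    robot_id: Optional[str] = None

    @classmethod
    def from_json(cls, data: dict) -> "ActResult":
        known = {f.name for f in fields(cls)}
        # Drop keys this client does not model instead of failing on them.
        return cls(**{k: v for k, v in data.items() if k in known})
```

Because unknown keys are filtered out before construction, a server that later adds another telemetry field does not break this parser.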
When 2-policy mode is active, the response also carries headers:

    X-Reflex-Policy-Slot: a
    X-Reflex-Model-Version: pi0-libero-v1@<hash>
    X-Reflex-Routing-Key: ep_xyz
    X-Reflex-Routing-Degraded: false

Error responses

    {
      "error": "queue_full",
      "message": "policy runtime queue at capacity",
      "policy_id": "prod",
      "max_queue": 1000
    }

Status codes:
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Malformed request (missing field, wrong type, image decode failure) |
| 422 | Schema-valid but semantically invalid (e.g. state length doesn’t match embodiment) |
| 503 | Server unavailable (warming up, queue full, SLO violation, circuit breaker tripped) |
| 500 | Internal error (model crashed; check /health and the audit log) |
Every 503 response carries a Retry-After header indicating when to retry.
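A client that is not using the SDK can honor Retry-After itself. The sketch below retries only on 503 and is transport-agnostic: `send` is a hypothetical callable returning `(status, headers, body)`, headers are assumed to be a plain dict keyed by `Retry-After`, and only the delta-seconds form of the header is handled (the HTTP-date form falls back to the default wait).

```python
import time


def act_with_retry(send, payload, max_attempts=5, default_wait_s=1.0,
                   sleep=time.sleep):
    """Call send(payload) and retry on 503, waiting as Retry-After asks."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send(payload)
        if status != 503:
            return status, body
        if attempt == max_attempts:
            break  # out of attempts; return the last 503 to the caller
        try:
            wait_s = float(headers.get("Retry-After", default_wait_s))
        except ValueError:
            # Retry-After was not a number (e.g. an HTTP-date).
            wait_s = default_wait_s
        sleep(wait_s)
    return status, body
```

Codes such as 400 and 422 are returned immediately, since resending an invalid request unchanged would fail the same way.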
GET /health

    curl http://localhost:8000/health

    {
      "status": "ready",
      "model_loaded": true,
      "model_version": "pi0-libero-v1@7a8b3c1d",
      "inference_mode": "onnx_trt_fp16",
      "uptime_seconds": 1245,
      "robot_id": "warehouse-01",
      "cuda_graphs_active": true
    }

status is one of:

- warming: first cold start (10-70 sec). Returns HTTP 503.
- ready: operational. Returns HTTP 200.
- degraded: circuit breaker tripped or both 2-policy slots failed. Returns HTTP 503.
Load balancers should treat 503 as “skip this instance” — Reflex never returns 503 for transient single-request failures, only durable instance state.
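A deploy script or robot controller can gate its first /act call on the health status. The following sketch polls until `ready`; `fetch_status` is a hypothetical callable that performs GET /health and returns the `status` string, the 120-second default timeout is an assumption chosen to exceed the 10-70 second cold start, and giving up immediately on `degraded` is a policy choice rather than documented behavior.

```python
import time


def wait_until_ready(fetch_status, timeout_s=120.0, poll_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until status is "ready"; return False on timeout or "degraded"."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = fetch_status()
        if status == "ready":
            return True
        if status == "degraded":
            # Assumption: treat degraded as a reason to stop waiting.
            return False
        sleep(poll_s)  # still warming; try again shortly
    return False
```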
GET /config
Returns the saved reflex_config.json from the loaded export. Useful for verifying which model + embodiment + version is actually serving.
GET /metrics
Prometheus exposition format. Scrape interval: 15 seconds is typical.
Key metrics:
| Metric | Type | Labels |
|---|---|---|
| reflex_act_latency_seconds | Histogram | embodiment, policy_slot, inference_mode |
| reflex_act_total | Counter | embodiment, status |
| reflex_guard_clamped_total | Counter | embodiment |
| reflex_cache_hit_total / reflex_cache_miss_total | Counter | embodiment, policy_slot |
| reflex_in_flight_requests | Gauge | policy_slot |
| reflex_robot_info | Gauge | robot_id, embodiment, model_id |
| reflex_slo_violations_total | Counter | embodiment, kind |
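For a quick check without a Prometheus server, the /metrics text can be parsed directly. The sketch below sums the cache counters across all label sets and derives a hit rate; `sum_metric` and `cache_hit_rate` are hypothetical helpers, and the simple regex assumes label values never contain a `}` character.

```python
import re

# metric name, optional {labels}, then the sample value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def sum_metric(text, name):
    """Sum every sample of `name` across all label sets."""
    total = 0.0
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP / TYPE comment lines
        m = _SAMPLE.match(line)
        if m and m.group(1) == name:
            total += float(m.group(2))
    return total


def cache_hit_rate(text):
    """Return hits / (hits + misses), or None if no samples were seen."""
    hits = sum_metric(text, "reflex_cache_hit_total")
    misses = sum_metric(text, "reflex_cache_miss_total")
    total = hits + misses
    return hits / total if total else None
```

Counters are cumulative, so this gives the hit rate since the process started; a windowed rate needs two scrapes and a difference, which is what a Prometheus `rate()` query computes.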
CORS
Default: Access-Control-Allow-Origin: * (open). Override with --cors-origins https://app.example.com to restrict origins. CORS is on to support browser-based control loops; non-browser clients ignore these headers, so restricting origins does not affect non-browser deploys.
Client SDK
A Python client is provided:

    from reflex.client import ReflexClient

    with ReflexClient("http://localhost:8000") as client:
        with client.episode() as ep:
            # numpy_frame: an HxWxC pixel array supplied by the caller
            result = ep.act(image=numpy_frame, state=[0.1, 0.2, ...])
            print(result["actions"])

The client handles 503 retries, episode_id management, and image base64 encoding. For other languages, use any HTTP client; the wire format is plain JSON.
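To illustrate the plain-JSON wire format without the SDK, here is a minimal sketch using only Python's standard library. `make_act_request` and `send_act` are hypothetical names, and sending `Content-Type: application/json` is an assumption about what the server expects.

```python
import json
import urllib.request


def make_act_request(base_url, payload):
    """Build (but do not send) a POST /act request with a JSON body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/act",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def send_act(base_url, payload, timeout_s=5.0):
    """Send the request and decode the JSON response body."""
    req = make_act_request(base_url, payload)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Unlike the SDK, this sketch does no retrying and no episode_id management, so callers need to supply `episode_id` in the payload and handle `HTTPError` for 503 themselves.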