[serve] Recover ingress-router pin-misses via the fallback proxy instead of 503 by eicherseiji · Pull Request #64218 · ray-project/ray

eicherseiji · 2026-06-18T19:59:15Z

Why are these changes needed?

When direct streaming is enabled, HAProxy asks the ingress request router to pin a replica, then routes to it by name through its statically reloaded server map. Right after an application becomes RUNNING there is a brief window where the router's in-process replica view runs ahead of HAProxy's config reload, so the router can name a replica HAProxy has not loaded yet. HAProxy rejected that with a 503 unknown_replica_id -- surfacing as flaky test_dp_direct_streaming failures in postmerge and as a real client-facing error during the post-deploy gap.

This routes the unknown_replica_id reason to the fallback Serve proxy instead of returning 503. The fallback proxy is always running in HAProxy mode and re-pins via the same consistent-hash router (same ring, same replica) over the Serve handle path, so it serves the request correctly and preserves session affinity.

Scope: only unknown_replica_id is recovered. router_unreachable, router_non_200, and unparseable_replica_id mean the router itself is broken and still fail loud (503). The Lua keeps arming txn.ingress_request_router_failed, so serve_haproxy_ingress_router_failures{reason} still counts every pin-miss by reason.

Adds test_pin_miss_falls_back_to_fallback_server (runs against real HAProxy): with the router pinning a replica absent from the server map, the request is served by the fallback (200) and the affinity-breaking primary backend is never selected.

The first commit (a test-side warmup) is reverted in this PR; the fallback fix replaces it.

Related issue number

Surfaced by postmerge https://buildkite.com/ray-project/postmerge/builds/18131

Checks

I've signed off every commit (DCO).
test_haproxy_api.py (incl. the new test) and test_haproxy_metrics.py pass locally against HAProxy 2.8; lint clean.

… tests The direct-streaming session-affinity tests fired requests as soon as the application reached RUNNING. RUNNING only means the controller sees every replica running; HAProxy reloads its data-plane server map asynchronously afterward, and the ingress request router learns the running replicas through a separate long poll that runs ahead of that reload. In that window the router pins a replica HAProxy has not loaded, and HAProxy returns 503 unknown_replica_id. test_different_sessions_spread fans requests across all replicas, so it hit the gap and failed in postmerge; test_session_affinity pins a single replica and rarely did. run_app_through_haproxy now warms up until a request has reached every data-plane replica before returning the URL, closing the race for all three direct-streaming test files (DP, PD, and router). Signed-off-by: Seiji Eicher <seiji@anyscale.com>

gemini-code-assist

Code Review

This pull request improves the reliability of direct-streaming session-affinity tests by ensuring HAProxy is fully warmed up and routing to all data-plane replicas before proceeding. Feedback suggests wrapping the warmup HTTP requests in a try-except block to handle transient connection or timeout errors during HAProxy startup or reload, preventing premature failures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-18T20:00:09Z

+            resp = _chat_request(base_url, f"warmup-session-{next(sessions)}")
+            if resp.status_code == 200:
+                reached.add(resp.headers["x-replica-id"])


During HAProxy startup or reload, _chat_request might raise transient connection or timeout errors (e.g., httpx.ConnectError or httpx.HTTPError). If an exception is raised, it will abort the loop of probes immediately, which can cause the wait_for_condition check to fail or run much slower. Wrapping the request in a try-except block to catch httpx.HTTPError and safely checking for the presence of the x-replica-id header makes the warmup process significantly more robust.

Suggested change

resp = _chat_request(base_url, f"warmup-session-{next(sessions)}")

if resp.status_code == 200:

reached.add(resp.headers["x-replica-id"])

try:

resp = _chat_request(base_url, f"warmup-session-{next(sessions)}")

if resp.status_code == 200 and "x-replica-id" in resp.headers:

reached.add(resp.headers["x-replica-id"])

except httpx.HTTPError:

pass

…affinity tests" This reverts commit 284f335. Signed-off-by: Seiji Eicher <seiji@anyscale.com>

…ead of 503 When direct streaming is enabled, HAProxy asks the ingress request router to pin a replica, then routes to that replica by name through its statically reloaded server map. Right after an application becomes RUNNING there is a brief window where the router's in-process replica view runs ahead of HAProxy's config reload, so the router can name a replica HAProxy has not loaded yet. HAProxy rejected that with a 503 `unknown_replica_id`, which surfaced as flaky failures in the direct-streaming session-affinity tests (test_dp_direct_streaming, etc.) and as a real client-facing error during the post-deploy gap. Recover the request instead of failing it. The fallback Serve proxy is always running in HAProxy mode and re-pins via the same consistent-hash router (same ring, same replica) over the Serve handle path, so it serves the request correctly and preserves session affinity. The frontend now routes the `unknown_replica_id` reason to the fallback proxy rather than returning 503. Scope: only `unknown_replica_id` is recovered. `router_unreachable`, `router_non_200`, and `unparseable_replica_id` mean the router itself is broken and still fail loud. The Lua keeps arming `txn.ingress_request_router_failed`, so the `serve_haproxy_ingress_router_failures{reason}` metric still counts every pin-miss by reason. Adds test_pin_miss_falls_back_to_fallback_server: with the router pinning a replica absent from the server map, the request is served by the fallback (200) and the affinity-breaking primary backend is never selected. Signed-off-by: Seiji Eicher <seiji@anyscale.com>

gemini-code-assist Bot reviewed Jun 18, 2026

View reviewed changes

eicherseiji self-assigned this Jun 18, 2026

eicherseiji added the go add ONLY when ready to merge, run all tests label Jun 18, 2026

eicherseiji added 2 commits June 19, 2026 01:36

Revert "[llm] Warm up HAProxy data plane in direct-streaming session-…

8463e92

…affinity tests" This reverts commit 284f335. Signed-off-by: Seiji Eicher <seiji@anyscale.com>

eicherseiji changed the title ~~[llm] Warm up HAProxy data plane in direct-streaming session-affinity tests~~ [serve] Recover ingress-router pin-misses via the fallback proxy instead of 503 Jun 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[serve] Recover ingress-router pin-misses via the fallback proxy instead of 503#64218

[serve] Recover ingress-router pin-misses via the fallback proxy instead of 503#64218
eicherseiji wants to merge 3 commits into
ray-project:masterfrom
eicherseiji:llm-dp-direct-streaming-warmup

eicherseiji commented Jun 18, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

eicherseiji commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why are these changes needed?

Related issue number

Checks

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

eicherseiji commented Jun 18, 2026 •

edited

Loading