Skip to content

Recover JSON-RPC messages with malformed unicode escapes#1722

Draft
ellismg wants to merge 1 commit into
mainfrom
ellismg/harden-jsonrpc-surrogate-decode
Draft

Recover JSON-RPC messages with malformed unicode escapes#1722
ellismg wants to merge 1 commit into
mainfrom
ellismg/harden-jsonrpc-surrogate-decode

Conversation

@ellismg

@ellismg ellismg commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

The Rust SDK's JSON-RPC read loop parsed each framed body with a strict serde_json::from_slice. A backend/CLI response containing a malformed \u escape — specifically a lone (unpaired) UTF-16 surrogate — fails with unexpected end of hex escape. That single parse error aborts the entire read loop (JsonRpcClient::read_loop), drains all pending requests, and tears down the transport. The list-client then reconnects and re-breaks on the same deterministic bytes, so every RPC on that channel (list_models, get_account_quota, list_global_agents, list_global_skills) fails together — on the github-app desktop app this manifests as a permanently empty model dropdown, quota, and agent/skill lists.

This is the consumer-side defensive fix for github/app#1055. Sibling producer-side fixes already landed in copilot-agent-runtime (#10300, #10231) to stop the CLI from emitting lone surrogates; this SDK change hardens the reader so a bad payload from any source (including an enterprise backend) can't brick the channel.

What changed

  • New rust/src/surrogate_safe.rssanitize_json_escapes(&[u8]) -> (Cow<[u8]>, usize), a string-context-aware byte scanner that rewrites lone high/low UTF-16 surrogates and otherwise-malformed \u escapes to \ufffd (U+FFFD). It preserves valid surrogate pairs, valid BMP escapes, and never misreads an escaped \\u. Returns Cow::Borrowed (zero allocation) when the input is already strict-JSON safe.
  • rust/src/jsonrpc.rsJsonRpcClient::read_message now recovers on the parse-error path only: on failure it runs the sanitizer; if something was actually repaired (Cow::Owned) it warn!s with the replacement count + original error and re-parses; if nothing needed repair (Cow::Borrowed) the original error propagates unchanged. The happy path stays zero-cost.
  • rust/tests/jsonrpc_test.rs — adds malformed_unicode_escape_is_recovered_and_loop_survives, a regression test that frames a notification carrying a lone high surrogate \ud83d, asserts it's delivered with the surrogate replaced by U+FFFD, and asserts a following well-formed message still arrives (proving the read loop survived).

No public API change and no consumer call-site change. read_message is the single chokepoint for every message from the CLI subprocess, so one guard here covers list-client RPCs, session-stream notifications, and resume frames. This mirrors the runtime's approach of substituting U+FFFD rather than dropping data.

Validation

Run from rust/ (matching .github/workflows/rust-sdk-tests.yml):

  • cargo +nightly-2026-04-14 fmt --all -- --config-path .rustfmt.nightly.toml --check
  • cargo clippy --all-targets --features test-support -- --no-deps -D warnings -D clippy::unwrap_used -D clippy::disallowed_macros -D clippy::await_holding_invalid_type
  • cargo test --no-default-features --features test-support — 13 surrogate_safe unit tests + the new integration test + the existing jsonrpc_test suite all pass ✅ (the e2e suite requires a locally installed CLI and is unrelated to this change.)

Fixes the SDK side of github/app#1055.

The Rust SDK's JSON-RPC read loop parsed each framed body with a strict
serde_json::from_slice. A response containing a malformed `\u` escape —
notably a lone (unpaired) UTF-16 surrogate — failed with "unexpected end
of hex escape", which aborted JsonRpcClient::read_loop, drained all
pending requests, and tore down the transport. The channel then
reconnected and re-broke on the same deterministic bytes, so every RPC on
it (list_models, get_account_quota, list_global_agents, list_global_skills)
failed together (github/app#1055).

Add a string-context-aware sanitizer (surrogate_safe::sanitize_json_escapes)
that rewrites lone surrogates and otherwise-malformed `\u` escapes to
U+FFFD while preserving valid surrogate pairs and escaped backslashes. It
runs only on the parse-error path, so well-formed traffic is unaffected
and zero-allocation (Cow::Borrowed) when nothing needs repair. On
recovery the reader warns with the replacement count and re-parses;
unrelated syntax errors still propagate unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet