Configure rate limits on VirtualMCPServer PR B 2 by Sanskarzz · Pull Request #5522 · stacklok/toolhive

Sanskarzz · 2026-06-14T19:35:37Z

Summary

This PR adds optimizer-aware vMCP rate limiting using the post-refactor core.VMCP decorator seam.

PR #5276 wired VirtualMCPServer.spec.config.rateLimiting into the vMCP runtime. After the vMCP core refactor, optimizer mode resolves the call_tool meta-tool to the real backend tool before invoking core.VMCP.CallTool. That means rate limiting no longer needs HTTP parsed-request rewriting. It can sit below the optimizer at the core CallTool seam and key buckets directly by the resolved backend tool name.

Follow-up to #5276, aligned with the vMCP core refactor in #5606.

Fixes #4552

Type of change

Test plan

Unit tests (task test)
E2E tests (task test-e2e)
Linting (task lint-fix)
Manual testing (describe below)

API Compatibility

This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

This PR does not change the CRD schema or v1beta1 API surface.

Changes

File	Change
`pkg/ratelimit/errors.go`	Adds shared rate-limit error constants and typed `RateLimitedError`.
`pkg/ratelimit/limiter.go`	Adds reusable `Allow(...)` decision helper.
`pkg/ratelimit/middleware.go`	Reuses `Allow(...)` and extracts Redis-backed limiter construction into `NewRedisLimiter(...)`; existing HTTP middleware remains available.
`pkg/vmcp/ratelimit/decorator.go`	Adds a `core.VMCP` decorator that rate-limits `CallTool` using the resolved tool name.
`pkg/vmcp/ratelimit/factory/limiter.go`	Builds and cleans up the Redis-backed limiter from vMCP runtime config.
`pkg/vmcp/cli/serve.go`	Builds the vMCP rate limiter from runtime config and passes it to `server.Config.RateLimiter`.
`pkg/vmcp/server/server.go`	Wraps `core.New(...)` with the rate-limit decorator before Serve/session optimizer handling.
tests	Adds unit coverage for the decision helper, decorator, and limiter factory; keeps optimizer E2E coverage.

Does this introduce a user-facing change?

Yes. When the vMCP optimizer is enabled, per-tool rate limits apply to the resolved backend tool name reached through call_tool, rather than to the optimizer meta-tool name.

Implementation plan

Approved implementation plan

Reuse the existing pkg/ratelimit limiter and Redis setup instead of duplicating rate-limit logic in vMCP.
Extract the rate-limit decision into a reusable helper that returns a typed RateLimitedError.
Build a Redis-backed limiter from vMCP runtime config in pkg/vmcp/ratelimit/factory.
Pass the limiter through pkg/vmcp/cli/serve.go into server.Config.
Wrap core.VMCP with a rate-limit decorator immediately after core.New(...).
Let the session optimizer remain above the core decorator, so CallTool reaches the limiter with the resolved backend tool name.
Remove the previous parsed-request rewrite/context-juggling implementation.
Cover the new helper, factory, decorator, and optimizer behavior with tests.

Special notes for reviewers

This intentionally avoids optimizer-specific logic in pkg/ratelimit.
The vMCP decorator keys buckets using the name passed to core.VMCP.CallTool; after the Serve-path optimizer this is the resolved backend tool name.
pkg/ratelimit.NewMiddleware(...) remains available for existing HTTP middleware callers.
Extra reviewer attention requested: confirm the typed RateLimitedError is mapped to the desired client-visible rate-limit response on the Serve path.

Large PR Justification

This is a new feature package with a large test suite, and it needs to land as one coherent phase.

codecov · 2026-06-14T19:42:06Z

Codecov Report

❌ Patch coverage is 81.25000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.36%. Comparing base (17d3432) to head (5b31318).

Files with missing lines	Patch %	Lines
pkg/vmcp/cli/serve.go	0.00%	4 Missing ⚠️
pkg/ratelimit/errors.go	0.00%	3 Missing ⚠️
pkg/ratelimit/middleware.go	83.33%	2 Missing and 1 partial ⚠️
pkg/vmcp/server/server.go	0.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #5522   +/-   ##
=======================================
  Coverage   70.35%   70.36%           
=======================================
  Files         649      651    +2     
  Lines       66110    66152   +42     
=======================================
+ Hits        46513    46549   +36     
+ Misses      16254    16236   -18     
- Partials     3343     3367   +24

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jerm-dro · 2026-06-15T20:47:48Z

@Sanskarzz — for context, this is about Trey's in-flight vMCP interface refactor (epic #5419, RFC THV-0076), which splits vMCP into a domain core (the VMCP interface, core.New(cfg)) behind a thin transport layer (server.Serve). Phase 2 (#5431) re-homes the optimizer and the middleware chain onto that split. With that in mind —

Let's park this for now. In the post-refactor vMCP, per-tool rate limiting doesn't need to be HTTP middleware at all — it fits more naturally as a VMCP decorator at the CallTool seam. (It isn't in Trey's refactor scope simply because it hadn't landed when he started — not an oversight.)

The reason it fits: the optimizer is becoming the outermost VMCP layer, resolving call_tool → the real backend tool. Anything that needs to key off the real tool name therefore sits below the optimizer — authorization already does (it's the core admission seam, #5438), and rate limiting has the exact same dependency. So the flow becomes optimizer → {authz, rate-limit} → core: by the time CallTool reaches the limiter the tool name is already resolved, the bucket keys correctly, and the whole mechanism this PR adds to pkg/vmcp/ratelimit/factory (parsed-request rewrite, context juggling) disappears.
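The ordering argument can be checked with a self-contained toy that stacks the same limiter either above or below an optimizer layer. Everything here is a hypothetical stand-in: `VMCP`, `optimizer`, `rateLimited`, the `tool_name` argument, and the one-call-per-bucket policy are not toolhive types. With the limiter above the optimizer, every call shares the `call_tool` bucket. Below it, each backend tool gets its own bucket.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// VMCP is a trimmed-down, hypothetical version of the core.VMCP interface.
type VMCP interface {
	CallTool(ctx context.Context, name string, args map[string]any) (string, error)
}

// backend is a fake core that echoes the tool it was asked to run.
type backend struct{}

func (backend) CallTool(_ context.Context, name string, _ map[string]any) (string, error) {
	return "ran " + name, nil
}

// optimizer resolves the call_tool meta-tool to the real backend tool name.
type optimizer struct{ next VMCP }

func (o optimizer) CallTool(ctx context.Context, name string, args map[string]any) (string, error) {
	if name == "call_tool" {
		inner, ok := args["tool_name"].(string)
		if !ok {
			return "", errors.New("call_tool: missing tool_name")
		}
		return o.next.CallTool(ctx, inner, args)
	}
	return o.next.CallTool(ctx, name, args)
}

var errLimited = errors.New("rate limit exceeded")

// rateLimited allows one call per tool name it observes.
type rateLimited struct {
	next VMCP
	seen map[string]bool
}

func (r *rateLimited) CallTool(ctx context.Context, name string, args map[string]any) (string, error) {
	if r.seen[name] {
		return "", fmt.Errorf("%s: %w", name, errLimited)
	}
	r.seen[name] = true
	return r.next.CallTool(ctx, name, args)
}

// callTwoTools sends two different backend tools through call_tool and
// reports, per call, whether it was rate-limited.
func callTwoTools(v VMCP) []bool {
	var limited []bool
	for _, tool := range []string{"search", "fetch"} {
		_, err := v.CallTool(context.Background(), "call_tool", map[string]any{"tool_name": tool})
		limited = append(limited, errors.Is(err, errLimited))
	}
	return limited
}

// limiterAbove puts the limiter outside the optimizer, so it keys on "call_tool".
func limiterAbove() VMCP {
	return &rateLimited{next: optimizer{next: backend{}}, seen: map[string]bool{}}
}

// limiterBelow puts the limiter under the optimizer, so it keys on the backend name.
func limiterBelow() VMCP {
	return optimizer{next: &rateLimited{next: backend{}, seen: map[string]bool{}}}
}
```

In `limiterAbove`, the second, unrelated tool is throttled because both calls hit the shared `call_tool` bucket. `limiterBelow` is the optimizer → rate-limit → core order described above.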

Roughly (signatures illustrative — align with the post-#5431 VMCP interface):

// Composition root (cli/serve.go, post-refactor):
var v core.VMCP = core.New(coreCfg) // authz admission seam lives here (#5438) — below the optimizer
v = ratelimit.WrapVMCP(v, limiter)  // keys the per-tool bucket on the resolved tool name
v = optimizer.WrapVMCP(v, ...)      // outermost: resolves call_tool -> backend tool first
srv, _ := server.Serve(ctx, v, serverCfg)

// pkg/vmcp/ratelimit — a VMCP decorator instead of HTTP middleware:
type rateLimitedVMCP struct {
    core.VMCP // embed: every other method passes through unchanged
    limiter   Limiter
}

func (v *rateLimitedVMCP) CallTool(
    ctx context.Context, id *auth.Identity, tool string, args map[string]any, meta vmcp.Meta,
) (*vmcp.CallToolResult, error) {
    // `tool` is already the backend tool name — the optimizer decorator wraps
    // this one and resolved call_tool -> tool before delegating here.
    if err := v.limiter.Allow(ctx, id, tool); err != nil {
        return nil, err // typed rate-limit error; transport maps it to the JSON-RPC 429
    }
    return v.VMCP.CallTool(ctx, id, tool, args, meta)
}

The one bit of prep this needs: pkg/ratelimit currently bundles the bucket decision with HTTP concerns (identity extraction, writing the 429). The decorator only wants the decision — so we'd expose limiter.Allow(ctx, id, tool) error as a standalone call and leave the transport-shaped pieces (identity at the edge, 429 mapping) where they are. No behavior change, just splitting "decide" from "HTTP-wrap."

If you'd like to take this on as the VMCP-wrapper version, that'd be very welcome — otherwise we can fold it in once the refactor lands. Either way I'd rather not merge the middleware-layer change now and then unwind it.

CC @tgrunnagle for awareness — no action needed from you.

Sanskarzz · 2026-06-16T08:54:01Z

@jerm-dro Thanks, that makes sense. Once #5431 lands, I’m happy to rework this as a VMCP decorator and split the rate-limit decision from the HTTP transport response mapping as you suggested.

jerm-dro · 2026-06-24T17:46:54Z

@Sanskarzz — the refactor has landed. PR #5606 ships the first core.VMCP decorator (code mode), which is a working reference for how to rework rate limiting.

The structural template is pkg/vmcp/codemode/decorator.go in that PR:

type decorator struct {
    core.VMCP  // embed: all other methods promoted unchanged
    cfg Config
}

func (d *decorator) CallTool(
    ctx context.Context, identity *auth.Identity,
    name string, args map[string]any, meta map[string]any,
) (*vmcp.ToolCallResult, error) {
    // `name` is already the resolved backend tool name — the optimizer sits above
    // this layer at the session level and resolves call_tool → backend name before
    // CallTool is ever reached, so no inner-tool extraction is needed here.
    // ... rate-limit check on name ...
    return d.VMCP.CallTool(ctx, identity, name, args, meta)
}

The wiring in server.go (also from #5606) shows the insertion point — right after core.New, before the session-layer optimizer:

coreVMCP, err = core.New(deriveCoreConfig(...))
// ...
if cfg.RateLimitConfig != nil {
    coreVMCP = ratelimit.NewDecorator(coreVMCP, cfg.RateLimitConfig)
}
// optimizer wraps session tools above this, so call_tool → real name resolution
// happens before CallTool is invoked on the decorator

A couple of things carry forward from the original discussion:

The pkg/ratelimit.NewMiddleware bucket decision needs to be extracted as a standalone Allow(ctx, identity, tool) error before you can reuse it in the decorator cleanly — separate the "decide" from the "HTTP 429 write."
With name already resolved when CallTool is reached, the parsed-request rewrite and context juggling in pkg/vmcp/ratelimit/factory/middleware.go go away entirely.

Give pkg/vmcp/codemode a read for the full shape and let me know if anything's unclear.

Sanskarzz · 2026-06-24T20:42:20Z

@jerm-dro Thanks for the explanation. I'll rebase onto the latest main and look at the new core.VMCP decorator shape in codemode.

github-actions

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.

This review will be automatically dismissed once you add the justification section.

Large PR justification has been provided. Thank you!

github-actions · 2026-06-25T13:27:45Z

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

jerm-dro

Thanks for this, @Sanskarzz — the overall direction is right. Moving rate limiting to a core.VMCP decorator below the optimizer so buckets key on the resolved backend tool name is exactly the correct seam, and reusing pkg/ratelimit (the shared Allow helper + NewRedisLimiter) instead of duplicating logic is the right call. The decorator is clean too — it composes over the existing interface rather than modifying it.

One blocker before merge (inline): the typed RateLimitedError isn't surfaced to clients on the Serve path, so the -32029 code and RetryAfter are dropped — the mapping you flagged for review. I sketched a generic CodedError interface so the fix doesn't couple the transport handler to pkg/ratelimit. Plus a nitpick on fail-open vs fail-closed consistency. Happy to re-review once the client mapping is wired.

jerm-dro · 2026-06-25T21:24:18Z

+	args map[string]any, meta map[string]any,
+) (*vmcp.ToolCallResult, error) {
+	if err := baseratelimit.Allow(ctx, d.limiter, identity, name); err != nil {
+		return nil, err


blocker: This *RateLimitedError isn't surfaced to clients on the Serve path. It propagates to coreToolHandler (serve_handlers.go:196-203) and hits the generic return mcp.NewToolResultError(err.Error()), nil — a successful response with IsError: true and text "Rate limit exceeded". So the client gets neither the -32029 CodeRateLimited code nor RetryAfter; both are only honored by the HTTP middleware path the optimizer bypasses, and this PR removes the test that covered the code. This is the mapping you flagged for review — it's currently unwired.

I'd avoid teaching the transport handler about *RateLimitedError directly (it couples serve_handlers to pkg/ratelimit). Instead, define a small capability interface that domain errors opt into, and have the handler ask for it generically:

// neutral home both layers can import without a cycle (e.g. pkg/mcp)

// CodedError is implemented by domain errors that should surface a specific
// JSON-RPC error code (and optional structured data) instead of degrading to
// a generic tool error.
type CodedError interface {
    error
    Code() int64          // JSON-RPC error code
    Data() map[string]any // optional structured data (nil if none)
}

RateLimitedError implements it:

func (*RateLimitedError) Code() int64 { return CodeRateLimited }

func (e *RateLimitedError) Data() map[string]any {
    return map[string]any{"retryAfterSeconds": e.RetryAfter.Seconds()}
}

Then the handler has one generic branch and no knowledge of rate limiting:

if err != nil {
    var coded mcp.CodedError
    if errors.As(err, &coded) {
        return codedErrorResult(req.Params.ID, coded), nil
    }
    return mcp.NewToolResultError(err.Error()), nil
}

Bonus: the existing ErrAuthorizationFailed special-case just above can fold into the same path by having the authz error implement CodedError (with a generic message), removing that branch too. A natural home for the error → result conversion is the conversion package, so serve_handlers stays thin.

One caveat regardless of shape: mcp-go tool handlers return (*mcp.CallToolResult, error), and a returned error typically yields a generic -32603. Please confirm the SDK can actually carry a custom -32029 + a Retry-After hint on this path; if it can't, say so explicitly and drop the unreachable CodeRateLimited/RetryAfter plumbing rather than leaving dead intent. Either way, add a Serve-path test asserting the contract you land on.

I checked the mcp-go v0.55.0 Serve path while wiring this. Its handleToolCall wraps returned tool-handler errors as JSON-RPC INTERNAL_ERROR (-32603), so this seam cannot currently emit custom -32029 + RetryAfter without a lower-level transport hook or upstream mcp-go support.

Given that limitation, I avoided adding dead Code()/Data() plumbing. Instead, I added a small neutral pkg/mcp.RequestError marker for domain errors that should fail the MCP request rather than being converted into a successful CallToolResult with IsError=true. RateLimitedError implements that marker, and both the direct core tool handler and optimizer call_tool handler now preserve it as a handler error.

jerm-dro · 2026-06-25T21:24:19Z

+	ctx context.Context, identity *auth.Identity, name string,
+	args map[string]any, meta map[string]any,
+) (*vmcp.ToolCallResult, error) {
+	if err := baseratelimit.Allow(ctx, d.limiter, identity, name); err != nil {


nitpick: Allow returns three outcomes — nil, *RateLimitedError, or the raw limiter error (e.g. Redis unreachable). This returns the raw error too, so the decorator fails closed: a Redis blip fails every vMCP tool call. The HTTP middleware path consuming the same Allow fails open (slog.Warn("rate limit check failed, allowing request") → proceeds). Worth matching that posture here — block only on *RateLimitedError, log-and-allow on infra errors — so the two consumers of Allow behave consistently and a limiter outage doesn't take down tool calls. If fail-closed is actually desired for vMCP, a one-line comment saying so would settle it.

Fixed. The vMCP rate-limit decorator now matches the HTTP middleware posture: it blocks only on *RateLimitedError and logs/allows raw limiter infrastructure errors like Redis failures.

That keeps limiter outages from taking down all vMCP tool calls while still enforcing actual rate-limit denials.

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

…VMCP decorator seam. Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

jerm-dro · 2026-06-26T21:51:46Z

+		cleanupRedis(redisName)
+	})
+
+	ginkgo.It("rate-limits call_tool by the inner backend tool name", func() {


blocker: This spec fails consistently on all three K8s versions (:360 — Expected an error to have occurred. Got: <nil>): the second call_tool isn't being rate-limited.

Rather than debug this through a full Kind + Redis + operator + embedding-server e2e (slow, and hard to see what's going on inside), I'd suggest dropping this spec and reproducing the same scenario as a pkg/vmcp/server integration test: wire the real ratelimit.NewDecorator over a fake core and drive it through the optimizer call_tool handler, with a mock optimizer and mock embedding server (and an in-memory limiter instead of Redis).

That keeps the test fast and deterministic, exercises the real decorator (unlike the forced-RateLimitedError unit test), and gives you a much easier place to track this down. Rate limiting through the optimizer should work, so I don't think we want to ship with it silently not enforcing.

Sanskarzz requested review from ChrisJBurns, JAORMX, amirejaz, blkt, jerm-dro, jhrozek, rdimitrov, reyortiz3 and tgrunnagle as code owners June 14, 2026 19:35

github-actions Bot added the size/M Medium PR: 300-599 lines changed label Jun 14, 2026

Sanskarzz changed the title ~~Add ratelimiting support for vmcp optimizer~~ Configure rate limits on VirtualMCPServer PR B 2 Jun 14, 2026

github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Jun 14, 2026

Sanskarzz force-pushed the ratelimitingVMCP3 branch from 57c3c4e to 7b9e3a2 Compare June 25, 2026 12:57

github-actions Bot added size/M Medium PR: 300-599 lines changed size/XL Extra large PR: 1000+ lines changed and removed size/M Medium PR: 300-599 lines changed labels Jun 25, 2026

github-actions Bot previously requested changes Jun 25, 2026

View reviewed changes

Sanskarzz requested a review from aponcedeleonch as a code owner June 25, 2026 13:21

github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 25, 2026

Sanskarzz force-pushed the ratelimitingVMCP3 branch from 1c23728 to 7b9e3a2 Compare June 25, 2026 14:37

github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 25, 2026

jerm-dro reviewed Jun 25, 2026

View reviewed changes

Sanskarzz added 4 commits June 26, 2026 14:43

Add ratelimiting support for vmcp optimizer

14bc749

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

Adds optimizer-aware vMCP rate limiting using the post-refactor core.…

bfac4d2

…VMCP decorator seam. Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

fix lint

5a4ee45

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

fix addressed review comments

5b31318

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

Sanskarzz force-pushed the ratelimitingVMCP3 branch from 3ffd4fe to 5b31318 Compare June 26, 2026 09:14

github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 26, 2026

jerm-dro reviewed Jun 26, 2026

View reviewed changes

Uh oh!

Conversation

Sanskarzz commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Type of change

Test plan

API Compatibility

Changes

Does this introduce a user-facing change?

Implementation plan

Special notes for reviewers

Large PR Justification

Uh oh!

codecov Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jerm-dro commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Sanskarzz commented Jun 16, 2026

Uh oh!

jerm-dro commented Jun 24, 2026

Uh oh!

Sanskarzz commented Jun 24, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Large PR Detected

How to unblock this PR:

Alternative:

Uh oh!

github-actions Bot commented Jun 25, 2026

Uh oh!

jerm-dro left a comment

Choose a reason for hiding this comment

Uh oh!

jerm-dro Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sanskarzz Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

jerm-dro Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sanskarzz Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

jerm-dro Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Sanskarzz commented Jun 14, 2026 •

edited

Loading

codecov Bot commented Jun 14, 2026 •

edited

Loading

jerm-dro commented Jun 15, 2026 •

edited

Loading