Configure rate limits on VirtualMCPServer PR B 2 #5522

Open
Sanskarzz wants to merge 4 commits into stacklok:main from Sanskarzz:ratelimitingVMCP3

@Sanskarzz Sanskarzz commented Jun 14, 2026

Contributor

Summary

This PR adds optimizer-aware vMCP rate limiting using the post-refactor core.VMCP decorator seam.

PR #5276 wired VirtualMCPServer.spec.config.rateLimiting into the vMCP runtime. After the vMCP core refactor, optimizer mode resolves the call_tool meta-tool to the real backend tool before invoking core.VMCP.CallTool. That means rate limiting no longer needs HTTP parsed-request rewriting. It can sit below the optimizer at the core CallTool seam and key buckets directly by the resolved backend tool name.

Follow-up to #5276 and aligned with the vMCP core refactor in #5606.

Fixes #4552

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

This PR does not change the CRD schema or v1beta1 API surface.

Changes

| File | Change |
| --- | --- |
| pkg/ratelimit/errors.go | Adds shared rate-limit error constants and typed RateLimitedError. |
| pkg/ratelimit/limiter.go | Adds reusable Allow(...) decision helper. |
| pkg/ratelimit/middleware.go | Reuses Allow(...) and extracts Redis-backed limiter construction into NewRedisLimiter(...); existing HTTP middleware remains available. |
| pkg/vmcp/ratelimit/decorator.go | Adds a core.VMCP decorator that rate-limits CallTool using the resolved tool name. |
| pkg/vmcp/ratelimit/factory/limiter.go | Builds and cleans up the Redis-backed limiter from vMCP runtime config. |
| pkg/vmcp/cli/serve.go | Builds the vMCP rate limiter from runtime config and passes it to server.Config.RateLimiter. |
| pkg/vmcp/server/server.go | Wraps core.New(...) with the rate-limit decorator before Serve/session optimizer handling. |
| tests | Adds unit coverage for the decision helper, decorator, and limiter factory; keeps optimizer E2E coverage. |

Does this introduce a user-facing change?

Yes. When vMCP optimizer is enabled, per-tool rate limits apply to the resolved backend tool name reached through call_tool, rather than the optimizer meta-tool name.

Implementation plan

Approved implementation plan
  1. Reuse the existing pkg/ratelimit limiter and Redis setup instead of duplicating rate-limit logic in vMCP.
  2. Extract the rate-limit decision into a reusable helper that returns a typed RateLimitedError.
  3. Build a Redis-backed limiter from vMCP runtime config in pkg/vmcp/ratelimit/factory.
  4. Pass the limiter through pkg/vmcp/cli/serve.go into server.Config.
  5. Wrap core.VMCP with a rate-limit decorator immediately after core.New(...).
  6. Let the session optimizer remain above the core decorator, so CallTool reaches the limiter with the resolved backend tool name.
  7. Remove the previous parsed-request rewrite/context-juggling implementation.
  8. Cover the new helper, factory, decorator, and optimizer behavior with tests.

Special notes for reviewers

  • This intentionally avoids optimizer-specific logic in pkg/ratelimit.
  • The vMCP decorator keys buckets using the name passed to core.VMCP.CallTool; after the Serve-path optimizer this is the resolved backend tool name.
  • pkg/ratelimit.NewMiddleware(...) remains available for existing HTTP middleware callers.
  • Extra reviewer attention requested: confirm the typed RateLimitedError is mapped to the desired client-visible rate-limit response on the Serve path.

Large PR Justification

This is a new feature package with a large test suite, and it needs to land as one coherent phase.

@github-actions github-actions Bot added the size/M Medium PR: 300-599 lines changed label Jun 14, 2026
@Sanskarzz Sanskarzz changed the title Add ratelimiting support for vmcp optimizer Configure rate limits on VirtualMCPServer PR B 2 Jun 14, 2026
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Jun 14, 2026
@codecov

codecov Bot commented Jun 14, 2026


Codecov Report

❌ Patch coverage is 81.25000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.36%. Comparing base (17d3432) to head (5b31318).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/vmcp/cli/serve.go | 0.00% | 4 Missing ⚠️ |
| pkg/ratelimit/errors.go | 0.00% | 3 Missing ⚠️ |
| pkg/ratelimit/middleware.go | 83.33% | 2 Missing and 1 partial ⚠️ |
| pkg/vmcp/server/server.go | 0.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5522   +/-   ##
=======================================
  Coverage   70.35%   70.36%           
=======================================
  Files         649      651    +2     
  Lines       66110    66152   +42     
=======================================
+ Hits        46513    46549   +36     
+ Misses      16254    16236   -18     
- Partials     3343     3367   +24     

jerm-dro commented Jun 15, 2026

Contributor

@Sanskarzz — for context, this is about Trey's in-flight vMCP interface refactor (epic #5419, RFC THV-0076), which splits vMCP into a domain core (the VMCP interface, core.New(cfg)) behind a thin transport layer (server.Serve). Phase 2 (#5431) re-homes the optimizer and the middleware chain onto that split. With that in mind —

Let's park this for now. In the post-refactor vMCP, per-tool rate limiting doesn't need to be HTTP middleware at all — it fits more naturally as a VMCP decorator at the CallTool seam. (It isn't in Trey's refactor scope simply because it hadn't landed when he started — not an oversight.)

The reason it fits: the optimizer is becoming the outermost VMCP layer, resolving call_tool → the real backend tool. Anything that needs to key off the real tool name therefore sits below the optimizer — authorization already does (it's the core admission seam, #5438), and rate limiting has the exact same dependency. So the flow becomes optimizer → {authz, rate-limit} → core: by the time CallTool reaches the limiter the tool name is already resolved, the bucket keys correctly, and the whole mechanism this PR adds to pkg/vmcp/ratelimit/factory (parsed-request rewrite, context juggling) disappears.

Roughly (signatures illustrative — align with the post-#5431 VMCP interface):

// Composition root (cli/serve.go, post-refactor):
var v core.VMCP = core.New(coreCfg) // authz admission seam lives here (#5438) — below the optimizer
v = ratelimit.WrapVMCP(v, limiter)  // keys the per-tool bucket on the resolved tool name
v = optimizer.WrapVMCP(v, ...)      // outermost: resolves call_tool -> backend tool first
srv, _ := server.Serve(ctx, v, serverCfg)

// pkg/vmcp/ratelimit — a VMCP decorator instead of HTTP middleware:
type rateLimitedVMCP struct {
    core.VMCP // embed: every other method passes through unchanged
    limiter   Limiter
}

func (v *rateLimitedVMCP) CallTool(
    ctx context.Context, id *auth.Identity, tool string, args map[string]any, meta vmcp.Meta,
) (*vmcp.CallToolResult, error) {
    // `tool` is already the backend tool name — the optimizer decorator wraps
    // this one and resolved call_tool -> tool before delegating here.
    if err := v.limiter.Allow(ctx, id, tool); err != nil {
        return nil, err // typed rate-limit error; transport maps it to the JSON-RPC 429
    }
    return v.VMCP.CallTool(ctx, id, tool, args, meta)
}

The one bit of prep this needs: pkg/ratelimit currently bundles the bucket decision with HTTP concerns (identity extraction, writing the 429). The decorator only wants the decision — so we'd expose limiter.Allow(ctx, id, tool) error as a standalone call and leave the transport-shaped pieces (identity at the edge, 429 mapping) where they are. No behavior change, just splitting "decide" from "HTTP-wrap."

If you'd like to take this on as the VMCP-wrapper version, that'd be very welcome — otherwise we can fold it in once the refactor lands. Either way I'd rather not merge the middleware-layer change now and then unwind it.

CC @tgrunnagle for awareness — no action needed from you.

@Sanskarzz

Contributor Author

@jerm-dro Thanks, that makes sense. Once #5431 lands, I’m happy to rework this as a VMCP decorator and split the rate-limit decision from the HTTP transport response mapping as you suggested.

@jerm-dro

Contributor

@Sanskarzz — the refactor has landed. PR #5606 ships the first core.VMCP decorator (code mode), which is a working reference for how to rework rate limiting.

The structural template is pkg/vmcp/codemode/decorator.go in that PR:

type decorator struct {
    core.VMCP  // embed: all other methods promoted unchanged
    cfg Config
}

func (d *decorator) CallTool(
    ctx context.Context, identity *auth.Identity,
    name string, args map[string]any, meta map[string]any,
) (*vmcp.ToolCallResult, error) {
    // `name` is already the resolved backend tool name — the optimizer sits above
    // this layer at the session level and resolves call_tool → backend name before
    // CallTool is ever reached, so no inner-tool extraction is needed here.
    // ... rate-limit check on name ...
    return d.VMCP.CallTool(ctx, identity, name, args, meta)
}

The wiring in server.go (also from #5606) shows the insertion point — right after core.New, before the session-layer optimizer:

coreVMCP, err = core.New(deriveCoreConfig(...))
// ...
if cfg.RateLimitConfig != nil {
    coreVMCP = ratelimit.NewDecorator(coreVMCP, cfg.RateLimitConfig)
}
// optimizer wraps session tools above this, so call_tool → real name resolution
// happens before CallTool is invoked on the decorator

A couple of things carry forward from the original discussion:

  • The pkg/ratelimit.NewMiddleware bucket decision needs to be extracted as a standalone Allow(ctx, identity, tool) error before you can reuse it in the decorator cleanly — separate the "decide" from the "HTTP 429 write."
  • With name already resolved when CallTool is reached, the parsed-request rewrite and context juggling in pkg/vmcp/ratelimit/factory/middleware.go go away entirely.

Give pkg/vmcp/codemode a read for the full shape and let me know if anything's unclear.

@Sanskarzz

Contributor Author

@jerm-dro Thanks for the explanation. I'll rebase onto the latest main and study the new core.VMCP decorator shape in codemode.

@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP3 branch from 57c3c4e to 7b9e3a2 Compare June 25, 2026 12:57
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed size/XL Extra large PR: 1000+ lines changed and removed size/M Medium PR: 300-599 lines changed labels Jun 25, 2026

@github-actions github-actions Bot left a comment

Contributor

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 25, 2026
@github-actions github-actions Bot dismissed their stale review June 25, 2026 13:27

Large PR justification has been provided. Thank you!

@github-actions

Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 25, 2026
@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP3 branch from 1c23728 to 7b9e3a2 Compare June 25, 2026 14:37
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 25, 2026

@jerm-dro jerm-dro left a comment

Contributor

Thanks for this, @Sanskarzz — the overall direction is right. Moving rate limiting to a core.VMCP decorator below the optimizer so buckets key on the resolved backend tool name is exactly the correct seam, and reusing pkg/ratelimit (the shared Allow helper + NewRedisLimiter) instead of duplicating logic is the right call. The decorator is clean too — it composes over the existing interface rather than modifying it.

One blocker before merge (inline): the typed RateLimitedError isn't surfaced to clients on the Serve path, so the -32029 code and RetryAfter are dropped — the mapping you flagged for review. I sketched a generic CodedError interface so the fix doesn't couple the transport handler to pkg/ratelimit. Plus a nitpick on fail-open vs fail-closed consistency. Happy to re-review once the client mapping is wired.

Comment thread pkg/vmcp/ratelimit/decorator.go Outdated
    args map[string]any, meta map[string]any,
) (*vmcp.ToolCallResult, error) {
    if err := baseratelimit.Allow(ctx, d.limiter, identity, name); err != nil {
        return nil, err

Contributor

blocker: This *RateLimitedError isn't surfaced to clients on the Serve path. It propagates to coreToolHandler (serve_handlers.go:196-203) and hits the generic return mcp.NewToolResultError(err.Error()), nil — a successful response with IsError: true and text "Rate limit exceeded". So the client gets neither the -32029 CodeRateLimited code nor RetryAfter; both are only honored by the HTTP middleware path the optimizer bypasses, and this PR removes the test that covered the code. This is the mapping you flagged for review — it's currently unwired.

I'd avoid teaching the transport handler about *RateLimitedError directly (it couples serve_handlers to pkg/ratelimit). Instead, define a small capability interface that domain errors opt into, and have the handler ask for it generically:

// neutral home both layers can import without a cycle (e.g. pkg/mcp)
// CodedError is implemented by domain errors that should surface a specific
// JSON-RPC error code (and optional structured data) instead of degrading to
// a generic tool error.
type CodedError interface {
    error
    Code() int64            // JSON-RPC error code
    Data() map[string]any   // optional structured data (nil if none)
}

RateLimitedError implements it:

func (*RateLimitedError) Code() int64 { return CodeRateLimited }
func (e *RateLimitedError) Data() map[string]any {
    return map[string]any{"retryAfterSeconds": e.RetryAfter.Seconds()}
}

Then the handler has one generic branch and no knowledge of rate limiting:

if err != nil {
    var coded mcp.CodedError
    if errors.As(err, &coded) {
        return codedErrorResult(req.Params.ID, coded), nil
    }
    return mcp.NewToolResultError(err.Error()), nil
}

Bonus: the existing ErrAuthorizationFailed special-case just above can fold into the same path by having the authz error implement CodedError (with a generic message), removing that branch too. A natural home for the error → result conversion is the conversion package, so serve_handlers stays thin.

One caveat regardless of shape: mcp-go tool handlers return (*mcp.CallToolResult, error), and a returned error typically yields a generic -32603. Please confirm the SDK can actually carry a custom -32029 + a Retry-After hint on this path; if it can't, say so explicitly and drop the unreachable CodeRateLimited/RetryAfter plumbing rather than leaving dead intent. Either way, add a Serve-path test asserting the contract you land on.

Contributor Author

I checked the mcp-go v0.55.0 Serve path while wiring this. Its handleToolCall wraps returned tool-handler errors as JSON-RPC INTERNAL_ERROR (-32603), so this seam cannot currently emit custom -32029 + RetryAfter without a lower-level transport hook or upstream mcp-go support.

Given that limitation, I avoided adding dead Code()/Data() plumbing. Instead, I added a small neutral pkg/mcp.RequestError marker for domain errors that should fail the MCP request rather than being converted into a successful CallToolResult with IsError=true. RateLimitedError implements that marker, and both the direct core tool handler and optimizer call_tool handler now preserve it as a handler error.

    ctx context.Context, identity *auth.Identity, name string,
    args map[string]any, meta map[string]any,
) (*vmcp.ToolCallResult, error) {
    if err := baseratelimit.Allow(ctx, d.limiter, identity, name); err != nil {

Contributor

nitpick: Allow returns three outcomes — nil, *RateLimitedError, or the raw limiter error (e.g. Redis unreachable). This returns the raw error too, so the decorator fails closed: a Redis blip fails every vMCP tool call. The HTTP middleware path consuming the same Allow fails open (slog.Warn("rate limit check failed, allowing request") → proceeds). Worth matching that posture here — block only on *RateLimitedError, log-and-allow on infra errors — so the two consumers of Allow behave consistently and a limiter outage doesn't take down tool calls. If fail-closed is actually desired for vMCP, a one-line comment saying so would settle it.

Contributor Author

Fixed. The vMCP rate-limit decorator now matches the HTTP middleware posture: it blocks only on *RateLimitedError and logs/allows raw limiter infrastructure errors like Redis failures.

That keeps limiter outages from taking down all vMCP tool calls while still enforcing actual rate-limit denials.

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
…VMCP decorator seam.

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP3 branch from 3ffd4fe to 5b31318 Compare June 26, 2026 09:14
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 26, 2026
        cleanupRedis(redisName)
    })

    ginkgo.It("rate-limits call_tool by the inner backend tool name", func() {

Contributor

blocker: This spec fails consistently on all three K8s versions (:360 Expected an error to have occurred. Got: <nil>): the second call_tool isn't being rate-limited.

Rather than debug this through a full Kind + Redis + operator + embedding-server e2e (slow, and hard to see what's going on inside), I'd suggest dropping this spec and reproducing the same scenario as a pkg/vmcp/server integration test: wire the real ratelimit.NewDecorator over a fake core and drive it through the optimizer call_tool handler, with a mock optimizer and mock embedding server (and an in-memory limiter instead of Redis).

That keeps the test fast and deterministic, exercises the real decorator (unlike the forced-RateLimitedError unit test), and gives you a much easier place to track this down. Rate limiting through the optimizer should work, so I don't think we want to ship with it silently not enforcing.

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Configure rate limits on VirtualMCPServer

2 participants