Configure rate limits on VirtualMCPServer PR B 2#5522
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5522 +/- ##
=======================================
Coverage 70.35% 70.36%
=======================================
Files 649 651 +2
Lines 66110 66152 +42
=======================================
+ Hits 46513 46549 +36
+ Misses 16254 16236 -18
- Partials 3343 3367 +24 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
@Sanskarzz — for context, this is about Trey's in-flight vMCP interface refactor (epic #5419, RFC THV-0076), which splits vMCP into a domain core (the Let's park this for now. In the post-refactor vMCP, per-tool rate limiting doesn't need to be HTTP middleware at all — it fits more naturally as a The reason it fits: the optimizer is becoming the outermost Roughly (signatures illustrative — align with the post-#5431 // Composition root (cli/serve.go, post-refactor):
var v core.VMCP = core.New(coreCfg) // authz admission seam lives here (#5438) — below the optimizer
v = ratelimit.WrapVMCP(v, limiter) // keys the per-tool bucket on the resolved tool name
v = optimizer.WrapVMCP(v, ...) // outermost: resolves call_tool -> backend tool first
srv, _ := server.Serve(ctx, v, serverCfg)// pkg/vmcp/ratelimit — a VMCP decorator instead of HTTP middleware:
type rateLimitedVMCP struct {
core.VMCP // embed: every other method passes through unchanged
limiter Limiter
}
func (v *rateLimitedVMCP) CallTool(
ctx context.Context, id *auth.Identity, tool string, args map[string]any, meta vmcp.Meta,
) (*vmcp.CallToolResult, error) {
// `tool` is already the backend tool name — the optimizer decorator wraps
// this one and resolved call_tool -> tool before delegating here.
if err := v.limiter.Allow(ctx, id, tool); err != nil {
return nil, err // typed rate-limit error; transport maps it to the JSON-RPC 429
}
return v.VMCP.CallTool(ctx, id, tool, args, meta)
}The one bit of prep this needs: If you'd like to take this on as the CC @tgrunnagle for awareness — no action needed from you. |
|
@Sanskarzz — the refactor has landed. PR #5606 ships the first The structural template is type decorator struct {
core.VMCP // embed: all other methods promoted unchanged
cfg Config
}
func (d *decorator) CallTool(
ctx context.Context, identity *auth.Identity,
name string, args map[string]any, meta map[string]any,
) (*vmcp.ToolCallResult, error) {
// `name` is already the resolved backend tool name — the optimizer sits above
// this layer at the session level and resolves call_tool → backend name before
// CallTool is ever reached, so no inner-tool extraction is needed here.
// ... rate-limit check on name ...
return d.VMCP.CallTool(ctx, identity, name, args, meta)
}The wiring in coreVMCP, err = core.New(deriveCoreConfig(...))
// ...
if cfg.RateLimitConfig != nil {
coreVMCP = ratelimit.NewDecorator(coreVMCP, cfg.RateLimitConfig)
}
// optimizer wraps session tools above this, so call_tool → real name resolution
// happens before CallTool is invoked on the decoratorA couple of things carry forward from the original discussion:
Give |
|
@jerm-dro Thanks for the explanation. I will rebase onto the latest main and will see the new core.VMCP/decorator shape in codemode. |
57c3c4e to
7b9e3a2
Compare
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
1c23728 to
7b9e3a2
Compare
jerm-dro
left a comment
There was a problem hiding this comment.
Thanks for this, @Sanskarzz — the overall direction is right. Moving rate limiting to a core.VMCP decorator below the optimizer so buckets key on the resolved backend tool name is exactly the correct seam, and reusing pkg/ratelimit (the shared Allow helper + NewRedisLimiter) instead of duplicating logic is the right call. The decorator is clean too — it composes over the existing interface rather than modifying it.
One blocker before merge (inline): the typed RateLimitedError isn't surfaced to clients on the Serve path, so the -32029 code and RetryAfter are dropped — the mapping you flagged for review. I sketched a generic CodedError interface so the fix doesn't couple the transport handler to pkg/ratelimit. Plus a nitpick on fail-open vs fail-closed consistency. Happy to re-review once the client mapping is wired.
| args map[string]any, meta map[string]any, | ||
| ) (*vmcp.ToolCallResult, error) { | ||
| if err := baseratelimit.Allow(ctx, d.limiter, identity, name); err != nil { | ||
| return nil, err |
There was a problem hiding this comment.
blocker: This *RateLimitedError isn't surfaced to clients on the Serve path. It propagates to coreToolHandler (serve_handlers.go:196-203) and hits the generic return mcp.NewToolResultError(err.Error()), nil — a successful response with IsError: true and text "Rate limit exceeded". So the client gets neither the -32029 CodeRateLimited code nor RetryAfter; both are only honored by the HTTP middleware path the optimizer bypasses, and this PR removes the test that covered the code. This is the mapping you flagged for review — it's currently unwired.
I'd avoid teaching the transport handler about *RateLimitedError directly (it couples serve_handlers to pkg/ratelimit). Instead, define a small capability interface that domain errors opt into, and have the handler ask for it generically:
// neutral home both layers can import without a cycle (e.g. pkg/mcp)
// CodedError is implemented by domain errors that should surface a specific
// JSON-RPC error code (and optional structured data) instead of degrading to
// a generic tool error.
type CodedError interface {
error
Code() int64 // JSON-RPC error code
Data() map[string]any // optional structured data (nil if none)
}RateLimitedError implements it:
func (*RateLimitedError) Code() int64 { return CodeRateLimited }
func (e *RateLimitedError) Data() map[string]any {
return map[string]any{"retryAfterSeconds": e.RetryAfter.Seconds()}
}Then the handler has one generic branch and no knowledge of rate limiting:
if err != nil {
var coded mcp.CodedError
if errors.As(err, &coded) {
return codedErrorResult(req.Params.ID, coded), nil
}
return mcp.NewToolResultError(err.Error()), nil
}Bonus: the existing ErrAuthorizationFailed special-case just above can fold into the same path by having the authz error implement CodedError (with a generic message), removing that branch too. A natural home for the error → result conversion is the conversion package, so serve_handlers stays thin.
One caveat regardless of shape: mcp-go tool handlers return (*mcp.CallToolResult, error), and a returned error typically yields a generic -32603. Please confirm the SDK can actually carry a custom -32029 + a Retry-After hint on this path; if it can't, say so explicitly and drop the unreachable CodeRateLimited/RetryAfter plumbing rather than leaving dead intent. Either way, add a Serve-path test asserting the contract you land on.
There was a problem hiding this comment.
I checked the mcp-go v0.55.0 Serve path while wiring this. Its handleToolCall wraps returned tool-handler errors as JSON-RPC INTERNAL_ERROR (-32603), so this seam cannot currently emit custom -32029 + RetryAfter without a lower-level transport hook or upstream mcp-go support.
Given that limitation, I avoided adding dead Code()/Data() plumbing. Instead, I added a small neutral pkg/mcp.RequestError marker for domain errors that should fail the MCP request rather than being converted into a successful CallToolResult with IsError=true. RateLimitedError implements that marker, and both the direct core tool handler and optimizer call_tool handler now preserve it as a handler error.
| ctx context.Context, identity *auth.Identity, name string, | ||
| args map[string]any, meta map[string]any, | ||
| ) (*vmcp.ToolCallResult, error) { | ||
| if err := baseratelimit.Allow(ctx, d.limiter, identity, name); err != nil { |
There was a problem hiding this comment.
nitpick: Allow returns three outcomes — nil, *RateLimitedError, or the raw limiter error (e.g. Redis unreachable). This returns the raw error too, so the decorator fails closed: a Redis blip fails every vMCP tool call. The HTTP middleware path consuming the same Allow fails open (slog.Warn("rate limit check failed, allowing request") → proceeds). Worth matching that posture here — block only on *RateLimitedError, log-and-allow on infra errors — so the two consumers of Allow behave consistently and a limiter outage doesn't take down tool calls. If fail-closed is actually desired for vMCP, a one-line comment saying so would settle it.
There was a problem hiding this comment.
Fixed. The vMCP rate-limit decorator now matches the HTTP middleware posture: it blocks only on *RateLimitedError and logs/allows raw limiter infrastructure errors like Redis failures.
That keeps limiter outages from taking down all vMCP tool calls while still enforcing actual rate-limit denials.
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
…VMCP decorator seam. Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
3ffd4fe to
5b31318
Compare
| cleanupRedis(redisName) | ||
| }) | ||
|
|
||
| ginkgo.It("rate-limits call_tool by the inner backend tool name", func() { |
There was a problem hiding this comment.
blocker: This spec fails consistently on all three K8s versions (:360 — Expected an error to have occurred. Got: <nil>): the second call_tool isn't being rate-limited.
Rather than debug this through a full Kind + Redis + operator + embedding-server e2e (slow, and hard to see what's going on inside), I'd suggest dropping this spec and reproducing the same scenario as a pkg/vmcp/server integration test: wire the real ratelimit.NewDecorator over a fake core and drive it through the optimizer call_tool handler, with a mock optimizer and mock embedding server (and an in-memory limiter instead of Redis).
That keeps the test fast and deterministic, exercises the real decorator (unlike the forced-RateLimitedError unit test), and gives you a much easier place to track this down. Rate limiting through the optimizer should work, so I don't think we want to ship with it silently not enforcing.
Summary
This PR adds optimizer-aware vMCP rate limiting using the post-refactor
core.VMCPdecorator seam.PR #5276 wired
VirtualMCPServer.spec.config.rateLimitinginto the vMCP runtime. After the vMCP core refactor, optimizer mode resolves thecall_toolmeta-tool to the real backend tool before invokingcore.VMCP.CallTool. That means rate limiting no longer needs HTTP parsed-request rewriting. It can sit below the optimizer at the coreCallToolseam and key buckets directly by the resolved backend tool name.Follow-up to #5276 and aligned with the vMCP core refactor in #5606 .
Fixes #4552
Type of change
Test plan
task test)task test-e2e)task lint-fix)API Compatibility
v1beta1API, OR theapi-break-allowedlabel is applied and the migration guidance is described above.This PR does not change the CRD schema or
v1beta1API surface.Changes
pkg/ratelimit/errors.goRateLimitedError.pkg/ratelimit/limiter.goAllow(...)decision helper.pkg/ratelimit/middleware.goAllow(...)and extracts Redis-backed limiter construction intoNewRedisLimiter(...); existing HTTP middleware remains available.pkg/vmcp/ratelimit/decorator.gocore.VMCPdecorator that rate-limitsCallToolusing the resolved tool name.pkg/vmcp/ratelimit/factory/limiter.gopkg/vmcp/cli/serve.goserver.Config.RateLimiter.pkg/vmcp/server/server.gocore.New(...)with the rate-limit decorator before Serve/session optimizer handling.Does this introduce a user-facing change?
Yes. When vMCP optimizer is enabled, per-tool rate limits apply to the resolved backend tool name reached through
call_tool, rather than the optimizer meta-tool name.Implementation plan
Approved implementation plan
pkg/ratelimitlimiter and Redis setup instead of duplicating rate-limit logic in vMCP.RateLimitedError.pkg/vmcp/ratelimit/factory.pkg/vmcp/cli/serve.gointoserver.Config.core.VMCPwith a rate-limit decorator immediately aftercore.New(...).CallToolreaches the limiter with the resolved backend tool name.Special notes for reviewers
pkg/ratelimit.namepassed tocore.VMCP.CallTool; after the Serve-path optimizer this is the resolved backend tool name.pkg/ratelimit.NewMiddleware(...)remains available for existing HTTP middleware callers.RateLimitedErroris mapped to the desired client-visible rate-limit response on the Serve path.Large PR Justification
This is a new feature package with a large test suite, and it needs to land as one coherent phase.