Add rate limiting to a binding¶
MCPRateLimit is a Protocol — anything with a single
consume(request, token) -> int | None method qualifies. The return value
is the suggested retry-after-in-seconds when the limit has been hit, or
None to allow the call.
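In other words, anything with this shape qualifies. A minimal sketch of the Protocol (the exact definition lives in the library; the types match the signatures used later on this page):

```python
from typing import Protocol

from django.http import HttpRequest

from rest_framework_mcp import TokenInfo


class MCPRateLimit(Protocol):
    def consume(self, request: HttpRequest, token: TokenInfo) -> int | None:
        """Return suggested retry-after seconds on denial, or None to allow."""
        ...
```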
The shipped FixedWindowRateLimit is backed by django.core.cache and is
the right choice for most Django deployments — the cache is already shared
across worker processes when you're running anything serious.
```python
from rest_framework_mcp import MCPServer
from rest_framework_mcp.auth.rate_limits.fixed_window_rate_limit import (
    FixedWindowRateLimit,
)
from rest_framework_services.types.service_spec import ServiceSpec

server = MCPServer(name="my-app")

server.register_service_tool(
    name="invoices.create",
    spec=ServiceSpec(service=create_invoice, ...),
    rate_limits=[
        FixedWindowRateLimit(max_calls=120, per_seconds=60, namespace="burst"),
        FixedWindowRateLimit(max_calls=10_000, per_seconds=86_400, namespace="daily"),
    ],
)
```
Limits are AND-combined: denial from any limiter immediately rejects the call. The handler stops at the first denial, so later limiters in the list don't consume quota for a call that is already going to fail.
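Conceptually, the check behaves like this hypothetical helper (a sketch, not the library's actual code):

```python
def check_rate_limits(limiters, request, token) -> int | None:
    """Return the first limiter's suggested retry-after, or None if all allow."""
    for limiter in limiters:
        retry_after = limiter.consume(request, token)
        if retry_after is not None:
            return retry_after  # first denial wins; later limiters never run
    return None
```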
Wire shape¶
Denial returns JSON-RPC error code -32005 RATE_LIMITED with
data.retryAfter set to the suggested seconds:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32005,
    "message": "Rate limit exceeded",
    "data": {"retryAfter": 42}
  }
}
```
MCP clients that respect this should back off accordingly. Misbehaving clients just keep hitting the same limit; the cache counter is per-bucket, so it doesn't grow unbounded.
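On the client side, a well-behaved caller can honor the hint with something like this sketch (`send_jsonrpc` is a hypothetical transport function that posts a payload and returns the parsed reply):

```python
import time


def call_with_backoff(send_jsonrpc, payload, max_attempts=5):
    """Retry on -32005, sleeping for the server's suggested retryAfter."""
    for _ in range(max_attempts):
        reply = send_jsonrpc(payload)
        error = reply.get("error")
        if error is None or error.get("code") != -32005:
            return reply
        time.sleep(error.get("data", {}).get("retryAfter", 1))
    raise RuntimeError("still rate limited after retries")
```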
Custom keys¶
The default FixedWindowRateLimit keys per authenticated user (or
REMOTE_ADDR for anonymous requests). Override with a callable:
```python
def per_org_key(request, token) -> str:
    return f"org:{token.user.organization_id}"


server.register_service_tool(
    ...,
    rate_limits=[
        FixedWindowRateLimit(max_calls=10_000, per_seconds=60, key=per_org_key),
    ],
)
```
Choosing a scheme¶
Three implementations ship in rest_framework_mcp.auth.rate_limits:
- `FixedWindowRateLimit` — bucketed counters, atomic via `cache.add` + `cache.incr` (see the sketch after this list). Cheaper, but a client can issue `2 * max_calls` across the bucket boundary.
- `SlidingWindowRateLimit` — list of timestamps in cache, pruned on every call. Smoother (it limits the rolling-window rate, not a bucket count), at the cost of a small read-modify-write race under concurrency.
- `TokenBucketRateLimit` — bucket of tokens that refills continuously at `refill_per_second`. Burst-friendly: a full bucket absorbs a sudden burst of `capacity` requests, then the steady-state rate clamps to `refill_per_second`. Useful when consumers naturally batch.
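For reference, the fixed-window pattern from the first bullet looks roughly like this (a standalone sketch, not the shipped class, which also derives per-user keys and applies its `namespace`):

```python
import time

from django.core.cache import cache


def fixed_window_consume(key: str, max_calls: int, per_seconds: int) -> int | None:
    bucket = int(time.time()) // per_seconds          # new bucket every per_seconds
    cache_key = f"rl:{key}:{bucket}"
    if cache.add(cache_key, 1, timeout=per_seconds):  # atomic create-if-missing
        return None                                   # first call in this bucket
    # incr() is atomic too (ignores the rare expiry race between add and incr).
    if cache.incr(cache_key) > max_calls:
        return per_seconds - int(time.time()) % per_seconds  # seconds left in bucket
    return None
```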
```python
from rest_framework_mcp.auth.rate_limits import (
    FixedWindowRateLimit,
    SlidingWindowRateLimit,
    TokenBucketRateLimit,
)

server.register_service_tool(
    name="invoices.create",
    spec=...,
    rate_limits=[
        # Steady-state 1 call/sec, but absorb bursts up to 60.
        TokenBucketRateLimit(capacity=60, refill_per_second=1.0),
    ],
)
```
All three share the same consume(request, token) -> int | None signature,
so swapping is a one-line change. The shipped classes are read-modify-write
on django.core.cache — for strict atomicity under heavy concurrency, back
the limiter with a Redis-Lua script implementing the same Protocol.
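For example, a sketch of an atomic fixed window on redis-py (`RedisFixedWindowRateLimit` and its key scheme are assumptions, not part of the library; it needs a reachable Redis):

```python
import time

import redis

# INCR and the limit check run atomically inside Redis, so there is no
# read-modify-write race across workers. Returns TTL seconds when over limit.
FIXED_WINDOW_LUA = """
local n = redis.call('INCR', KEYS[1])
if n == 1 then redis.call('EXPIRE', KEYS[1], ARGV[2]) end
if n > tonumber(ARGV[1]) then return redis.call('TTL', KEYS[1]) end
return 0
"""


class RedisFixedWindowRateLimit:
    def __init__(self, *, max_calls: int, per_seconds: int, client=None) -> None:
        self._max = max_calls
        self._per = per_seconds
        self._script = (client or redis.Redis()).register_script(FIXED_WINDOW_LUA)

    def consume(self, request, token) -> int | None:
        bucket = int(time.time()) // self._per
        key = f"mcp-rl:{token.user.pk}:{bucket}"  # key scheme is an assumption
        retry_after = self._script(keys=[key], args=[self._max, self._per])
        return retry_after or None
```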
Implementing your own¶
For leaky buckets, GCRA, distributed Lua-backed schemes, and so on, implement the Protocol against your storage of choice (the consume body below is filled in with a naive Django-cache illustration; the key scheme is just an example):
```python
import math
import time

from django.core.cache import cache
from django.http import HttpRequest

from rest_framework_mcp import TokenInfo


class LeakyBucketRateLimit:
    """Naive sketch — production should use redis-py with Lua for atomicity."""

    def __init__(self, *, capacity: int, leak_per_second: float) -> None:
        self._capacity = capacity
        self._leak = leak_per_second

    def consume(self, request: HttpRequest, token: TokenInfo) -> int | None:
        # Illustrative, non-atomic body; adapt the key to your own scheme.
        key, now = f"leaky:{token.user.pk}", time.time()
        level, last = cache.get(key, (0.0, now))
        level = max(0.0, level - (now - last) * self._leak)  # drain since last call
        if level + 1 > self._capacity:
            return math.ceil((level + 1 - self._capacity) / self._leak)
        cache.set(key, (level + 1, now))
        return None
```
Pass it to a binding the same way:
```python
server.register_service_tool(
    ...,
    rate_limits=[LeakyBucketRateLimit(capacity=60, leak_per_second=1)],
)
```
Tips¶
- Limiters are constructed once per binding at registration time and shared across all dispatches. Keep `__init__` cheap.
- State that crosses requests must live in shared storage, not on the instance. `FixedWindowRateLimit` follows this rule (Django cache); custom limiters must too if they're going to work under multiple worker processes.
- The default Django `locmem` cache is process-local. In production, use Memcached or Redis so all workers see the same counter; see the settings sketch below.
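A minimal shared-cache configuration, assuming Django 4+'s built-in Redis backend and a Redis instance at the example LOCATION:

```python
# settings.py: a shared cache so every worker sees the same counters.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",  # example address
    }
}
```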