How MCP Server Toolkit works

A complete tour of the architecture: data flow, subsystems, technology choices, performance, and where the project is heading next.

TL;DR

A FastAPI app that speaks the Model Context Protocol over both stdio and streamable HTTP, with OAuth in front, OpenTelemetry around it, and a plugin contract small enough to read in one sitting. Four production plugins ship in the box; bring or write your own to extend it.

Core data flow

From the moment a request enters the system to the moment a response leaves it.

  ┌──────────────┐
  │  MCP Client  │  desktop or hosted
  └──────┬───────┘
         │
   ┌─────┴─────┐
   │           │
  stdio   HTTP+SSE
   │           │
   ▼           ▼
  ┌────────────────────────────────────────────────────┐
  │ Transport layer (jsonrpc framing or SSE writer)     │
  │   normalises both protocols into Request object     │
  └──────────────────┬─────────────────────────────────┘
                     ▼
  ┌────────────────────────────────────────────────────┐
  │ OAuth middleware                                   │
  │   stdio  → no-op (trust local user)                │
  │   HTTP   → verify Bearer, cache JWKS, decode scope │
  └──────────────────┬─────────────────────────────────┘
                     ▼
  ┌────────────────────────────────────────────────────┐
  │ Scope decorator                                    │
  │   tool registered with @tool(scope="write")        │
  │   request must carry that scope or 403             │
  └──────────────────┬─────────────────────────────────┘
                     ▼
  ┌────────────────────────────────────────────────────┐
  │ Plugin tool body                                   │
  │   wrapped in OTel span with                        │
  │     plugin, tool, scope, args, latency, outcome    │
  └──────────────────┬─────────────────────────────────┘
                     ▼
                Response back through transport

Each subsystem, deep-dived

Every component in the data flow above, opened up and explained.

Transport layer

Two transports, one plugin contract. The stdio transport reads newline-delimited JSON-RPC messages from standard input, dispatches them, and writes results to standard output, exactly as the MCP spec defines. The HTTP transport listens on a configurable port and exposes /mcp as the streamable HTTP endpoint with SSE as the streaming back-channel.

Both implementations build the same internal Request object before anything else runs. From the perspective of the auth layer and the plugin layer, the transport is invisible. This is what lets a single plugin module work in both Claude Desktop (over stdio) and a hosted Vercel deployment (over HTTP) without modification.
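A minimal sketch of that normalisation, assuming a dataclass-shaped Request; the field and adapter names here are illustrative, not the toolkit's actual classes.

  import json
  from dataclasses import dataclass
  from typing import Any

  # Illustrative shape only: the real Request object may carry more fields.
  @dataclass
  class Request:
      method: str                            # MCP method, e.g. "tools/call"
      params: dict[str, Any]                 # JSON-RPC params, already parsed
      scopes: frozenset[str] = frozenset()   # filled in later by the OAuth layer
      transport: str = "stdio"               # "stdio" or "http", for logging only

  def from_stdio(line: str) -> Request:
      """Build a Request from one newline-delimited JSON-RPC message."""
      msg = json.loads(line)
      return Request(method=msg["method"], params=msg.get("params", {}), transport="stdio")

  def from_http(body: dict[str, Any]) -> Request:
      """Build the same Request from a parsed streamable HTTP POST body."""
      return Request(method=body["method"], params=body.get("params", {}), transport="http")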

OAuth and scope gating

OAuth 2.1 with PKCE. The toolkit ships with no opinion about which provider you use; we have working examples for Auth0, Clerk, Logto, and a self-hosted Hydra. The verifier caches the provider’s JWKS for fifteen minutes, decodes the access token, validates the issuer and audience, extracts the scopes, and attaches them to the Request object.

Tools declare their required scope in their decorator. The scope decorator runs after auth but before the tool body. If a tool decorated @tool(scope="write") is invoked without that scope, the call is rejected with an MCP-compliant error. This means write-protected tools (Postgres mutations, GitHub PR creation, filesystem writes) can sit safely next to read-only tools without separate routing.
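In code, the gate is small. The sketch below follows the decorator shown in the prose, but the error type and the attached attribute are our own naming; treat it as the shape of the check rather than the toolkit's exact API.

  import functools

  class ScopeError(Exception):
      """Raised when the request lacks the scope a tool declares (name illustrative)."""

  def tool(scope: str | None = None):
      """Declare a tool and the scope it requires; a sketch, not the real decorator."""
      def decorate(fn):
          @functools.wraps(fn)
          async def wrapper(request, *args, **kwargs):
              # The OAuth middleware has already attached request.scopes.
              if scope is not None and scope not in request.scopes:
                  # Surfaces to the client as an MCP-compliant error (403 over HTTP).
                  raise ScopeError(f"{fn.__name__} requires scope {scope!r}")
              return await fn(request, *args, **kwargs)
          wrapper.required_scope = scope
          return wrapper
      return decorate

  @tool(scope="write")
  async def write_file(request, path: str, contents: str) -> str:
      ...  # body runs only when the access token carried the "write" scope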

Plugin contract and lifecycle

A plugin is a Python class with a small surface. name identifies it. tools, resources, and prompts declare what it offers. lifespan is an optional async context manager that runs at server start (acquire a database pool, warm a cache, fetch a schema) and runs again on shutdown (close, flush, release).
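Put concretely, a plugin can be as small as the sketch below. The attribute names follow the prose; the asyncpg pool and the DATABASE_URL variable are assumptions made for the example, not part of the contract.

  import os
  from contextlib import asynccontextmanager

  import asyncpg  # used here only to make the lifespan example concrete

  class PostgresPlugin:
      """Sketch of the plugin surface, not the toolkit's exact base class."""
      name = "postgres"
      tools = ["query", "introspect_schema"]
      resources = []
      prompts = []

      @asynccontextmanager
      async def lifespan(self):
          # Runs at server start: acquire what the tools will need.
          self.pool = await asyncpg.create_pool(dsn=os.environ["DATABASE_URL"])
          try:
              yield
          finally:
              # Runs again at shutdown: close, flush, release.
              await self.pool.close()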

Plugins are discovered from a configurable list of paths. Each plugin is loaded once at server boot. Hot reload is intentionally not supported: plugin code is trusted, plugin failures crash the server loudly, and the operator restarts. This is the boring, correct choice. It is also the choice that means you can statically analyse the running set of plugins.

OpenTelemetry pipeline

Every tool invocation creates a span. The span name is {plugin}.{tool}. Attributes include scope, argument size, the user (when authenticated), and the outcome (success, error, denied). Logs emitted from inside the tool body inherit the trace ID; structured log output is JSON. Metrics are derived from spans by the OTel collector: request count and latency histogram per tool come for free.
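Roughly what the span wrapper around each invocation does, using the standard OpenTelemetry Python API; the attribute keys mirror the list above but are not guaranteed to match the toolkit's exact names.

  import time

  from opentelemetry import trace

  tracer = trace.get_tracer("mcp-server-toolkit")  # tracer name is illustrative

  async def invoke_with_span(plugin: str, tool_name: str, scope: str, fn, request, **args):
      # Span name is {plugin}.{tool}; attributes mirror the list above.
      with tracer.start_as_current_span(f"{plugin}.{tool_name}") as span:
          span.set_attribute("mcp.scope", scope)
          span.set_attribute("mcp.args_size", len(str(args)))
          start = time.perf_counter()
          try:
              result = await fn(request, **args)
              span.set_attribute("mcp.outcome", "success")
              return result
          except Exception as exc:
              span.set_attribute("mcp.outcome", "error")
              span.record_exception(exc)
              raise
          finally:
              span.set_attribute("mcp.latency_ms", (time.perf_counter() - start) * 1000)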

OTLP is the only export format. Point it at Tempo, Honeycomb, Datadog, Grafana Cloud, or anything else that speaks OTLP. There is no custom format and no agent installed in the container.

Configuration and secrets

Pydantic Settings reads the environment, type-checks each value, and exposes a single typed configuration object to the rest of the app. Secrets are SecretStr so accidental string interpolation is rejected. The example .env documents every variable; the schema makes it impossible to start the server with a missing required value.
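An illustrative slice of what that schema looks like; the variable names below are examples, not the full set the real .env documents.

  from pydantic import SecretStr
  from pydantic_settings import BaseSettings

  class Settings(BaseSettings):
      """Example subset only; the real schema covers every documented variable."""
      http_port: int = 8000
      oauth_issuer: str                        # required: boot fails loudly if unset
      oauth_audience: str
      github_token: SecretStr | None = None    # renders as '**********' if logged

  settings = Settings()  # reads the environment once and type-checks every value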

For Vault and AWS Secrets Manager users, an optional secret-resolver layer sits in front of Pydantic Settings: write AWSSM:my/path as a value and the resolver fetches it at boot. The resolver is opt-in; the default path is plain environment variables.
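One way such a resolver could be wired, rewriting prefixed values before Pydantic Settings reads them; the boto3 call is an assumption about how the AWS path might be implemented, not a documented detail.

  import os

  import boto3  # only needed when the resolver is enabled

  def resolve_secrets(prefix: str = "AWSSM:") -> None:
      """Replace env values like 'AWSSM:my/path' with the secret fetched at boot."""
      client = boto3.client("secretsmanager")
      for key, value in list(os.environ.items()):
          if value.startswith(prefix):
              secret = client.get_secret_value(SecretId=value[len(prefix):])
              os.environ[key] = secret["SecretString"]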

The four built-in plugins

Filesystem. Sandboxed read, write, list, and search across an allow-listed root. Path traversal is rejected by canonicalising paths and re-checking. MIME detection by libmagic. Maximum file size enforced before read. Write scope required for any mutation.
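The traversal check in miniature; the function name and the PermissionError are illustrative choices.

  from pathlib import Path

  def resolve_inside_root(root: Path, user_path: str) -> Path:
      """Canonicalise the requested path and re-check it stays under the allow-listed root."""
      root = root.resolve()
      candidate = (root / user_path).resolve()   # collapses '..' and symlinks
      if not candidate.is_relative_to(root):     # the re-check after canonicalising
          raise PermissionError(f"{user_path!r} escapes the sandbox root")
      return candidate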

Postgres. Schema introspection (read scope). Parameterised queries with statement timeouts. Read-only mode rejects DDL and DML at parse time using a deny-list of tokens that is conservative by design. Read-write mode requires the write scope.
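A sketch of the parse-time rejection and the parameterised call, assuming asyncpg; the deny-list shown is a shortened illustration of the conservative token set.

  import asyncpg

  # Conservative by design: any statement containing one of these tokens is
  # rejected in read-only mode, even at the cost of refusing an odd legitimate query.
  DENYLIST = {"insert", "update", "delete", "drop", "alter", "truncate", "create", "grant"}

  def assert_read_only(sql: str) -> None:
      tokens = {tok.strip("();,") for tok in sql.lower().split()}
      if tokens & DENYLIST:
          raise PermissionError("statement rejected in read-only mode")

  async def run_query(pool: asyncpg.Pool, sql: str, *params, timeout: float = 5.0):
      assert_read_only(sql)
      async with pool.acquire() as conn:
          # Parameterised ($1, $2, …) and bounded by a statement timeout.
          return await conn.fetch(sql, *params, timeout=timeout)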

GitHub. Authenticated with a fine-grained PAT. Issues, pull requests, file contents, comments. Rate-limit-aware: when GitHub returns the rate-limit headers, the plugin sleeps with jitter and retries up to a budget configurable per request.
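The retry shape, sketched with httpx against GitHub's documented rate-limit headers; the retry budget and jitter bounds here are illustrative.

  import asyncio
  import random
  import time

  import httpx

  async def github_get(client: httpx.AsyncClient, url: str, max_retries: int = 3) -> httpx.Response:
      """Retry rate-limited calls, sleeping until the advertised reset time plus jitter."""
      for attempt in range(max_retries + 1):
          resp = await client.get(url)
          limited = resp.status_code in (403, 429) and resp.headers.get("x-ratelimit-remaining") == "0"
          if not limited or attempt == max_retries:
              return resp
          reset = int(resp.headers.get("x-ratelimit-reset", "0"))
          delay = max(reset - time.time(), 1.0) + random.uniform(0.0, 2.0)  # jitter
          await asyncio.sleep(delay)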

SarmaLink. Calls the SarmaLink-AI failover stack as a sub-tool. Lets a calling agent invoke any of the thirty-six engines via a single MCP tool. Useful for delegating one part of a workflow to a different model than the calling agent.

CI and release

Every push runs ruff (lint), mypy strict (types), pytest (unit and integration), and a smoke test that boots the server in both transports and exercises a representative tool. The Docker image is built and tagged on every release. The release process is a one-line GitHub Actions trigger; semver is enforced by the changelog tooling.

Why this stack

The road not taken matters as much as the road taken. Here is what was picked, why, and what was rejected and why.

Picked

Python 3.12

Best asyncio performance available in 2026. The mcp SDK is Python-first. The ecosystem of libraries (asyncpg, httpx, OTel) is mature.

Not this

Node.js. The official mcp SDK exists in TypeScript too, but the broader ecosystem of MCP plugins is heavier on the Python side. We chose to follow that gravity.

Picked

FastAPI

Streamable HTTP, SSE, OpenAPI for management endpoints, dependency injection, and a good middleware story. Tiny overhead next to a model round trip.

Not this

Starlette directly. It would have meant rewriting things FastAPI already does well, with no measurable gain.

Picked

mcp Python SDK

The official SDK. Tracks the spec. Anything we built ourselves would diverge inside a quarter.

Not this

A hand-rolled JSON-RPC implementation. Fun once, then a maintenance burden.

Picked

Pydantic v2 settings

Type-checked twelve-factor configuration. SecretStr stops accidental log leaks. v2 is fast enough that boot config takes microseconds.

Not this

python-dotenv plus manual parsing. It works, but it loses type safety and validation at the wrong layer.

Picked

OpenTelemetry

Vendor-neutral spans, metrics, and logs. OTLP exports to anything modern. The right answer in 2026.

Not this

Sentry / Datadog SDK directly. That locks the codebase into one back-end; we export to those via the OTel collector instead.

Picked

uv

Fastest dependency resolver and venv builder available. Reproducible locks. Replaces pip, virtualenv, and pip-tools.

Not this

Poetry. Slower, less ergonomic for scripts, and a more complex packaging story.

Picked

OAuth 2.1 + PKCE

The right shape for hosted MCP. Maps cleanly to Auth0, Clerk, Logto, Hydra. Scopes give per-tool gating without per-tool plumbing.

Not this

API keys. Fine for internal use, wrong for hosted multi-user. The toolkit accepts both via env config.

Performance & observability

The performance budget is set by the LLM round trip the response will eventually feed, not by the tool call itself. The toolkit aims to be invisible in the latency budget. In practice, on a small Fly.io machine, tool dispatch overhead (transport, auth, scope, OTel span) is between 800 microseconds and 2 milliseconds. The expensive bits are upstream: the database query, the GitHub API call, the SarmaLink-AI failover.

Cold start in a 256MB container is under sixty seconds; in a 512MB container it is under twenty. There is no JIT to warm. There is no global mutex. asyncio is the only concurrency primitive, and the plugin contract requires every tool function to be async.

Observability is via OpenTelemetry only. Spans, structured logs, and derived metrics are exported via OTLP. The toolkit attaches the trace ID to MCP responses as an extension field so the calling agent can correlate its prompt-level traces with the toolkit’s tool-level traces. This is the trick that made cross-system debugging tolerable on the projects this code came out of.
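Correlating the two sides relies only on the standard OTel API. A sketch of pulling the active trace ID for that extension field; the field name shown in the comment is illustrative.

  from opentelemetry import trace

  def current_trace_id() -> str | None:
      """Return the active trace ID in the 32-hex-character form OTLP back-ends expect."""
      ctx = trace.get_current_span().get_span_context()
      if not ctx.is_valid:
          return None
      return trace.format_trace_id(ctx.trace_id)

  # Attached to the outgoing MCP response as an extension field (field name illustrative):
  #   result["_meta"] = {"trace_id": current_trace_id()}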

Where it is heading

  • Plugin signing and verification, for environments that need it.
  • A fanout transport that lets one server expose plugins from multiple sub-servers.
  • Built-in rate limiting per token and per tool, configured via the plugin contract.
  • Wider plugin family: Slack, Linear, Notion, S3, Snowflake.
  • A worked example of running the toolkit as a Cloudflare Worker (HTTP transport only) for edge MCP gateways.

Read the full whitepaper for the formal technical write-up.