Production MCP server, skeleton to deployed
Model Context Protocol is the open spec for letting an AI assistant call your tools, read your resources, and use your prompts. Most MCP servers people show you online are toy demos. This is the full path from skeleton to a hosted production MCP server you can plug into Claude Desktop, ChatGPT, or any compatible client.
What MCP actually is
MCP is a transport-agnostic JSON-RPC contract between an AI client and your server. The client (Claude Desktop, ChatGPT custom connector, Zed, Cursor, anything that speaks the spec) discovers three kinds of capability you expose: tools the model can call, resources the model can read, and prompts the user can invoke as templates. Underneath, the wire format is JSON-RPC 2.0; the transport can be stdio (the server is a subprocess of the client), streamable HTTP (the server runs anywhere and the client connects over the network), or the older SSE pairing. Everything else, schemas, authentication, capability negotiation, sits on top of those primitives.
The reason MCP matters is not the spec itself. It is that you write the server once and any compliant client gets your capabilities for free. Before MCP, every assistant had its own plugin format and you wrote the same Postgres query tool four times.
Python or TypeScript
There are two healthy SDKs. fastmcp in Python wraps the official Python SDK with a FastAPI-style decorator API; it is the fastest way to ship a server on your own. @modelcontextprotocol/sdk in TypeScript is the official Node implementation; it slots into an existing Express or Fastify app naturally.
My default for solo work is Python with FastMCP. The decorator syntax is tight, the type hints become tool schemas automatically, and the deploy story (a single uv project, one Dockerfile) is boring in a good way. I reach for TypeScript when the MCP server lives inside a Node service that already exists, because shoving a Python sidecar next to it is more pain than just using the TS SDK.
An MCP server is not a microservice. It is the typed surface a model uses to act on your world, and the failure mode of a bad one is the model lying convincingly.
The skeleton
New folder, fresh uv project, one dependency, one file. This is the smallest valid MCP server: a single tool that returns the server time.
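```bash
mkdir mcp-acme && cd mcp-acme
uv init --package        # run inside the new folder; the folder name becomes the project name
uv add fastmcp
mkdir -p src/mcp_acme    # already created by uv init --package; harmless if it exists
touch src/mcp_acme/server.py
```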
```python
# src/mcp_acme/server.py
from datetime import datetime, timezone

from fastmcp import FastMCP

mcp = FastMCP(name="acme", version="0.1.0")


@mcp.tool
def server_time() -> dict:
    """Return the server's current time in ISO 8601 UTC."""
    now = datetime.now(timezone.utc)
    return {"iso": now.isoformat(), "epoch": int(now.timestamp())}


if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport
```
Run it with uv run python -m mcp_acme.server and the process sits on stdin/stdout waiting for a client. That is the entire contract for local use. The next sections add what makes it useful and what makes it production grade.
A real tool
Toy tools are a trap because they let you skip the parts that bite later: parameter validation, structured responses, error shape. Here is a Postgres query tool with a Pydantic input model, a constrained result shape, and explicit error handling. The model passes you a SQL string; you decide what to allow.
```python
# src/mcp_acme/tools/db.py
import os
from typing import Literal

import asyncpg
from fastmcp import FastMCP
from pydantic import BaseModel, Field

mcp = FastMCP(name="acme-db", version="0.1.0")


class QueryArgs(BaseModel):
    sql: str = Field(..., description="A single SELECT statement.")
    limit: int = Field(100, ge=1, le=1000)


class Row(BaseModel):
    data: dict


class QueryResult(BaseModel):
    status: Literal["ok", "error"]
    rows: list[Row] = []
    error: str | None = None


POOL: asyncpg.Pool | None = None


async def get_pool() -> asyncpg.Pool:
    """Create the connection pool lazily on first use."""
    global POOL
    if POOL is None:
        POOL = await asyncpg.create_pool(
            dsn=os.environ["DATABASE_URL"], min_size=1, max_size=5
        )
    return POOL


@mcp.tool
async def db_query(args: QueryArgs) -> QueryResult:
    """Run a read-only SQL query. SELECT only; anything else is rejected."""
    stmt = args.sql.strip().rstrip(";")
    if not stmt.lower().startswith("select"):
        return QueryResult(status="error", error="Only SELECT statements are permitted.")
    pool = await get_pool()
    try:
        async with pool.acquire() as conn:
            # SET TRANSACTION has no effect outside a transaction block, so open
            # an explicit read-only transaction instead.
            async with conn.transaction(readonly=True):
                # Wrap the statement so the cap applies even if it has its own LIMIT.
                records = await conn.fetch(
                    f"SELECT * FROM ({stmt}) AS q LIMIT {args.limit}"
                )
        return QueryResult(status="ok", rows=[Row(data=dict(r)) for r in records])
    except Exception as e:
        return QueryResult(status="error", error=str(e))
```
Two things to notice. First, the schema the client sees is generated from QueryArgs; the model gets useful hints (the docstring, the field descriptions, the bounds on limit) without you writing JSON schema by hand. Second, the dangerous verb (anything that mutates) is rejected at the application layer, and the database transaction is opened read-only as a second line of defence. You want both belts on.
Resources and subscriptions
Resources are the read-only side. They are addressed by URI and returned as text or binary, and clients can subscribe to a URI to be notified when its content changes. A clean use case is exposing a folder of markdown notes as resource://notes/<slug>; the assistant can list them, read them, and react when one is edited.
```python
# src/mcp_acme/resources/notes.py
from pathlib import Path

from fastmcp import FastMCP
from watchfiles import awatch

mcp = FastMCP(name="acme-notes", version="0.1.0")
NOTES_DIR = Path("/var/data/notes")


# A URI with a {placeholder} is registered as a resource template.
@mcp.resource("resource://notes/{slug}")
async def read_note(slug: str) -> str:
    path = (NOTES_DIR / f"{slug}.md").resolve()
    # Refuse slugs that escape the notes folder or name a missing file.
    if NOTES_DIR.resolve() not in path.parents or not path.is_file():
        raise FileNotFoundError(slug)
    return path.read_text(encoding="utf-8")


# A fixed URI is a plain resource; this one returns the index of notes.
@mcp.resource("resource://notes")
async def list_notes() -> list[dict]:
    return [
        {"uri": f"resource://notes/{p.stem}", "name": p.stem, "mimeType": "text/markdown"}
        for p in sorted(NOTES_DIR.glob("*.md"))
    ]


# The startup hook and notification helper below follow this article's sketch;
# confirm the exact names against the FastMCP version you install.
@mcp.on_startup
async def watch_notes():
    async for changes in awatch(NOTES_DIR):
        for _, path_str in changes:
            slug = Path(path_str).stem
            await mcp.notify_resource_updated(f"resource://notes/{slug}")
```
The notify_resource_updated call is the subscription pivot. A client that subscribed to resource://notes/foo will be told that resource changed; it then re-reads at its leisure. This is how you build live context surfaces (a Jira board, a Linear cycle, a build queue) without inventing your own pub/sub.
Auth across transports
Auth is the part where toy servers die. There are three realistic modes and you should know which you are in.
Stdio for local Claude Desktop. The server is a subprocess of the user. There is no auth; the OS is the perimeter. Do not expose anything via stdio that the local user should not already be able to do.
HTTP with a bearer token for single-tenant remote. You run the server somewhere, you give yourself a long random token, the server checks it on every request. Five lines of middleware. Suitable for personal MCP servers you host for yourself.
```python
# src/mcp_acme/http.py
import hmac
import os

from fastmcp import FastMCP
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

mcp = FastMCP(name="acme", version="0.1.0")
EXPECTED = os.environ["MCP_BEARER"]


class BearerAuth(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # The health check stays open so the platform can probe it.
        if request.url.path == "/healthz":
            return await call_next(request)
        header = request.headers.get("authorization", "")
        # Constant-time comparison avoids leaking the token through timing.
        if not hmac.compare_digest(header.encode(), f"Bearer {EXPECTED}".encode()):
            return JSONResponse({"error": "unauthorised"}, status_code=401)
        return await call_next(request)


# Register the route before building the app so it is included.
@mcp.custom_route("/healthz", methods=["GET"])
async def healthz(_request):
    return JSONResponse({"ok": True})


app = mcp.http_app()
app.add_middleware(BearerAuth)
```
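For the "long random token" itself, the standard library is enough. A minimal sketch, assuming hypothetical helper names `new_token` and `is_authorised`:

```python
import hmac
import secrets


def new_token() -> str:
    """Generate a URL-safe token from 32 random bytes."""
    return secrets.token_urlsafe(32)


def is_authorised(header: str, expected_token: str) -> bool:
    """Compare the Authorization header to the expected value in constant time."""
    expected = f"Bearer {expected_token}"
    return hmac.compare_digest(header.encode(), expected.encode())


if __name__ == "__main__":
    # Print once, store it as a platform secret, and never commit it.
    print(new_token())
```

Generate the token once, store it as the `MCP_BEARER` secret on the host, and paste the same value into the client configuration.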
OAuth for hosted multi-tenant. When the server is something many users connect to from many clients, you need real OAuth. The spec defines a discovery document, a dynamic client registration endpoint, and the usual authorisation code flow with PKCE. FastMCP exposes hooks for this; in practice you delegate to an identity provider (Auth0, WorkOS, your own Keycloak) and let it mint the tokens. Do not roll your own OAuth server.
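PKCE is the one part of that flow worth seeing in code, because it is small and explains why a stolen authorisation code is useless on its own. This sketch follows the S256 method from RFC 7636; the function names are my own, and in practice the client library and identity provider handle this for you.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Client side: a random secret kept locally for the length of the flow."""
    # 32 random bytes encode to 43 characters, the minimum length RFC 7636 allows.
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    """Client side: the value sent with the authorisation request."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server side: at token exchange, recompute the challenge and compare."""
    return secrets.compare_digest(code_challenge_s256(verifier), stored_challenge)
```

The client sends the challenge up front and the verifier only at token exchange, so an attacker who intercepts the authorisation code cannot redeem it without the verifier.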
Deploying it
One container, one machine, one reverse proxy. The same recipe works on Fly.io, a Hetzner CX22, a Render service, or a tiny DigitalOcean Droplet.
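```dockerfile
# Dockerfile
FROM python:3.12-slim
WORKDIR /app
RUN pip install --no-cache-dir uv
# Install dependencies first so this layer is cached across source changes.
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project
# Then copy the source and install the project itself.
# README.md is included because the pyproject.toml from uv init references it.
COPY README.md ./
COPY src ./src
RUN uv sync --frozen --no-dev
ENV PYTHONUNBUFFERED=1 MCP_TRANSPORT=http PORT=8080
EXPOSE 8080
# --no-sync runs against the environment built above without re-resolving.
CMD ["uv", "run", "--no-sync", "uvicorn", "mcp_acme.http:app", "--host", "0.0.0.0", "--port", "8080"]
```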
On Fly.io, fly launch reads the Dockerfile, picks a region, and gives you a TLS-terminated public URL. On a VPS, put Caddy in front and it handles certificates for you.
```caddy
# /etc/caddy/Caddyfile
mcp.example.com {
    encode zstd gzip
    reverse_proxy localhost:8080
    header /healthz Cache-Control "no-store"
}
```
Add a health endpoint (the example above exposes /healthz) so your platform can restart the container when it goes sideways. Wire the bearer token in as a secret, never in the image.
Wiring clients
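Claude Desktop is configured by a JSON file at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS, or the equivalent under %APPDATA% on Windows. Stdio servers are listed under mcpServers with a command and args; HTTP servers go under the same key with a URL and headers in clients that accept remote entries in the config file. Whether a client expands ${MCP_BEARER} from the environment also varies, so confirm both against the client's documentation.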
```json
{
  "mcpServers": {
    "acme-local": {
      "command": "uv",
      "args": ["--directory", "/Users/sarma/code/mcp-acme", "run", "python", "-m", "mcp_acme.server"]
    },
    "acme-remote": {
      "url": "https://mcp.example.com/mcp",
      "headers": { "Authorization": "Bearer ${MCP_BEARER}" }
    }
  }
}
```
ChatGPT custom connectors take the public URL and walk you through OAuth or bearer setup in the UI; there is no config file to edit. Cursor, Zed, and Continue all read MCP server lists from their own settings, but they accept the same shape (command plus args, or URL plus headers). Restart the client after editing; none of them hot-reload yet.
Observability
An MCP server is a tool API the model can call hundreds of times per session. You want to see what was called, with what arguments, how long it took, and whether it errored. Structured JSON logs to stdout are the floor; OpenTelemetry traces are the ceiling that pays for itself the first time a tool starts misbehaving.
Wrap every tool with a logging decorator: log the tool name, a hash of the arguments (never the raw arguments, they may contain secrets), the duration, and the outcome. Push traces to a free Honeycomb or Grafana Cloud Tempo endpoint via the OTel exporter; the FastMCP-as-Starlette app instruments cleanly with opentelemetry-instrumentation-starlette.
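A logging decorator along those lines might look like the sketch below. It uses only the standard library; the names `logged_tool` and `hash_args` are my own, and how the wrapper composes with a given SDK's tool decorator is something to check against that SDK.

```python
import functools
import hashlib
import json
import logging
import sys
import time

logging.basicConfig(stream=sys.stderr, level=logging.INFO, format="%(message)s")
logger = logging.getLogger("mcp.tools")
logger.setLevel(logging.INFO)


def hash_args(kwargs: dict) -> str:
    """Stable short hash of the arguments, so calls correlate without leaking values."""
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def logged_tool(func):
    """Log tool name, argument hash, duration, and outcome for an async tool."""

    @functools.wraps(func)
    async def wrapper(**kwargs):
        started = time.perf_counter()
        outcome = "ok"
        try:
            return await func(**kwargs)
        except Exception:
            outcome = "error"
            raise
        finally:
            logger.info(json.dumps({
                "tool": func.__name__,
                "args_hash": hash_args(kwargs),
                "duration_ms": round((time.perf_counter() - started) * 1000, 2),
                "outcome": outcome,
            }))

    return wrapper
```

The hash lets you spot "the same call repeated forty times" in the logs without ever writing a customer email or API key to disk.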
Alert on three signals. Tool error rate above 1% over five minutes (something is misconfigured). P95 tool latency above 3 seconds (the model will keep retrying and your bill will balloon). Auth failure rate spike (someone is probing your bearer token). Everything else is nice to know; those three will save your weekend.
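Expressed as code, the three checks are a few lines each. This is a sketch over an in-memory window of calls; the 1% and 3 second thresholds come from the text above, while the definition of an auth "spike" as three times the usual count is my assumption. In production your metrics backend would evaluate the same rules.

```python
from dataclasses import dataclass
from statistics import quantiles

ERROR_RATE_LIMIT = 0.01     # 1% of calls in the window
P95_LATENCY_LIMIT_S = 3.0   # seconds
AUTH_SPIKE_FACTOR = 3       # assumption: a spike is 3x the usual failure count


@dataclass(frozen=True)
class Call:
    duration_s: float
    ok: bool


def error_rate(calls: list[Call]) -> float:
    """Fraction of calls in the window that failed."""
    if not calls:
        return 0.0
    return sum(not c.ok for c in calls) / len(calls)


def p95_latency(calls: list[Call]) -> float:
    """95th percentile of call duration; the last of 19 cut points when n=20."""
    durations = [c.duration_s for c in calls]
    if len(durations) < 2:
        return durations[0] if durations else 0.0
    return quantiles(durations, n=20)[-1]


def alerts(calls: list[Call], auth_failures: int, usual_auth_failures: int) -> list[str]:
    """Return the names of the signals that crossed their threshold."""
    fired = []
    if error_rate(calls) > ERROR_RATE_LIMIT:
        fired.append("error_rate")
    if p95_latency(calls) > P95_LATENCY_LIMIT_S:
        fired.append("p95_latency")
    if auth_failures > AUTH_SPIKE_FACTOR * max(usual_auth_failures, 1):
        fired.append("auth_failures")
    return fired
```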
Pitfalls
Stdio MCP uses stdout as the wire. A stray print(), a logging handler defaulting to stdout, or a library that warns to stdout will corrupt every message and the client will silently disconnect. Route every log to stderr, period. Set Python logging to stream=sys.stderr at the entrypoint.
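A minimal sketch of that entrypoint configuration, assuming a hypothetical `configure_logging` helper called before anything else imports or logs:

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> None:
    """Send all root-logger output to stderr so stdout stays clean for JSON-RPC."""
    logging.basicConfig(
        stream=sys.stderr,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace any handlers a library installed earlier
    )


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("mcp_acme").info("logging goes to stderr")
```

`force=True` matters because `basicConfig` is otherwise a no-op when the root logger already has a handler, which is exactly the case when an imported library configured logging first.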
Any tool that writes, deletes, or sends is a tool the model will eventually call at the wrong moment. Default destructive verbs to a dry-run mode, require an explicit confirm=true argument, and log every invocation with full arguments to an append-only audit table you own.
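The dry-run pattern is easy to sketch without any framework. Everything here is hypothetical: the in-memory `NOTES` store, the `AuditLog` class standing in for an append-only table, and the `delete_note` tool.

```python
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Stand-in for an append-only audit table."""
    entries: list[dict] = field(default_factory=list)

    def record(self, tool: str, arguments: dict, executed: bool) -> None:
        self.entries.append({"tool": tool, "arguments": arguments, "executed": executed})


AUDIT = AuditLog()
NOTES = {"draft": "unfinished thoughts", "todo": "buy milk"}


def delete_note(slug: str, confirm: bool = False) -> dict:
    """Delete a note. Without confirm=True this only reports what would happen."""
    exists = slug in NOTES
    executed = exists and confirm
    # Every invocation is audited, including dry runs and misses.
    AUDIT.record("delete_note", {"slug": slug, "confirm": confirm}, executed)
    if not exists:
        return {"status": "error", "error": f"No note named {slug!r}."}
    if not confirm:
        return {"status": "dry_run", "would_delete": slug}
    del NOTES[slug]
    return {"status": "deleted", "slug": slug}
```

The dry-run response gives the model something concrete to show the user before it asks for confirmation, which is the behaviour you want from anything destructive.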
If you accept dict or Any, the model will pass you malformed JSON, numbers as strings, nested objects you never planned for, and you will be debugging at 2am. Use Pydantic models (or zod in TS) on every tool input, with bounds and enums where they apply. The model also gets a better schema as a side effect.
A confused model can call your search tool a hundred times in thirty seconds. Put a per-token rate limit (slowapi in Python, express-rate-limit in TS) at the HTTP layer, and set a hard timeout on every external call your tools make. Without both, one bad session can drain a downstream API quota.
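To show what those two guards do, here is a hand-rolled sketch: a token bucket keyed by bearer token and a timeout wrapper around external calls. It is a stand-in for illustration, not how slowapi or express-rate-limit work internally, and the names `TokenBucket`, `allow_request`, and `call_with_timeout` are my own.

```python
import asyncio
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` requests per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


BUCKETS: dict[str, TokenBucket] = {}


def allow_request(bearer_token: str, rate: float = 1.0, capacity: int = 10) -> bool:
    """One bucket per bearer token, created on first use."""
    if bearer_token not in BUCKETS:
        BUCKETS[bearer_token] = TokenBucket(rate, capacity)
    return BUCKETS[bearer_token].allow()


async def call_with_timeout(awaitable, seconds: float):
    """Hard ceiling on any external call a tool makes."""
    return await asyncio.wait_for(awaitable, timeout=seconds)
```

When `allow_request` returns False the HTTP layer should answer 429, and when `call_with_timeout` raises the tool should return a structured error rather than hang.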
Platforms reach for a health endpoint to decide when to restart your container. No endpoint means the platform never restarts a hung process and your clients keep getting 502s. Add /healthz that returns 200 only when the database pool is alive and the worker loop is running.
Wrap up
The leap from a toy MCP server to a production one is mostly about the boring half: typed inputs, real auth, a deploy pipeline, observability, and the discipline to assume the model will call your tools in ways you did not anticipate. The spec gives you the wire format for free; the engineering gives you a server you can leave running for months without flinching. Build the skeleton in an hour, harden it over an afternoon, and you have an asset that every new AI client gets to use without you writing another integration.