Rate-Limited Public API. Defensive by default.
The minimum sensible architecture for any public-facing API. Edge-level rate limiting, key-based authentication, request signing where appropriate, observability from day one, and a tiering model that scales from a free tier to paid plans without rewriting the auth layer.
When to use this
- You are exposing an API to third parties — partners, customers, or the public
- Abuse is a realistic threat (which it is, for any public API)
- You expect to grow from a free tier into paid plans
- You want predictable latency and predictable cost
When not to use this
- Internal-only API behind your firewall — overkill, just authenticate normally
- Single-tenant, single-customer integration — a webhook with a shared secret is enough
- Throughput above 10k req/s — you need a different layer of infrastructure
The rate-limit layer
Rate limits run in Edge Middleware, before any handler executes. The limit is keyed by API key when present, by IP otherwise. Two windows, sliding: short window catches bursts, long window catches sustained abuse.
// middleware.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const burst = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, '10 s'),
})
const sustained = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(1000, '1 h'),
})

// Fall back to the client IP when no API key is presented.
function clientIp(req: Request): string {
  return req.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'unknown'
}

export async function middleware(req: Request) {
  const key = req.headers.get('authorization') ?? clientIp(req)
  const [b, s] = await Promise.all([
    burst.limit(key),
    sustained.limit(key),
  ])
  if (!b.success || !s.success) {
    // reset is a unix timestamp in ms; Retry-After wants seconds from now.
    const reset = !b.success ? b.reset : s.reset
    return new Response('rate_limited', {
      status: 429,
      headers: { 'retry-after': String(Math.max(0, Math.ceil((reset - Date.now()) / 1000))) },
    })
  }
}
API keys, properly
Keys are generated as random 32-byte strings, prefixed with an environment marker (`sk_live_`, `sk_test_`). Stored hashed in Postgres — never store the raw key. On each request, hash the incoming key and look it up.
Hot-cache the key-to-tenant mapping in Edge memory for the lifetime of the request, with a short Redis cache for cross-region lookup. Invalidate on key revocation.
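A minimal sketch of that cache layer, using an in-memory TTL map as a stand-in for the short Redis cache (the `Tenant` shape and `KeyCache` name are illustrative, not part of the stack above):

```typescript
// keyHash -> tenant record, with a short TTL; invalidated on revocation.
type Tenant = { tenantId: string; tier: 'free' | 'pro' | 'enterprise' }

class KeyCache {
  private entries = new Map<string, { tenant: Tenant; expires: number }>()
  constructor(private ttlMs: number) {}

  get(keyHash: string): Tenant | undefined {
    const e = this.entries.get(keyHash)
    if (!e) return undefined
    if (Date.now() > e.expires) {
      this.entries.delete(keyHash)
      return undefined
    }
    return e.tenant
  }

  set(keyHash: string, tenant: Tenant) {
    this.entries.set(keyHash, { tenant, expires: Date.now() + this.ttlMs })
  }

  // Call on key revocation so a revoked key stops resolving before the TTL expires.
  invalidate(keyHash: string) {
    this.entries.delete(keyHash)
  }
}
```

The same get/set/invalidate shape maps directly onto Redis `GET`/`SET EX`/`DEL` for the cross-region layer.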
// sha256: hex digest via Web Crypto (available in Edge runtimes and Node 18+)
async function sha256(input: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(input))
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('')
}

// At key creation: 32 random bytes, hex-encoded, behind an environment prefix
const bytes = crypto.getRandomValues(new Uint8Array(32))
const raw = `sk_live_${Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('')}`
const hash = await sha256(raw)
await db.insert(apiKeys).values({
  tenantId, hash, tier: 'free', createdAt: new Date(),
})
// Show 'raw' to the user once. Never store it.

// At request time
const incoming = req.headers.get('authorization')?.replace('Bearer ', '')
if (!incoming) return new Response('unauthorized', { status: 401 })
const [key] = await db.select().from(apiKeys).where(eq(apiKeys.hash, await sha256(incoming))).limit(1)
Tier-aware limits
Once the key resolves to a tier, apply that tier's limits: free gets 60 req/min, pro gets 600, enterprise gets 6000. Keep the table small and explicit; do not let tiers proliferate.
- free: 60 req/min, 10k req/day
- pro: 600 req/min, 1M req/day
- enterprise: 6000 req/min, custom daily cap
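The table above as code — a sketch assuming a `limitsFor` helper (the names and the infinite enterprise daily cap are illustrative; the enterprise cap is negotiated per contract):

```typescript
// Small, explicit tier table; the type system keeps it exhaustive.
const TIER_LIMITS = {
  free: { perMinute: 60, perDay: 10_000 },
  pro: { perMinute: 600, perDay: 1_000_000 },
  // 'custom daily cap' modeled as unbounded here; replace per contract.
  enterprise: { perMinute: 6_000, perDay: Number.POSITIVE_INFINITY },
} as const

type Tier = keyof typeof TIER_LIMITS

function limitsFor(tier: Tier) {
  return TIER_LIMITS[tier]
}
```

Adding a tier means adding one line here and nowhere else; the rate-limit layer reads `limitsFor(tenant.tier)` and stays unchanged.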
Observability
Every request emits a structured log line: timestamp, key (hashed), endpoint, status, latency, region. These go to Tinybird (or Vercel Logs), where you can query req/s by tenant, error rate by endpoint, p99 latency by region.
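A sketch of that log line, assuming JSON output and the fields listed above (the `formatLog` and `withLogging` helpers are illustrative, not from a specific library):

```typescript
// One JSON line per request; the key is stored hashed so logs never leak raw keys.
interface RequestLog {
  ts: string
  keyHash: string
  endpoint: string
  status: number
  latencyMs: number
  region: string
}

function formatLog(entry: RequestLog): string {
  return JSON.stringify(entry)
}

// Wrap a handler to measure latency and emit the line after the response resolves.
async function withLogging(
  endpoint: string,
  region: string,
  keyHash: string,
  handler: () => Promise<Response>,
): Promise<Response> {
  const start = Date.now()
  const res = await handler()
  console.log(formatLog({
    ts: new Date().toISOString(),
    keyHash,
    endpoint,
    status: res.status,
    latencyMs: Date.now() - start,
    region,
  }))
  return res
}
```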
Errors go to Sentry with the tenant tag attached, so any spike is immediately attributable. Without per-tenant attribution, public-API errors are noise; with it, they are signal.
Abuse detection
- Per-IP fallback rate limit even for authenticated requests (one tenant should not be able to DoS you with one key)
- Geographic gate on registration if you do not serve certain regions
- Honeypot endpoints that should never be hit; any traffic to them gets the IP banned
- Anomaly alerts: 5x normal traffic for a tenant pages me, not an auto-suspension
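Two of those checks fit in a few lines; a sketch with illustrative honeypot paths and the 5x threshold as a helper:

```typescript
// Paths no legitimate client should ever request (examples, not a canonical list).
const HONEYPOTS = new Set(['/wp-login.php', '/.env', '/admin/backup'])

function isHoneypot(path: string): boolean {
  return HONEYPOTS.has(path)
}

// Page a human when a tenant's current rate exceeds its baseline by the factor;
// no baseline yet (new tenant) means no alert, never an auto-suspension.
function isAnomalous(currentReqPerMin: number, baselineReqPerMin: number, factor = 5): boolean {
  return baselineReqPerMin > 0 && currentReqPerMin > baselineReqPerMin * factor
}
```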
Pagination, filtering, idempotency
Cursor-based pagination, not offset. Filtering via explicit query parameters with a typed schema. Idempotency-Key header on POSTs that mutate, with the key + tenant scoped for 24 hours. These are not optional; they are part of the baseline.
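A sketch of the opaque cursor, assuming base64url-encoded JSON of the last row's sort keys (field names are illustrative; use whatever your sort order needs):

```typescript
// Encode the last row's (createdAt, id) pair into an opaque cursor string.
// Clients treat it as a token; the server is free to change the encoding.
function encodeCursor(lastCreatedAt: string, lastId: string): string {
  return Buffer.from(JSON.stringify({ at: lastCreatedAt, id: lastId })).toString('base64url')
}

function decodeCursor(cursor: string): { at: string; id: string } {
  return JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'))
}
```

The query then filters `WHERE (created_at, id) > (at, id) ORDER BY created_at, id`, which stays correct under concurrent inserts — the failure mode that makes offset pagination drift.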
Alternatives I considered
- AWS API Gateway: battle-tested, AWS-native. Slower cold starts, more configuration, vendor-specific. The right answer if the rest of your stack is on AWS.
- Cloudflare Workers: excellent Edge performance, but KV is eventually consistent, which can be a problem for rate limiting; Workers + Durable Objects is closer to the ideal here.
- Self-hosted: cheap, full control, and you babysit the box. Acceptable for hobby projects; painful at production scale.
- GraphQL: a different shape entirely. Rate limiting GraphQL well is genuinely hard; depth and complexity limits are necessary on top of request limits.
Want me to build this for you?
Blueprints are how I think. If your problem fits one of these, we are already most of the way to a quote.