Rate-Limited Public API. Defensive by default.
The minimum sensible architecture for any public-facing API. Edge-level rate limiting, key-based authentication, request signing where appropriate, observability from day one, and a tiering model that scales from a free tier to paid plans without rewriting the auth layer.
When to use this
- You are exposing an API to third parties — partners, customers, or the public
- Abuse is a realistic threat (which it is, for any public API)
- You expect to grow from a free tier into paid plans
- You want predictable latency and predictable cost
When not to use this
- Internal-only API behind your firewall — overkill, just authenticate normally
- Single-tenant, single-customer integration — a webhook with a shared secret is enough
- Throughput above 10k req/s — you need a different layer of infrastructure
The rate-limit layer
Rate limits run in Edge Middleware, before any handler executes. The limit is keyed by API key when present, by IP otherwise. Two windows, sliding: short window catches bursts, long window catches sustained abuse.
// middleware.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const burst = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, '10 s'),
})
const sustained = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(1000, '1 h'),
})

// Fall back to the client IP when no API key is presented.
function clientIp(req: Request): string {
  return req.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'unknown'
}

export async function middleware(req: Request) {
  const key = req.headers.get('authorization') ?? clientIp(req)
  const [b, s] = await Promise.all([
    burst.limit(key),
    sustained.limit(key),
  ])
  if (!b.success || !s.success) {
    // reset is a unix timestamp in ms; Retry-After wants seconds from now.
    const reset = !b.success ? b.reset : s.reset
    return new Response('rate_limited', {
      status: 429,
      headers: { 'retry-after': String(Math.max(0, Math.ceil((reset - Date.now()) / 1000))) },
    })
  }
}
API keys, properly
Keys are generated as random 32-byte strings, prefixed with an environment marker (`sk_live_`, `sk_test_`). Stored hashed in Postgres — never store the raw key. On each request, hash the incoming key and look it up.
Hot-cache the key-to-tenant mapping in Edge memory for the lifetime of the request, with a short Redis cache for cross-region lookup. Invalidate on key revocation.
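A minimal sketch of that cache layer, using an in-memory TTL map as a stand-in for the short Redis cache (the `Tenant` shape and `KeyCache` name are illustrative, not part of the stack above):

```typescript
// keyHash -> tenant record, with a short TTL; invalidated on revocation.
type Tenant = { tenantId: string; tier: 'free' | 'pro' | 'enterprise' }

class KeyCache {
  private entries = new Map<string, { tenant: Tenant; expires: number }>()
  constructor(private ttlMs: number) {}

  get(keyHash: string): Tenant | undefined {
    const e = this.entries.get(keyHash)
    if (!e) return undefined
    if (Date.now() > e.expires) {
      this.entries.delete(keyHash)
      return undefined
    }
    return e.tenant
  }

  set(keyHash: string, tenant: Tenant) {
    this.entries.set(keyHash, { tenant, expires: Date.now() + this.ttlMs })
  }

  // Call on key revocation so a revoked key stops resolving before the TTL expires.
  invalidate(keyHash: string) {
    this.entries.delete(keyHash)
  }
}
```

The same get/set/invalidate shape maps directly onto Redis `GET`/`SET EX`/`DEL` for the cross-region layer.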
// sha256: hex digest via Web Crypto (available in Edge runtimes and Node 18+)
async function sha256(input: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(input))
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('')
}

// At key creation: 32 random bytes, hex-encoded, behind an environment prefix
const bytes = crypto.getRandomValues(new Uint8Array(32))
const raw = `sk_live_${Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('')}`
const hash = await sha256(raw)
await db.insert(apiKeys).values({
  tenantId, hash, tier: 'free', createdAt: new Date(),
})
// Show 'raw' to the user once. Never store it.

// At request time
const incoming = req.headers.get('authorization')?.replace('Bearer ', '')
if (!incoming) return new Response('unauthorized', { status: 401 })
const [key] = await db.select().from(apiKeys).where(eq(apiKeys.hash, await sha256(incoming))).limit(1)
Tier-aware limits
Once the key resolves to a tier, apply that tier's limits: free gets 60 req/min, pro gets 600, enterprise gets 6000. Keep the table small and explicit; do not let tiers proliferate.
- free: 60 req/min, 10k req/day
- pro: 600 req/min, 1M req/day
- enterprise: 6000 req/min, custom daily cap
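The table above as code — a sketch assuming a `limitsFor` helper (the names and the infinite enterprise daily cap are illustrative; the enterprise cap is negotiated per contract):

```typescript
// Small, explicit tier table; the type system keeps it exhaustive.
const TIER_LIMITS = {
  free: { perMinute: 60, perDay: 10_000 },
  pro: { perMinute: 600, perDay: 1_000_000 },
  // 'custom daily cap' modeled as unbounded here; replace per contract.
  enterprise: { perMinute: 6_000, perDay: Number.POSITIVE_INFINITY },
} as const

type Tier = keyof typeof TIER_LIMITS

function limitsFor(tier: Tier) {
  return TIER_LIMITS[tier]
}
```

Adding a tier means adding one line here and nowhere else; the rate-limit layer reads `limitsFor(tenant.tier)` and stays unchanged.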
Observability
Every request emits a structured log line: timestamp, key (hashed), endpoint, status, latency, region. These go to Tinybird (or Vercel Logs), where you can query req/s by tenant, error rate by endpoint, p99 latency by region.
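A sketch of that log line, assuming JSON output and the fields listed above (the `formatLog` and `withLogging` helpers are illustrative, not from a specific library):

```typescript
// One JSON line per request; the key is stored hashed so logs never leak raw keys.
interface RequestLog {
  ts: string
  keyHash: string
  endpoint: string
  status: number
  latencyMs: number
  region: string
}

function formatLog(entry: RequestLog): string {
  return JSON.stringify(entry)
}

// Wrap a handler to measure latency and emit the line after the response resolves.
async function withLogging(
  endpoint: string,
  region: string,
  keyHash: string,
  handler: () => Promise<Response>,
): Promise<Response> {
  const start = Date.now()
  const res = await handler()
  console.log(formatLog({
    ts: new Date().toISOString(),
    keyHash,
    endpoint,
    status: res.status,
    latencyMs: Date.now() - start,
    region,
  }))
  return res
}
```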
Errors go to Sentry with the tenant tag attached, so any spike is immediately attributable. Without per-tenant attribution, public-API errors are noise; with it, they are signal.
Abuse detection
- Per-IP fallback rate limit even for authenticated requests (one tenant should not be able to DoS you with one key)
- Geographic gate on registration if you do not serve certain regions
- Honeypot endpoints that should never be hit; any traffic to them gets the IP banned
- Anomaly alerts: 5x normal traffic for a tenant pages me, not an auto-suspension
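Two of those checks fit in a few lines; a sketch with illustrative honeypot paths and the 5x threshold as a helper:

```typescript
// Paths no legitimate client should ever request (examples, not a canonical list).
const HONEYPOTS = new Set(['/wp-login.php', '/.env', '/admin/backup'])

function isHoneypot(path: string): boolean {
  return HONEYPOTS.has(path)
}

// Page a human when a tenant's current rate exceeds its baseline by the factor;
// no baseline yet (new tenant) means no alert, never an auto-suspension.
function isAnomalous(currentReqPerMin: number, baselineReqPerMin: number, factor = 5): boolean {
  return baselineReqPerMin > 0 && currentReqPerMin > baselineReqPerMin * factor
}
```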
Pagination, filtering, idempotency
Cursor-based pagination, not offset. Filtering via explicit query parameters with a typed schema. Idempotency-Key header on POSTs that mutate, with the key + tenant scoped for 24 hours. These are not optional; they are part of the baseline.
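A sketch of the opaque cursor, assuming base64url-encoded JSON of the last row's sort keys (field names are illustrative; use whatever your sort order needs):

```typescript
// Encode the last row's (createdAt, id) pair into an opaque cursor string.
// Clients treat it as a token; the server is free to change the encoding.
function encodeCursor(lastCreatedAt: string, lastId: string): string {
  return Buffer.from(JSON.stringify({ at: lastCreatedAt, id: lastId })).toString('base64url')
}

function decodeCursor(cursor: string): { at: string; id: string } {
  return JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'))
}
```

The query then filters `WHERE (created_at, id) > (at, id) ORDER BY created_at, id`, which stays correct under concurrent inserts — the failure mode that makes offset pagination drift.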
Alternatives I considered
- AWS API Gateway: battle-tested, AWS-native. Slower cold starts, more configuration, vendor-specific. The right answer if the rest of your stack is on AWS.
- Cloudflare Workers: excellent Edge performance, but KV is eventually consistent, which can be a problem for rate limiting; Workers + Durable Objects is closer to the ideal here.
- Self-hosted: cheap, full control, and you babysit the box. Acceptable for hobby projects; painful at production scale.
- GraphQL: a different shape entirely. Rate limiting GraphQL well is genuinely hard; depth and complexity limits are necessary on top of request limits.
Want me to build this for you?
Blueprints are how I think. If your problem fits one of these, we are already most of the way to a quote.