Whitepaper · Shipyard

Shipyard

A production-grade multi-tenant SaaS starter. Tenant isolation enforced by a repository chokepoint, permission-based RBAC, append-only audit log, token-bucket rate limits, and a billing scaffold with a real subscription state machine.

MIT Licensed · Open Source · Next.js 16 · TypeScript · node:sqlite · 29 tests
29 tests, all green · ~470 ms full test run · 6 test suites · 0 external services

v1.0 · May 2026 · Sai Sarma · Sarma Linux

Abstract

Shipyard is an open-source, MIT-licensed multi-tenant SaaS starter for Next.js. The headline guarantee is that one tenant cannot read or write another tenant’s data, enforced by a single repository chokepoint that injects the tenant predicate into every scoped query and stamps it onto every scoped insert. On top of that sit permission-based RBAC, an append-only audit log, a token-bucket rate limiter with a pluggable store, and a billing scaffold that includes a real subscription state machine and a Stripe-shaped webhook signature check. The project ships with 29 tests across six isolated suites that prove each guarantee, runs against the built-in node:sqlite in development and tests, and is structured for a Postgres swap in production. This whitepaper documents the architecture, the technical decisions, the alternatives considered, and the reasons for picking Shipyard over a homegrown starter or a Clerk-plus-Stripe paste.

01 Executive Summary

Every B2B SaaS needs the same spine before it can ship product: tenant isolation, sessions, a role model, an audit trail, rate limits, and a billing scaffold that does not embarrass you on the first sales call. The pieces are well-understood. They are also each independently easy to get subtly wrong, and the failures are the kind that surface in production: a query missing its tenant predicate, a role check living in the client, a webhook reactivating a cancelled plan because nobody validated the transition.

Shipyard exists to write that spine once, with the isolation and authorisation guarantees pinned down by tests rather than by code review. The project is opinionated about the hard parts (one chokepoint for tenant data, server-side authorisation, append-only audit, validated state transitions) and deliberately empty everywhere your product lives (no UI kit, no ORM, no bundled payment SDK).

The whole spine is small enough to read in one sitting. The repository is roughly 230 lines, the RBAC guard is about 30, the limiter is about 60. Total pnpm test wall time is roughly one second including process start. Six isolated suites prove tenant isolation, RBAC, audit, rate limits, billing transitions and Stripe webhook verification.

02 Background & Motivation

I have started the same B2B SaaS three times. Each time I spent the first fortnight rebuilding the same unglamorous spine before I could touch the actual product. The cost is not the lines of code; it is the second-order risk of getting any piece subtly wrong in a way that does not show up until a customer notices. A scoped query that forgot its WHERE. A role comparison that drifted as a new role was added. A webhook handler that accepted a replayed event because the state machine was implicit in the if-statements.

The market has three nearby answers. One: glue Clerk or Auth0 onto Stripe and write the rest yourself. Two: clone a paid SaaS boilerplate. Three: copy your last project. Each works, but none of them gives you a tested guarantee of tenant isolation on commit one, because that guarantee is structural rather than something an auth provider can add.

Shipyard takes the fourth route. Write the spine once, define the chokepoint explicitly, prove it with tests, and document the design choices in prose. Make it run with zero external services so the install is fast and the guarantees prove themselves on any machine.

03 The Problem

The specific failure modes Shipyard is designed to remove:

  • Cross-tenant reads and writes. A query path that does not include the tenant predicate is a data breach waiting for the right input. Application-level isolation has to be testable on commit one, not after Postgres is configured.
  • Smuggled tenant ids. A request body that carries an organisationId from the client must not be trusted, even by a careful developer who only sometimes remembers to strip it.
  • Role check drift. Asserting roles directly at the call site (if (role === "admin")) means every new capability invites a quiet inconsistency between routes.
  • Client-trusted authorisation. A role read from a JWT and acted on in the browser is not authorisation, it is a UI hint with no enforcement.
  • Sessions in plaintext. Storing the session token verbatim means a database leak hands an attacker live sessions.
  • Audit gaps. Privileged actions without a tamper-resistant trail leave an incident review without an answer.
  • Implicit state machines. A subscription model where transitions are scattered across handlers ends up accepting replays and out-of-order webhooks.
  • Hand-rolled signature compare. Webhook signature checks that use === instead of constant-time comparison leak signature bytes through wall-clock variance.

04 Goals & Non-goals

Goals

  • A single, narrow, auditable path for every tenant-scoped read and write.
  • Server-side authorisation through permissions, with a fail-closed default.
  • Append-only audit log written through the scoped repository.
  • Token-bucket rate limits per (tenant, route group), with a pluggable store for multi-instance correctness.
  • A real subscription state machine that rejects illegal transitions.
  • A Stripe-shaped webhook signature check that is the real HMAC scheme, not a stub.
  • Zero external services to run locally. pnpm install and pnpm test finish in seconds.
  • Tests that prove each guarantee, with one fresh in-memory database per suite.

Non-goals

  • An ORM. The whole isolation argument rests on one narrow path. A generated query builder hides the WHERE clause the guarantee depends on.
  • A bundled payment SDK. The Stripe adapter is a seam. The webhook signature check is real; the customer and subscription calls throw until you pnpm add stripe and fill them in.
  • A UI kit. A minimal settings dashboard proves the wiring, then gets out of the way.
  • SQLite in production. The repository is built for the Postgres swap. SQLite is the dev and test layer.
  • A single-tenant skeleton. The tenancy machinery is pure overhead if you are not multi-tenant.

05 Architecture

Request flow

Shipyard request flow: edge gate, resolved context, rate limit, RBAC, service, scoped repository, database.
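
The ordering in the request flow can be sketched as a single wrapper. This is a minimal illustration, not Shipyard's actual withGuard: the GuardDeps interface, the status codes and the string-typed permission are assumptions made so the example stands alone. It shows the sequence named in the flow (context, then rate limit, then RBAC, then the handler) and the rate-limit key built per (tenant, route group).

```typescript
// Hypothetical sketch of the request pipeline; names and status codes are assumptions.
type Role = "owner" | "admin" | "member" | "viewer";

interface Context {
  userId: string;
  organisationId: string;
  role: Role;
}

interface Result {
  status: number;
  body: unknown;
}

// Dependencies are injected so the ordering can be tested without a database.
interface GuardDeps {
  resolveContext(sessionToken: string | undefined): Context | null;
  allow(bucketKey: string): boolean;
  can(role: Role, permission: string): boolean;
}

function withGuard(
  deps: GuardDeps,
  opts: { group: string; permission: string },
  handler: (ctx: Context) => Result,
): (sessionToken: string | undefined) => Result {
  return (sessionToken) => {
    // 1. Resolve session -> user -> tenant -> role. No context means no access.
    const ctx = deps.resolveContext(sessionToken);
    if (!ctx) return { status: 401, body: { error: "unauthenticated" } };

    // 2. Rate limit per (tenant, route group).
    if (!deps.allow(`${ctx.organisationId}:${opts.group}`)) {
      return { status: 429, body: { error: "rate_limited" } };
    }

    // 3. Server-side permission check before any service code runs.
    if (!deps.can(ctx.role, opts.permission)) {
      return { status: 403, body: { error: "forbidden" } };
    }

    // 4. Only now does the handler see a fully resolved context.
    return handler(ctx);
  };
}
```

The handler never receives a tenant id from the request body; it only sees the one resolved from the session.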

Module map

File                                  Responsibility
src/db/schema.ts                      Table definitions and the TENANT_SCOPED_TABLES set
src/db/repository.ts                  The chokepoint. Scoped and global helpers, predicate injection
src/db/migrate.ts                     Schema migrations driven by the table descriptors
src/lib/auth.ts                       Sessions (SHA-256 hashed), scrypt password hashes, signup, login
src/lib/context.ts                    resolveContext: session → user → tenant → role
src/lib/rbac.ts                       Permissions, roles, requirePermission, guard
src/lib/audit.ts                      recordAudit, listAudit
src/lib/rate-limit.ts                 Token bucket, injectable clock, BucketStore interface
src/lib/billing/plans.ts              Plan catalogue and per-metric budgets
src/lib/billing/service.ts            Subscription state machine and usage metering
src/lib/billing/provider-fake.ts      In-memory provider for tests and local dev
src/lib/billing/provider-stripe.ts    Stripe-shaped seam. Real HMAC verification, stubbed customer/subscription calls
src/lib/http.ts                       withGuard wrapper for routes, cookie setter, error mapping
tests/*                               Six suites, fresh in-memory DB each

06 Key Technical Decisions

A repository chokepoint, not Postgres RLS as the primary guard

Row-level security is genuinely good, and Shipyard documents it as the production defence in depth on Postgres. It is not the primary guard for two reasons. The project has to run and prove itself with zero services, which rules out making Postgres a prerequisite. And an application-level guard fails loudly in a unit test on any database, whereas an RLS misconfiguration fails silently until production. Both, not one, was the only honest answer.
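
A minimal sketch of what such a chokepoint can look like, assuming a statement-builder shape so it runs without any database. The function names, the table list and the Statement type are hypothetical; only TENANT_SCOPED_TABLES, the organisationId column and the skip-in-the-where-loop rule come from this document.

```typescript
// Hypothetical sketch: builds SQL plus parameters rather than executing anything.
type Value = string | number;
type Row = Record<string, Value>;

interface Statement {
  sql: string;
  params: Value[];
}

// Assumed contents; the real set lives in src/db/schema.ts.
const TENANT_SCOPED_TABLES = new Set(["projects", "audit_log"]);

function assertScoped(table: string): void {
  if (!TENANT_SCOPED_TABLES.has(table)) {
    throw new Error(`"${table}" is not a tenant-scoped table`);
  }
}

// Column names are interpolated into SQL, so accept plain identifiers only.
function assertIdentifier(name: string): void {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`bad identifier: ${name}`);
  }
}

function scopedSelect(tenantId: string, table: string, where: Row = {}): Statement {
  assertScoped(table);
  const clauses = ["organisationId = ?"]; // the tenant predicate always comes first
  const params: Value[] = [tenantId];
  for (const [key, value] of Object.entries(where)) {
    if (key === "organisationId") continue; // the caller cannot override the predicate
    assertIdentifier(key);
    clauses.push(`${key} = ?`);
    params.push(value);
  }
  return { sql: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, params };
}

function scopedInsert(tenantId: string, table: string, values: Row): Statement {
  assertScoped(table);
  // Spread first, stamp last: a smuggled organisationId is overwritten.
  const entries = Object.entries({ ...values, organisationId: tenantId });
  entries.forEach(([column]) => assertIdentifier(column));
  const columns = entries.map(([column]) => column).join(", ");
  const placeholders = entries.map(() => "?").join(", ");
  return {
    sql: `INSERT INTO ${table} (${columns}) VALUES (${placeholders})`,
    params: entries.map(([, value]) => value),
  };
}
```

Because the tenant id is a separate argument taken from the resolved context, a request body has no way to supply it.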

node:sqlite, not better-sqlite3 or an ORM

better-sqlite3 is excellent and it is also a compiled addon, which is exactly the thing that breaks in someone’s CI on a Tuesday. node:sqlite (built into Node, and usable without a flag on recent release lines) has no native build step, so pnpm install is fast and pnpm test runs anywhere. An ORM was rejected for a different reason: the isolation argument rests on there being one narrow, auditable path to tenant data, and a hand-written repository of about 230 lines is something I can read top to bottom and reason about.

Permissions at the call site, roles as bundles

Asserting roles directly (if (role === "admin")) is shorter and rots. Every new capability forces a revisit of every role comparison and meaning drifts. Asserting a permission (requirePermission(role, "members:invite")) keeps routes readable and lets the role table grow without touching call sites. The guard throws on missing permission so a forgotten check fails by raising rather than by silently allowing.
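
As a rough illustration of the pattern, assuming a small permission list and four roles. Only requirePermission, PERMISSIONS, "members:invite" and the admin and viewer roles are named in this document; every other permission, role and the ForbiddenError class are hypothetical.

```typescript
// Hypothetical sketch: permissions are a typed tuple, roles are bundles of them.
const PERMISSIONS = [
  "members:invite",
  "members:remove",
  "billing:manage",
  "audit:read",
] as const;

type Permission = (typeof PERMISSIONS)[number];
type Role = "owner" | "admin" | "member" | "viewer";

const ROLE_PERMISSIONS = new Map<Role, readonly Permission[]>([
  ["owner", PERMISSIONS],
  ["admin", ["members:invite", "members:remove", "audit:read"]],
  ["member", ["audit:read"]],
  ["viewer", []],
]);

class ForbiddenError extends Error {
  constructor(permission: Permission) {
    super(`missing permission: ${permission}`);
    this.name = "ForbiddenError";
  }
}

// Fail closed: no role, or a role without a bundle, is refused.
function requirePermission(role: Role | null | undefined, permission: Permission): void {
  const granted = role ? ROLE_PERMISSIONS.get(role) : undefined;
  if (!granted || !granted.includes(permission)) {
    throw new ForbiddenError(permission);
  }
}

// A typo such as requirePermission(role, "members:invit") is a compile-time error,
// because the second argument must be a member of PERMISSIONS.
```

Adding a capability means adding one string to PERMISSIONS and deciding which bundles include it; call sites that assert other permissions do not change.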

Sessions hashed at rest

Session tokens are opaque 32-byte random values, and only their SHA-256 hash is stored. A database dump does not hand out live sessions because the stored hash cannot be presented as a cookie. The plaintext exists only in the httpOnly cookie. Passwords are hashed with scrypt from node:crypto, with cost parameters embedded in the hash so they can be raised later without a migration.
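
A sketch of both halves using node:crypto. The function names, the `scrypt$N$r$p$salt$hash` storage format, the 16-byte salt and the default cost values are assumptions for illustration, not necessarily what src/lib/auth.ts does.

```typescript
import { createHash, randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Only the hash is persisted; the raw token goes into the httpOnly cookie.
function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

function newSession(): { cookieValue: string; storedHash: string } {
  const token = randomBytes(32).toString("base64url"); // opaque 32-byte token
  return { cookieValue: token, storedHash: hashToken(token) };
}

// Assumed format: scrypt$N$r$p$saltHex$hashHex, so the cost travels with the hash.
function hashPassword(password: string, N = 16384, r = 8, p = 1): string {
  const salt = randomBytes(16);
  const key = scryptSync(password, salt, 64, { N, r, p });
  return ["scrypt", N, r, p, salt.toString("hex"), key.toString("hex")].join("$");
}

function verifyPassword(password: string, stored: string): boolean {
  const [scheme, n, r, p, saltHex, hashHex] = stored.split("$");
  if (scheme !== "scrypt" || !saltHex || !hashHex) return false;
  const cost = { N: Number(n), r: Number(r), p: Number(p) };
  if (!Object.values(cost).every(Number.isInteger)) return false;
  const expected = Buffer.from(hashHex, "hex");
  // Re-derive with the parameters recorded at hashing time, not the current defaults.
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), expected.length, cost);
  return timingSafeEqual(actual, expected);
}
```

Raising the cost later then only affects newly written hashes; old ones still verify with their recorded parameters and can be rehashed on the next successful login.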

Token bucket over a fixed window or sliding-window log

A fixed window can admit up to twice the limit across a window boundary. A sliding-window log needs a timestamp list per key. The bucket is two numbers, (tokens, lastRefill), which is also why it ports unchanged to a Redis Lua script. The auth route group has the tightest budget, a capacity of five refilling one token every five seconds, which blunts credential stuffing.
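
The two-number bucket can be sketched as follows. The synchronous BucketStore shape, the MemoryBucketStore class and the take function are assumptions for illustration; a Redis-backed store would be asynchronous and is not shown.

```typescript
// Hypothetical sketch of a token bucket with an injectable clock.
interface Bucket {
  tokens: number;
  lastRefill: number; // milliseconds since epoch
}

interface BucketStore {
  get(key: string): Bucket | undefined;
  set(key: string, bucket: Bucket): void;
}

class MemoryBucketStore implements BucketStore {
  private readonly buckets = new Map<string, Bucket>();
  get(key: string): Bucket | undefined {
    return this.buckets.get(key);
  }
  set(key: string, bucket: Bucket): void {
    this.buckets.set(key, bucket);
  }
}

interface Budget {
  capacity: number;      // maximum tokens the bucket can hold
  refillEveryMs: number; // one token is added per this interval
}

// Returns true if the request may proceed, false if the bucket is empty.
function take(
  store: BucketStore,
  key: string,
  budget: Budget,
  now: () => number = Date.now,
): boolean {
  const t = now();
  const prev = store.get(key) ?? { tokens: budget.capacity, lastRefill: t };
  const elapsed = Math.max(0, t - prev.lastRefill);
  // Refill lazily from elapsed time, never above the ceiling.
  const tokens = Math.min(budget.capacity, prev.tokens + elapsed / budget.refillEveryMs);
  const allowed = tokens >= 1;
  store.set(key, { tokens: allowed ? tokens - 1 : tokens, lastRefill: t });
  return allowed;
}
```

Passing a fake clock as the last argument lets a test advance time by hand: with a budget of `{ capacity: 5, refillEveryMs: 5000 }`, five calls succeed, the sixth is refused, and one more succeeds once the clock has moved forward five seconds.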

A real state machine for billing

Allowed transitions are explicit. canceled is terminal. An out-of-order or replayed webhook that tries an illegal move (for example reactivating a cancelled subscription) is rejected with a BillingError rather than silently applied. Webhook events are also checked against the stored providerSubscriptionId so an event for a different subscription cannot mutate this tenant’s record.
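
An allow list of this kind might look like the sketch below. Only the terminal canceled state, BillingError and the providerSubscriptionId check come from this document; the other status names, the transition table and the applyEvent function are hypothetical.

```typescript
// Hypothetical sketch of an explicit transition table.
type Status = "trialing" | "active" | "past_due" | "canceled";

const ALLOWED: Record<Status, readonly Status[]> = {
  trialing: ["active", "canceled"],
  active: ["past_due", "canceled"],
  past_due: ["active", "canceled"],
  canceled: [], // terminal: nothing leaves this state
};

class BillingError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BillingError";
  }
}

interface Subscription {
  organisationId: string;
  providerSubscriptionId: string;
  status: Status;
}

interface WebhookEvent {
  providerSubscriptionId: string;
  status: Status;
}

function applyEvent(sub: Subscription, event: WebhookEvent): Subscription {
  // An event for some other subscription must not touch this tenant's record.
  if (event.providerSubscriptionId !== sub.providerSubscriptionId) {
    throw new BillingError("event does not match the stored subscription");
  }
  // In this sketch a replay of the current status is also rejected,
  // because a state never lists itself as a target.
  if (!ALLOWED[sub.status].includes(event.status)) {
    throw new BillingError(`illegal transition: ${sub.status} -> ${event.status}`);
  }
  return { ...sub, status: event.status };
}
```

Whether a same-status replay should be rejected or treated as a no-op is a design choice; the table makes either decision a one-line change that a reviewer can see.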

A Stripe-shaped seam, not a Stripe SDK dependency

Bundling a payment SDK into a starter is the wrong default. The webhook signature check is the one piece you genuinely cannot fake, so it is implemented for real: HMAC-SHA256 over `{timestamp}.{payload}` with the webhook secret, compared in constant time via timingSafeEqual. The customer and subscription methods throw with a pointer to the wiki until you pnpm add stripe and fill them in.
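
The check can be sketched with node:crypto alone. The header format `t=<timestamp>,v1=<hex signature>` follows Stripe's documented scheme; the function names and the 300-second tolerance window are assumptions added here for illustration and are not taken from provider-stripe.ts.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC-SHA256 over "{timestamp}.{payload}", hex encoded.
function signPayload(secret: string, timestamp: string, payload: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${payload}`, "utf8").digest("hex");
}

function verifySignature(
  payload: string,   // the raw request body, before any JSON parsing
  header: string,    // e.g. "t=1700000000,v1=ab12..."
  secret: string,
  nowSec: number,
  toleranceSec = 300, // assumed replay window
): boolean {
  let timestamp = "";
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx < 0) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    else if (key === "v1") candidates.push(value);
  }
  if (!/^\d+$/.test(timestamp) || candidates.length === 0) return false;
  if (Math.abs(nowSec - Number(timestamp)) > toleranceSec) return false;

  const expected = Buffer.from(signPayload(secret, timestamp, payload), "hex");
  return candidates.some((candidate) => {
    const given = Buffer.from(candidate, "hex");
    // timingSafeEqual throws on length mismatch, so check length first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The payload must be the exact bytes received; re-serialising parsed JSON before verifying will change the bytes and fail the check.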

07 Alternatives Considered

Why this over a homegrown starter

The honest version of a homegrown starter takes a fortnight per project and produces a slightly different spine every time, because the design decisions are remade from cold. The result is a portfolio of slightly inconsistent starters, none of which has a tested isolation guarantee on commit one. Shipyard is the version where the design decisions are made once, written down, and pinned by tests.

Why this over Clerk plus Stripe pasted together

Clerk solves authentication and gives you a hosted UI for sign-in and organisations. Stripe solves billing. Neither solves tenant isolation in your database, because that is structural to your code rather than something a provider can do for you. The interesting questions are still yours: which tables carry organisationId, where the predicate is enforced, what stops a smuggled id from landing under the wrong tenant, what makes the audit log trustworthy. A Clerk-plus-Stripe paste leaves all of those open and bills you monthly for the bits it does cover. Shipyard answers them in the repository and is MIT-licensed.

Why this over a paid SaaS boilerplate

Paid boilerplates tend to optimise for surface area. Lots of pages, lots of integrations, lots of branding. The spine underneath is rarely the part they sell on. Shipyard does the opposite: only the spine, with the guarantees front and centre. If you want pages, bring your own design system. If you want integrations, the seams are explicit.

Why this over Postgres RLS alone

RLS is excellent for defence in depth and the right answer in production. It is not the right primary guard because it makes Postgres a prerequisite for the project to install and prove itself, and because RLS misconfiguration fails silently. Application-level isolation is what you test on commit one. RLS is what you add when you swap to Postgres.

08 Results & Performance

Test suite

Apple M3 Pro, Node v25.9.0. Real numbers from my machine, not estimates.

$ pnpm test
 Test Files  6 passed (6)
      Tests  29 passed (29)
   Duration  468ms

$ /usr/bin/time -p pnpm test    # whole command, including process start
real 1.04

What each suite proves

Suite               What it proves
tenant-isolation    Cross-tenant reads return nothing; a smuggled tenant id is overwritten; cross-tenant updates change zero rows
rbac                A viewer is refused privileged actions; a user with no membership in the active tenant fails closed
audit               Signup and invitations write entries with the correct actor, tenant and metadata; entries are returned newest first
rate-limit          The bucket allows up to capacity, blocks past it, refills at the configured rate and never exceeds the ceiling
billing             Subscribe and webhook transitions are validated; illegal transitions are rejected; plan budgets stop usage overrun
stripe-webhook      A correctly signed payload is accepted and mapped; a tampered or unsigned payload is rejected

Each suite gets its own fresh in-memory database, so there is no shared state to leak between cases. The whole run is hermetic and reproducible.

09 Lessons & Trade-offs

What worked

  • Asserting the chokepoint with a test before any feature. The smuggled-id and cross-tenant-update cases catch the entire class of subtle isolation bugs in one suite.
  • Skipping organisationId in the where loop. One line in the repository (if (key === "organisationId") continue) makes the predicate genuinely non-overridable rather than merely conventionally so.
  • Permissions as a typed tuple. A new capability that is not in PERMISSIONS is a type error at the call site, so the wiring stays in sync.
  • Injectable clock on the limiter. Tests advance the clock by hand, so refill behaviour is exact and deterministic with no setTimeout.
  • Explicit allow list for state transitions. Reading the allowed moves as a list is much easier to review than reading the equivalent if-tree.

Trade-offs accepted

  • SQLite in production is not supported. The repository is built for the Postgres swap. Shipping the SQLite layer to production is on the user.
  • The Stripe adapter is a seam. Customer and subscription calls throw until you bring in the SDK. The webhook signature is the one piece I refused to stub.
  • Single-instance rate limiting by default. The bucket store is in-memory. Behind several instances the effective limit multiplies until you wire the Redis store. The interface exists for exactly that.
  • No UI kit. A minimal dashboard proves the wiring. Bring your own design system.
  • Not a single-tenant skeleton. If you are not building multi-tenant, the tenancy machinery is overhead. Start elsewhere.

10 Conclusion

The hard parts of a B2B SaaS spine are the parts that fail subtly. Cross-tenant queries that look right. Role checks that drift. Webhooks that accept replays. Signature checks that leak through timing. Each of these is a one-line fix when you know the pattern, and a real outage when you do not. Shipyard’s contribution is to make the patterns explicit, narrow, and tested, so the spine you copy across projects is the same one each time and the guarantees travel with it.

What you build on top is your product. Bring your own UI kit, your own routes, your own data model. The spine gets out of the way as soon as you have it.

Appendix A · Configuration

Variable                 Required     Default        Purpose
SHIPYARD_DB_PATH         For dev      in-memory      Path to the SQLite file. Omit for an ephemeral in-memory database (the default for tests)
NODE_ENV                 No           development    Sets the session cookie’s secure flag when set to production
BILLING_PROVIDER         No           fake           Selects the billing provider. stripe wires up provider-stripe.ts
STRIPE_SECRET_KEY        If Stripe    (none)         Stripe secret. Used once you fill in the customer and subscription calls
STRIPE_WEBHOOK_SECRET    If Stripe    (none)         Webhook signing secret. The HMAC verification is already real and tested
STRIPE_PRICE_PRO         If Stripe    (none)         Stripe Price id for the Pro plan
STRIPE_PRICE_SCALE       If Stripe    (none)         Stripe Price id for the Scale plan

Appendix B · Production Checklist

  • Swap SQLite for Postgres. Implement the same Repository interface against Postgres. The application code does not change.
  • Enable RLS as defence in depth. Keep the repository as the application-level guard and add an RLS policy keyed on a session variable. Both, not one.
  • Wire a Redis BucketStore. Implement get/set against Redis, ideally with refill-and-take in a small Lua script so concurrent requests across instances cannot both spend the last token.
  • Set BILLING_PROVIDER=stripe. pnpm add stripe, fill in createCustomer, createSubscription, cancelSubscription in provider-stripe.ts. The webhook signature check is already real.
  • Lock down audit-log mutation at the database. Revoke UPDATE and DELETE on audit_log for the application role, so the append-only property is enforced below the application as well.
  • Pin Node to the version you tested. Reproducible builds matter, especially for a project that depends on built-in node:sqlite.
  • Put the app behind a TLS-terminating proxy. Set NODE_ENV=production so the session cookie is marked secure.
  • Add bounce and complaint monitoring with your email provider. Sign-up and invitation flows depend on it.