
How Agent Orchestrator works

A complete tour of the architecture: data flow, subsystems, technology choices, performance, and where the project is heading next.

TL;DR

Workflows are TypeScript functions executed inside a journaling shell that records every model and tool call as an event in Postgres. BullMQ moves work between steps. Replay re-derives a run from its journal without re-invoking the model. Tool budgets enforce spend caps before the model is called. The Inspector UI is the engineering observability tier.

Core data flow

From the moment a request enters the system to the moment a response leaves it.

  HTTP / tRPC starts a run
       │
       ▼
  Postgres: insert into runs
       │
       ▼
  Workflow function begins executing
       │
       ▼
  ┌─── orch.callModel(...) ──────────────────┐
  │  check events table for this step:       │
  │    hit  → return journaled output        │
  │    miss → call provider, journal, return │
  └──────────────────────────────────────────┘
       │
       ▼
  ┌─── orch.callTool(...) ───────────────────┐
  │  check budget → 403 if breached          │
  │  check journal: hit → replay, miss → run │
  │  insert events row with output           │
  └──────────────────────────────────────────┘
       │
       ▼
  Long-running step?
       │ yes                            │ no
       ▼                                ▼
  Push BullMQ job, suspend         continue inline
       │
       ▼
  Worker picks up job,
  replays journal to here,
  continues
       │
       ▼
  Workflow returns → mark run finished
       │
       ▼
  Inspector tails events via SSE

Each subsystem, deep-dived

Every component in the data flow above, opened up and explained.

The journaling shell

The shell is the runtime API every workflow uses. orch.callModel(agent, messages) looks up whether this step has been journaled. If yes, it returns the journaled output (replay path). If no, it calls the provider, writes the result to events, and returns. The same logic applies to orch.callTool(tool, args) and orch.spawn(subworkflow, inputs).

The lookup key is the run ID plus a deterministic step sequence number. Workflows must be deterministic: the same inputs at the same step produce the same key. Branching that depends on model output is allowed because the model output itself is journaled: the next replay sees the same output and takes the same branch.
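
The hit/miss logic above can be sketched in a few lines. This is a minimal illustration, not the shipped shell: an in-memory Map stands in for the events table, and the names (Shell, JournalEntry, fakeProvider) are ours, not the library's.

```typescript
// Minimal sketch of the journaling shell's hit/miss path.
// A Map stands in for the Postgres events table.
type JournalEntry = { runId: string; seq: number; kind: string; output: unknown };

class Shell {
  private seq = 0; // deterministic step counter: same code path → same seq

  constructor(
    private runId: string,
    private journal: Map<string, JournalEntry>, // keyed by `${runId}:${seq}`
  ) {}

  async callModel(agent: string, messages: string[]): Promise<unknown> {
    return this.step("model_call", () => fakeProvider(agent, messages));
  }

  private async step(kind: string, live: () => Promise<unknown> | unknown) {
    const key = `${this.runId}:${this.seq++}`; // lookup key: run ID + step seq
    const hit = this.journal.get(key);
    if (hit) return hit.output;               // replay path: no provider call
    const output = await live();              // live path: call the provider...
    this.journal.set(key, { runId: this.runId, seq: this.seq - 1, kind, output });
    return output;                            // ...then journal and return
  }
}

// Stand-in for a real model provider call.
async function fakeProvider(agent: string, messages: string[]) {
  return `reply from ${agent} to ${messages.length} message(s)`;
}
```

Running the same workflow twice against the same journal exercises both paths: the first run journals a miss, the second returns the journaled output without touching the provider.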

Postgres event log

The events table is the source of truth. Schema essentials: run_id, seq, kind, agent, tool, args_hash, output, ts. The combination (run_id, seq) is the primary key. kind is one of model_call, tool_call, spawn, finish, error. output is JSONB.
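
The schema essentials above, mirrored as TypeScript types. The Drizzle schema in the repo is the source of truth; this is an illustrative mirror of the columns named in the text.

```typescript
// The events columns, as TypeScript types.
type EventKind = "model_call" | "tool_call" | "spawn" | "finish" | "error";

interface EventRow {
  run_id: string;           // which run this event belongs to
  seq: number;              // step sequence; (run_id, seq) is the primary key
  kind: EventKind;
  agent: string | null;     // set for model calls
  tool: string | null;      // set for tool calls
  args_hash: string | null; // hash of the tool args, used for journal lookups
  output: unknown;          // JSONB payload
  ts: Date;
}

// The composite key the journal lookup uses.
const eventKey = (e: Pick<EventRow, "run_id" | "seq">) => `${e.run_id}:${e.seq}`;
```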

Postgres row-level security is on by default. Each row carries a tenant_id; policies bind reads and writes to auth.uid() in environments where the orchestrator is exposed to multiple tenants. The Drizzle schema and the migration scripts ship in the repo; running them against a fresh Postgres is a one-line command.

BullMQ step queue

Long-running steps suspend the workflow and push a BullMQ job onto Redis. A worker picks up the job, restarts the workflow, replays the journal up to the suspension point, and continues. This is how the runtime is horizontally scalable: more workers, more concurrent runs, no shared state outside Postgres and Redis.

Backoff, concurrency limits, and dead-letter queues come from BullMQ. The orchestrator wires these per workflow. A workflow that calls a flaky external API can configure exponential backoff with jitter; one that must run at most twice can configure max-attempts. The DLQ is queryable from the Inspector.
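
BullMQ accepts a retry policy as job options ({ attempts, backoff }); the sketch below only illustrates the curve a flaky-API workflow might configure. The constants and helper name are hypothetical, not values from the orchestrator.

```typescript
// Exponential backoff with full jitter, as a per-workflow retry policy.
const RETRY = { attempts: 5, baseDelayMs: 1_000, maxDelayMs: 60_000 };

function backoffDelayMs(attempt: number, rand: () => number = Math.random): number {
  // attempt 1 → up to 1s, attempt 2 → up to 2s, attempt 3 → up to 4s, ...
  const ceiling = Math.min(RETRY.baseDelayMs * 2 ** (attempt - 1), RETRY.maxDelayMs);
  return Math.floor(rand() * ceiling); // full jitter: uniform in [0, ceiling)
}
```

Full jitter spreads retries out so that many workers hitting the same flaky API do not retry in lockstep.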

Tool budgets

Budgets are enforced before the tool is called. Three scopes: per-run, per-agent, per-tool. Two metrics: count and cost. A budget row in Postgres carries the limit and the running used value; the runtime checks atomically with a Postgres advisory lock or the budget table’s row lock. A breach raises a typed BudgetExceededError that the workflow can catch.

Costs are computed by the model adapter: each adapter knows its own price card, so adding a new model means adding its price card; the budget logic is unchanged. The cost figures show up in the Inspector as the workflow runs.
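
The check-before-call rule can be sketched as follows. In the real system the check and the update run atomically under a Postgres row lock; here a plain object stands in for the budget row, and the function name is ours.

```typescript
// Typed error the workflow can catch, per the text.
class BudgetExceededError extends Error {
  constructor(public scope: string, public metric: "count" | "cost") {
    super(`budget exceeded: ${scope} ${metric}`);
  }
}

interface BudgetRow { scope: string; limit: number; used: number }

// Called BEFORE the tool or model is invoked, so a breached budget
// never reaches the provider.
function chargeOrThrow(row: BudgetRow, cost: number): void {
  if (row.used + cost > row.limit) throw new BudgetExceededError(row.scope, "cost");
  row.used += cost; // in Postgres this update is atomic with the check
}
```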

Replay engine

Replay is identical to live execution at the API level. The shell looks up each step in the journal: during replay every step is a hit; during live execution, only the steps after the resumption point are misses. There is no separate replay code path.

The Inspector exposes a replay-with-edits mode: open a past run, edit the prompt at any step, hit Run, and the runtime executes a fresh run that copies the journal up to the edited step and goes live from there. This is the debugging loop that pays for the rest of the project.
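
Mechanically, replay-with-edits is a journal copy: the fresh run keeps every entry before the edited step, and the missing suffix makes the shell go live from there. A sketch, with an illustrative entry shape and function name:

```typescript
interface Entry { seq: number; output: unknown }

// Fork a run at the edited step: steps before it replay verbatim,
// steps from editedSeq onward are absent, so the shell treats them
// as misses and executes live.
function forkJournal(entries: Entry[], editedSeq: number): Entry[] {
  return entries.filter((e) => e.seq < editedSeq).map((e) => ({ ...e }));
}
```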

Inspector UI

A Next.js app talking tRPC to the orchestrator runtime. The graph view shows the live state of each agent and the messages flowing between them. The timeline view shows the events of any past run, with the model prompts, the tool args, and the journaled outputs. The replay tab does what the previous section described.

SSE streams new events to the UI in real time. tRPC handles all reads and mutations. There is no GraphQL, no REST sprawl. The Inspector is the only UI most teams will write; it is built for engineers, not for screenshots.
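
For reference, the wire format of such a tail is plain server-sent events: one `data:` frame per journaled event, terminated by a blank line. This sketch shows only the framing; how the real endpoint shapes its payloads is up to the runtime.

```typescript
// Format one journaled event as an SSE frame.
function sseFrame(event: { run_id: string; seq: number; kind: string }): string {
  // Each frame is `data: <json>` followed by a blank line.
  return `data: ${JSON.stringify(event)}\n\n`;
}
```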

OpenTelemetry

Every workflow step emits a span. Spans nest under their parent: agents under workflows, tools under agents, model calls under agents. Trace IDs follow the run, so any external system that picks up a workflow event sees the same ID. Wire your OTLP collector and the dashboard appears.

Why this stack

The road not taken matters as much as the road taken. Here is what was picked, why, and what was rejected and why.

Picked

TypeScript

End-to-end type safety from the workflow author down to the database row. Drizzle schemas surface as TypeScript types directly.

Not this

Python: a fine choice. We picked TypeScript because the agent ecosystem in TS is converging fast and we wanted no marshalling at the language boundary.

Picked

Postgres + Drizzle

Honest SQL-shaped ORM. Migrations are checkpoint files. The schema reads in twenty minutes. Postgres is operationally familiar everywhere.

Not this

Prisma: too magic for this kind of system. MongoDB: wrong shape for an event log.

Picked

BullMQ + Redis

Mature Redis queue with backoff, concurrency, and DLQ. Saves us from writing one badly.

Not this

pg-boss: would have meant one less dependency, but Redis is cheap and BullMQ is the better-maintained library.

Picked

Event journal

Composes with replay-with-edits: a journal can be copied up to any step and re-run live from there, whereas a snapshot can only be replayed as-is. We needed both replay and edits.

Not this

State checkpoint: simpler to implement, worse for debugging.

Picked

tRPC

Type-safe RPC between the Inspector and the runtime, no codegen, no schema drift.

Not this

GraphQL: overkill. REST: would have meant maintaining types twice.

Picked

Next.js for the Inspector

The Inspector is a small app with SSR for run links, SSE for live tail, and tRPC for queries. Next.js is the obvious fit.

Not this

A Vite SPA: would have worked, but with no SSR for run links and no server actions for replay-with-edits.

Picked

OpenTelemetry

Vendor-neutral spans, metrics, and logs. The right answer in 2026.

Not this

A bespoke trace format: would have meant building dashboards we did not need to build.

Performance & observability

The orchestrator does not add measurable latency to model or tool calls: on the live path the journaling write happens off the call's return path, and on the replay path the only cost is one indexed Postgres lookup. The end-to-end latency of a workflow is dominated by the model and tool calls themselves.

Postgres throughput is the practical ceiling. A single Postgres 16 instance on a 4 vCPU / 16GB machine sustains around 1,800 events/second in our load tests, which translates to roughly 60 concurrent moderate workflows or several hundred lightweight ones. Above that, partition the events table by month and add a read replica for the Inspector.
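
The sizing numbers above reduce to simple arithmetic. The per-workflow event rates below are implied by the figures in the text, not independently measured.

```typescript
// Back-of-envelope capacity check for the single-instance ceiling.
const eventsPerSecond = 1_800; // Postgres 16, 4 vCPU / 16 GB, per load tests

// "Roughly 60 concurrent moderate workflows" implies each one emits
// about 1800 / 60 = 30 events/s at that ceiling.
const moderateWorkflowRate = eventsPerSecond / 60;

// How many workflows fit at a given per-workflow event rate.
const workflowsThatFit = (ratePerWorkflow: number) =>
  Math.floor(eventsPerSecond / ratePerWorkflow);
```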

BullMQ scales horizontally; add workers to the same Redis instance until Redis itself is the bottleneck, which in practice means many tens of workers. The Inspector is read-only against the runtime, so it scales with another instance behind a load balancer.

Observability is OpenTelemetry + structured logs. Every span and every log line carries the run ID and the trace ID, so cross-system correlation works out of the box.

Where it is heading

  • Multi-language workflow authors via a journaling-shell SDK in Python.
  • Sandbox mode where workflows run with mock adapters by default; useful for CI.
  • Saga patterns with compensating actions, surfaced as a first-class API.
  • Postgres LISTEN/NOTIFY-driven worker wakeups, removing one Redis hop for low-traffic deploys.
  • A workflow versioning model that lets long-running workflows survive code changes via shadow runs.

Read the full whitepaper for the formal technical write-up.