Open Source · MIT · TypeScript · Postgres + BullMQ

Multi-agent workflows you can replay, audit, and trust.

Most agent frameworks are demos with delusions of grandeur. They fall over the moment a tool times out.

Agent Orchestrator runs multi-agent workflows the way payment systems run payments. Workflows are typed graphs, state is durable in Postgres, every step writes a trace event, and any run can be replayed deterministically from a checkpoint. Token, tool-call, and wall-clock budgets are hard limits that actually halt execution through an AbortSignal, not soft warnings. A Next.js Inspector draws the agent graph and lets you walk any past run step by step.

Durable: Postgres state
Replay: From any step
Budgets: Tokens / tools / wall-clock
Inspector: Live graph UI
Licence: MIT

Why this exists

A single agent doing one task on one happy path is a demo. A graph of agents handing work to each other across hours, with retries, partial failures, and tool budgets, is a system. The leap from one to the other is where most agent projects stall.

The bigger frameworks help you describe the graph but leave durability and replay as exercises for the reader. The smaller libraries solve the prompting layer and treat workflow concerns as out of scope. The result is that every team trying to ship an agent product writes the same Postgres journal, the same BullMQ wiring, the same retry plumbing, the same budget enforcement, the same in-house Inspector.

Agent Orchestrator is that infrastructure layer, written once. Workflows live in Postgres. Steps are events. Replay reads the events back deterministically from a checkpoint. Tool budgets are enforced before the model is invoked. The Inspector UI is the missing observability tier.

The four properties that matter

Orchestrators that survive 2am have these four. This one has all four.

1. Durable state

A run is checkpointed after every node, so a crash resumes from the last checkpoint, not the start.

2. Deterministic replay

Reconstruct any run from a checkpointed step without re-executing earlier nodes.

3. Hard budgets

Token, tool-call, and wall-clock limits that abort the run rather than logging a warning.

4. Visible execution

Every step recorded and queryable, with OpenTelemetry spans for external tracing.

What is in the box

Every feature below ships in the public repository today. Clone, migrate, run.

Graph DSL in TypeScript

Typed nodes and edges with conditional transitions and per-graph budgets, built with a fluent API in apps/api/src/graph/definition.ts. The graph is whatever shape your code makes.

Durable Postgres state

Each node records enter and exit events in the traces table and writes a checkpoint once it completes, while the run's status and output live on the runs row. Crashes resume from the last checkpoint, not the start. Drizzle ORM, migrations in the repo.

Deterministic replay

POST /runs/{id}/replay with a fromStep rehydrates context from the latest checkpoint at or before that step and resumes the walk. Earlier nodes are not re-executed.
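
The "latest checkpoint at or before that step" rule can be sketched in a few lines. This is a minimal illustration, not the repository's code: the Checkpoint shape and the selectCheckpoint name are assumptions made for the example.

```typescript
// Hypothetical checkpoint shape; the real checkpoints table may differ.
interface Checkpoint {
  step: number                      // index of the node that produced this checkpoint
  context: Record<string, unknown>  // run context as it stood after that node
}

// Pick the latest checkpoint at or before `fromStep`, or undefined if none exists.
function selectCheckpoint(
  checkpoints: readonly Checkpoint[],
  fromStep: number,
): Checkpoint | undefined {
  let best: Checkpoint | undefined
  for (const cp of checkpoints) {
    if (cp.step <= fromStep && (best === undefined || cp.step > best.step)) {
      best = cp
    }
  }
  return best
}

const saved: Checkpoint[] = [
  { step: 0, context: { plan: 'done' } },
  { step: 1, context: { plan: 'done', search: 'done' } },
  { step: 3, context: { plan: 'done', search: 'done', analyse: 'done' } },
]

// Step 2 has no checkpoint of its own here, so the walk would resume from step 1.
console.log(selectCheckpoint(saved, 2)?.step) // 1
```

Because the context is read from the stored checkpoint, nothing before the chosen step has to run again.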

Hard budgets, real aborts

Token, tool-call, wall-clock, and per-tool caps. A breach aborts the run through an AbortSignal and lands it in budget_exceeded, rather than logging a warning while a runaway loop continues.

Next.js Inspector UI

apps/inspector reads run state and draws the agent graph. Live run list, run detail view, agent-graph visualisation. Built for engineers, not for marketing screenshots.

Three agent kinds

pipeline runs one LLM pass, then its tools in order. supervisor runs one LLM pass whose decision routes conditional edges. swarm runs as many parallel LLM passes as its concurrency setting and merges the results.

BullMQ queue

Each run is a BullMQ job with backoff, concurrency, and dead-letter queue support. Without REDIS_URL the orchestrator degrades to in-process execution; the test suite uses that path.

OpenTelemetry spans

Every run, node, LLM call, and tool call is wrapped in an OTel span. Token counts and the MCP flag are attached as attributes. Ship spans over OTLP/HTTP into your existing collector.

Tool registry with Zod

Tools register with a Zod schema. Arguments are validated, the call is charged against the run budget before the handler runs, and a budget breach aborts before any side effect.
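
The ordering described above (validate, charge, then run the handler) is the part that keeps a breach free of side effects. A rough sketch of that ordering follows. Everything in it is a stand-in: the real registry validates with Zod, while this example uses a hand-written validator so it runs without dependencies, and ToolDef, ToolCallBudget, and this callTool signature are hypothetical.

```typescript
// Hypothetical stand-in for a Zod schema: returns typed args or throws.
type Validator<A> = (raw: unknown) => A

interface ToolDef<A, R> {
  name: string
  validate: Validator<A>
  handler: (args: A) => Promise<R>
}

// Minimal call counter standing in for the run's tool budget.
class ToolCallBudget {
  private used = 0
  private readonly limit: number
  constructor(limit: number) {
    this.limit = limit
  }
  // Throws before the caller gets a chance to run the handler.
  charge(): void {
    if (this.used + 1 > this.limit) throw new Error('tool budget exceeded')
    this.used += 1
  }
}

async function callTool<A, R>(
  tool: ToolDef<A, R>,
  raw: unknown,
  budget: ToolCallBudget,
): Promise<R> {
  const args = tool.validate(raw) // 1. reject malformed arguments
  budget.charge()                 // 2. charge the call; a breach throws here
  return tool.handler(args)       // 3. only now can a side effect happen
}

let sideEffects = 0
const refundTool: ToolDef<{ chargeId: string }, string> = {
  name: 'stripe_refund',
  validate: (raw) => {
    const candidate = raw as { chargeId?: unknown } | null
    if (typeof candidate !== 'object' || candidate === null || typeof candidate.chargeId !== 'string') {
      throw new Error('invalid arguments')
    }
    return { chargeId: candidate.chargeId }
  },
  handler: async ({ chargeId }) => {
    sideEffects += 1
    return `re_${chargeId}`
  },
}
```

With a budget of one call, a second callTool rejects at step 2 and sideEffects stays at 1.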

First-class MCP tools

registerMcpTool wires a Model Context Protocol server into the same registry as built-in tools. They share the tool budget, so an agent cannot dodge the cap by routing through MCP.

Offline fallback

Leave DATABASE_URL, REDIS_URL, and SARMALINK_API_KEY unset and the API falls back to an in-memory store, runs in-process, and the LLM adapter returns deterministic offline output. This is the test path.
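
One way to picture the fallback is a small resolver over the environment. This is a sketch under assumptions: resolveRuntimeMode and the RuntimeMode shape are hypothetical, and it assumes each dependency degrades independently of the other two, which the text above does not state outright.

```typescript
interface RuntimeMode {
  store: 'postgres' | 'memory'
  queue: 'bullmq' | 'in-process'
  llm: 'sarmalink' | 'offline'
}

// Assumption: each variable is checked on its own, so a missing REDIS_URL
// changes only the queue mode and leaves the store and LLM choices alone.
function resolveRuntimeMode(env: Record<string, string | undefined>): RuntimeMode {
  const isSet = (key: string): boolean => (env[key] ?? '').trim() !== ''
  return {
    store: isSet('DATABASE_URL') ? 'postgres' : 'memory',
    queue: isSet('REDIS_URL') ? 'bullmq' : 'in-process',
    llm: isSet('SARMALINK_API_KEY') ? 'sarmalink' : 'offline',
  }
}

// Nothing configured: the fully offline path the test suite is described as using.
console.log(resolveRuntimeMode({}))
// { store: 'memory', queue: 'in-process', llm: 'offline' }
```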

Trace events plus spans

Trace events (enter, exit, error, replay) live in the traces table for after-the-fact debugging. OTel spans plug into live distributed tracing. Two complementary views, both wired by default.
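
As an example of after-the-fact debugging over trace events, the sketch below finds nodes that entered but never exited, which is typically where a crash happened. The TraceEvent fields and the unfinishedNodes helper are hypothetical; only the four event kinds come from the description above.

```typescript
type TraceKind = 'enter' | 'exit' | 'error' | 'replay'

// Hypothetical row shape; the real traces table may carry more columns.
interface TraceEvent {
  runId: string
  node: string
  kind: TraceKind
}

// Nodes with an 'enter' that was never closed by an 'exit' or 'error'.
function unfinishedNodes(events: readonly TraceEvent[]): string[] {
  const open = new Set<string>()
  for (const e of events) {
    if (e.kind === 'enter') open.add(e.node)
    else if (e.kind === 'exit' || e.kind === 'error') open.delete(e.node)
  }
  return Array.from(open)
}

const crashedRun: TraceEvent[] = [
  { runId: 'r1', node: 'plan', kind: 'enter' },
  { runId: 'r1', node: 'plan', kind: 'exit' },
  { runId: 'r1', node: 'search', kind: 'enter' }, // process died here
]

console.log(unfinishedNodes(crashedRun)) // [ 'search' ]
```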

Architecture, in one diagram

A small pnpm monorepo. apps/api is the control plane and graph executor; apps/inspector is the Next.js UI. State on Postgres, runs distributed through BullMQ on Redis.

[Diagram: Agent Orchestrator runtime: durable workflow state on Postgres, queued steps on BullMQ, live observability on the Inspector UI.]

apps/api/src/graph/definition.ts

Graph DSL: typed nodes, edges, budgets, fluent API.

apps/api/src/graph/executor.ts

Durable executor: walks the graph in dependency order, checkpoints after every node.

apps/api/src/graph/agents.ts

Agent kinds: pipeline, supervisor, swarm. Real dispatch into the LLM adapter and tool registry.

apps/api/src/budgets

RunBudget owns the AbortController. TokenBudget and ToolBudget enforce the caps.

apps/api/src/tools

Tool registry with Zod schemas. registerMcpTool for Model Context Protocol servers.

apps/api/src/state

StateStore contract. PostgresStore (default) and MemoryStore (tests, offline).

apps/api/src/queue

enqueueRun via BullMQ with retry policy. Degrades to in-process when Redis is absent.

apps/api/src/telemetry

OpenTelemetry Node SDK boot. Exports over OTLP/HTTP when configured.

apps/inspector

Next.js UI: run list, run detail, agent-graph visualisation.
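
The executor's "dependency order, checkpoint after every node" behaviour can be approximated with a topological sort (Kahn's algorithm) followed by one checkpoint per node. This is a sketch, not the code in executor.ts: dependencyOrder and the checkpoint record are hypothetical, conditional edges are ignored, and node execution is stubbed out.

```typescript
type Edge = readonly [string, string] // [from, to]

// Kahn's algorithm: repeatedly emit nodes whose dependencies have all run.
function dependencyOrder(nodes: readonly string[], edges: readonly Edge[]): string[] {
  const indegree = new Map<string, number>()
  const successors = new Map<string, string[]>()
  for (const n of nodes) {
    indegree.set(n, 0)
    successors.set(n, [])
  }
  for (const [from, to] of edges) {
    if (!indegree.has(from) || !indegree.has(to)) {
      throw new Error(`edge references unknown node: ${from} -> ${to}`)
    }
    successors.get(from)!.push(to)
    indegree.set(to, indegree.get(to)! + 1)
  }
  const ready = nodes.filter((n) => indegree.get(n) === 0)
  const order: string[] = []
  while (ready.length > 0) {
    const current = ready.shift()!
    order.push(current)
    for (const succ of successors.get(current)!) {
      const remaining = indegree.get(succ)! - 1
      indegree.set(succ, remaining)
      if (remaining === 0) ready.push(succ)
    }
  }
  if (order.length !== nodes.length) throw new Error('graph contains a cycle')
  return order
}

const walkOrder = dependencyOrder(
  ['summarise', 'analyse', 'search', 'plan'],
  [['plan', 'search'], ['search', 'analyse'], ['analyse', 'summarise']],
)

// Stubbed walk: a real executor would run the node, then persist the checkpoint.
const walkCheckpoints = walkOrder.map((node, step) => ({ step, node }))

console.log(walkOrder) // [ 'plan', 'search', 'analyse', 'summarise' ]
```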

Database schema

Three tables. runs is the unit of work; traces is the append-only event log; checkpoints back deterministic replay.

[Diagram: Postgres schema.] Drizzle ORM types it end to end; migrations live in the repo.

Run lifecycle

Six statuses, each persisted on the runs row. Every transition writes a trace event.

queued

Job written to BullMQ and waiting for a worker.

running

Executor is walking the graph. Each node writes enter and exit trace events.

completed

Walk finished successfully. Output is on the runs row.

failed

A node threw and the run cannot recover. Error recorded on the run and in the trace.

budget_exceeded

Token, tool, or wall-clock cap breached. AbortSignal fired; in-flight calls cancelled.

replayed

A run created via /replay: context is rehydrated from a checkpoint and the walk resumes from there.

Quick start

Five steps from clone to a replayed run. Commands taken straight from the README.

01. Clone and install
git clone https://github.com/sarmakska/agent-orchestrator.git
cd agent-orchestrator && pnpm install

02. Bring up Postgres and Redis
docker compose up -d postgres redis
cp .env.example .env   # set DATABASE_URL, REDIS_URL, SARMALINK_API_KEY

03. Migrate and run
pnpm migrate
pnpm dev   # runner + Inspector in one command

04. Trigger an example run
curl -X POST http://localhost:4000/runs \
  -H "Content-Type: application/json" \
  -d '{"graph":"triage","input":{"intent":"refund"}}'

# Open the returned run id in the Inspector at http://localhost:3000

05. Replay from a checkpoint
curl -X POST http://localhost:4000/runs/<run-id>/replay \
  -H "Content-Type: application/json" \
  -d '{"fromStep": 2}'

Graphs and tools, in real code

Snippets straight from the repo and the wiki. Every example below is runnable as part of the bundled examples.

TypeScript: Defining a graph (research swarm)
import { graph } from '../../src/graph/definition.js'

export const research = graph('research-swarm')
  .node('plan',      { agent: 'supervisor', llm: 'sarmalink' })
  .node('search',    { agent: 'pipeline',   tools: ['web_search'] })
  .node('analyse',   { agent: 'swarm',      llm: 'sarmalink', concurrency: 3 })
  .node('summarise', { agent: 'pipeline',   llm: 'sarmalink' })
  .edge('plan', 'search')
  .edge('search', 'analyse')
  .edge('analyse', 'summarise')
  .budget({
    tokens: 50000,
    tools: 100,
    wallClockSec: 300,
    perTool: { web_search: 20 },
  })

TypeScript: Authoring a tool
import { tool } from '../../src/tools/registry.js'
import { z } from 'zod'

export const stripeRefund = tool('stripe_refund', {
  description: 'Refund a Stripe charge by ID',
  schema: z.object({
    chargeId: z.string(),
    amountPence: z.number().int(),
  }),
  handler: async ({ chargeId, amountPence }) => ({
    refundId: `re_${chargeId}`,
    amountPence,
  }),
})

// Reference it by name from any node's tools array.
// Tool calls are validated against the schema and charged
// against the run's tool budget BEFORE the handler runs.

TypeScript: Conditional edges
// Edges may carry a 'when' predicate for routing.
.edge('a', 'b')                                          // unconditional
.edge('a', 'b', (ctx) => (ctx.input?.confidence ?? 0) > 0.8)  // conditional; a missing confidence counts as 0

// supervisor agents expose their decision on the run context,
// so conditional edges can branch on it.
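
To show how 'when' predicates might drive routing, here is a small sketch of choosing the next nodes from a set of edges. It is an illustration under assumptions: the RunContext fields, the EdgeDef shape, and nextNodes are hypothetical names, and the node names are invented for the example.

```typescript
// Hypothetical context shape; `decision` stands in for a supervisor's output.
interface RunContext {
  input?: { confidence?: number }
  decision?: string
}

type When = (ctx: RunContext) => boolean

interface EdgeDef {
  from: string
  to: string
  when?: When // absent means the edge is unconditional
}

// Targets of every outgoing edge whose predicate passes, in declaration order.
function nextNodes(edges: readonly EdgeDef[], from: string, ctx: RunContext): string[] {
  return edges
    .filter((e) => e.from === from && (e.when === undefined || e.when(ctx)))
    .map((e) => e.to)
}

const routingEdges: EdgeDef[] = [
  { from: 'triage', to: 'log' },
  { from: 'triage', to: 'auto_refund', when: (ctx) => (ctx.input?.confidence ?? 0) > 0.8 },
  { from: 'triage', to: 'human_review', when: (ctx) => (ctx.input?.confidence ?? 0) <= 0.8 },
]

console.log(nextNodes(routingEdges, 'triage', { input: { confidence: 0.9 } }))
// [ 'log', 'auto_refund' ]
```

Treating a missing confidence as 0 sends under-specified inputs down the cautious branch instead of failing the predicate silently.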

TypeScript: MCP tool registration
import { z } from 'zod'
import { registerMcpTool } from '../../src/tools/mcp.js'

registerMcpTool({
  name: 'mcp_lookup',
  description: 'Look up a record on the MCP server',
  schema: z.object({ id: z.string() }),
  serverUrl: 'https://mcp.internal/rpc',
})

// MCP tools share the registry, so they are charged against the
// same tool budget as built-in tools. An agent cannot dodge the
// cap by routing through MCP.

Bash: Triggering a run
curl -X POST http://localhost:4000/runs \
  -H "Content-Type: application/json" \
  -d '{"graph":"research-swarm","input":{"topic":"deterministic replay"}}'

# Returns { id: "<uuid>", status: "queued" }
# Open the run id in the Inspector to watch the graph light up.
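
The same two HTTP calls can be wrapped in a small TypeScript client. This is a sketch rather than a shipped SDK: triggerRun, replayRun, and the HttpPost type are hypothetical, the transport is injected so the example runs without a server, and the replay response is left untyped because its shape is not documented here.

```typescript
interface HttpResponse {
  ok: boolean
  status: number
  json(): Promise<unknown>
}

// Minimal transport signature; the global fetch in Node 18+ fits this shape.
type HttpPost = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>

interface RunRef {
  id: string
  status: string
}

async function postJson(post: HttpPost, url: string, payload: unknown): Promise<unknown> {
  const res = await post(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  })
  if (!res.ok) throw new Error(`POST ${url} failed with status ${res.status}`)
  return res.json()
}

// POST /runs with a graph name and input, as in the curl example above.
async function triggerRun(post: HttpPost, baseUrl: string, graph: string, input: unknown): Promise<RunRef> {
  return (await postJson(post, `${baseUrl}/runs`, { graph, input })) as RunRef
}

// POST /runs/{id}/replay with a fromStep.
function replayRun(post: HttpPost, baseUrl: string, runId: string, fromStep: number): Promise<unknown> {
  return postJson(post, `${baseUrl}/runs/${encodeURIComponent(runId)}/replay`, { fromStep })
}
```

Against a local instance, passing fetch as the first argument and http://localhost:4000 as the base URL mirrors the curl commands.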

Where it fits

The patterns this repository was built around, and the ones it deliberately is not.

Research and write workflows

A planner agent breaks a topic into questions, researcher agents fetch and summarise in a swarm, an editor stitches output. Replayable and audit-ready.

Internal ops automation

Multi-step workflows over Slack, Linear, Notion, GitHub. Tool budgets stop a runaway loop before it gets your accounts rate-limited: a hard stop, not a warning.

Data extraction pipelines

Documents in, structured records out. The orchestrator handles retries, partial failures, and replay so you do not need an idempotency layer above it.

Customer-facing agent flows

Long-running tasks where customers can pause and resume. Durable state survives restarts; replay lets support reproduce any past run for debugging.

When NOT to reach for it

Prototyping a single prompt or a one-shot chat completion. The durable state, queue, and Postgres dependency are overhead you do not need for a demo.

Not a model provider

You bring your own LLM through the adapter. SarmaLink-AI is the default; OpenAI-compatible providers slot in. The orchestrator is the workflow tier, not the model tier.

Tech stack

TypeScript · Node.js 22 · Fastify · Postgres · Drizzle ORM · Redis · BullMQ · Next.js 15 · tRPC · Zod · OpenTelemetry · Docker · pnpm

Compared to the alternatives

Two popular agent frameworks and rolling your own. Honest comparisons on the properties that matter for production.

| Feature | Agent Orchestrator | LangGraph | CrewAI | DIY |
| --- | --- | --- | --- | --- |
| Durable state across crashes | Yes, Postgres | Optional | No | You build it |
| Deterministic replay | From any checkpoint | Partial | No | You build it |
| Hard budgets that abort | Tokens / tools / wall-clock / per-tool | Manual | Soft | You build it |
| Live Inspector UI | Next.js + tRPC + SSE | LangSmith (paid) | Limited | You build it |
| MCP tools in the same budget | Yes | Partial | No | You build it |
| OpenTelemetry by default | Yes | Yes | Partial | You write it |
| Self-hostable, MIT | Yes | Yes | Yes | Yours |

Frequently asked

Eight real questions from teams running this in production.

What makes replay deterministic?

Every node writes a checkpoint after it runs. A replay rehydrates context from the latest checkpoint at or before the requested step and resumes the walk from that cut. Earlier nodes are not re-executed and their outputs are read from the checkpoint, not regenerated. The state machine that walks the graph is pure given the checkpoint, so the same fromStep produces the same continuation.

How are budgets actually enforced?

Through a single RunBudget that owns an AbortController. Every LLM call charges its token usage, and every tool call charges one unit against the tool budget before the handler runs. The wall-clock budget is checked at the start of each node. A breach triggers the AbortSignal, which cancels in-flight LLM and tool calls, throws BudgetExceededError, and lands the run in budget_exceeded.
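
A budget object along these lines might look like the sketch below. RunBudget and BudgetExceededError are named in the answer above, but the method names (chargeTokens, chargeTool, checkWallClock), the Limits shape, and the injectable clock are assumptions made for the example, and per-tool caps are left out for brevity.

```typescript
class BudgetExceededError extends Error {
  readonly kind: string
  constructor(kind: string) {
    super(`budget exceeded: ${kind}`)
    this.name = 'BudgetExceededError'
    this.kind = kind
  }
}

interface Limits {
  tokens: number
  tools: number
  wallClockSec: number
}

class RunBudget {
  private readonly controller = new AbortController()
  private readonly limits: Limits
  private readonly startedAtMs: number
  private tokensUsed = 0
  private toolsUsed = 0

  // `nowMs` is injectable so the wall-clock check can be exercised deterministically.
  constructor(limits: Limits, nowMs: number = Date.now()) {
    this.limits = limits
    this.startedAtMs = nowMs
  }

  // Hand this signal to LLM and tool calls so a breach cancels them.
  get signal(): AbortSignal {
    return this.controller.signal
  }

  // Token usage is only known after an LLM call, so it is charged afterwards.
  chargeTokens(count: number): void {
    this.tokensUsed += count
    if (this.tokensUsed > this.limits.tokens) this.breach('tokens')
  }

  // Charged before the tool handler runs; a breach throws before any side effect.
  chargeTool(): void {
    if (this.toolsUsed + 1 > this.limits.tools) this.breach('tools')
    this.toolsUsed += 1
  }

  // Intended to be called at the start of each node.
  checkWallClock(nowMs: number = Date.now()): void {
    if ((nowMs - this.startedAtMs) / 1000 > this.limits.wallClockSec) this.breach('wall-clock')
  }

  private breach(kind: string): never {
    this.controller.abort() // cancels anything listening on the signal
    throw new BudgetExceededError(kind)
  }
}

const demoBudget = new RunBudget({ tokens: 100, tools: 1, wallClockSec: 300 }, 0)
demoBudget.chargeTokens(60)
demoBudget.chargeTool()
console.log(demoBudget.signal.aborted) // false
```

A second chargeTool on demoBudget would throw BudgetExceededError and leave signal.aborted set to true.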

What is the difference between supervisor, pipeline, and swarm?

pipeline runs one LLM pass when an llm is set, then invokes each declared tool in order, useful for "do A then B then C". supervisor runs one LLM pass whose decision is exposed on the run context so conditional edges can branch on it, useful for routing nodes. swarm runs as many parallel LLM passes as its concurrency setting and merges the results, useful for parallel research, though it costs roughly that multiple of the tokens.
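
The swarm behaviour can be sketched with Promise.all. This is an approximation under assumptions: runSwarm and the LlmPass type are hypothetical, and "merge" is taken to mean collecting the outputs in pass order, which may not match the repository's merge strategy.

```typescript
// Hypothetical signature for a single LLM pass.
type LlmPass = (prompt: string, index: number) => Promise<string>

// Start `concurrency` passes at once and collect their outputs in pass order.
async function runSwarm(prompt: string, concurrency: number, pass: LlmPass): Promise<string[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError('concurrency must be a positive integer')
  }
  const passes = Array.from({ length: concurrency }, (_, i) => pass(prompt, i))
  // Promise.all rejects as soon as any pass rejects, so one failure fails the node.
  return Promise.all(passes)
}

// Stub pass standing in for a real LLM call.
const stubPass: LlmPass = async (prompt, index) => `${prompt}#${index}`

runSwarm('deterministic replay', 3, stubPass).then((outputs) => {
  console.log(outputs.length) // 3
})
```

Each pass is a full LLM call, which is why token cost scales with the concurrency value.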

Do I need Postgres and Redis to try it?

No. Leave DATABASE_URL, REDIS_URL, and SARMALINK_API_KEY unset and the API falls back to an in-memory store, runs in-process, and the LLM adapter returns deterministic offline output. This is the same path the test suite uses. For real runs you bring up Postgres and Redis via docker compose.

How do MCP tools fit in?

They register through registerMcpTool and live in the same registry as built-in tools. Because they share the registry and the same callTool path, MCP tools are charged against the same tool budget. An agent cannot dodge the per-tool or global cap by routing through MCP, and a budget breach aborts before the call leaves the box.

How does the queue retry?

BullMQ provides backoff and dead-letter queues. enqueueRun applies the configured retry policy. Failed jobs are retried with exponential backoff; persistent failures land in the DLQ. Without Redis, the orchestrator runs in-process and tests use that mode.
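
To make the retry schedule concrete, the sketch below computes the waits an exponential policy would produce. The option names (attempts, backoff.type, backoff.delay) follow BullMQ's job options, but the values are illustrative rather than the repository's configured policy, and the formula assumes BullMQ's built-in exponential strategy of 2^(attemptsMade - 1) times the base delay with no jitter.

```typescript
// Illustrative policy in the shape of BullMQ job options; values are assumptions.
const retryPolicy = {
  attempts: 5, // total tries, including the first
  backoff: { type: 'exponential' as const, delay: 1000 }, // base delay in ms
}

// Waits before each retry: 2^(attemptsMade - 1) * baseDelayMs.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = []
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs))
  }
  return delays
}

console.log(retryDelaysMs(retryPolicy.attempts, retryPolicy.backoff.delay))
// [ 1000, 2000, 4000, 8000 ]
```

Five attempts means four retries, so a job under this policy waits 15 seconds in total before its final try.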

Where do traces live?

Two places, deliberately. Trace events (enter, exit, error, replay) are rows in the traces table for after-the-fact debugging and the Inspector reads from there. OpenTelemetry spans flow over OTLP/HTTP into your existing collector for live distributed tracing. The first is your durable audit trail; the second plugs into your observability stack.

Can I run the LLM somewhere other than SarmaLink-AI?

Yes. The LLM adapter is an interface; SarmaLink is one implementation. OpenAI-compatible providers slot in, and the LLM provider key in node options selects which adapter the executor calls.

Related products

The rest of the Sarma Linux toolkit. Same opinions throughout: open source, MIT, real depth.

SarmaLink-AI

Multi-provider AI backend with sub-50ms failover across 36 engines.

MCP Server Toolkit

Production-ready Model Context Protocol server starter, with plugins.

Voice Agent Starter

Sub-second real-time voice loop with WebRTC, barge-in, and pluggable STT/TTS.

AI Eval Runner

Evals as code. Datasets, scorers, traces, regressions, all in one CLI.

Local LLM Router

OpenAI-compatible proxy that routes between local Ollama and cloud LLMs.

StaffPortal

Open-source HR + ops platform built to replace three SaaS subscriptions.

RAG-over-PDF

A minimal, production-shaped RAG starter with cited streaming answers.

Receipt Scanner

Vision-OCR receipt scanning starter with Zod-typed JSON output.

Webhook-to-Email

A tiny, production-grade webhook receiver with HMAC and React Email.

k8s-ops-toolkit

Helm chart for Next.js + bootstrap script for the full observability stack.

terraform-stack

Vercel + Supabase + Cloudflare + DigitalOcean as one Terraform repo.

slipstream

Claude Code plugin v1.0: React dashboard with code graph, cross-tab agent bus, ~95% per-read savings, 75 skills.

forge-infer

Minimal LLM inference server with paged KV-cache and speculative decoding.

shipyard

Multi-tenant SaaS starter with isolation, RBAC, billing, audit and rate limits.

lsmdb

Log-structured merge-tree storage engine in Go with WAL and MVCC snapshots.

raftkv

Raft key-value store with a fault-injection harness that proves linearizability.

sandboxd

WebAssembly sandbox for running untrusted code under strict CPU and memory limits.

Run agents like payments. Durable, replayable, audited.

Clone the repo, bring up Postgres and Redis, trigger a run, watch the Inspector light up.
