Multi-agent workflows you can replay, audit, and trust.
Most agent frameworks are demos with delusions of grandeur. They fall over the moment a tool times out.
Agent Orchestrator runs multi-agent workflows the way payment systems run payments. Workflows are typed graphs, state is durable in Postgres, every step writes a trace event, and any run can be replayed deterministically from a checkpoint. Token, tool-call, and wall-clock budgets are hard limits that actually halt execution through an AbortSignal, not soft warnings. A Next.js Inspector draws the agent graph and lets you walk any past run step by step.
Why this exists
A single agent doing one task on one happy path is a demo. A graph of agents handing work to each other across hours, with retries, partial failures, and tool budgets, is a system. The leap from one to the other is where most agent projects stall.
The bigger frameworks help you describe the graph but leave durability and replay as exercises for the reader. The smaller libraries solve the prompting layer and treat workflow concerns as out of scope. The result is that every team trying to ship an agent product writes the same Postgres journal, the same BullMQ wiring, the same retry plumbing, the same budget enforcement, the same in-house Inspector.
Agent Orchestrator is that infrastructure layer, written once. Workflows live in Postgres. Steps are events. Replay rehydrates state from a checkpoint instead of re-executing earlier steps. Tool calls are charged against the budget before their handlers run. The Inspector UI is the missing observability tier.
The four properties that matter
Orchestrators that survive a 2am incident have these four properties. This one has all four.
Durable state
A run is checkpointed after every node, so a crash resumes from the last checkpoint, not the start.
Deterministic replay
Reconstruct any run from a checkpointed step without re-executing earlier nodes.
Hard budgets
Token, tool-call, and wall-clock limits that abort the run rather than logging a warning.
Visible execution
Every step recorded and queryable, with OpenTelemetry spans for external tracing.
What is in the box
Every feature below ships in the public repository today. Clone, migrate, run.
Graph DSL in TypeScript
Typed nodes and edges with conditional transitions and per-graph budgets, built with a fluent API in apps/api/src/graph/definition.ts. The graph is whatever shape your code makes.
Durable Postgres state
Every run is a row in runs. Each node appends enter and exit events to traces and writes a checkpoint once it completes, so a crash resumes from the last checkpoint, not the start. Drizzle ORM, migrations in the repo.
Deterministic replay
POST /runs/{id}/replay with a fromStep rehydrates context from the latest checkpoint at or before that step and resumes the walk. Earlier nodes are not re-executed.
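A minimal client-side sketch of calling that endpoint, assuming the API listens on http://localhost:4000 as in the quick start. Only the path and the fromStep body field come from this page; buildReplayRequest, replayRun, and FetchLike are hypothetical names, and the non-negative integer check on fromStep is an assumption.

```typescript
// Hypothetical client helpers for POST /runs/{id}/replay.
// Only the path and the { fromStep } body are taken from the docs.

interface ReplayInit {
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Anything shaped like the global fetch (Node 18+) can be passed in.
type FetchLike = (
  url: string,
  init: ReplayInit,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

function buildReplayRequest(baseUrl: string, runId: string, fromStep: number): { url: string; init: ReplayInit } {
  // Assumed client-side sanity check; the server remains the source of truth.
  if (!Number.isInteger(fromStep) || fromStep < 0) {
    throw new RangeError(`fromStep must be a non-negative integer, got ${fromStep}`)
  }
  return {
    url: `${baseUrl}/runs/${encodeURIComponent(runId)}/replay`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ fromStep }),
    },
  }
}

async function replayRun(fetchImpl: FetchLike, baseUrl: string, runId: string, fromStep: number): Promise<unknown> {
  const { url, init } = buildReplayRequest(baseUrl, runId, fromStep)
  const res = await fetchImpl(url, init)
  if (!res.ok) throw new Error(`replay of ${runId} failed with HTTP ${res.status}`)
  return res.json()
}
```

Passing fetch in as a parameter keeps the request shape testable without a running server; in Node 18 or later the global fetch fits the FetchLike signature, as in replayRun(fetch, 'http://localhost:4000', runId, 2).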
Hard budgets, real aborts
Token, tool-call, wall-clock, and per-tool caps. A breach aborts the run through an AbortSignal and lands it in budget_exceeded, rather than logging a warning while a runaway loop continues.
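One way such a budget can be structured, as a sketch: RunBudget and BudgetExceededError are names used on this page, but the fields, method names, and the rule that a cap of N allows exactly N charges are assumptions, not the repository's code.

```typescript
// Sketch of a run budget that owns an AbortController. RunBudget and
// BudgetExceededError are named in the docs; these fields and methods are assumed.

interface BudgetLimits {
  tokens: number
  tools: number
  wallClockSec: number
  perTool?: Record<string, number>
}

class BudgetExceededError extends Error {
  readonly kind: string
  constructor(kind: string, detail: string) {
    super(`budget exceeded (${kind}): ${detail}`)
    this.name = 'BudgetExceededError'
    this.kind = kind
  }
}

class RunBudget {
  private readonly controller = new AbortController()
  private readonly limits: BudgetLimits
  private readonly startedAtMs: number
  private tokensUsed = 0
  private toolCalls = 0
  private readonly perToolCalls = new Map<string, number>()

  constructor(limits: BudgetLimits, startedAtMs: number = Date.now()) {
    this.limits = limits
    this.startedAtMs = startedAtMs
  }

  // Hand this signal to every in-flight LLM and tool call.
  get signal(): AbortSignal {
    return this.controller.signal
  }

  // Charged after each LLM response with its reported token usage.
  chargeTokens(n: number): void {
    this.tokensUsed += n
    if (this.tokensUsed > this.limits.tokens) {
      this.breach('tokens', `${this.tokensUsed} > ${this.limits.tokens}`)
    }
  }

  // Charged before a tool handler runs, so a breach happens before any side effect.
  chargeTool(name: string): void {
    this.toolCalls += 1
    const used = (this.perToolCalls.get(name) ?? 0) + 1
    this.perToolCalls.set(name, used)
    if (this.toolCalls > this.limits.tools) {
      this.breach('tools', `${this.toolCalls} > ${this.limits.tools}`)
    }
    const cap = this.limits.perTool?.[name]
    if (cap !== undefined && used > cap) {
      this.breach(`perTool:${name}`, `${used} > ${cap}`)
    }
  }

  // Checked at the start of each node.
  checkWallClock(nowMs: number = Date.now()): void {
    const elapsedSec = (nowMs - this.startedAtMs) / 1000
    if (elapsedSec > this.limits.wallClockSec) {
      this.breach('wallClock', `${elapsedSec}s > ${this.limits.wallClockSec}s`)
    }
  }

  private breach(kind: string, detail: string): never {
    const err = new BudgetExceededError(kind, detail)
    this.controller.abort(err) // fires the signal for every listener
    throw err
  }
}
```

Because breach() both aborts the controller and throws, the caller that tripped the cap stops immediately while every other call holding the signal sees it fire.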
Next.js Inspector UI
apps/inspector reads run state and draws the agent graph. Live run list, run detail view, agent-graph visualisation. Built for engineers, not for marketing screenshots.
Three agent kinds
pipeline runs one LLM pass (when an llm is set), then invokes its tools in order. supervisor runs one LLM pass whose decision routes conditional edges. swarm runs as many parallel LLM passes as its concurrency option sets and merges the results.
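A hypothetical sketch of how a supervisor's decision can drive conditional edges. The decision field, the Edge shape, and the rule that every edge whose predicate passes is followed are assumptions for illustration, not the executor's actual types.

```typescript
// Hypothetical types: field names (decision, when) are illustrative only.

interface RunContext {
  input?: Record<string, unknown>
  decision?: string // written by a supervisor node's LLM pass
}

interface Edge {
  from: string
  to: string
  when?: (ctx: RunContext) => boolean // absent means unconditional
}

// Returns the targets of every outgoing edge whose predicate passes.
function nextNodes(edges: readonly Edge[], from: string, ctx: RunContext): string[] {
  return edges
    .filter((e) => e.from === from && (e.when === undefined || e.when(ctx)))
    .map((e) => e.to)
}

const edges: Edge[] = [
  { from: 'triage', to: 'refund', when: (ctx) => ctx.decision === 'refund' },
  { from: 'triage', to: 'escalate', when: (ctx) => ctx.decision === 'escalate' },
  { from: 'refund', to: 'notify' },
]
```

With a decision of 'refund', nextNodes(edges, 'triage', ctx) yields only the refund branch; an unrecognised decision yields no successors, which a real executor would need to treat as an end or an error.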
BullMQ queue
Each run is a BullMQ job with backoff, concurrency, and dead-letter queue support. Without REDIS_URL the orchestrator degrades to in-process execution; the test suite uses that path.
OpenTelemetry spans
Every run, node, LLM call, and tool call is wrapped in an OTel span. Token counts and the MCP flag are attached as attributes. Ship spans over OTLP/HTTP into your existing collector.
Tool registry with Zod
Tools register with a Zod schema. Arguments are validated, the call is charged against the run budget before the handler runs, and a budget breach aborts before any side effect.
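A minimal sketch of that ordering (validate, charge, then run the handler), assuming a duck-typed schema with a parse method that throws on bad input, which is the shape Zod schemas expose. ToolRegistry, ToolDef, ToolBudget, and the callTool signature shown here are hypothetical; the page names a shared callTool path but not its parameters.

```typescript
// Hypothetical registry sketch. Any object with parse(input) that throws on
// bad input works as the schema; Zod schemas have that method.

interface Schema<T> {
  parse(input: unknown): T
}

interface ToolDef<T, R> {
  description: string
  schema: Schema<T>
  handler: (args: T, signal: AbortSignal) => Promise<R>
}

interface ToolBudget {
  chargeTool(name: string): void // throws when a cap is breached
  readonly signal: AbortSignal
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolDef<any, unknown>>()

  register<T, R>(name: string, def: ToolDef<T, R>): void {
    if (this.tools.has(name)) throw new Error(`tool already registered: ${name}`)
    this.tools.set(name, def)
  }

  async callTool(name: string, rawArgs: unknown, budget: ToolBudget): Promise<unknown> {
    const def = this.tools.get(name)
    if (!def) throw new Error(`unknown tool: ${name}`)
    const args = def.schema.parse(rawArgs) // 1. validate
    budget.chargeTool(name) // 2. charge; may throw and abort the run
    return def.handler(args, budget.signal) // 3. only now run side effects
  }
}
```

Because the charge sits between validation and the handler, a malformed call costs nothing and a breached budget never reaches the side effect. An MCP tool registered into the same map goes through the same callTool path and so draws on the same budget.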
First-class MCP tools
registerMcpTool wires a Model Context Protocol server into the same registry as built-in tools. They share the tool budget, so an agent cannot dodge the cap by routing through MCP.
Offline fallback
Leave DATABASE_URL, REDIS_URL, and SARMALINK_API_KEY unset and the API falls back to an in-memory store, runs in-process, and the LLM adapter returns deterministic offline output. This is the test path.
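A sketch of the fallback decision, assuming each backend is chosen independently from its own variable; the page only describes the case where all three are unset. Runtime and selectRuntime are hypothetical names.

```typescript
// Hypothetical sketch: only the three variable names come from the docs.

interface Runtime {
  store: 'postgres' | 'memory'
  queue: 'bullmq' | 'in-process'
  llm: 'sarmalink' | 'offline'
}

function selectRuntime(env: Record<string, string | undefined>): Runtime {
  // Treat empty or whitespace-only values as unset.
  const isSet = (key: string): boolean => (env[key] ?? '').trim() !== ''
  return {
    store: isSet('DATABASE_URL') ? 'postgres' : 'memory',
    queue: isSet('REDIS_URL') ? 'bullmq' : 'in-process',
    llm: isSet('SARMALINK_API_KEY') ? 'sarmalink' : 'offline',
  }
}
```

In a Node process this would be called as selectRuntime(process.env).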
Trace events plus spans
Trace events (enter, exit, error, replay) live in the traces table for after-the-fact debugging. OTel spans plug into live distributed tracing. Two complementary views, both wired by default.
Architecture, in one diagram
A small pnpm monorepo. apps/api is the control plane and graph executor; apps/inspector is the Next.js UI. State on Postgres, runs distributed through BullMQ on Redis.
apps/api/src/graph/definition.ts: Graph DSL with typed nodes, edges, budgets, and a fluent API.
apps/api/src/graph/executor.ts: Durable executor. Walks the graph in dependency order and checkpoints after every node.
apps/api/src/graph/agents.ts: Agent kinds (pipeline, supervisor, swarm), with real dispatch into the LLM adapter and tool registry.
apps/api/src/budgets: RunBudget owns the AbortController; TokenBudget and ToolBudget enforce the caps.
apps/api/src/tools: Tool registry with Zod schemas, plus registerMcpTool for Model Context Protocol servers.
apps/api/src/state: StateStore contract, with PostgresStore (default) and MemoryStore (tests, offline).
apps/api/src/queue: enqueueRun via BullMQ with a retry policy; degrades to in-process execution when Redis is absent.
apps/api/src/telemetry: OpenTelemetry Node SDK boot; exports over OTLP/HTTP when configured.
apps/inspector: Next.js UI with run list, run detail, and agent-graph visualisation.
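A simplified sketch of what "walks the graph in dependency order and checkpoints after every node" can look like, using Kahn's algorithm for the ordering. walk, topoOrder, NodeFn, and the checkpoint shape are hypothetical; conditional edges, budgets, and tracing are left out, and step numbers here are zero-based positions in the walk, which is an assumption.

```typescript
// Hypothetical stand-ins for executor.ts and the StateStore.

type NodeFn = (inputs: Record<string, unknown>) => Promise<unknown>

interface Checkpoint {
  step: number
  nodeId: string
  outputs: Record<string, unknown> // every output produced so far
}

// Kahn's algorithm: repeatedly take a node with no unfinished predecessors.
function topoOrder(nodes: string[], edges: Array<[string, string]>): string[] {
  const indegree = new Map(nodes.map((n) => [n, 0] as [string, number]))
  for (const [, to] of edges) indegree.set(to, (indegree.get(to) ?? 0) + 1)
  const ready = nodes.filter((n) => indegree.get(n) === 0)
  const order: string[] = []
  while (ready.length > 0) {
    const n = ready.shift()!
    order.push(n)
    for (const [from, to] of edges) {
      if (from !== n) continue
      const d = (indegree.get(to) ?? 0) - 1
      indegree.set(to, d)
      if (d === 0) ready.push(to)
    }
  }
  if (order.length !== nodes.length) throw new Error('graph has a cycle')
  return order
}

async function walk(
  nodes: Record<string, NodeFn>,
  edges: Array<[string, string]>,
  saveCheckpoint: (cp: Checkpoint) => Promise<void>,
): Promise<Record<string, unknown>> {
  const outputs: Record<string, unknown> = {}
  let step = 0
  for (const id of topoOrder(Object.keys(nodes), edges)) {
    const run = nodes[id]
    if (run === undefined) throw new Error(`unknown node: ${id}`)
    // Each node sees the outputs of its direct predecessors.
    const inputs: Record<string, unknown> = {}
    for (const [from, to] of edges) if (to === id) inputs[from] = outputs[from]
    outputs[id] = await run(inputs)
    // Persist before moving on, so a crash resumes after this node.
    await saveCheckpoint({ step, nodeId: id, outputs: { ...outputs } })
    step += 1
  }
  return outputs
}
```

Checkpointing the cumulative outputs after each node is what lets a resumed or replayed run skip everything before the cut.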
Database schema
Three tables. runs is the unit of work; traces is the append-only event log; checkpoints back deterministic replay.
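Hypothetical row shapes for the three tables and an in-memory StateStore, to make their roles concrete. StateStore and MemoryStore are named in the architecture list, but every method and column below is an assumption; the real schema lives in the Drizzle migrations.

```typescript
// Hypothetical shapes; column and method names are illustrative only.

type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'budget_exceeded' | 'replayed'
type TraceEvent = 'enter' | 'exit' | 'error' | 'replay'

interface RunRow { id: string; graph: string; status: RunStatus; output?: unknown; error?: string }
interface TraceRow { runId: string; seq: number; event: TraceEvent; nodeId?: string }
interface CheckpointRow { runId: string; step: number; context: unknown }

interface StateStore {
  saveRun(run: RunRow): Promise<void>
  getRun(id: string): Promise<RunRow | undefined>
  appendTrace(runId: string, event: TraceEvent, nodeId?: string): Promise<TraceRow>
  saveCheckpoint(cp: CheckpointRow): Promise<void>
  checkpointsFor(runId: string): Promise<CheckpointRow[]>
}

class MemoryStore implements StateStore {
  private readonly runs = new Map<string, RunRow>()
  private readonly traces: TraceRow[] = []
  private readonly checkpoints: CheckpointRow[] = []

  async saveRun(run: RunRow): Promise<void> {
    this.runs.set(run.id, { ...run })
  }

  async getRun(id: string): Promise<RunRow | undefined> {
    return this.runs.get(id)
  }

  // traces is append-only: rows are pushed, never edited; seq orders a run's events.
  async appendTrace(runId: string, event: TraceEvent, nodeId?: string): Promise<TraceRow> {
    const seq = this.traces.filter((t) => t.runId === runId).length
    const row: TraceRow = { runId, seq, event, nodeId }
    this.traces.push(row)
    return row
  }

  // Shallow copy only; a Postgres store would serialise context to a JSON column.
  async saveCheckpoint(cp: CheckpointRow): Promise<void> {
    this.checkpoints.push({ ...cp })
  }

  async checkpointsFor(runId: string): Promise<CheckpointRow[]> {
    return this.checkpoints.filter((c) => c.runId === runId).sort((a, b) => a.step - b.step)
  }
}
```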
Run lifecycle
Six statuses, each persisted on the runs row. Every transition writes a trace event.
queued: job written to BullMQ and waiting for a worker.
running: executor is walking the graph; each node writes enter and exit trace events.
completed: walk finished successfully; output is on the runs row.
failed: a node threw and the run cannot recover; the error is recorded on the run and in the trace.
budget_exceeded: a token, tool, or wall-clock cap was breached; the AbortSignal fired and in-flight calls were cancelled.
replayed: run created via /replay, rehydrated from a checkpoint, and resumed the walk.
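A sketch of a status change that always leaves a trace event behind. The StatusEvent shape and transition helper are hypothetical, and treating completed, failed, and budget_exceeded as terminal is an assumption drawn from the descriptions above. A Postgres-backed version would presumably write the status and the trace row in one transaction.

```typescript
// Hypothetical helper: persists the new status and records the transition.

type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'budget_exceeded' | 'replayed'

interface Run {
  id: string
  status: RunStatus
  error?: string
}

interface StatusEvent {
  runId: string
  from: RunStatus
  to: RunStatus
  at: string
  error?: string
}

// Assumption: these statuses end a run; replay creates a new run instead.
const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>(['completed', 'failed', 'budget_exceeded'])

function transition(run: Run, to: RunStatus, events: StatusEvent[], error?: string): Run {
  if (TERMINAL.has(run.status)) {
    throw new Error(`run ${run.id} is already ${run.status}`)
  }
  const event: StatusEvent = { runId: run.id, from: run.status, to, at: new Date().toISOString() }
  if (error !== undefined) event.error = error
  events.push(event) // every transition leaves a trace event
  return error !== undefined ? { ...run, status: to, error } : { ...run, status: to }
}
```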
Quick start
Five steps from clone to a replayed run. Commands taken straight from the README.
```bash
# 1. Clone and install
git clone https://github.com/sarmakska/agent-orchestrator.git
cd agent-orchestrator && pnpm install

# 2. Start Postgres and Redis, then configure the environment
docker compose up -d postgres redis
cp .env.example .env   # set DATABASE_URL, REDIS_URL, SARMALINK_API_KEY

# 3. Migrate and start everything
pnpm migrate
pnpm dev               # runner + Inspector in one command

# 4. Trigger a run
curl -X POST http://localhost:4000/runs \
  -H "Content-Type: application/json" \
  -d '{"graph":"triage","input":{"intent":"refund"}}'
# Open the returned run id in the Inspector at http://localhost:3000

# 5. Replay it from step 2
curl -X POST http://localhost:4000/runs/<run-id>/replay \
  -H "Content-Type: application/json" \
  -d '{"fromStep": 2}'
```

Graphs and tools, in real code
Snippets straight from the repo and the wiki. Every example below is runnable as part of the bundled examples.
Defining a graph (research swarm)

```typescript
import { graph } from '../../src/graph/definition.js'

export const research = graph('research-swarm')
  .node('plan', { agent: 'supervisor', llm: 'sarmalink' })
  .node('search', { agent: 'pipeline', tools: ['web_search'] })
  .node('analyse', { agent: 'swarm', llm: 'sarmalink', concurrency: 3 })
  .node('summarise', { agent: 'pipeline', llm: 'sarmalink' })
  .edge('plan', 'search')
  .edge('search', 'analyse')
  .edge('analyse', 'summarise')
  .budget({
    tokens: 50000,
    tools: 100,
    wallClockSec: 300,
    perTool: { web_search: 20 },
  })
```

Authoring a tool

```typescript
import { z } from 'zod'
import { tool } from '../../src/tools/registry.js'

export const stripeRefund = tool('stripe_refund', {
  description: 'Refund a Stripe charge by ID',
  schema: z.object({
    chargeId: z.string(),
    amountPence: z.number().int(),
  }),
  // Stub handler: returns a synthetic refund id instead of calling Stripe.
  handler: async ({ chargeId, amountPence }) => ({
    refundId: `re_${chargeId}`,
    amountPence,
  }),
})

// Reference it by name from any node's tools array.
// Tool calls are validated against the schema and charged
// against the run's tool budget BEFORE the handler runs.
```

Conditional edges

```typescript
import { graph } from '../../src/graph/definition.js'
// Edges may carry a 'when' predicate for routing; .edge('a', 'b') with no predicate is unconditional.
export const routed = graph('routed')
  .node('a', { agent: 'supervisor', llm: 'sarmalink' })
  .node('b', { agent: 'pipeline', llm: 'sarmalink' })
  .node('c', { agent: 'pipeline', llm: 'sarmalink' })
  .edge('a', 'b', (ctx) => (ctx.input?.confidence ?? 0) > 0.8)
  .edge('a', 'c', (ctx) => (ctx.input?.confidence ?? 0) <= 0.8)
// supervisor agents expose their decision on the run context, so conditional edges can branch on it.
```

MCP tool registration

```typescript
import { z } from 'zod'
import { registerMcpTool } from '../../src/tools/mcp.js'

registerMcpTool({
  name: 'mcp_lookup',
  description: 'Look up a record on the MCP server',
  schema: z.object({ id: z.string() }),
  serverUrl: 'https://mcp.internal/rpc',
})

// MCP tools share the registry, so they are charged against the
// same tool budget as built-in tools. An agent cannot dodge the
// cap by routing through MCP.
```

Triggering a run

```bash
curl -X POST http://localhost:4000/runs \
  -H "Content-Type: application/json" \
  -d '{"graph":"research-swarm","input":{"topic":"deterministic replay"}}'
# Returns {"id":"<uuid>","status":"queued"}
# Open the run id in the Inspector to watch the graph light up.
```

Where it fits
The patterns this repository was built around, and the ones it deliberately is not.
Research and write workflows
A planner agent breaks a topic into questions, researcher agents fetch and summarise in a swarm, an editor stitches output. Replayable and audit-ready.
Internal ops automation
Multi-step workflows over Slack, Linear, Notion, GitHub. Tool budgets stop a runaway loop from rate-limiting your accounts, hard, not as a warning.
Data extraction pipelines
Documents in, structured records out. The orchestrator handles retries, partial failures, and replay, so you do not have to build that layer yourself.
Customer-facing agent flows
Long-running tasks where customers can pause and resume. Durable state survives restarts; replay lets support reproduce any past run for debugging.
When NOT to reach for it
Prototyping a single prompt or a one-shot chat completion. The durable state, queue, and Postgres dependency are overhead you do not need for a demo.
Not a model provider
You bring your own LLM through the adapter. SarmaLink-AI is the default; OpenAI-compatible providers slot in. The orchestrator is the workflow tier, not the model tier.
Compared to the alternatives
Two popular agent frameworks and rolling your own. Honest comparisons on the properties that matter for production.
| Feature | Agent Orchestrator | LangGraph | CrewAI | DIY |
|---|---|---|---|---|
| Durable state across crashes | Yes, Postgres | Optional | No | You build it |
| Deterministic replay | From any checkpoint | Partial | No | You build it |
| Hard budgets that abort | Tokens / tools / wall-clock / per-tool | Manual | Soft | You build it |
| Live Inspector UI | Next.js + tRPC + SSE | LangSmith (paid) | Limited | You build it |
| MCP tools in the same budget | Yes | Partial | No | You build it |
| OpenTelemetry by default | Yes | Yes | Partial | You write it |
| Self-hostable, MIT | Yes | Yes | Yes | Yours |
Documentation, all in the wiki
Focused pages with no homepage marketing in between. Each one answers a single operational question.
Frequently asked
Eight real questions from teams running this in production.
What makes replay deterministic?
Every node writes a checkpoint after it runs. A replay rehydrates context from the latest checkpoint at or before the requested step and resumes the walk from that cut. Earlier nodes are not re-executed and their outputs are read from the checkpoint, not regenerated. The state machine that walks the graph is pure given the checkpoint, so the same fromStep produces the same continuation.
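A sketch of choosing the replay cut, assuming step numbers are zero-based positions in the walk and that the checkpoint for step s is written after node s completes. replayCut, stepsToRun, and the Checkpoint shape are hypothetical; only the "latest checkpoint at or before the requested step" rule comes from the answer above. If no checkpoint qualifies, this sketch starts from the first node, which is also an assumption.

```typescript
// Hypothetical helpers for locating the replay cut.

interface Checkpoint {
  step: number
  context: Record<string, unknown> // outputs of every node up to and including step
}

// Latest checkpoint whose step is at or before fromStep.
function replayCut(checkpoints: readonly Checkpoint[], fromStep: number): Checkpoint | undefined {
  let best: Checkpoint | undefined
  for (const cp of checkpoints) {
    if (cp.step <= fromStep && (best === undefined || cp.step > best.step)) best = cp
  }
  return best
}

// Only nodes after the cut execute again; earlier outputs come from cut.context.
function stepsToRun(order: readonly string[], cut: Checkpoint | undefined): string[] {
  return order.slice(cut === undefined ? 0 : cut.step + 1)
}
```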
How are budgets actually enforced?
Through a single RunBudget that owns an AbortController. Every LLM call charges its token usage, and every tool call charges one unit against the tool budget before its handler runs. The wall-clock budget is checked at the start of each node. On a breach, RunBudget aborts the controller: the AbortSignal fires and cancels in-flight LLM and tool calls, BudgetExceededError is thrown, and the run lands in budget_exceeded.
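A sketch of how an in-flight call can honour the run's AbortSignal. abortable is a hypothetical helper; AbortController, abort(reason), signal.reason, and addEventListener are standard in Node 17.2 and later.

```typescript
// Hypothetical helper: settle with the abort reason as soon as the run's signal fires.
function abortable<T>(work: Promise<T>, signal: AbortSignal): Promise<T> {
  if (signal.aborted) return Promise.reject(signal.reason)
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(signal.reason)
    signal.addEventListener('abort', onAbort, { once: true })
    work.then(
      (value) => {
        signal.removeEventListener('abort', onAbort)
        resolve(value)
      },
      (err) => {
        signal.removeEventListener('abort', onAbort)
        reject(err)
      },
    )
  })
}
```

Racing against the signal only stops the run from waiting. To actually cancel the underlying HTTP request, the same signal also has to be handed to the client, for example through fetch's signal option.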
What is the difference between supervisor, pipeline, and swarm?
pipeline runs one LLM pass when an llm is set, then invokes each declared tool in order, useful for "do A then B then C". supervisor runs one LLM pass whose decision is exposed on the run context so conditional edges can branch on it, useful for routing nodes. swarm runs as many parallel LLM passes as its concurrency option sets and merges the results, useful for parallel research, though it costs roughly concurrency times the tokens of a single pass.
Do I need Postgres and Redis to try it?
No. Leave DATABASE_URL, REDIS_URL, and SARMALINK_API_KEY unset and the API falls back to an in-memory store, runs in-process, and the LLM adapter returns deterministic offline output. This is the same path the test suite uses. For real runs you bring up Postgres and Redis via docker compose.
How do MCP tools fit in?
They register through registerMcpTool and live in the same registry as built-in tools. Because they share the registry and the same callTool path, MCP tools are charged against the same tool budget. An agent cannot dodge the per-tool or global cap by routing through MCP, and a budget breach aborts before the call leaves the box.
How does the queue retry?
BullMQ provides backoff and dead-letter queues. enqueueRun applies the configured retry policy. Failed jobs are retried with exponential backoff; persistent failures land in the DLQ. Without Redis, the orchestrator runs in-process and tests use that mode.
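An illustration of an exponential retry schedule with a dead-letter cut-off. RetryPolicy and decideRetry are hypothetical names, and the doubling-with-cap formula is a common choice, not necessarily what enqueueRun or BullMQ compute internally. In BullMQ itself, retries are normally configured through the attempts and backoff job options.

```typescript
// Hypothetical retry policy sketch.

interface RetryPolicy {
  attempts: number // total tries, including the first
  baseDelayMs: number
  maxDelayMs: number
}

type RetryDecision = { action: 'retry'; delayMs: number } | { action: 'dead-letter' }

// attemptsMade counts tries that have already failed (1 after the first failure).
function decideRetry(attemptsMade: number, policy: RetryPolicy): RetryDecision {
  if (attemptsMade >= policy.attempts) return { action: 'dead-letter' }
  // Double the delay after each failure, capped so waits stay bounded.
  const delayMs = Math.min(policy.baseDelayMs * 2 ** (attemptsMade - 1), policy.maxDelayMs)
  return { action: 'retry', delayMs }
}
```

With attempts: 4, baseDelayMs: 1000, and maxDelayMs: 3000, the waits after the first three failures are 1000, 2000, and 3000 ms, and the fourth failure goes to the dead-letter queue.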
Where do traces live?
Two places, deliberately. Trace events (enter, exit, error, replay) are rows in the traces table for after-the-fact debugging and the Inspector reads from there. OpenTelemetry spans flow over OTLP/HTTP into your existing collector for live distributed tracing. The first is your durable audit trail; the second plugs into your observability stack.
Can I run the LLM somewhere other than SarmaLink-AI?
Yes. The LLM adapter is an interface; SarmaLink is one implementation. OpenAI-compatible providers slot in, and the llm key in a node's options (for example llm: 'sarmalink') selects which adapter the executor calls.
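A sketch of that adapter seam: one interface, one implementation per provider, selected by the node's llm key. LlmAdapter, registerAdapter, adapterFor, and the offline adapter (including its rough characters-divided-by-four token count) are hypothetical.

```typescript
// Hypothetical adapter seam; not the repository's interface.

interface LlmRequest {
  prompt: string
  signal?: AbortSignal
}

interface LlmResponse {
  text: string
  tokens: number
}

interface LlmAdapter {
  complete(req: LlmRequest): Promise<LlmResponse>
}

// Deterministic stand-in in the spirit of the offline fallback: same prompt, same output.
const offlineAdapter: LlmAdapter = {
  async complete({ prompt }) {
    return { text: `offline:${prompt.length}`, tokens: Math.ceil(prompt.length / 4) }
  },
}

const adapters = new Map<string, LlmAdapter>([['offline', offlineAdapter]])

function registerAdapter(key: string, adapter: LlmAdapter): void {
  adapters.set(key, adapter)
}

// The node's llm key picks the adapter; nodes without one run no LLM pass.
function adapterFor(node: { llm?: string }): LlmAdapter {
  if (node.llm === undefined) throw new Error('node has no llm set')
  const adapter = adapters.get(node.llm)
  if (!adapter) throw new Error(`no LLM adapter registered for "${node.llm}"`)
  return adapter
}
```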
Related products
The rest of the Sarma Linux toolkit. Same opinions throughout: open source, MIT, real depth.
Run agents like payments. Durable, replayable, audited.
Clone the repo, bring up Postgres and Redis, trigger a run, watch the Inspector light up.