Project case studies

What I have shipped, written down honestly

Nineteen open-source projects under github.com/sarmakska. Each case study below frames the problem I was actually solving, the architecture I picked, the trade-offs I accepted, and the line back to the repo and the product page. All of it is built on weekends and evenings while I hold a PAYE engineering role until February 2030.

18 project case studies
8 with whitepapers
12 with Mermaid architecture diagrams
MIT license across the board
AI infra · Distributed systems · Agent tooling · Platform plumbing
AI infra

SarmaLink-AI

sarmakska/Sarmalink-ai
Problem

One LLM provider going down should not break my apps, but vendor-locked SDKs make that failure mode the default. I needed an OpenAI-compatible gateway that fans a request out across many providers and picks the next live one when the current engine returns a 5xx.

Approach

Multi-provider gateway with health-checked failover across 14 engines. OpenAI-shaped request and response. Plugin auto-router keyed on intent (research, voice, eval, RAG, OCR) so the same gateway can dispatch into the rest of the open-source toolchain.
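The failover idea can be sketched in a few lines. This is an illustration in Python, not the repo's TypeScript code: `Provider`, `ProviderError` and `complete` are hypothetical names, and the health flag is a simplification of whatever health checking the gateway really does.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider call; carries the HTTP-style status code."""

    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status


@dataclass
class Provider:
    name: str                      # hypothetical engine label
    call: Callable[[dict], dict]   # OpenAI-shaped request in, response out
    healthy: bool = True           # flipped by the failover loop below


def complete(providers: list[Provider], request: dict) -> dict:
    """Try each healthy provider in order; move on when one answers 5xx."""
    for provider in providers:
        if not provider.healthy:
            continue
        try:
            response = provider.call(request)
        except ProviderError as err:
            if 500 <= err.status < 600:
                provider.healthy = False   # skipped until a health check restores it
                continue
            raise                          # a 4xx is the caller's problem, no retry
        return {"provider": provider.name, **response}
    raise RuntimeError("no live provider left")
```

Because every provider takes and returns the same OpenAI-shaped dict, the caller never learns which engine answered unless it inspects the `provider` field.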

14 engine failover · OpenAI-compatible · Plugin auto-router · TypeScript
Agent tooling

slipstream

sarmakska/slipstream
Problem

Coding agents die two ways. Whole-file reads burn the context window. The session ends and every durable decision evaporates. I wanted scoped reads, durable memory, and a window into the session that does not phone home.

Approach

Bundled MCP server with sp_map, sp_symbol, sp_lines, sp_search. Markdown memory store plus a PreCompact hook that writes a structured digest the instant before the window is trimmed. Local 127.0.0.1 dashboard that observes, never drives.

9 MCP tools · 88 tests · 127.0.0.1 only · MIT
Agent tooling

Agent Orchestrator

sarmakska/agent-orchestrator
Problem

Multi-agent workflows that crash halfway through should resume from the last good step, not from zero. Most agent frameworks treat durability as an afterthought, so a transient network blip rewinds an hour of work.

Approach

Durable execution on Postgres via Drizzle, queued through BullMQ, with deterministic replay so any step can be rerun against the same inputs. Inspector UI for watching the workflow graph live.
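One common way to get resume-from-the-last-good-step behaviour is to record each step's result before moving on and reuse it on restart. The sketch below shows that idea only. It uses Python's built-in sqlite3 as a stand-in for Postgres, and `DurableRun`, the `steps` table and its columns are assumptions for illustration, not the project's schema or API.

```python
import json
import sqlite3
from typing import Any, Callable


class DurableRun:
    """Memoises each step's result so a restarted run skips finished work."""

    def __init__(self, db: sqlite3.Connection, run_id: str):
        self.db = db
        self.run_id = run_id
        db.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            "run_id TEXT, name TEXT, result TEXT, PRIMARY KEY (run_id, name))"
        )

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        row = self.db.execute(
            "SELECT result FROM steps WHERE run_id = ? AND name = ?",
            (self.run_id, name),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])      # replay: reuse the recorded result
        result = fn()                      # first execution: do the real work
        self.db.execute(
            "INSERT INTO steps (run_id, name, result) VALUES (?, ?, ?)",
            (self.run_id, name, json.dumps(result)),
        )
        self.db.commit()                   # the step is durable only after commit
        return result
```

A crash between `fn()` and the commit means the step runs again on restart, so in this sketch steps need to be idempotent or tolerate a repeat.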

Postgres + Drizzle · BullMQ · Deterministic replay · TypeScript
AI infra

Voice Agent Starter

sarmakska/voice-agent-starter
Problem

A real-time voice loop is a system of latency budgets. Capture, transcribe, infer, synthesise, play back. Cross one budget and the conversation stops feeling alive. Most starters hide where the time actually goes.

Approach

pnpm workspace with mediasoup for the media path, Fastify on the server, and a Next.js client. The round trip is instrumented end to end so the slow stage is always visible. Tuned to a sub-second turn.
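Making the slow stage visible mostly comes down to timing every stage of a turn under one clock. The following is a minimal Python sketch of that instrumentation, assuming a hypothetical `TurnTimer` helper; the stage names are up to the caller and nothing here reflects the starter's real code.

```python
import time
from contextlib import contextmanager


class TurnTimer:
    """Collects wall-clock time per stage of one conversational turn."""

    def __init__(self):
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so re-entering a stage adds to its running total.
            elapsed = time.perf_counter() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    def total(self) -> float:
        """Seconds spent across all recorded stages."""
        return sum(self.stages.values())

    def slowest(self) -> str:
        """Name of the stage that consumed the most time."""
        return max(self.stages, key=self.stages.get)
```

Wrapping each of capture, transcribe, infer, synthesise and play back in `with timer.stage(...)` gives a per-turn breakdown that can be compared against a sub-second budget.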

mediasoup · Fastify · Sub-second turn · pnpm workspace
AI infra

AI Eval Runner

sarmakska/ai-eval-runner
Problem

Evals as a Notion checklist drift the moment the model changes. They need to be code, runnable on a cron, comparable across runs. Otherwise the next regression lands in production unnoticed.

Approach

Python 3.12 with uv and Typer. DuckDB for run history so deltas across runs are a single query. FastAPI plus HTMX viewer renders the diff between runs without a build step.
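To show what "deltas across runs are a single query" can look like, here is a sketch of one such query. The `results` table, its columns and the `regressions` helper are assumptions for illustration. The example runs on Python's built-in sqlite3 purely as a stand-in, so it has no third-party dependency; the SQL is a plain self-join.

```python
import sqlite3

# Pair every case in the current run with the same case in the previous run.
DELTA_SQL = """
SELECT cur.case_id,
       prev.score AS score_prev,
       cur.score AS score_cur,
       cur.score - prev.score AS delta
FROM results AS cur
JOIN results AS prev ON prev.case_id = cur.case_id
WHERE prev.run_id = ? AND cur.run_id = ?
ORDER BY delta
"""


def regressions(db: sqlite3.Connection, prev_run: str, cur_run: str) -> list[tuple]:
    """Return (case_id, score_prev, score_cur, delta) rows whose score dropped."""
    rows = db.execute(DELTA_SQL, (prev_run, cur_run)).fetchall()
    return [row for row in rows if row[3] < 0]
```

Run on a cron, a non-empty result from `regressions` is the signal to fail the job before the regression reaches production.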

Python 3.12 · DuckDB · HTMX viewer · uv
AI infra

RAG over PDF

sarmakska/rag-over-pdf
Problem

PDF retrieval starters tend to be one of two things: an unreadable framework wrapper, or a notebook with the chunking strategy hidden under a dependency. I wanted a minimal one I could fork and reason about in an afternoon.

Approach

Plain Python, explicit chunking with overlap, embeddings to a local vector store, retrieval with rerank. Every step is one file. No framework leakage.
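Explicit chunking with overlap fits in one small function. This is a sketch under assumptions: it splits on characters rather than tokens, and the `chunk_text` name and default sizes are illustrative, not taken from the repo.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` chars, each sharing `overlap` chars
    with the previous window so a sentence cut at a boundary survives whole
    in at least one chunk."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break                      # the last window already reached the end
    return chunks
```

For example, `chunk_text("abcdefghij", size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`: each chunk starts with the last character of the one before it.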

Single-file steps · Local vector store · Rerank pass · MIT
AI infra

Receipt Scanner

sarmakska/receipt-scanner
Problem

Vision OCR for receipts works in the demo and falls over on the messy real ones. Most starters skip the structured-output discipline that makes the result usable downstream.

Approach

A clean vision pipeline that returns strict JSON for line items, totals, tax. Retries on schema mismatch. Test fixtures with crumpled, partial and bilingual receipts so regressions surface fast.
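The retry-on-schema-mismatch loop can be sketched as below. It is written in Python for brevity and is not the repo's TypeScript code. The field names in `REQUIRED`, the `validate` and `extract` helpers, and the idea of feeding the rejection reason back into the prompt are all assumptions for illustration; `model` stands for any callable that takes a prompt and returns raw text.

```python
import json
from typing import Callable

# Hypothetical minimal schema: field name -> accepted Python type(s).
REQUIRED = {"line_items": list, "total": (int, float), "tax": (int, float)}


def validate(payload: object) -> dict:
    """Raise ValueError unless payload has every required field and type."""
    if not isinstance(payload, dict):
        raise ValueError("top level must be an object")
    for key, kind in REQUIRED.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        value = payload[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, kind):
            raise ValueError(f"wrong type for field: {key}")
    return payload


def extract(model: Callable[[str], str], prompt: str, attempts: int = 3) -> dict:
    """Call the model until its reply parses and validates, or give up."""
    last_error = None
    for _ in range(attempts):
        raw = model(prompt)
        try:
            return validate(json.loads(raw))
        except ValueError as err:      # json.JSONDecodeError is a ValueError
            last_error = err
            prompt = f"{prompt}\nPrevious reply was rejected: {err}. Return valid JSON only."
    raise RuntimeError(f"no valid receipt after {attempts} attempts") from last_error
```

Downstream code then only ever sees a dict that passed `validate`, which is the property that makes the output usable.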

Strict JSON out · Schema retries · Messy fixtures · TypeScript
Platform plumbing

Webhook to Email

sarmakska/webhook-to-email
Problem

Every project ends up needing the same tiny receiver that turns an HMAC-signed webhook into a transactional email. Writing it from scratch each time is how secret leaks happen.

Approach

Constant-time HMAC verification on the way in, idempotency keyed on the provider event id, Resend on the way out. One small repo to fork, one tested receiver to deploy.
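The two safety properties, constant-time verification and idempotency on the event id, look roughly like this in Python. It is a sketch under assumptions: HMAC-SHA256 with a hex signature is assumed (providers differ), `Receiver` and `send_email` are hypothetical names, and the in-memory set stands in for durable storage.

```python
import hashlib
import hmac
from typing import Callable


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC and compare without leaking timing information."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class Receiver:
    def __init__(self, secret: bytes, send_email: Callable[[bytes], None]):
        self.secret = secret
        self.send_email = send_email       # e.g. a thin wrapper over a mail API
        self.seen: set[str] = set()        # in-memory only; not durable

    def handle(self, event_id: str, body: bytes, signature_hex: str) -> int:
        """Return an HTTP-style status code for one webhook delivery."""
        if not verify_signature(self.secret, body, signature_hex):
            return 401
        if event_id in self.seen:
            return 200                     # duplicate delivery: ack, send nothing
        self.send_email(body)
        self.seen.add(event_id)            # mark only after the send succeeded
        return 200
```

Using `hmac.compare_digest` instead of `==` matters because an ordinary string comparison returns early on the first mismatched byte, which can leak how much of a forged signature was correct.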

HMAC verified · Idempotent · Resend · 1 file path
Agent tooling

MCP Server Toolkit

sarmakska/mcp-server-toolkit
Problem

Production-grade MCP servers need more than a "hello world". Auth, structured logging, request validation, a deploy story. Most templates stop short and leave the operator to invent the rest.

Approach

Python plus FastAPI starter with structured logs, request validation, an auth seam and a Dockerfile that runs. Designed to be forked for a specific tool surface without reinventing the plumbing.

Python + FastAPI · Structured logs · Auth seam · Docker ready
AI infra

Local LLM Router

sarmakska/local-llm-router
Problem

Local LLM via Ollama is great until the prompt needs a frontier model. Switching code paths between the two breaks the developer loop and ends with two diverging clients.

Approach

OpenAI-compatible proxy that routes per request to Ollama or a cloud provider based on a simple rule set. The client code never changes, only the routing config does.
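A rule set of this kind can be as small as an ordered list where the first match wins. The sketch below is Python for brevity, not the repo's TypeScript; the `Rule` fields (`model_prefix`, `min_prompt_chars`) and the target labels are hypothetical examples of what a rule might key on, and message content is assumed to be a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    target: str                 # where to send the request, e.g. "ollama" or "cloud"
    model_prefix: str = ""      # match requests whose model name starts with this
    min_prompt_chars: int = 0   # match only prompts at least this long

    def matches(self, request: dict) -> bool:
        prompt_chars = sum(
            len(message.get("content", "")) for message in request.get("messages", [])
        )
        return (
            request.get("model", "").startswith(self.model_prefix)
            and prompt_chars >= self.min_prompt_chars
        )


def route(rules: list[Rule], request: dict, default: str = "ollama") -> str:
    """Return the target of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(request):
            return rule.target
    return default
```

Since the decision depends only on the request and the rule list, changing where traffic goes means editing the rules, not the client.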

OpenAI shape · Ollama + cloud · Rule-based routing · TypeScript
AI infra

forge-infer

sarmakska/forge-infer
Problem

I wanted to understand inference servers from the bottom up. Paged KV-cache, continuous batching, speculative decoding. Reading vLLM source is one way. Implementing a minimal version is the other.

Approach

Minimal Python LLM inference server with a paged KV-cache, continuous batching of requests in flight, and a speculative decoding path. Written to be read, not to win throughput crowns.
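The core of a paged KV-cache is bookkeeping: each sequence owns a list of fixed-size physical blocks and only claims a new one when its last block fills up. The sketch below shows that block table in isolation. `PagedKVCache` and its methods are illustrative names rather than forge-infer's API, and no tensors are stored.

```python
class PagedKVCache:
    """Block-table bookkeeping only; no tensors are stored in this sketch."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))        # unused physical block ids
        self.tables: dict[str, list[int]] = {}     # sequence id -> its blocks
        self.lengths: dict[str, int] = {}          # sequence id -> tokens stored

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one more token; returns (physical block, offset)."""
        length = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if length % self.block_size == 0:          # last block is full, or none yet
            if not self.free:
                raise MemoryError("no free KV blocks; caller must preempt or wait")
            table.append(self.free.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are claimed one at a time, a sequence wastes at most one partly filled block, and a finished sequence frees memory for others immediately. That is what lets continuous batching admit new requests mid-flight.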

Paged KV-cache · Continuous batching · Speculative decode · Python
Storage

lsmdb

sarmakska/lsmdb
Problem

Most database internals posts wave at LSM trees and skip the corner cases. The compaction policy is where the engine earns its keep. I wanted a clean implementation that exercised the hard parts.

Approach

Log-structured merge-tree engine in Go. Write-ahead log, immutable SSTables, bloom filters on reads, MVCC snapshots for consistent ranges. Compaction policy is its own readable module.
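A bloom filter on the read path lets the engine skip an SSTable that definitely does not hold a key. Here is a minimal sketch of the structure, in Python rather than the repo's Go. The sizes, the SHA-256-based double hashing and the `BloomFilter` API are assumptions chosen for readability, not lsmdb's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Answers 'definitely absent' or 'possibly present' for a key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # force odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means the key was never added; True may be a false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

A read checks `might_contain` before touching the table on disk: a `False` skips the I/O entirely, while a `True` still requires the real lookup.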

Go · WAL + SSTables · Bloom filters · MVCC snapshots
Consensus

raftkv

sarmakska/raftkv
Problem

Claiming a Raft implementation is correct is cheap. Proving it stays linearizable under partition, lost messages, leader churn and clock skew is the actual bar. Most implementations skip the harness.

Approach

Raft-backed key-value store in Go with a fault-injection harness that drives partitions, message loss and leader churn, then replays the trace against a linearizability checker.

Go · Fault injection · Linearizability check · Replay trace
Sandboxing

sandboxd

sarmakska/sandboxd
Problem

Running untrusted code from an LLM tool call without a hardened sandbox is how production becomes a postmortem. Off-the-shelf runtimes leak ambient authority by default.

Approach

WebAssembly sandbox in Rust with a deny-by-default host ABI. Strict CPU, wall-clock and memory limits enforced at the runtime boundary. Every capability is opt-in and audited.

Rust · Deny-by-default ABI · CPU + wall-clock limits · WASM
SaaS scaffold

shipyard

sarmakska/shipyard
Problem

Multi-tenant SaaS scaffolds either skip tenant isolation or bolt it on after the fact. Both routes end in a cross-tenant leak. The scaffold has to bake isolation in from the first row.

Approach

TypeScript starter with row-level tenant isolation, role-based access control, metered billing, an append-only audit log and rate limits per tenant. The defaults are the safe ones.
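Per-tenant rate limits are often built as one token bucket per tenant, so a noisy tenant exhausts only its own budget. The sketch below shows that pattern in Python for brevity. It is not the scaffold's TypeScript code: `TenantRateLimiter`, its parameters and the in-memory dict are assumptions, and the token bucket itself is one possible algorithm, not necessarily the one shipyard uses.

```python
import time
from typing import Callable


class TenantRateLimiter:
    """One token bucket per tenant, refilled continuously at a fixed rate."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock                               # injectable for tests
        self.buckets: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last seen)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(tenant_id, (float(self.burst), now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[tenant_id] = (tokens, now)
            return False                                 # over the limit, reject
        self.buckets[tenant_id] = (tokens - 1, now)
        return True
```

Keying the bucket on the tenant id rather than the user or IP is what keeps the limit aligned with the isolation boundary.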

TypeScript · RLS isolation · RBAC + audit log · Per-tenant limits
Platform plumbing

k8s-ops-toolkit

sarmakska/k8s-ops-toolkit
Problem

A fresh Kubernetes cluster needs the same five things before any app lands. Ingress, certs, metrics, logs, a sensible Helm pattern for a Next.js app. Wiring that by hand each time burns a day.

Approach

Opinionated Helm chart for a Next.js app plus an observability bootstrap: ingress-nginx, cert-manager, kube-prometheus-stack and Loki. Apply once, then ship features.

Helm chart · ingress-nginx + cert-manager · Prometheus + Loki · Whitepaper
Platform plumbing

terraform-stack

sarmakska/terraform-stack
Problem

Vercel, Supabase, Cloudflare and DigitalOcean are the stack I actually ship on. Their Terraform stories live in four separate repos with inconsistent variable shapes. Reconciling that on every project is wasted motion.

Approach

One Terraform repository with first-class modules for Vercel, Supabase, Cloudflare and DigitalOcean. Shared variable conventions so the stacks compose. Includes the whitepaper on the trade-offs.

Terraform · Vercel + Supabase + Cloudflare + DO · Composable modules · Whitepaper
Platform plumbing

staff-portal

sarmakska/staff-portal
Problem

Small teams need an HR and ops portal that does not require an enterprise contract. Attendance, leave, expenses, timesheets and a kiosk sign-in path, integrated, not glued together with spreadsheets.

Approach

Next.js portal with Supabase auth and Postgres, scheduled jobs for digests and reminders, kiosk mode for physical sign-in, and a small analytics surface for managers. Deployed on Vercel.

Next.js + Supabase · Kiosk mode · Scheduled digests · Vercel
Still shipping

All nineteen, in one place

The full directory of open-source projects sits at /open-source, grouped by category, every repo linked with its README, wiki and whitepaper. If you are reading this because you might want to work with me on something like these inside a permanent role, the /hire-me page covers what that looks like (PAYE only, available from February 2030).