What I have shipped, written down honestly
Nineteen open-source projects under github.com/sarmakska. Each case study below frames the problem I was actually solving, the architecture I picked, the trade-offs I accepted, and the line back to the repo and the product page. Built on weekends and evenings while I hold a PAYE engineering role until February 2030.
SarmaLink-AI
One LLM provider going down should not break my apps. Vendor-locked SDKs make that the default. I needed an OpenAI-compatible gateway that fans a request out across many providers and picks the next live one when the current engine 5xxs.
Multi-provider gateway with health-checked failover across 14 engines. OpenAI-shaped request and response. Plugin auto-router keyed on intent (research, voice, eval, RAG, OCR) so the same gateway can dispatch into the rest of the open-source toolchain.
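The failover idea can be sketched in a few lines. This is an illustration under assumptions, not the gateway's code: ProviderError, complete_with_failover and the two stub providers are hypothetical names, and the sketch fails over on an error at call time rather than on a background health check.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error a provider call raises for a 5xx-style failure."""


Provider = tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, reply) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember the failure and move on to the next engine.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    raise ProviderError("503 from upstream")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


result = complete_with_failover("hi", [("primary", flaky), ("backup", healthy)])
```

The caller sees one request and one response shape; which engine answered is just a field it can log.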
slipstream
Coding agents die two ways. Whole-file reads burn the context window. The session ends and every durable decision evaporates. I wanted scoped reads, durable memory, and a window into the session that does not phone home.
Bundled MCP server with sp_map, sp_symbol, sp_lines, sp_search. Markdown memory store plus a PreCompact hook that writes a structured digest the instant before the window is trimmed. Local 127.0.0.1 dashboard that observes, never drives.
Agent Orchestrator
Multi-agent workflows that crash halfway through should resume from the last good step, not from zero. Most agent frameworks treat durability as an afterthought, so a transient network blip rewinds an hour of work.
Durable execution on Postgres via Drizzle, queued through BullMQ, with deterministic replay so any step can be rerun against the same inputs. Inspector UI for watching the workflow graph live.
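Resume-from-last-good-step reduces to recording each step's result under a stable key and returning the recorded value on a rerun. The sketch below assumes an in-memory dict as a stand-in for the Postgres table, and StepLog, run_step and the demo workflow are hypothetical names, not the project's API.

```python
import json
from typing import Any, Callable


class StepLog:
    """Stores each completed step's result so a rerun can skip it."""

    def __init__(self) -> None:
        # Stand-in for a durable table keyed on (workflow_id, step_name).
        self.rows: dict[tuple[str, str], str] = {}

    def run_step(self, workflow_id: str, step_name: str, fn: Callable[..., Any], *args: Any) -> Any:
        key = (workflow_id, step_name)
        if key in self.rows:
            # Replay: return the recorded result instead of re-executing.
            return json.loads(self.rows[key])
        result = fn(*args)
        self.rows[key] = json.dumps(result)
        return result


calls: list[str] = []


def fetch() -> dict:
    calls.append("fetch")
    return {"items": [1, 2, 3]}


def total(data: dict, fail: bool) -> int:
    if fail:
        raise ConnectionError("transient network blip")
    calls.append("total")
    return sum(data["items"])


def workflow(log: StepLog, fail: bool) -> int:
    data = log.run_step("wf-1", "fetch", fetch)
    return log.run_step("wf-1", "total", total, data, fail)


log = StepLog()
try:
    workflow(log, fail=True)  # crashes on the second step
except ConnectionError:
    pass
answer = workflow(log, fail=False)  # resumes; fetch is not executed again
```

Only successful steps are recorded, so the failed step runs again on resume while everything before it is served from the log.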
Voice Agent Starter
A real-time voice loop is a system of latency budgets. Capture, transcribe, infer, synthesise, play back. Cross one budget and the conversation stops feeling alive. Most starters hide where the time actually goes.
pnpm workspace with mediasoup for the media path, Fastify on the server, and a Next.js client. The round trip is instrumented end to end so the slow stage is always visible. Tuned to a sub-second turn.
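Making the slow stage visible mostly means timing every stage of a turn under a name. The sketch below is a hypothetical Python illustration of that idea (the starter itself is a pnpm workspace); TurnTimer and the sleep durations are made up for the demo.

```python
import time
from contextlib import contextmanager


class TurnTimer:
    """Records wall-clock time per named stage of one conversational turn."""

    def __init__(self) -> None:
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised.
            self.stages[name] = time.perf_counter() - start

    def slowest(self) -> str:
        return max(self.stages, key=self.stages.get)

    def total(self) -> float:
        return sum(self.stages.values())


timer = TurnTimer()
# Sleeps stand in for real work; the numbers are illustrative only.
for name, seconds in [("transcribe", 0.01), ("infer", 0.05), ("synthesise", 0.01)]:
    with timer.stage(name):
        time.sleep(seconds)

within_budget = timer.total() < 1.0  # the sub-second turn budget
```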
AI Eval Runner
Evals as a Notion checklist drift the moment the model changes. They need to be code, runnable on a cron, comparable across runs. Otherwise the next regression lands in production unnoticed.
Python 3.12 with uv and Typer. DuckDB for run history so deltas across runs are a single query. FastAPI plus HTMX viewer renders the diff between runs without a build step.
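The single-query delta can be sketched as a self-join on the run history. This example uses the standard library's sqlite3 as a stand-in so it runs without extra dependencies; the results table, its columns and the run ids are assumptions, not the runner's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (run_id TEXT, case_id TEXT, score REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("run-1", "qa-1", 1.0), ("run-1", "qa-2", 0.0), ("run-1", "qa-3", 0.75),
        ("run-2", "qa-1", 0.5), ("run-2", "qa-2", 1.0), ("run-2", "qa-3", 0.75),
    ],
)

# Join each case in the newer run to the same case in the older run
# and keep only the cases whose score moved.
DELTA_SQL = """
SELECT cur.case_id, prev.score, cur.score, cur.score - prev.score AS delta
FROM results AS cur
JOIN results AS prev ON prev.case_id = cur.case_id
WHERE prev.run_id = ? AND cur.run_id = ? AND cur.score <> prev.score
ORDER BY cur.case_id
"""


def run_deltas(db: sqlite3.Connection, before: str, after: str) -> list[tuple]:
    return db.execute(DELTA_SQL, (before, after)).fetchall()


deltas = run_deltas(conn, "run-1", "run-2")
```

A negative delta is a regression, which is the row a scheduled run would want to alert on.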
RAG over PDF
PDF retrieval starters tend to be one of two things: an unreadable framework wrapper, or a notebook with the chunking strategy hidden under a dependency. I wanted a minimal one I could fork and reason about in an afternoon.
Plain Python, explicit chunking with overlap, embeddings to a local vector store, retrieval with rerank. Every step is one file. No framework leakage.
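Explicit chunking with overlap fits in one small function. The version below is a sketch that counts characters; chunk_text and its parameters are illustrative names, and the repo may well chunk by tokens or sentences instead.

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            # This window already reaches the end of the text.
            break
    return chunks


chunks = chunk_text("abcdefghij", size=4, overlap=2)
# chunks == ["abcd", "cdef", "efgh", "ghij"]
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.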
Receipt Scanner
Vision OCR for receipts works in the demo and falls over on the messy real ones. Most starters skip the structured-output discipline that makes the result usable downstream.
A clean vision pipeline that returns strict JSON for line items, totals, tax. Retries on schema mismatch. Test fixtures with crumpled, partial and bilingual receipts so regressions surface fast.
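Retrying on schema mismatch can be sketched as a loop that validates the model's reply and feeds the problems back into the next attempt. Everything named here is an assumption for illustration: the field names, extract_receipt, and the fake model that stands in for a real vision call.

```python
import json
from typing import Callable

# Assumed minimal schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"line_items": list, "total": (int, float), "tax": (int, float)}


def schema_problems(payload: object) -> list[str]:
    if not isinstance(payload, dict):
        return ["top level is not an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems


def extract_receipt(call_model: Callable[[list[str]], str], max_attempts: int = 3) -> dict:
    """Call the model until its reply parses and matches the schema."""
    problems: list[str] = []
    for _ in range(max_attempts):
        raw = call_model(problems)  # previous problems go back in as feedback
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            problems = ["reply was not valid JSON"]
            continue
        problems = schema_problems(payload)
        if not problems:
            return payload
    raise ValueError(f"schema mismatch after {max_attempts} attempts: {problems}")


# Fake model: two bad replies, then a valid one.
replies = iter(["not json", '{"total": 4.5}', '{"line_items": [], "total": 4.5, "tax": 0.5}'])
feedback_seen: list[list[str]] = []


def fake_model(problems: list[str]) -> str:
    feedback_seen.append(problems)
    return next(replies)


receipt = extract_receipt(fake_model)
```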
Webhook to Email
Every project ends up needing the same tiny receiver that turns an HMAC-signed webhook into a transactional email. Writing it from scratch each time is how secret leaks happen.
Constant-time HMAC verification on the way in, idempotency keyed on the provider event id, Resend on the way out. One small repo to fork, one tested receiver to deploy.
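The two inbound checks can be sketched with the Python standard library: hmac.compare_digest for the constant-time comparison and a set of seen event ids for idempotency. The Receiver class, the SHA-256 hex signature format and the send_email callable are assumptions; a real receiver would persist the ids and call the email provider where the stub is.

```python
import hashlib
import hmac
from typing import Callable


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class Receiver:
    def __init__(self, secret: bytes, send_email: Callable[[bytes], None]) -> None:
        self.secret = secret
        self.send_email = send_email  # hypothetical stand-in for the outbound call
        self.seen_event_ids: set[str] = set()  # in-memory stand-in for durable storage

    def handle(self, body: bytes, signature_hex: str, event_id: str) -> str:
        if not verify_signature(self.secret, body, signature_hex):
            return "rejected"
        if event_id in self.seen_event_ids:
            return "duplicate"  # provider retried; do not send twice
        self.send_email(body)
        self.seen_event_ids.add(event_id)
        return "sent"


outbox: list[bytes] = []
receiver = Receiver(b"shared-secret", outbox.append)
body = b'{"type": "invoice.paid"}'
good_sig = hmac.new(b"shared-secret", body, hashlib.sha256).hexdigest()

first = receiver.handle(body, good_sig, "evt_1")
retry = receiver.handle(body, good_sig, "evt_1")
forged = receiver.handle(body, "00" * 32, "evt_2")
```

The event id is recorded only after the send succeeds, so a failed send can be retried by the provider without being swallowed as a duplicate.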
MCP Server Toolkit
Production-grade MCP servers need more than a "hello world". Auth, structured logging, request validation, a deploy story. Most templates stop short and leave the operator to invent the rest.
Python plus FastAPI starter with structured logs, request validation, an auth seam and a Dockerfile that runs. Designed to be forked for a specific tool surface without reinventing the plumbing.
Local LLM Router
Local LLM via Ollama is great until the prompt needs a frontier model. Switching code paths between the two breaks the developer loop and ends with two diverging clients.
OpenAI-compatible proxy that routes per request to Ollama or a cloud provider based on a simple rule set. The client code never changes, only the routing config does.
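A rule set of this kind can be as small as an ordered list of prefix matches on the requested model name. The sketch below is a guess at the shape, not the router's config format: Rule, pick_backend and the example prefixes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    model_prefix: str  # matched against the "model" field of the request
    backend: str       # "ollama" or "cloud" in this sketch


RULES = [
    Rule("llama", "ollama"),
    Rule("gpt-", "cloud"),
]


def pick_backend(request: dict, rules: list[Rule], default: str = "ollama") -> str:
    """Return the backend of the first rule whose prefix matches the requested model."""
    model = request.get("model", "")
    for rule in rules:
        if model.startswith(rule.model_prefix):
            return rule.backend
    return default


local = pick_backend({"model": "llama3", "messages": []}, RULES)
remote = pick_backend({"model": "gpt-4o", "messages": []}, RULES)
fallback = pick_backend({"model": "mystery-model", "messages": []}, RULES)
```

Because the decision reads only the request body, the client keeps sending the same OpenAI-shaped payload and only the rule list changes.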
forge-infer
I wanted to understand inference servers from the bottom up. Paged KV-cache, continuous batching, speculative decoding. Reading vLLM source is one way. Implementing a minimal version is the other.
Minimal Python LLM inference server with a paged KV-cache, continuous batching of requests in flight, and a speculative decoding path. Written to be read, not to win throughput crowns.
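The bookkeeping behind a paged KV-cache is a block table per sequence plus a shared free list. The sketch below covers only that allocation logic, with no tensors involved; PagedKVCache and its method names are illustrative and not taken from the repo.

```python
class PagedKVCache:
    """Tracks which fixed-size blocks hold each sequence's cached tokens."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[str, list[int]] = {}
        self.lengths: dict[str, int] = {}

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one new token and return (block_id, offset_in_block)."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:
            # The current block is full (or this is the first token): take a new block.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots = [cache.append_token("seq-a") for _ in range(3)]
blocks_in_use = len(cache.block_tables["seq-a"])
cache.release("seq-a")
```

Memory is claimed one block at a time as a sequence grows, which is what lets many requests of unknown length share one pool.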
lsmdb
Most database internals posts wave at LSM trees and skip the corner cases. The compaction policy is where the engine earns its keep. I wanted a clean implementation that exercised the hard parts.
Log-structured merge-tree engine in Go. Write-ahead log, immutable SSTables, bloom filters on reads, MVCC snapshots for consistent ranges. Compaction policy is its own readable module.
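The bloom filter's job on the read path is to answer "definitely not in this SSTable" without touching disk. A minimal Python sketch of the data structure follows; the engine itself is in Go, and the sizing and the SHA-256-based hashing here are assumptions chosen for readability, not speed.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key was never added; True means "maybe, go read the table".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


empty = BloomFilter(num_bits=1024, num_hashes=3)
table_filter = BloomFilter(num_bits=1024, num_hashes=3)
for key in (b"user:1", b"user:2"):
    table_filter.add(key)
```

A read checks might_contain for each candidate table and skips every table that answers False.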
raftkv
Claiming a Raft implementation is correct is cheap. Proving it stays linearizable under partition, lost messages, leader churn and clock skew is the actual bar. Most implementations skip the harness.
Raft-backed key-value store in Go with a fault-injection harness that drives partitions, message loss and leader churn, then replays the trace against a linearizability checker.
sandboxd
Running untrusted code from an LLM tool call without a hardened sandbox is how production becomes a postmortem. Off-the-shelf runtimes leak ambient authority by default.
WebAssembly sandbox in Rust with a deny-by-default host ABI. Strict CPU, wall-clock and memory limits enforced at the runtime boundary. Every capability is opt-in and audited.
shipyard
Multi-tenant SaaS scaffolds either skip tenant isolation or bolt it on after the fact. Both routes end in a cross-tenant leak. The scaffold has to bake isolation in from the first row.
TypeScript starter with row-level tenant isolation, role-based access control, metered billing, an append-only audit log and rate limits per tenant. The defaults are the safe ones.
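Per-tenant rate limiting comes down to keeping one counter per tenant id so a noisy tenant cannot spend another tenant's budget. The sketch below uses a token bucket, which is an assumed algorithm; the starter is TypeScript and may limit differently. TenantRateLimiter is a hypothetical name, and the clock is passed in so the behaviour is deterministic.

```python
class TenantRateLimiter:
    """Token bucket per tenant: `capacity` burst, refilled at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        # tenant_id -> (tokens remaining, timestamp of last update)
        self.buckets: dict[str, tuple[float, float]] = {}

    def allow(self, tenant_id: str, now: float) -> bool:
        tokens, last = self.buckets.get(tenant_id, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1:
            self.buckets[tenant_id] = (tokens - 1, now)
            return True
        self.buckets[tenant_id] = (tokens, now)
        return False


limiter = TenantRateLimiter(capacity=2, refill_per_second=1)
burst = [limiter.allow("tenant-a", now=0.0) for _ in range(3)]
other_tenant = limiter.allow("tenant-b", now=0.0)
after_refill = limiter.allow("tenant-a", now=1.0)
```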
k8s-ops-toolkit
A fresh Kubernetes cluster needs the same five things before any app lands. Ingress, certs, metrics, logs, a sensible Helm pattern for a Next.js app. Wiring that by hand each time burns a day.
Opinionated Helm chart for a Next.js app plus an observability bootstrap: ingress-nginx, cert-manager, kube-prometheus-stack and Loki. Apply once, then ship features.
terraform-stack
Vercel, Supabase, Cloudflare and DigitalOcean are the stack I actually ship on. Their Terraform stories live in four separate repos with inconsistent variable shapes. Reconciling that on every project is wasted motion.
One Terraform repository with first-class modules for Vercel, Supabase, Cloudflare and DigitalOcean. Shared variable conventions so the stacks compose. Includes the whitepaper on the trade-offs.
staff-portal
Small teams need an HR and ops portal that does not require an enterprise contract. Attendance, leave, expenses, timesheets and a kiosk sign-in path, integrated, not glued together with spreadsheets.
Next.js portal with Supabase auth and Postgres, scheduled jobs for digests and reminders, kiosk mode for physical sign-in, and a small analytics surface for managers. Deployed on Vercel.
All nineteen, in one place
The full directory of open-source projects sits at /open-source, grouped by category, every repo linked with its README, wiki and whitepaper. If you are reading this because you might want to work with me on something like these inside a permanent role, the /hire-me page covers what that looks like (PAYE only, available from February 2030).