Today the studio ships its first hosted product. sarmalink[1] is a bring-your-own-keys AI chat workspace, live now, free during beta, built solo in London. You paste your own provider keys once, they are encrypted in your browser before anything is stored, and from then on you chat across seventeen providers through one interface that always tries the free tiers first and fails over automatically when a provider throttles or falls over.
One naming note before anything else, because I have an open-source project with a confusingly similar name. Sarmalink-AI is the open-source, self-hosted gateway on my GitHub: you clone it, deploy it, and it is yours. sarmalink is different: a hosted product you just open in a browser. They share a philosophy, multi-provider with aggressive failover, but this post is about the hosted product.
Why bring your own keys
Every mainstream AI chat product is built on the same shape: the operator holds the model keys, you pay the operator a flat monthly fee, and your conversations sit in the operator's database under the operator's policies. That shape has two structural problems.
The first is economic. A flat subscription charges you the same twenty dollars whether you send three messages or three thousand. The margin lives in the gap between what you pay and what you use. If you are a light or bursty user, you are the margin.
The second is trust. When the operator holds the keys and the chats, "we take privacy seriously" is a policy, not a property. Policies change. Retention windows change. Training-data terms change. The only privacy that survives a terms-of-service update is the kind enforced by mathematics.
BYOK inverts both. You bring keys you got directly from Groq, Cerebras, SambaNova, Gemini, OpenRouter, Mistral, DeepSeek and the rest, most of which hand out genuinely useful free tiers to anyone with an email address. sarmalink orchestrates those keys; it does not own them. Your usage bill, if you ever exceed the free tiers, goes to the provider at cost, with no middleman markup. And because the workspace never holds anything it can decrypt, the trust question mostly evaporates.
The free-tier economics
The quiet fact of mid-2026 is that the free inference floor is very high if you are willing to spread across providers. Two examples from sarmalink's ladder, both published limits you can check today.
Groq gives free-tier accounts up to 14,400 requests per day on its fast production models. That is one request every six seconds, all day, without paying anything. Cerebras gives roughly one million tokens per day free. Assuming a typical chat exchange of about 500 tokens round trip (my assumption and my arithmetic), that is on the order of 2,000 exchanges a day before the meter would matter.
Source: Provider free-tier documentation, author's arithmetic (500 tokens per exchange for Cerebras)
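The two figures above are easy to reproduce. A minimal sketch of the arithmetic, where the 500-token exchange size is the same assumption stated in the text and the function names are purely illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_between_requests(requests_per_day: int) -> float:
    """Average spacing if a daily request allowance is spread evenly over 24 hours."""
    return SECONDS_PER_DAY / requests_per_day


def exchanges_per_day(tokens_per_day: int, tokens_per_exchange: int) -> int:
    """Whole chat exchanges that fit inside a daily token allowance."""
    return tokens_per_day // tokens_per_exchange


# Groq: 14,400 free requests per day (published limit quoted above).
groq_interval = seconds_between_requests(14_400)

# Cerebras: roughly 1,000,000 free tokens per day; 500 tokens per exchange is an assumption.
cerebras_exchanges = exchanges_per_day(1_000_000, 500)

print(f"Groq: one request every {groq_interval:.0f} seconds")
print(f"Cerebras: about {cerebras_exchanges:,} exchanges per day")
```

Change the 500 to match your own habits: long coding sessions burn far more tokens per exchange than short questions do, and the daily figure shrinks in proportion.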
Those are two providers out of seventeen. Stack the rest (Gemini's free quota, OpenRouter's free-model catalogue, Mistral's experiment tier) and the practical answer for most people is that day-to-day chat never touches a paid meter at all. The full, current list of providers and models lives on the models page[5].
sarmalink's routing encodes this directly: free tiers first. Every request walks a ladder that starts with the free allowances on your keys and only reaches paid options if you have configured them. When a provider rate-limits, errors or times out, the router steps to the next rung automatically, mid-conversation, without you doing anything. Failover is not a premium feature; it is the routing model.
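The shape of that ladder can be sketched in a few lines. This is an illustration of the idea, not sarmalink's actual router: `Rung`, `ProviderError`, `route` and the provider names are all hypothetical, and the `send` callables stand in for real provider calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Stand-in for a rate limit, provider error, or timeout."""


@dataclass
class Rung:
    name: str
    free: bool
    send: Callable[[str], str]  # hypothetical: takes a prompt, returns a reply


def route(prompt: str, ladder: list[Rung]) -> tuple[str, str]:
    """Try free rungs first, then paid ones; step down on any provider failure."""
    # sorted() is stable, so rungs keep their configured order within each group.
    ordered = sorted(ladder, key=lambda rung: not rung.free)
    failures = []
    for rung in ordered:
        try:
            return rung.name, rung.send(prompt)
        except ProviderError as exc:
            failures.append(f"{rung.name}: {exc}")
    raise ProviderError("all rungs failed: " + "; ".join(failures))


def throttled(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def echo(prompt: str) -> str:
    return f"echo: {prompt}"


ladder = [
    Rung("paid-provider", free=False, send=echo),
    Rung("free-provider-a", free=True, send=throttled),
    Rung("free-provider-b", free=True, send=echo),
]

served_by, reply = route("hello", ladder)
print(served_by, reply)
```

Even though the paid rung is listed first, the request lands on the second free provider: the first free rung throttles, the router steps down, and the paid rung is never touched.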
Zero-knowledge, in the strict sense
"Zero-knowledge" gets used loosely, so here is precisely what sarmalink does, all of it verifiable on the security page[2].
Your API keys and your chat history are encrypted client-side, in your browser, before they are written to any storage. The cipher is AES-256-GCM. The encryption key is derived from your passphrase with PBKDF2 at 600,000 iterations, which is the current OWASP-recommended order of magnitude for PBKDF2-HMAC-SHA256. Derivation happens on your device. The passphrase never leaves it. The derived key never leaves it.
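To make the derivation parameters concrete, here is a minimal sketch of the key-derivation step only. It is written in Python for illustration, whereas the real derivation runs in the browser; the 16-byte random salt, the choice of SHA-256 as the PBKDF2 hash, and the function name `derive_key` are assumptions, not details taken from the security page. The AES-256-GCM encryption step is left out because Python's standard library does not include it.

```python
import hashlib
import os

ITERATIONS = 600_000  # iteration count quoted in the text
KEY_BYTES = 32        # 256 bits, the key size AES-256 needs


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch a passphrase into a 256-bit key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",                    # assumption: SHA-256 as the underlying hash
        passphrase.encode("utf-8"),
        salt,
        ITERATIONS,
        dklen=KEY_BYTES,
    )


# Assumption: a random per-user salt stored next to the ciphertext.
# A salt is not secret; it only has to be unique.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
print(len(key) * 8, "bit key derived")
```

The same passphrase and salt always produce the same key, which is what lets a second device decrypt your data without the key itself ever being transmitted.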
The consequence is blunt: there is nothing on the server side that can be read. Not by an attacker who dumps a database, because there is no chat database to dump. Not by a subpoena, because the operator cannot decrypt what it does not hold keys for. Not by me, and I built it. If you lose your passphrase, I cannot recover your history, and that inability is the guarantee.
Where your chats actually live
This is the part of the architecture I am proudest of, because it deletes the most common failure mode of chat products: the operator-side conversation store.
sarmalink gives you three storage backends, all encrypted with the scheme above, all owned by you:
Your browser. The default. Chats persist in IndexedDB on your device. Zero setup, works offline, never leaves the machine.
Your GitHub Gists. Point sarmalink at your own GitHub account and encrypted chat blobs sync through private Gists. Free, versioned, and portable across your devices, and you can open your Gist list any time and see exactly what is stored, which is ciphertext.
Your Cloudflare R2. For people who want proper object storage, bring your own R2 bucket and credentials. Your bucket, your billing, your retention policy.
What is deliberately missing from that list is "our database". There is no fourth option where conversations rest on infrastructure I control. The sync layer moves ciphertext between your devices and your storage; it has nothing worth stealing.
Seventeen providers and the failover ladder
Why seventeen? Because provider reliability in 2026 is good on average and terrible in the tail. Any single provider has bad hours: rate-limit storms, model deprecations, regional outages. Multi-provider used to mean seventeen tabs and a spreadsheet of which key goes where.
In sarmalink you add whichever keys you have, and the router does the rest. Preference order respects free tiers first, then your configured paid keys. Health tracking is per provider, per model. A failed request is retried down the ladder transparently, and the conversation simply continues. In practice the visible effect is that the workspace feels more reliable than any single provider in it, which is the whole point of a gateway, applied to a consumer product.
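One way to picture "health tracking per provider, per model" is a small table keyed on the pair, consulted before each request. The sketch below is an assumption about how such tracking could work, not a description of sarmalink's internals: the cooldown window, the class names, the provider names and the model names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    free: bool


@dataclass
class HealthTracker:
    cooldown_seconds: float = 60.0  # hypothetical: how long a failed route sits out
    _failed_at: dict = field(default_factory=dict)

    def record_failure(self, route: Route, now: float) -> None:
        self._failed_at[(route.provider, route.model)] = now

    def record_success(self, route: Route) -> None:
        self._failed_at.pop((route.provider, route.model), None)

    def is_healthy(self, route: Route, now: float) -> bool:
        failed = self._failed_at.get((route.provider, route.model))
        return failed is None or now - failed >= self.cooldown_seconds


def preference_order(routes: list[Route], health: HealthTracker, now: float) -> list[Route]:
    """Drop routes that are cooling down, then put free tiers ahead of paid keys."""
    healthy = [r for r in routes if health.is_healthy(r, now)]
    return sorted(healthy, key=lambda r: not r.free)


routes = [
    Route("paid-provider", "big-model", free=False),
    Route("free-provider", "model-a", free=True),
    Route("free-provider", "model-b", free=True),
]

health = HealthTracker()
health.record_failure(routes[1], now=0.0)  # model-a just failed

during_cooldown = [(r.provider, r.model) for r in preference_order(routes, health, now=10.0)]
after_cooldown = [(r.provider, r.model) for r in preference_order(routes, health, now=61.0)]
print(during_cooldown)
print(after_cooldown)
```

Keying on the pair rather than on the provider alone means one misbehaving model does not sideline everything else on the same key.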
Coder mode and the live preview
The feature that surprised early users most is Coder mode. Ask for an app (a pomodoro timer, a mortgage calculator, a particle toy) and sarmalink produces a complete single-file app and runs it immediately in a sandboxed live preview next to the chat. You watch it render, click around it, ask for changes, and the preview updates on the next turn.
The single-file constraint is deliberate. One HTML file with inline styles and script is trivially portable: download it, open it, host it anywhere, no build step, no dependency tree. The sandbox is equally deliberate: generated code runs in an isolated frame with no access to your keys, your chats or the surrounding app. It is the fastest loop I have found for the class of small tools people actually ask a model to build, and it comes free with the same BYOK economics as everything else.
What free beta means
sarmalink is free during beta. Concretely, per the pricing page[3]: no charge for the workspace, no card on file, no artificial message caps layered on top of your providers' own limits. Your only possible cost is what a provider bills your own key if you push past its free tier, and that money goes to the provider, not to me.
Source: Public subscription pricing, July 2026; sarmalink pricing page
For comparison, the two big consumer subscriptions both sit at twenty dollars a month before you send a single message. The honest caveat: beta means beta. Features will move, and when the beta ends there will be a paid tier for the hosted conveniences. What will not change is the architecture. Chats and keys stay client-side encrypted in storage you own; that is a foundation, not a promotional setting.
| Spec | sarmalink | Typical AI subscription |
|---|---|---|
| Monthly cost | Free during beta; usage rides on your own keys and their free tiers | $20 per month, flat, before you send a single message |
| Whose keys | Yours, encrypted client-side before they are stored anywhere | The operator's; you never hold them |
| Who can read your chats | You, on your devices. The operator holds nothing decryptable | The operator, subject to their retention and training policies |
| Where chats live | Your browser, your GitHub Gists, or your Cloudflare R2 bucket | The operator's database |
Also an app, in the boring good way
sarmalink is an installable PWA. Add it from the browser and it behaves like a native app on desktop and mobile: its own window, offline access to your local chats, no app store between you and updates. For a product whose storage story is "your device first", the PWA shape is the honest one.
What is next
The beta roadmap, in the order I intend to ship it: deeper Coder mode with multi-turn refinement of larger apps, more providers as free tiers appear and die, shared encrypted workspaces, and export tooling so leaving sarmalink is always one click, because a product whose pitch is ownership has to make departure easy to be credible.
If you want to kick the tyres: open ai.sarmalinux.com[1], paste one free key (Groq's takes about ninety seconds to get), and you are chatting. The docs cover keys, storage backends and the failover ladder[4]. The security page is the full cryptographic spec[2].
Seventeen providers, your keys, your storage, mathematics instead of promises. Built solo in London, live today.
Build apps by talking
Coder mode has a dedicated home at ai.sarmalinux.com/build[6], and it goes further than the single-file loop described above. Describe the app you want and sarmalink builds it and runs it live in a sandboxed preview beside the chat. You iterate by conversation: click around the running app, say what should change, and the preview updates on the next turn.
Ask for something bigger than a single file and sarmalink produces a full multi-file, full-stack project (nested folders, a package.json, run instructions) and saves it straight to a project folder on your own machine via the File System Access API, in Chromium browsers on Windows, macOS and Linux. Everywhere else you get the same project as a download. Saved apps load back into the workspace, so a project is something you return to and keep refining, not a one-shot transcript.
The economics are the same as the rest of the workspace: it runs on your own keys, free tiers first, with Cerebras's Qwen3-Coder-480B leading the coding ladder. And if you would rather work from an editor, your personal OpenAI-compatible API means Cursor, Continue, Claude Code or any OpenAI SDK can drive the same models on the same keys.
---
A note on this post
Every figure in this post is either a provider's published free-tier limit at the time of writing, straightforward arithmetic on one, or a property of sarmalink you can verify yourself by opening the app and its security page. Citations link to the primary source. Where a number is my own derivation, the post says so in the same sentence.