Photo in, validated JSON out.
High-resolution vision OCR constrained by JSON Schema, re-validated by Zod, batched up to fifty files at a time, and ready for Supabase, Xero, QuickBooks or any n8n workflow you wire in. OFX statement to follow.
Why this exists
Receipt OCR used to be a half-product: OCR a photo into text, then write hundreds of lines of regexes to pull out vendor, total, currency, line items. Half of those regexes failed on the next receipt format and you spent the rest of the year patching edge cases.
Vision models removed the regex layer entirely. A modern high-resolution vision model can read a printed receipt, a thermal print, or a mobile screenshot and emit a structured JSON object directly. The hard problem moves up a level: how do you make sure that JSON is always in the right shape so your database, your accounting export, and your UI can trust it?
The answer is a strict schema boundary. The model is constrained at the API by a JSON Schema, so it is forced to emit exactly the shape your app expects. The output is then re-validated by Zod on the way in. Malformed output is rejected at a single, named boundary before it reaches the UI, an export, or your database.
Receipt Scanner is the working version of that pattern. Photo in. JSON out. OFX out. Forty file lines of business code, fixture-backed tests, batch support, Cloudflare R2 originals, and a documented persistence stub waiting for your Supabase insert. Fork it, point it at your storage and accounting backend, ship.
A schema boundary is the difference between a demo and a system you can trust.
Vision models do the hard part now. The remaining failure mode is shape drift: the model emits something that almost-but-not-quite matches your downstream contract, and a wrong number lands silently in a ledger. Receipt Scanner refuses that path by making the model emit the JSON Schema directly and then re-validating with Zod before the result reaches your database, your UI or an accounting export. Two enforcement points, one shape, no surprises.
Built-in features
Everything below ships in the repository. Clone, set ANTHROPIC_API_KEY, deploy.
High-resolution vision OCR
A vision model is called with the receipt schema enforced as a structured-output constraint. Reads thermal prints, supermarket dot-matrix, mobile screenshots, faded receipts. Swap to any vision provider in lib/vision.ts.
Structured-output boundary
The model is constrained to emit the receipt JSON Schema directly, not free-form text the app has to repair. Output reaches the validator already in the right shape every time.
Zod contract on the way in
Vendor, address, date, time, currency, line items, subtotal, tax, tip, total, payment method. Every field optional. Malformed output is rejected at a single boundary before it reaches the UI, an export, or your database.
sharp pre-processing pipeline
EXIF auto-rotate, downscale to MAX_IMAGE_PX (default 2576 to match the model's high-resolution vision tier), re-encode to JPEG. Biggest cost lever in the whole pipeline.
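The downscale step comes down to a little arithmetic. A minimal sketch, assuming MAX_IMAGE_PX caps the longest edge and that smaller images are never upscaled; `fitWithin` is a hypothetical helper, not the repository's code.

```typescript
interface Size {
  width: number
  height: number
}

// Hypothetical helper: scale a size so its longest edge is at most maxPx.
// Assumption: 2576 mirrors the MAX_IMAGE_PX default described above.
function fitWithin(size: Size, maxPx = 2576): Size {
  const longest = Math.max(size.width, size.height)
  if (longest <= maxPx) return { ...size } // already small enough, never upscale
  const scale = maxPx / longest
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  }
}
```

Under these assumptions a 4032 x 3024 phone photo comes out at 2576 x 1932, roughly 41% of the original pixel count. In sharp the same fit is usually expressed with resize options `fit: 'inside'` and `withoutEnlargement: true`.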
Batch processing up to 50 files
POST several files to /api/scan/batch. Each is scanned independently with bounded concurrency. One unreadable image returns a per-file error without failing the batch.
Itemised line items
Each line item carries description, quantity, unit price, line total. Captures vendor-specific formatting from Tesco, Sainsbury's, Pret, Costco, and most printed receipts.
OFX 1.0.2 export
POST validated receipts to /api/export/ofx and receive a downloadable .ofx statement. Each receipt is one DEBIT transaction with a stable id so re-exports dedupe in the importer. Imports into Xero, QuickBooks, GnuCash.
Optional Cloudflare R2 originals
Original images stored content-addressed by SHA-256 for audit and deduplication. When R2 is not configured the pipeline runs unchanged and image_key is null.
Content addressing by SHA-256
Every original gets a stable SHA-256 hash regardless of R2 configuration. Re-uploads of the same image are detectable. Audit trails stay deterministic.
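Content addressing can be sketched in a few lines with Node's built-in crypto module. The function names and the key layout below are assumptions for illustration; the repository's actual R2 key format is not shown on this page.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper: hex-encoded SHA-256 of the original upload bytes.
function contentAddress(original: Uint8Array): string {
  return createHash('sha256').update(original).digest('hex')
}

// Hypothetical key layout: the same bytes always map to the same object key,
// so a re-upload overwrites (or can be skipped) instead of duplicating.
function objectKey(sha256: string, ext = 'jpg'): string {
  return `receipts/${sha256}.${ext}`
}
```

Because the hash depends only on the bytes, it can be computed whether or not R2 is configured, which matches the always-present image_sha256 field.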
Persistence stub plus SQL schema
lib/persist.ts is a no-op save() with the contract documented inline. Replace with a single Supabase insert. The Postgres schema mirroring the Zod contract lives in docs/schema.sql.
Swap vision providers in one file
lib/vision.ts is a single function. Replace its body to call a different model. Same JSON Schema, same validator, same UI. Comparison reference is left in the file.
Vercel-native
sharp ships its Linux binaries automatically on the Vercel Node runtime. The scan routes pin the Node runtime. No build configuration needed.
Wire to Supabase, Xero, n8n
After validation the JSON is yours. Drop in a Supabase insert, wrap in an accounting API, or POST to an n8n workflow. The hard bit is solved at the schema boundary.
CI gate on every push
Type check, lint, test, build run on every push. Test fixtures and an end-to-end test ship in test/. Catches regressions in the vision call, validator, OFX export, and batch fan-out.
Architecture at a glance
One linear pipeline. Upload, store the original, downscale, call the vision model with the schema as a constraint, validate with Zod, return a complete StoredReceipt to the UI.
Scan pipeline
From a phone photo to a validated StoredReceipt. R2 storage and persistence are optional but documented stubs.
Structured-output boundary
Two enforcement points: the vision API constrains the model to emit the JSON Schema, and Zod re-validates the response before the rest of the app sees it.
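The second enforcement point can be sketched without pulling in Zod itself. The types below are a structural subset of what a Zod 3 schema's safeParse returns, so a real ReceiptSchema should slot in; `parseAtBoundary` and `ReceiptBoundaryError` are hypothetical names, not the repository's.

```typescript
// Structural subset of Zod's safeParse result.
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } }

interface SchemaLike<T> {
  safeParse(input: unknown): SafeParseResult<T>
}

// Hypothetical error type that names the boundary in logs and handlers.
class ReceiptBoundaryError extends Error {
  constructor(detail: string) {
    super(`Receipt failed schema validation: ${detail}`)
    this.name = 'ReceiptBoundaryError'
  }
}

// Everything downstream receives T or nothing; malformed output stops here.
function parseAtBoundary<T>(schema: SchemaLike<T>, raw: unknown): T {
  const result = schema.safeParse(raw)
  if (result.success === false) {
    throw new ReceiptBoundaryError(result.error.message)
  }
  return result.data
}
```

Throwing a single named error keeps the failure path in one place: the route handler can catch it and return a per-file error instead of passing a half-valid object to the UI or an export.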
Batch processing
Each file is scanned independently with bounded concurrency. The response is an array of per-file results so one unreadable image never fails the batch.
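Bounded concurrency with per-file results fits in one small helper. This is a sketch under assumptions: `mapWithLimit` is a hypothetical name, and only the `ok` flag is taken from the description on this page; the `value` and `error` field names are illustrative.

```typescript
type FileResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string }

// Run `task` over every input with at most `limit` calls in flight.
// A failing task becomes an { ok: false } entry instead of rejecting the batch.
async function mapWithLimit<I, O>(
  inputs: readonly I[],
  limit: number,
  task: (input: I) => Promise<O>,
): Promise<FileResult<O>[]> {
  const results: FileResult<O>[] = new Array(inputs.length)
  let next = 0

  async function worker(): Promise<void> {
    while (next < inputs.length) {
      const index = next++ // no await between check and increment, so no races
      try {
        results[index] = { ok: true, value: await task(inputs[index]) }
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err)
        results[index] = { ok: false, error: message }
      }
    }
  }

  const workerCount = Math.min(Math.max(1, limit), inputs.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results // same order as the inputs
}
```

Results keep the input order, so the UI can line each entry up with the file the user dropped in.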
Quick start
Clone, set ANTHROPIC_API_KEY, drop in a receipt. R2 is optional: leave the keys empty and the pipeline still works end to end.
# 1. Clone
git clone https://github.com/sarmakska/receipt-scanner.git
cd receipt-scanner

# 2. Install (pnpm lockfile is committed)
pnpm install

# 3. Configure
cp .env.example .env.local
# then edit .env.local and set:
#   ANTHROPIC_API_KEY=sk-ant-...
# optional, enables R2 originals + deduplication:
#   R2_ACCOUNT_ID=...
#   R2_ACCESS_KEY_ID=...
#   R2_SECRET_ACCESS_KEY=...
#   R2_BUCKET=...

# 4. Run
pnpm dev

# 5. Visit http://localhost:3000
#   - drop in one receipt or fifty
#   - see parsed tables
#   - click Export OFX to download a statement
Full walkthrough including env var reference and first scan: Quick-Start wiki page.
The vision call
The whole model call lives in one file. The JSON Schema constraint is the structured-output contract; prompt caching keeps the system prompt cost low across many scans.
// lib/vision.ts (the one place the vision call lives)
import Anthropic from '@anthropic-ai/sdk'
import { RECEIPT_JSON_SCHEMA } from './schema'

const client = new Anthropic()

export async function scanReceipt(imageJpeg: Buffer): Promise<unknown> {
  const response = await client.messages.create({
    model: process.env.VISION_MODEL ?? 'claude-opus-4-7',
    max_tokens: 4096,
    system: [
      {
        type: 'text',
        text: SYSTEM_PROMPT,
        cache_control: { type: 'ephemeral' }, // prompt caching
      },
    ],
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'image',
            source: {
              type: 'base64',
              media_type: 'image/jpeg',
              data: imageJpeg.toString('base64'),
            },
          },
        ],
      },
    ],
    // Structured-output constraint: model must emit the schema
    response_format: {
      type: 'json_schema',
      json_schema: { name: 'Receipt', schema: RECEIPT_JSON_SCHEMA },
    },
  })

  return response.content[0]?.type === 'text'
    ? JSON.parse(response.content[0].text)
    : null
}
The contract
Every scalar field is nullable because real receipts are messy: the model returns null for anything it cannot read. The Zod schema is the single source of truth on the way in. The Postgres schema in docs/schema.sql mirrors it exactly.
// lib/schema.ts (the Zod contract every scan must satisfy)
import { z } from 'zod'

export const LineItemSchema = z.object({
  description: z.string().nullable(),
  quantity: z.number().nullable(),
  unit_price: z.number().nullable(),
  total: z.number().nullable(),
})

export const ReceiptSchema = z.object({
  vendor: z.object({
    name: z.string().nullable(),
    address: z.string().nullable(),
  }),
  transaction: z.object({
    date: z.string().nullable(), // ISO 8601
    time: z.string().nullable(), // HH:MM:SS
  }),
  currency: z.string().length(3).nullable(), // ISO 4217
  line_items: z.array(LineItemSchema),
  subtotal: z.number().nullable(),
  tax: z.number().nullable(),
  tip: z.number().nullable(),
  total: z.number().nullable(),
  payment_method: z.string().nullable(),
})

export type Receipt = z.infer<typeof ReceiptSchema>

export type StoredReceipt = Receipt & {
  id: string
  image_key: string | null // R2 key, null when R2 is not configured
  image_sha256: string // always computed, used as content address
  created_at: string
}
Use cases
What people actually build with this.
Internal expense workflow
Staff snap receipts on their phone, the scan returns structured fields, the original lands in R2 for audit, and a Supabase insert files the claim. Replace Expensify or Dext for a small team.
Month-end accounting import
Drop a folder into the batch upload, click Export OFX, import the statement into Xero, QuickBooks, or GnuCash. The OFX transaction ids are stable so re-imports dedupe cleanly.
AI bookkeeping prototype
A working OCR baseline you can fork. Add categorisation, VAT handling, HMRC submission on top. Ship in days, not months.
Personal expense tracker
Self-host on Vercel. Receipt to Notion, Google Sheets, or Supabase via the persistence stub. Replace SaaS subscriptions you do not need.
n8n / Zapier automation
POST every validated receipt to an n8n workflow. Branch on vendor, route for approval, push to a spreadsheet, raise a Slack message. The hard bit is the schema-validated JSON, which is solved.
Provider benchmarking
Compare Opus 4.7 against a different vision model on your own receipts. Replace the body of lib/vision.ts and keep the same schema; the UI and validation stay identical.
Receipt Scanner vs alternatives
Expensify, Dext, and Mindee are commercial products. Receipt Scanner is an open-source starter you self-host. Rows reflect the public capabilities of each.
| Capability | Receipt Scanner | Expensify | Dext | Mindee |
|---|---|---|---|---|
| Vision OCR | Opus 4.7 (swappable) | Hosted model | Hosted model | Hosted model |
| Structured-output enforcement | Yes, JSON Schema constraint | Internal | Internal | Internal |
| Schema validation boundary | Zod, single boundary | N/A | N/A | Provided fields |
| Batch processing | Up to 50 per request | Yes | Yes | Yes |
| Original image storage | Cloudflare R2 (optional) | Hosted | Hosted | Hosted |
| OFX export | Built in (Xero, QuickBooks, GnuCash) | CSV | CSV | JSON |
| Self-host | Vercel, Docker, anywhere Node runs | No | No | On-prem (enterprise) |
| Per-scan cost | Vision API token cost only | Per-user subscription | Per-user subscription | Per call |
| Source available | Yes, MIT | No | No | No |
| License | MIT | Commercial | Commercial | Commercial |
An honest limitations list
Trade-offs you should know about before adopting this starter.
Hand-written receipts are hit and miss
Cursive and low-light handwriting are unreliable. The model returns what it can read and leaves other fields null. The schema marks every field optional for this reason.
Per-scan token cost is small but real
Each scan is one vision API call. Downscaling keeps it cheap but not free. Tune MAX_IMAGE_PX down if you want to trade detail for cost.
No managed SLA
This is an open-source starter, not a hosted product. Buy Expensify or Dext if you want a managed service with support. Fork this if you want to own the pipeline.
Two schema definitions to keep in sync
The hand-written JSON Schema (RECEIPT_JSON_SCHEMA) handed to the model and the Zod schema must stay aligned. The SDK helper that derives one from the other needs Zod 4 and the project pins Zod 3.
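One cheap guard against the two definitions drifting apart is a unit test that compares their top-level keys. The sketch below is an assumption-laden illustration: it presumes RECEIPT_JSON_SCHEMA has a top-level `properties` object and relies on Zod 3 objects exposing `.shape`; `schemaKeyDrift` is a hypothetical helper and only checks one level deep.

```typescript
// Compare two key maps and report keys that exist on one side only.
// Intended call (assumed shapes):
//   schemaKeyDrift(RECEIPT_JSON_SCHEMA.properties, ReceiptSchema.shape)
function schemaKeyDrift(
  jsonSchemaProps: Record<string, unknown>,
  zodShape: Record<string, unknown>,
): { onlyInJsonSchema: string[]; onlyInZod: string[] } {
  const jsonKeys = Object.keys(jsonSchemaProps)
  const zodKeys = Object.keys(zodShape)
  return {
    onlyInJsonSchema: jsonKeys.filter((k) => zodKeys.indexOf(k) === -1).sort(),
    onlyInZod: zodKeys.filter((k) => jsonKeys.indexOf(k) === -1).sort(),
  }
}
```

A test that asserts both arrays are empty would fail the CI gate as soon as a field is added to one schema and forgotten in the other. Nested objects such as vendor and line items would need the same check applied recursively.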
Default path is a hosted vision API
The default vision provider runs over the public internet. On-premise inference is possible by swapping the body of lib/vision.ts but the wiring is your task.
In-process pipeline, no queue
Each request scans inline. Fine for typical small-business volume. Put a queue in front of it for sustained heavy load.
Wiki documentation
Twelve wiki pages covering architecture, vision models, batch upload, OFX export, R2 storage, configuration, database wiring, edge cases, and deployment.
Full index: receipt-scanner wiki home.
Frequently asked
Which vision model does the default ship with?
Claude Opus 4.7 (claude-opus-4-7), a high-resolution frontier vision model with structured-output support; set VISION_MODEL to override it. High-resolution support handles printed receipts, faded thermal prints, and mobile screenshots better than the lighter models in head-to-head testing on common UK receipts. Structured-output support means the model is constrained to emit the JSON Schema rather than free-form text the app has to repair. Both features matter at this task. Swap in any other vision provider by replacing the body of lib/vision.ts.
How much does each scan cost?
It is one vision API call. Downscaling to MAX_IMAGE_PX (2576px default) and JPEG re-encode keeps image tokens low. Pricing depends on the provider and your model choice; for typical receipts the cost lands in low single-digit pence territory. Set MAX_IMAGE_PX lower for cheaper scans at the cost of detail.
Do I need Cloudflare R2?
No. The R2 storage step is a no-op when the R2_* env vars are unset and the rest of the pipeline runs unchanged. R2 stores the original image content-addressed by SHA-256, which is useful for audit and deduplication but not required to validate a receipt.
How is malformed output handled?
The model is constrained at the API by a JSON Schema, so it is forced to emit the right shape. The response is then re-validated by the Zod schema in lib/schema.ts on the way in. Anything that fails Zod is a boundary violation, surfaced as an error rather than passed through. The hand-written JSON Schema and the Zod schema must be kept in sync because the SDK's helper that derives one from the other requires Zod 4 and the project pins Zod 3.
Can it read hand-written receipts?
Sometimes. Hand-written cafe slips work better than you might expect. Cursive or low-light handwriting is unreliable. The model returns what it can read and leaves other fields null. The schema marks every field optional for this reason.
What does the OFX export look like?
OFX 1.0.2 bank statement format. Each receipt becomes one DEBIT transaction with a stable FITID derived from the receipt id, so re-exports dedupe in the importer rather than creating duplicates. Vendor name maps to NAME, total maps to TRNAMT, date maps to DTPOSTED. Multi-currency is preserved.
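The field mapping above can be sketched as a single function that renders one STMTTRN aggregate in OFX 1.x SGML. This is an illustration, not the repository's exporter: `toStmtTrn` and `OfxInput` are hypothetical, the receipt id is used as the FITID unchanged, and the statement wrapper (headers, account block, currency) is left out.

```typescript
interface OfxInput {
  id: string // receipt id, reused here as the FITID (assumption)
  vendorName: string | null
  total: number
  date: string // ISO 8601, e.g. "2024-03-09"
}

// Escape the characters that would otherwise read as SGML markup.
function ofxEscape(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Render one receipt as a DEBIT transaction.
function toStmtTrn(r: OfxInput): string {
  const dtPosted = r.date.slice(0, 10).replace(/-/g, '') // YYYYMMDD
  const amount = (-Math.abs(r.total)).toFixed(2) // debits are negative amounts
  const name = ofxEscape((r.vendorName ?? 'Unknown vendor').slice(0, 32)) // NAME is capped at 32 chars
  return [
    '<STMTTRN>',
    '<TRNTYPE>DEBIT',
    `<DTPOSTED>${dtPosted}`,
    `<TRNAMT>${amount}`,
    `<FITID>${r.id}`,
    `<NAME>${name}`,
    '</STMTTRN>',
  ].join('\n')
}
```

Because the output depends only on the receipt's own fields, exporting the same receipt twice yields the same FITID, which is what lets the importer skip duplicates.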
How does batch upload behave on partial failure?
The batch endpoint runs each scan independently with bounded concurrency. The response is an array of per-file results, each carrying ok: true with the StoredReceipt or ok: false with an error. One unreadable image never fails the batch.
How do I wire it into my real backend?
lib/persist.ts is a stub save(). Replace its body with one Supabase insert against the tables in docs/schema.sql, or POST the JSON to an n8n workflow, or call an accounting API directly. The schema is the contract; the persistence target is your choice.
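As one illustration of the webhook route, the sketch below forwards a validated receipt as JSON and fails loudly on a non-2xx response. Everything here is an assumption: `forwardReceipt` and `PostFn` are hypothetical names, the real save() contract lives in lib/persist.ts, and the fetch-shaped function is injected so the sketch runs without a network.

```typescript
// Minimal fetch-shaped dependency; the global fetch in Node 18+ fits this type.
type PostFn = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>

// Hypothetical save() body: POST the validated receipt to a webhook,
// for example an n8n Webhook node, and surface failures as errors.
async function forwardReceipt(
  receipt: unknown,
  webhookUrl: string,
  post: PostFn,
): Promise<void> {
  const res = await post(webhookUrl, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(receipt),
  })
  if (!res.ok) {
    throw new Error(`Webhook responded with HTTP ${res.status}`)
  }
}
```

Since the receipt has already passed the schema boundary, the persistence step only has to move bytes; it never needs to re-check the shape.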
Use it. Fork it. Ship it.
MIT licensed. Pull requests welcome, especially around vendor-specific format quirks, additional accounting exports, and provider adapters for other vision models.
Ready to ship receipt OCR?
Clone the repo, set ANTHROPIC_API_KEY, drop in a receipt. Validated JSON in two seconds, OFX export ready to import into Xero or QuickBooks.