How Receipt Scanner works
One API route. One vision call. One Zod schema. Token-conscious image preprocessing, schema-enforced JSON output, graceful degradation on poor-quality images. The whole loop in plain English.
Six steps. One API call.
No regex.
Resize the image with sharp. Base64 encode it. Send to Claude with a strict JSON prompt. Parse the response. Validate against Zod. Return the typed object.
The hard problems — line item parsing, vendor normalisation, total reconciliation — are now things the language model does for you. What used to be a 2,000-line OCR pipeline is now 400 lines of TypeScript that you can read in an afternoon.
The remaining engineering is token economics, EXIF rotation, prompt strictness, and graceful degradation when the photo is genuinely unreadable. That is what this codebase makes explicit.
The scan loop
┌──────────────── /api/scan ──────────────────────────────────┐
│ Browser │
│ │ POST FormData(image) │
│ ▼ │
│ Route handler (lib/vision.ts) │
│ │ │
│ │ Buffer ──▶ sharp.rotate() │
│ │ rotated ──▶ .resize(1568, fit:'inside') │
│ │ resized ──▶ .jpeg(85) │
│ │ jpeg ──▶ .toString('base64') │
│ │ │
│ │ base64 ──▶ anthropic.messages.create({ │
│ │ model: 'claude-3-5-sonnet-latest', │
│ │ messages: [{ role:'user', content:[ │
│ │ { type:'image', source: base64 }, │
│ │ { type:'text', text: SYS_PROMPT } ]}] │
│ │ }) │
│ │ │
│ │ text ──▶ JSON.parse │
│ │ json ──▶ Receipt.parse (zod) │
│ │ rcpt ──▶ persist.save (optional) │
│ │ │
│ ▼ │
│ 200 OK { ok: true, id, receipt } │
└─────────────────────────────────────────────────────────────┘
Each piece, deep-dived
Image preprocessing
Vision APIs charge per image token, and image tokens scale with resolution. A naive implementation that sends the full-resolution photo pays roughly 4× more per scan.
sharp.rotate() applies EXIF orientation. resize({ width: 1568, height: 1568, fit: "inside" }) bounds the longest edge; passing width alone would cap only the width and leave tall portrait receipts oversized. jpeg({ quality: 85 }) re-encodes to a compact format. The output is a Buffer roughly 60% smaller than the input PNG, with no measurable accuracy loss on receipts.
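To make the token economics concrete, here is a minimal sketch of the arithmetic behind the longest-edge cap. The helper names fitInside and estimateImageTokens are hypothetical and not part of the codebase. The width × height / 750 figure is the rule of thumb from Anthropic's vision documentation; the API may downscale further, so treat the result as a rough estimate rather than a bill.

```typescript
// Longest-edge bound described above.
const MAX_EDGE = 1568;

interface Size {
  width: number;
  height: number;
}

// Scale down so neither edge exceeds maxEdge, preserving aspect ratio.
// Assumption of this sketch: images already within bounds are left unchanged.
function fitInside(size: Size, maxEdge: number = MAX_EDGE): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) return { ...size };
  return {
    width: Math.round((size.width * maxEdge) / longest),
    height: Math.round((size.height * maxEdge) / longest),
  };
}

// Rough token estimate: (width * height) / 750, rounded up.
function estimateImageTokens(size: Size): number {
  return Math.ceil((size.width * size.height) / 750);
}

const phonePhoto: Size = { width: 3024, height: 4032 };
const resized = fitInside(phonePhoto);
console.log(resized, estimateImageTokens(resized));
```

For a 3024 × 4032 phone photo this gives 1176 × 1568, which is the size the sharp pipeline would hand to the model when both edges are bounded.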
Vision call
A single language model call replaces a Tesseract pipeline with regex line-item parsers. The model sees the layout and extracts structure in one round trip.
anthropic.messages.create() with max_tokens 1024, model claude-3-5-sonnet-latest, content array containing one image block (base64 JPEG) and one text block (the system prompt). No streaming — we want the full JSON to validate before responding.
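The request described above might be assembled like this. It is a sketch, not the contents of lib/vision.ts: buildScanRequest and the local interfaces are hypothetical names, declared inline so the block stands alone without the SDK. The image block's source object (type, media_type, data) follows the Messages API shape, which the diagram abbreviates as "source: base64".

```typescript
// Local structural types for the parts of the request this sketch uses.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: "image/jpeg"; data: string };
}

interface TextBlock {
  type: "text";
  text: string;
}

interface ScanRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: (ImageBlock | TextBlock)[] }[];
}

// Build the single-call payload: one base64 JPEG block, then the prompt text.
function buildScanRequest(jpegBase64: string, prompt: string): ScanRequest {
  return {
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: "image/jpeg", data: jpegBase64 },
          },
          { type: "text", text: prompt },
        ],
      },
    ],
  };
}
```

The returned object is what you would pass to anthropic.messages.create(); no stream flag is set, so the full response arrives before validation.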
Prompt design
The model must produce parseable JSON, must not invent values, and must use null for fields it cannot read. The prompt is the contract.
The prompt embeds a TypeScript-style type definition, demands "JSON only, no markdown, no backticks", and explicitly instructs "do not invent values" and "return null if uncertain". This combination eliminates the regex-strip-markdown step that earlier prototypes needed.
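A prompt along these lines could look like the sketch below. The field names in RECEIPT_TYPE are illustrative assumptions, since the real type lives in the codebase and is not reproduced here. Only the three quoted instructions come from the description above.

```typescript
// Hypothetical type definition embedded in the prompt; the real fields may differ.
const RECEIPT_TYPE = `type Receipt = {
  vendor: string | null;
  date: string | null;
  currency: string | null;
  total: number | null;
  items: { description: string | null; amount: number | null }[] | null;
}`;

// Assemble the contract: the target shape first, then the strict output rules.
function buildPrompt(typeDef: string = RECEIPT_TYPE): string {
  return [
    "Extract the receipt in the image into JSON matching this type:",
    typeDef,
    "Rules:",
    "- JSON only, no markdown, no backticks.",
    "- Do not invent values.",
    "- Return null if uncertain.",
  ].join("\n");
}

console.log(buildPrompt());
```

Keeping the type definition in one constant makes it easy to check by eye that the prompt and the Zod schema describe the same shape.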
Schema validation
Vision models return text that claims to be JSON. Without runtime validation, malformed output crashes downstream code or — worse — silently propagates wrong types.
lib/schema.ts defines a Zod schema with every field nullable. Receipt.parse(json) either returns a typed object or throws a ZodError with the specific path that failed. Errors bubble up as a 422 with the validation message.
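One way to turn a validation failure into the 422 described above is sketched here. IssueLike mirrors only the path and message fields of a Zod issue so the block runs without the library. The function name and the { ok: false, error } body shape are assumptions, as the source only shows the success response.

```typescript
// Structural subset of a Zod issue: only the two fields this sketch reads.
interface IssueLike {
  path: (string | number)[];
  message: string;
}

interface ErrorResponse {
  status: 422;
  body: { ok: false; error: string };
}

// Flatten each issue to "path: message" so the client sees which field failed.
function validationErrorResponse(issues: IssueLike[]): ErrorResponse {
  const error = issues
    .map((issue) => `${issue.path.join(".") || "(root)"}: ${issue.message}`)
    .join("; ");
  return { status: 422, body: { ok: false, error } };
}

const example = validationErrorResponse([
  { path: ["items", 0, "amount"], message: "Expected number, received string" },
]);
console.log(example.body.error);
```

In the route handler you would call this from the catch block, passing err.issues when the thrown error is a ZodError.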
Persistence stub
Receipt Scanner is a starter. The "save the result" step belongs to your stack, not ours.
lib/persist.ts exports save(receipt: Receipt) that does nothing by default. Replace its body with a Supabase, Prisma, or raw pg insert. The Postgres schema is documented in docs/schema.sql and reproduced in the whitepaper.
UI rendering
The user wants a confidence check before they save. Show the image and the extracted fields side by side; nulls render as em dashes.
app/page.tsx is a single client component. Drag-and-drop upload, optimistic preview using URL.createObjectURL, fetch to /api/scan, render the structured table from the typed Receipt response. No state library, no form library, no toast library.
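The null-to-em-dash rule can be isolated in a small pure function, sketched below. displayValue and toRows are hypothetical helper names, and the flat record stands in for the real Receipt type.

```typescript
const EM_DASH = "\u2014";

type FieldValue = string | number | null;

// Only null becomes an em dash; 0 and "" are real values and render as-is.
function displayValue(value: FieldValue): string {
  return value === null ? EM_DASH : String(value);
}

// Turn a flat record into [label, text] pairs for a two-column table.
function toRows(receipt: Record<string, FieldValue>): [string, string][] {
  return Object.entries(receipt).map(
    ([label, value]): [string, string] => [label, displayValue(value)],
  );
}

console.log(toRows({ vendor: "Pret", total: null }));
```

Checking for null explicitly, rather than falsiness, matters here: a genuine total of 0 should not be hidden behind a dash.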
Why this, not that
Next.js 14 App Router
File-based API routes, native multipart parsing, edge-ready deployment in one framework. The whole backend is one route file.
Express + separate React frontend — two repos to maintain, CORS to configure, no benefit for a single-route service.
TypeScript + Zod
Schema-driven from start to finish. The same shape defines the runtime validator and the compile-time type. Single source of truth.
JSON Schema + ajv — duplicates the type elsewhere. Manual class definitions — diverges from the runtime check.
sharp (libvips)
4× faster than ImageMagick, fraction of the memory, ships natively on Vercel. Best-in-class for server-side image work.
jimp — pure JS, slower, missing some formats. ImageMagick — heavy native dep, GPL licensing concerns, slower.
Anthropic Claude 3.5 Sonnet
Best vision OCR I have benchmarked on real-world UK receipts. Stricter JSON adherence than GPT-4o. Lower hallucination rate on missing fields.
GPT-4o — slightly worse line-item accuracy on supermarket receipts. Gemini 2.5 Pro — better latency, slightly weaker structured-output adherence.
Single API call per scan
Predictable cost, predictable latency, no re-prompt loops. The model either reads the receipt or returns nulls.
Multi-pass agents — 3-5× cost, unbounded latency, marginal accuracy gain on a task this constrained.
Vercel deployment
sharp ships on Vercel's Linux runtime. Push to GitHub, set ANTHROPIC_API_KEY, you are live in 60 seconds.
Lambda — sharp native bindings need a custom layer. ECS — six AWS services for what one git push solves.
What you can measure
Failure modes you should expect
What’s next
Multi-page PDF receipts
Rasterise via pdfjs, scan each page, merge totals. Hotel folios, multi-day rentals.
Email-to-receipt ingestion
Inbound email parser. Forward a Pret receipt to receipts@yourdomain, get it in your database in seconds.
Bulk batch upload
Drag a folder, queue scans, show progress. Background worker processing rather than UI route.
HMRC-compatible export
CSV format that drops into Self Assessment expense schedules. Currency normalisation included.
Confidence scoring
Ask the model for a self-assessed confidence per field. Surface low-confidence fields in the UI for review.
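Once per-field confidence exists, picking the fields to flag is a simple filter. The sketch below assumes a hypothetical ScoredField shape with a 0 to 1 confidence value and an arbitrary 0.8 threshold; neither is defined by the current codebase.

```typescript
// Hypothetical shape: each extracted field carries the model's own confidence.
interface ScoredField {
  value: string | number | null;
  confidence: number; // assumed range: 0 to 1
}

// Flag a field when it is missing or the model's confidence falls below the threshold.
function fieldsNeedingReview(
  fields: Record<string, ScoredField>,
  threshold: number = 0.8,
): string[] {
  return Object.entries(fields)
    .filter(([, field]) => field.value === null || field.confidence < threshold)
    .map(([name]) => name);
}

console.log(
  fieldsNeedingReview({
    vendor: { value: "Pret", confidence: 0.95 },
    total: { value: 4.5, confidence: 0.4 },
    date: { value: null, confidence: 0.9 },
  }),
);
```

Self-assessed confidence is only a hint, so the threshold is best tuned against receipts you have checked by hand.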
Receipt deduplication
Hash the extracted fields, alert when the same receipt is scanned twice. Common in expense fraud workflows.
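A deduplication key could be derived as in this sketch. The three fields chosen, the normalisation rules, and the name receiptFingerprint are all assumptions; it uses Node's built-in crypto module with SHA-256.

```typescript
import { createHash } from "node:crypto";

// Hypothetical subset of fields that identifies a receipt for this sketch.
interface ReceiptKey {
  vendor: string | null;
  date: string | null;
  total: number | null;
}

// Normalise before hashing so trivial differences (case, whitespace,
// 4.5 vs 4.50) do not produce different fingerprints.
function receiptFingerprint(receipt: ReceiptKey): string {
  const canonical = JSON.stringify([
    receipt.vendor?.trim().toLowerCase() ?? null,
    receipt.date ?? null,
    receipt.total === null ? null : receipt.total.toFixed(2),
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

const first = receiptFingerprint({ vendor: " Pret ", date: "2024-03-01", total: 4.5 });
const second = receiptFingerprint({ vendor: "pret", date: "2024-03-01", total: 4.5 });
console.log(first === second);
```

Storing the fingerprint in a uniquely indexed column would let the database reject the second scan. Receipts with many null fields will collide easily, so an alert is probably safer than a hard block.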
Ready to try it?
Clone the repo. Add ANTHROPIC_API_KEY. Drop a receipt photo in. Get JSON back. Five minutes from zero to working.