Open source · MIT · vision OCR · OFX export

Photo in, validated JSON out.

High-resolution vision OCR constrained by JSON Schema, re-validated by Zod, and batched up to fifty files at a time, ready for Supabase, Xero, QuickBooks, or any n8n workflow you wire in. Validated receipts also export as an OFX statement.

50 files per batch · ~2s per scan · 2576px default max edge · SHA-256 R2 content addressing · MIT license

Why this exists

Receipt OCR used to be a half-product: OCR a photo into text, then write hundreds of lines of regexes to pull out vendor, total, currency, line items. Half of those regexes failed on the next receipt format and you spent the rest of the year patching edge cases.

Vision models removed the regex layer entirely. A modern high-resolution vision model can read a printed receipt, a thermal print, or a mobile screenshot and emit a structured JSON object directly. The hard problem moves up a level: how do you make sure that JSON is always in the right shape so your database, your accounting export, and your UI can trust it?

The answer is a strict schema boundary. The model is constrained at the API by a JSON Schema, so it is forced to emit exactly the shape your app expects. The output is then re-validated by Zod on the way in. Malformed output is rejected at a single, named boundary before it reaches the UI, an export, or your database.

Receipt Scanner is the working version of that pattern. Photo in. JSON out. OFX out. Forty files of business code, fixture-backed tests, batch support, Cloudflare R2 originals, and a documented persistence stub waiting for your Supabase insert. Fork it, point it at your storage and accounting backend, and ship.

Why this matters

A schema boundary is the difference between a demo and a system you can trust.

Vision models do the hard part now. The remaining failure mode is shape drift: the model emits something that almost-but-not-quite matches your downstream contract, and a wrong number lands silently in a ledger. Receipt Scanner closes off that path. It constrains the model to emit JSON that matches the schema, then re-validates that JSON with Zod before the result reaches your database, your UI, or an accounting export. Two enforcement points, one shape, no surprises.

Built-in features

Everything below ships in the repository. Clone, set ANTHROPIC_API_KEY, deploy.

High-resolution vision OCR

A vision model is called with the receipt schema enforced as a structured-output constraint. Reads thermal prints, supermarket dot-matrix, mobile screenshots, faded receipts. Swap to any vision provider in lib/vision.ts.

Structured-output boundary

The model is constrained to emit JSON that matches the receipt JSON Schema, not free-form text the app has to repair. Output therefore reaches the validator already in the expected shape.

Zod contract on the way in

Vendor, address, date, time, currency, line items, subtotal, tax, tip, total, payment method. Every value field is nullable, so an unreadable field comes back as null instead of failing the scan. Malformed output is rejected at a single boundary before it reaches the UI, an export, or your database.
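Here is a minimal sketch of what "rejected at a single boundary" can look like. It is written against a structural `safeParse` interface so it runs without Zod installed. A Zod 3 object schema such as `ReceiptSchema` should satisfy the same shape. `parseAtBoundary` and `BoundaryError` are hypothetical names, not the repository's API.

```typescript
// Structural stand-in for a Zod schema: anything with a compatible safeParse().
interface SafeParser<T> {
  safeParse(input: unknown):
    | { success: true; data: T }
    | { success: false; error: { message: string } }
}

// Hypothetical error type that marks a schema-boundary violation.
export class BoundaryError extends Error {
  constructor(message: string) {
    super(`Receipt failed schema validation: ${message}`)
    this.name = 'BoundaryError'
  }
}

// The one place raw model output becomes a typed value. Anything that fails
// validation is thrown here instead of flowing on to the UI, export, or DB.
export function parseAtBoundary<T>(schema: SafeParser<T>, raw: unknown): T {
  const result = schema.safeParse(raw)
  if (!result.success) throw new BoundaryError(result.error.message)
  return result.data
}
```

With Zod in place, the call would look like `parseAtBoundary(ReceiptSchema, await scanReceipt(jpeg))`. Every caller downstream then sees either a typed `Receipt` or an error.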

sharp pre-processing pipeline

EXIF auto-rotate, downscale to MAX_IMAGE_PX (default 2576 to match the model's high-resolution vision tier), re-encode to JPEG. Biggest cost lever in the whole pipeline.
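The downscale step is simple arithmetic: keep the aspect ratio and cap the longer edge. In sharp this corresponds roughly to `.rotate()` (EXIF auto-orient) followed by `.resize({ width: max, height: max, fit: 'inside', withoutEnlargement: true })`. The dependency-free sketch below only computes the target size. `fitWithinMaxEdge` is a hypothetical helper, and sharp's own rounding may differ by a pixel.

```typescript
// Output size for a "fit inside a max edge" downscale: keep the aspect ratio,
// cap the longer side at maxEdge, and never upscale small images.
export function fitWithinMaxEdge(
  width: number,
  height: number,
  maxEdge = 2576, // mirrors the MAX_IMAGE_PX default
): { width: number; height: number } {
  const longest = Math.max(width, height)
  if (longest <= maxEdge) return { width, height } // already small enough
  const scale = maxEdge / longest
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  }
}
```

A 4032 × 3024 phone photo comes out at 2576 × 1932. An 800 × 600 screenshot is left untouched.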

Batch processing up to 50 files

POST several files to /api/scan/batch. Each is scanned independently with bounded concurrency. One unreadable image returns a per-file error without failing the batch.
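One common way to get bounded concurrency with per-file results is a small worker pool over a shared index. The sketch below assumes a shape; it is not the repository's implementation. `scanBatch`, the default limit of 4, and the `value` field name are illustrative. The 50-file cap and the `ok: true` / `ok: false` split follow the description above.

```typescript
// Per-file outcome: a failed scan is recorded, never thrown out of the batch.
export type FileResult<T> =
  | { file: string; ok: true; value: T }
  | { file: string; ok: false; error: string }

const MAX_BATCH_FILES = 50 // the documented per-request cap

// Scan every file with at most `limit` scans in flight; results keep input order.
export async function scanBatch<T>(
  files: { name: string; data: Uint8Array }[],
  scan: (data: Uint8Array) => Promise<T>,
  limit = 4, // assumed default; tune to your provider's rate limits
): Promise<FileResult<T>[]> {
  if (files.length > MAX_BATCH_FILES) {
    throw new RangeError(`at most ${MAX_BATCH_FILES} files per batch`)
  }
  const results = new Array<FileResult<T>>(files.length)
  let next = 0

  // Each worker claims the next unprocessed index until none are left.
  // JavaScript runs this loop on one thread, so an index is never claimed twice.
  async function worker(): Promise<void> {
    while (next < files.length) {
      const i = next++
      const { name, data } = files[i]
      try {
        results[i] = { file: name, ok: true, value: await scan(data) }
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err)
        results[i] = { file: name, ok: false, error: message }
      }
    }
  }

  const poolSize = Math.max(1, Math.min(limit, files.length))
  await Promise.all(Array.from({ length: poolSize }, () => worker()))
  return results
}
```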

Itemised line items

Each line item carries description, quantity, unit price, line total. Captures vendor-specific formatting from Tesco, Sainsbury's, Pret, Costco, and most printed receipts.

OFX 1.0.2 export

POST validated receipts to /api/export/ofx and receive a downloadable .ofx statement. Each receipt is one DEBIT transaction with a stable id so re-exports dedupe in the importer. Imports into Xero, QuickBooks, GnuCash.
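The following is a minimal sketch of the per-receipt mapping. It assumes a flattened input shape (`OfxReceipt` is hypothetical) and shows only the `<STMTTRN>` aggregate; a full 1.0.2 file also needs the SGML header and the signon and statement wrappers. The field mapping (vendor to NAME, total to TRNAMT, date to DTPOSTED, FITID from the receipt id) follows the FAQ. The 32-character NAME cap comes from the OFX specification.

```typescript
// The subset of a validated receipt this sketch needs (hypothetical shape).
interface OfxReceipt {
  id: string
  vendorName: string | null
  total: number | null
  date: string | null // ISO 8601 date, e.g. "2024-03-15"
}

// Escape the characters that OFX 1.0.2 (SGML) treats as markup.
function ofxEscape(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// One receipt becomes one DEBIT <STMTTRN>. FITID depends only on the receipt
// id, so re-exporting the same receipt yields the same FITID and the importer
// can dedupe it. Leaf elements stay unclosed, as SGML OFX allows.
export function toStmtTrn(r: OfxReceipt): string | null {
  if (r.total === null || r.date === null) return null // cannot post without both
  const dtPosted = r.date.slice(0, 10).replace(/-/g, '') // YYYY-MM-DD -> YYYYMMDD
  const name = ofxEscape((r.vendorName ?? 'Unknown vendor').slice(0, 32)) // NAME is A-32
  return [
    '<STMTTRN>',
    '<TRNTYPE>DEBIT',
    `<DTPOSTED>${dtPosted}`,
    `<TRNAMT>${(-Math.abs(r.total)).toFixed(2)}`, // debits are negative amounts
    `<FITID>${ofxEscape(r.id)}`,
    `<NAME>${name}`,
    '</STMTTRN>',
  ].join('\n')
}
```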

Optional Cloudflare R2 originals

Original images stored content-addressed by SHA-256 for audit and deduplication. When R2 is not configured the pipeline runs unchanged and image_key is null.

Content addressing by SHA-256

Every original gets a stable SHA-256 hash regardless of R2 configuration. Re-uploads of the same image are detectable. Audit trails stay deterministic.
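Content addressing needs nothing beyond Node's built-in crypto module. In this minimal sketch, `sha256Hex`, `originalKey`, and the `originals/<hash>.jpg` key layout are hypothetical names, not the repository's.

```typescript
import { createHash } from 'node:crypto'

// Hex SHA-256 of the original bytes: identical uploads always hash the same.
export function sha256Hex(bytes: Uint8Array): string {
  return createHash('sha256').update(bytes).digest('hex')
}

// Hypothetical key layout: the hash is the object key, so storing the same
// image twice writes to the same R2 key instead of creating a duplicate.
export function originalKey(bytes: Uint8Array, ext = 'jpg'): string {
  return `originals/${sha256Hex(bytes)}.${ext}`
}
```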

Persistence stub plus SQL schema

lib/persist.ts is a no-op save() with the contract documented inline. Replace with a single Supabase insert. The Postgres schema mirroring the Zod contract lives in docs/schema.sql.

Swap vision providers in one file

lib/vision.ts is a single function. Replace its body to call a different model. Same JSON Schema, same validator, same UI. Comparison reference is left in the file.

Vercel-native

sharp ships its Linux binaries automatically on the Vercel Node runtime. The scan routes pin the Node runtime. No build configuration needed.
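Here is a minimal route sketch, assuming the Next.js App Router. `export const runtime = 'nodejs'` is Next.js route segment config that keeps a handler off the Edge runtime, where sharp's native binary cannot load. The file path, the `file` form field, and the placeholder response are assumptions, not the repository's code.

```typescript
// app/api/scan/route.ts (sketch: path, field name, and response are assumptions)
// Route segment config: keep this handler on the Node.js runtime, where sharp loads.
export const runtime = 'nodejs'

export async function POST(request: Request): Promise<Response> {
  const form = await request.formData()
  const file = form.get('file') // assumed multipart field name
  if (!(file instanceof Blob)) {
    return Response.json({ error: 'expected a file field' }, { status: 400 })
  }
  const bytes = new Uint8Array(await file.arrayBuffer())
  // Downscale, vision call, and Zod validation would run here.
  return Response.json({ received: bytes.byteLength })
}
```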

Wire to Supabase, Xero, n8n

After validation the JSON is yours. Drop in a Supabase insert, wrap in an accounting API, or POST to an n8n workflow. The hard bit is solved at the schema boundary.
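The n8n route is just an HTTP POST. This minimal sketch assumes an n8n Webhook node that accepts JSON and a hypothetical N8N_WEBHOOK_URL environment variable. `forwardToWebhook` is an illustrative name, not part of the repository. The fetch implementation is injectable so the function can be exercised offline.

```typescript
// Hypothetical save() body: forward a validated receipt to an n8n Webhook node.
// N8N_WEBHOOK_URL is an assumed env var; fetchImpl is injectable for tests.
export async function forwardToWebhook(
  receipt: unknown,
  url: string | undefined = process.env.N8N_WEBHOOK_URL,
  fetchImpl: typeof fetch = fetch,
): Promise<boolean> {
  if (!url) return false // not configured: behave like the no-op stub
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(receipt),
  })
  if (!res.ok) throw new Error(`webhook responded with HTTP ${res.status}`)
  return true
}
```

A Supabase insert or an accounting API call would take the place of the fetch. The rest of the pipeline only relies on save() resolving or throwing.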

CI gate on every push

Type check, lint, test, and build run on every push. Test fixtures and an end-to-end test ship in test/. Together they catch regressions in the vision call, validator, OFX export, and batch fan-out.

Architecture at a glance

One linear pipeline. Upload, store the original, downscale, call the vision model with the schema as a constraint, validate by Zod, return a complete StoredReceipt to the UI.
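The linear order above can be sketched as one function with each stage injected. This composition is hypothetical and for illustration only: the stage names, the `originals/<sha>` key layout, and the use of `randomUUID` for ids are assumptions, not the repository's code. Injecting the stages keeps the order explicit and lets tests swap in stubs.

```typescript
import { randomUUID } from 'node:crypto'

// Hypothetical stage signatures; each stands in for one module in lib/.
export interface PipelineDeps<R> {
  hash: (original: Uint8Array) => string // SHA-256 content address
  storeOriginal: ((key: string, original: Uint8Array) => Promise<void>) | null // null when R2 is off
  downscale: (original: Uint8Array) => Promise<Uint8Array> // sharp pre-processing
  callVision: (jpeg: Uint8Array) => Promise<unknown> // structured-output model call
  validate: (raw: unknown) => R // Zod boundary; throws on a bad shape
}

// Validated receipt plus the storage metadata a StoredReceipt carries.
export type Stored<R> = R & {
  id: string
  image_key: string | null
  image_sha256: string
  created_at: string
}

// Upload -> hash and store original -> downscale -> vision -> validate -> result.
export async function runPipeline<R extends object>(
  original: Uint8Array,
  deps: PipelineDeps<R>,
): Promise<Stored<R>> {
  const sha = deps.hash(original) // computed whether or not R2 is configured
  let imageKey: string | null = null
  if (deps.storeOriginal) {
    imageKey = `originals/${sha}` // hypothetical key layout
    await deps.storeOriginal(imageKey, original)
  }
  const jpeg = await deps.downscale(original)
  const receipt = deps.validate(await deps.callVision(jpeg))
  return {
    ...receipt,
    id: randomUUID(),
    image_key: imageKey,
    image_sha256: sha,
    created_at: new Date().toISOString(),
  }
}
```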

Scan pipeline

From a phone photo to a validated StoredReceipt. R2 storage is optional, and persistence is a documented stub.

Receipt Scanner: linear pipeline with two optional sinks (R2 originals, persistence stub) and one export route.

Structured-output boundary

Two enforcement points: the vision API constrains the model to emit JSON matching the JSON Schema, and Zod re-validates the response before the rest of the app sees it.

Structured output and Zod together: the model is constrained on the way out and re-validated on the way in.

Batch processing

Each file is scanned independently with bounded concurrency. The response is an array of per-file results so one unreadable image never fails the batch.

Batch upload: fan-out with bounded concurrency, per-file results, partial failures preserved.

Quick start

Clone, set ANTHROPIC_API_KEY, and drop in a receipt. R2 is optional: leave the keys empty and the pipeline still works end to end.

# 1. Clone
git clone https://github.com/sarmakska/receipt-scanner.git
cd receipt-scanner

# 2. Install (pnpm lockfile is committed)
pnpm install

# 3. Configure
cp .env.example .env.local
# then edit .env.local and set:
#   ANTHROPIC_API_KEY=sk-ant-...
# optional, enables R2 originals + deduplication:
#   R2_ACCOUNT_ID=...
#   R2_ACCESS_KEY_ID=...
#   R2_SECRET_ACCESS_KEY=...
#   R2_BUCKET=...

# 4. Run
pnpm dev

# 5. Visit http://localhost:3000
#    - drop in one receipt or fifty
#    - see parsed tables
#    - click Export OFX to download a statement

Full walkthrough including env var reference and first scan: Quick-Start wiki page.

The vision call

The whole model call lives in one file. The JSON Schema passed to the API is the structured-output contract. Prompt caching keeps the system-prompt cost low across many scans, provided the cached prefix is long enough to qualify for caching.

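// lib/vision.ts (the one place the vision call lives)
import Anthropic from '@anthropic-ai/sdk'
import { RECEIPT_JSON_SCHEMA } from './schema'
// SYSTEM_PROMPT was used but not defined in the original snippet; importing it
// from './prompt' is an assumption about where it lives.
import { SYSTEM_PROMPT } from './prompt'

const client = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

export async function scanReceipt(imageJpeg: Buffer): Promise<unknown> {
  // Anthropic's structured outputs use output_format (OpenAI's response_format
  // does not apply). The beta flag and parameter name follow the launch docs
  // and may differ in your SDK version; check them before shipping.
  const response = await client.beta.messages.create({
    model: process.env.VISION_MODEL ?? 'claude-opus-4-7',
    max_tokens: 4096,
    betas: ['structured-outputs-2025-11-13'],
    system: [
      {
        type: 'text',
        text: SYSTEM_PROMPT,
        cache_control: { type: 'ephemeral' }, // prompt caching
      },
    ],
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'image',
            source: {
              type: 'base64',
              media_type: 'image/jpeg',
              data: imageJpeg.toString('base64'),
            },
          },
        ],
      },
    ],
    // Structured-output constraint: the model must emit JSON matching the schema
    output_format: { type: 'json_schema', schema: RECEIPT_JSON_SCHEMA },
  })

  // The JSON arrives in the first text block; Zod validates it downstream.
  const first = response.content[0]
  if (first?.type !== 'text') return null
  return JSON.parse(first.text)
}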

The contract

Every leaf field is nullable because real receipts are messy: anything the model cannot read comes back as null. The Zod schema is the single source of truth on the way in, and the Postgres schema in docs/schema.sql mirrors it exactly.

// lib/schema.ts (the Zod contract every scan must satisfy)
import { z } from 'zod'

export const LineItemSchema = z.object({
  description: z.string().nullable(),
  quantity: z.number().nullable(),
  unit_price: z.number().nullable(),
  total: z.number().nullable(),
})

export const ReceiptSchema = z.object({
  vendor: z.object({
    name: z.string().nullable(),
    address: z.string().nullable(),
  }),
  transaction: z.object({
    date: z.string().nullable(),    // ISO 8601
    time: z.string().nullable(),    // HH:MM:SS
  }),
  currency: z.string().length(3).nullable(), // ISO 4217
  line_items: z.array(LineItemSchema),
  subtotal: z.number().nullable(),
  tax: z.number().nullable(),
  tip: z.number().nullable(),
  total: z.number().nullable(),
  payment_method: z.string().nullable(),
})

export type Receipt = z.infer<typeof ReceiptSchema>

export type StoredReceipt = Receipt & {
  id: string
  image_key: string | null     // R2 key, null when R2 is not configured
  image_sha256: string         // always computed, used as content address
  created_at: string
}

Use cases

What people actually build with this.

Internal expense workflow

Staff snap receipts on their phone, the scan returns structured fields, the original lands in R2 for audit, and a Supabase insert files the claim. Replace Expensify or Dext for a small team.

Month-end accounting import

Drop a folder into the batch upload, click Export OFX, import the statement into Xero, QuickBooks, or GnuCash. The OFX transaction ids are stable so re-imports dedupe cleanly.

AI bookkeeping prototype

A working OCR baseline you can fork. Add categorisation, VAT handling, HMRC submission on top. Ship in days, not months.

Personal expense tracker

Self-host on Vercel. Receipt to Notion, Google Sheets, or Supabase via the persistence stub. Replace SaaS subscriptions you do not need.

n8n / Zapier automation

POST every validated receipt to an n8n workflow. Branch on vendor, route for approval, push to a spreadsheet, post a Slack message. The hard part, schema-validated JSON, is already done.

Provider benchmarking

Compare Opus 4.7 against a different vision model on your own receipts. Replace the body of lib/vision.ts and keep the same schema; the UI and validation stay identical.

Tech stack

Next.js 14 · TypeScript · Vision LLM (swappable) · Structured outputs · sharp · Zod · Cloudflare R2 · AWS SDK v3 · OFX 1.0.2 · Tailwind CSS · Vitest · Vercel

Receipt Scanner vs alternatives

Expensify, Dext, and Mindee are commercial products. Receipt Scanner is an open-source starter you self-host. Rows reflect the public capabilities of each.

| Capability | Receipt Scanner | Expensify | Dext | Mindee |
| --- | --- | --- | --- | --- |
| Vision OCR | Opus 4.7 (swappable) | Hosted model | Hosted model | Hosted model |
| Structured-output enforcement | Yes, JSON Schema constraint | Internal | Internal | Internal |
| Schema validation boundary | Zod, single boundary | N/A | N/A | Provided fields |
| Batch processing | Up to 50 per request | Yes | Yes | Yes |
| Original image storage | Cloudflare R2 (optional) | Hosted | Hosted | Hosted |
| OFX export | Built in (Xero, QuickBooks, GnuCash) | CSV | CSV | JSON |
| Self-host | Vercel, Docker, anywhere Node runs | No | No | On-prem (enterprise) |
| Per-scan cost | Vision API token cost only | Per-user subscription | Per-user subscription | Per call |
| Source available | Yes, MIT | No | No | No |
| License | MIT | Commercial | Commercial | Commercial |

An honest limitations list

Trade-offs you should know about before adopting this starter.

Hand-written receipts are hit and miss

Cursive and low-light handwriting are unreliable. The model returns what it can read and leaves the other fields null, which is why every field in the schema is nullable.

Per-scan token cost is small but real

Each scan is one vision API call. Downscaling keeps it cheap but not free. Tune MAX_IMAGE_PX down if you want to trade detail for cost.
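For a back-of-envelope figure, Anthropic's vision documentation gives the approximation tokens ≈ (width × height) / 750 for images within a model's size limit. Larger images are resized by the API first, which caps the count. Treat the formula as an assumption for whichever model and provider you run. Assuming the model accepts the full 2576px edge, a 2576 × 1932 scan works out to roughly 6,600 image tokens, and halving MAX_IMAGE_PX cuts that roughly fourfold. The helpers are hypothetical, and the price per million tokens is yours to fill in.

```typescript
// Approximate image tokens using the (width x height) / 750 rule of thumb.
// Assumption: the image is within the model's size limit, so no server-side resize.
export function estimateImageTokens(width: number, height: number): number {
  return Math.ceil((width * height) / 750)
}

// Cost of the image part of one scan, given a price per million input tokens.
export function estimateImageCost(width: number, height: number, pricePerMTok: number): number {
  return (estimateImageTokens(width, height) / 1_000_000) * pricePerMTok
}
```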

No managed SLA

This is an open-source starter, not a hosted product. Buy Expensify or Dext if you want a managed service with support. Fork this if you want to own the pipeline.

Two schema definitions to keep in sync

The hand-written JSON Schema (RECEIPT_JSON_SCHEMA) handed to the model and the Zod schema must stay aligned. The SDK helper that derives one from the other needs Zod 4 and the project pins Zod 3.

Default path is a hosted vision API

The default vision provider runs over the public internet. On-premise inference is possible by swapping the body of lib/vision.ts, but the wiring is up to you.

In-process pipeline, no queue

Each request scans inline. Fine for typical small-business volume. Put a queue in front of it for sustained heavy load.

Frequently asked

Which vision model ships as the default?

A high-resolution frontier vision model with structured-output support: claude-opus-4-7 by default, overridable with the VISION_MODEL environment variable. High-resolution support handles printed receipts, faded thermal prints, and mobile screenshots better than the lighter models in head-to-head testing on common UK receipts. Structured-output support means the model is constrained to emit JSON matching the schema rather than free-form text the app has to repair. Both features matter for this task. Swap in any other vision provider by replacing the body of lib/vision.ts.

How much does each scan cost?

It is one vision API call. Downscaling to MAX_IMAGE_PX (2576px default) and JPEG re-encode keeps image tokens low. Pricing depends on the provider and your model choice; for typical receipts the cost lands in low single-digit pence territory. Set MAX_IMAGE_PX lower for cheaper scans at the cost of detail.

Do I need Cloudflare R2?

No. The R2 storage step is a no-op when the R2_* env vars are unset and the rest of the pipeline runs unchanged. R2 stores the original image content-addressed by SHA-256, which is useful for audit and deduplication but not required to validate a receipt.
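This minimal sketch implements the "no-op when unset" rule, assuming R2 is enabled only when all four R2_* variables from the Quick start are present. `r2ConfigFromEnv` is a hypothetical name. The endpoint uses R2's S3-compatible URL format, which the AWS SDK v3 S3 client can target.

```typescript
// Settings an S3-compatible client would need to reach R2.
export interface R2Config {
  accountId: string
  accessKeyId: string
  secretAccessKey: string
  bucket: string
  endpoint: string
}

// Returns null unless every R2_* variable is set; callers skip storage on null
// and record image_key as null, while the SHA-256 is still computed.
export function r2ConfigFromEnv(env: Record<string, string | undefined>): R2Config | null {
  const accountId = env.R2_ACCOUNT_ID
  const accessKeyId = env.R2_ACCESS_KEY_ID
  const secretAccessKey = env.R2_SECRET_ACCESS_KEY
  const bucket = env.R2_BUCKET
  if (!accountId || !accessKeyId || !secretAccessKey || !bucket) return null
  return {
    accountId,
    accessKeyId,
    secretAccessKey,
    bucket,
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`, // R2's S3 API endpoint
  }
}
```

In the app this would be called once as `r2ConfigFromEnv(process.env)`, so a missing key disables R2 rather than failing a scan.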

How is malformed output handled?

The model is constrained at the API by a JSON Schema, so it is forced to emit the right shape. The response is then re-validated by the Zod schema in lib/schema.ts on the way in. Anything that fails Zod is a boundary violation, surfaced as an error rather than passed through. The hand-written JSON Schema and the Zod schema must be kept in sync because the SDK's helper that derives one from the other requires Zod 4 and the project pins Zod 3.

Can it read hand-written receipts?

Sometimes. Hand-written cafe slips work better than you might expect, but cursive or low-light handwriting is unreliable. The model returns what it can read and leaves the other fields null, which is why every field in the schema is nullable.

What does the OFX export look like?

OFX 1.0.2 bank statement format. Each receipt becomes one DEBIT transaction with a stable FITID derived from the receipt id, so re-exports dedupe in the importer rather than creating duplicates. Vendor name maps to NAME, total maps to TRNAMT, date maps to DTPOSTED. Multi-currency is preserved.

How does batch upload behave on partial failure?

The batch endpoint runs each scan independently with bounded concurrency. The response is an array of per-file results, each carrying ok: true with the StoredReceipt or ok: false with an error. One unreadable image never fails the batch.

How do I wire it into my real backend?

lib/persist.ts is a stub save(). Replace its body with one Supabase insert against the tables in docs/schema.sql, or POST the JSON to an n8n workflow, or call an accounting API directly. The schema is the contract; the persistence target is your choice.

Open source · MIT

Use it. Fork it. Ship it.

MIT licensed. Pull requests welcome, especially around vendor-specific format quirks, additional accounting exports, and provider adapters for other vision models.

Ready to ship receipt OCR?

Clone the repo, set ANTHROPIC_API_KEY, drop in a receipt. Validated JSON in two seconds, OFX export ready to import into Xero or QuickBooks.
