
Building a Claude-powered receipt scanner

Receipt extraction is the boring, useful, perfectly-shaped first AI project. It has a clear input, a clear output, an enormous amount of variation, and an obvious win. Here is the production version I have shipped twice.

Why this is the right "first AI tool"

Every business with more than a few employees has receipts piling up somewhere — Dropbox, an inbox, a shoebox. Someone, usually the wrong someone, types them into a spreadsheet. The cost of this work is invisible until you measure it; once you do, it is always more than you thought.

A Claude-powered scanner that takes a photo or PDF, returns a structured row, and writes it into Supabase removes that work. It is a perfect "first LLM project" because: the input is bounded, the output is structured, the failure modes are visible, and the value is denominated in hours saved per month.

Pick a problem where the LLM is the smallest interesting part of the system. Build the dull plumbing well.

Shape of the system

Three components. Keep them simple.

  • An ingest endpoint. Accepts a file (image or PDF), stores it in Supabase Storage, queues a job.
  • A worker. Pulls the job, sends the image to Claude with a tool definition, parses the structured response, validates it, writes a row into the receipts table.
  • A review UI. Lists receipts with a confidence score; humans correct the low-confidence ones.

That review UI is the part most teams skip. Do not skip it. The system is a partnership with humans, not a replacement for them.
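The contract between the ingest endpoint and the worker is just a job record and a retry rule. A minimal sketch of that contract — the field names, status values, and three-attempt cap here are assumptions, not a prescribed schema:

```typescript
// Hypothetical job record the ingest endpoint enqueues and the worker consumes.
type ReceiptJob = {
  receiptId: string
  storagePath: string
  status: 'queued' | 'processing' | 'done' | 'failed'
  attempts: number
}

// Decide where a job goes after a worker attempt. Kept pure so it is trivial to test:
// success finishes the job, failure re-queues until the attempt cap, then dead-letters.
function nextStatus(
  job: ReceiptJob,
  succeeded: boolean,
  maxAttempts = 3,
): ReceiptJob['status'] {
  if (succeeded) return 'done'
  return job.attempts + 1 >= maxAttempts ? 'failed' : 'queued'
}
```

Whatever queue you use (a Postgres table with `for update skip locked` is plenty at this scale), keep the transition logic this boring.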

A schema receipts actually fit

Receipts are deceptively non-uniform. Petrol stations, restaurants, taxis and Amazon warehouses all produce things that legally count as receipts and look nothing alike. Design your schema to be flexible at the edges.

sql
create table public.receipts (
  id uuid primary key default gen_random_uuid(),
  user_id uuid not null references auth.users on delete cascade,
  storage_path text not null,
  vendor_name text,
  vendor_address text,
  purchased_at timestamptz,
  currency char(3),
  subtotal_cents bigint,
  tax_cents bigint,
  total_cents bigint not null,
  payment_method text,
  category text,
  raw_text text,
  line_items jsonb not null default '[]'::jsonb,
  confidence numeric(3, 2),
  needs_review boolean not null default true,
  created_at timestamptz not null default now()
);

alter table public.receipts enable row level security;

create policy "users see own receipts"
  on public.receipts for all
  using (auth.uid() = user_id);

The prompt and tool definition

Use Claude's tool-use API and define the receipt as a tool the model "calls". This is dramatically more reliable than asking for JSON in the response and parsing it. The tool definition is the contract; the model is forced to fill it.

typescript
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

const RECEIPT_TOOL = {
  name: 'record_receipt',
  description: 'Record the structured contents of a receipt image.',
  input_schema: {
    type: 'object' as const,
    properties: {
      vendor_name: { type: 'string' },
      vendor_address: { type: 'string' },
      purchased_at: {
        type: 'string',
        description: 'ISO-8601 timestamp. If only a date is visible, use 12:00 local time.',
      },
      currency: { type: 'string', description: 'ISO-4217 code, e.g. GBP, USD.' },
      subtotal_cents: { type: 'integer' },
      tax_cents: { type: 'integer' },
      total_cents: { type: 'integer' },
      payment_method: { type: 'string' },
      category: {
        type: 'string',
        enum: ['food', 'travel', 'fuel', 'office', 'software', 'other'],
      },
      line_items: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            description: { type: 'string' },
            quantity: { type: 'number' },
            unit_price_cents: { type: 'integer' },
            total_cents: { type: 'integer' },
          },
          required: ['description', 'total_cents'],
        },
      },
      confidence: {
        type: 'number',
        description: '0.0 to 1.0. Be conservative. If anything is unclear or hand-written, use 0.6 or below.',
      },
    },
    required: ['total_cents', 'currency', 'confidence'],
  },
}

The system prompt should be short and opinionated. Long prompts do not make Claude better at receipts; they make it slower and more expensive.

typescript
const SYSTEM_PROMPT = `You extract structured data from receipt images.

Rules:
- Always express money in the smallest unit (cents/pence). 12.50 GBP -> 1250.
- If you cannot read a field clearly, omit it. Do not guess.
- Drop confidence below 0.5 if the image is partially obscured, hand-written, or in a script you cannot fully read.
- Always call the record_receipt tool exactly once. Do not return free text.`

Ingest, validate, store

The worker is the meat. It downloads the image from Supabase Storage, sends it to Claude with the tool definition, validates the response, and inserts a row.

typescript
import { createClient } from '@/lib/supabase/server'

export async function processReceipt(receiptId: string, storagePath: string) {
  const supabase = await createClient()

  const { data: file } = await supabase.storage
    .from('receipts')
    .download(storagePath)
  if (!file) throw new Error('file not found: ' + storagePath)

  const buffer = Buffer.from(await file.arrayBuffer())

  const response = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    tools: [RECEIPT_TOOL],
    tool_choice: { type: 'tool', name: 'record_receipt' },
    messages: [{
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: file.type as 'image/jpeg',
            data: buffer.toString('base64'),
          },
        },
        { type: 'text', text: 'Extract the receipt.' },
      ],
    }],
  })

  const toolUse = response.content.find((c) => c.type === 'tool_use')
  if (!toolUse || toolUse.type !== 'tool_use') {
    throw new Error('no tool call returned')
  }

  const parsed = ReceiptSchema.parse(toolUse.input) // zod

  await supabase.from('receipts').update({
    ...parsed,
    needs_review: parsed.confidence < 0.85,
  }).eq('id', receiptId)
}

Validate with Zod. The tool schema and the Zod schema should look almost identical, and the Zod one wins if they disagree. Validating after the model means a single hallucinated field becomes a job-failure you can retry, not bad data in your database.
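If you want to see what that post-model validation is actually enforcing, here is a dependency-free sketch of the core checks — in production these would live in the Zod schema; the function shape and error strings are illustrative:

```typescript
// Plain-TypeScript sketch of the checks the Zod ReceiptSchema performs on the
// model's tool input. Returns a list of problems; empty means the row is safe to insert.
type ReceiptInput = Record<string, unknown>

function validateReceipt(input: ReceiptInput): string[] {
  const errors: string[] = []
  // Money must arrive as integer cents, never a float.
  if (!Number.isInteger(input.total_cents)) errors.push('total_cents must be an integer')
  // Currency must be a plausible ISO-4217 code.
  if (typeof input.currency !== 'string' || !/^[A-Z]{3}$/.test(input.currency))
    errors.push('currency must be an ISO-4217 code')
  // Confidence must be a number in [0, 1].
  const c = input.confidence
  if (typeof c !== 'number' || c < 0 || c > 1) errors.push('confidence must be in [0, 1]')
  return errors
}
```

Any non-empty result fails the job, which your queue then retries or routes to review.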

The receipts that ruin your day

Reality, in roughly the order it will hit you:

  • Multi-currency receipts. Dual-priced petrol-station receipts in border towns. Always trust the visible currency symbol over a guess.
  • Long thermal-paper receipts. Photographed in three vertical sections. Send all three images in one message and Claude will stitch them.
  • Hand-written receipts. Driver tips, market stalls. Confidence drops, route to human review.
  • PDFs of digital receipts. Convert to images first; Claude's PDF support is fine but image input is faster and cheaper for single-page receipts.
  • Photos of screens. Reflections, moiré, screen flicker. Claude handles these surprisingly well; worry less about this than you think.

Evals, not vibes

A handful of curated receipts run as a CI gate is worth more than a hundred ad-hoc spot checks. Build a fixtures directory. Each fixture is an image plus the expected JSON. Run the pipeline against all of them on every prompt or model change and assert that nothing regressed.

typescript
// evals/receipts.test.ts
import { describe, expect, it } from 'vitest'
import fixtures from './fixtures.json'
import { extractReceipt } from '@/lib/receipts/extract'

describe('receipt extraction', () => {
  for (const f of fixtures) {
    it(f.name, async () => {
      const result = await extractReceipt(f.imagePath)
      expect(result.total_cents).toBe(f.expected.total_cents)
      expect(result.currency).toBe(f.expected.currency)
      expect(result.confidence).toBeGreaterThan(0.7)
    })
  }
})

Twenty fixtures is a useful starting set. Add one every time the system gets a receipt wrong in production. After three months you have a regression suite that reflects your actual receipt distribution.

Cost and latency

At time of writing, Sonnet handles a single-image receipt for a small fraction of a penny. That is well below the cost of a human typing the same data. Where you can lose money is on retries and on people uploading entire bound stacks of paper as one PDF.

  • Cap pages. Reject any PDF over ten pages at the API boundary.
  • Resize before sending. Anything over 1568px on the long edge is wasted; resize on the worker.
  • Cache nothing. Receipts are one-shot; there is no benefit to prompt caching for this workload.
  • Use Haiku for re-extractions. If a human edits a field, do not re-run Sonnet over the whole image.
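The resize rule from the list above reduces to one pure calculation. A sketch — 1568px matches Anthropic's published long-edge guidance at time of writing, but treat the cap as a tuning parameter:

```typescript
// Compute target dimensions so the long edge is at most maxEdge,
// preserving aspect ratio. Run this on the worker before base64-encoding.
function fitToLongEdge(
  width: number,
  height: number,
  maxEdge = 1568,
): { width: number; height: number } {
  const longEdge = Math.max(width, height)
  if (longEdge <= maxEdge) return { width, height } // already small enough
  const scale = maxEdge / longEdge
  return { width: Math.round(width * scale), height: Math.round(height * scale) }
}
```

Feed the result to whatever image library you already have (sharp is the usual choice in Node).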

Pitfalls

Trusting confidence as a single number

A model can be wrong with high confidence. Use confidence as one of several signals — alongside total-vs-line-items reconciliation and known-vendor checks.
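The reconciliation signal is cheap arithmetic, no model call required. A sketch — the one-cent tolerance for rounding drift is an assumption, and how tax interacts with line items varies by country:

```typescript
// Flag receipts whose line items do not sum to the stated total.
// Used as one review-routing signal alongside model confidence.
type LineItem = { description: string; total_cents: number }

function lineItemsReconcile(
  lineItems: LineItem[],
  totalCents: number,
  taxCents = 0,
  toleranceCents = 1, // allow a cent of rounding drift
): boolean {
  if (lineItems.length === 0) return true // nothing to reconcile against
  const sum = lineItems.reduce((acc, li) => acc + li.total_cents, 0)
  return Math.abs(sum + taxCents - totalCents) <= toleranceCents
}
```

A receipt that fails this check goes to review regardless of what confidence the model reported.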

Storing money as floats

Use integers in the smallest currency unit. The day a receipt totalling 19.99 stores as 19.989999... is the day you lose a customer.
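Parsing a printed amount into cents should never pass through a float either. A string-based sketch — it assumes a two-decimal currency with a `.` separator; zero-decimal currencies like JPY and `,`-separator locales need their own handling:

```typescript
// Parse a printed amount like "12.50" into integer cents without float arithmetic.
function toCents(amount: string): number {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(amount.trim())
  if (!match) throw new Error(`unparseable amount: ${amount}`)
  const whole = parseInt(match[1], 10)
  const fraction = (match[2] ?? '').padEnd(2, '0') // "5" -> "50", "" -> "00"
  return whole * 100 + parseInt(fraction, 10)
}
```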

No PII handling

Receipts contain card numbers, names, addresses. Encrypt the storage bucket, restrict access by RLS, and decide up front whether the LLM provider may train on this data (it should not).

Forgetting the human queue

Auto-accepting low-confidence extractions creates a dataset of subtly-wrong rows that nobody notices until tax time. The needs_review queue is the most important UI in the product.

Wrap-up

The receipt scanner is a small, complete, useful AI product. The model is the easy bit. The schema, the validation, the storage, the review queue and the eval suite are the parts that make it production software instead of a demo.

Build it once, properly. Then notice that the same shape — file in, structured row out, human-reviewed when uncertain — describes about a third of the AI projects you will ever ship.

Want this done for you?

If you would rather skip the yak shave and have someone who has done this fifty times set it up properly, that is what I do for a living.

Start a project