Open Source · MIT License · Production-shaped starter

RAG-over-PDF

A minimal, production-shaped RAG starter. Upload a PDF, ask questions, get cited streaming answers. The cleanest end-to-end RAG you can clone, run, and ship in 10 minutes. No vector DB to provision, no Pinecone account, no LangChain layer in the way. Just the moving parts.

~£0.001 per question · <3s to a streaming answer · 1536 embedding dims · 200ms cosine on 1k chunks · MIT license

Why this exists

Every product team eventually needs to "chat with our docs". The default response is to reach for LangChain, spin up Pinecone, write 400 lines of glue code, and ship something nobody understands six months later.

Most of that complexity is not load-bearing. The actual moving parts of a working RAG system are: chunk the document, embed the chunks, embed the question, take the most similar chunks, stuff them in the prompt. That is roughly 600 lines of TypeScript.

RAG-over-PDF is that 600 lines, written cleanly, with no framework hiding the moving parts. Clone it. Read it. Ship it. Swap the in-memory store for pgvector when you outgrow it. Add re-ranking when you measure that you need it. Do not pay framework tax up front.

Built-in features

Everything below works out of the box. Clone, add an OpenAI key, deploy.

PDF parsing in pure JS

pdf-parse handles 95% of real-world PDFs with no native bindings, no Docker, no OCR setup. Drop a PDF, extract text in milliseconds.
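
In sketch form, the extraction step is one call to pdf-parse on the uploaded buffer. The helper name below is illustrative, not the repo's actual export:

import pdf from "pdf-parse";

// Turn an uploaded PDF buffer into plain text: pure JS, no native bindings.
async function extractText(buffer: Buffer): Promise<string> {
  const parsed = await pdf(buffer); // resolves to { text, numpages, info, ... }
  return parsed.text;
}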

Fixed-size chunking with overlap

1,000-char chunks with 200-char overlap. Sentences that span boundaries are still findable in either chunk. Tunable via env vars.
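
Under the hood that is just a sliding window. A minimal sketch, assuming env vars named CHUNK_SIZE and CHUNK_OVERLAP (the repo's exact names may differ):

const CHUNK_SIZE = Number(process.env.CHUNK_SIZE ?? 1000);
const CHUNK_OVERLAP = Number(process.env.CHUNK_OVERLAP ?? 200);

// Step by size minus overlap so text that straddles a boundary
// lands in both neighbouring chunks.
function chunk(text: string, size = CHUNK_SIZE, overlap = CHUNK_OVERLAP): string[] {
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}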

OpenAI text-embedding-3-small

1536-dimensional vectors at £0.000016 per 1k tokens. Indexing a 500-page PDF costs about 2p. Override the model with one env var.
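
The indexing call is a single batched request to the embeddings endpoint. Roughly this, assuming the override env var is called EMBEDDING_MODEL (an illustrative name):

import OpenAI from "openai";

const openai = new OpenAI(); // picks up OPENAI_API_KEY

// One batched request; each chunk comes back as a 1536-dim vector.
async function embed(texts: string[]): Promise<number[][]> {
  const res = await openai.embeddings.create({
    model: process.env.EMBEDDING_MODEL ?? "text-embedding-3-small",
    input: texts,
  });
  return res.data.map((d) => d.embedding);
}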

In-memory cosine retrieval

Zero infrastructure. Top-5 chunks in 2-12ms for sub-1k chunk corpora. The whole vector store is a 30-line module you can swap in an afternoon.
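
The core of that module is a brute-force cosine scan. A sketch of the idea, with illustrative identifiers:

type Entry = { text: string; vector: number[] };

const entries: Entry[] = [];

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored chunk against the query vector, sort, keep the top k.
function search(queryVector: number[], k = 5): Entry[] {
  return entries
    .map((e) => ({ ...e, score: cosine(queryVector, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}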

Streaming answers via SSE

gpt-4o-mini generates token by token through the App Router stream API. Time-to-first-token: 600-900ms. Users perceive responsiveness, not latency.
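
A sketch of the streaming leg of the chat route, with the retrieval step elided: the repo's handler may differ in detail, but the shape is a Web ReadableStream fed by the OpenAI SDK's async iterator.

import OpenAI from "openai";

const openai = new OpenAI();

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // Forward each delta as an SSE event the moment it arrives.
      for await (const part of completion) {
        const token = part.choices[0]?.delta?.content ?? "";
        if (token) controller.enqueue(encoder.encode(`data: ${JSON.stringify(token)}\n\n`));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}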

pgvector when you outgrow memory

The vector store interface has three methods: add, search, clear. Replace the body with Postgres calls — the retrieval pipeline does not care.
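
In other words, the only contract to honour is an interface along these lines (signatures sketched, not copied from the repo):

interface VectorStore {
  add(entries: { text: string; vector: number[] }[]): Promise<void>;
  search(queryVector: number[], k: number): Promise<{ text: string; score: number }[]>;
  clear(): Promise<void>;
}

// A pgvector-backed implementation keeps the same three methods and swaps the
// bodies for SQL, e.g. ORDER BY embedding <=> $1 LIMIT $2 for cosine search.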

Grounded answers, no hallucination prompts

System prompt pins the model to the retrieved chunks only. If the answer is not there, the model says so plainly. Pin and test, do not trust.
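
A representative version of that prompt assembly, not the repo's verbatim wording (the buildPrompt signature in the architecture sketch below may also differ):

// Pin the model to the retrieved chunks; anything outside them is out of bounds.
function buildPrompt(chunks: string[], question: string) {
  const system =
    "Answer ONLY from the context below. " +
    "If the context does not contain the answer, say you do not know. " +
    "Do not use outside knowledge.";
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    { role: "system" as const, content: `${system}\n\nContext:\n${context}` },
    { role: "user" as const, content: question },
  ];
}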

TypeScript end to end

Strict mode. Every chunk, embedding, and message is typed. Schema-first means provider API changes break the build, not your users.
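
Schema-first in practice means a handful of shared shapes that the whole pipeline passes around. Roughly, with illustrative names:

// Core types; the repo's names may differ.
type Chunk = { id: number; text: string };
type EmbeddedChunk = Chunk & { vector: number[] };
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };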

Wiki with the full theory

How RAG works, architecture diagrams, cost and performance, swap-to-pgvector walkthrough. Read the whole thing in 25 minutes.

One-click Vercel deploy

Everything runs as pure JS (no native bindings to bundle), App Router streaming works on the free tier, and the only secret is your OpenAI key. From clone to live in 60 seconds.

Tech stack

Next.js 14 · TypeScript · OpenAI · pdf-parse · cosine similarity · Tailwind CSS · Vercel · pgvector (optional)

Architecture sketch

Two API routes. One in-memory store. One UI page. That is the whole thing.

┌─────────────────────────────────────────────────────────────┐
│  Indexing (POST /api/upload)                                │
│    Browser ──FormData(file)──▶ Route handler                │
│       │                                                     │
│       ▼ pdf-parse(buffer)            // pure JS, no deps    │
│       ▼ chunk(text, 1000, 200)       // overlap window      │
│       ▼ openai.embeddings(chunks)    // 1536 dims           │
│       ▼ vectorStore.add(vectors)     // in-memory cosine    │
│       ▼ 200 OK { chunks: 47 }                               │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│  Question (POST /api/chat)                                  │
│    Browser ──{ question }──▶ Route handler                  │
│       │                                                     │
│       ▼ openai.embeddings([question])                       │
│       ▼ vectorStore.search(qVec, k=5)                       │
│       ▼ buildPrompt(system, chunks, question)               │
│       ▼ openai.chat.stream(prompt)   // gpt-4o-mini         │
│       ▼ SSE tokens ──▶ Browser                              │
└─────────────────────────────────────────────────────────────┘

Quick start

From clone to running locally in five commands. From running to deployed in another two.

git clone https://github.com/sarmakska/rag-over-pdf.git
cd rag-over-pdf
pnpm install
cp .env.example .env.local
# Add OPENAI_API_KEY to .env.local
pnpm dev

Open http://localhost:3000, upload a PDF, ask a question. That is the loop.

Use cases

What people actually build with this.

Internal docs chat

"Make our 200 internal PDFs searchable." Index policies, runbooks, contracts. Cite the exact passage.

Customer support copilot

Ground answers in your real product docs, not the model's training data. Update docs, re-index, done.

Research assistant

Skim 50-page papers in seconds. Top-5 chunk retrieval is precise enough for academic prose without re-ranking.

Learning RAG end-to-end

Read 600 lines of TypeScript and understand every moving part. No framework hiding the indexing or retrieval steps.

Open source · MIT

Use it. Fork it. Ship it.

MIT licensed. No strings attached. Attribution appreciated, not required. Pull requests welcome — chunking strategies, re-rankers, citation rendering, local-embedding adapters all wanted.