Open source storage engine/MIT/Standard library only

lsmdb, an LSM-tree you can actually read

A log-structured merge-tree storage engine in pure Go. Write-ahead log with CRC framing, skip-list MemTable, block-based SSTables with per-table bloom filters, levelled compaction, MVCC snapshots over a 56-bit sequence.

Small enough to read end to end in an afternoon. Real enough to stress with millions of writes. The Go standard library and one go.mod with no require block.

View on GitHub Read whitepaper How it works

At a glance

~2.9k

lines of Go

public methods

2.15us

SSTable point read

2.82ms

durable Put with fsync

MIT

license

WAL fsynced before Put returns

Torn trailing record dropped on replay

Snapshot reads under a fixed sequence

go test ./... clean, race-free

Why this exists

I have read the LevelDB and RocksDB source more times than I can count, and every time I came away with the same feeling: I understood the diagram but not the code. The papers explain the idea of an LSM-tree in a page. The production engines bury that idea under twenty years of flags, formats and compaction strategies.

I wanted one I could hold in my head, where I could point at the exact lines that make a write durable, that decide which table a read touches, that drop a tombstone at the right moment and not a moment sooner. So I wrote lsmdb top to bottom with the Go standard library and nothing else.

Every hard part is implemented for real: the WAL with CRC framing and torn-tail recovery, the block-based table format with a sparse index and a per-table bloom filter, the heap-based merging iterator, levelled compaction with correct tombstone handling, and MVCC snapshots over monotonic sequence numbers. It is a portfolio engine and a teaching engine, not a RocksDB replacement, and I have tried to be honest about that line throughout.

Why this matters

A complete LSM-tree in ~2,900 lines

Production LSM engines carry twenty years of flags, formats and compaction strategies on top of an idea the paper covers in a page. lsmdb strips back to the core that defines the data structure: WAL with torn-tail recovery, block-based SSTable with a sparse index and a per-table bloom filter, heap-based merging iterator, levelled compaction with correct tombstone handling, MVCC over monotonic sequences. Every box on every paper diagram corresponds to a real Go file you can open.

What is in there

No part is faked or stubbed. Each piece corresponds to a real file and a real test.

Write-ahead log with CRC framing

Every mutation is appended to the WAL and fsynced before Put returns. Records are length-prefixed and CRC32C-checked (Castagnoli). A torn trailing record from a crash is detected by short read or CRC mismatch and dropped on replay, so an uncommitted write never resurrects.

Skip-list MemTable

The in-memory write buffer is a probabilistic skip list with O(log n) insert and search. A single writer appends while concurrent readers walk forward, which is the access pattern the engine actually needs. Lives in internal/skiplist and internal/memtable.

Block-based SSTable format

Immutable on-disk tables with a sparse block index, a properties block, an 8-byte magic footer and a per-table bloom filter. Whole keys are stored in data blocks, not prefix-compressed, which keeps the block reader a straight scan with no restart-point bookkeeping.

Bloom filter, one percent target

Per-table double-hashing bloom filter sized from m/n = -ln(p)/ln(2)^2 with a default false-positive rate of 0.01. A point read that misses the filter skips the table entirely. The test suite holds the observed false-positive rate within bounds across twenty thousand probes.

Levelled compaction, newest wins

L0 holds overlapping MemTable flushes. L1 through L6 are disjoint and each ten times larger than the one above. pickCompaction triggers on L0 table count or per-level byte budget, runCompaction merges newest-wins through a heap-based iterator and drops tombstones only at the bottom level.

MVCC snapshots over a 56-bit sequence

Every key is stored as an internal key: user key followed by an 8-byte trailer packing a 56-bit monotonic sequence in the high bits and a Kind byte in the low bits. A Snapshot captures lastSeq; reads through it see only versions with seq less than or equal to the snapshot.

Append-only manifest, atomic swap

manifest.go writes newline-delimited JSON edits. A compaction records both the added outputs and the deleted inputs in one fsynced edit, so the swap is atomic against a crash. The manifest is small, human-readable and the natural commit point for every change to the level layout.

Heap-based merging iterator

iterator.go runs a min-heap over every sorted source (active MemTable, immutable MemTable, every L0 table, every L1 through L6 table the read touches). The first source that holds any version of a key, live or tombstone, decides the result, because newer always shadows older.

Standard library only

No external dependencies. crypto, encoding/binary, hash/crc32, os, sync, sort. go build ./... produces a static binary with no surprises. The whole engine sits in one package plus internal/ and you can read it in an afternoon.

Wiki goes byte-deep

25 wiki pages cover the SSTable byte layout, the WAL framing, the bloom-filter sizing, the internal-key trailer, the compaction state machine, the manifest format and the recovery walkthrough. Every page references real Go file paths and struct names.

Write path, end to end

WAL fsync, MemTable, flush to L0 SSTable, manifest commit, compaction.

rendering

Write path: WAL fsync, memtable, flush to L0 SSTable, manifest commit, levelled compaction.

Tech stack

Go 1.26+Standard library onlycrc32 CastagnoliSkip listBloom filterBlock-based SSTableAppend-only manifestMVCC sequences

The source tree

One package at the root, four subsystems under internal/. No external dependencies.

lsmdb/
  db.go                  Open, Close, Put, Delete, Get, write path
  compaction.go          pickCompaction, runCompaction, overlap policy
  iterator.go            mergingIterator over every sorted source
  public_iterator.go     Snapshot, Iterator, snapshot-filtered reads
  manifest.go            append-only edit log, atomic commit point
  record.go              WAL record encoding
  internal/wal/          Writer, Reader, CRC32C framing
  internal/skiplist/     Skip-list primitives
  internal/memtable/     MVCC-aware MemTable
  internal/sstable/      writer.go, reader.go, format.go, block-based table
  internal/bloom/        Double-hashing bloom filter
  internal/encoding/     InternalKey, MakeInternalKey, CompareInternal
  cmd/lsmdb-demo/        End-to-end demo binary

Quick start

Seven methods. Open, Close, Put, Delete, Get, NewIterator, Snapshot.

git clone https://github.com/sarmakska/lsmdb.git
cd lsmdb
go build ./...
go test ./...
go run ./cmd/lsmdb-demo          # end-to-end demo

db, _ := lsmdb.Open("./data", lsmdb.Options{})
db.Put([]byte("greeting"), []byte("hello"))
snap := db.Snapshot()                  // freeze a view
db.Put([]byte("greeting"), []byte("updated"))

live, _ := db.Get([]byte("greeting"))    // updated
old, _  := snap.Get([]byte("greeting"))  // hello

Use cases

Who this is actually for.

Reference engine for an interview or whiteboard

The whole engine is small enough that you can point at the exact lines that make a write durable, decide which table a read touches, or drop a tombstone at the right moment. That is the production-engine internals interview, with code you can read.

Embedded key-value store for a Go service

Open a directory, get a DB. Put, Delete, Get, NewIterator, Snapshot. A successful Put is durable across a crash. No server, no network, no daemon. Suitable for a CLI cache, a local-first app, or a test harness that wants real persistence.

Teaching the LSM-tree top to bottom

Every paper diagram corresponds to a real file: write path in db.go, compaction in compaction.go, levels in the same, internal keys in internal/encoding, the WAL framing in internal/wal. Run the test suite and watch every box light up.

Foundation for raftkv, the distributed sibling

lsmdb is the local state machine that the raftkv project replicates with Raft. Two repos, one for the engine, one for the consensus layer; both small enough to hold in your head at the same time.

Open source · MIT

Read it. Fork it. Learn it.

MIT licensed. Standard library only. Roadmap includes batched writes behind one fsync, background flush and compaction, snapshot pinning, and an orphaned-file sweep on open.

Star on GitHub Read the wiki

Ready to read the code?

View on GitHub Read whitepaper How it works Hire me to build with it

All open-source projects