raftkv, the Raft repo that judges itself
A from-scratch Raft key-value store in Go with a Jepsen-style fault-injection harness and a Wing and Gong linearizability checker. Every consistency claim ships with a recorded history a checker has accepted.
Leader election with pre-vote, log replication with fast conflict backtracking, the read-index path with leader leases, crash-safe append-only log, snapshots and InstallSnapshot. Standard library, no go get.
Why this exists
I had implemented Raft once before, the way most people do. Follow the paper, write a few tests that elect a leader and replicate a value, watch them go green, move on. Months later I genuinely could not tell you whether that implementation was correct under a partition, because I had never put it under one.
This time I inverted the priorities. I wrote the linearizability checker and the fault harness first, treated them as the product, and built the Raft core to satisfy them. The rule I held myself to was that no consistency claim ships without a recorded history a checker has accepted.
The result is a from-scratch Raft with leader election, log replication, persistence and snapshots, running under a Jepsen-style nemesis, judged by a Wing and Gong checker, and built on the Go standard library with nothing to go get.
The checker is the product
Most Raft implementations stop at happy-path tests. raftkv treats the linearizability checker and the fault harness as the deliverable and builds the Raft core to satisfy them, so no consistency claim ships without a recorded history a checker has accepted. If the cluster ever lies, the verdict turns red.
What is in the box
Every claim below is backed by a test in the repo. go test -race ./... is clean.
Raft consensus from scratch
Randomised-timeout leader election with a pre-vote phase, log replication with fast conflict backtracking, and the paper's commitment rules for terms and the commit index, all in the raft/ package. The code reads side by side with the paper's pseudocode.
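The commitment rule is the subtle part of that list. As a minimal sketch, assuming a hypothetical Entry type, a 1-indexed log with a sentinel at index 0, and a matchIndex slice that includes the leader's own last index (none of these names come from the raft/ package), the leader's commit step from Figure 2 of the Raft paper looks roughly like this:

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical log entry; only its term matters for this rule.
type Entry struct {
	Term int
}

// advanceCommitIndex returns the highest index N > commitIndex such that a
// majority of nodes have matchIndex >= N and log[N].Term == currentTerm.
// log[0] is a sentinel, so real entries start at index 1, and matchIndex
// must include the leader's own last log index.
func advanceCommitIndex(log []Entry, matchIndex []int, commitIndex, currentTerm int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// After a descending sort, the value at position len/2 is held by at
	// least len/2+1 nodes, which is a majority.
	majority := sorted[len(sorted)/2]
	for n := majority; n > commitIndex; n-- {
		// Only entries from the current term are committed by counting
		// replicas; earlier entries become committed indirectly.
		if log[n].Term == currentTerm {
			return n
		}
	}
	return commitIndex
}

func main() {
	// Terms of entries 1..5; entry 3 is from an older term.
	log := []Entry{{0}, {1}, {1}, {2}, {3}, {3}}
	fmt.Println(advanceCommitIndex(log, []int{5, 5, 3, 2, 1}, 1, 3)) // 1
	fmt.Println(advanceCommitIndex(log, []int{5, 5, 4, 2, 1}, 1, 3)) // 4
}
```

The old-term check is what Figure 8 of the paper motivates: an entry from an earlier term can sit on a majority and still be overwritten, so it only becomes committed once a current-term entry above it does.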
Crash-safe persistence
A hand-rolled append-only log of length-prefixed, CRC-checked records. TestTornTrailingRecordDiscarded appends garbage to the tail, as a crash mid-write would leave behind, and asserts that on reopen the log drops the torn record cleanly and keeps the good entries.
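One way to get that property, sketched here under assumptions: the frame layout (4-byte little-endian length, 4-byte CRC-32 IEEE checksum, then the payload) and the helper names appendRecord and replay are hypothetical, not the repo's actual on-disk format. Replay stops at the first record that is short or fails its checksum and reports where the valid prefix ends, so a reopen can truncate the torn tail:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Hypothetical frame: 4-byte little-endian payload length, 4-byte CRC-32
// (IEEE) of the payload, then the payload itself.
const headerSize = 8

// appendRecord encodes one payload as a framed record onto buf.
func appendRecord(buf, payload []byte) []byte {
	var hdr [headerSize]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
	buf = append(buf, hdr[:]...)
	return append(buf, payload...)
}

// replay decodes records from data and stops at the first torn or corrupt
// one. It returns the good payloads and the byte offset where the valid
// prefix ends, so a caller could truncate the file there on reopen.
func replay(data []byte) (records [][]byte, validLen int) {
	off := 0
	for {
		if len(data)-off < headerSize {
			return records, off // short or missing header: torn tail
		}
		n := int(binary.LittleEndian.Uint32(data[off : off+4]))
		sum := binary.LittleEndian.Uint32(data[off+4 : off+8])
		start := off + headerSize
		if n > len(data)-start {
			return records, off // payload cut short mid-write
		}
		payload := data[start : start+n]
		if crc32.ChecksumIEEE(payload) != sum {
			return records, off // checksum mismatch: treat as torn
		}
		records = append(records, payload)
		off = start + n
	}
}

func main() {
	var wal []byte
	wal = appendRecord(wal, []byte("put x=1"))
	wal = appendRecord(wal, []byte("put y=2"))
	good := len(wal)

	// Simulate a crash mid-write: only part of the third record reaches disk.
	third := appendRecord(nil, []byte("put z=3"))
	disk := append(append([]byte(nil), wal...), third[:len(third)-3]...)

	recs, valid := replay(disk)
	fmt.Println(len(recs), valid == good) // 2 true
}
```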
Snapshots and log compaction
InstallSnapshot ships state to a lagging follower and lets the leader truncate its log. TestSnapshotInstall drives a follower behind the snapshot point and asserts it catches up.
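The follower side of InstallSnapshot is where the bookkeeping is easiest to get wrong. A minimal sketch, assuming hypothetical Entry, InstallSnapshotArgs and Follower types and a snapshot sent in one piece (the paper's offset and done chunking fields are left out): following the receiver rules in Figure 13 of the Raft paper, the follower keeps the log suffix after a matching entry and otherwise discards its whole log. Ignoring snapshots at or below the commit index is an extra assumption of this sketch; it stops the state machine from being rolled backwards.

```go
package main

import "fmt"

// Entry is a hypothetical log entry carrying its absolute index.
type Entry struct {
	Index, Term int
}

// InstallSnapshotArgs is a hypothetical, unchunked version of the RPC.
type InstallSnapshotArgs struct {
	Term              int
	LastIncludedIndex int
	LastIncludedTerm  int
	Data              []byte
}

// Follower holds just the state this sketch touches.
type Follower struct {
	currentTerm              int
	snapIndex, snapTerm      int
	log                      []Entry // entries after snapIndex
	state                    []byte  // state machine contents
	commitIndex, lastApplied int
}

// installSnapshot applies a leader's snapshot and reports whether it was used.
func (f *Follower) installSnapshot(a InstallSnapshotArgs) bool {
	if a.Term < f.currentTerm {
		return false // stale leader
	}
	f.currentTerm = a.Term
	if a.LastIncludedIndex <= f.commitIndex {
		return false // we already have everything the snapshot covers
	}
	// If an existing entry matches the snapshot's last index and term, keep
	// the entries that follow it; otherwise discard the whole log.
	var kept []Entry
	for i, e := range f.log {
		if e.Index == a.LastIncludedIndex && e.Term == a.LastIncludedTerm {
			kept = append(kept, f.log[i+1:]...)
			break
		}
	}
	f.log = kept
	f.snapIndex, f.snapTerm = a.LastIncludedIndex, a.LastIncludedTerm
	f.state = a.Data // reset the state machine from the snapshot
	f.commitIndex, f.lastApplied = a.LastIncludedIndex, a.LastIncludedIndex
	return true
}

func main() {
	f := &Follower{currentTerm: 1, commitIndex: 2, lastApplied: 2,
		log: []Entry{{1, 1}, {2, 1}, {3, 1}}}
	ok := f.installSnapshot(InstallSnapshotArgs{Term: 2, LastIncludedIndex: 5, LastIncludedTerm: 2, Data: []byte("kv")})
	fmt.Println(ok, len(f.log), f.snapIndex, f.commitIndex) // true 0 5 5
}
```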
Jepsen-style nemesis
fault/ ships an Injector that partitions, drops, delays and reorders messages, and a Nemesis that schedules faults against a live cluster. Because the fault layer is installed as a Filter on the in-memory network, the Raft code never knows it is being attacked.
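A filter seam of this kind fits in a few dozen lines. Everything below is hypothetical (Message, Verdict, Filter, NewInjector, send) and only illustrates the idea that faults are decided per message, outside the Raft code. Reordering falls out of random delays, and seeding the RNG makes a run replayable:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Message is a hypothetical envelope for one Raft RPC in flight.
type Message struct {
	From, To int
	Payload  string
}

// Verdict tells the network what to do with one message.
type Verdict struct {
	Drop  bool
	Delay time.Duration
}

// Filter is consulted by the network for every message before delivery.
type Filter func(m Message) Verdict

// Injector holds the current fault configuration.
type Injector struct {
	mu       sync.Mutex
	side     map[int]int // node -> partition side; nil means fully connected
	dropRate float64
	maxDelay time.Duration
	rng      *rand.Rand // seeded, so a run can be replayed; guarded by mu
}

func NewInjector(seed int64, dropRate float64, maxDelay time.Duration) *Injector {
	return &Injector{dropRate: dropRate, maxDelay: maxDelay, rng: rand.New(rand.NewSource(seed))}
}

// Partition splits the cluster; nodes listed in neither group form a third side.
func (in *Injector) Partition(a, b []int) {
	in.mu.Lock()
	defer in.mu.Unlock()
	in.side = map[int]int{}
	for _, n := range a {
		in.side[n] = 1
	}
	for _, n := range b {
		in.side[n] = 2
	}
}

// Heal removes the partition.
func (in *Injector) Heal() {
	in.mu.Lock()
	defer in.mu.Unlock()
	in.side = nil
}

// Filter decides the fate of one message. Random delays also reorder
// messages, because a later send can draw a shorter delay.
func (in *Injector) Filter(m Message) Verdict {
	in.mu.Lock()
	defer in.mu.Unlock()
	if in.side != nil && in.side[m.From] != in.side[m.To] {
		return Verdict{Drop: true} // crosses the partition
	}
	if in.rng.Float64() < in.dropRate {
		return Verdict{Drop: true}
	}
	var d time.Duration
	if in.maxDelay > 0 {
		d = time.Duration(in.rng.Int63n(int64(in.maxDelay)))
	}
	return Verdict{Delay: d}
}

// send is the network's delivery path; the caller never sees the filter.
func send(f Filter, m Message, inbox chan<- Message) {
	v := f(m)
	if v.Drop {
		return
	}
	time.AfterFunc(v.Delay, func() { inbox <- m })
}

func main() {
	inj := NewInjector(42, 0.1, 5*time.Millisecond)
	inj.Partition([]int{1, 2}, []int{3, 4, 5})
	inbox := make(chan Message, 16)
	send(inj.Filter, Message{From: 1, To: 3, Payload: "AppendEntries"}, inbox) // always dropped
	send(inj.Filter, Message{From: 1, To: 2, Payload: "AppendEntries"}, inbox) // delayed, or randomly dropped
	select {
	case m := <-inbox:
		fmt.Println("delivered", m.From, "->", m.To)
	case <-time.After(50 * time.Millisecond):
		fmt.Println("nothing delivered")
	}
}
```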
Wing and Gong linearizability checker
linz/ records every client invoke and return into a History and runs a per-key backtracking search with memoisation on (linearized set, model state). The verdict is one line: linearizable=true, or linearizable=false with the offending key.
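The core search fits on one screen. This is a sketch of a Wing and Gong style search with the memoisation on (linearized set, state) described above, for a single register per key where the empty string means absent. Op, Check and the limit of 64 operations per key (one bitset word) are assumptions of this sketch, and it only handles operations that returned; a real history also has to account for operations that never did.

```go
package main

import (
	"fmt"
	"math"
)

type Kind int

const (
	Put Kind = iota
	Get
)

// Op is one completed client operation on a single-register key.
type Op struct {
	Key          string
	Kind         Kind
	Value        string // value written by a Put, or value observed by a Get
	Call, Return int64  // invocation and response timestamps
}

// step applies op to the register; ok is false when a Get's observed value
// is impossible from state. The empty string stands for "key absent".
func step(state string, op Op) (string, bool) {
	if op.Kind == Put {
		return op.Value, true
	}
	return state, op.Value == state
}

type memoKey struct {
	done  uint64 // bitset of ops already linearized
	state string
}

// checkKey searches for a legal linearization of ops (at most 64 per key).
func checkKey(ops []Op) bool {
	full := uint64(1)<<len(ops) - 1
	seen := map[memoKey]bool{}
	var search func(done uint64, state string) bool
	search = func(done uint64, state string) bool {
		if done == full {
			return true
		}
		k := memoKey{done, state}
		if seen[k] {
			return false // this configuration was already explored and failed
		}
		seen[k] = true
		// An op may be linearized next only if it was invoked before every
		// remaining op returned.
		minRet := int64(math.MaxInt64)
		for i, op := range ops {
			if done&(1<<i) == 0 && op.Return < minRet {
				minRet = op.Return
			}
		}
		for i, op := range ops {
			if done&(1<<i) != 0 || op.Call > minRet {
				continue
			}
			if next, ok := step(state, op); ok && search(done|1<<i, next) {
				return true
			}
		}
		return false
	}
	return search(0, "")
}

// Check partitions the history by key and reports the first failing key.
func Check(history []Op) (bool, string) {
	byKey := map[string][]Op{}
	var order []string
	for _, op := range history {
		if _, ok := byKey[op.Key]; !ok {
			order = append(order, op.Key)
		}
		byKey[op.Key] = append(byKey[op.Key], op)
	}
	for _, k := range order {
		if !checkKey(byKey[k]) {
			return false, k
		}
	}
	return true, ""
}

func main() {
	ok, key := Check([]Op{
		{"x", Put, "1", 0, 10},
		{"x", Get, "1", 5, 15}, // concurrent with the Put: may observe it
		{"y", Put, "a", 0, 2},
		{"y", Get, "", 3, 4}, // starts after the Put returned, yet sees nothing
	})
	fmt.Printf("linearizable=%v key=%q\n", ok, key) // linearizable=false key="y"
}
```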
Read-index for linearizable reads
raft/read.go implements the read-index path from paper section 6.4. While the leader holds a lease, which is strictly shorter than the minimum election timeout, it serves reads locally; once the lease lapses, it pays for a heartbeat round before answering. Correctness never rests on the clock alone.
One mutex per node
A single sync.Mutex per node, with helpers whose names end in Locked and expect the caller to already hold it, so the lock discipline is obvious and the code maps onto the paper. RPC fan-out happens off the lock to avoid stalling the cluster.
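The convention looks roughly like this in practice. The Node type, its fields and requestVote are hypothetical, and pre-vote, log checks and acting on a majority are left out; the point is only the shape: mutate state under the lock in ...Locked helpers, copy what the RPCs need, release the lock, then fan out.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a hypothetical, stripped-down Raft node with just enough state
// to show the locking convention.
type Node struct {
	mu    sync.Mutex
	term  int
	votes int
	peers []int
	// requestVote is a hypothetical blocking RPC; it reports whether the
	// peer granted its vote for term.
	requestVote func(peer, term int) bool
}

// becomeCandidateLocked starts a new term and votes for itself.
// The Locked suffix means the caller must already hold n.mu.
func (n *Node) becomeCandidateLocked() int {
	n.term++
	n.votes = 1
	return n.term
}

// recordVoteLocked counts a granted vote, ignoring replies for stale terms.
// Caller must hold n.mu.
func (n *Node) recordVoteLocked(term int) {
	if term == n.term {
		n.votes++
	}
}

// startElection mutates state under the lock, then releases it before any
// RPC so a slow or partitioned peer cannot block the node.
func (n *Node) startElection() {
	n.mu.Lock()
	term := n.becomeCandidateLocked()
	peers := append([]int(nil), n.peers...) // copy shared state before unlocking
	n.mu.Unlock()

	var wg sync.WaitGroup
	for _, p := range peers {
		wg.Add(1)
		go func(peer int) {
			defer wg.Done()
			if n.requestVote(peer, term) { // network call, lock not held
				n.mu.Lock()
				n.recordVoteLocked(term)
				n.mu.Unlock()
			}
		}(p)
	}
	wg.Wait() // a real node would act on a majority instead of waiting for all
}

func main() {
	n := &Node{peers: []int{2, 3, 4, 5}, requestVote: func(peer, term int) bool { return peer%2 == 0 }}
	n.startElection()
	fmt.Println(n.term, n.votes) // 1 3
}
```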
Leader-aware client
cluster/client.go follows leader redirects and retries, drives Put/Get/Delete through the read-index path, and writes every invoke and return into the history the checker judges.
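A sketch of the redirect loop and the history it feeds, with hypothetical Client, Reply, Event and History types and a logical clock standing in for timestamps. One detail matters for the checker: when the client gives up, it records no return event, so the operation stays open instead of being logged as failed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Reply is a hypothetical server reply: a result, or a redirect hint.
type Reply struct {
	OK         bool
	Value      string
	LeaderHint int // -1 when the contacted node does not know the leader
}

// Event is one invoke or return in the history the checker judges.
type Event struct {
	Client int
	Op     string
	Key    string
	Value  string
	Invoke bool  // true for the invocation, false for the return
	Time   int64 // logical timestamp
}

// History is shared by all clients, so appends are serialised.
type History struct {
	mu     sync.Mutex
	clock  int64
	Events []Event
}

func (h *History) Record(e Event) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.clock++
	e.Time = h.clock
	h.Events = append(h.Events, e)
}

var errGaveUp = errors.New("gave up: outcome unknown")

type Client struct {
	id, nodes, leader, maxAttempts int
	call                           func(node int, key string) Reply // hypothetical RPC
	hist                           *History
}

// Get retries until some node answers as leader, following redirect hints.
func (c *Client) Get(key string) (string, error) {
	c.hist.Record(Event{Client: c.id, Op: "get", Key: key, Invoke: true})
	for i := 0; i < c.maxAttempts; i++ {
		r := c.call(c.leader, key)
		if r.OK {
			c.hist.Record(Event{Client: c.id, Op: "get", Key: key, Value: r.Value})
			return r.Value, nil
		}
		if r.LeaderHint >= 0 {
			c.leader = r.LeaderHint // follow the redirect
		} else {
			c.leader = (c.leader + 1) % c.nodes // no hint: try the next node
		}
	}
	return "", errGaveUp // no return event: the operation stays open
}

func main() {
	h := &History{}
	c := &Client{id: 1, nodes: 5, maxAttempts: 3, hist: h,
		call: func(node int, key string) Reply {
			if node == 3 {
				return Reply{OK: true, Value: "v"}
			}
			return Reply{LeaderHint: 3}
		}}
	v, err := c.Get("k")
	fmt.Println(v, err, len(h.Events)) // v <nil> 2
}
```

For a Get that choice is harmless, but for a Put it is essential: an unacknowledged write may still have been applied, and recording it as failed would make the checker reject a correct history.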
Standard library only
go build ./... has nothing to fetch. The full stack (consensus, transport, fault injection, checker) is built on the Go standard library, with one go.mod that has no require block.
One-command chaos demo
go run ./cmd/raftkvd -nodes 5 -ops 200 boots a 5-node cluster, starts the nemesis, drives a workload through the leader-aware client and prints the verdict. Reproducible end to end.
Architecture sketch
A cluster, an in-memory network with a fault seam, a checker sitting in judgement.
Quick start
Go 1.26+, nothing to fetch. The whole demo runs in one terminal.
git clone https://github.com/sarmakska/raftkv && cd raftkv
go build ./...                             # standard library only
go test -race ./...                        # 27 tests, race-clean
go run ./cmd/raftkvd -nodes 5 -ops 200     # 5-node cluster under chaos
raftkv: started 5-node cluster in /tmp/raftkvd...
raftkv: leader elected: node 2
raftkv: nemesis running (partitions, delays, crashes)
raftkv: ran 200 operations
raftkv: history is LINEARIZABLE
Who this is for
A teaching-grade, correctness-first implementation. Not a production data store.
Teach yourself consensus
Most Raft tutorials stop at "the test passed". This one inverts the priorities: the checker is the product, and the Raft core exists to satisfy it. Reading the linz/ checker first changes how you read every Raft paper afterwards.
A reference for your own implementation
If you are writing Raft in another language, the test names map directly onto the invariants you have to defend: election, partition, heal, leadership change, disk recovery, snapshot install, torn write.
Interview and study material
TestLinearizableUnderChaos and the Wing and Gong checker are good fodder for distributed-systems interviews. Both are small enough to read end to end in a single session.
A seam to bolt a real transport onto
cluster/network.go is the seam. Slot a gRPC or TCP transport in behind the Transport interface and the nemesis, checker and Raft core stay unchanged.
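What such a seam might look like, as a minimal sketch: the Transport interface below (Send plus a per-node Inbox channel) and the memTransport implementation are hypothetical, not the actual contents of cluster/network.go. A TCP or gRPC transport would implement the same two methods.

```go
package main

import (
	"errors"
	"fmt"
)

// Transport is a hypothetical shape for the seam: Raft nodes only ever
// send bytes to a peer id and read from their own inbox.
type Transport interface {
	Send(from, to int, msg []byte) error
	Inbox(node int) <-chan []byte
}

var (
	errUnknownPeer = errors.New("unknown peer")
	errInboxFull   = errors.New("inbox full")
)

// memTransport is an in-memory implementation backed by buffered channels.
// The map is only written at construction, so no lock is needed.
type memTransport struct {
	inboxes map[int]chan []byte
}

func newMemTransport(nodes []int, buf int) *memTransport {
	t := &memTransport{inboxes: map[int]chan []byte{}}
	for _, n := range nodes {
		t.inboxes[n] = make(chan []byte, buf)
	}
	return t
}

func (t *memTransport) Send(from, to int, msg []byte) error {
	ch, ok := t.inboxes[to]
	if !ok {
		return errUnknownPeer
	}
	select {
	case ch <- append([]byte(nil), msg...): // copy so the sender can reuse its buffer
		return nil
	default:
		return errInboxFull // behave like a lossy network rather than block
	}
}

func (t *memTransport) Inbox(node int) <-chan []byte {
	return t.inboxes[node]
}

func main() {
	var tr Transport = newMemTransport([]int{1, 2, 3}, 8)
	if err := tr.Send(1, 2, []byte("RequestVote")); err != nil {
		panic(err)
	}
	fmt.Println(string(<-tr.Inbox(2))) // RequestVote
}
```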
Read it. Fork it. Break it.
MIT licensed. Standard library only. A real network transport behind the existing seam and a recorded chaos run with the seed checked into the repo are on the roadmap.