Most SMEs do not have big data. They have moderate data they think is big.
Volume
Source: My client receipts
50M rows, queried for analytics dashboards plus ad-hoc analysis. Postgres on Neon at £25/month does this fine[1]. DuckDB on S3 with parquet does it for £7/month if read patterns are batchy. BigQuery and Snowflake are both 5-10x more expensive at this scale, with no benefit.
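To put those monthly figures side by side over a year, here is a rough worked calculation. It assumes the "5-10x" multiplier applies to the £25/month Postgres figure; the paragraph does not say which baseline is meant, so read the warehouse range as illustrative only.

```python
# Monthly figures quoted in the paragraph above, in GBP.
POSTGRES_NEON = 25
DUCKDB_S3 = 7
# "5-10x more expensive". ASSUMPTION: measured against the Postgres figure.
WAREHOUSE_MULTIPLIER = (5, 10)


def warehouse_monthly_range(baseline):
    """Return the (low, high) monthly cost implied by the multiplier."""
    low, high = WAREHOUSE_MULTIPLIER
    return baseline * low, baseline * high


def annual(monthly):
    """Convert a monthly cost to a yearly one."""
    return monthly * 12


if __name__ == "__main__":
    low, high = warehouse_monthly_range(POSTGRES_NEON)
    print("Postgres on Neon: GBP", annual(POSTGRES_NEON), "per year")
    print("DuckDB on S3:     GBP", annual(DUCKDB_S3), "per year")
    print("Warehouse:        GBP", annual(low), "to", annual(high), "per year")
```

Under that assumption the warehouse option lands at roughly £125 to £250 a month.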
Pick by volume
| Volume | Tool | Why |
|---|---|---|
| <10M rows | Postgres | Single tool, fast queries |
| 10-100M rows | Postgres + good indexes | Still fits |
| 100M-1B rows | Postgres + read replicas + partitioning | Stretches |
| >1B rows or wide aggregates | DuckDB / ClickHouse / BigQuery | Columnar wins |
| Real-time streaming | Materialize / RisingWave | Niche |
| Long-history archive | S3/R2 + DuckDB ad hoc | Cheapest tail |
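The table above can be read as a small decision function. This is a minimal sketch, not a rule engine: the function name `pick_tool` and its flags are hypothetical, the workload flags are assumed to take precedence over row count, and a volume of exactly 100M rows is assumed to fall into the higher tier, since the table leaves that boundary open.

```python
def pick_tool(rows, *, wide_aggregates=False, streaming=False, archive=False):
    """Map a workload onto the tool suggested by the table.

    ASSUMPTIONS (not stated in the table): workload flags win over
    row count, and exactly 100M rows counts as the 100M-1B tier.
    """
    if streaming:
        return "Materialize / RisingWave"
    if archive:
        return "S3/R2 + DuckDB ad hoc"
    if wide_aggregates or rows > 1_000_000_000:
        return "DuckDB / ClickHouse / BigQuery"
    if rows >= 100_000_000:
        return "Postgres + read replicas + partitioning"
    if rows >= 10_000_000:
        return "Postgres + good indexes"
    return "Postgres"


if __name__ == "__main__":
    # The 50M-row dashboard workload described earlier in the post.
    print(pick_tool(50_000_000))
```

For the 50M-row case this returns `Postgres + good indexes`.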
Below 100M rows: just use Postgres. Add the right indexes and the queries are fast enough.
Above 1B rows or with very wide aggregations: columnar databases earn their keep. DuckDB is wildly underrated; for many "I just want to query this big parquet file" tasks it is the answer at near-zero cost.
Real-time streaming is a niche. Most SMEs say "real-time" and mean "within 15 minutes", for which Postgres + a 1-minute ETL job is fine.
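To see why a 1-minute ETL job comfortably covers "within 15 minutes", it helps to work out the worst-case staleness. The sketch below assumes each run reads a snapshot when it starts and publishes when it finishes, so a dashboard can lag by one schedule interval plus one job runtime. The 20-second runtime is an assumed figure for illustration, not a measurement.

```python
from datetime import timedelta


def worst_case_staleness(interval, runtime):
    """Oldest the dashboard data can be just before the next publish.

    ASSUMPTION: a run snapshots at start and publishes at finish, so the
    visible data ages for one interval plus one runtime.
    """
    return interval + runtime


def meets_budget(interval, runtime, budget=timedelta(minutes=15)):
    """True if the schedule keeps data within the freshness budget."""
    return worst_case_staleness(interval, runtime) <= budget


if __name__ == "__main__":
    every_minute = timedelta(minutes=1)
    assumed_runtime = timedelta(seconds=20)  # illustrative, not measured
    print(worst_case_staleness(every_minute, assumed_runtime))
    print(meets_budget(every_minute, assumed_runtime))
```

With those inputs the worst case is 80 seconds, far inside a 15-minute budget.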
My SME recipe
- Postgres for everything until you hit a wall
- Schedule a daily ETL into a reporting schema (still Postgres)
- Add read replicas if dashboards slow the OLTP database
- Add DuckDB-on-S3 as the archive for long history you almost never query
- Only then consider BigQuery/Snowflake/ClickHouse
I have not yet had an SME client who needed that last step. Maybe at year 4 some will.
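The daily ETL into a reporting schema can be as small as one script that cron runs once a day. The sketch below is illustrative only: it uses Python's built-in `sqlite3` as a stand-in so it runs without a server, the table names `orders` and `reporting_daily_revenue` are hypothetical, and the `reporting_` prefix stands in for a real Postgres schema. Against Postgres you would swap the connection for your driver of choice and keep the SQL largely the same.

```python
import sqlite3


def refresh_reporting(conn):
    """Rebuild the reporting table from raw orders in one transaction."""
    with conn:  # commits on success, rolls back if a statement fails
        conn.execute("DELETE FROM reporting_daily_revenue")
        conn.execute(
            """
            INSERT INTO reporting_daily_revenue (day, order_count, revenue_pence)
            SELECT order_date, COUNT(*), SUM(amount_pence)
            FROM orders
            GROUP BY order_date
            """
        )


def build_demo_db():
    """Create an in-memory database with a few hypothetical orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount_pence INTEGER)")
    conn.execute(
        "CREATE TABLE reporting_daily_revenue ("
        "day TEXT PRIMARY KEY, order_count INTEGER, revenue_pence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-01-01", 1000), ("2024-01-01", 2500), ("2024-01-02", 400)],
    )
    conn.commit()
    return conn


def read_report(conn):
    """Return the reporting rows ordered by day."""
    return conn.execute(
        "SELECT day, order_count, revenue_pence "
        "FROM reporting_daily_revenue ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    refresh_reporting(db)
    refresh_reporting(db)  # re-running is safe: the table is rebuilt each time
    for row in read_report(db):
        print(row)
```

Because the refresh deletes and rebuilds inside a single transaction, a failed run leaves yesterday's report in place, and a crontab entry such as `0 2 * * * python3 refresh_reporting.py` (hypothetical filename) is all the scheduling it needs.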
What I see go wrong
Snowflake bought "for the future." The bill hits £2k/month and delivers no benefit until volumes justify it. It is usually cancelled within 12 months.
Custom ETL frameworks. Airflow on a cheap box, in-house orchestrator, complex DAGs. For SME volumes, a cron job and 200 lines of SQL beat all of it.
Real-time obsession. "We need real-time dashboards." No, you need dashboards that reload every 5 minutes and data that refreshes every 15. That saves an order of magnitude in cost.
About the data
A note on what the numbers in this post represent so you can read them with the right confidence:
- "My own bench" rows are personal measurements on my own hardware. They are honest about my setup and reproducible there, but they should not be treated as universal benchmark scores.
- Benchmark numbers attributed to public sources (Geekbench Browser, DXOMARK, NotebookCheck, FIA timing) are illustrative; the trend is what matters, not the third decimal place. Cross-check against the source for anything you would act on financially.
- Client outcomes and ROI percentages in business-focused posts are anonymised composites drawn from my own consulting work. Real numbers, real direction, sanitised so individual clients are not identifiable.
- Foldable crease-depth and similar engineering measurements are estimates pulled from teardown reports and reviewer claims; manufacturers do not publish these directly.
- Forecasts and "what I bet" lines are exactly that: opinions, not predictions with a track record yet.
If you spot a number that contradicts a source you trust, tell me. I would rather correct it than be the chart that was off by 6 percent and pretended otherwise.