
Modern Data Pipelines: Architecture for Real-Time Analytics

Design patterns for building data pipelines that deliver real-time insights, from stream processing to data warehousing strategies.

December 2023 · 11 min read

Data pipelines form the foundation of data-driven organizations. The evolution from batch-oriented ETL to streaming architectures reflects increasing demands for timely insights. Modern data pipelines must handle growing volumes, diverse sources, and expectations for near-real-time analytics. Understanding the architecture patterns that address these requirements enables building pipelines that deliver genuine business value.

Pipeline Architecture Fundamentals

Batch vs Stream Processing

Traditional data pipelines operate in batch mode: extract data periodically, transform it, and load it into analytical systems. Batch processing remains appropriate for many use cases and offers simpler implementation and debugging.

Stream processing handles data continuously as it arrives. Each event is processed immediately rather than waiting for a batch window. This approach enables real-time dashboards, immediate alerting, and responsive applications.

Most organizations need both capabilities. Some analytical questions require historical analysis that batch processing serves well. Others demand immediate visibility that streaming provides. Modern architectures often implement the lambda or kappa patterns to combine both approaches coherently.

Source Integration

Data pipelines begin with source integration. Business applications, IoT devices, third-party services, and operational databases all generate relevant data.

Change data capture extracts database changes efficiently, avoiding full table scans while capturing every modification. Event streams from applications provide real-time visibility into user behavior and system operations. API integrations pull data from external services.

Reliable source integration handles the messiness of real-world data: schema changes, duplicate events, out-of-order arrival, and temporary source unavailability. Building resilience into this layer prevents downstream cascade failures.
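
Two of those failure modes, duplicate events and out-of-order arrival, can be handled with a small ingestion step. A minimal Python sketch, where the `Event` shape, the `event_id` field, and the `ingest` helper are illustrative assumptions rather than any particular framework's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    event_id: str   # unique per event, used for deduplication
    ts: int         # event timestamp (epoch seconds)
    payload: str

def ingest(events, seen_ids):
    """Drop duplicates by event_id, then order by event timestamp.

    `seen_ids` persists across calls so redelivery from a source
    does not produce duplicate downstream records.
    """
    fresh = []
    for e in events:
        if e.event_id in seen_ids:
            continue            # duplicate delivery: skip
        seen_ids.add(e.event_id)
        fresh.append(e)
    return sorted(fresh, key=lambda e: e.ts)  # repair out-of-order arrival

# Event "a" is delivered twice and the batch arrives out of order.
seen = set()
batch = [Event("b", 2, "y"), Event("a", 1, "x"), Event("a", 1, "x")]
ordered = ingest(batch, seen)
```

In production the seen-id set would be bounded (for example, by retention window) and persisted, but the shape of the logic is the same.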

Transformation Logic

Raw source data rarely matches analytical requirements. Transformation logic cleans, enriches, aggregates, and reshapes data for consumption.

Stream processing frameworks like Apache Kafka Streams, Apache Flink, or cloud-native services handle transformation at scale. They provide windowing operations for time-based aggregation, join capabilities for enrichment, and exactly-once semantics for accuracy guarantees.
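
As an illustration of the windowing idea (not the Kafka Streams or Flink APIs themselves), a tumbling-window count can be sketched in plain Python:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_secs):
    """Count events per key within fixed, non-overlapping time windows.

    events: iterable of (timestamp_secs, key) pairs.
    Returns {(window_start, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_secs)  # align to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "login"), (5, "login"), (12, "login"), (61, "logout")]
counts = tumbling_window_counts(events, window_secs=60)
# window [0, 60) holds three logins; window [60, 120) holds one logout
```

Real frameworks add what this sketch omits: watermarks for late data, state that survives restarts, and distributed execution.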

Design transformations for maintainability. Complex business logic buried in pipeline code becomes difficult to update and debug. Consider separating transformation rules into configurable specifications that business users can understand and modify.
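
One way to keep business logic out of pipeline code is to express rules as data. A minimal sketch, where the rule format, the operations, and the `apply_rules` helper are invented for illustration:

```python
# Transformation rules expressed as configuration rather than code:
RULES = [
    {"field": "email", "op": "lower"},
    {"field": "country", "op": "default", "value": "unknown"},
]

OPS = {
    "lower": lambda value, rule: value.lower() if isinstance(value, str) else value,
    "default": lambda value, rule: rule["value"] if value is None else value,
}

def apply_rules(record, rules):
    """Apply each configured rule to its target field, returning a new record."""
    out = dict(record)
    for rule in rules:
        out[rule["field"]] = OPS[rule["op"]](out.get(rule["field"]), rule)
    return out

row = apply_rules({"email": "Ada@Example.COM", "country": None}, RULES)
```

Because `RULES` is plain data, it can live in version-controlled configuration that analysts can read and review, while the pipeline code only interprets it.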

Storage and Serving

Processed data lands in storage systems optimized for analytical access. Data warehouses like Snowflake, BigQuery, or Redshift provide SQL interfaces for ad-hoc analysis. Data lakes store raw and processed data for diverse access patterns. Real-time stores like Apache Druid or ClickHouse serve low-latency analytical queries.

The choice of storage technology depends on access patterns, query requirements, and cost considerations. Many organizations implement multi-layer storage, moving data between tiers based on age and access frequency.

Real-Time Architecture Patterns

Event Sourcing

Event sourcing captures all changes as a sequence of events rather than storing only current state. This pattern provides complete history, enables temporal queries, and supports event replay for reprocessing or recovery.

Event sourcing works particularly well with stream processing. The event log becomes both the source of truth and the input to processing pipelines. Multiple consumers can process the same events for different purposes without interfering with each other.
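
The core of the pattern fits in a few lines. A sketch using a hypothetical account-balance event log: current state is a fold over history, and replaying a prefix of the log answers temporal queries:

```python
def apply(balance, event):
    """Fold a single account event into the current balance."""
    kind, amount = event
    if kind == "deposit":
        return balance + amount
    if kind == "withdraw":
        return balance - amount
    raise ValueError(f"unknown event kind: {kind}")

def replay(events, initial=0):
    """Rebuild current state from the full event history."""
    state = initial
    for event in events:
        state = apply(state, event)
    return state

log = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]
balance = replay(log)          # current state
balance_then = replay(log[:2]) # temporal query: state after the first two events
```

A second consumer could replay the same log with a different fold, say, counting withdrawals for a fraud signal, without touching the first.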

Stream-Table Duality

A powerful conceptual model views streams and tables as two perspectives on the same data. A stream is an unbounded sequence of events. A table is a snapshot of accumulated state at a point in time. Tables can be converted to streams by capturing changes. Streams can be converted to tables by aggregating events.

This duality enables flexible pipeline design. Enrich streaming events by joining with lookup tables. Materialize streams into queryable tables. Move between representations as processing requirements demand.
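
Both directions of the duality fit in a short sketch with invented `users` and `clicks` data: join streaming events against a lookup table for enrichment, then aggregate the enriched stream back into a queryable table:

```python
from collections import defaultdict

# Lookup table, e.g. materialized from a change-data-capture stream.
users = {"u1": "EU", "u2": "US"}

# Incoming event stream of (user_id, page) clicks.
clicks = [("u1", "/home"), ("u2", "/home"), ("u1", "/pricing")]

# Stream enrichment: join each event with the lookup table.
enriched = [(uid, page, users.get(uid, "unknown")) for uid, page in clicks]

# Stream -> table: aggregate events into accumulated state.
clicks_by_region = defaultdict(int)
for _uid, _page, region in enriched:
    clicks_by_region[region] += 1
```

The reverse direction, table to stream, is exactly what change data capture does: each update to `users` would itself be an event.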

Exactly-Once Processing

Analytics require accuracy. If pipeline failures cause duplicate or lost events, downstream metrics become unreliable. Exactly-once processing guarantees that each event affects output exactly once regardless of failures.

Achieving exactly-once requires coordination across components. Producers must track what they have sent. Processors must combine reads, transformations, and writes atomically. Consumers must handle deduplication at sink boundaries.

Modern stream processing platforms provide exactly-once semantics within their boundaries. Extending guarantees across system boundaries requires additional design attention.
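
One common way to extend effective exactly-once behavior across a sink boundary is an idempotent write keyed by a unique event id. A minimal in-memory sketch; a real sink would persist the applied-id set atomically with the data:

```python
class IdempotentSink:
    """Sink that ignores redelivered events, making writes safe to retry."""

    def __init__(self):
        self.applied = set()  # in production: stored atomically with the data
        self.total = 0

    def write(self, event_id, amount):
        if event_id in self.applied:
            return False      # duplicate delivery: no effect on output
        self.applied.add(event_id)
        self.total += amount
        return True

sink = IdempotentSink()
for eid, amt in [("e1", 10), ("e2", 5), ("e1", 10)]:  # "e1" is redelivered
    sink.write(eid, amt)
```

With this shape, upstream components only need to guarantee at-least-once delivery; the sink turns redelivery into a no-op.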

Data Quality Considerations

Schema Management

Data pipelines must handle evolving schemas. Sources change as applications evolve. Consumer requirements shift as analytical needs develop.

Schema registries track schema versions and enforce compatibility rules. Transformation logic must handle multiple schema versions gracefully. Consider explicit versioning strategies that make schema handling predictable rather than heuristic.
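
An explicit versioning strategy can be as simple as a normalization step that upgrades every record to the latest schema. A sketch with invented v1/v2 field names:

```python
def normalize(record):
    """Upgrade a record from any known schema version to the latest shape.

    v1 records used a single `name` field; v2 split it into
    `first_name`/`last_name`. (Field names are illustrative.)
    """
    version = record.get("schema_version", 1)
    if version == 1:
        first, _, last = record["name"].partition(" ")
        return {"schema_version": 2, "first_name": first, "last_name": last}
    if version == 2:
        return record
    raise ValueError(f"unsupported schema version: {version}")

old = normalize({"name": "Ada Lovelace"})
new = normalize({"schema_version": 2, "first_name": "Ada", "last_name": "Lovelace"})
```

Unknown versions fail loudly rather than being guessed at, which keeps schema handling predictable.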

Data Validation

Garbage in, garbage out applies powerfully to data pipelines. Implement validation at ingestion to catch problems early. Validate schema conformance, value ranges, referential integrity, and business rules.

Quarantine invalid records rather than silently dropping them. Invalid data often indicates upstream issues that need attention. Metrics on validation failures provide visibility into data quality trends.
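
A validation-and-quarantine step can be sketched directly; the specific rules and field names here are illustrative:

```python
def validate(record):
    """Return a list of rule violations; an empty list means valid."""
    errors = []
    if not isinstance(record.get("user_id"), str) or not record["user_id"]:
        errors.append("missing user_id")
    if not (0 <= record.get("amount", -1) <= 10_000):
        errors.append("amount out of range")
    return errors

def ingest(records):
    """Split records into accepted and quarantined, keeping failure reasons."""
    accepted, quarantined = [], []
    for record in records:
        errors = validate(record)
        if errors:
            quarantined.append({"record": record, "errors": errors})  # keep, don't drop
        else:
            accepted.append(record)
    return accepted, quarantined

good, bad = ingest([
    {"user_id": "u1", "amount": 50},
    {"user_id": "", "amount": 99_999},
])
```

Counting entries in the quarantine per rule gives exactly the validation-failure metrics the text recommends.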

Monitoring and Alerting

Data pipelines require monitoring beyond typical application metrics. Track data volumes, processing latency, and output freshness. Alert on anomalies that might indicate source issues, processing failures, or quality degradation.

End-to-end monitoring catches issues that component-level monitoring misses. Implement data quality tests that verify expected patterns in output data. When output data looks wrong, you want to know immediately rather than after business users report problems.
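
Output freshness is one of the simplest end-to-end checks to automate. A sketch, where the per-source timestamp map and the lag threshold are assumptions:

```python
import time

def freshness_alerts(last_event_ts_by_source, max_lag_secs, now=None):
    """Return sources whose newest data is older than the allowed lag."""
    now = time.time() if now is None else now
    return sorted(
        source
        for source, ts in last_event_ts_by_source.items()
        if now - ts > max_lag_secs
    )

now = 1_000_000
stale = freshness_alerts(
    {"orders": now - 30, "clicks": now - 900},  # clicks hasn't updated in 15 min
    max_lag_secs=300,
    now=now,
)
```

Run on a schedule and wired to alerting, a check like this flags a stalled source before business users notice stale dashboards.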

Scaling Considerations

Horizontal Scaling

Data pipelines must scale horizontally to handle growing volumes. Design from the start for distribution across multiple processing nodes.

Partitioning strategies determine how work distributes. Key-based partitioning ensures related events route to the same processor, enabling stateful operations. Round-robin partitioning maximizes parallelism for stateless transformations.
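
Key-based partitioning needs a hash that is stable across processes and restarts. A sketch using CRC32, since Python's built-in `hash()` is salted per process and would route the same key differently after a restart:

```python
import zlib

def partition_for(key, num_partitions):
    """Stable key -> partition mapping: a key always routes to the same processor."""
    return zlib.crc32(key.encode()) % num_partitions

p1 = partition_for("user-42", num_partitions=8)
p2 = partition_for("user-42", num_partitions=8)
# the same key maps to the same partition on every call and every process
```

Because all events for `user-42` land on one partition, a stateful operator on that partition can maintain per-user aggregates without coordination.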

Backpressure Handling

When processing cannot keep pace with incoming data, systems must handle backpressure gracefully. Options include slowing producers, buffering in durable queues, or degrading non-critical processing.

Design explicit backpressure strategies rather than discovering them during incidents. Understand your buffering capacity and what happens when limits are reached.
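
Python's bounded `queue.Queue` makes the trade-off concrete. A sketch in which a full buffer rejects new items after a short wait instead of buffering without bound:

```python
import queue

buffer = queue.Queue(maxsize=2)  # explicit, bounded buffering capacity

def try_produce(q, item, timeout=0.01):
    """Enqueue with backpressure: give up after `timeout` if the buffer is full.

    Returning False lets the caller slow down, shed non-critical work,
    or spill to durable storage instead of growing memory without limit.
    """
    try:
        q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False

accepted = [try_produce(buffer, i) for i in range(3)]
# the third put is rejected because the buffer holds only two items
```

The important property is that the "buffer full" case is an explicit branch in the code, decided at design time rather than during an incident.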

Cost Optimization

Data pipeline costs can escalate rapidly with scale. Storage costs accumulate for historical data. Processing costs scale with volume and complexity. Networking costs add up for data movement.

Implement cost monitoring alongside technical metrics. Consider tiered storage, processing efficiency improvements, and data lifecycle policies that balance analytical value against cost.

Operational Excellence

Running data pipelines in production requires operational maturity. Build observability from the start. Implement deployment automation that enables confident changes. Document pipeline logic for future maintainers.

Pipeline failures at scale affect many downstream consumers. Invest in reliability engineering appropriate to your criticality requirements. Test failure scenarios and recovery procedures before they occur in production.

Modern data pipelines enable organizations to move from historical analysis to real-time insight. The architecture patterns that support this capability require thoughtful design and ongoing operational attention. Organizations that master data pipeline engineering gain lasting competitive advantage through superior data-driven decision making.


Sarma

