Open Source · MIT · Helm + Bash + ArgoCD

Next.js on Kubernetes, production-grade in five commands.

A Helm chart for the app and a version-pinned bootstrap for the platform: ingress, TLS, autoscaling, metrics, logs, alerts, spend tracking and an optional GitOps path. None of the platform YAML is yours to write.

5 commands to live · ~8 min platform install · 6 pinned upstream charts · 3 Grafana dashboards · MIT licence

Why this exists

Most teams reach for Kubernetes when they outgrow Vercel or want to cut a hosting bill that has stopped making sense. Then they spend two weeks configuring the same things every other team configures: ingress, cert-manager, monitoring, logging, autoscaling, secrets. The end result is fine; the path to it is a waste.

The big ecosystems solve much larger problems. Argo and Crossplane bring serious machinery for serious orgs. Backstage brings a developer portal. The lighter starters often skip observability entirely and leave the next person to wire metrics by hand.

The toolkit sits in the middle: a small Helm chart you can read in twenty minutes, a single version-pinned bash installer for the platform, an ArgoCD app-of-apps if you prefer GitOps, and dashboards and alert rules already tuned for Next.js workloads. The week you would have spent, given back.

Why this matters

The first week on any new cluster is identical across teams: ingress, TLS, autoscaling, metrics, logs, alerts. Burning it every project is a tax. The toolkit pays that tax once, in public, and pins the answers so every cluster after this one inherits them. The time you keep is the entire point.

What is in the box

Everything below is pinned, tested, and wired together by the installer. Nothing here is aspirational.

Helm chart for Next.js

Deployment with a tuned rolling-update strategy (maxSurge 1, maxUnavailable 0), hardened pod and container security contexts, ClusterIP Service, ingress with cert-manager TLS, HorizontalPodAutoscaler, PodDisruptionBudget, liveness and readiness probes on /api/health, and a Prometheus ServiceMonitor scraping /api/metrics.

cert-manager with Let's Encrypt

The installer creates the Let's Encrypt production ClusterIssuer and wires every ingress to request a real certificate. Automatic renewal. A bundled alert fires when a certificate is within fourteen days of expiry.
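Renewal normally happens long before this alert matters: by default cert-manager renews a certificate two thirds of the way through its lifetime, roughly 30 days before expiry for a 90-day Let's Encrypt certificate, so a certificate still inside the fourteen-day window usually means renewal is failing. The bundled rule is PromQL; the sketch below only mirrors its arithmetic in bash, taking the certificate's notAfter time and the current time as Unix timestamps. The function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers illustrating the certificate-expiry alert threshold.
# Inputs are Unix timestamps in seconds: the certificate's notAfter time
# and the current time.
readonly EXPIRY_WINDOW_DAYS=14

# Print whole days remaining until expiry, truncated toward zero.
days_until_expiry() {
  local not_after="$1" now="$2"
  echo $(( (not_after - now) / 86400 ))
}

# Succeed (exit status 0) when the certificate is inside the alert window.
in_expiry_window() {
  local days
  days="$(days_until_expiry "$1" "$2")"
  [ "$days" -lt "$EXPIRY_WINDOW_DAYS" ]
}

# Example: a certificate that expires 13 days from "now".
now=1700000000
not_after=$(( now + 13 * 86400 ))
if in_expiry_window "$not_after" "$now"; then
  echo "alert: certificate expires in $(days_until_expiry "$not_after" "$now") days"
fi
```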

kube-prometheus-stack

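Prometheus, Grafana, Alertmanager and node-exporter installed from the upstream community chart at a pinned version. The chart's ServiceMonitor carries the release: monitoring label that kube-prometheus-stack's Prometheus selects by default, so app metrics are scraped without extra config.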

Loki 3.x with Promtail

Logs go from stdout to Promtail to Loki, queryable from Grafana with the same Explore UI as metrics. Labels keep the index small; the bulk lives on object storage.

Alertmanager rules

Bundled PrometheusRule covers crash-looping pods, ingress-nginx 5xx spikes above five percent, p99 latency above two seconds, certificate expiry inside fourteen days, and PV space predicted to exhaust within six hours. Slack webhook wired by the installer.

HPA on CPU by default

Default min 2, max 10, target 70 percent CPU utilisation (measured against the pod's CPU request). For requests-per-second autoscaling, a documented pattern in the wiki swaps in a custom-metrics HPA fed by the ServiceMonitor's metrics through Prometheus Adapter.
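The scaling decision itself is simple arithmetic. In the Kubernetes HPA algorithm, the desired replica count is the current count multiplied by the ratio of current to target utilisation, rounded up and clamped to the min and max. A minimal bash sketch using the chart defaults; the function name is hypothetical, and the controller's 10 percent tolerance and stabilisation windows are deliberately left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the HorizontalPodAutoscaler rule for a CPU utilisation target:
#   desired = ceil(currentReplicas * currentUtilisation / targetUtilisation)
# clamped to [min, max]. Utilisation is average CPU usage as a percentage of
# the pods' CPU requests. Tolerance and stabilisation windows are omitted.
hpa_desired_replicas() {
  local current="$1" cpu_pct="$2" target_pct="$3" min="$4" max="$5"
  # Integer ceiling division: (a + b - 1) / b
  local desired=$(( (current * cpu_pct + target_pct - 1) / target_pct ))
  if (( desired < min )); then desired="$min"; fi
  if (( desired > max )); then desired="$max"; fi
  echo "$desired"
}

# Chart defaults: min 2, max 10, target 70 percent.
hpa_desired_replicas 4 90 70 2 10    # 4 pods at 90% -> 6
hpa_desired_replicas 2 10 70 2 10    # nearly idle -> floor of 2
hpa_desired_replicas 8 140 70 2 10   # overloaded -> capped at 10
```

With the default request of 100m CPU, a 70 percent target means the HPA aims for an average of about 70m per pod.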

ingress-nginx, documented

The ingress controller most teams already run. The chart documents annotations for body-size limits, WebSocket support, redirects and per-host TLS. ingress-nginx is exposed as a LoadBalancer Service and is the cluster's single public entry point.

PodDisruptionBudget on by default

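minAvailable: 1 keeps at least one replica running during voluntary disruptions such as node drains and cluster upgrades. Disable it per release (pdb.enabled: false) when unblocked drains matter more than that floor, for example on a single-replica workload, where minAvailable: 1 would stop a drain from evicting the only pod.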
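The floor works through the budget's allowed disruptions: an eviction is permitted only while the number of healthy pods exceeds minAvailable. A small bash illustration of that rule; the function name is hypothetical, and in a real cluster the check is enforced by the Kubernetes eviction API.

```shell
#!/usr/bin/env bash
set -euo pipefail

# How a PodDisruptionBudget with minAvailable gates voluntary evictions:
#   disruptionsAllowed = max(0, currentHealthy - minAvailable)
# A node drain can evict a pod only while disruptionsAllowed > 0.
pdb_disruptions_allowed() {
  local healthy="$1" min_available="$2"
  local allowed=$(( healthy - min_available ))
  if (( allowed < 0 )); then allowed=0; fi
  echo "$allowed"
}

pdb_disruptions_allowed 2 1   # chart defaults: one pod may be evicted at a time
pdb_disruptions_allowed 1 1   # single replica: eviction is refused, so the drain waits
```

With the HPA minimum of 2, one pod can always be drained; a release scaled down to a single replica is where the budget blocks drains.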

No service mesh by design

For a small fleet of Next.js workloads, mesh complexity is rarely worth the operational cost. The toolkit deliberately does not ship one. You add Linkerd or Istio when you have a reason.

Plain Helm, no operator

You can read every template, copy it, fork it. No CRDs to learn beyond what cert-manager and Prometheus already require. No hidden state.

OpenCost spend dashboard

OpenCost is pinned and installed with the rest. A bundled Grafana dashboard breaks down cluster spend by namespace and workload so you can see where the money goes.

ArgoCD app-of-apps for GitOps

Prefer reconciliation from git over a bash installer? The same components are described as ArgoCD Applications under gitops/argocd, tied together by a root app-of-apps. Apply the root once and git becomes the source of truth.

End-to-end pytest suite

A real Helm render is fed into pytest fixtures that assert pod-selector match, service target-port wiring, TLS wiring, gating of optional objects, and version parity between the installer and the GitOps Applications.
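The pod-selector check is the one that catches a Service or Deployment whose selector silently stops matching the pod template. The suite does this in pytest against the rendered manifests; the sketch below only shows the same matching rule in bash, with labels passed as space-separated key=value pairs. The function name and the label values in the example are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# A selector matches when every one of its key=value pairs appears in the
# pod's labels. Both arguments are space-separated key=value lists.
selector_matches() {
  local selector="$1" pod_labels="$2" pair
  for pair in $selector; do
    case " $pod_labels " in
      *" $pair "*) ;;      # pair present, keep checking
      *) return 1 ;;       # missing pair: the selector does not match
    esac
  done
  return 0
}

# Hypothetical labels from a rendered Service and pod template.
pod_labels='app.kubernetes.io/name=nextjs-app app.kubernetes.io/instance=my-app pod-template-hash=5d9f7c'
service_selector='app.kubernetes.io/name=nextjs-app app.kubernetes.io/instance=my-app'
if selector_matches "$service_selector" "$pod_labels"; then
  echo "selector matches pod template"
else
  echo "selector does not match pod template" >&2
fi
```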

Version-pinned everything

Every upstream chart version is declared in scripts/install.sh and mirrored in the Argo Applications. The same command produces the same platform every time.
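Because the same versions live in two places, a parity check is what keeps them honest. A minimal bash sketch, assuming each ArgoCD Application pins its chart with spec.source.targetRevision; the helper names and the Application file path are hypothetical, and the repository's own check is the pytest suite.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the value of a pin such as KPS_VERSION=65.1.0 from the install script.
pin_from_script() {
  local script="$1" var="$2"
  sed -n "s/^${var}=//p" "$script"
}

# Print the targetRevision from an Application manifest, with quotes removed.
# Assumes a single-source Application (one targetRevision line).
target_revision() {
  sed -n 's/^[[:space:]]*targetRevision:[[:space:]]*//p' "$1" | tr -d "\"'"
}

# Succeed when both files pin the same version; report a mismatch otherwise.
check_parity() {
  local script="$1" var="$2" app="$3" pinned revision
  pinned="$(pin_from_script "$script" "$var")"
  revision="$(target_revision "$app")"
  if [ -n "$pinned" ] && [ "$pinned" = "$revision" ]; then
    echo "ok: $var=$pinned"
  else
    echo "mismatch: $var='$pinned' vs targetRevision='$revision'" >&2
    return 1
  fi
}

# Usage (hypothetical Application path):
#   check_parity scripts/install.sh KPS_VERSION gitops/argocd/kube-prometheus-stack.yaml
```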

Cluster topology

Internet traffic enters through ingress-nginx, lands on the Next.js pods, and produces metrics and logs that Prometheus, Loki and OpenCost feed back into Grafana and Alertmanager.

Cluster topology: ingress-nginx fronts the Next.js pods; Prometheus, Loki and OpenCost feed Grafana; Alertmanager routes to Slack; ArgoCD optionally reconciles the platform from git.

Request path

What a user request actually touches, from the LoadBalancer to the streamed response.

Request path: ingress-nginx terminates TLS with the certificate issued by cert-manager, routes by host header, and the pod emits metrics and logs out of band.

Quick start

Five commands. About eight minutes from an empty cluster to ingress, TLS, metrics, logs, alerts, spend tracking, and your first app running.

01 Clone the repo
git clone https://github.com/sarmakska/k8s-ops-toolkit.git
cd k8s-ops-toolkit
export KUBECONFIG=~/.kube/your-cluster.yaml
02 Bootstrap the platform
./scripts/install.sh \
  --domain example.com \
  --email you@example.com \
  --slack-webhook https://hooks.slack.com/services/...
03 Load the bundled dashboards
./scripts/load-dashboards.sh
# Installs the Cluster Overview, Next.js app and OpenCost spend dashboards as ConfigMaps picked up by the Grafana sidecar
04 Deploy your Next.js app
helm install my-app ./charts/nextjs-app \
  --set image.repository=ghcr.io/you/my-app \
  --set image.tag=v1.0.0 \
  --set ingress.host=app.example.com \
  --set replicas=3
05 Open Grafana
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80
# Browse to http://localhost:3000 and log in as admin. In a second terminal, read the password with:
# kubectl -n monitoring get secret monitoring-grafana -o jsonpath='{.data.admin-password}' | base64 -d

A real values.yaml

The actual default values that ship with charts/nextjs-app. Sensible defaults, then override only what your service needs.

replicas: 2

image:
  repository: ghcr.io/your-org/your-app
  tag: latest
  pullPolicy: IfNotPresent
  pullSecrets: []

rollingUpdate:
  maxSurge: 1
  maxUnavailable: 0

service:
  port: 3000

ingress:
  enabled: true
  className: nginx
  host: app.example.com
  annotations: {}
  tls:
    enabled: true
    issuer: letsencrypt-prod

resources:
  requests: { cpu: 100m, memory: 256Mi }
  limits:   { cpu: 1000m, memory: 1Gi }

autoscaling:
  enabled: true
  min: 2
  max: 10
  targetCPU: 70

pdb:
  enabled: true
  minAvailable: 1

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile: { type: RuntimeDefault }

containerSecurityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop: [ALL]

probes:
  liveness:
    path: /api/health
    initialDelaySeconds: 30
    periodSeconds: 10
  readiness:
    path: /api/health
    initialDelaySeconds: 5
    periodSeconds: 5

monitoring:
  enabled: true
  prometheusServiceMonitor: true
  metricsPath: /api/metrics
  metricsPort: 3000
  interval: 30s
  serviceMonitorLabels:
    release: monitoring

Full reference: Helm-Chart wiki page

Platform components

Every upstream chart is pinned in scripts/install.sh and mirrored in gitops/argocd, so both paths install the same versions.

| Component | Purpose |
| --- | --- |
| ingress-nginx | Layer-7 ingress controller exposed as a LoadBalancer Service. The cluster's only public endpoint. |
| cert-manager | Issues and renews TLS certificates via Let's Encrypt. The installer creates the ClusterIssuer. |
| kube-prometheus-stack | Prometheus, Alertmanager, Grafana, node-exporter and kube-state-metrics. The metrics backbone. |
| Loki 3.x | Log aggregation. Cheap to run because it indexes labels, not log content. |
| Promtail | Sidecar-less log shipper. Runs on every node, reads container stdout from the node and ships it to Loki. |
| OpenCost | Spend attribution. Queries Prometheus for utilisation and reports cost per namespace and per workload. |
| Alertmanager | Routes alerts to Slack via the installer-supplied webhook. PagerDuty or Opsgenie is one variable away. |

Bundled alert rules

The PrometheusRule lives in manifests/prometheus-rules/app-rules.yaml. Prometheus loads it automatically as long as its release label matches the kube-prometheus-stack release.

| Alert | Severity | Fires when |
| --- | --- | --- |
| KubePodCrashLooping | critical | A container restarts more than five times in ten minutes |
| KubePersistentVolumeFillingUp | warning | A PV is predicted to run out of space within six hours |
| IngressNginxHigh5xxRate | critical | Ingress 5xx ratio above five percent for five minutes |
| IngressNginxHighLatency | warning | Ingress p99 latency above two seconds for five minutes |
| CertManagerCertificateExpirySoon | warning | A certificate is within fourteen days of expiry and has not been renewed |

Slack webhook wired by the installer through manifests/values-alertmanager.yaml.
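Two of the thresholds reduce to arithmetic worth spelling out. The sketch below mirrors the 5xx ratio comparison and a linear extrapolation of free volume space, in the spirit of PromQL's predict_linear. It is an illustration only: the bundled rules are PromQL, the helper names are hypothetical, and the "for five minutes" duration is handled by each rule's for clause, which the sketch ignores.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 5xx ratio: succeed when errors / total is strictly above threshold_pct.
# Cross-multiplying keeps everything in integer arithmetic.
five_xx_above() {
  local errors="$1" total="$2" threshold_pct="$3"
  (( total > 0 && errors * 100 > total * threshold_pct ))
}

# Volume fill time: linear extrapolation of free space.
# Prints whole hours until the volume is full, or "never" if usage is
# flat or shrinking.
hours_until_full() {
  local available_bytes="$1" bytes_per_hour="$2"
  if (( bytes_per_hour <= 0 )); then
    echo never
  else
    echo $(( available_bytes / bytes_per_hour ))
  fi
}

# 60 errors out of 1000 requests is 6 percent, above the 5 percent threshold.
if five_xx_above 60 1000 5; then
  echo "IngressNginxHigh5xxRate would fire"
fi

# 10 GiB free, filling at 2 GiB per hour: 5 hours, inside the six-hour window.
hours_until_full $(( 10 * 1024**3 )) $(( 2 * 1024**3 ))
```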

The install script

The platform install is a single bash script. Every chart version is pinned and every step is idempotent.

#!/usr/bin/env bash
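set -euo pipefail

# EMAIL (used by the ClusterIssuer below) is set from the --email flag;
# argument parsing is omitted from this excerpt.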

INGRESS_VERSION=4.11.3
CERT_MANAGER_VERSION=v1.15.3
KPS_VERSION=65.1.0
LOKI_VERSION=6.16.0
PROMTAIL_VERSION=6.16.5
OPENCOST_VERSION=2.4.6

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --version "$INGRESS_VERSION" -n ingress-nginx --create-namespace

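helm upgrade --install cert-manager jetstack/cert-manager \
  --version "$CERT_MANAGER_VERSION" -n cert-manager --create-namespace \
  --set installCRDs=true --wait
# --wait holds until the cert-manager pods, including the webhook, are ready,
# so the ClusterIssuer below is not applied while the webhook is still starting.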
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata: { name: letsencrypt-prod }
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${EMAIL}
    privateKeySecretRef: { name: letsencrypt-prod }
    solvers: [ { http01: { ingress: { class: nginx } } } ]
EOF

helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
  --version "$KPS_VERSION" -n monitoring --create-namespace \
  -f manifests/values-alertmanager.yaml

helm upgrade --install loki grafana/loki \
  --version "$LOKI_VERSION" -n monitoring -f manifests/values-loki.yaml
helm upgrade --install promtail grafana/promtail \
  --version "$PROMTAIL_VERSION" -n monitoring

helm upgrade --install opencost opencost/opencost \
  --version "$OPENCOST_VERSION" -n monitoring

Truncated for the page. The real script also wires the Slack webhook, waits for cert-manager webhooks, and prints a readiness summary.
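The quick start passes --domain, --email and --slack-webhook, but the excerpt above leaves out argument handling, so EMAIL is never set and set -u would stop the script at the ClusterIssuer. A minimal sketch of a parser for those flags follows. It is hypothetical (the real script may parse differently), and DOMAIN and SLACK_WEBHOOK are illustrative variable names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parser for the installer's --domain, --email and
# --slack-webhook flags. EMAIL matches the variable the ClusterIssuer
# heredoc expands; DOMAIN and SLACK_WEBHOOK are illustrative names.
DOMAIN="" EMAIL="" SLACK_WEBHOOK=""

parse_args() {
  while (( $# > 0 )); do
    case "$1" in
      # ${2:?...} aborts the script with a message if the value is missing.
      --domain)        DOMAIN="${2:?--domain needs a value}"; shift 2 ;;
      --email)         EMAIL="${2:?--email needs a value}"; shift 2 ;;
      --slack-webhook) SLACK_WEBHOOK="${2:?--slack-webhook needs a value}"; shift 2 ;;
      *) echo "unknown argument: $1" >&2; return 1 ;;
    esac
  done
  # Fail before any helm call if the ClusterIssuer would have no email.
  if [ -z "$EMAIL" ]; then
    echo "install.sh: --email is required for the Let's Encrypt ClusterIssuer" >&2
    return 1
  fi
}

# In the installer, call it once before the first helm command:
#   parse_args "$@"
```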

GitOps with ArgoCD

Prefer the platform reconciled from git instead of installed by hand? Apply the app-of-apps root once.

ArgoCD app-of-apps: a single root Application reconciles every platform component from gitops/argocd.

Full GitOps walkthrough: GitOps wiki page

Use cases

What teams actually run this for.

First production cluster

Greenfield team going from "we deploy to Vercel" to "we run our own k8s." Skip the week of yak-shaving on ingress, TLS, autoscaling and metrics.

Adding observability later

Apps already running but no metrics or logs. The installer drops in Prometheus, Loki and Grafana in an afternoon without touching workloads.

Standardising deploys

Pin every Next.js service in your org to the same chart. Consistent probes, consistent autoscaling, consistent alerts, consistent labels.

Cost-controlled SaaS infrastructure

One DigitalOcean cluster hosting as many services as its nodes can hold. OpenCost shows where the spend goes. Predictable bill.

Platform team with multiple Next.js services

Each service installs the chart with its own values file. Helm release name is the unit of isolation. ArgoCD reconciles the platform from git.

Staging environments that look like prod

Same install script, smaller node pool. Real TLS, real metrics, real alerts, a fraction of the spend.

k8s-ops-toolkit vs alternatives

How the toolkit compares with other ways to put a Next.js app on Kubernetes, honestly and scope by scope.

| Feature | k8s-ops-toolkit | Stock Helm + bash | Backstage | Pure ArgoCD | Vercel |
| --- | --- | --- | --- | --- | --- |
| Helm chart for Next.js | Yes, opinionated | Build yourself | Via plugin | Bring your own | N/A |
| TLS via cert-manager | Pinned + wired | Manual install | Out of scope | Manual install | Managed |
| Prometheus + Grafana | Pinned + dashboards | Manual install | Out of scope | Manual install | Managed |
| Loki for logs | Pinned | Manual install | Out of scope | Manual install | Managed |
| OpenCost spend | Pinned + dashboard | Manual install | Out of scope | Manual install | Limited |
| GitOps reconcile | ArgoCD app-of-apps | Bring your own | Out of scope | Yes, native | N/A |
| E2E test suite | pytest renders chart | No | No | No | N/A |
| Licence | MIT | MIT components | Apache 2.0 | Apache 2.0 | Commercial |
| Total time to live | About 8 minutes | Days | Days | Hours | Minutes (managed) |

Tech stack

Every piece pinned. No surprise minor-version drift.

Kubernetes 1.31+ · Helm 3.17 · ingress-nginx · cert-manager · Prometheus · Grafana · Loki 3.x · Promtail · Alertmanager · kube-prometheus-stack · OpenCost · ArgoCD · pytest + pyyaml · ShellCheck · bash bootstrap

Frequently asked

The questions that come up most often before adoption.

Why not just use Vercel?

For some teams Vercel is the right answer forever. For others, three or four services on Vercel cost more than a single $70-a-month DigitalOcean cluster that can host all of them and more. This toolkit is for the day you cross that line.

Does it lock me into ingress-nginx?

No. ingress-nginx is the default because it is the controller most teams already run and the one the bundled rules and dashboards target. Swap to Traefik or Contour by changing ingress.className and the chart still works; you would re-author the nginx-specific annotations, alerts and dashboards.

How is this different from Argo, Crossplane, Backstage?

Those solve much larger problems and bring much heavier machinery. This toolkit is the small platform layer most teams need. The ArgoCD app-of-apps is an opt-in path, not a replacement for the imperative installer.

Can I run the installer twice?

Yes. Every chart is installed with helm upgrade --install and the ClusterIssuer with kubectl apply, so the script is idempotent: re-running it converges on the same pinned versions and the same values.

How do I add a custom Grafana dashboard?

Drop a JSON file into manifests/grafana-dashboards/ and re-run scripts/load-dashboards.sh. The sidecar discovers ConfigMaps with the grafana_dashboard label and loads them on the next reconcile.
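What the sidecar looks for is a ConfigMap with the grafana_dashboard label and the dashboard JSON as a data entry. A sketch that prints one for a given JSON file; the label value "1" follows kube-prometheus-stack's default sidecar settings, and the function name, naming scheme and monitoring namespace are assumptions rather than what load-dashboards.sh necessarily does.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a ConfigMap that Grafana's dashboard sidecar
# can discover. Name, namespace and label value are assumptions to adapt.
dashboard_configmap() {
  local name="$1" json_file="$2" namespace="${3:-monitoring}"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ${name}
  namespace: ${namespace}
  labels:
    grafana_dashboard: "1"
data:
  ${name}.json: |
EOF
  # Indent the JSON by four spaces so it sits inside the YAML block scalar.
  sed 's/^/    /' "$json_file"
}

# Usage:
#   dashboard_configmap my-dashboard manifests/grafana-dashboards/my-dashboard.json | kubectl apply -f -
```

kubectl create configmap with --from-file and --dry-run=client -o yaml is an alternative that avoids hand-indenting the JSON; the label then has to be added separately.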

What about secrets management?

The chart supports inline env, individual secret-backed env, and whole-Secret envFrom mounting. Sealed Secrets or External Secrets Operator are documented patterns; neither is pinned by default because the right choice is team-specific.

Does autoscaling on CPU cover real-world Next.js?

For most workloads, yes. For request-bound services with long-tail latency, the wiki includes a pattern for HPA on requests-per-second sourced from the ServiceMonitor via Prometheus Adapter.

How are upgrades managed?

Upstream chart versions are pinned in scripts/install.sh and mirrored in gitops/argocd. Bumping a version means editing the pin in both places, re-running the installer (or syncing in Argo), and running the e2e pytest suite, which checks that the chart still renders cleanly and that the two sets of pins agree.

Stop yak-shaving the platform

Clone the repo, run the installer, deploy the chart. The same five commands every time, on every cluster.