
DevOps · Kubernetes 1.36 · Kustomize · Helm 4.2 · Gateway API 1.6

Kubernetes

Declarative, resourced, probed workloads — safe K8s manifests.

kubernetes · k8s · devops

Updated 5 Jul 2026 · CC0

AGENTS.md · repo root

You author and operate Kubernetes workload manifests. "Good" here means declarative, version-controlled YAML that a kubectl diff shows converging cleanly: every container has resources, three probes, a non-root hardened securityContext, graceful shutdown, and an availability story (PDB + HPA + spread). No :latest, no root, no secrets in git, no unbounded pods.

Stack

  • Kubernetes 1.36 "Haru" — target the N-2 supported window (1.34–1.36). Do not use APIs that GA'd after your minimum target.
  • Manifests: Kustomize (built into kubectl 1.36 via kubectl apply -k, or standalone kustomize v5) for env overlays; Helm 4.2 for packaged/redistributed apps.
  • Helm 4.2.x — installs use Server-Side Apply by default and kstatus readiness gating. Note renamed flags: --rollback-on-failure (was --atomic), --force-replace (was --force); post-renderers are now Wasm/plugins, not executable paths. Don't lean on the deprecated aliases: they only reliably warn-then-work on upgrade; install --atomic was dropped and hard-errored (unknown flag: --atomic) until 4.1.3 restored the binding — write the new flag names.
  • GitOps delivery: Argo CD or Flux reconcile from git. Humans never kubectl edit/apply/patch prod.
  • Secrets: External Secrets Operator (ESO) 2.6 syncing from Vault/cloud KMS; or Sealed Secrets for a git-only workflow. Never plaintext Secret data in git.
  • Policy/admission: native ValidatingAdmissionPolicy (CEL, GA) for guardrails; Kyverno 1.18 where mutation/generation is needed. Enforce Pod Security Admission restricted per namespace.
  • Ingress: Gateway API 1.6 (gateway.networking.k8s.io/v1) for new north-south traffic; legacy Ingress only to match existing infra.
  • Pinned apiVersions: apps/v1, batch/v1, autoscaling/v2, policy/v1, networking.k8s.io/v1, rbac.authorization.k8s.io/v1, gateway.networking.k8s.io/v1. Never extensions/v1beta1, policy/v1beta1, autoscaling/v2beta2.
  • Tooling in CI: kubeconform (schema), conftest/OPA or Kyverno CLI (policy), helm-unittest (Helm), kube-score, trivy (image + IaC), yamllint.
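
As a rough illustration of the Gateway API item above, a route for new north-south traffic might look like the sketch below. The Gateway name public-gateway, the gateway-infra and payments-prod namespaces, the hostname, and port 8080 are assumptions made for the example, not values prescribed by these rules.

```yaml
# Sketch: HTTPRoute attaching a Service to a shared Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-api
  namespace: payments-prod          # hypothetical namespace
spec:
  parentRefs:
    - name: public-gateway          # hypothetical Gateway owned by the platform team
      namespace: gateway-infra      # hypothetical namespace
  hostnames:
    - payments.example.com          # hypothetical hostname
  rules:
    - matches:
        - path: { type: PathPrefix, value: /api }
      backendRefs:
        - name: payments-api        # Service in the same namespace as the route
          port: 8080
```

A route in one namespace can only attach to a Gateway in another if that Gateway's listener allows it through allowedRoutes, so confirm that with whoever owns the Gateway.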

Project conventions

  • Kustomize layout: base/ holds environment-agnostic resources; overlays/{dev,staging,prod}/ patch them. One Kubernetes object per file (deployment.yaml, service.yaml), each listed in kustomization.yaml.
  • Helm layout: charts/<app>/{Chart.yaml,values.yaml,templates/}; ship a values.schema.json and lint with helm lint --strict. Pin apiVersion: v2 charts and dependency versions in Chart.yaml.
  • Recommended labels on every object — do not invent ad-hoc keys:
    labels:
      app.kubernetes.io/name: payments-api
      app.kubernetes.io/instance: payments-api-prod
      app.kubernetes.io/version: "1.8.3"
      app.kubernetes.io/component: api
      app.kubernetes.io/part-of: payments
      app.kubernetes.io/managed-by: argocd
    
  • Selectors are immutable — set spec.selector.matchLabels to a small stable subset (app.kubernetes.io/name + instance) and never change it; changing it forces delete/recreate.
  • Every namespaced object sets an explicit metadata.namespace (or inherits it from the Kustomize/Helm release) — never rely on the caller's current context.
  • Format with yamllint (2-space indent, no tabs, no trailing whitespace); keys ordered apiVersion, kind, metadata, spec.
  • Set annotations for provenance, not config: kubectl.kubernetes.io/last-applied-configuration is managed by the tool — don't hand-write it.
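
The Kustomize layout described above could be wired together roughly as follows. This is a minimal sketch: the patch file name resources-patch.yaml, the payments-prod namespace, and the part-of label value are illustrative assumptions. The two documents stand for two separate kustomization.yaml files.

```yaml
# base/kustomization.yaml: environment-agnostic resources, one object per file
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
---
# overlays/prod/kustomization.yaml: patches the base for prod
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: payments-prod              # hypothetical; sets metadata.namespace on every object
resources:
  - ../../base
labels:
  - pairs:
      app.kubernetes.io/part-of: payments
    includeSelectors: false           # label objects without touching immutable selectors
patches:
  - path: resources-patch.yaml        # hypothetical patch file
    target:
      kind: Deployment
      name: payments-api
images:
  - name: registry.example.com/payments-api
    newTag: "1.8.3"                   # pinned tag; a digest field can be used instead
```

Rendering with kustomize build overlays/prod (or kubectl apply -k) then yields the prod manifests without copying the base.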

Declarative & version-controlled

  • All cluster state lives in git and flows through Argo CD/Flux. kubectl edit, kubectl scale, kubectl patch, and imperative kubectl create are forbidden in staging/prod because they make the cluster drift from the source of truth in git.
  • Prefer Server-Side Apply (kubectl apply --server-side --field-manager=<tool>) so field ownership is tracked; this is Helm 4's and Argo's default.
  • Never commit generated artifacts (helm template output) as the source; commit the chart/overlay and render in CI.
  • When a controller owns a field (HPA owns replicas, VPA/in-place owns resources), omit that field from the manifest or add ignoreDifferences in Argo so GitOps and the controller don't fight.
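
One way the field-ownership rule above might look in Argo CD is sketched below. The repository URL, project, and namespaces are hypothetical. The ignoreDifferences entry assumes an HPA owns spec.replicas on the Deployment.

```yaml
# Sketch: Argo CD Application that applies server-side and leaves replicas to the HPA.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/payments.git   # hypothetical repo
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-prod                                  # hypothetical namespace
  syncPolicy:
    automated: { prune: true, selfHeal: true }
    syncOptions:
      - ServerSideApply=true            # track field ownership on apply
      - RespectIgnoreDifferences=true   # do not overwrite ignored fields during sync
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas                # owned by the HPA, not by git
```

Omitting replicas from the Deployment manifest altogether achieves the same result with less configuration. The ignoreDifferences route is mainly useful when a chart insists on rendering the field.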

Deployments & workloads

  • Deployment for stateless; StatefulSet for anything needing stable identity, ordered rollout, or per-pod PersistentVolumeClaim (volumeClaimTemplates); DaemonSet for node agents; Job/CronJob (batch/v1, set backoffLimit, activeDeadlineSeconds, ttlSecondsAfterFinished) for batch.
  • Rolling update tuned explicitly; add minReadySeconds so a pod must stay healthy before it counts as available:
    spec:
      revisionHistoryLimit: 5
      minReadySeconds: 10
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 0   # zero-downtime: never drop below desired
    
  • Sidecars are native init containers with restartPolicy: Always (GA) — they start before app containers, run for the pod lifetime, and support probes. Don't bolt sidecars into the main containers list where startup ordering matters.
  • Pin the image by immutable tag or digest; imagePullPolicy: IfNotPresent with a pinned tag:
    image: registry.example.com/payments-api@sha256:9f2c...   # or :1.8.3, never :latest
    
  • Set automountServiceAccountToken: false at pod level unless the workload calls the API server.
  • Schedule intentionally: topologySpreadConstraints, nodeAffinity, and tolerations — not nodeName.
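
The native sidecar pattern above can be sketched as a pod template fragment like the one below. The log-shipper container, its image tag, port, and probe path are made up for illustration.

```yaml
# Sketch: pod template fragment with a native sidecar (init container + restartPolicy: Always).
spec:
  automountServiceAccountToken: false   # this workload does not call the API server
  initContainers:
    - name: log-shipper                 # hypothetical sidecar
      image: registry.example.com/log-shipper:2.4.1   # hypothetical image and tag
      restartPolicy: Always             # marks the init container as a sidecar
      startupProbe:                     # app containers wait until this passes
        httpGet: { path: /ready, port: 9880 }
        periodSeconds: 2
        failureThreshold: 15
      resources:
        requests: { cpu: "50m",  memory: "64Mi" }
        limits:   { cpu: "200m", memory: "64Mi" }
  containers:
    - name: api
      image: registry.example.com/payments-api:1.8.3
      imagePullPolicy: IfNotPresent
```

Because the sidecar is declared under initContainers, it starts before api and is stopped after it, which is the ordering the main containers list cannot guarantee.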

Resources — requests AND limits on every container

  • Every container declares both. No container ships without them; enforce a namespace LimitRange default so nothing lands unbounded, and a ResourceQuota per namespace.
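  • Memory: set requests == limits so the container's memory is never overcommitted. Memory is incompressible: a limit below real usage means OOMKill, and no limit lets one pod push the node into memory pressure and get its neighbors evicted. (Full Guaranteed QoS additionally requires CPU requests == limits on every container; with a higher CPU limit the pod is Burstable.)
  • CPU: always set a request (it drives scheduling and HPA utilization). Set a CPU limit too, but keep it generous: a tight CPU limit causes CFS throttling and tail-latency spikes, so measure before tightening.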
    resources:
      requests: { cpu: "250m", memory: "512Mi" }
      limits:   { cpu: "1",    memory: "512Mi" }
    
  • In-place pod resize (resizePolicy) changes CPU/memory on a running pod without recreating it. The feature went GA in 1.35, but it has been beta and on by default since 1.33, so the field is accepted across the whole 1.34–1.36 window and is safe to set at the 1.34 floor; only the GA-level refinements, such as memory-limit decreases, arrive in 1.35. This is the one place the "no APIs newer than min-target" rule bends: the field predates the minimum target as beta, so manifests that set it still apply cleanly on the older clusters in the window.
    resizePolicy:
      - resourceName: cpu
        restartPolicy: NotRequired      # apply live to the cgroup, no restart
      - resourceName: memory
        restartPolicy: RestartContainer # decrease needs a restart to reclaim
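
The namespace-level guardrails mentioned at the top of this section could look roughly like this. The object names, the payments-prod namespace, and every quantity are placeholder assumptions to be sized from real usage.

```yaml
# Sketch: defaults for containers that omit resources, plus a namespace-wide budget.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: payments-prod            # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: "100m", memory: "128Mi" }   # applied when requests are missing
      default:        { cpu: "500m", memory: "128Mi" }   # applied when limits are missing
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-budget
  namespace: payments-prod
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 40Gi
    pods: "100"
```

The LimitRange is a safety net, not a substitute: manifests should still declare their own values, and the defaults here keep memory request equal to limit to match the rule above.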
    

Health probes — liveness + readiness + startup

  • Three distinct probes with distinct endpoints. Never point them at the same handler.
    • startupProbe guards slow boots and disables the other two until it passes. Budget = failureThreshold * periodSeconds. Use it instead of a long initialDelaySeconds.
    • readinessProbe gates Service traffic and rollouts; it MAY check hard dependencies (DB, cache) so a pod that can't serve is pulled from Endpoints.
    • livenessProbe checks only whether the process itself is wedged; it MUST NOT check external dependencies, or a DB blip would trigger a cluster-wide restart storm.
    startupProbe:   { httpGet: { path: /healthz, port: 8080 }, periodSeconds: 5,  failureThreshold: 30 }
    readinessProbe: { httpGet: { path: /readyz,  port: 8080 }, periodSeconds: 10, timeoutSeconds: 2, failureThreshold: 3 }
    livenessProbe:  { httpGet: { path: /livez,   port: 8080 }, periodSeconds: 15, timeoutSeconds: 2, failureThreshold: 3 }
    
  • Prefer httpGet/grpc over exec (exec forks a process each period). Keep timeoutSeconds small and realistic; leave successThreshold: 1 for liveness/startup (only readiness may raise it).
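
For a gRPC service, the same three-probe split might be expressed as below, with the timing arithmetic written out in comments. Port 9090 and the service names startup, readiness, and liveness are assumptions. They must match whatever the app registers with its gRPC health service.

```yaml
# Sketch: three distinct gRPC probes on one container.
startupProbe:
  grpc: { port: 9090, service: startup }      # hypothetical health service name
  periodSeconds: 5
  failureThreshold: 30    # startup budget = 30 * 5s = 150s before the container is restarted
readinessProbe:
  grpc: { port: 9090, service: readiness }    # may reflect hard dependencies
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3     # pulled from Endpoints after roughly 3 * 10s = 30s of failures
livenessProbe:
  grpc: { port: 9090, service: liveness }     # process health only, no dependencies
  periodSeconds: 15
  timeoutSeconds: 2
  failureThreshold: 3     # restarted after roughly 3 * 15s = 45s of failures
```

The detection times are approximate because each failed attempt can also take up to timeoutSeconds to be counted.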

Config & secrets

  • App config in ConfigMap, referenced via envFrom or projected volumes — never baked into the image, never hardcoded in the manifest.
  • No plaintext Secret data/stringData in git. Use ESO ExternalSecret (source of truth = Vault/AWS/GCP/Azure) or Sealed Secrets (kubeseal-encrypted, safe in a public repo).
    apiVersion: external-secrets.io/v1
    kind: ExternalSecret
    spec:
      refreshInterval: 1h
      secretStoreRef: { name: vault-backend, kind: ClusterSecretStore }
      target: { name: payments-db }
      data:
        - secretKey: password
          remoteRef: { key: prod/payments/db, property: password }
    
  • Change ConfigMap/Secret content by revision, not in place: use Kustomize configMapGenerator/secretGenerator (content-hash name suffix) or a checksum annotation on the pod template so pods roll on change. In-place edits to a mounted ConfigMap never restart pods, and values consumed as env vars are not refreshed at all.
  • Consume secrets as env vars or files; never log them, never pass via image build args.
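
The content-hash approach above can be sketched with a generator in the overlay and a plain reference in the Deployment. The ConfigMap name and the two literal keys are invented for the example.

```yaml
# overlays/prod/kustomization.yaml (excerpt)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
configMapGenerator:
  - name: payments-api-config         # hypothetical name; emitted as payments-api-config-<hash>
    literals:
      - LOG_LEVEL=info                # hypothetical keys
      - HTTP_PORT=8080
---
# base/deployment.yaml (container excerpt): reference the un-suffixed name
containers:
  - name: api
    envFrom:
      - configMapRef:
          name: payments-api-config   # Kustomize rewrites this to the hashed name
```

Changing a literal changes the hash, which changes the pod template, which triggers a normal rolling update.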

Graceful shutdown

  • Handle SIGTERM in the app: stop accepting new work, drain in-flight requests, close pools, exit 0.
  • Set terminationGracePeriodSeconds >= preStop sleep + max drain time (default is 30):
    terminationGracePeriodSeconds: 45
    lifecycle:
      preStop:
        exec: { command: ["/bin/sh","-c","sleep 10"] }   # let Endpoints/LB deregister before SIGTERM
    
  • The preStop sleep covers the race where the pod still receives traffic after termination begins because Endpoint removal is asynchronous. Combine with readinessProbe flipping to not-ready.
  • For long-running jobs, checkpoint before the grace period expires; once it does, Kubernetes sends SIGKILL.
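
The exec-based preStop hook above needs /bin/sh inside the image, which a distroless or shell-less image will not have. A possible alternative, assuming every cluster in the target window accepts the built-in sleep lifecycle handler, is sketched below. The 30s drain figure is an assumed example, not a measured value.

```yaml
# Sketch: shell-free preStop delay with the grace-period budget spelled out.
# Budget: preStop sleep (10s) + assumed max drain (30s) = 40s, which fits in 45s.
terminationGracePeriodSeconds: 45
containers:
  - name: api
    lifecycle:
      preStop:
        sleep:
          seconds: 10     # kubelet waits here before sending SIGTERM; no shell required
```

If the measured drain time grows, raise terminationGracePeriodSeconds with it, since the preStop delay counts against the same grace period.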

Availability — PDB, HPA, spread, namespaces

  • PodDisruptionBudget (policy/v1) for every HA workload so voluntary disruptions (node drains, upgrades) can't take you to zero:
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    spec:
      maxUnavailable: 1        # or minAvailable; NEVER minAvailable == replicas (deadlocks drains)
      selector: { matchLabels: { app.kubernetes.io/name: payments-api } }
    
  • HorizontalPodAutoscaler (autoscaling/v2) with minReplicas >= 2; add behavior stabilization to prevent flapping. Do not also hardcode replicas in the Deployment — let the HPA own it.
    spec:
      minReplicas: 3
      maxReplicas: 20
      metrics:
        - type: Resource
          resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
      behavior:
        scaleDown: { stabilizationWindowSeconds: 300 }
    
  • Spread replicas across nodes and zones so one failure domain can't take the service down:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector: { matchLabels: { app.kubernetes.io/name: payments-api } }
    
  • One namespace per team/environment with ResourceQuota, LimitRange, and the PSA restricted label. Never deploy to default.
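
The namespace rule above might be declared like this. The name payments-prod is a placeholder.

```yaml
# Sketch: dedicated namespace with Pod Security Admission set to restricted.
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod                                 # hypothetical team/environment namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted    # reject non-compliant pods
    pod-security.kubernetes.io/audit: restricted      # record violations in the audit log
    pod-security.kubernetes.io/warn: restricted       # surface warnings to the client
```

Setting warn and audit alongside enforce makes violations visible on workload objects such as Deployments, where enforce alone only rejects the resulting pods.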

Testing

  • In CI, before merge: yamllint → kustomize build overlays/prod | kubeconform -strict -summary (schema, including CRDs via -schema-location) → policy check (conftest test against OPA/Rego, or kyverno test) → kube-score score (probe/resource/security lint) → trivy config and trivy image (misconfig + CVE).
  • Helm charts: helm lint --strict, helm-unittest for template assertions (asserts rendered manifests for given values), and helm template ... | conftest test - for policy on rendered output.
  • Cluster smoke tests: helm test hooks or a post-sync Job; verify the readiness endpoint and one real request path.
  • Pre-deploy dry run: kubectl apply --server-side --dry-run=server catches admission/CRD/quota rejections that offline validation misses.
  • Assert the invariants a reviewer would: probes present, resources present, runAsNonRoot: true, no :latest, PDB exists for HA. Encode these as policies so they fail the pipeline, not the on-call.
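
A few of the invariants above could be encoded as a helm-unittest suite along these lines. This is only a sketch: the file path, the image.tag value key, and the container index are assumptions about a hypothetical chart, and the assertion names should be checked against the installed helm-unittest version.

```yaml
# charts/payments-api/tests/deployment_test.yaml (hypothetical path)
suite: deployment invariants
templates:
  - templates/deployment.yaml
tests:
  - it: renders a hardened, bounded Deployment
    set:
      image.tag: "1.8.3"              # assumes the chart exposes image.tag
    asserts:
      - isKind:
          of: Deployment
      - equal:
          path: spec.template.spec.securityContext.runAsNonRoot
          value: true
      - exists:
          path: spec.template.spec.containers[0].resources.limits
      - exists:
          path: spec.template.spec.containers[0].livenessProbe
      - notMatchRegex:
          path: spec.template.spec.containers[0].image
          pattern: ":latest$"
```

These checks run offline against rendered templates, so they complement the admission-time policies and do not replace them.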

Security

  • Pod/container securityContext — the non-negotiable baseline (satisfies PSA restricted):
    securityContext:            # pod
      runAsNonRoot: true
      runAsUser: 10001
      fsGroup: 10001
      seccompProfile: { type: RuntimeDefault }
    containers:
      - name: api
        securityContext:        # container
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          privileged: false
          capabilities: { drop: ["ALL"] }
    
    Give writable paths as emptyDir volumes when readOnlyRootFilesystem: true. Never privileged, never hostNetwork/hostPID/hostIPC, never hostPath for app data.
  • RBAC least privilege: a dedicated ServiceAccount per workload; scope with Role/RoleBinding (namespaced) over ClusterRole; never bind to cluster-admin; never grant "*" verbs/resources or secrets list cluster-wide.
  • NetworkPolicy default-deny, then allow explicitly. A namespace with no policy allows all traffic:
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    spec:
      podSelector: {}
      policyTypes: [Ingress, Egress]   # deny-all; pair with explicit allow rules
    
  • Enforce the above cluster-wide with PSA restricted labels plus ValidatingAdmissionPolicy/Kyverno so a bad manifest is rejected at admission, not caught in review.
  • Scan images for CVEs (trivy) and pin by digest; run on a distroless/minimal base; drop shells from prod images.
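
Admission-time enforcement of two of these rules might be sketched with a ValidatingAdmissionPolicy as below. The policy name is hypothetical, the rule covers only Deployments, and the :latest check does not catch images that omit a tag entirely, so treat it as a starting point.

```yaml
# Sketch: CEL guardrails for Deployments, bound to namespaces labeled PSA restricted.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: deployment-baseline           # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    - expression: >-
        has(object.spec.template.spec.securityContext) &&
        has(object.spec.template.spec.securityContext.runAsNonRoot) &&
        object.spec.template.spec.securityContext.runAsNonRoot == true
      message: "pod securityContext.runAsNonRoot must be true"
    - expression: >-
        object.spec.template.spec.containers.all(c, !c.image.endsWith(':latest'))
      message: "container images must not use the :latest tag"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: deployment-baseline
spec:
  policyName: deployment-baseline
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        pod-security.kubernetes.io/enforce: restricted
```

Starting with validationActions set to Warn or Audit is a gentler rollout if existing workloads may not yet comply.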

Do

  • Set requests+limits, three probes, and a hardened securityContext on every container.
  • Pin images by digest or immutable tag; render manifests in CI and reconcile via GitOps.
  • Give each workload its own ServiceAccount and a default-deny NetworkPolicy.
  • Ship a PDB and an HPA (minReplicas >= 2) with zone/node spread for anything user-facing.
  • Handle SIGTERM, add a preStop sleep, and size terminationGracePeriodSeconds to the real drain time.
  • Store secrets in ESO/Sealed Secrets; roll pods on config change via content-hash configMapGenerator.
  • Validate with kubeconform + policy + kube-score + trivy before merge; --dry-run=server before deploy.

Avoid

  • image: ...:latest or unpinned tags → pin @sha256: or a semver tag.
  • Missing resources or CPU/mem limits → enforce a LimitRange; memory request == limit.
  • No probes, or liveness pointed at a database → liveness = process health only; readiness gates traffic and may check deps.
  • Running as root, privileged, hostPath, or hostNetwork → runAsNonRoot: true, drop ALL caps, readOnlyRootFilesystem: true.
  • Plaintext Secret in git → ESO or Sealed Secrets.
  • HA workload with no PDB, or minAvailable == replicas → maxUnavailable: 1.
  • Deprecated APIs: extensions/v1beta1 Deployment/Ingress, policy/v1beta1 PDB, autoscaling/v2beta* HPA → the v1/v2 GA versions.
  • kubectl edit/scale/patch in prod, or committing helm template output as source → GitOps from the chart/overlay.
  • Hardcoding replicas while an HPA manages it → omit the field or ignoreDifferences.
  • Helm 3 muscle memory: --atomic/--force → --rollback-on-failure/--force-replace; post-renderer paths → plugins.

When you code

  • Keep diffs small and reviewable — one workload or concern per change; show the rendered kustomize build / helm template diff, not just the source.
  • After every change run, in order: yamllint → kubeconform -strict → policy check (conftest/kyverno test) → kube-score → helm-unittest (if Helm). Paste the failures you fixed.
  • Before proposing to apply, run kubectl apply --server-side --dry-run=server and report what admission accepted/rejected.
  • Ask before: choosing Deployment vs StatefulSet when persistence is ambiguous; setting a CPU limit that risks throttling a latency-sensitive service; picking ESO vs Sealed Secrets; touching RBAC scope, NetworkPolicy, or PSA level; changing a Service selector or a StatefulSet volumeClaimTemplate (both are destructive).
  • Never widen RBAC, disable a probe, or drop a security field "to make it deploy" — fix the manifest so it passes admission.
  • State the target Kubernetes minor and confirm every API/field you use is GA at that version before writing it.

Drop it in your repo

Save these rules as AGENTS.md, CLAUDE.md, .cursorrules, .windsurfrules or .github/copilot-instructions.md — your agent instantly codes to the same standard on Kubernetes 1.36 · Kustomize · Helm 4.2 · Gateway API 1.6.
