DevOps · Kubernetes 1.36 · Kustomize · Helm 4.2 · Gateway API 1.6
Kubernetes
Declarative, resourced, probed workloads — safe K8s manifests.
Updated 5 Jul 2026 · CC0
AGENTS.md · repo root
You author and operate Kubernetes workload manifests. "Good" here means declarative, version-controlled YAML that `kubectl diff` shows converging cleanly: every container has resources, three probes, a non-root hardened `securityContext`, graceful shutdown, and an availability story (PDB + HPA + spread). No `:latest`, no root, no secrets in git, no unbounded pods.
Stack
- Kubernetes 1.36 "Haru" — target the N-2 supported window (1.34–1.36). Do not use APIs that GA'd after your minimum target.
- Manifests: Kustomize (built into `kubectl` 1.36 via `kubectl apply -k`, or standalone `kustomize` v5) for env overlays; Helm 4.2 for packaged/redistributed apps.
- Helm 4.2.x: installs use Server-Side Apply by default and kstatus readiness gating. Note renamed flags: `--rollback-on-failure` (was `--atomic`), `--force-replace` (was `--force`); post-renderers are now Wasm/plugins, not executable paths. Don't lean on the deprecated aliases: they only reliably warn-then-work on `upgrade`; `install --atomic` was dropped and hard-errored (`unknown flag: --atomic`) until 4.1.3 restored the binding. Write the new flag names.
- GitOps delivery: Argo CD or Flux reconcile from git. Humans never `kubectl edit/apply/patch` prod.
- Secrets: External Secrets Operator (ESO) 2.6 syncing from Vault/cloud KMS; or Sealed Secrets for a git-only workflow. Never plaintext `Secret` data in git.
- Policy/admission: native ValidatingAdmissionPolicy (CEL, GA) for guardrails; Kyverno 1.18 where mutation/generation is needed. Enforce Pod Security Admission `restricted` per namespace.
- Ingress: Gateway API 1.6 (`gateway.networking.k8s.io/v1`) for new north-south traffic; legacy `Ingress` only to match existing infra.
- Pinned apiVersions: `apps/v1`, `batch/v1`, `autoscaling/v2`, `policy/v1`, `networking.k8s.io/v1`, `rbac.authorization.k8s.io/v1`, `gateway.networking.k8s.io/v1`. Never `extensions/v1beta1`, `policy/v1beta1`, `autoscaling/v2beta2`.
- Tooling in CI: `kubeconform` (schema), `conftest`/OPA or Kyverno CLI (policy), `helm-unittest` (Helm), `kube-score`, `trivy` (image + IaC), `yamllint`.
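To illustrate the Gateway API item above, a minimal `HTTPRoute` might look like the sketch below. The Gateway name `public-gateway`, the `infra` and `payments-prod` namespaces, the hostname, and the Service port are all hypothetical; only the `payments-api` name is borrowed from the examples in this file.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-api
  namespace: payments-prod        # hypothetical namespace
spec:
  parentRefs:
    - name: public-gateway        # hypothetical Gateway, e.g. owned by a platform team
      namespace: infra            # hypothetical
  hostnames:
    - payments.example.com        # hypothetical
  rules:
    - matches:
        - path: { type: PathPrefix, value: / }
      backendRefs:
        - name: payments-api      # Service in the same namespace as the route
          port: 8080              # assumed Service port
```

A route that attaches to a Gateway in another namespace is only accepted if that Gateway's listener `allowedRoutes` permits it, so check the route's status conditions after applying.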
Project conventions
- Kustomize layout: `base/` holds environment-agnostic resources; `overlays/{dev,staging,prod}/` patch them. One Kubernetes object per file (`deployment.yaml`, `service.yaml`), each listed in `kustomization.yaml`.
- Helm layout: `charts/<app>/{Chart.yaml,values.yaml,templates/}`; ship a `values.schema.json` and lint with `helm lint --strict`. Pin `apiVersion: v2` charts and dependency versions in `Chart.yaml`.
- Recommended labels on every object; do not invent ad-hoc keys:

      labels:
        app.kubernetes.io/name: payments-api
        app.kubernetes.io/instance: payments-api-prod
        app.kubernetes.io/version: "1.8.3"
        app.kubernetes.io/component: api
        app.kubernetes.io/part-of: payments
        app.kubernetes.io/managed-by: argocd

- Selectors are immutable: set `spec.selector.matchLabels` to a small stable subset (`app.kubernetes.io/name` + `instance`) and never change it; changing it forces delete/recreate.
- Every namespaced object sets an explicit `metadata.namespace` (or inherits it from the Kustomize/Helm release); never rely on the caller's current context.
- Format with `yamllint` (2-space indent, no tabs, no trailing whitespace); keys ordered `apiVersion`, `kind`, `metadata`, `spec`.
- Set annotations for provenance, not config: `kubectl.kubernetes.io/last-applied-configuration` is managed by the tool; don't hand-write it.
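The Kustomize layout and labelling rules above could be wired together in an overlay roughly like this. It is a sketch under assumptions: the `payments-prod` namespace and the `deployment-resources.yaml` patch file are hypothetical names.

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: payments-prod              # hypothetical; set on every namespaced object
resources:
  - ../../base
labels:
  - pairs:
      app.kubernetes.io/part-of: payments
      app.kubernetes.io/managed-by: argocd
    includeSelectors: false           # label metadata only; immutable selectors stay untouched
patches:
  - path: deployment-resources.yaml   # hypothetical per-environment patch
    target:
      kind: Deployment
      name: payments-api
```

Keeping `includeSelectors: false` means adding or changing a shared label later does not alter `spec.selector`, which would otherwise force a delete/recreate.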
Declarative & version-controlled
- All cluster state lives in git and flows through Argo CD/Flux. `kubectl edit`, `kubectl scale`, `kubectl patch`, and imperative `kubectl create` are forbidden in staging/prod; they drift from the source of truth.
- Prefer Server-Side Apply (`kubectl apply --server-side --field-manager=<tool>`) so field ownership is tracked; this is Helm 4's and Argo's default.
- Never commit generated artifacts (`helm template` output) as the source; commit the chart/overlay and render in CI.
- When a controller owns a field (HPA owns `replicas`, VPA/in-place owns `resources`), omit that field from the manifest or add `ignoreDifferences` in Argo so GitOps and the controller don't fight.
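As one possible shape for the `ignoreDifferences` rule above, an Argo CD `Application` could look like the following sketch. The repository URL, project, and namespaces are hypothetical, and the Server-Side Apply sync option is spelled out explicitly only to make the intent visible.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-prod
  namespace: argocd                 # assumed Argo CD install namespace
spec:
  project: default                  # hypothetical project
  source:
    repoURL: https://git.example.com/platform/payments.git   # hypothetical
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-prod        # hypothetical
  syncPolicy:
    automated: { prune: true, selfHeal: true }
    syncOptions:
      - ServerSideApply=true
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas            # owned by the HPA, so not treated as drift
```

Omitting `replicas` from the Deployment manifest entirely is the simpler option; `ignoreDifferences` is mainly useful when a chart you don't control sets the field.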
Deployments & workloads
- Deployment for stateless; StatefulSet for anything needing stable identity, ordered rollout, or per-pod `PersistentVolumeClaim` (`volumeClaimTemplates`); DaemonSet for node agents; Job/CronJob (`batch/v1`, set `backoffLimit`, `activeDeadlineSeconds`, `ttlSecondsAfterFinished`) for batch.
- Rolling update tuned explicitly; add `minReadySeconds` so a pod must stay healthy before it counts as available:

      spec:
        revisionHistoryLimit: 5
        minReadySeconds: 10
        strategy:
          type: RollingUpdate
          rollingUpdate:
            maxSurge: 25%
            maxUnavailable: 0   # zero-downtime: never drop below desired

- Sidecars are native init containers with `restartPolicy: Always` (GA): they start before app containers, run for the pod lifetime, and support probes. Don't bolt sidecars into the main `containers` list where startup ordering matters.
- Pin the image by immutable tag or digest; `imagePullPolicy: IfNotPresent` with a pinned tag:

      image: registry.example.com/payments-api@sha256:9f2c...   # or :1.8.3, never :latest

- Set `automountServiceAccountToken: false` at pod level unless the workload calls the API server.
- Schedule intentionally: `topologySpreadConstraints`, `nodeAffinity`, and `tolerations`, not `nodeName`.
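The native sidecar pattern described above might be sketched as follows. The `log-shipper` container, its image, and its `/ready` endpoint are hypothetical, and resources, the remaining probes, and `securityContext` are left out here only to keep the example short.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  selector:
    matchLabels: { app.kubernetes.io/name: payments-api }
  template:
    metadata:
      labels: { app.kubernetes.io/name: payments-api }
    spec:
      automountServiceAccountToken: false
      initContainers:
        - name: log-shipper                               # hypothetical sidecar
          image: registry.example.com/log-shipper:2.4.1   # hypothetical image and tag
          restartPolicy: Always                           # marks this init container as a sidecar
          startupProbe:                                   # app containers wait until this passes
            httpGet: { path: /ready, port: 2020 }         # hypothetical endpoint
      containers:
        - name: api
          image: registry.example.com/payments-api:1.8.3
          imagePullPolicy: IfNotPresent
```

Because the sidecar is declared under `initContainers`, it is started before `api` and is stopped after it during termination, which is the ordering the main `containers` list cannot guarantee.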
Resources — requests AND limits on every container
- Every container declares both. No container ships without them; enforce a namespace `LimitRange` default so nothing lands unbounded, and a `ResourceQuota` per namespace.
- Memory: set `requests` == `limits` so memory is never overcommitted (the pod's QoS class is Guaranteed only if CPU requests also equal limits). Memory is incompressible; a limit below real usage means OOMKill, and no limit means node-pressure eviction of neighbors.
- CPU: always set a `request` (drives scheduling and HPA). Set a CPU `limit` too, but keep it generous: a tight CPU limit causes CFS throttling and tail-latency spikes; measure before tightening.

      resources:
        requests: { cpu: "250m", memory: "512Mi" }
        limits: { cpu: "1", memory: "512Mi" }

- In-place pod resize (`resizePolicy`) mutates CPU/memory on a running pod without recreating it. It is GA in 1.35, but the field has been beta and on by default since 1.33, so it is accepted across the whole 1.34–1.36 window and is safe to set at the 1.34 floor (only its full stability, e.g. memory-limit decreases, lands at 1.35). This is the one place the "no APIs newer than min-target" rule bends: the field predates the minimum as beta, so older-but-in-window clusters still accept it.

      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired        # apply live to the cgroup, no restart
        - resourceName: memory
          restartPolicy: RestartContainer   # decrease needs a restart to reclaim
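The namespace-level guardrails mentioned above (a `LimitRange` default plus a `ResourceQuota`) could be declared along these lines. Every number and the `payments-prod` namespace are placeholders to be replaced with measured values.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: payments-prod          # hypothetical
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: "100m", memory: "128Mi" }   # applied when a container omits requests
      default: { cpu: "500m", memory: "128Mi" }          # applied when a container omits limits
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute
  namespace: payments-prod          # hypothetical
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 40Gi
    pods: "100"
```

The defaults are a safety net rather than a substitute for explicit values: a container that relies on them gets generic sizing, not sizing based on its real usage.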
Health probes — liveness + readiness + startup
- Three distinct probes with distinct endpoints. Never point them at the same handler.
  - `startupProbe` guards slow boots and disables the other two until it passes. Budget = `failureThreshold * periodSeconds` (30 × 5 s = 150 s in the example below). Use it instead of a long `initialDelaySeconds`.
  - `readinessProbe` gates Service traffic and rollouts; it MAY check hard dependencies (DB, cache) so a pod that can't serve is pulled from Endpoints.
  - `livenessProbe` checks only whether the process itself is wedged; it MUST NOT check external dependencies, because a DB blip would then trigger a cluster-wide restart storm.

      startupProbe:   { httpGet: { path: /healthz, port: 8080 }, periodSeconds: 5, failureThreshold: 30 }
      readinessProbe: { httpGet: { path: /readyz, port: 8080 }, periodSeconds: 10, timeoutSeconds: 2, failureThreshold: 3 }
      livenessProbe:  { httpGet: { path: /livez, port: 8080 }, periodSeconds: 15, timeoutSeconds: 2, failureThreshold: 3 }

- Prefer `httpGet`/`grpc` over `exec` (exec forks a process each period). Keep `timeoutSeconds` small and realistic; leave `successThreshold: 1` for liveness/startup (only readiness may raise it).
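For a gRPC service, the same three-probe split could be expressed with the built-in `grpc` probe, as in the sketch below. It assumes the application implements the standard gRPC health checking service on port 9090 and answers differently per `service` name; the port and the `readiness`/`liveness` service names are hypothetical.

```yaml
startupProbe:
  grpc: { port: 9090 }
  periodSeconds: 5
  failureThreshold: 30            # 30 x 5 s = 150 s startup budget
readinessProbe:
  grpc: { port: 9090, service: readiness }   # may reflect hard dependencies
  periodSeconds: 10
  timeoutSeconds: 2
livenessProbe:
  grpc: { port: 9090, service: liveness }    # process health only
  periodSeconds: 15
  timeoutSeconds: 2
```

The `service` value is passed to the health check request, which is what lets one port serve distinct readiness and liveness answers.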
Config & secrets
- App config in `ConfigMap`, referenced via `envFrom` or projected volumes; never baked into the image, never hardcoded in the manifest.
- No plaintext `Secret` `data`/`stringData` in git. Use ESO `ExternalSecret` (source of truth = Vault/AWS/GCP/Azure) or Sealed Secrets (`kubeseal`-encrypted, safe in a public repo).

      apiVersion: external-secrets.io/v1
      kind: ExternalSecret
      spec:
        refreshInterval: 1h
        secretStoreRef: { name: vault-backend, kind: ClusterSecretStore }
        target: { name: payments-db }
        data:
          - secretKey: password
            remoteRef: { key: prod/payments/db, property: password }

- Change `ConfigMap`/`Secret` by revision, not in place: use Kustomize `configMapGenerator`/`secretGenerator` (content-hash suffix) or a checksum annotation so pods roll on change. In-place edits to a mounted ConfigMap don't restart pods.
- Consume secrets as env vars or files; never log them, never pass via image build args.
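The content-hash approach above can be sketched with a `configMapGenerator` and an `envFrom` reference. The ConfigMap name and the two keys are hypothetical; the second document is an excerpt of a pod spec, not a complete manifest.

```yaml
# base/kustomization.yaml (excerpt)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
configMapGenerator:
  - name: payments-api-config       # emitted with a content-hash suffix appended
    literals:
      - LOG_LEVEL=info              # hypothetical keys
      - HTTP_PORT=8080
---
# base/deployment.yaml (pod spec excerpt)
containers:
  - name: api
    envFrom:
      - configMapRef:
          name: payments-api-config   # Kustomize rewrites this to the hashed name
```

Changing a literal produces a new ConfigMap name, which changes the pod template and so triggers a normal rolling update.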
Graceful shutdown
- Handle SIGTERM in the app: stop accepting new work, drain in-flight requests, close pools, exit 0.
- Set `terminationGracePeriodSeconds` >= preStop sleep + max drain time (default is 30):

      terminationGracePeriodSeconds: 45
      lifecycle:
        preStop:
          exec: { command: ["/bin/sh","-c","sleep 10"] }   # let Endpoints/LB deregister before SIGTERM

- The `preStop` sleep covers the race where the pod still receives traffic after termination begins because Endpoint removal is asynchronous. Combine with `readinessProbe` flipping to not-ready.
- For long-running jobs, checkpoint before the grace period expires or Kubernetes sends SIGKILL.
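Where the image has no shell (for example a distroless base), the same delay can likely be expressed with the built-in `sleep` lifecycle handler instead of `exec`. This is a sketch: the handler is a comparatively recent addition to the Pod API, so confirm it is available at the minimum Kubernetes version you target before relying on it.

```yaml
terminationGracePeriodSeconds: 45
containers:
  - name: api
    lifecycle:
      preStop:
        sleep:
          seconds: 10   # same delay as the exec form, without needing /bin/sh
```

The budget is unchanged: 10 s of sleep leaves 35 s of the 45 s grace period for the application to drain after SIGTERM.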
Availability — PDB, HPA, spread, namespaces
- PodDisruptionBudget (`policy/v1`) for every HA workload so voluntary disruptions (node drains, upgrades) can't take you to zero:

      apiVersion: policy/v1
      kind: PodDisruptionBudget
      spec:
        maxUnavailable: 1   # or minAvailable; NEVER minAvailable == replicas (deadlocks drains)
        selector: { matchLabels: { app.kubernetes.io/name: payments-api } }

- HorizontalPodAutoscaler (`autoscaling/v2`) with `minReplicas >= 2`; add `behavior` stabilization to prevent flapping. Do not also hardcode `replicas` in the Deployment; let the HPA own it.

      spec:
        minReplicas: 3
        maxReplicas: 20
        metrics:
          - type: Resource
            resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
        behavior:
          scaleDown: { stabilizationWindowSeconds: 300 }

- Spread replicas across nodes and zones so one failure domain can't take the service down:

      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector: { matchLabels: { app.kubernetes.io/name: payments-api } }

- One namespace per team/environment with `ResourceQuota`, `LimitRange`, and the PSA `restricted` label. Never deploy to `default`.
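The per-namespace PSA `restricted` label mentioned above is set on the `Namespace` object itself. A minimal sketch, with a hypothetical namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod               # hypothetical
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # surface warnings to the client
    pod-security.kubernetes.io/audit: restricted     # record violations in audit logs
```

Setting `warn` and `audit` alongside `enforce` is optional; when rolling this out to an existing namespace, it can help to start with those two to see what would be rejected before enforcing.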
Testing
- In CI, before merge: `yamllint` → `kustomize build overlays/prod | kubeconform -strict -summary` (schema, including CRDs via `-schema-location`) → policy check (`conftest test` against OPA/Rego, or `kyverno test`) → `kube-score score` (probe/resource/security lint) → `trivy config` and `trivy image` (misconfig + CVE).
- Helm charts: `helm lint --strict`, `helm-unittest` for template assertions (asserts rendered manifests for given `values`), and `helm template ... | conftest test -` for policy on rendered output.
- Cluster smoke tests: `helm test` hooks or a post-sync Job; verify the readiness endpoint and one real request path.
- Pre-deploy dry run: `kubectl apply --server-side --dry-run=server` catches admission/CRD/quota rejections that offline validation misses.
- Assert the invariants a reviewer would: probes present, resources present, `runAsNonRoot: true`, no `:latest`, PDB exists for HA. Encode these as policies so they fail the pipeline, not the on-call.
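The pre-merge sequence above could be wired into a CI job roughly as follows. This sketch assumes GitHub Actions purely for illustration, assumes all six tools are already installed on the runner, and assumes Rego policies live under `./policy` (the directory `conftest` reads by default); the workflow and file names are hypothetical.

```yaml
# .github/workflows/manifests.yaml (hypothetical path)
name: validate-manifests
on:
  pull_request:
    paths: ["base/**", "overlays/**"]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes yamllint, kustomize, kubeconform, conftest, kube-score and trivy
      # are preinstalled on the runner image.
      - name: Lint YAML
        run: yamllint base overlays
      - name: Render prod overlay
        run: kustomize build overlays/prod > rendered.yaml
      - name: Schema check
        run: kubeconform -strict -summary rendered.yaml
      - name: Policy check
        run: conftest test rendered.yaml
      - name: Best-practice lint
        run: kube-score score rendered.yaml
      - name: Misconfiguration scan
        run: trivy config .
```

Rendering once to `rendered.yaml` and feeding every checker the same file keeps the checks aligned with what would actually be applied.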
Security
- Pod/container `securityContext`: the non-negotiable baseline (satisfies PSA `restricted`):

      securityContext:            # pod
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: api
          securityContext:        # container
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            privileged: false
            capabilities: { drop: ["ALL"] }

  Give writable paths as `emptyDir` volumes when `readOnlyRootFilesystem: true`. Never `privileged`, never `hostNetwork`/`hostPID`/`hostIPC`, never `hostPath` for app data.
- RBAC least privilege: a dedicated `ServiceAccount` per workload; scope with `Role`/`RoleBinding` (namespaced) over `ClusterRole`; never bind to `cluster-admin`; never grant `"*"` verbs/resources or `secrets` `list` cluster-wide.
- NetworkPolicy default-deny, then allow explicitly. A namespace with no policy allows all traffic:

      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      spec:
        podSelector: {}
        policyTypes: [Ingress, Egress]   # deny-all; pair with explicit allow rules

- Enforce the above cluster-wide with PSA `restricted` labels plus ValidatingAdmissionPolicy/Kyverno so a bad manifest is rejected at admission, not caught in review.
- Scan images for CVEs (`trivy`) and pin by digest; run on a distroless/minimal base; drop shells from prod images.
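One of the guardrails above, rejecting `:latest` at admission, might be written as a ValidatingAdmissionPolicy like the sketch below. The policy name and the namespace selector are hypothetical. The CEL expression is deliberately narrow: it covers only `containers` in Deployments and does not catch images with no tag at all.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: disallow-latest-tag         # hypothetical
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    - expression: "object.spec.template.spec.containers.all(c, !c.image.endsWith(':latest'))"
      message: "container images must not use the :latest tag"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: disallow-latest-tag
spec:
  policyName: disallow-latest-tag
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        pod-security.kubernetes.io/enforce: restricted   # assumed: bind only to hardened namespaces
```

A policy has no effect until a binding references it, so the two objects ship together; extending the rule to `initContainers` and other workload kinds follows the same pattern.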
Do
- Set requests+limits, three probes, and a hardened `securityContext` on every container.
- Pin images by digest or immutable tag; render manifests in CI and reconcile via GitOps.
- Give each workload its own `ServiceAccount` and a default-deny `NetworkPolicy`.
- Ship a PDB and an HPA (`minReplicas >= 2`) with zone/node spread for anything user-facing.
- Handle SIGTERM, add a `preStop` sleep, and size `terminationGracePeriodSeconds` to the real drain time.
- Store secrets in ESO/Sealed Secrets; roll pods on config change via content-hash `configMapGenerator`.
- Validate with `kubeconform` + policy + `kube-score` + `trivy` before merge; `--dry-run=server` before deploy.
Avoid
- `image: ...:latest` or unpinned tags → pin `@sha256:` or a semver tag.
- Missing `resources` or CPU/mem `limits` → enforce a `LimitRange`; memory `request == limit`.
- No probes, or liveness pointed at a database → liveness = process health only; readiness gates traffic and may check deps.
- Running as root, `privileged`, `hostPath`, or `hostNetwork` → `runAsNonRoot`, drop `ALL` caps, `readOnlyRootFilesystem`.
- Plaintext `Secret` in git → ESO or Sealed Secrets.
- HA workload with no PDB, or `minAvailable == replicas` → `maxUnavailable: 1`.
- Deprecated APIs: `extensions/v1beta1` Deployment/Ingress, `policy/v1beta1` PDB, `autoscaling/v2beta*` HPA → the `v1`/`v2` GA versions.
- `kubectl edit/scale/patch` in prod, or committing `helm template` output as source → GitOps from the chart/overlay.
- Hardcoding `replicas` while an HPA manages it → omit the field or `ignoreDifferences`.
- Helm 3 muscle memory: `--atomic`/`--force` → `--rollback-on-failure`/`--force-replace`; post-renderer paths → plugins.
When you code
- Keep diffs small and reviewable: one workload or concern per change; show the rendered `kustomize build`/`helm template` diff, not just the source.
- After every change run, in order: `yamllint` → `kubeconform -strict` → policy check (`conftest`/`kyverno test`) → `kube-score` → `helm-unittest` (if Helm). Paste the failures you fixed.
- Before proposing to apply, run `kubectl apply --server-side --dry-run=server` and report what admission accepted/rejected.
- Ask before: choosing Deployment vs StatefulSet when persistence is ambiguous; setting a CPU limit that risks throttling a latency-sensitive service; picking ESO vs Sealed Secrets; touching RBAC scope, NetworkPolicy, or PSA level; changing a Service selector or a StatefulSet `volumeClaimTemplate` (both are destructive).
- Never widen RBAC, disable a probe, or drop a security field "to make it deploy"; fix the manifest so it passes admission.
- State the target Kubernetes minor and confirm every API/field you use is GA at that version before writing it.