Kubernetes — AI agent rules (AGENTS.md, CLAUDE.md, .cursorrules)

You author and operate Kubernetes workload manifests. "Good" here means declarative, version-controlled YAML that a kubectl diff shows converging cleanly: every container has resources, three probes, a non-root hardened securityContext, graceful shutdown, and an availability story (PDB + HPA + spread). No :latest, no root, no secrets in git, no unbounded pods.

Stack

Kubernetes 1.36 "Haru" — target the N-2 supported window (1.34–1.36). Do not use APIs that GA'd after your minimum target.
Manifests: Kustomize (built into kubectl 1.36 via kubectl apply -k, or standalone kustomize v5) for env overlays; Helm 4.2 for packaged/redistributed apps.
Helm 4.2.x: installs use Server-Side Apply by default and kstatus readiness gating. Note the renamed flags: --rollback-on-failure (was --atomic), --force-replace (was --force); post-renderers are now Wasm/plugins, not executable paths. Don't lean on the deprecated aliases: they reliably warn and still work only on upgrade, whereas install --atomic was dropped and hard-errored (unknown flag: --atomic) until 4.1.3 restored the binding. Write the new flag names.
GitOps delivery: Argo CD or Flux reconcile from git. Humans never kubectl edit/apply/patch prod.
Secrets: External Secrets Operator (ESO) 2.6 syncing from Vault/cloud KMS; or Sealed Secrets for a git-only workflow. Never plaintext Secret data in git.
Policy/admission: native ValidatingAdmissionPolicy (CEL, GA) for guardrails; Kyverno 1.18 where mutation/generation is needed. Enforce Pod Security Admission restricted per namespace.
Ingress: Gateway API 1.6 (gateway.networking.k8s.io/v1) for new north-south traffic; legacy Ingress only to match existing infra.
Pinned apiVersions: apps/v1, batch/v1, autoscaling/v2, policy/v1, networking.k8s.io/v1, rbac.authorization.k8s.io/v1, gateway.networking.k8s.io/v1. Never extensions/v1beta1, policy/v1beta1, autoscaling/v2beta2.
Tooling in CI: kubeconform (schema), conftest/OPA or Kyverno CLI (policy), helm-unittest (Helm), kube-score, trivy (image + IaC), yamllint.
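
As a rough sketch of the Gateway API preference above, an HTTPRoute for north-south traffic might look like the following. The Gateway name public-gateway, its namespace gateway-system, the payments namespace, the hostname, and port 8080 are illustrative assumptions, not values taken from this repo.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-api
  namespace: payments                  # hypothetical namespace
spec:
  parentRefs:
    - name: public-gateway             # hypothetical Gateway owned by the platform team
      namespace: gateway-system
  hostnames: ["payments.example.com"]
  rules:
    - matches:
        - path: { type: PathPrefix, value: /api }
      backendRefs:
        - name: payments-api           # Service in the same namespace as the route
          port: 8080
```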

Project conventions

Kustomize layout: base/ holds environment-agnostic resources; overlays/{dev,staging,prod}/ patch them. One Kubernetes object per file (deployment.yaml, service.yaml), each listed in kustomization.yaml.
Helm layout: charts/<app>/{Chart.yaml,values.yaml,templates/}; ship a values.schema.json and lint with helm lint --strict. Pin apiVersion: v2 charts and dependency versions in Chart.yaml.
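
A minimal sketch of a prod overlay for the Kustomize layout described above. The namespace, the patch file name deployment-resources.yaml, and the image name are assumptions for illustration; the base is assumed to list deployment.yaml and service.yaml in its own kustomization.yaml.

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: payments-prod               # hypothetical; applied to every namespaced object
resources:
  - ../../base
labels:
  - pairs:
      app.kubernetes.io/part-of: payments
    includeSelectors: false            # keep selectors untouched, they are immutable
images:
  - name: registry.example.com/payments-api
    newTag: "1.8.3"                    # or use digest: sha256:... to pin by digest
patches:
  - path: deployment-resources.yaml    # hypothetical prod-only resource patch
    target: { kind: Deployment, name: payments-api }
```

Render it with kubectl kustomize overlays/prod (or kustomize build) in CI rather than committing the output.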

Recommended labels on every object — do not invent ad-hoc keys:

labels:
  app.kubernetes.io/name: payments-api
  app.kubernetes.io/instance: payments-api-prod
  app.kubernetes.io/version: "1.8.3"
  app.kubernetes.io/component: api
  app.kubernetes.io/part-of: payments
  app.kubernetes.io/managed-by: argocd

Selectors are immutable — set spec.selector.matchLabels to a small stable subset (app.kubernetes.io/name + instance) and never change it; changing it forces delete/recreate.
Every namespaced object sets an explicit metadata.namespace (or inherits it from the Kustomize/Helm release) — never rely on the caller's current context.
Format with yamllint (2-space indent, no tabs, no trailing whitespace); keys ordered apiVersion, kind, metadata, spec.
Set annotations for provenance, not config: kubectl.kubernetes.io/last-applied-configuration is managed by the tool — don't hand-write it.

Declarative & version-controlled

All cluster state lives in git and flows through Argo CD/Flux. kubectl edit, kubectl scale, kubectl patch, and imperative kubectl create are forbidden in staging/prod because they drift from the source of truth.
Prefer Server-Side Apply (kubectl apply --server-side --field-manager=<tool>) so field ownership is tracked; Helm 4 uses it by default for installs, and Argo CD enables it with the ServerSideApply=true sync option.
Never commit generated artifacts (helm template output) as the source; commit the chart/overlay and render in CI.
When a controller owns a field (HPA owns replicas, VPA/in-place owns resources), omit that field from the manifest or add ignoreDifferences in Argo so GitOps and the controller don't fight.
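
One way to express the controller-owned-field rule above in Argo CD is an Application that ignores spec.replicas. This is a sketch under assumptions: the repository URL, project, path, and namespaces are hypothetical placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: default                     # hypothetical project
  source:
    repoURL: https://git.example.com/platform/deploy.git   # hypothetical repo
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-prod
  syncPolicy:
    automated: { prune: true, selfHeal: true }
    syncOptions:
      - ServerSideApply=true           # track field ownership on apply
      - RespectIgnoreDifferences=true  # do not overwrite ignored fields during sync
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas               # owned by the HPA, not by git
```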

Deployments & workloads

Deployment for stateless; StatefulSet for anything needing stable identity, ordered rollout, or per-pod PersistentVolumeClaim (volumeClaimTemplates); DaemonSet for node agents; Job/CronJob (batch/v1, set backoffLimit, activeDeadlineSeconds, ttlSecondsAfterFinished) for batch.
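
For the batch case, a CronJob that sets the three bounds named above might look like this sketch. The job name, schedule, image, and the concrete limits are assumptions chosen only to illustrate the fields.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: payments-reconcile             # hypothetical job
  namespace: payments
spec:
  schedule: "15 2 * * *"
  concurrencyPolicy: Forbid            # never overlap two runs
  jobTemplate:
    spec:
      backoffLimit: 2                  # retry at most twice before marking the Job failed
      activeDeadlineSeconds: 1800      # hard stop after 30 minutes
      ttlSecondsAfterFinished: 86400   # garbage-collect finished Jobs after a day
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: reconcile
              image: registry.example.com/payments-reconcile:1.8.3
              resources:
                requests: { cpu: "100m", memory: "256Mi" }
                limits:   { cpu: "500m", memory: "256Mi" }
```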

Rolling update tuned explicitly; add minReadySeconds so a pod must stay healthy before it counts as available:

spec:
  revisionHistoryLimit: 5
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0   # zero-downtime: never drop below desired

Sidecars are native init containers with restartPolicy: Always (GA) — they start before app containers, run for the pod lifetime, and support probes. Don't bolt sidecars into the main containers list where startup ordering matters.
Pin the image by immutable tag or digest; imagePullPolicy: IfNotPresent with a pinned tag:
```
image: registry.example.com/payments-api@sha256:9f2c...   # or :1.8.3, never :latest
```
Set automountServiceAccountToken: false at pod level unless the workload calls the API server.
Schedule intentionally: topologySpreadConstraints, nodeAffinity, and tolerations — not nodeName.
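
The sidecar, token, and scheduling rules above could combine roughly as follows. This is a fragment of a Deployment's spec.template.spec, not a complete manifest; the log-shipper sidecar, its image, and the workload-class taint are hypothetical.

```yaml
# Fragment of spec.template.spec
automountServiceAccountToken: false    # this workload does not call the API server
initContainers:
  - name: log-shipper                  # hypothetical sidecar
    image: registry.example.com/log-shipper:2.4.1
    restartPolicy: Always              # makes this init container a native sidecar
    resources:
      requests: { cpu: "50m",  memory: "64Mi" }
      limits:   { cpu: "200m", memory: "64Mi" }
containers:
  - name: api
    image: registry.example.com/payments-api:1.8.3
    imagePullPolicy: IfNotPresent
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - { key: kubernetes.io/arch, operator: In, values: ["amd64"] }
tolerations:
  - { key: workload-class, operator: Equal, value: payments, effect: NoSchedule }  # hypothetical taint
```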

Resources — requests AND limits on every container

Every container declares both. No container ships without them; enforce a namespace LimitRange default so nothing lands unbounded, and a ResourceQuota per namespace.
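Memory: set requests == limits. Memory is incompressible: a limit below real usage means OOMKill, and no limit lets the container grow until node-pressure eviction hits its neighbors. Matching memory alone does not make the pod Guaranteed QoS (that also needs CPU requests == limits); the pod stays Burstable, but its memory is never overcommitted.
CPU: always set a request (it drives scheduling and HPA utilization). Set a CPU limit too, but a generous one: a tight CPU limit causes CFS throttling and tail-latency spikes, so measure before tightening.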
```
resources:
  requests: { cpu: "250m", memory: "512Mi" }
  limits:   { cpu: "1",    memory: "512Mi" }
```
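
The namespace-level guardrails mentioned above (a LimitRange default plus a ResourceQuota) might be sketched like this. All numbers and the payments namespace are placeholder assumptions; size them from measured usage.

```yaml
apiVersion: v1
kind: LimitRange
metadata: { name: container-defaults, namespace: payments }
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: "100m", memory: "128Mi" }  # applied when requests are omitted
      default:        { cpu: "500m", memory: "128Mi" }  # applied when limits are omitted
      max:            { cpu: "4",    memory: "4Gi" }
---
apiVersion: v1
kind: ResourceQuota
metadata: { name: compute-quota, namespace: payments }
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 40Gi
    pods: "100"
```

The LimitRange is a safety net, not a substitute: manifests should still declare their own values.
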
In-place pod resize (resizePolicy) changes CPU/memory on a running pod without recreating it. The feature is GA in 1.35, but the field has been beta and on by default since 1.33, so every cluster in the 1.34–1.36 window accepts it and it is safe to set at the 1.34 floor; only its full stability (for example, memory-limit decreases) lands at 1.35. This is the one place the "no APIs newer than min-target" rule bends: the field predates the minimum as beta, so setting it does not cause drift on older in-window clusters.
```
resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired      # apply live to the cgroup, no restart
  - resourceName: memory
    restartPolicy: RestartContainer # decrease needs a restart to reclaim
```

Health probes — liveness + readiness + startup

Three distinct probes with distinct endpoints. Never point them at the same handler.
- startupProbe guards slow boots and disables the other two until it passes. Budget = failureThreshold * periodSeconds (30 * 5 s = 150 s in the example below). Use it instead of a long initialDelaySeconds.
- readinessProbe gates Service traffic and rollouts; it MAY check hard dependencies (DB, cache) so a pod that can't serve is pulled from Endpoints.
- livenessProbe checks only whether the process itself is wedged; it MUST NOT check external dependencies, or a DB blip would restart every replica at once.
```
startupProbe:   { httpGet: { path: /healthz, port: 8080 }, periodSeconds: 5,  failureThreshold: 30 }
readinessProbe: { httpGet: { path: /readyz,  port: 8080 }, periodSeconds: 10, timeoutSeconds: 2, failureThreshold: 3 }
livenessProbe:  { httpGet: { path: /livez,   port: 8080 }, periodSeconds: 15, timeoutSeconds: 2, failureThreshold: 3 }
```
Prefer httpGet/grpc over exec (exec forks a process each period). Keep timeoutSeconds small and realistic; leave successThreshold: 1 for liveness/startup (only readiness may raise it).
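
For a gRPC service, the same split could use the built-in grpc probe instead of httpGet. This sketch assumes the server implements the standard gRPC health checking service on port 9090 and answers for the service names readiness and liveness; both the port and the names are hypothetical.

```yaml
readinessProbe:
  grpc: { port: 9090, service: readiness }   # service name is sent in the health check request
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
livenessProbe:
  grpc: { port: 9090, service: liveness }
  periodSeconds: 15
  timeoutSeconds: 2
  failureThreshold: 3
```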

Config & secrets

App config in ConfigMap, referenced via envFrom or projected volumes — never baked into the image, never hardcoded in the manifest.
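
A minimal sketch of the envFrom pattern described above. The ConfigMap name, namespace, and keys are illustrative assumptions; the second document is a pod-spec fragment, not a standalone object.

```yaml
apiVersion: v1
kind: ConfigMap
metadata: { name: payments-api-config, namespace: payments }
data:
  LOG_LEVEL: info
  HTTP_PORT: "8080"                    # ConfigMap values are always strings
---
# Fragment of spec.template.spec that consumes the ConfigMap
containers:
  - name: api
    envFrom:
      - configMapRef: { name: payments-api-config }
```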

No plaintext Secret data/stringData in git. Use ESO ExternalSecret (source of truth = Vault/AWS/GCP/Azure) or Sealed Secrets (kubeseal-encrypted, safe in a public repo).

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata: { name: payments-db }
spec:
  refreshInterval: 1h
  secretStoreRef: { name: vault-backend, kind: ClusterSecretStore }
  target: { name: payments-db }
  data:
    - secretKey: password
      remoteRef: { key: prod/payments/db, property: password }

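Change ConfigMap/Secret content by revision, not in place: use Kustomize configMapGenerator/secretGenerator (content-hash suffix) or a checksum annotation so pods roll on change. An in-place edit never restarts pods: mounted files are refreshed eventually, but environment variables and subPath mounts keep the old values until the pod is recreated.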
Consume secrets as env vars or files; never log them, never pass via image build args.
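
The content-hash approach above might look like this in a kustomization.yaml. The ConfigMap name and literals are assumptions; Kustomize appends a hash suffix to the generated name and rewrites references to it in the resources it builds, so a content change produces a new name and triggers a rollout.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
configMapGenerator:
  - name: payments-api-config          # rendered as payments-api-config-<hash>
    literals:
      - LOG_LEVEL=info
      - HTTP_PORT=8080
```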

Graceful shutdown

Handle SIGTERM in the app: stop accepting new work, drain in-flight requests, close pools, exit 0.

Set terminationGracePeriodSeconds >= preStop sleep + max drain time; the preStop hook runs inside the grace period, and the default is 30 seconds:

terminationGracePeriodSeconds: 45
lifecycle:
  preStop:
    sleep: { seconds: 10 }   # native sleep action, no shell needed in the image; lets Endpoints/LB deregister before SIGTERM

The preStop sleep covers the race where the pod still receives traffic after termination begins because Endpoint removal is asynchronous. Combine with readinessProbe flipping to not-ready.
For long-running jobs, checkpoint before the grace period expires or Kubernetes sends SIGKILL.

Availability — PDB, HPA, spread, namespaces

PodDisruptionBudget (policy/v1) for every HA workload so voluntary disruptions (node drains, upgrades) can't take you to zero:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: payments-api }
spec:
  maxUnavailable: 1        # or minAvailable; NEVER minAvailable == replicas (deadlocks drains)
  selector: { matchLabels: { app.kubernetes.io/name: payments-api } }

HorizontalPodAutoscaler (autoscaling/v2) with minReplicas >= 2; add behavior stabilization to prevent flapping. Do not also hardcode replicas in the Deployment — let the HPA own it.

spec:
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
  behavior:
    scaleDown: { stabilizationWindowSeconds: 300 }

Spread replicas across nodes and zones so one failure domain can't take the service down:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector: { matchLabels: { app.kubernetes.io/name: payments-api } }
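
The constraint above covers zones only. To also spread across nodes, a second entry keyed on the node hostname label can sit in the same list; this sketch reuses the selector from the example and keeps both entries soft (ScheduleAnyway), which is an assumption about the desired strictness.

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector: { matchLabels: { app.kubernetes.io/name: payments-api } }
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname     # one failure domain per node
    whenUnsatisfiable: ScheduleAnyway
    labelSelector: { matchLabels: { app.kubernetes.io/name: payments-api } }
```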

One namespace per team/environment with ResourceQuota, LimitRange, and the PSA restricted label. Never deploy to default.
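
A sketch of a namespace carrying the PSA restricted labels mentioned above. The namespace name is hypothetical, and pinning the policy version to v1.34 assumes that is the minimum target minor.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod                  # hypothetical
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.34
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```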

Testing

In CI, before merge: yamllint → kustomize build overlays/prod | kubeconform -strict -summary (schema, including CRDs via -schema-location) → policy check (conftest test against OPA/Rego, or kyverno test) → kube-score score (probe/resource/security lint) → trivy config and trivy image (misconfig + CVE).
Helm charts: helm lint --strict, helm-unittest to assert on the manifests rendered for given values, and helm template ... | conftest test - for policy on rendered output.
Cluster smoke tests: helm test hooks or a post-sync Job; verify the readiness endpoint and one real request path.
Pre-deploy dry run: kubectl apply --server-side --dry-run=server catches admission/CRD/quota rejections that offline validation misses.
Assert the invariants a reviewer would: probes present, resources present, runAsNonRoot: true, no :latest, PDB exists for HA. Encode these as policies so they fail the pipeline, not the on-call.
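
Some of these invariants can be encoded as a helm-unittest suite. The sketch below assumes a chart whose templates/deployment.yaml renders a Deployment and exposes an image.tag value; both are assumptions about the chart, not facts from this repo.

```yaml
# charts/<app>/tests/deployment_test.yaml
suite: deployment invariants
templates:
  - templates/deployment.yaml
tests:
  - it: renders a hardened, pinned Deployment
    set:
      image.tag: "1.8.3"               # hypothetical values key
    asserts:
      - isKind:
          of: Deployment
      - equal:
          path: spec.template.spec.securityContext.runAsNonRoot
          value: true
      - equal:
          path: spec.template.spec.containers[0].securityContext.readOnlyRootFilesystem
          value: true
      - notMatchRegex:
          path: spec.template.spec.containers[0].image
          pattern: ":latest$"
```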

Security

Pod/container securityContext — the non-negotiable baseline (satisfies PSA restricted):

securityContext:            # pod
  runAsNonRoot: true
  runAsUser: 10001
  fsGroup: 10001
  seccompProfile: { type: RuntimeDefault }
containers:
  - name: api
    securityContext:        # container
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      privileged: false
      capabilities: { drop: ["ALL"] }

Provide writable paths as emptyDir volumes when readOnlyRootFilesystem: true. Never privileged, never hostNetwork/hostPID/hostIPC, never hostPath for app data.
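
With a read-only root filesystem, a scratch directory could be supplied as in this pod-spec fragment. The /tmp mount path and the 64Mi size limit are assumptions; use whatever paths the application actually writes to.

```yaml
# Fragment of spec.template.spec
containers:
  - name: api
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - { name: tmp, mountPath: /tmp }
volumes:
  - name: tmp
    emptyDir: { sizeLimit: 64Mi }      # bounded scratch space, removed with the pod
```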

RBAC least privilege: a dedicated ServiceAccount per workload; scope with Role/RoleBinding (namespaced) over ClusterRole; never bind to cluster-admin; never grant "*" verbs/resources or secrets list cluster-wide.
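
A least-privilege setup along these lines might look like the following sketch: one ServiceAccount, one namespaced Role limited to a single named ConfigMap, and the binding between them. All names and the payments namespace are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata: { name: payments-api, namespace: payments }
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { name: payments-api-config-reader, namespace: payments }
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["payments-api-config"]   # only this object
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: { name: payments-api-config-reader, namespace: payments }
subjects:
  - kind: ServiceAccount
    name: payments-api
    namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: payments-api-config-reader
```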

NetworkPolicy default-deny, then allow explicitly. A namespace with no policy allows all traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny-all }
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]   # deny-all; pair with explicit allow rules
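
An explicit allow rule to pair with the deny-all policy could look like this sketch. It assumes ingress arrives from a gateway-system namespace on port 8080 and that cluster DNS runs in kube-system with the conventional k8s-app: kube-dns label; verify both for your cluster, since a default-deny egress policy also blocks DNS.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: payments-api-allow, namespace: payments }
spec:
  podSelector: { matchLabels: { app.kubernetes.io/name: payments-api } }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: gateway-system } }
      ports:
        - { protocol: TCP, port: 8080 }
  egress:
    - to:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } }
          podSelector: { matchLabels: { k8s-app: kube-dns } }   # assumed DNS pod label
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
```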

Enforce the above cluster-wide with PSA restricted labels plus ValidatingAdmissionPolicy/Kyverno so a bad manifest is rejected at admission, not caught in review.
Scan images for CVEs (trivy) and pin by digest; run on a distroless/minimal base; drop shells from prod images.
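
As an illustration of admission-time enforcement, a ValidatingAdmissionPolicy can reject unpinned images with a CEL expression. This is a sketch, not a complete policy set: the policy name is hypothetical, it checks only spec.template.spec.containers (not initContainers or bare Pods), and the binding assumes namespaces are selected by their PSA restricted label.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-pinned-images          # hypothetical
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments", "statefulsets", "daemonsets"]
  validations:
    - expression: |
        object.spec.template.spec.containers.all(c,
          !c.image.endsWith(':latest') && c.image.matches('^.+:[^/]+$'))
      message: "every container image must carry an explicit tag or digest, and never :latest"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-pinned-images
spec:
  policyName: require-pinned-images
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        pod-security.kubernetes.io/enforce: restricted
```

The regular expression requires a colon followed by no further slash, so a registry port such as registry:5000/app without a tag is still rejected.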

Do

Set requests+limits, three probes, and a hardened securityContext on every container.
Pin images by digest or immutable tag; render manifests in CI and reconcile via GitOps.
Give each workload its own ServiceAccount and a default-deny NetworkPolicy.
Ship a PDB and an HPA (minReplicas >= 2) with zone/node spread for anything user-facing.
Handle SIGTERM, add a preStop sleep, and size terminationGracePeriodSeconds to the real drain time.
Store secrets in ESO/Sealed Secrets; roll pods on config change via content-hash configMapGenerator.
Validate with kubeconform + policy + kube-score + trivy before merge; --dry-run=server before deploy.

Avoid

image: ...:latest or unpinned tags → pin @sha256: or a semver tag.
Missing resources or CPU/mem limits → enforce a LimitRange; memory request == limit.
No probes, or liveness pointed at a database → liveness = process health only; readiness gates traffic and may check deps.
Running as root, privileged, hostPath, or hostNetwork → runAsNonRoot, drop ALL caps, readOnlyRootFilesystem.
Plaintext Secret in git → ESO or Sealed Secrets.
HA workload with no PDB, or minAvailable == replicas → maxUnavailable: 1.
Deprecated APIs: extensions/v1beta1 Deployment/Ingress, policy/v1beta1 PDB, autoscaling/v2beta* HPA → the v1/v2 GA versions.
kubectl edit/scale/patch in prod, or committing helm template output as source → GitOps from the chart/overlay.
Hardcoding replicas while an HPA manages it → omit the field or ignoreDifferences.
Helm 3 muscle memory: --atomic/--force → --rollback-on-failure/--force-replace; post-renderer paths → plugins.

When you code

Keep diffs small and reviewable — one workload or concern per change; show the rendered kustomize build / helm template diff, not just the source.
After every change run, in order: yamllint → kubeconform -strict → policy check (conftest/kyverno test) → kube-score → helm-unittest (if Helm). Paste the failures you fixed.
Before proposing to apply, run kubectl apply --server-side --dry-run=server and report what admission accepted/rejected.
Ask before: choosing Deployment vs StatefulSet when persistence is ambiguous; setting a CPU limit that risks throttling a latency-sensitive service; picking ESO vs Sealed Secrets; touching RBAC scope, NetworkPolicy, or PSA level; changing a Service selector or a StatefulSet volumeClaimTemplate (both are destructive).
Never widen RBAC, disable a probe, or drop a security field "to make it deploy" — fix the manifest so it passes admission.
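State the target Kubernetes minor and confirm every API/field you use is GA at that version before writing it; the only exception is a field this file explicitly allows as beta and on by default (resizePolicy).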