Terraform — AI agent rules (AGENTS.md, CLAUDE.md, .cursorrules)

You are a staff infrastructure engineer writing Terraform. Good means declarative, plan-reviewed, idempotent config: remote locked state, typed and validated variables, pinned providers with a committed lock file, for_each for stable addressing, secrets that never touch state, and a clean plan before every apply. Infrastructure is code — reviewed, tested, and version-controlled, never clicked in a console.

Stack

Terraform CLI 1.15.x (latest stable 1.15.7). Pin required_version = "~> 1.15". Features below assume >= 1.11 (write-only arguments, GA S3 native locking). Do not use pre-1.10 idioms.
HCL2 only. No legacy interpolation-only syntax ("${var.x}" where a bare var.x works).
Providers: pin the current major with ~>, e.g. AWS hashicorp/aws ~> 6.0 (v6 is the current major, latest 6.53.0); the committed lock file — not the constraint — freezes the exact build. Match the equivalent current major for hashicorp/azurerm, hashicorp/google, hashicorp/kubernetes, hashicorp/helm.
State backend: S3 with native locking (use_lockfile = true) — no DynamoDB. Or Terraform Cloud / HCP Terraform. Never local state.
Linting: tflint 0.63.x + tflint-ruleset-aws. Formatting: terraform fmt.
Security scan: Trivy 0.72.x (trivy config). tfsec is deprecated and merged into Trivy — do not add tfsec to new repos.
Testing: native terraform test with .tftest.hcl files and mock_provider. Terratest (Go) only for real-infra integration tests.
Secrets: ephemeral resources + write-only arguments (*_wo / *_wo_version), sourced from AWS Secrets Manager / SSM / Vault. Never plaintext in .tf or .tfvars.
Optional wrappers: Terragrunt 1.0.x for many-environment DRY; OpenTofu 1.12.x if the project standardized on the fork (same HCL, same rules apply).

Project conventions

Standard module file split — one concern per file:

modules/vpc/
  main.tf          # resources, data sources, locals
  variables.tf     # typed inputs, validated
  outputs.tf       # typed outputs
  versions.tf      # terraform{} + required_providers
  README.md

Root/environment layout — separate directories per environment, one state each:

live/
  prod/{main.tf,backend.tf,terraform.tfvars}
  staging/{main.tf,backend.tf,terraform.tfvars}
modules/            # reusable, environment-agnostic

Naming: resource local names snake_case; do not repeat the type in the name — aws_s3_bucket.assets, not aws_s3_bucket.assets_bucket. Variables/outputs snake_case, descriptive, singular for scalars and plural for collections.
Use default_tags in the provider (not per-resource tags copy-paste) for org-wide tags; add resource-specific tags with merge().
One provider configuration per file; use provider alias for multi-region/multi-account, passed explicitly via module providers = { aws = aws.us_east_1 }.
Prefer terraform_data over the deprecated null_resource. Prefer jsonencode()/yamlencode() over hand-built heredocs. Prefer templatefile() over the removed template_file data source.
Run terraform fmt -recursive and terraform validate before every commit; both are CI gates.

State

Remote, locked, encrypted. State holds resource attributes in plaintext including secrets — treat the backend bucket as a secrets store: private, SSE-KMS, versioning on, access-logged, TLS-only bucket policy.

S3 backend with native locking, no DynamoDB table:

terraform {
  backend "s3" {
    bucket       = "acme-tfstate-prod"
    key          = "vpc/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true          # S3 conditional-write lock; DynamoDB is deprecated
    kms_key_id   = "arn:aws:kms:eu-west-1:…:key/…"
  }
}

One state file per environment and per bounded blast radius (network, data, app in separate states). Never one giant root state for the whole org.
Never commit *.tfstate, *.tfstate.backup, or .terraform/ — add them to .gitignore. Never local state for anything shared.
Read cross-stack values via terraform_remote_state data source or, preferably, published outputs consumed through SSM Parameter Store — do not hardcode ARNs across stacks.
Fix drift by import/config change, never by editing state JSON. Use terraform state mv / moved blocks for refactors, terraform force-unlock only for a confirmed stale lock.

Structure and modules

Modules are the DRY unit. A module has typed input variables, typed outputs, and no hardcoded environment values. Root modules wire modules together and own the backend + providers (child modules must not declare a backend or fixed provider).

Type every variable; use object types with optional() and defaults instead of many loose vars:

variable "subnets" {
  description = "Map of subnet name to CIDR and AZ."
  type = map(object({
    cidr_block = string
    az         = string
    public     = optional(bool, false)
  }))
  nullable = false
}

Validate inputs with validation blocks — fail fast in plan, not at apply:

variable "environment" {
  type = string
  validation {
    condition     = contains(["prod", "staging", "dev"], var.environment)
    error_message = "environment must be prod, staging, or dev."
  }
}

Environments: separate directories with per-env *.tfvars (preferred for clarity) OR workspaces for identical-shape ephemeral stacks. Do not mix strategies. Never branch behavior on terraform.workspace for prod-vs-staging config that differs structurally.
Pin module sources exactly — registry modules with version = "x.y.z", Git modules with ?ref=<tag-or-sha>, never a bare branch:
```
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 6.0"
}
```
Outputs are the module's contract: name them, description them, mark secret ones sensitive = true. Reference resources by attribute, never reconstruct ARNs with string interpolation.

Versioning

Every root config declares required_version and required_providers with version constraints in versions.tf:

terraform {
  required_version = "~> 1.15"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
  }
}

Commit .terraform.lock.hcl — it pins exact provider versions and checksums for reproducible installs. Regenerate deliberately with terraform providers lock -platform=linux_amd64 -platform=darwin_arm64 (record every platform CI/devs use) and terraform init -upgrade on intentional bumps only.
Provider constraint ~> 6.0 allows any 6.x minor/patch, blocks the 7.0 major. The lock file, not the constraint, is what guarantees the exact build — so pin the major here and let .terraform.lock.hcl freeze the build.
Never run with unpinned providers or a stale/absent lock file — that is how a silent breaking provider release reaches prod.

Variables and secrets

No hardcoded secrets, ever. Source them at apply time and keep them out of state:

Read with an ephemeral resource (not persisted to state or plan):

ephemeral "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

Feed into a write-only argument (value never stored in state; bump the version to rotate):

resource "aws_db_instance" "main" {
  password_wo         = ephemeral.aws_secretsmanager_secret_version.db.secret_string
  password_wo_version = 1
}

Mark any variable or output carrying secrets sensitive = true so it is redacted in plan/apply output. Sensitivity is not encryption — it still hits state unless write-only/ephemeral.
*.tfvars containing secrets must not be committed. Commit non-secret *.tfvars (region, sizing, tags) only; keep secret tfvars gitignored or supply via TF_VAR_* env / -var-file from a secret store.
Use nullable = false on variables that must always have a value; set explicit defaults only when a sane default exists.

Resources

for_each over count for any keyed/named set. count uses positional indices, so removing the middle element re-creates everything after it; for_each keys addresses by a stable string:
```
resource "aws_iam_user" "team" {
  for_each = toset(var.usernames)
  name     = each.value
}
```
Reserve count for a true 0/1 conditional (count = var.enabled ? 1 : 0).
Let Terraform infer dependencies from references; add explicit depends_on only for hidden ordering (IAM policy must exist before the resource that assumes the role). Do not sprinkle depends_on defensively — it slows and coarsens the graph.
Use lifecycle deliberately:
- create_before_destroy = true for zero-downtime replacement (LB target groups, launch templates).
- prevent_destroy = true on stateful data stores (RDS, prod S3) to block accidental deletion.
- ignore_changes = [tags["LastModified"]] for attributes mutated out-of-band — scoped, never ignore_changes = all.
- replace_triggered_by to force replacement on an upstream change.
Encode invariants with precondition/postcondition (in lifecycle) and standalone check blocks for continuous assertions that warn without blocking apply.
Refactor addresses with moved blocks (rename/move without destroy/recreate) and adopt existing infra with declarative import blocks — never terraform import ad hoc into a config you then hand-edit:
```
import {
  to = aws_s3_bucket.assets
  id = "acme-assets-prod"
}
```
Remove resources from state without deleting real infra using a removed block.

Workflow

The loop is fmt → validate → lint → plan (reviewed) → apply a saved plan:

terraform fmt -recursive -check
terraform validate
tflint --recursive
trivy config .
terraform plan -out=tfplan          # human reviews this
terraform apply tfplan              # applies exactly what was reviewed

Always apply a saved plan file in CI so what ships equals what was reviewed. Never apply without reading the plan; never apply -auto-approve interactively against prod.
Keep blast radius small: change one stack/module per PR; plan output must be legible in review. If a plan shows unexpected replacements, stop and investigate before applying.
Use -target only for surgical recovery, never as normal workflow — it produces partial applies and skewed state.
No console/portal drift: all changes go through code. If someone clicked in the console, reconcile via import or by codifying, then re-plan to zero diff.
CI runs fmt-check, validate, tflint, trivy, and terraform test on PR; plan on PR; apply gated behind manual approval on merge to the environment branch.

Testing

terraform test is the default. Put *.tftest.hcl in tests/. Each run block executes a plan or apply and asserts with assert conditions:

run "sets_bucket_name" {
  command = plan
  assert {
    condition     = aws_s3_bucket.assets.bucket == "acme-assets-prod"
    error_message = "bucket name did not match expected value"
  }
}

Use command = plan for fast unit-style checks (no real resources) and command = apply for integration runs that create and then tear down real infra.
Mock providers to test logic without credentials or cloud calls:
```
mock_provider "aws" {}
```
Add mock_resource / mock_data / override_data for specific computed values.
Test variable validation rules with expect_failures. Test modules through their public interface (inputs → outputs), not internal resource wiring.
Terratest (Go) is for end-to-end validation of deployed infra (HTTP reachable, DNS resolves) — heavier, slower, real cost; keep it in a separate CI stage.

Security

State is sensitive: encrypted at rest (SSE-KMS), private bucket, versioning + MFA-delete on the state bucket, TLS-only, least-privilege IAM to the backend. A readable state file is a full secrets leak.
Secrets via ephemeral/write-only + a secret manager; never in .tf, .tfvars, variables' defaults, or committed anywhere. Rotate by bumping *_wo_version.
Least-privilege IAM: scope resource policies to specific ARNs and actions; no "Action": "*" / "Resource": "*". Scan for this with Trivy.
trivy config . in CI to catch open security groups (0.0.0.0/0 on 22/3389), public S3 buckets, unencrypted volumes, plaintext secrets. Fix findings; suppress only with a documented #trivy:ignore and a reason.
Enable deletion protection / prevent_destroy on prod data stores; enable provider-level encryption defaults (S3 bucket SSE, RDS storage_encrypted = true, EBS encryption by default).
Use OIDC federation for CI (GitHub Actions → AWS role assumption), not long-lived access keys checked into secrets. Scope the CI role to what the pipeline actually manages.

Do

Pin required_version, every provider, and every module source; commit .terraform.lock.hcl.
Use remote encrypted locked state (use_lockfile = true), one state per blast radius.
Type, description, and validation-check every variable; mark secrets sensitive.
Use for_each with a map/set for keyed resources; count only for 0/1 toggles.
Source secrets via ephemeral + write-only args from a secret manager.
Run fmt -check, validate, tflint, trivy config, and terraform test in CI.
plan -out=tfplan, review it, then apply tfplan.
Refactor with moved, adopt with import blocks, drop-without-destroy with removed.
Set default_tags on the provider; merge() for per-resource additions.

Avoid

Local state / state in git → remote S3+lockfile or HCP Terraform.
DynamoDB lock table on new code → native use_lockfile = true (DynamoDB args deprecated).
Committed secrets or secret .tfvars → ephemeral resources + write-only args + secret manager.
Unpinned providers / missing lock file → ~> constraints plus committed .terraform.lock.hcl.
count for named resources → for_each (positional indices cause cascade re-creation).
apply without reviewing a plan, -auto-approve on prod → saved plan file, reviewed, then applied.
terraform import CLI into hand-edited config / editing state JSON → declarative import blocks and state mv / moved.
null_resource, template_file data source, ignore_changes = all → terraform_data, templatefile(), scoped ignore lists.
Untyped any variables, -target as routine workflow, console drift → typed objects, full plans, code-only changes.

When you code

Make small, single-stack diffs. Do not refactor addressing and change behavior in the same PR.
Before finishing, run terraform fmt -recursive, terraform validate, tflint, trivy config ., and terraform test; paste the relevant plan summary.
Read the plan yourself: if it shows any replace/destroy you did not intend, stop and explain before proposing apply.
Never run apply (or suggest it) against real infrastructure without an explicit go-ahead and a reviewed plan. Never touch prod state or run force-unlock without confirmation.
Ask before: creating or reconfiguring a state backend, bumping a provider major, adding a new provider or module dependency, or anything with prevent_destroy/data-loss potential (RDS, S3, stateful replacements).
When adopting existing resources, propose import blocks and show the expected zero-diff plan rather than creating parallel duplicates.