Promptheus/rules53 rule sets · CC0Promptheus hub ↗

Workflow · Refactoring · Fowler 2e + Tidy First · ast-grep 0.44 · Stryker/PIT/mutmut

Refactoring

Green tests, tiny steps, one reason at a time.

refactoringclean-codemaintainability

Updated 5 Jul 2026 · CC0

AGENTS.mdrepo root

You are refactoring: changing the internal structure of code without changing its observable behavior. "Good" here means every commit is behavior-preserving, provably green, mechanically small, and separately revertable — you improve design in tiny verified steps, never mixing a cleanup with a feature or fix.

Stack

The discipline is language-agnostic; the toolchain is not. Use the current idiomatic tools:

  • Canon: Fowler, Refactoring 2nd ed (the named-move catalog); Kent Beck, Tidy First? (structure-vs-behavior separation, "two hats"). Refer to moves by their catalog names.
  • Automated refactorings first: your IDE / language server, never hand-editing. IntelliJ/Rider Shift+F6 (Rename), Ctrl+Alt+M (Extract Method), Ctrl+Alt+V (Extract Variable), Ctrl+Alt+N (Inline). VS Code F2 (Rename Symbol) + code-action "Extract to function/constant". These update every reference and respect scope/shadowing; a find-replace does not.
  • Structural search & codemod at scale: ast-grep 0.44 (ast-grep run -p '<pat>' -r '<rewrite>', rules in sgconfig.yml) — AST-aware, tree-sitter based, polyglot. Use Comby only for non-AST/whitespace-agnostic text or languages ast-grep lacks. Per-ecosystem codemods: jscodeshift, gofmt -r, rustfix/cargo clippy --fix, OpenRewrite (JVM).
  • Coverage (the safety net you measure before touching): coverage.py (Python), c8/native V8 (JS/TS), JaCoCo (JVM), go test -cover, cargo llvm-cov. Branch coverage, not line.
  • Mutation testing (prove the net actually catches regressions): StrykerJS 9.6, Stryker.NET 4.14 (JS/C#), PIT (JVM), mutmut (Python), cargo-mutants (Rust).
  • Characterization / golden-master for untested legacy: ApprovalTests (Java/Python/.NET/C++/JS), Verify (.NET), framework snapshots (toMatchSnapshot, insta for Rust).
  • Complexity metrics (find real smells, don't guess): radon cc/ruff (Python), ESLint complexity/max-depth/max-params, gocyclo, clippy::cognitive_complexity, PMD/SonarQube.
  • Dead-code / unused detection (locate what to delete, don't eyeball): knip (JS/TS — unused files, exports, types, deps; --fix/--fix-type exports,types auto-removes; supersedes the unmaintained ts-prune/depcheck), vulture (Python), cargo machete (Rust unused deps), unimport/deadcode (Go). Run before a cleanup pass, review, then delete.
  • Version control: Git. Atomic commits, Conventional Commits (refactor: vs feat:/fix:). git add -p to split, git mv for pure moves.

Project conventions

  • Match the repo's existing formatter/linter config exactly — refactoring must not introduce a reformat. Run the project's formatter (Prettier/Black/gofmt/rustfmt/ktlint) so the diff shows structure changes only, never whitespace churn.
  • Do a formatter-only commit first if the file is unformatted, so later refactor diffs are readable. Never reformat-and-refactor in one commit.
  • Keep the public API surface (exported names, signatures, file paths) stable within a refactor; if it must change, that is a separate deprecation-cycle change, not a refactor.
  • Commit message: refactor(scope): extract X from Y — name the catalog move. One move type per commit.
  • Branch small; a refactor PR is reviewed by "is this the same behavior?" not "is this a good feature?" Keep them separable.

Preconditions — never start without these

  • Tests green first. Run the full suite (or the affected module's) and confirm green before the first edit. A refactor started on red has no baseline. Never commit red.

  • Behavior is preserved, definitionally. If observable behavior (outputs, side effects, error types, ordering, performance contract) changes at all, it is not a refactor — split it out.

  • No tests? Add characterization tests first, in their own commit. Pin current behavior — including current bugs — with ApprovalTests/golden-master by feeding representative inputs and approving the actual output. You are freezing what is, not asserting what should be.

    # untested legacy: capture behavior before changing structure
    verify(renderInvoice(sampleOrder))   # approve the current output as the master
    
  • Verify the net has teeth. Before a nontrivial refactor of a module, run mutation testing on it (stryker run, pitest, mutmut run, cargo mutants). Surviving mutants = blind spots; add tests there first, or you are refactoring without a real safety net despite green coverage.

  • Separate the hats. Decide up front: this commit is structure OR behavior, never both. A PR may be SSSBBSB, but each commit is atomic and single-kind.

Small steps

  • One transformation at a time. Extract one function, run tests, commit. Then the next. Not five moves then a test run.
  • Run the test suite after every step. Use watch mode / a fast focused subset for the inner loop; full suite before commit. If a step goes red, revert that single step — don't debug forward on top of a broken refactor.
  • Commit every working (green) state. Small commits make git bisect and git revert precise; a 600-line "refactor" commit is unrevertable and unreviewable.
  • Prefer automated/IDE refactorings over manual edits — they are behavior-preserving by construction and touch all references. Reserve hand-editing for moves the tool can't do, and re-run tests immediately.
  • For codebase-wide mechanical changes, use one ast-grep/codemod pass, then read the entire diff — an AST rewrite can silently drop a guard or change evaluation order. Never trust a codemod unreviewed.

Common moves (use the catalog names)

  • Extract Function/Variable to name a concept or a magic literal; Inline when the indirection earns nothing. Extract when a fragment needs a comment to explain it — the function name replaces the comment.

  • Rename for intent. The single highest-ROI move. Use the IDE rename so all references update atomically; commit renames alone.

  • Replace nested conditional with guard clauses. Flatten early-return the exceptional/edge cases; the main path drops an indent level.

    // before: arrow code            // after: guard clauses
    if (a) { if (b) { return f() } } if (!a) return default
                                      if (!b) return default
                                      return f()
    
  • Replace conditional with polymorphism / Strategy when a switch/if on a type tag repeats in multiple places.

  • Introduce Parameter Object when 3+ args travel together (x, y, w, hRect). Attacks Long Parameter List and Data Clumps.

  • Replace Primitive with Object for primitive obsession (a bare string email/money/id that has invariants → a value type).

  • Delete dead code — don't comment it out, don't if (false). Git remembers. Remove unreachable branches and unused params/exports/files (let the linter, knip --fix (JS/TS), or vulture (Python) find them — never delete by eye).

  • Remove real duplication — rule of three. Two occurrences: wait. Three near-identical: unify. Duplication that merely looks alike but changes for different reasons is not duplication — leave it. "Duplication is far cheaper than the wrong abstraction" (Metz); premature DRY couples things that should evolve apart.

  • Preparatory refactoring: "make the change easy, then make the easy change" (Beck). Restructure first (separate commit), then add the feature (next commit).

Target real smells — measure, don't gold-plate

  • Refactor to serve a concrete goal: an incoming feature, a bug you keep hitting, or a metric over a threshold. Don't refactor code you're not otherwise touching. Aimless "cleanup" churns diffs and risk for no payoff.
  • Use metrics, not vibes, to locate smells: cyclomatic/cognitive complexity > ~10, function length > ~50 lines, nesting depth > 3, parameter count > 4. Feed the linter thresholds; refactor the outliers, not everything.
  • Real smells worth the change: Long Function, Long Parameter List, duplicated logic (rule of three), Feature Envy (a method using another object's data more than its own → Move Function), Primitive Obsession, deep nesting, Shotgun Surgery, Divergent Change.
  • Stop when the smell is gone. Do not keep polishing past the point that unblocks the goal — over-abstraction (speculative interfaces, one-implementation factories) is itself a smell.

Keep diffs reviewable

  • One kind of change per commit. Never mix a rename, an extract, and a move — the reviewer can't tell mechanical from meaningful.
  • Pure moves in their own commit. git mv a file, or move a function, with zero other edits, so Git's rename detection keeps history/blame. Move-and-edit in one commit destroys git log --follow.
  • Reformatting, renaming, and logic restructuring are three separate commits.
  • If a "refactor" diff shows changed test assertions or new behavior, it isn't a refactor — the tests are the tell. Green tests, unchanged, are the proof.

When NOT to refactor

  • Code you're about to delete or rewrite wholesale — don't polish the condemned.
  • No tests and no time/ability to add characterization tests — refactoring blind is how you ship silent regressions. Add the net first or don't touch it.
  • Mid-feature, on the deadline path, in code unrelated to the task — note it, do it after, in its own PR.
  • A measured hot path where the cleaner form regresses performance — verify with a benchmark; readability doesn't trump a proven latency budget.
  • Never a big-bang rewrite disguised as refactoring. Incremental, behavior-preserving, or it's a rewrite — call it that and manage the risk accordingly.

Testing

  • Test observable behavior through the public API/seam, not private internals. Tests coupled to internal structure break during refactoring and give false alarms — that coupling is itself the bug. If a refactor forces test changes, either your tests were structural or you changed behavior.
  • Keep the loop fast: watch mode + focused subset for inner-loop feedback, full suite before every commit.
  • For legacy without seams, introduce a seam minimally (sprout/wrap) to get a test point, then characterize — but that seam is a structural change: its own commit.
  • Coverage tells you what's exercised; mutation testing tells you whether it's actually asserted. Trust the latter before large refactors.
  • Snapshot/golden-master tests are for freezing legacy behavior, not a substitute for intent-revealing assertions on new code.

Security

  • Behavior preservation includes security behavior. Do not "simplify" auth/authz checks, input validation, output encoding, crypto calls, or SQL parameterization during a refactor — a subtle change there is a vulnerability, not a cleanup.
  • Never delete a validation/bounds/null check as "redundant." It may be defense in depth; removing it is a behavior change requiring its own review, not a refactor.
  • Preserve error-handling shape exactly — don't broaden a catch, don't start leaking stack traces/messages, don't swallow errors that were surfacing.
  • Automated codemods (ast-grep, jscodeshift) can silently drop a guard or reorder evaluation across the whole tree — review every hunk; run SAST (Semgrep/CodeQL) diff before/after to confirm no new findings.
  • Keep secrets/PII handling byte-identical; refactors must not move a secret into a log line or a new error path.

Do

  • Confirm green, then refactor; commit each green step with refactor: and the move name.
  • Add characterization/golden-master tests before touching untested code.
  • Use IDE/ast-grep automated refactorings; read the full generated diff.
  • Keep structure commits and behavior commits strictly separate (two hats).
  • Locate smells with complexity metrics; refactor the outliers to a concrete goal.
  • Apply rule of three before extracting shared abstractions.
  • Put pure moves/renames/reformats in their own commits to preserve blame and reviewability.
  • Revert the last step on red; never debug forward on a broken refactor.

Avoid

  • Refactoring without a passing test suite or a characterization net — the #1 way to ship silent regressions.
  • Mixing a refactor with a feature/fix in one commit or PR (refactor: diffs must show no behavior change).
  • Big-bang rewrites labeled "refactor" — go incremental or own that it's a rewrite.
  • Find-and-replace renames instead of semantic IDE rename (misses shadowing, hits strings/comments).
  • Move-and-edit in one commit (kills git log --follow rename tracking) — split them.
  • Commenting out or if (false)-gating dead code instead of deleting it — Git is your history.
  • Premature DRY: unifying two look-alikes that change for different reasons — prefer duplication over the wrong abstraction until the third occurrence.
  • Speculative generality (interfaces/factories/hooks with one caller) — remove it; it's a smell, not foresight.
  • Reformatting a file in the same commit as logic changes — separate the whitespace commit.
  • Trusting coverage % as the safety net — verify with mutation testing before large moves.

When you code

  • State the goal and scope of the refactor, and which move(s) you'll apply, before editing.
  • Run typecheck + lint + full test suite before you start (establish green) and after each step; paste the green result.
  • Keep diffs small and single-purpose; if a change grows past one move or one file's worth of one concern, stop and split the commit.
  • If the target code has no tests, ask/decide to add characterization tests first — do not refactor blind.
  • If a step turns tests red, revert that step and report; do not proceed on top of a broken state.
  • If you discover the "refactor" actually requires a behavior change (bug fix, API change), stop and separate it into its own commit/PR — surface the choice rather than smuggling it in.
  • Never touch auth, validation, crypto, or error-handling semantics under the banner of refactoring without explicitly flagging it as a behavior change.

Drop it in your repo

Save these rules as AGENTS.md, CLAUDE.md, .cursorrules, .windsurfrules or .github/copilot-instructions.md — your agent instantly codes to the same standard on Refactoring · Fowler 2e + Tidy First · ast-grep 0.44 · Stryker/PIT/mutmut.

Back to top ↑