Workflow · Refactoring · Fowler 2e + Tidy First · ast-grep 0.44 · Stryker/PIT/mutmut
Refactoring
Green tests, tiny steps, one reason at a time.
Updated 5 Jul 2026 · CC0
AGENTS.mdrepo rootYou are refactoring: changing the internal structure of code without changing its observable behavior. "Good" here means every commit is behavior-preserving, provably green, mechanically small, and separately revertable — you improve design in tiny verified steps, never mixing a cleanup with a feature or fix.
Stack
The discipline is language-agnostic; the toolchain is not. Use the current idiomatic tools:
- Canon: Fowler, Refactoring 2nd ed (the named-move catalog); Kent Beck, Tidy First? (structure-vs-behavior separation, "two hats"). Refer to moves by their catalog names.
- Automated refactorings first: your IDE / language server, never hand-editing. IntelliJ/Rider
Shift+F6(Rename),Ctrl+Alt+M(Extract Method),Ctrl+Alt+V(Extract Variable),Ctrl+Alt+N(Inline). VS CodeF2(Rename Symbol) + code-action "Extract to function/constant". These update every reference and respect scope/shadowing; a find-replace does not. - Structural search & codemod at scale: ast-grep 0.44 (
ast-grep run -p '<pat>' -r '<rewrite>', rules insgconfig.yml) — AST-aware, tree-sitter based, polyglot. Use Comby only for non-AST/whitespace-agnostic text or languages ast-grep lacks. Per-ecosystem codemods:jscodeshift,gofmt -r,rustfix/cargo clippy --fix, OpenRewrite (JVM). - Coverage (the safety net you measure before touching):
coverage.py(Python),c8/native V8 (JS/TS), JaCoCo (JVM),go test -cover,cargo llvm-cov. Branch coverage, not line. - Mutation testing (prove the net actually catches regressions): StrykerJS 9.6, Stryker.NET 4.14 (JS/C#), PIT (JVM),
mutmut(Python),cargo-mutants(Rust). - Characterization / golden-master for untested legacy: ApprovalTests (Java/Python/.NET/C++/JS),
Verify(.NET), framework snapshots (toMatchSnapshot,instafor Rust). - Complexity metrics (find real smells, don't guess):
radon cc/ruff(Python), ESLintcomplexity/max-depth/max-params,gocyclo,clippy::cognitive_complexity, PMD/SonarQube. - Dead-code / unused detection (locate what to delete, don't eyeball): knip (JS/TS — unused files, exports, types, deps;
--fix/--fix-type exports,typesauto-removes; supersedes the unmaintained ts-prune/depcheck),vulture(Python),cargo machete(Rust unused deps),unimport/deadcode(Go). Run before a cleanup pass, review, then delete. - Version control: Git. Atomic commits, Conventional Commits (
refactor:vsfeat:/fix:).git add -pto split,git mvfor pure moves.
Project conventions
- Match the repo's existing formatter/linter config exactly — refactoring must not introduce a reformat. Run the project's formatter (Prettier/Black/
gofmt/rustfmt/ktlint) so the diff shows structure changes only, never whitespace churn. - Do a formatter-only commit first if the file is unformatted, so later refactor diffs are readable. Never reformat-and-refactor in one commit.
- Keep the public API surface (exported names, signatures, file paths) stable within a refactor; if it must change, that is a separate deprecation-cycle change, not a refactor.
- Commit message:
refactor(scope): extract X from Y— name the catalog move. One move type per commit. - Branch small; a refactor PR is reviewed by "is this the same behavior?" not "is this a good feature?" Keep them separable.
Preconditions — never start without these
Tests green first. Run the full suite (or the affected module's) and confirm green before the first edit. A refactor started on red has no baseline. Never commit red.
Behavior is preserved, definitionally. If observable behavior (outputs, side effects, error types, ordering, performance contract) changes at all, it is not a refactor — split it out.
No tests? Add characterization tests first, in their own commit. Pin current behavior — including current bugs — with ApprovalTests/golden-master by feeding representative inputs and approving the actual output. You are freezing what is, not asserting what should be.
# untested legacy: capture behavior before changing structure verify(renderInvoice(sampleOrder)) # approve the current output as the masterVerify the net has teeth. Before a nontrivial refactor of a module, run mutation testing on it (
stryker run,pitest,mutmut run,cargo mutants). Surviving mutants = blind spots; add tests there first, or you are refactoring without a real safety net despite green coverage.Separate the hats. Decide up front: this commit is structure OR behavior, never both. A PR may be
SSSBBSB, but each commit is atomic and single-kind.
Small steps
- One transformation at a time. Extract one function, run tests, commit. Then the next. Not five moves then a test run.
- Run the test suite after every step. Use watch mode / a fast focused subset for the inner loop; full suite before commit. If a step goes red, revert that single step — don't debug forward on top of a broken refactor.
- Commit every working (green) state. Small commits make
git bisectandgit revertprecise; a 600-line "refactor" commit is unrevertable and unreviewable. - Prefer automated/IDE refactorings over manual edits — they are behavior-preserving by construction and touch all references. Reserve hand-editing for moves the tool can't do, and re-run tests immediately.
- For codebase-wide mechanical changes, use one
ast-grep/codemod pass, then read the entire diff — an AST rewrite can silently drop a guard or change evaluation order. Never trust a codemod unreviewed.
Common moves (use the catalog names)
Extract Function/Variable to name a concept or a magic literal; Inline when the indirection earns nothing. Extract when a fragment needs a comment to explain it — the function name replaces the comment.
Rename for intent. The single highest-ROI move. Use the IDE rename so all references update atomically; commit renames alone.
Replace nested conditional with guard clauses. Flatten early-return the exceptional/edge cases; the main path drops an indent level.
// before: arrow code // after: guard clauses if (a) { if (b) { return f() } } if (!a) return default if (!b) return default return f()Replace conditional with polymorphism / Strategy when a
switch/ifon a type tag repeats in multiple places.Introduce Parameter Object when 3+ args travel together (
x, y, w, h→Rect). Attacks Long Parameter List and Data Clumps.Replace Primitive with Object for primitive obsession (a bare
stringemail/money/id that has invariants → a value type).Delete dead code — don't comment it out, don't
if (false). Git remembers. Remove unreachable branches and unused params/exports/files (let the linter, knip--fix(JS/TS), orvulture(Python) find them — never delete by eye).Remove real duplication — rule of three. Two occurrences: wait. Three near-identical: unify. Duplication that merely looks alike but changes for different reasons is not duplication — leave it. "Duplication is far cheaper than the wrong abstraction" (Metz); premature DRY couples things that should evolve apart.
Preparatory refactoring: "make the change easy, then make the easy change" (Beck). Restructure first (separate commit), then add the feature (next commit).
Target real smells — measure, don't gold-plate
- Refactor to serve a concrete goal: an incoming feature, a bug you keep hitting, or a metric over a threshold. Don't refactor code you're not otherwise touching. Aimless "cleanup" churns diffs and risk for no payoff.
- Use metrics, not vibes, to locate smells: cyclomatic/cognitive complexity > ~10, function length > ~50 lines, nesting depth > 3, parameter count > 4. Feed the linter thresholds; refactor the outliers, not everything.
- Real smells worth the change: Long Function, Long Parameter List, duplicated logic (rule of three), Feature Envy (a method using another object's data more than its own → Move Function), Primitive Obsession, deep nesting, Shotgun Surgery, Divergent Change.
- Stop when the smell is gone. Do not keep polishing past the point that unblocks the goal — over-abstraction (speculative interfaces, one-implementation factories) is itself a smell.
Keep diffs reviewable
- One kind of change per commit. Never mix a rename, an extract, and a move — the reviewer can't tell mechanical from meaningful.
- Pure moves in their own commit.
git mva file, or move a function, with zero other edits, so Git's rename detection keeps history/blame. Move-and-edit in one commit destroysgit log --follow. - Reformatting, renaming, and logic restructuring are three separate commits.
- If a "refactor" diff shows changed test assertions or new behavior, it isn't a refactor — the tests are the tell. Green tests, unchanged, are the proof.
When NOT to refactor
- Code you're about to delete or rewrite wholesale — don't polish the condemned.
- No tests and no time/ability to add characterization tests — refactoring blind is how you ship silent regressions. Add the net first or don't touch it.
- Mid-feature, on the deadline path, in code unrelated to the task — note it, do it after, in its own PR.
- A measured hot path where the cleaner form regresses performance — verify with a benchmark; readability doesn't trump a proven latency budget.
- Never a big-bang rewrite disguised as refactoring. Incremental, behavior-preserving, or it's a rewrite — call it that and manage the risk accordingly.
Testing
- Test observable behavior through the public API/seam, not private internals. Tests coupled to internal structure break during refactoring and give false alarms — that coupling is itself the bug. If a refactor forces test changes, either your tests were structural or you changed behavior.
- Keep the loop fast: watch mode + focused subset for inner-loop feedback, full suite before every commit.
- For legacy without seams, introduce a seam minimally (sprout/wrap) to get a test point, then characterize — but that seam is a structural change: its own commit.
- Coverage tells you what's exercised; mutation testing tells you whether it's actually asserted. Trust the latter before large refactors.
- Snapshot/golden-master tests are for freezing legacy behavior, not a substitute for intent-revealing assertions on new code.
Security
- Behavior preservation includes security behavior. Do not "simplify" auth/authz checks, input validation, output encoding, crypto calls, or SQL parameterization during a refactor — a subtle change there is a vulnerability, not a cleanup.
- Never delete a validation/bounds/null check as "redundant." It may be defense in depth; removing it is a behavior change requiring its own review, not a refactor.
- Preserve error-handling shape exactly — don't broaden a
catch, don't start leaking stack traces/messages, don't swallow errors that were surfacing. - Automated codemods (
ast-grep, jscodeshift) can silently drop a guard or reorder evaluation across the whole tree — review every hunk; run SAST (Semgrep/CodeQL) diff before/after to confirm no new findings. - Keep secrets/PII handling byte-identical; refactors must not move a secret into a log line or a new error path.
Do
- Confirm green, then refactor; commit each green step with
refactor:and the move name. - Add characterization/golden-master tests before touching untested code.
- Use IDE/
ast-grepautomated refactorings; read the full generated diff. - Keep structure commits and behavior commits strictly separate (two hats).
- Locate smells with complexity metrics; refactor the outliers to a concrete goal.
- Apply rule of three before extracting shared abstractions.
- Put pure moves/renames/reformats in their own commits to preserve blame and reviewability.
- Revert the last step on red; never debug forward on a broken refactor.
Avoid
- Refactoring without a passing test suite or a characterization net — the #1 way to ship silent regressions.
- Mixing a refactor with a feature/fix in one commit or PR (
refactor:diffs must show no behavior change). - Big-bang rewrites labeled "refactor" — go incremental or own that it's a rewrite.
- Find-and-replace renames instead of semantic IDE rename (misses shadowing, hits strings/comments).
- Move-and-edit in one commit (kills
git log --followrename tracking) — split them. - Commenting out or
if (false)-gating dead code instead of deleting it — Git is your history. - Premature DRY: unifying two look-alikes that change for different reasons — prefer duplication over the wrong abstraction until the third occurrence.
- Speculative generality (interfaces/factories/hooks with one caller) — remove it; it's a smell, not foresight.
- Reformatting a file in the same commit as logic changes — separate the whitespace commit.
- Trusting coverage % as the safety net — verify with mutation testing before large moves.
When you code
- State the goal and scope of the refactor, and which move(s) you'll apply, before editing.
- Run typecheck + lint + full test suite before you start (establish green) and after each step; paste the green result.
- Keep diffs small and single-purpose; if a change grows past one move or one file's worth of one concern, stop and split the commit.
- If the target code has no tests, ask/decide to add characterization tests first — do not refactor blind.
- If a step turns tests red, revert that step and report; do not proceed on top of a broken state.
- If you discover the "refactor" actually requires a behavior change (bug fix, API change), stop and separate it into its own commit/PR — surface the choice rather than smuggling it in.
- Never touch auth, validation, crypto, or error-handling semantics under the banner of refactoring without explicitly flagging it as a behavior change.
Drop it in your repo
Save these rules as AGENTS.md, CLAUDE.md, .cursorrules, .windsurfrules or .github/copilot-instructions.md — your agent instantly codes to the same standard on Refactoring · Fowler 2e + Tidy First · ast-grep 0.44 · Stryker/PIT/mutmut.