E2E Testing (Playwright) — AI agent rules (AGENTS.md, CLAUDE.md, .cursorrules)

You write end-to-end tests with Playwright Test. Good here means a small suite of high-value user-journey tests that are deterministic, isolated, parallel-safe, and readable — they fail only when the product is actually broken, and the trace tells you why in one click.

Stack

@playwright/test 1.61.x (the test runner — never the bare playwright library for tests). Browsers are pinned to the runner version; install with npx playwright install --with-deps.
Node.js 24 LTS (22 maintenance still supported). Run tests on the same major in CI and locally.
TypeScript 6.0 (strict). Type test files; do not ship .js specs.
ESLint 10 (flat config is the only supported format — eslintrc is gone) + eslint-plugin-playwright 2.10 (playwright.configs['flat/recommended']) + Prettier 3.
Config lives in playwright.config.ts via defineConfig + devices. Use expect, test, Locator, Page, APIRequestContext from @playwright/test — no third-party assertion libs.
Reporters: html (local), blob (sharded CI, merged with npx playwright merge-reports), github or list in CI logs.

Project conventions

e2e/
  fixtures.ts            # test.extend — the ONLY import specs use for test/expect
  pages/                 # Page Objects: login.page.ts, checkout.page.ts
  specs/                 # *.spec.ts, one journey per file
  setup/auth.setup.ts    # storageState generation (setup project)
  data/                  # factories/builders for seed payloads
playwright.config.ts

Spec files: *.spec.ts, named by journey (checkout.spec.ts), not by page.
Every spec imports from ../fixtures, never directly from @playwright/test, so fixtures/POMs are always in scope.
Page Objects end in .page.ts and expose Locators + intent methods; setup files end in .setup.ts.
baseURL set in config → use root-relative paths (page.goto('/checkout')), never hardcoded hosts.
Prettier formats; ESLint enforces no-wait-for-timeout, no-force-option, no-conditional-in-test, expect-expect, no-focused-test, no-skipped-test (warn), valid-expect.
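
The lint rules listed above could be wired up roughly as follows. This is a sketch, not the project's actual config: the file location, the e2e/**/*.ts glob, and every severity other than the warn on no-skipped-test are assumptions.

```typescript
// eslint.config.ts (sketch; the same array shape works in eslint.config.js)
import playwright from 'eslint-plugin-playwright';

export default [
  {
    // Start from the plugin's flat preset and scope it to the E2E folder.
    ...playwright.configs['flat/recommended'],
    files: ['e2e/**/*.ts'],
    rules: {
      ...playwright.configs['flat/recommended'].rules,
      // Severities below are assumptions, except no-skipped-test (warn per the rules list).
      'playwright/no-wait-for-timeout': 'error',
      'playwright/no-force-option': 'error',
      'playwright/no-conditional-in-test': 'error',
      'playwright/expect-expect': 'error',
      'playwright/no-focused-test': 'error',
      'playwright/no-skipped-test': 'warn',
      'playwright/valid-expect': 'error',
    },
  },
];
```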

What to E2E

E2E is the top of the pyramid: few, expensive, high-signal. Cover critical revenue/trust journeys only — the flows that lose money or users if broken.

Test: signup/login, checkout/payment, core CRUD of the primary entity, permission boundaries (user A cannot see B's data), and one happy path per critical feature.
Do NOT E2E: field-level validation, every error message, every branch, formatting, computed logic. Push those down to unit/integration/component tests where they run in milliseconds.
One assertion focus per test — a journey has a clear success state. If a spec needs 8 expects across 5 pages, it is probably two tests or belongs lower in the pyramid.
Prefer component tests (@playwright/experimental-ct-*) or API tests for anything that does not require the full stack rendered in a real browser.
Target: the whole E2E suite runs in single-digit minutes on CI. If it does not, you are over-E2E-ing.

Selectors

Resolve elements the way a user or assistive tech does. Selectors must survive refactors of DOM structure, class names, and styling.

Priority order:

getByRole('button', { name: 'Place order' }) — role + accessible name. Default choice; also asserts accessibility.
getByLabel('Email'), getByPlaceholder, getByText, getByAltText, getByTitle for the remainder.
getByTestId('cart-total') when there is no stable user-facing text (add data-testid in app code; configure via use: { testIdAttribute: 'data-testid' }).

NEVER use CSS/XPath tied to structure or styling: page.locator('.btn-primary'), div > span:nth-child(3), //button[2]. These break on cosmetic changes and read as noise.
Chain and filter instead of clever selectors: getByRole('listitem').filter({ hasText: 'Pro plan' }).getByRole('button', { name: 'Remove' }).
Scope to a region first: const row = page.getByRole('row', { name: 'INV-042' }); await row.getByRole('button', { name: 'Pay' }).click();.
Expect one match — a strict-mode violation (multiple matches) is a real bug in your selector, not something to paper over with .first(). Use .first()/.nth() only for genuinely repeated UI.
Assert accessibility structure with await expect(page.getByRole('main')).toMatchAriaSnapshot(...) (YAML snapshot) instead of brittle DOM checks.
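
To make the priority order concrete, here is a minimal spec sketch. The /billing route, the invoice table, the "Paid" cell, and the "Invoices" heading are hypothetical UI; ../fixtures is assumed to re-export test and expect as described under Project conventions.

```typescript
// e2e/specs/billing.spec.ts (hypothetical journey and UI text)
import { test, expect } from '../fixtures';

test('pay an open invoice', async ({ page }) => {
  await page.goto('/billing');

  // Scope to the row first, then act inside it: no nth-child, no XPath.
  const row = page.getByRole('row', { name: 'INV-042' });
  await row.getByRole('button', { name: 'Pay' }).click();

  // Assert on user-visible state inside the same region.
  await expect(row.getByRole('cell', { name: 'Paid' })).toBeVisible();

  // Structural check through the accessibility tree rather than the DOM.
  await expect(page.getByRole('main')).toMatchAriaSnapshot(`
    - heading "Invoices"
  `);
});
```

If the row locator ever matches two rows, the click fails with a strict-mode violation, which is the signal to tighten the name rather than append .first().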

Stability = zero flakiness

Playwright auto-waits for actionability (attached, visible, stable, enabled, receives events) before every action, and web-first assertions retry until they pass or time out. Lean entirely on that.

Assert with retrying matchers: await expect(locator).toBeVisible(), .toHaveText(), .toHaveValue(), .toHaveCount(), .toBeEnabled(), and on the page itself await expect(page).toHaveURL(). They poll, so no manual waiting is needed.
NEVER await page.waitForTimeout(...) / sleep. A fixed sleep is either a hidden race or wasted time; it is banned by lint. Replace with an assertion on the state you were waiting for.
Assert on observable state, never on timing. Wait for the success toast/heading/URL, not "3 seconds after clicking".
Never use .click({ force: true }) to defeat actionability checks — a non-actionable element is a bug or a wrong selector. Fix the selector or wait for the real precondition.
Do not assert on non-retrying values mid-flight: expect(await locator.textContent()).toBe(...) reads once and flakes. Use await expect(locator).toHaveText(...).
For non-DOM conditions use expect.poll(async () => ...) or await expect(async () => { ... }).toPass() — never a while-loop with sleeps.
Control time with page.clock (install → fastForward/setFixedTime) instead of waiting for real timers/animations/polling intervals.
Wait for network you triggered via the response you care about: const resp = page.waitForResponse('**/api/orders'); await placeOrder(); await resp; — but prefer asserting the resulting UI.
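
The rules above might combine in a spec like the following sketch. The routes, headings, the /api/orders/latest endpoint, and the "Session expired" banner are hypothetical; ../fixtures is assumed to re-export test and expect.

```typescript
import { test, expect } from '../fixtures';

test('order confirmation appears without fixed sleeps', async ({ page, request }) => {
  await page.goto('/checkout');

  // Start listening before the action that triggers the request, await it afterwards.
  const orderResponse = page.waitForResponse('**/api/orders');
  await page.getByRole('button', { name: 'Place order' }).click();
  await orderResponse;

  // Web-first assertions retry until they pass or time out.
  await expect(page).toHaveURL(/\/orders\/\w+/);
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();

  // Non-DOM condition: poll a (hypothetical) endpoint instead of looping with sleeps.
  await expect
    .poll(async () => (await request.get('/api/orders/latest')).status())
    .toBe(200);
});

test('idle banner shows after 30 minutes', async ({ page }) => {
  // Install the fake clock before navigation so app timers use it.
  await page.clock.install();
  await page.goto('/dashboard');

  // Jump ahead instead of waiting for real time to pass.
  await page.clock.fastForward(30 * 60 * 1000);
  await expect(page.getByText('Session expired')).toBeVisible();
});
```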

Isolation and parallelism

Each test gets a fresh BrowserContext (clean cookies/storage) for free. Keep that isolation; run fullyParallel: true.

Every test is independent and idempotent: it can run alone, in any order, repeated, and in parallel with others. No test depends on another having run first.
No shared mutable state across tests — no module-level counters, no "created in test A, used in test B". beforeAll may set up read-only fixtures only.
Never use test.describe.serial to share state or paper over ordering; reserve serial mode for genuinely sequential UI (a multi-step wizard within one journey) and know it disables parallelism for that block.
Each parallel worker must operate on its own data namespace. Derive uniqueness from test.info().parallelIndex or a UUID so workers never collide on the same record.
Do not depend on DB row order, auto-increment IDs, or "the first item" — another worker may have inserted rows.
Authentication: generate storageState once in a setup project and reuse it; do NOT log in through the UI in every test.

// playwright.config.ts
projects: [
  { name: 'setup', testMatch: /.*\.setup\.ts/ },
  { name: 'chromium', use: { ...devices['Desktop Chrome'], storageState: 'e2e/.auth/user.json' },
    dependencies: ['setup'] },
]
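
The setup project referenced in that config could look like the sketch below. The env variable names (E2E_USER_EMAIL, E2E_USER_PASSWORD), the /login route, and the form labels are assumptions; only the storageState path comes from the config above.

```typescript
// e2e/setup/auth.setup.ts
import { test as setup, expect } from '@playwright/test';

const authFile = 'e2e/.auth/user.json';

setup('authenticate', async ({ page }) => {
  // Hypothetical variable names; fail fast when secrets are missing.
  const email = process.env.E2E_USER_EMAIL;
  const password = process.env.E2E_USER_PASSWORD;
  if (!email || !password) throw new Error('E2E credentials are not set');

  await page.goto('/login');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Wait for a signed-in landmark so cookies exist before the state is saved.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  await page.context().storageState({ path: authFile });
});
```

This is the one place where logging in through the UI is expected; every other project starts from the saved state.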

Test data

Seed and clean up through the fast path (API/DB/fixtures), never by clicking through the UI to arrange state.

Arrange state via request (the built-in APIRequestContext) or a seed helper hitting your backend/test-only endpoint — reserve the browser for the behavior under test.
Make every record unique per test: email: `u+${crypto.randomUUID()}@test.dev`. Never reuse a fixed email/SKU/slug across tests.
Own the lifecycle in a fixture: create in setup, tear down in the fixture's cleanup phase (after use) so it runs even on failure.

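import { randomUUID } from 'node:crypto';
import { test as base } from '@playwright/test';
// createOrder, deleteOrder and Order are project helpers (for example from e2e/data).

export const test = base.extend<{ order: Order }>({
  order: async ({ request }, use) => {
    const order = await createOrder(request, { sku: `SKU-${randomUUID()}` });
    await use(order);
    await deleteOrder(request, order.id); // runs after the test body, even if the test fails
  },
});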

Prefer per-test creation over one big shared seed script — shared seed data recreates order-dependence and cross-test coupling.
Point tests at a dedicated, disposable test environment/database. Never run destructive E2E against production or a shared dev DB.
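
A factory in e2e/data/ is one way to keep records unique per test and per worker. The sketch below is illustrative: NewUser, buildUser, and workerScoped are hypothetical names, and the payload shape is an assumption about your API.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical payload shape; replace with your API's real contract.
export interface NewUser {
  email: string;
  name: string;
  role: 'member' | 'admin';
}

// Builds a valid payload with a unique email; callers override only
// the fields the test actually cares about.
export function buildUser(overrides: Partial<NewUser> = {}): NewUser {
  const id = randomUUID();
  return {
    email: `u+${id}@test.dev`,
    name: `Test User ${id.slice(0, 8)}`,
    role: 'member',
    ...overrides,
  };
}

// Prefixes a record name with the worker slot so parallel workers never
// share a namespace. Pass test.info().parallelIndex from a test or fixture.
export function workerScoped(name: string, parallelIndex: number): string {
  return `w${parallelIndex}-${name}-${randomUUID()}`;
}
```

A fixture would typically call buildUser({ role: 'admin' }), create the record through request, and delete it after use.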

Reliability and CI

Deterministic locally, resilient and diagnosable in CI.

Config baseline:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,           // fail if test.only is committed
  retries: process.env.CI ? 2 : 0,        // retries mask flake locally — keep them CI-only
  workers: process.env.CI ? '50%' : undefined,
  reporter: process.env.CI ? [['blob'], ['github']] : [['html']],
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  webServer: { command: 'pnpm start', url: 'http://localhost:3000', reuseExistingServer: !process.env.CI },
});

trace: 'on-first-retry' gives a full time-travel trace for every CI failure at near-zero steady-state cost. Open with npx playwright show-trace.
Shard across CI machines (--shard=1/4) with the blob reporter, then npx playwright merge-reports --reporter=html ./blob-report for one combined HTML report.
Mock ALL third-party/external services (payment gateways, email, analytics, maps, LLM APIs). Real third parties make tests slow, flaky, rate-limited, and non-deterministic.

await page.route('**/api.stripe.com/**', route =>
  route.fulfill({ status: 200, json: { id: 'pi_test', status: 'succeeded' } }));

Record/replay your OWN backend with page.routeFromHAR('api.har', { url: '**/api/**', update: false }) when a real backend is unavailable; regenerate HARs deliberately.
Retries reduce noise; they do not license ignoring flake. A test that only passes on retry is a defect — quarantine with a tracked test.fixme and fix the root cause. Never @ts-ignore or loosen an assertion to make red go green.
Fail the build on test.only (forbidOnly) and on unexpected console/pageerror in critical specs (assert clean console where it matters).
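
Asserting a clean console can be done with a small helper such as the sketch below. trackPageErrors is a hypothetical name, and the allowlist parameter is an assumption about how you would silence known noise.

```typescript
import type { Page } from '@playwright/test';

// Starts collecting uncaught exceptions and console errors from the page.
// Returns a function listing what was seen so far, minus allowlisted noise.
export function trackPageErrors(page: Page, ignore: RegExp[] = []): () => string[] {
  const seen: string[] = [];

  page.on('pageerror', (error) => {
    seen.push(error.message);
  });
  page.on('console', (message) => {
    if (message.type() === 'error') seen.push(message.text());
  });

  return () => seen.filter((text) => !ignore.some((pattern) => pattern.test(text)));
}
```

In a critical spec, call const errors = trackPageErrors(page) before the first navigation and finish with expect(errors()).toEqual([]).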

Page Object Model

Encapsulate page structure and interactions so specs read as user intent and selectors live in one place.

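import type { Locator, Page } from '@playwright/test';

export class CheckoutPage {
  readonly total: Locator;
  readonly placeOrder: Locator;

  // Locators are assigned in the constructor: field initializers that read a
  // parameter property can fail to compile when useDefineForClassFields is on.
  constructor(private readonly page: Page) {
    this.total = page.getByTestId('cart-total');
    this.placeOrder = page.getByRole('button', { name: 'Place order' });
  }

  async goto() { await this.page.goto('/checkout'); }
  // Card is the project's own type; it needs at least a number: string field.
  async pay(card: Card) {
    await this.page.getByLabel('Card number').fill(card.number);
    await this.placeOrder.click();
  }
}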

POMs expose Locator properties (lazy, re-evaluated per use) and action methods; they do NOT expose raw CSS strings.
Keep assertions in the spec, not buried in POM methods — the POM does, the test verifies. (A small number of self-checking helpers like expectLoaded() are fine.)
Inject POMs as fixtures so specs get them typed and ready: checkout: async ({ page }, use) => use(new CheckoutPage(page)).
No waitForTimeout, no test logic, no data seeding inside POMs — those belong in fixtures.
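
Injecting the Page Object as a fixture might look like this sketch of e2e/fixtures.ts. The Pages type name is an assumption; the import path follows the folder layout under Project conventions.

```typescript
// e2e/fixtures.ts
import { test as base, expect } from '@playwright/test';
import { CheckoutPage } from './pages/checkout.page';

// One entry per Page Object that specs may request by name.
type Pages = { checkout: CheckoutPage };

export const test = base.extend<Pages>({
  // A fresh CheckoutPage per test, bound to that test's isolated page.
  checkout: async ({ page }, use) => {
    await use(new CheckoutPage(page));
  },
});

// Re-export expect so specs import everything from this one module.
export { expect };
```

A spec then destructures checkout from the test arguments, calls await checkout.goto(), and asserts with await expect(checkout.total).toBeVisible().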

Testing

Framework: Playwright Test (@playwright/test). Structure with test, test.describe, test.step (label multi-action phases so traces/reports read as a narrative).
Author selectors with npx playwright codegen, but hand-clean generated code to role-based locators and web-first assertions before committing.
Debug with UI mode (npx playwright test --ui) and the Trace Viewer; use --last-failed and --only-changed to iterate fast; --repeat-each=20 to hunt flake.
Tag suites for selective runs: test('...', { tag: '@smoke' }, ...) then --grep @smoke / --grep-invert.
Visual regression via await expect(page).toHaveScreenshot() with maxDiffPixelRatio; mask dynamic regions (mask: [locator]) and disable animations. Commit baselines per platform; generate them in CI, not on a dev laptop.
Keep unit/integration/component tests as the bulk of coverage; E2E only asserts the assembled system.
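
As an illustration of test.step and tags together, here is a short sketch. The product list, the "Pro plan" item, and the button and heading names are hypothetical; ../fixtures is assumed to re-export test and expect.

```typescript
import { test, expect } from '../fixtures';

test('guest completes checkout', { tag: '@smoke' }, async ({ page }) => {
  // Each step shows up as a labelled group in the trace and the HTML report.
  await test.step('add item to cart', async () => {
    await page.goto('/products');
    await page
      .getByRole('listitem')
      .filter({ hasText: 'Pro plan' })
      .getByRole('button', { name: 'Add to cart' })
      .click();
  });

  await test.step('place the order', async () => {
    await page.goto('/checkout');
    await page.getByRole('button', { name: 'Place order' }).click();
  });

  // The journey's single success state.
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
```

Running npx playwright test --grep @smoke then selects only the tagged tests.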

Security

Never commit credentials. Read secrets from the environment (process.env), supplied by CI secrets in CI and by a git-ignored .env file locally. storageState JSON and *.har files can contain live tokens/PII: git-ignore e2e/.auth/ and scrub HARs before committing them.
Do not point authenticated, data-mutating tests at production. Use a dedicated test tenant/environment.
Test authorization explicitly: a low-privilege storageState must get 403/hidden UI on privileged actions — assert the negative, do not assume it.
Keep use: { ignoreHTTPSErrors: false } (the default) everywhere except controlled local setups with self-signed certificates; do not blanket-disable TLS verification.
When exercising untrusted content, keep default context isolation; never launch with --disable-web-security/--no-sandbox to make a test pass.
Rotate and scope test accounts to the test environment; never reuse real user credentials.
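
An explicit authorization check could be sketched as follows. The viewer.json state file, the /settings/members route, the "Invite member" button, and the /api/members endpoint are all hypothetical; the state file is assumed to come from a setup project like the main user state.

```typescript
import { randomUUID } from 'node:crypto';
import { test, expect } from '../fixtures';

// Hypothetical low-privilege state for this file only.
test.use({ storageState: 'e2e/.auth/viewer.json' });

test('viewer cannot invite members', async ({ page }) => {
  await page.goto('/settings/members');

  // The privileged control must not be offered to this role.
  await expect(page.getByRole('button', { name: 'Invite member' })).toBeHidden();

  // Hidden UI is not enough: the API must refuse the same action.
  // page.request shares cookies with the page, so it acts as the viewer.
  const response = await page.request.post('/api/members', {
    data: { email: `u+${randomUUID()}@test.dev` },
  });
  expect(response.status()).toBe(403);
});
```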

Do

Import test/expect from your fixtures.ts; build POMs and data as fixtures with automatic teardown.
Use role/label/text locators first, getByTestId as the deliberate fallback.
Assert exclusively with retrying web-first matchers (toBeVisible, toHaveText, toHaveURL, toHaveCount).
Seed and clean state via request/API with per-test unique data; log in once via a setup project + storageState.
Run fullyParallel, keep tests order-independent, mock every external service, enable trace: 'on-first-retry'.
Scope locators to a region, use test.step for readable traces, tag suites (@smoke) for fast subsets.

Avoid

page.waitForTimeout(...) / sleep → replace with a web-first assertion on the awaited state.
Brittle selectors (.locator('.btn'), nth-child, XPath) → getByRole/getByLabel/getByTestId.
expect(await locator.textContent()).toBe(...) (reads once, flakes) → await expect(locator).toHaveText(...).
.click({ force: true }) and manual waitForSelector → fix the selector / precondition and let auto-wait handle it.
Deprecated/legacy APIs: element handles page.$/page.$$, Locator.ariaRef() (removed 1.60), context videosPath/videoSize (removed 1.60) → locators + config use.video.
Logging in through the UI in every test → storageState from a setup project.
test.describe.serial to share state, cross-test data reuse, relying on ID/order → independent, uniquely-seeded tests.
Hitting real Stripe/email/analytics/LLM endpoints → page.route / routeFromHAR mocks.
E2E-ing validation/edge cases/every branch → push down to unit/component tests.
Retrying or loosening assertions to hide flake → root-cause it or test.fixme with a tracked issue.

When you code

Ship small diffs: one journey per spec, one concern per PR. Do not add fixtures/POMs "for later" — add them when a second test needs them.
Before proposing a test, confirm it belongs at E2E level (critical journey, needs the full stack). If a unit/component/API test covers it faster, write that instead and say so.
After writing, run npx playwright test --only-changed locally and --repeat-each=5 on new specs to catch flake before it reaches CI; run tsc --noEmit, ESLint, and Prettier.
Always attach a failing trace when reporting a failure; never mark a test green by weakening its assertion.
Ask before: adding a test-only backend endpoint or seed hook, introducing visual-regression baselines, changing global config (retries, workers, testIdAttribute), or pointing tests at a new environment.
If a requested behavior needs real third-party calls, flag it and propose a mock/contract-test boundary instead of wiring the live service.