Workflow · Playwright 1.61 · TypeScript 6.0 · Node.js 24 LTS
E2E Testing (Playwright)
User-facing selectors, auto-waiting, isolated, no flakiness.
Updated 5 Jul 2026 · CC0
AGENTS.md (repo root)
You write end-to-end tests with Playwright Test. Good here means a small suite of high-value user-journey tests that are deterministic, isolated, parallel-safe, and readable: they fail only when the product is actually broken, and the trace tells you why in one click.
Stack
- `@playwright/test` 1.61.x (the test runner; never the bare `playwright` library for tests). Browsers are pinned to the runner version; install with `npx playwright install --with-deps`.
- Node.js 24 LTS (22 maintenance still supported). Run tests on the same major in CI and locally.
- TypeScript 6.0 (strict). Type test files; do not ship `.js` specs.
- ESLint 10 (flat config is the only supported format; eslintrc is gone) + `eslint-plugin-playwright` 2.10 (`playwright.configs['flat/recommended']`) + Prettier 3.
- Config lives in `playwright.config.ts` via `defineConfig` + `devices`. Use `expect`, `test`, `Locator`, `Page`, `APIRequestContext` from `@playwright/test`; no third-party assertion libs.
- Reporters: `html` (local), `blob` (sharded CI, merged with `npx playwright merge-reports`), `github` or `list` in CI logs.
Project conventions
e2e/
fixtures.ts # test.extend — the ONLY import specs use for test/expect
pages/ # Page Objects: login.page.ts, checkout.page.ts
specs/ # *.spec.ts, one journey per file
setup/auth.setup.ts # storageState generation (setup project)
data/ # factories/builders for seed payloads
playwright.config.ts
- Spec files: `*.spec.ts`, named by journey (`checkout.spec.ts`), not by page.
- Every spec imports from `../fixtures`, never directly from `@playwright/test`, so fixtures/POMs are always in scope.
- Page Objects end in `.page.ts` and expose `Locator`s + intent methods; setup files end in `.setup.ts`.
- `baseURL` set in config → use root-relative paths (`page.goto('/checkout')`), never hardcoded hosts.
- Prettier formats; ESLint enforces `no-wait-for-timeout`, `no-force-option`, `no-conditional-in-test`, `expect-expect`, `no-focused-test`, `no-skipped-test` (warn), `valid-expect`.
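The lint rules listed above could be wired up roughly as follows. This is a config sketch under stated assumptions, not the project's actual file: it assumes ESLint is set up to load a TypeScript config file (an `.mjs` file with the same content would work too), it omits the TypeScript parser setup, and the `e2e/**/*.ts` glob is taken from the folder layout above.

```typescript
// eslint.config.ts (assumed file name; parser setup for TypeScript is omitted)
import playwright from 'eslint-plugin-playwright';

export default [
  {
    // Start from the plugin's recommended flat config, scoped to the E2E folder.
    ...playwright.configs['flat/recommended'],
    files: ['e2e/**/*.ts'],
    rules: {
      ...playwright.configs['flat/recommended'].rules,
      // Severities mirror the conventions above: everything is an error
      // except skipped tests, which only warn.
      'playwright/no-wait-for-timeout': 'error',
      'playwright/no-force-option': 'error',
      'playwright/no-conditional-in-test': 'error',
      'playwright/expect-expect': 'error',
      'playwright/no-focused-test': 'error',
      'playwright/no-skipped-test': 'warn',
      'playwright/valid-expect': 'error',
    },
  },
];
```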
What to E2E
E2E is the top of the pyramid: few, expensive, high-signal. Cover critical revenue/trust journeys only — the flows that lose money or users if broken.
- Test: signup/login, checkout/payment, core CRUD of the primary entity, permission boundaries (user A cannot see B's data), and one happy path per critical feature.
- Do NOT E2E: field-level validation, every error message, every branch, formatting, computed logic. Push those down to unit/integration/component tests where they run in milliseconds.
- One assertion focus per test: a journey has a clear success state. If a spec needs 8 `expect`s across 5 pages, it is probably two tests or belongs lower in the pyramid.
- Prefer component tests (`@playwright/experimental-ct-*`) or API tests for anything that does not require the full stack rendered in a real browser.
- Target: the whole E2E suite runs in single-digit minutes on CI. If it does not, you are over-E2E-ing.
Selectors
Resolve elements the way a user or assistive tech does. Selectors must survive refactors of DOM structure, class names, and styling.
Priority order:
1. `getByRole('button', { name: 'Place order' })`: role + accessible name. Default choice; also asserts accessibility.
2. `getByLabel('Email')`, `getByPlaceholder`, `getByText`, `getByAltText`, `getByTitle` for the remainder.
3. `getByTestId('cart-total')` when there is no stable user-facing text (add `data-testid` in app code; configure via `use: { testIdAttribute: 'data-testid' }`).
- NEVER CSS/XPath tied to structure or styling: `page.locator('.btn-primary')`, `div > span:nth-child(3)`, `//button[2]`. These break on cosmetic changes and read as noise.
- Chain and filter instead of clever selectors: `getByRole('listitem').filter({ hasText: 'Pro plan' }).getByRole('button', { name: 'Remove' })`.
- Scope to a region first: `const row = page.getByRole('row', { name: 'INV-042' }); await row.getByRole('button', { name: 'Pay' }).click();`.
- Expect one match: a strict-mode violation (multiple matches) is a real bug in your selector, not something to paper over with `.first()`. Use `.first()`/`.nth()` only for genuinely repeated UI.
- Assert accessibility structure with `await expect(page.getByRole('main')).toMatchAriaSnapshot(...)` (YAML snapshot) instead of brittle DOM checks.
Stability = zero flakiness
Playwright auto-waits for actionability (attached, visible, stable, enabled, receives events) before every action, and web-first assertions retry until they pass or time out. Lean entirely on that.
- Assert with retrying matchers: `await expect(locator).toBeVisible()`, `.toHaveText()`, `.toHaveValue()`, `.toHaveCount()`, `.toBeEnabled()`, and, on the page, `await expect(page).toHaveURL()`. They poll; no manual waiting.
- NEVER `await page.waitForTimeout(...)` / `sleep`. A fixed sleep is either a hidden race or wasted time; it is banned by lint. Replace with an assertion on the state you were waiting for.
- Assert on observable state, never on timing. Wait for the success toast/heading/URL, not "3 seconds after clicking".
- Never use `.click({ force: true })` to defeat actionability checks: a non-actionable element is a bug or a wrong selector. Fix the selector or wait for the real precondition.
- Do not assert on non-retrying values mid-flight: `expect(await locator.textContent()).toBe(...)` reads once and flakes. Use `await expect(locator).toHaveText(...)`.
- For non-DOM conditions use `expect.poll(async () => ...)` or `await expect(async () => { ... }).toPass()`, never a while-loop with sleeps.
- Control time with `page.clock` (`install` → `fastForward`/`setFixedTime`) instead of waiting for real timers/animations/polling intervals.
- Wait for network you triggered via the response you care about: `const resp = page.waitForResponse('**/api/orders'); await placeOrder(); await resp;`, but prefer asserting the resulting UI.
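Taken together, the stability rules above might look like this in a spec. It is a sketch, not a test from a real suite: the heading text, the URL pattern, and the `/api/orders/latest` endpoint are hypothetical, and it imports from `@playwright/test` only to stay self-contained (a project spec would import from `../fixtures`).

```typescript
import { test, expect } from '@playwright/test';

test('order confirmation appears after checkout', async ({ page, request }) => {
  await page.goto('/checkout');

  // Start waiting before the action that triggers the request, then await it afterwards.
  const orderResponse = page.waitForResponse('**/api/orders');
  await page.getByRole('button', { name: 'Place order' }).click();
  await orderResponse;

  // Web-first assertions retry until the state is reached or the timeout expires.
  await expect(page).toHaveURL(/\/orders\/.+/);
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();

  // Non-DOM condition: poll a (hypothetical) backend endpoint instead of sleeping.
  await expect
    .poll(async () => (await request.get('/api/orders/latest')).status(), { timeout: 10_000 })
    .toBe(200);
});
```

Nothing in the test reads a value once and compares it, and there is no fixed delay: every wait is tied to a state the user or the system can observe.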
Isolation and parallelism
Each test gets a fresh BrowserContext (clean cookies/storage) for free. Keep that isolation; run fullyParallel: true.
- Every test is independent and idempotent: it can run alone, in any order, repeated, and in parallel with others. No test depends on another having run first.
- No shared mutable state across tests: no module-level counters, no "created in test A, used in test B". `beforeAll` may set up read-only fixtures only.
- Never use `test.describe.serial` to share state or paper over ordering; reserve serial mode for genuinely sequential UI (a multi-step wizard within one journey) and know it disables parallelism for that block.
- Each parallel worker must operate on its own data namespace. Derive uniqueness from `test.info().parallelIndex` or a UUID so workers never collide on the same record.
- Do not depend on DB row order, auto-increment IDs, or "the first item": another worker may have inserted rows.
- Authentication: generate `storageState` once in a `setup` project and reuse it; do NOT log in through the UI in every test.
// playwright.config.ts
projects: [
{ name: 'setup', testMatch: /.*\.setup\.ts/ },
{ name: 'chromium', use: { ...devices['Desktop Chrome'], storageState: 'e2e/.auth/user.json' },
dependencies: ['setup'] },
]
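The `setup` project in that config needs a setup file that writes the storage state. A minimal sketch follows, assuming a login form with `Email` and `Password` labels, a `Sign in` button, and a `Dashboard` heading after login; those names and the two environment variables are hypothetical. It imports from `@playwright/test` directly to stay self-contained.

```typescript
// e2e/setup/auth.setup.ts
import { test as setup, expect } from '@playwright/test';

// Must match the storageState path used by the browser projects in the config.
const authFile = 'e2e/.auth/user.json';

setup('authenticate', async ({ page }) => {
  // Hypothetical variable names; secrets come from the environment, never from the repo.
  const email = process.env.E2E_USER_EMAIL;
  const password = process.env.E2E_USER_PASSWORD;
  if (!email || !password) {
    throw new Error('E2E_USER_EMAIL and E2E_USER_PASSWORD must be set');
  }

  // Labels, button name, and heading are assumptions about the login page.
  await page.goto('/login');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Wait for a signed-in landmark before saving, so the session cookies exist.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();

  await page.context().storageState({ path: authFile });
});
```

Every test in the dependent projects then starts already signed in, and the UI login runs once per test run instead of once per test.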
Test data
Seed and clean up through the fast path (API/DB/fixtures), never by clicking through the UI to arrange state.
- Arrange state via `request` (the built-in `APIRequestContext`) or a seed helper hitting your backend/test-only endpoint; reserve the browser for the behavior under test.
- Make every record unique per test: ``email: `u+${crypto.randomUUID()}@test.dev` ``. Never reuse a fixed email/SKU/slug across tests.
- Own the lifecycle in a fixture: create in setup, tear down in the fixture's cleanup phase (after `use`) so it runs even on failure.
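import { test as base } from '@playwright/test';
import { randomUUID } from 'node:crypto';
import { createOrder, deleteOrder, type Order } from './data/orders'; // hypothetical seed helpers

export const test = base.extend<{ order: Order }>({
  order: async ({ request }, use) => {
    // Arrange through the API, with a SKU no other test can collide on.
    const order = await createOrder(request, { sku: `SKU-${randomUUID()}` });
    await use(order);
    await deleteOrder(request, order.id); // runs even if the test fails
  },
});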
- Prefer per-test creation over one big shared seed script — shared seed data recreates order-dependence and cross-test coupling.
- Point tests at a dedicated, disposable test environment/database. Never run destructive E2E against production or a shared dev DB.
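The uniqueness rules above can be captured in a small factory. This is a minimal sketch rather than the project's actual helper: `buildUser`, `TestUser`, and the `w<index>` prefix are hypothetical names, and `parallelIndex` is assumed to be passed in from `test.info().parallelIndex`.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical shape of a seed payload; adapt it to your backend's contract.
export interface TestUser {
  email: string;
  name: string;
  role: 'member' | 'admin';
}

// Builds a user payload that is unique per call and namespaced per worker,
// so parallel workers never collide on the same record.
export function buildUser(
  parallelIndex: number,
  overrides: Partial<TestUser> = {},
): TestUser {
  const id = randomUUID();
  return {
    email: `w${parallelIndex}-${id}@test.dev`,
    name: `Test User ${id.slice(0, 8)}`,
    role: 'member',
    ...overrides, // explicit overrides win over the generated defaults
  };
}
```

Inside a fixture this would be called as `buildUser(test.info().parallelIndex, { role: 'admin' })` and the result posted through `request`. The worker prefix makes leftover records easy to trace back to the worker that created them.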
Reliability and CI
Deterministic locally, resilient and diagnosable in CI.
- Config baseline:
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI, // fail if test.only is committed
  retries: process.env.CI ? 2 : 0, // retries mask flake locally; keep them CI-only
  workers: process.env.CI ? '50%' : undefined,
  reporter: process.env.CI ? [['blob'], ['github']] : [['html']],
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  webServer: { command: 'pnpm start', url: 'http://localhost:3000', reuseExistingServer: !process.env.CI },
});
- `trace: 'on-first-retry'` gives a full time-travel trace for every CI failure at near-zero steady-state cost. Open with `npx playwright show-trace`.
- Shard across CI machines (`--shard=1/4`) with the `blob` reporter, then `npx playwright merge-reports --reporter=html ./blob-report` for one combined HTML report.
- Mock ALL third-party/external services (payment gateways, email, analytics, maps, LLM APIs). Real third parties make tests slow, flaky, rate-limited, and non-deterministic.
await page.route('**/api.stripe.com/**', route =>
route.fulfill({ status: 200, json: { id: 'pi_test', status: 'succeeded' } }));
- Record/replay your OWN backend with `page.routeFromHAR('api.har', { url: '**/api/**', update: false })` when a real backend is unavailable; regenerate HARs deliberately.
- Retries reduce noise; they do not license ignoring flake. A test that only passes on retry is a defect: quarantine with a tracked `test.fixme` and fix the root cause. Never `@ts-ignore` or loosen an assertion to make red go green.
- Fail the build on `test.only` (`forbidOnly`) and on unexpected `console`/`pageerror` in critical specs (assert clean console where it matters).
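One way to assert a clean console in a critical spec is a small collector attached before navigation. The sketch below is an assumption-laden illustration: `trackPageErrors` is a hypothetical helper, and it is typed against a locally declared subset of Playwright's `Page` event API so that it stands alone. A real Playwright `page`, whose `console` and `pageerror` events carry a message with `type()`/`text()` and an `Error` respectively, should satisfy that shape.

```typescript
// Structural subset of Playwright's Page event API, declared locally so the
// sketch type-checks on its own.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
interface PageEventsLike {
  on(event: 'console', listener: (msg: ConsoleMessageLike) => void): unknown;
  on(event: 'pageerror', listener: (error: Error) => void): unknown;
}

// Starts recording console errors and uncaught page errors.
// Call it before navigating, then assert on the returned array at the end of the test.
export function trackPageErrors(page: PageEventsLike): string[] {
  const problems: string[] = [];
  page.on('console', (msg) => {
    // Only console.error output counts; logs and warnings are ignored.
    if (msg.type() === 'error') problems.push(`console: ${msg.text()}`);
  });
  page.on('pageerror', (error) => {
    problems.push(`pageerror: ${error.message}`);
  });
  return problems;
}
```

In a spec this would read `const problems = trackPageErrors(page);` at the top and `expect(problems).toEqual([]);` at the end, which fails the test with the offending messages in the diff.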
Page Object Model
Encapsulate page structure and interactions so specs read as user intent and selectors live in one place.
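import type { Locator, Page } from '@playwright/test';

type Card = { number: string }; // hypothetical card shape; extend as needed

export class CheckoutPage {
  readonly total: Locator;
  readonly placeOrder: Locator;
  // Assigned in the constructor: field initializers that read a parameter
  // property can fail to compile under `useDefineForClassFields`.
  constructor(private readonly page: Page) {
    this.total = page.getByTestId('cart-total');
    this.placeOrder = page.getByRole('button', { name: 'Place order' });
  }
  async goto() { await this.page.goto('/checkout'); }
  async pay(card: Card) {
    await this.page.getByLabel('Card number').fill(card.number);
    await this.placeOrder.click();
  }
}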
- POMs expose `Locator` properties (lazy, re-evaluated per use) and action methods; they do NOT expose raw CSS strings.
- Keep assertions in the spec, not buried in POM methods: the POM does, the test verifies. (A small number of self-checking helpers like `expectLoaded()` are fine.)
- Inject POMs as fixtures so specs get them typed and ready: `checkout: async ({ page }, use) => use(new CheckoutPage(page))`.
- No `waitForTimeout`, no test logic, no data seeding inside POMs; those belong in fixtures.
Testing
- Framework: Playwright Test (`@playwright/test`). Structure with `test`, `test.describe`, `test.step` (label multi-action phases so traces/reports read as a narrative).
- Author selectors with `npx playwright codegen`, but hand-clean generated code to role-based locators and web-first assertions before committing.
- Debug with UI mode (`npx playwright test --ui`) and the Trace Viewer; use `--last-failed` and `--only-changed` to iterate fast; `--repeat-each=20` to hunt flake.
- Tag suites for selective runs: `test('...', { tag: '@smoke' }, ...)` then `--grep @smoke` / `--grep-invert`.
- Visual regression via `await expect(page).toHaveScreenshot()` with `maxDiffPixelRatio`; mask dynamic regions (`mask: [locator]`) and disable animations. Commit baselines per platform; generate them in CI, not on a dev laptop.
- Keep unit/integration/component tests as the bulk of coverage; E2E only asserts the assembled system.
Security
- Never commit credentials. Read secrets from env (`process.env`), supplied by CI secrets in CI and by a git-ignored `.env` file locally. `storageState` JSON and `*.har` can contain live tokens/PII: git-ignore `e2e/.auth/` and scrub HARs.
- Do not point authenticated, data-mutating tests at production. Use a dedicated test tenant/environment.
- Test authorization explicitly: a low-privilege `storageState` must get 403/hidden UI on privileged actions. Assert the negative; do not assume it.
- Set `use: { ignoreHTTPSErrors: false }` outside of controlled local self-signed setups; do not blanket-disable TLS verification.
- When exercising untrusted content, keep default context isolation; never launch with `--disable-web-security`/`--no-sandbox` to make a test pass.
- Rotate and scope test accounts to the test environment; never reuse real user credentials.
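Scrubbing a recorded HAR before committing it could be done with a small pure function. This is a sketch under stated assumptions: `scrubHar` is a hypothetical helper, the interfaces cover only the part of the HAR structure it touches, and the list of sensitive header names is an assumed starting point, not a complete one.

```typescript
// Minimal subset of the HAR structure that this sketch touches.
interface HarHeader { name: string; value: string }
interface HarMessage { headers: HarHeader[]; cookies?: unknown[] }
interface HarEntry { request: HarMessage; response: HarMessage }
interface Har { log: { entries: HarEntry[] } }

// Header names treated as sensitive; an assumed list, extend it for your app.
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'set-cookie', 'x-api-key']);

// Redacts sensitive header values and drops parsed cookies from one message.
function scrubMessage(message: HarMessage): HarMessage {
  return {
    ...message,
    headers: message.headers.map((header) =>
      SENSITIVE_HEADERS.has(header.name.toLowerCase())
        ? { ...header, value: 'REDACTED' }
        : header,
    ),
    cookies: [],
  };
}

// Returns a copy of the HAR with credentials redacted; the input is not mutated.
export function scrubHar(har: Har): Har {
  return {
    ...har,
    log: {
      ...har.log,
      entries: har.log.entries.map((entry) => ({
        ...entry,
        request: scrubMessage(entry.request),
        response: scrubMessage(entry.response),
      })),
    },
  };
}
```

Request and response bodies can also carry tokens or PII and are not handled here. Redacting `Set-Cookie` changes what a replay sets in the browser, so the sketch assumes tests do not depend on recorded cookies.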
Do
- Import `test`/`expect` from your `fixtures.ts`; build POMs and data as fixtures with automatic teardown.
- Use role/label/text locators first, `getByTestId` as the deliberate fallback.
- Assert exclusively with retrying web-first matchers (`toBeVisible`, `toHaveText`, `toHaveURL`, `toHaveCount`).
- Seed and clean state via `request`/API with per-test unique data; log in once via a `setup` project + `storageState`.
- Run `fullyParallel`, keep tests order-independent, mock every external service, enable `trace: 'on-first-retry'`.
- Scope locators to a region, use `test.step` for readable traces, tag suites (`@smoke`) for fast subsets.
Avoid
- `page.waitForTimeout(...)` / `sleep` → replace with a web-first assertion on the awaited state.
- Brittle selectors (`.locator('.btn')`, `nth-child`, XPath) → `getByRole`/`getByLabel`/`getByTestId`.
- `expect(await locator.textContent()).toBe(...)` (reads once, flakes) → `await expect(locator).toHaveText(...)`.
- `.click({ force: true })` and manual `waitForSelector` → fix the selector / precondition and let auto-wait handle it.
- Deprecated/legacy APIs: element handles `page.$`/`page.$$`, `Locator.ariaRef()` (removed 1.60), context `videosPath`/`videoSize` (removed 1.60) → locators + config `use.video`.
- Logging in through the UI in every test → `storageState` from a setup project.
- `test.describe.serial` to share state, cross-test data reuse, relying on ID/order → independent, uniquely-seeded tests.
- Hitting real Stripe/email/analytics/LLM endpoints → `page.route`/`routeFromHAR` mocks.
- E2E-ing validation/edge cases/every branch → push down to unit/component tests.
- Retrying or loosening assertions to hide flake → root-cause it or `test.fixme` with a tracked issue.
When you code
- Ship small diffs: one journey per spec, one concern per PR. Do not add fixtures/POMs "for later" — add them when a second test needs them.
- Before proposing a test, confirm it belongs at E2E level (critical journey, needs the full stack). If a unit/component/API test covers it faster, write that instead and say so.
- After writing, run `npx playwright test --only-changed` locally and `--repeat-each=5` on new specs to catch flake before it reaches CI; run `tsc --noEmit`, ESLint, and Prettier.
- Always attach a failing trace when reporting a failure; never mark a test green by weakening its assertion.
- Ask before: adding a test-only backend endpoint or seed hook, introducing visual-regression baselines, changing global config (`retries`, `workers`, `testIdAttribute`), or pointing tests at a new environment.
- If a requested behavior needs real third-party calls, flag it and propose a mock/contract-test boundary instead of wiring the live service.
Drop it in your repo
Save these rules as AGENTS.md, CLAUDE.md, .cursorrules, .windsurfrules or .github/copilot-instructions.md — your agent instantly codes to the same standard on Playwright 1.61 · TypeScript 6.0 · Node.js 24 LTS.