Playwright: E2E Testing and Agent Verification
AI coding tools ship code fast. Without an external check, "done" just means the model said so. Steve Kinney's Playwright course on Master.dev frames the fix clearly: build verification infrastructure — lint and types for static proof, Playwright for behavioral proof. This post distills what I took from that course into one reference page.
Where Playwright sits in the pyramid
Unit tests (Jest) and component tests (React Testing Library) check building blocks in isolation. Playwright sits at the top of the testing pyramid: fewer tests, slower runs, highest confidence.
      /\
     /  \        E2E: Playwright
    /----\
   /      \      Integration: RTL + API mocks
  /--------\
 /          \    Unit: Jest
/------------\
The more your tests resemble the way your software is used, the more confidence they can give you.
Playwright does not replace Jest or RTL. It closes the gap they leave open: wrong redirects after login, API contract drift in production builds, layout regressions, and third-party auth mis-wiring.
What you get: real browsers (Chromium, Firefox, WebKit), auto-waiting locators, traces and screenshots on failure, and network interception for deterministic runs.
Getting started
npm init playwright@latest # or add to an existing project
npx playwright install # browser binaries
npx playwright test
npx playwright test --ui # visual runner — best way to learn
npx playwright codegen http://localhost:3000
Key files: playwright.config.ts, tests/, and npm scripts. Most configs use webServer to start the dev server before tests and reuseExistingServer locally so you are not spawning duplicate processes.
A minimal test follows the same rhythm as RTL: navigate, interact, assert.
import { test, expect } from '@playwright/test';

test('sign in shows welcome message', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Welcome')).toBeVisible();
});
The accessibility tree (and why it matters)
Playwright queries the browser's accessibility tree — the same structure assistive technology uses. Tests find elements by role, label, and name, not arbitrary CSS.
Implication: semantic HTML (button, label, heading) produces stable tests. div with an onClick handler produces brittle ones.
Locators auto-wait until an element is actionable (visible, stable, enabled). In most cases you can skip manual sleep() calls.
Locator priority
Same philosophy as React Testing Library — stay user-facing:
- getByRole: buttons, links, headings, checkboxes, textboxes
- getByLabel: form fields
- getByPlaceholder
- getByText
- getByAltText / getByTitle
- getByTestId: last resort
await page.getByRole('listitem').filter({ hasText: '1984' }).click();
await page.getByRole('navigation').getByRole('link', { name: 'Home' }).click();
Stay high in the hierarchy. getByRole('listitem') beats chaining CSS into nested divs. When duplicate text appears on the page, narrow with .filter() or scope to a parent.
Codegen records clicks and typing into test code — useful as a starting point, but always edit the output. Generated selectors are often brittle.
Configuring projects and dev servers
playwright.config.ts is where you wire the environment:
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
  webServer: {
    command: 'npm run dev',
    url: 'http://localhost:3000',
    reuseExistingServer: !process.env.CI,
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});
Projects let you run subsets — different browsers, viewports, or auth profiles. Make sure baseURL and webServer.url match.
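One way to keep baseURL and webServer.url from drifting apart is to derive both from a single constant. The fragment below is a sketch under assumptions: the PORT environment variable and the extra firefox and mobile-safari projects are illustrative additions, not part of the course config.

```typescript
import { defineConfig, devices } from '@playwright/test';

// Assumption: the dev server honors PORT; fall back to 3000 as in the config above.
const port = process.env.PORT ?? '3000';
const baseURL = `http://localhost:${port}`;

export default defineConfig({
  testDir: './tests',
  use: { baseURL },
  webServer: {
    command: 'npm run dev',
    url: baseURL, // same constant, so the two values cannot disagree
    reuseExistingServer: !process.env.CI,
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'mobile-safari', use: { ...devices['iPhone 13'] } },
  ],
});
```

A single project can then be selected with npx playwright test --project=firefox.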
Authentication without logging in every test
Logging in before every test is slow and brittle. Playwright's answer is storageState: serialize cookies and localStorage once, reuse everywhere.
// tests/auth.setup.ts
import { test as setup, expect } from '@playwright/test';

const authFile = 'playwright/.auth/user.json';

setup('authenticate', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('test@example.com');
  await page.getByLabel('Password').fill('secret');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.waitForURL('/dashboard');
  await page.context().storageState({ path: authFile });
});
// playwright.config.ts (the setup project runs first)
projects: [
  { name: 'setup', testMatch: /.*\.setup\.ts/ },
  {
    name: 'chromium',
    use: { storageState: 'playwright/.auth/user.json' },
    dependencies: ['setup'],
  },
],
Add playwright/.auth to .gitignore — those files contain session tokens.
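The saved storage state is plain JSON, so a stale session can be spotted before the suite runs instead of surfacing as confusing login redirects. The helper below is a minimal sketch: `expiredCookies` and the sample data are hypothetical, and it assumes the cookie shape Playwright writes (an `expires` field in Unix seconds, with -1 marking a session cookie).

```typescript
// Subset of the storageState JSON shape; only the fields this check needs.
interface StoredCookie {
  name: string;
  expires: number; // Unix seconds; -1 means a session cookie
}

interface StorageStateFile {
  cookies: StoredCookie[];
}

// Return the names of cookies that have already expired at `nowSeconds`.
function expiredCookies(state: StorageStateFile, nowSeconds: number): string[] {
  return state.cookies
    .filter((c) => c.expires !== -1 && c.expires <= nowSeconds)
    .map((c) => c.name);
}

// Hypothetical sample standing in for playwright/.auth/user.json
const sampleState: StorageStateFile = {
  cookies: [
    { name: 'session', expires: 1700000000 },
    { name: 'theme', expires: -1 },
    { name: 'refresh', expires: 1900000000 },
  ],
};

console.log(expiredCookies(sampleState, 1800000000)); // [ 'session' ]
```

In a real setup the state would come from parsing the auth file and the clock from Date.now() / 1000; a non-empty result is a signal to re-run the setup project.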
OAuth and Google login are painful in CI. Prefer test accounts, API-based auth, or a test-environment bypass. Wait for auth to finish with waitForURL or an assertion on a post-login element, because cookies are often set across several redirects.
For tests that mutate shared server state in parallel, use one account per worker. For read-only suites, a single shared account is fine.
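The one-account-per-worker rule can be sketched as a small mapping from worker index to a dedicated account and auth file. In Playwright the index is available as `test.info().parallelIndex`; the helper name `accountForWorker`, the email addresses, and the `worker-N.json` naming are assumptions made for illustration.

```typescript
interface WorkerAccount {
  email: string;
  authFile: string;
}

// Map a worker's parallel index to its own account and storage-state file.
// Fails loudly when there are fewer accounts than workers, since silently
// sharing an account would reintroduce the cross-test interference.
function accountForWorker(parallelIndex: number, emails: string[]): WorkerAccount {
  if (parallelIndex < 0 || parallelIndex >= emails.length) {
    throw new Error(
      `worker ${parallelIndex} has no account; ${emails.length} configured`,
    );
  }
  return {
    email: emails[parallelIndex],
    authFile: `playwright/.auth/worker-${parallelIndex}.json`,
  };
}

// Hypothetical pool of pre-provisioned test accounts
const testEmails = ['worker0@example.com', 'worker1@example.com'];

console.log(accountForWorker(1, testEmails).authFile);
// playwright/.auth/worker-1.json
```

A worker-scoped fixture could call this once per worker, log in with the returned email, and save the session to the returned path.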
Per-test override when you need a different role:
test.use({ storageState: 'playwright/.auth/admin.json' });
Network isolation: HAR files and route mocking
External APIs are slow, flaky, and non-deterministic. Playwright offers two main strategies.
HAR record and playback
A HAR (HTTP Archive) is a snapshot of network traffic. Record once, then replay the same responses locally and in CI; re-record when the API changes.
// Record (run once, then commit the HAR)
await page.routeFromHAR('hars/search-books.har', {
  url: '**/api/**',
  update: true,
});

// Replay (CI and local)
await page.routeFromHAR('hars/search-books.har', {
  url: '**/api/**',
  update: false,
});
CLI alternative:
npx playwright open --save-har=example.har --save-har-glob="**/api/**" https://example.com
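Since a HAR is JSON in the HTTP Archive layout (log.entries, each with a request and a response), it can be summarized before it is committed, which makes it easier to review what a replay will actually serve. The sketch below is illustrative: `listRecordedEndpoints` and the sample entries are hypothetical, and only the few fields used here are typed.

```typescript
// Minimal slice of the HAR structure needed for a summary.
interface HarEntry {
  request: { method: string; url: string };
  response: { status: number };
}

interface HarFile {
  log: { entries: HarEntry[] };
}

// Collapse entries into unique "METHOD path status" lines, ignoring query strings.
function listRecordedEndpoints(har: HarFile): string[] {
  const seen = new Set<string>();
  for (const { request, response } of har.log.entries) {
    const { pathname } = new URL(request.url);
    seen.add(`${request.method} ${pathname} ${response.status}`);
  }
  return Array.from(seen).sort();
}

// Hypothetical sample standing in for hars/search-books.har
const sampleHar: HarFile = {
  log: {
    entries: [
      { request: { method: 'GET', url: 'https://example.com/api/books?q=1984' }, response: { status: 200 } },
      { request: { method: 'GET', url: 'https://example.com/api/books?q=dune' }, response: { status: 200 } },
      { request: { method: 'POST', url: 'https://example.com/api/cart' }, response: { status: 201 } },
    ],
  },
};

console.log(listRecordedEndpoints(sampleHar));
// [ 'GET /api/books 200', 'POST /api/cart 201' ]
```

An endpoint missing from this list is one the replay cannot answer, which is worth knowing before CI finds out.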
Route interception
For fine-grained control — especially error states a HAR cannot represent:
// Full mock: no network call
await page.route('**/api/v1/fruits', async (route) => {
  await route.fulfill({ json: [{ name: 'Strawberry', id: 21 }] });
});

// Fetch the real response, then patch the body
await page.route('**/api/v1/fruits', async (route) => {
  const response = await route.fetch();
  const json = await response.json();
  json.push({ name: 'Loquat', id: 100 });
  await route.fulfill({ response, json });
});

// Simulate failure: fulfill is called on the route passed to the handler
await page.route('**/api/v1/fruits', async (route) => {
  await route.fulfill({ status: 404, body: 'Not found' });
});
| Approach | Best for |
|---|---|
| HAR | Many endpoints, realistic traffic snapshot |
| route.fulfill | Single endpoint, error states, speed |
| route.fetch + patch | Real headers with tweaked JSON body |
Visual regression, traces, and debugging
Screenshots
await expect(page).toHaveScreenshot('homepage.png');
await expect(page.getByRole('main')).toHaveScreenshot('main-panel.png');
Update baselines after intentional UI changes: npx playwright test --update-snapshots.
toHaveScreenshot compares the rendered page against a stored baseline image, not against Figma files. Use component tests for design-system checks and E2E screenshots for integrated pages.
Traces
Traces are the highest-value artifact when a test fails — for you and for an AI agent.
npx playwright show-trace path/to/trace.zip
# or upload to trace.playwright.dev
A trace bundles DOM snapshots, network log, console output, a screenshot filmstrip, and the source line for each action. On CI, trace: 'on-first-retry' keeps overhead low while capturing failures.
What to feed an agent on failure: test name, assertion error, trace path, and the relevant spec file — not the entire monorepo.
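The four items above fit in a small, fixed-shape prompt. The function below is a sketch of that idea; `FailureContext`, `buildAgentPrompt`, the sample values, and the closing instruction text are all hypothetical and would be adapted to whatever agent is in use.

```typescript
// The four pieces of context named above, and nothing else.
interface FailureContext {
  testName: string;
  assertionError: string;
  tracePath: string;
  specFile: string;
}

// Assemble a compact prompt so the agent sees the failure, not the monorepo.
function buildAgentPrompt(failure: FailureContext): string {
  return [
    `Failing test: ${failure.testName}`,
    `Spec file: ${failure.specFile}`,
    `Trace: ${failure.tracePath}`,
    'Assertion error:',
    failure.assertionError.trim(),
    '',
    'Make this test pass without editing unrelated files.',
  ].join('\n');
}

// Hypothetical failure used only to show the output shape
const samplePrompt = buildAgentPrompt({
  testName: 'sign in shows welcome message',
  assertionError: '  expect(locator).toBeVisible() failed  ',
  tracePath: 'test-results/sign-in/trace.zip',
  specFile: 'tests/sign-in.spec.ts',
});

console.log(samplePrompt);
```

Keeping the shape fixed also makes prompts comparable across runs, which helps when checking whether an agent's patch addressed the reported failure.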
The agent verification loop
The course's through-line is turning tests into an external loop agents cannot argue with:
- Code changes (human or agent)
- npx playwright test runs (hook or CI)
- On failure → trace + stderr → agent prompt
- Agent patches → re-run until green
Success criteria must be objective: all tests pass, not "looks good."
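The loop can be sketched as a bounded retry around two injected steps: run the suite, then hand the failure output to whatever produces a patch. This is an illustration, not the course's implementation; `verifyLoop`, `TestRun`, and the attempt limit are assumptions, and the cap exists so a stuck agent cannot retry indefinitely.

```typescript
interface TestRun {
  passed: boolean;
  output: string; // stderr plus trace path, or whatever the runner reports
}

// Re-run the suite until it is green or the attempt budget is spent.
// `applyPatch` is only called when another run will follow.
function verifyLoop(
  runTests: () => TestRun,
  applyPatch: (failureOutput: string) => void,
  maxAttempts = 3,
): { green: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = runTests();
    if (run.passed) return { green: true, attempts: attempt };
    if (attempt < maxAttempts) applyPatch(run.output);
  }
  return { green: false, attempts: maxAttempts };
}

// Demo with a fake runner that goes green on the third run.
let demoRuns = 0;
const demoResult = verifyLoop(
  () => ({ passed: ++demoRuns >= 3, output: `run ${demoRuns} failed` }),
  (failure) => console.log('patching after:', failure),
);

console.log(demoResult); // { green: true, attempts: 3 }
```

A real runTests could shell out to npx playwright test and treat exit status 0 as green, which keeps the success criterion objective: the loop never asks the agent whether it thinks the fix worked.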
Git hooks
Husky + lint-staged is a common pattern:
- Pre-commit: lint, format, smoke tests (--grep @smoke)
- Pre-push or CI: full Playwright suite
Same loop for humans and agents. Fast feedback locally; thorough proof before merge.
Playwright MCP vs CLI
| | CLI | MCP (Playwright server) |
|---|---|---|
| Best for | CI, hooks, deterministic runs | Agent exploring a live browser |
| Trust level | High: same result every time | Agent-driven: verify with CLI |
npx playwright init-agents scaffolds three agents (official docs):
- Planner: explores the app → Markdown test plan in specs/
- Generator: plan → executable tests
- Healer: runs the suite and repairs failing tests
Chained: Planner → Generator → Healer → CLI verifies.
Rule I took from the course: MCP explores and drafts. CLI green is the merge gate.
The full stack at a glance
| Layer | Tool |
|---|---|
| Static analysis | ESLint, TypeScript |
| Unit / integration | Jest, React Testing Library |
| End-to-end | Playwright |
| Auth speed | storageState + setup project |
| Network stability | HAR + page.route |
| Debug / agent context | Traces, screenshots, HTML report |
| Enforcement | Git hooks + CI |
| Agent scaffolding | init-agents (Planner, Generator, Healer) |
What I am doing next
- Add Playwright to a real app (not just the course demo)
- Write three to five smoke tests on critical user paths
- Add auth.setup.ts wherever login is required
- Enable trace: 'on-first-retry' in CI
- Wire a pre-push hook for the full suite
Senior work was never just typing code. It is architecting systems you can trust — and Playwright is one of the pieces that makes that possible when agents are in the loop.
Further reading
- Playwright: Automated Testing & AI Workflows — Steve Kinney, Master.dev
- Playwright documentation
- Locators
- Authentication
- Mock APIs / HAR
- Trace viewer
- Test agents