PRD-to-PR pipeline

The default shipping path: human writes a PRD, agents do the rest, human approves a screenshot. This page describes each stage's input, output, gate, and storage location.

Pipeline overview

The runner persists every stage transition to the prd_to_pr_runs table (SQLite at .compound-state/agent-service.db), so after a process restart it can read the row's current_stage and resume. The exception is spawned claude -p children: they are killed on restart and do not auto-resume (see the "Outstanding" callout above; the fix is to set detached: true in the spawn options).
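The resume decision can be sketched as a lookup on the persisted row. This is a minimal illustration, not the runner's code: the stage identifiers, the RunRow shape, the child_was_running flag, and resumePoint are all hypothetical; only the table name and current_stage come from the text above.

```typescript
// Hypothetical stage identifiers, in pipeline order (Stage 0 through Stage 8).
const STAGES = [
  "intake", "prd", "spec", "test_stubs", "implementation",
  "self_review", "verification", "reviewer_loop", "merge",
] as const;

type Stage = (typeof STAGES)[number];

// Assumed shape of one prd_to_pr_runs row; only current_stage is named in the text.
interface RunRow {
  feature: string;
  current_stage: string;
  child_was_running: boolean; // assumption: the runner records whether a claude -p child was live
}

type ResumePoint =
  | { kind: "resume"; stage: Stage }   // pick up where the row says we were
  | { kind: "respawn"; stage: Stage }  // the child died with the parent, so the stage restarts
  | { kind: "unknown"; raw: string };  // unrecognized stage value, needs a human or a migration

function resumePoint(row: RunRow): ResumePoint {
  const stage = STAGES.find((s) => s === row.current_stage);
  if (stage === undefined) return { kind: "unknown", raw: row.current_stage };
  return row.child_was_running ? { kind: "respawn", stage } : { kind: "resume", stage };
}
```

Keeping the "respawn" case explicit makes the current limitation visible in the type: until children are spawned detached, a restart during an agent run cannot simply continue.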

Stage 0 — Intake normalization

The orchestrator may receive a rough chat request, a Cursor plan under .cursor/plans/, or an already-written wiki PRD. Before spawning implementation agents it normalizes the input into a portable PRD:

If the input is .cursor/plans/<feature>.md, convert it into wiki/plans/<feature>-YYYY-MM-DD.md.
If the input is chat-only, draft the PRD in wiki/plans/ before any spec or code work.
Preserve the original source path in frontmatter sources:.
Assign a stable feature slug used by the spec, worktree, branch, and dossier.
Create an isolated worktree/branch automatically for feature execution; no separate human branch approval is required for PRD-driven workflows.

Stage 0 is allowed to ask a blocking question only when the requested product behavior is ambiguous enough to change the spec. It should not ask for permission to create the worktree.
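The path conversion and slug assignment can be sketched as below. The function names are hypothetical, the slug rule (lowercase, hyphen-separated) is an assumption, and the YAML list form of sources: is a guess at the frontmatter layout rather than something SCHEMA.md is known to require.

```typescript
// Turn a plan file name or title into a stable slug (assumed rule: lowercase, hyphens only).
function featureSlug(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Map .cursor/plans/<feature>.md to wiki/plans/<feature>-YYYY-MM-DD.md, dated in UTC.
function prdPathFor(cursorPlanPath: string, today: Date): string {
  const match = /^\.cursor\/plans\/(.+)\.md$/.exec(cursorPlanPath);
  if (match === null) throw new Error(`not a Cursor plan path: ${cursorPlanPath}`);
  const date = today.toISOString().slice(0, 10); // YYYY-MM-DD
  return `wiki/plans/${featureSlug(match[1])}-${date}.md`;
}

// Frontmatter fragment that preserves the original source path (assumed YAML list form).
function sourcesFrontmatter(originalPath: string): string {
  return `sources:\n  - ${originalPath}\n`;
}
```

Deriving the slug once and reusing it for the spec, worktree, branch, and dossier keeps those four names from drifting apart.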

Stage 1 — PRD

The PRD is a wiki/plans/<feature>-YYYY-MM-DD.md file. Its frontmatter must declare implementation_status: planned (or similar), and the body must open with a visible status callout per SCHEMA.md.

A good PRD answers:

What's the user-facing change?
What's the success metric (and the failure metric)?
What's explicitly out of scope?

The PRD is the only artifact a human is required to write. Everything downstream consumes it.

Stage 2 — Spec

The Architect agent reads the PRD and emits wiki/specs/<feature>.md. The spec states acceptance in machine-checkable form: an Acceptance Contract table plus Gherkin-style scenarios keyed by the same AC-* IDs:

## Acceptance Contract

| ID     | Requirement                                        | Status          | Evidence | Notes |
| ------ | -------------------------------------------------- | --------------- | -------- | ----- |
| AC-001 | Best trial is shown above the leaderboard.         | not-implemented |          |       |
| AC-002 | Promote button is disabled for in-progress trials. | not-implemented |          |       |

Feature: Lab promotion banner
  Scenario: AC-001 Best trial is shown above the leaderboard
    Given a service area with at least one completed lab trial
    And one trial is pinned in ServiceAreaBestLabScenario
    When the user opens /admin/optimization-lab/<id>
    Then the BestTrialBanner is visible
    And it shows the pinned trial's metrics

  Scenario: AC-002 Promote button is disabled for in-progress trials
    Given a trial whose solution.status is solving_active
    Then the Promote button is disabled
    And hovering shows "Trial still solving"

The spec is the coordination object between agents. It bounds what counts as done. Every non-deferred AC-* row must appear in at least one generated test file.
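To make "machine-checkable" concrete, here is a small sketch that reads the Acceptance Contract table into rows. It assumes the column order shown above (ID, Requirement, Status, ...); AcceptanceRow and parseAcceptanceRows are illustrative names, not part of the pipeline.

```typescript
interface AcceptanceRow {
  id: string;
  requirement: string;
  status: string;
}

// Extract table rows whose first cell looks like AC-001; header and separator rows are skipped.
function parseAcceptanceRows(specMarkdown: string): AcceptanceRow[] {
  const rows: AcceptanceRow[] = [];
  for (const line of specMarkdown.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("|")) continue;
    // Drop the leading and trailing pipe, then split into trimmed cells.
    const cells = trimmed.replace(/^\||\|$/g, "").split("|").map((c) => c.trim());
    const [id, requirement, status] = cells;
    if (id !== undefined && /^AC-\d+$/.test(id)) {
      rows.push({ id, requirement: requirement ?? "", status: status ?? "" });
    }
  }
  return rows;
}
```

This simple split does not handle pipes escaped inside a cell; a real parser would need to.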

Stage 3 — Test stubs

The Test-writer agent reads the spec and writes failing tests:

Vitest for unit + integration coverage of resolvers, services, hooks.
Playwright for the acceptance scenarios named in the spec.

The pre-commit assertion: every test stub must run and fail with an informative error (typically "X is not implemented"). This catches accidentally tautological tests before the editor stage starts.
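The pre-commit assertion might be expressed as a verdict per stub run. This is a sketch under assumptions: StubRun is a hypothetical summary of one test result, and "informative" is approximated here by matching the phrase "not implemented".

```typescript
// Hypothetical summary of one stub execution, as a test runner might report it.
interface StubRun {
  name: string;
  passed: boolean;
  errorMessage?: string;
}

type StubVerdict = "ok" | "tautological" | "uninformative";

// Assumption: an informative failure names what is missing.
const NOT_IMPLEMENTED = /not implemented/i;

function judgeStub(run: StubRun): StubVerdict {
  // A stub that passes before any implementation exists proves nothing.
  if (run.passed) return "tautological";
  if (run.errorMessage === undefined || !NOT_IMPLEMENTED.test(run.errorMessage)) {
    return "uninformative";
  }
  return "ok";
}
```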

The acceptance assertion: every non-deferred AC-* row from the spec must be mapped to at least one generated test. If the test-writer drops an accepted requirement, stage 3 fails before implementation starts.
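A minimal sketch of the acceptance assertion follows. It assumes a requirement counts as mapped when its ID appears literally in a test file's source, and that deferred rows carry the status value "deferred"; both are assumptions, as are the names SpecRow and unmappedRequirements.

```typescript
interface SpecRow {
  id: string;
  status: string;
}

// Returns the IDs of accepted (non-deferred) requirements that no test file mentions.
// testFiles maps a file path to its source text.
function unmappedRequirements(rows: SpecRow[], testFiles: Record<string, string>): string[] {
  const sources = Object.values(testFiles);
  return rows
    .filter((row) => row.status !== "deferred") // assumption: deferred rows use this status value
    .map((row) => row.id)
    .filter((id) => {
      // Word boundaries stop AC-001 from matching inside AC-0011.
      const pattern = new RegExp(`\\b${id}\\b`);
      return !sources.some((src) => pattern.test(src));
    });
}
```

Stage 3 would fail whenever the returned list is non-empty.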

Stage 4 — Implementation

The Editor agent (cheaper Sonnet model) iterates on the diff until the test stubs pass. Constraints:

Worktree per feature (./scripts/git/worktree-add.sh <name> <branch>) — never edit main directly for PRD-driven work. The pipeline may create this worktree automatically.
No new files outside what the spec implies (no premature abstractions).
No comments unless the why is non-obvious.
React-hook-form + Zod + Form* primitives for any form work; resolver layout per .cursor/rules/appcaire-monorepo.mdc.

The editor is allowed to fail. If after MAX_ITERATIONS the spec's tests aren't green, it surfaces the failure to the orchestrator instead of grinding forever.
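The bounded loop can be sketched as follows. The attempt callback stands in for "apply one edit round, then run the spec's tests" and is hypothetical, as are the result type and the "escalated" label; only MAX_ITERATIONS comes from the text.

```typescript
type LoopResult =
  | { status: "green"; iterations: number }
  | { status: "escalated"; iterations: number; lastFailures: string[] };

// attempt(i) performs edit round i and returns the names of tests that still fail.
function runEditorLoop(
  attempt: (iteration: number) => string[],
  maxIterations: number,
): LoopResult {
  let lastFailures: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    lastFailures = attempt(i);
    if (lastFailures.length === 0) return { status: "green", iterations: i };
  }
  // Out of budget: hand the remaining failures to the orchestrator instead of retrying.
  return { status: "escalated", iterations: maxIterations, lastFailures };
}
```

Returning the last failures with the escalation gives the orchestrator something concrete to act on.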

Stage 5 — Self-review

The orchestrator runs the appropriate subagents in parallel:

resolver-reviewer — if apps/dashboard-server/src/graphql/resolvers/ was touched.
dashboard-reviewer — if apps/dashboard/src/ was touched.
perf-reviewer — if any list resolver, field resolver, or loop-over-Prisma was touched.

Findings are categorised P1 (block) / P2 (must address) / P3 (acknowledge). The Editor re-enters with the findings as an additional input.
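Reviewer selection from a changed-file list might look like the sketch below. The path prefixes come from the list above; selectReviewers is a hypothetical name, and because the perf-reviewer condition depends on code content rather than paths, it is passed in as a flag that something else would have to compute.

```typescript
type Reviewer = "resolver-reviewer" | "dashboard-reviewer" | "perf-reviewer";

const RESOLVER_DIR = "apps/dashboard-server/src/graphql/resolvers/";
const DASHBOARD_DIR = "apps/dashboard/src/";

// changedFiles: repo-relative paths in the diff.
// touchesQueryPath: assumption, true when a list resolver, field resolver,
// or loop-over-Prisma was modified (detection is out of scope here).
function selectReviewers(changedFiles: string[], touchesQueryPath: boolean): Reviewer[] {
  const reviewers: Reviewer[] = [];
  if (changedFiles.some((f) => f.startsWith(RESOLVER_DIR))) reviewers.push("resolver-reviewer");
  if (changedFiles.some((f) => f.startsWith(DASHBOARD_DIR))) reviewers.push("dashboard-reviewer");
  if (touchesQueryPath) reviewers.push("perf-reviewer");
  return reviewers;
}
```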

Stage 6 — Verification

The Verifier agent runs:

yarn type-check, yarn lint, yarn test:dashboard-stack.
The spec-named Vitest files again, so earlier editor convergence cannot go stale before approval.
A playwright test pass that captures the failure dossier (trace + screenshot + console + DOM snapshot + machine-readable summary).
The dossier is written to docs/dossiers/<feature>/ and committed alongside the diff.

This stage cannot be short-circuited. Without a dossier and complete acceptanceCoverage, the PR cannot merge. See verification-and-evidence.md.
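The gate can be sketched as a function that lists what is still missing. The required file names are taken from the storage layout on this page; the shape of summary.json is an assumption beyond the acceptanceCoverage.complete field, and verificationBlockers is a hypothetical name.

```typescript
// File names as listed in the storage layout on this page.
const REQUIRED_DOSSIER_FILES = ["trace.zip", "screenshot.png", "console.log", "summary.json"];

// Assumed summary.json shape; only acceptanceCoverage.complete is named in the text.
interface DossierSummary {
  acceptanceCoverage?: { complete?: boolean };
}

// presentFiles: file names found in docs/dossiers/<feature>/.
// summaryJson: raw contents of summary.json, or undefined when the file is absent.
function verificationBlockers(presentFiles: string[], summaryJson: string | undefined): string[] {
  const blockers = REQUIRED_DOSSIER_FILES
    .filter((name) => !presentFiles.includes(name))
    .map((name) => `missing dossier file: ${name}`);
  if (summaryJson === undefined) return blockers;

  let summary: DossierSummary | null;
  try {
    summary = JSON.parse(summaryJson) as DossierSummary | null;
  } catch {
    return [...blockers, "summary.json is not valid JSON"];
  }
  if (summary?.acceptanceCoverage?.complete !== true) {
    blockers.push("acceptanceCoverage is incomplete");
  }
  return blockers;
}
```

An empty result means this stage's gate is satisfied; anything else blocks the merge.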

Stage 7 — Reviewer loop

After git push, the Reviewer-feedback agent polls gh api repos/{org}/{repo}/pulls/<n>/comments until Codex / CodeRabbit have completed their review. Any unaddressed P1 or P2 comment re-enters the Editor stage. This codifies the standing memory rule on Codex review polling.

Stage 8 — Merge + evidence

When (a) CI is green, (b) the dossier is present, (c) acceptanceCoverage.complete=true, and (d) the reviewer loop is settled, the orchestrator queues the PR with gh pr merge --auto --squash against the merge group. After merge, the screenshot from the dossier is sent to the human channel (Telegram via interface-agent).

The human's job at this point is to look at the screenshot and acceptance matrix, then either approve or reject. A run with missing accepted-scope evidence is incomplete_scope, not done.
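The four merge conditions and the incomplete_scope rule can be combined in one small decision function. This is a sketch: MergeGate and the "ready_to_merge" and "blocked" labels are hypothetical, while incomplete_scope and the gh flags come from the text above.

```typescript
interface MergeGate {
  ciGreen: boolean;                    // (a)
  dossierPresent: boolean;             // (b)
  acceptanceCoverageComplete: boolean; // (c)
  reviewerLoopSettled: boolean;        // (d)
}

type RunStatus = "ready_to_merge" | "incomplete_scope" | "blocked";

function runStatus(gate: MergeGate): RunStatus {
  // Missing accepted-scope evidence is reported as incomplete_scope, never as done.
  if (!gate.dossierPresent || !gate.acceptanceCoverageComplete) return "incomplete_scope";
  if (!gate.ciGreen || !gate.reviewerLoopSettled) return "blocked";
  return "ready_to_merge";
}

// Argument vector for queueing the PR; the PR number is supplied by the caller.
function mergeCommand(prNumber: number): string[] {
  return ["gh", "pr", "merge", String(prNumber), "--auto", "--squash"];
}
```

Checking evidence before CI status means a green build with missing coverage still surfaces as a scope problem rather than a generic block.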

Storage layout

<repo>/
├── wiki/plans/<feature>-YYYY-MM-DD.md      ← PRD (human-authored)
├── wiki/specs/<feature>.md                 ← Architect output (Gherkin)
├── docs/dossiers/<feature>/                ← Verifier output
│   ├── trace.zip
│   ├── screenshot.png
│   ├── console.log
│   └── summary.json                        ← Machine-readable test outcomes + acceptanceCoverage
└── .compound-state/agent-service.db        ← Throughput + cost metrics per stage

Everything except .compound-state/agent-service.db is committed to the feature branch. The dossier is included for auditability and to make rebuilds deterministic.

Cross-references

Vision and mandate — the four commitments this pipeline implements.
Agent roles and model routing — which model each stage uses.
Spec as contract — why the spec is the coordination object.
Verification and evidence — dossier format.
Reviewer feedback loop — Codex polling.
Throughput and business signals — features/sec/token measurement.
Merge flow & PR velocity — the merge queue this pipeline targets.