Evidence Journey Diagnosis Playbook Review
Private review copy

AI coding workflow coaching report.

A mobile-readable visual brief distilled from the PKM archaeology package. It keeps the core lesson visible: the tools did useful work, but durable progress came from source-of-truth discipline, review closure, and bounded agent orchestration.

Evidence base

Full-corpus pass, high-impact reads.

The source package inventories everything and deep-reads the documents that explain the pattern. This page shows the decision surface, not the private transcript bodies.

01 / Ledger first

The report starts with source counts, not conclusions.

PKM Agent verification, transcript index counts, Wiki surfaces, postmortems, and key operating docs were represented in a single JSONL ledger. That ledger prevents the old failure mode: agents rediscovering the same terrain and smoothing uncertainty away.

SOURCE_LEDGER.jsonl METHOD.md EVIDENCE_INDEX.md
02 / Classified

Nine major work classes were sampled.

Planning, PRD, implementation, GitHub/CI, multi-agent, Markdown, waste, success, and review workflows all have at least five ledger examples.

03 / Privacy

Cloud copy is intentionally lighter.

This public-by-link page excludes raw transcript bodies, tarballs, logs, private ledger rows, and local file bodies. It is for review on the go, not publication.

noindex nofollow
04 / Candid

The answer is not "the model failed."

The report treats the failure as an operating-model issue: orchestration sprawl, context rehydration, and missing stop protocols.

Learning journey

Ali was learning the software loop while shipping inside it.

The corpus reads like a founder learning the mechanics of software development by forcing real artifacts through agents, repos, PRs, and docs.

Mental model

From chat intent to system of record.

The evolution is not just product progress. It is a shift from "ask the model" to "make the repo, Wiki, issue, PR, dashboard, and ledger carry the truth."

plans PRDs GitHub CI/CD Markdown OS
01

Product instinct becomes PRD.

Ambiguous ideas become acceptance behavior, non-goals, and operating assumptions that agents can execute.

02

Markdown becomes control plane.

Plans, source ledgers, handoffs, and decisions start carrying state across sessions instead of chat memory.

03

GitHub becomes scoreboard.

PRs, review comments, CI, and deploy previews create the real "done or not done" surface.

04

Agents get lane ownership.

Claude Code and Codex work better when one implements and one reviews, not when both coordinate independently.

05

Review closure becomes a habit.

Actionable comments are fixed, replied to, resolved, and reflected in dashboard truth after merge.

Root-cause lens

The waste came from the operating model.

The tools produced real work. The expensive part was making them repeatedly rebuild context, coordinate through overlapping substrates, and act when they should have stopped.

primary diagnosis

Not bad reasoning. Bad coordination contract.

When the prompt carries the database, each new agent must reload the world. When three orchestration layers all think they own the repo, the system creates motion without a single scoreboard.

What worked

  • Durable Markdown plans and source ledgers.
  • Repo-grounded archaeology after crashes.
  • Project-summary packages with path-traceable claims.
  • Data audits before product claims.
  • Review closure before merge and truth-up after merge.

What hurt

  • Mega-prompts and repeated context rehydration.
  • Multiple orchestration substrates in one repo.
  • No no-op or blocked protocol.
  • Worktree and repo sprawl during active learning.
  • Green CI treated as readiness without human inspection.
Tool posture

Use Claude Code and Codex as different muscles.

This is corpus-derived, not a permanent product claim. Recheck current docs before making public guidance.

Claude Code

Long local implementation.

  • Best for broad repo navigation and multi-file edits.
  • Useful for scaffolding, refactors, and local implementation loops.
  • Powerful with subagents when lane ownership is explicit.
  • Needs bounded context and clear stop conditions.
Codex

Repo-grounded recovery and review.

  • Strong for archaeology, synthesis, and skeptical second passes.
  • Useful for PR review closure, CI triage, and crash recovery.
  • Works well when the source of truth is a repo file, issue, PR, or ledger.
  • Should not be made a second independent coordinator for the same repo lane.
Builder playbook

A small operating system for better AI coding.

The playbook is the practical output: how to brief agents, bound work, inspect outputs, and decide when to stop.

Reusable brief

Start every serious session here.

Objective:
[One concrete outcome.]

Repo / folder:
[Exact path.]

Current source of truth:
[Plan, PRD, issue, PR, dashboard, or Wiki page.]

Allowed scope:
[Files/folders the agent may read and edit.]

Validation:
[Commands, browser checks, data samples, or review steps.]

Stop condition:
[When to stop and hand back.]

If blocked:
Return BLOCKED with exact missing input, evidence, and next action.
01
One current plan. Archive stale plans or explicitly mark them superseded.
02
One orchestration owner. Use helpers, not multiple captains.
03
Ledger before synthesis. Do not let a broad corpus become vibes.
04
CI plus inspection. Green checks are necessary, not sufficient.
05
No-op is allowed. Agents must say when the right move is not acting.
Review note

Good for mobile review. Not ready for public publishing.

This Cloudflare copy is a visual review layer over the private package. It intentionally leaves out raw transcripts, private file bodies, and detailed ledger rows. Treat it as a portable founder review surface, then make promotion/sanitization decisions separately.

Cloudflare Pages X-Robots-Tag private review