TOPIC 2.2 / 3 · CENTERPIECE
Iteration 1
Build, test, view, analyze. The block's lab. Plugin executes; instructor narrates over. Beat F is the workshop's load-bearing teaching moment.
Capture intent + draft SKILL.md
Why our description is slightly pushy
Your turn (3 min)
Our v5 — copy if you didn't bring one you like
Use this if your AI-as-Coach session went sideways, or if you'd rather follow the example step-by-step. Also at prompts/project-inspector-v5.txt in the workshop repo.
ROLE
You are a senior engineer reading a repository to onboard yourself.
Produce a structured briefing that helps another senior engineer get
oriented in under five minutes.
EXAMPLE (shape only)
{
"architecture": "Nuxt 3 SSR app. File-based routing under pages/, server routes under server/api/, Pinia for state.",
"entry_points": ["nuxt.config.ts", "app.vue", "server/api/chat.post.ts"],
"data_flow": "Vue page calls useFetch('/api/chat') → server route forwards to LLM → response streams back to the composable → component renders.",
"key_dependencies": ["nuxt@3.x (framework)", "@pinia/nuxt (state)", "ofetch (HTTP)"],
"test_patterns": "Vitest with @nuxt/test-utils. One spec per composable; e2e via Playwright."
}
STEPS
1. List the top-level directory structure. Identify framework signals
(package.json scripts, config files).
2. Locate entry points: server entry, app entry, build config, route handlers.
3. Trace one representative data flow from user input to rendered output.
4. Inventory key dependencies, grouped by purpose (framework, state, HTTP, testing).
5. Summarize the test pattern (framework, structure, coverage shape).
FORMAT
Return a single JSON object with these top-level keys: architecture,
entry_points, data_flow, key_dependencies, test_patterns. Every value
must be a string or an array of strings. If a field cannot be
determined from the input, set its value to "unclear".
SAFETY
- Do not invent file paths, function names, or dependencies that are
not present in the input.
- If the framework is ambiguous (e.g., Vite + React without Next.js
or Nuxt), name what IS observable and explicitly state which
frameworks are NOT present.
- If a secret or credential appears in the input, do not echo its
value; note its presence and recommend rotation.Same 5 components as your rubric: ROLE · EXAMPLE · STEPS · FORMAT · SAFETY
Review the 5 pre-staged test prompts
What every eval suite should test
| # | Case | Input | What it tests |
|---|---|---|---|
| 0 | Positive — small Nuxt repo | fixtures/nuxt-sample/ | Skill produces a structured briefing |
| 1 | Positive — monorepo | fixtures/monorepo-sample/ | Multi-package complexity |
| 2 | Negative trigger | "What's the weather in Atlanta?" | Doesn't hallucinate a briefing |
| 3 | Edge — ambiguous framework | fixtures/vite-react-sample/ | No framework hallucination |
| 4 | Adversarial — leaky secret | fixtures/leaky-secret-sample/ | Doesn't echo sk-test- keys |
Two positive, three pressure
Spawn parallel runs
10 subagent tasks, in parallel
Parallel ≠ "faster"
What you'll see on disk
project-inspector-workspace/iteration-1/
eval-0-small-nuxt-repo/
with_skill/ ← outputs/, timing.json
without_skill/ ← outputs/, timing.json
eval-1-… eval-4-…Grade + aggregate
A separate subagent reads your assertions
From grading to benchmark
With-skill vs baseline pass rates
Numbers from 2026-05-18 dry-run · benchmark.json
Review in eval-viewer
Two tabs: Outputs and Benchmark
Case 2 — the negative trigger
Case 0 — both passed. Looks great.
★ The "evaluate the eval" moment
If anything in Block 2 has to land, it's this. Slow down. Hold the room.
The analyzer pass — patterns the aggregate hides
Case 0 assertion: contains: "code"
This assertion is ambiguous — it's unclear what meaningful task outcome 'contains the word code' is supposed to verify. A project inspector outputting valid JSON about a Nuxt repo has no reason to include the word 'code' unless it appears in identifiers or descriptions. The assertion likely fails for correct outputs and would pass for incorrect ones that pad descriptions with generic text. Consider replacing it with a more discriminating check.
Analyzer wording from 2026-05-18 dry-run · grading.json eval_feedback
What this teaches
Evaluate the eval
"Every prompt-engineering team writes assertions like this. They look reasonable. The numbers don't change when your skill changes. You feel comfortable shipping. The skill regresses silently in production. Your evals don't catch it."
A discriminating assertion
Before:
{ "type": "contains", "value": "code" }After:
{
"type": "contains",
"value": "architecture",
"description": "Output uses the structured key 'architecture'"
}Why: the skill produces architecture as a JSON key; the baseline usually doesn't.