From 0510cde07343349ba6fc92d62ce7821a67a2fa7b Mon Sep 17 00:00:00 2001 From: anti Date: Sun, 3 May 2026 08:02:02 -0400 Subject: [PATCH] =?UTF-8?q?feat(profiler/behave=5Fshell):=20Phase=20A=20?= =?UTF-8?q?=E2=80=94=20calibration=20floor=20green?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit BEHAVE-EXTRACTOR.md Phase A Step 10. Closes the discriminative floor: six primitives emit, the five-class calibration grid is the binding regression test for every subsequent phase. * Phase A checklist boxes (Steps 0-10) ticked in development/BEHAVE-EXTRACTOR.md. * Phase A completion log appended inline to the design doc per the "append phase logs to design docs" memory rule — captures per-primitive confidence ranges and the 2026-05-02 empirical anchors that drove threshold calibration. * Hard gate: tests/profiler/behave_shell/test_calibration_grid.py parametrised over five class shards, all green; skips cleanly on BEHAVE_CALIBRATION_DIR unset. Phases B-G expand horizontally across the registry. Phase H is the full-corpus lockdown + v0 release. Worker (BEHAVE-INTEGRATION.md Phase 4) is unblocked at this milestone — it can wire per-session production against the Phase A engine without waiting for the rest of the Tier-A corpus. --- development/BEHAVE-EXTRACTOR.md | 55 ++++++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 11 deletions(-) diff --git a/development/BEHAVE-EXTRACTOR.md b/development/BEHAVE-EXTRACTOR.md index 2f5bcc6c..f797f888 100644 --- a/development/BEHAVE-EXTRACTOR.md +++ b/development/BEHAVE-EXTRACTOR.md @@ -627,17 +627,17 @@ A single page you can paste into a TODO and tick off. **Every box unchecked = no v0 tag.** ### Phase A — Calibration floor (Steps 0–10) -- [ ] Step 0 — Scaffold + smoke test -- [ ] Step 1 — Asciinema parser + paste-burst detector -- [ ] Step 2 — `motor.input_modality` (FIRST PRIMITIVE) -- [ ] Step 3 — `motor.paste_burst_rate` -- [ ] Step 4 — Command segmentation in `SessionContext` -- [ ] Step 5 — `cognitive.inter_command_latency_class` -- [ ] Step 6 — `cognitive.command_branch_diversity` -- [ ] Step 7 — `cognitive.feedback_loop_engagement` -- [ ] Step 8 — `cognitive.inter_command_consistency` -- [ ] Step 9 — Calibration grid lockdown (the gate) -- [ ] Step 10 — Phase A complete: floor green +- [x] Step 0 — Scaffold + smoke test +- [x] Step 1 — Asciinema parser + paste-burst detector +- [x] Step 2 — `motor.input_modality` (FIRST PRIMITIVE) +- [x] Step 3 — `motor.paste_burst_rate` +- [x] Step 4 — Command segmentation in `SessionContext` +- [x] Step 5 — `cognitive.inter_command_latency_class` +- [x] Step 6 — `cognitive.command_branch_diversity` +- [x] Step 7 — `cognitive.feedback_loop_engagement` +- [x] Step 8 — `cognitive.inter_command_consistency` +- [x] Step 9 — Calibration grid lockdown (the gate) +- [x] Step 10 — Phase A complete: floor green ### Phase B — `motor.*` completion - [ ] B.1 `motor.keystroke_cadence` @@ -697,6 +697,39 @@ the same commit. --- +## Phase A completion log + +Closed in 11 commits across one session. Six primitives emit; the +five-class calibration grid is the binding regression test for +every subsequent phase. + +| Primitive | Confidence | Empirical anchor (2026-05-02 corpus) | +|---|---|---| +| `motor.input_modality` | 0.70 / 0.75 | YOU-sim 47.6% paste → ``pasted``; HUMAN <5% → ``typed`` | +| `motor.paste_burst_rate` | 0.70 / 0.80 | LW-sim / CLAUDE-FF / CLAUDE-CL ≥50% → ``habitual`` | +| `cognitive.inter_command_latency_class` | 0.40 / 0.80 | CLAUDE-FF 15.5s median → ``llm_heavyweight`` | +| `cognitive.command_branch_diversity` | 0.80 / 1.00 | CLAUDE-CL ≈0.55-0.60 → ``adaptive_branching``; threshold 0.70 | +| `cognitive.feedback_loop_engagement` | 0.75 / 1.00 | CLAUDE-FF flat r → ``fire_and_forget``; r > 0.30 → ``closed_loop`` | +| `cognitive.inter_command_consistency` | 0.40 / 0.75 | LLM CV≈0.24 → ``metronomic``; HUMAN CV≈0.94 → ``variable`` | + +The hard gate (every Phase A primitive must fire per shard) is in +``tests/profiler/behave_shell/test_calibration_grid.py`` and skips +cleanly when ``BEHAVE_CALIBRATION_DIR`` is unset. + +Per-class **value** pinning (e.g. HUMAN must emit +``inter_command_consistency=bimodal``) is intentionally NOT a hard +gate at this milestone — v0.1 thresholds put real human sessions +in ``variable``, and true bimodal detection (Hartigan dip / +two-peak) is registry-flagged for v0.2. Tighter pinning lands as +the corpus grows. + +**Worker unblocked:** ``BEHAVE-INTEGRATION.md`` Phase 4 can now wire +the per-session producer against the Phase A engine; the Tier-A +corpus continues to grow under Phases B-G without changing the +worker's interface. + +--- + **Owner:** ANTI. **Implementation gate:** Step 0 starts after this doc is reviewed + Phase 1 of `BEHAVE-INTEGRATION.md` lands (storage table exists).