test(profiler/behave_shell): H.2 calibration grid full sweep

Run the five-class calibration grid (HUMAN / YOU-sim / LW-sim /
CLAUDE-FF / CLAUDE-CL) against the 2026-05-02 shards.

* Hard gate green for 27 primitives across all 5 shards.
* environmental.keyboard_layout moved from hard gate to
  PHASE_F_CONDITIONAL_PRIMITIVES — short SSH-recon corpus maxes at
  ~90 typed letters per session, well below the LAYOUT_MIN_TYPED_LETTERS
  (200) floor. The 200-floor stays per the per-phase "v0 ships when
  honest" rule; longer-text corpora will surface the layout signal.
* Three primitives never fire on the 2026-05-02 corpus, all already
  conditional and all expected:
  - cognitive.error_resilience.frustration_typing
  - environmental.locale
  - environmental.keyboard_layout

No D / F / G threshold re-tunes needed; only the keyboard_layout
binding-set move. Phase H step log appended to BEHAVE-EXTRACTOR.md
with per-class observation counts.
This commit is contained in:
2026-05-08 18:33:51 -04:00
parent ac04751c18
commit 9ebaca410a
2 changed files with 91 additions and 6 deletions

View File

@@ -59,10 +59,10 @@ PHASE_ABCDEFG_PRIMITIVES: frozenset[str] = frozenset({
"temporal.escalation_pattern",
"temporal.lifecycle_markers.landing_ritual",
# Phase F — environmental.* output-stream block + carry-over E.4
# (locale is conditional, see PHASE_F_CONDITIONAL_PRIMITIVES)
# (locale and keyboard_layout are conditional see
# PHASE_F_CONDITIONAL_PRIMITIVES)
"environmental.shell_type",
"environmental.terminal_multiplexer",
"environmental.keyboard_layout",
"environmental.numpad_usage",
"temporal.lifecycle_markers.exit_behavior",
# Phase G — operational.* + emotional_valence.* (hard subset)
@@ -85,12 +85,18 @@ PHASE_D_CONDITIONAL_PRIMITIVES: frozenset[str] = frozenset({
"cognitive.error_resilience.fallback_to_man",
})
# Phase F primitives conditional on shard content. ``environmental.locale``
# fires only when the shard's output contains an env / locale dump
# (LANG=, LC_ALL=, LC_CTYPE=). It's tracked here, not in the per-shard
# hard gate.
# Phase F primitives conditional on shard content.
# * ``environmental.locale`` fires only when the shard's output contains
# an env / locale dump (LANG=, LC_ALL=, LC_CTYPE=).
# * ``environmental.keyboard_layout`` requires LAYOUT_MIN_TYPED_LETTERS
# (200) typed letters per session — short SSH-recon shards (the
# 2026-05-02 calibration corpus) max out around 90 typed letters
# per session because most input is pasted rather than typed.
# v0 keeps the 200-floor honesty rather than tuning to pass; longer-
# text corpora will surface it.
PHASE_F_CONDITIONAL_PRIMITIVES: frozenset[str] = frozenset({
"environmental.locale",
"environmental.keyboard_layout",
})
# Phase G primitives that ride sample-size floors and may legitimately