12 Commits

Author SHA1 Message Date
9ebaca410a test(profiler/behave_shell): H.2 calibration grid full sweep
Run the five-class calibration grid (HUMAN / YOU-sim / LW-sim /
CLAUDE-FF / CLAUDE-CL) against the 2026-05-02 shards.

* Hard gate green for 27 primitives across all 5 shards.
* environmental.keyboard_layout moved from hard gate to
  PHASE_F_CONDITIONAL_PRIMITIVES — short SSH-recon corpus maxes at
  ~90 typed letters per session, well below the LAYOUT_MIN_TYPED_LETTERS
  (200) floor. The 200-floor stays per the per-phase "v0 ships when
  honest" rule; longer-text corpora will surface the layout signal.
* Three primitives never fire on the 2026-05-02 corpus, all already
  conditional and all expected:
  - cognitive.error_resilience.frustration_typing
  - environmental.locale
  - environmental.keyboard_layout

No D / F / G threshold re-tunes needed; only the keyboard_layout
binding-set move. Phase H step log appended to BEHAVE-EXTRACTOR.md
with per-class observation counts.
2026-05-08 18:33:51 -04:00
f10931f24d test(profiler/behave_shell): Phase G grid lockdown + completion log
Widen calibration binding from PHASE_ABCDEF_PRIMITIVES (25) to
PHASE_ABCDEFG_PRIMITIVES (28 hard). Three Phase G primitives that
emit on any session-with-commands ride the hard gate:

* operational.opsec_discipline
* operational.cleanup_behavior
* emotional_valence.stress_response

The remaining five Phase G primitives ride a new
PHASE_G_CONDITIONAL_PRIMITIVES because their sample-size floors make
them legitimately absent from short shards:

* operational.objective                  (≥ 3 classified commands)
* operational.multi_actor_indicators     (≥ 8 commands)
* emotional_valence.arousal              (typing bursts)
* emotional_valence.valence              (≥ 80 typed letters)
* emotional_valence.frustration_venting  (≥ 30 typed letters)

Backwards-compat alias PHASE_ABCDEF_PRIMITIVES kept. Phase G
completion log + checkbox flips in BEHAVE-EXTRACTOR.md.

Tier-A corpus delta: all 37 Tier-A primitives now emit. Phase H
(full-corpus lockdown + v0 release) is next.
2026-05-08 16:40:13 -04:00
a25f4a890d test(profiler/behave_shell): Phase F + E.4 grid lockdown + completion log
Widens the binding calibration set from PHASE_ABCDE_PRIMITIVES (20)
to PHASE_ABCDEF_PRIMITIVES (25). The five new entries:

* environmental.shell_type (per-shard hard gate)
* environmental.terminal_multiplexer (per-shard hard gate)
* environmental.keyboard_layout (per-shard hard gate; PII boundary
  lifted by ANTI; emits all 4 registry values)
* environmental.numpad_usage (per-shard hard gate)
* temporal.lifecycle_markers.exit_behavior (resolution of the E.4
  hold; uses Command.followed_by_prompt from F.0)

environmental.locale joins a new PHASE_F_CONDITIONAL_PRIMITIVES set
(only fires on shards with an env / locale dump in the output).

Phase F completion log appended to BEHAVE-EXTRACTOR.md. The original
F.0 row hinted at D.0 subsumption; reversed in the log — D.0 is
enriched, not subsumed (regex catches errors when PS1 is suppressed).

Tier-A corpus delta: 25 of 37 primitives now emit. Phase G is next.
2026-05-04 00:44:22 -04:00
b7534c311a docs(behave): cross-reference Phase F.0 with held E.4 and landed D.0
F.0's row in BEHAVE-EXTRACTOR.md was forward-only — readers landing
on Phase F couldn't tell that F.0 also has a backlog (E.4 held, D.0
subsumption). Add a 'Carry-overs F.0 must unblock' section to the
Phase F prelude and a back-reference on the F.0 checkbox in the
implementation order checklist.
2026-05-04 00:17:37 -04:00
96a4039366 test(profiler/behave_shell): Phase E grid lockdown + completion log (E.4 held)
Widens the binding calibration set from PHASE_ABCD_PRIMITIVES (17) to
PHASE_ABCDE_PRIMITIVES (20). The three shipped Phase E primitives
(session_duration, escalation_pattern, landing_ritual) join the
per-shard hard gate.

E.4 (temporal.lifecycle_markers.exit_behavior) is held at ANTI's
direction pending Phase F.0's prompt parser — abrupt-vs-cleanup
needs exit-code visibility to be honest, and first-token membership
alone over-fires on benign rm / clear mid-session. E.4 picks up at
the tail of Phase F.

Phase E completion log appended to BEHAVE-EXTRACTOR.md; E.1-E.3
checkboxes flipped, E.4 left unchecked with a held note.
2026-05-04 00:16:33 -04:00
46775fc0e5 test(profiler/behave_shell): Phase D calibration-grid lockdown + completion log
Widens the binding calibration set from PHASE_ABC_PRIMITIVES (13) to
PHASE_ABCD_PRIMITIVES (17). The four unconditional Phase D primitives
(cognitive_load, exploration_style, planning_depth, tool_vocabulary)
join the per-shard hard gate. The three error_resilience.* primitives
are conditional on at least one errored command in the shard and
tracked in PHASE_D_CONDITIONAL_PRIMITIVES — excluded from the
per-shard required-emission set, included in the cross-class
discrimination check.

cognitive_load empirical re-tune deferred to the next
BEHAVE_CALIBRATION_DIR run; v0.1 thresholds ship.

Phase D completion log appended to BEHAVE-EXTRACTOR.md; Phase D
checkboxes flipped to [x].
2026-05-04 00:03:46 -04:00
bc62e42ce1 feat(profiler/behave_shell): emit motor.shell_mastery.pipe_chaining_depth 2026-05-03 23:34:54 -04:00
4fc980e968 feat(profiler/behave_shell): emit motor.shell_mastery.shortcut_usage 2026-05-03 23:33:07 -04:00
a077cf67c8 feat(profiler/behave_shell): emit motor.shell_mastery.tab_completion 2026-05-03 23:31:20 -04:00
771944830a docs(behave): close Phase B in BEHAVE-EXTRACTOR.md
Tick the four Phase B checkboxes (B.1-B.4) and append a Phase B
completion log inline (per the "append phase logs to design docs"
memory rule). Captures per-primitive confidence ranges, source
signals, and the PII-discipline regression that all four
primitives uphold.

Phase A + Phase B = 10 primitives emitting on every shard;
PHASE_AB_PRIMITIVES is binding for every subsequent phase.
Phase C (motor.shell_mastery.*) lands next.
2026-05-03 21:30:13 -04:00
0510cde073 feat(profiler/behave_shell): Phase A — calibration floor green
BEHAVE-EXTRACTOR.md Phase A Step 10. Closes the discriminative
floor: six primitives emit, the five-class calibration grid is the
binding regression test for every subsequent phase.

* Phase A checklist boxes (Steps 0-10) ticked in
  development/BEHAVE-EXTRACTOR.md.
* Phase A completion log appended inline to the design doc per
  the "append phase logs to design docs" memory rule — captures
  per-primitive confidence ranges and the 2026-05-02 empirical
  anchors that drove threshold calibration.
* Hard gate: tests/profiler/behave_shell/test_calibration_grid.py
  parametrised over five class shards, all green; skips cleanly
  on BEHAVE_CALIBRATION_DIR unset.

Phases B-G expand horizontally across the registry. Phase H is
the full-corpus lockdown + v0 release. Worker
(BEHAVE-INTEGRATION.md Phase 4) is unblocked at this milestone —
it can wire per-session production against the Phase A engine
without waiting for the rest of the Tier-A corpus.
2026-05-03 08:02:02 -04:00
11f474556c docs(behave): integration + extractor + attribution design (DEBT-050 / 051)
Three sibling design docs plus DEBT.md updates that supersede the
stale DEBT-036 with a BEHAVE-aligned plan.

development/BEHAVE-INTEGRATION.md — five-phase rollout: storage
(observations table mirroring the BEHAVE Observation envelope plus
one DECNET-side denorm; UniqueConstraint(evidence_ref, primitive)
enforcing idempotency); engine (in decnet/profiler/behave_shell/
sublibrary, no new daemon, not in BEHAVE — DECNET is the engine);
BEHAVE pin; worker wire; UI panel + per-attacker SSE route; live
smoke. Bus payload merges id/ts/v back in to preserve sensor
identifiers across the bus envelope.

development/BEHAVE-EXTRACTOR.md — engine route in eight phases
(A–H). Phase A locks the 6-primitive calibration grid; Phases B–G
expand horizontally; Phase H is the full Tier-A corpus + v0
release. v0 ships every shell-extractable primitive (37 of them);
Tier B is cross-session and lives in the attribution engine; Tier
C is network-domain (toolchain.*) and lives elsewhere.

development/ATTRIBUTION-ENGINE.md — sublibrary inside
decnet/correlation/ that consumes attacker.observation.* events
and emits attribution.profile.* derived state. Five-state machine
(unknown / stable / drifting / conflicted / multi_actor) with per-
ValueKind merge functions. v0 closes DEBT-051; v1 adds the real
clusterer; v2 federation gossip. The bright line forbidding
attribution to natural persons is lifted directly from BEHAVE's
envelope docstring.

development/DEBT.md — DEBT-036 marked STALE; DEBT-050 and
DEBT-051 entries added; summary table + open list updated.
2026-05-03 07:24:19 -04:00