Commit Graph

9 Commits

Author SHA1 Message Date
0737fcfe93 feat(profiler/behave_shell): emit motor.motor_stability
BEHAVE-EXTRACTOR.md Phase B Step B.2. First principled
implementation — the prototype doesn't ship this primitive at all.

* _features/motor.py:motor_stability(ctx) emits one Observation
  in {steady, variable, tremor}. Reuses ctx.typing_bursts from B.1.
* Tremor proxy: fraction of within-burst IATs below
  TREMOR_FAST_FLOOR_S (30 ms — humans can't sustain sub-50 ms IATs).
  ≥ TREMOR_RATE_MIN (10%) sub-floor → tremor (double-press / motor
  twitch / stuck-key).
* Otherwise median burst CV decides: < CV_STEADY_MAX → steady,
  else → variable. Confidence 0.70 / 0.60 / 0.65.
* No typing bursts or fewer than 5 within-burst IATs → skip emit.
* Calibration grid widened to include motor.motor_stability; green
  across all five shards.

Tests cover all three buckets + skip paths.
2026-05-03 21:25:54 -04:00
d90c8b70ce feat(profiler/behave_shell): emit motor.keystroke_cadence
BEHAVE-EXTRACTOR.md Phase B Step B.1.

* SessionContext gains typing_bursts: tuple[tuple[float, ...], ...]
  built by _split_typing_bursts(iats) — splits at gaps > IKI_THINK_MAX_S
  (1.5s) and drops bursts of fewer than 3 IATs. Mirrors prototype's
  _split_into_bursts at BEHAVE/prototype_extractors/shell/extract.py:275.
* _features/motor.py:keystroke_cadence(ctx) emits one Observation
  in {steady, bursty, hunt_and_peck, machine}. Median CV across
  typing bursts; mean IKI < IKI_MACHINE_MAX_S paired with CV <
  CV_MACHINE_MAX → machine. Confidence 0.85/0.70/0.65/0.60 per the
  prototype's calibration history.
* < MIN_INPUTS_FOR_CADENCE inputs or zero typing bursts → skip
  emission. v0.1 emits only the burst-CV variant; the prototype's
  NAIVE session-CV variant is parked for v0.2.
* Calibration grid widened (PHASE_A_PRIMITIVES → PHASE_AB_PRIMITIVES)
  to include motor.keystroke_cadence. Grid green across all five
  shards.

Tests: too-few-inputs → no emit, all-think-pauses → no burst → no
emit, uniform IATs → steady, sub-5ms → machine, mixed-pace → bursty,
extreme bimodal → hunt_and_peck.
2026-05-03 21:24:13 -04:00
842b7de950 feat(profiler/behave_shell): emit cognitive.inter_command_consistency
BEHAVE-EXTRACTOR.md Phase A Step 8. Dispersion / bimodality of
inter-command pauses. HUMAN-bimodal vs LLM-metronomic.

* _features/cognitive.py:inter_command_consistency(ctx) emits one
  Observation in {metronomic, variable, bimodal}.
* CV = stdev / mean of ctx.inter_cmd_iats. CV < 0.40 → metronomic
  (LLM-pure; corpus anchor 0.24); CV ≥ 1.50 → bimodal heuristic
  (LLM-assisted human; v0.1 placeholder, true bimodal via Hartigan
  dip is registry-flagged for v0.2); else → variable (human;
  corpus anchor 0.94).
* < 2 IATs or zero mean → skip emission. < 5 commands halves
  confidence (0.40 vs 0.75) per sample-size honesty.

Tests: too-few IATs → no emission, uniform → metronomic,
human-like dispersion → variable, extreme bursts+gaps → bimodal,
low-sample-count → reduced confidence.

Step 8 closes the six-primitive calibration floor for Phase A.
Step 9 (calibration grid lockdown) is the gate that pins it.
2026-05-03 07:56:49 -04:00
2f8c107e70 feat(profiler/behave_shell): emit cognitive.feedback_loop_engagement
BEHAVE-EXTRACTOR.md Phase A Step 7. The orthogonal axis — does the
operator's pause-after-command correlate with bytes of output they
just saw? Splits HUMAN/CLAUDE-CL (closed_loop) from LW-sim/CLAUDE-FF
(fire_and_forget); cuts ACROSS the LLM/human axis.

* _features/cognitive.py:feedback_loop_engagement(ctx) emits one
  Observation in {closed_loop, fire_and_forget, unknown}.
* Pearson correlation between ctx.output_per_cmd[i] and
  ctx.inter_cmd_iats[i] (paired by construction in Step 4); via
  statistics.correlation with constant-series fallback to "unknown".
* r > FEEDBACK_CORRELATION_MIN (0.30) → closed_loop; otherwise
  (zero, negative, or undefined) → fire_and_forget.
* First primitive that depends on output events: zero output events
  in the shard or fewer than FEEDBACK_MIN_PAIRS (5) pairs → emit
  "unknown" at confidence 1.0 (the absence-of-data is itself a
  high-confidence answer). Zero-command session skips entirely.

Tests: no-output → unknown, few-pairs → unknown, strong positive r
→ closed_loop, constant pace → fire_and_forget/unknown,
negative r → fire_and_forget.
2026-05-03 07:55:38 -04:00
3fc6ea5f75 feat(profiler/behave_shell): emit cognitive.command_branch_diversity
BEHAVE-EXTRACTOR.md Phase A Step 6. Content-based playbook-vs-
adaptive split. Splits CLAUDE-FF (linear_playbook, ~10 distinct
tools) from CLAUDE-CL (adaptive_branching, 5-6 tools with curl
re-invoked) per the 2026-05-02 empirical anchor.

* _features/cognitive.py:command_branch_diversity(ctx) emits one
  Observation in {linear_playbook, adaptive_branching, unknown}.
* unique_first_token_hashes / total_commands ratio. ≥ 0.80 →
  linear_playbook, otherwise adaptive_branching (the doc instructs
  bias-to-adaptive in the middle band — that's the discriminative
  signal we actually want).
* < 5 commands → "unknown" at confidence 1.0 (the absence of data
  is itself a high-confidence answer per the registry's allowed
  vocabulary). Zero-command session skips emission entirely.

Tests cover unique-tokens → linear, repeated-tokens → adaptive,
middle band → adaptive (bias), under-floor → unknown @ 1.0, plus
PII regression: raw tokens never appear in the serialised
observation.
2026-05-03 07:54:13 -04:00
e52a0e0381 feat(profiler/behave_shell): emit cognitive.inter_command_latency_class
BEHAVE-EXTRACTOR.md Phase A Step 5. Classifies the operator's
thinking pace between commands. Splits LW-sim / CLAUDE-FF /
CLAUDE-CL.

* _features/cognitive.py:inter_command_latency_class(ctx) emits one
  Observation in {instant, typing_speed, deliberate,
  llm_lightweight, llm_heavyweight, long}, computed as the median
  of ctx.inter_cmd_iats bucketed against the prototype thresholds
  (v0.2 split: lightweight 2-8s, heavyweight 8-30s).
* Sample-size honesty: < 5 commands halves confidence (0.40 vs
  0.80) per BEHAVE-EXTRACTOR.md.
* Threshold consts (INTER_CMD_*_MAX, MIN_COMMANDS_FOR_FULL_CONFIDENCE,
  plus parked Step 6/7/8 thresholds for the next three commits)
  added to _thresholds.py.

Tests cover all six buckets at empirically-anchored IATs (15s ≈
Claude Opus driving recon via tmux send-keys), plus the
single-command no-IAT and low-sample-count paths.
2026-05-03 07:52:39 -04:00
6763fceb0b feat(profiler/behave_shell): emit motor.paste_burst_rate
BEHAVE-EXTRACTOR.md Phase A Step 3. Same paste-event ratio as
motor.input_modality but coarser-bucketed: this is the *habit*
signal (does the operator reach for paste at all?), where
input_modality is the dominant-channel signal.

* _features/motor.py:paste_burst_rate(ctx) emits one Observation
  per session in {none, occasional, habitual} with confidence
  0.70 / 0.70 / 0.80.
* Thresholds: PASTE_RATE_OCCASIONAL_MIN=0.10,
  PASTE_RATE_HABITUAL_MIN=0.50.

Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL — LLM-driven sessions
paste habitually, real humans rarely paste.

Tests: pure-typed → none; 1-paste-in-10 → occasional;
paste-majority → habitual; output-only → no observation; habitual
confidence > occasional confidence.
2026-05-03 07:49:03 -04:00
879f5e731b feat(profiler/behave_shell): emit motor.input_modality
BEHAVE-EXTRACTOR.md Phase A Step 2. The first primitive — picked
first because it has the highest discriminative value (HUMAN vs
everyone) and the simplest implementation (paste-event ratio over
total inputs).

* _features/motor.py:input_modality(ctx) emits one Observation
  per session in {typed, pasted, mixed} with confidence 0.75 / 0.70.
* _features/_emit.py centralises the make_observation helper so
  every feature module gets the same Window/source/evidence_ref
  boilerplate without copy-paste.
* Thresholds inherited from the prototype's calibration history
  (MODALITY_PASTED_MIN=0.40, MODALITY_TYPED_MAX=0.05).
* Zero-input session skips emission — registry doesn't admit
  "unknown" here.

Tests: pure-typed → typed, pure-pasted → pasted, mixed → mixed,
output-only session → no observation, full envelope round-trip.
2026-05-03 07:47:38 -04:00
f8eae04e5d feat(profiler/behave_shell): scaffold extract_session entry point
BEHAVE-EXTRACTOR.md Phase A Step 0. Lays the package skeleton
(__init__/extract/_parse/_ctx/_thresholds/_features) with empty
FEATURES = (), so the worker plumbing in BEHAVE-INTEGRATION Phase 4
has a stable import path before any primitive lands.

extract_session() builds a SessionContext once and fans the
registered feature functions across it; at Step 0 that fan-out is
empty and the function yields nothing. Step 1 (asciinema parser +
paste-burst detector) and Step 2 (motor.input_modality) land next.

Smoke suite asserts the empty contract: empty stream → no
observations, single event → t_start == t_end, multi-event → events
routed into input_events / output_events by kind, evidence_ref
defaults to "session:<sid>" or honours an explicit override.
2026-05-03 07:42:09 -04:00