Files
DECNET/decnet/profiler/behave_shell/_thresholds.py
anti 0737fcfe93 feat(profiler/behave_shell): emit motor.motor_stability
BEHAVE-EXTRACTOR.md Phase B Step B.2. First principled
implementation — the prototype doesn't ship this primitive at all.

* _features/motor.py:motor_stability(ctx) emits one Observation
  in {steady, variable, tremor}. Reuses ctx.typing_bursts from B.1.
* Tremor proxy: fraction of within-burst IATs below
  TREMOR_FAST_FLOOR_S (30 ms — humans can't sustain sub-50 ms IATs).
  ≥ TREMOR_RATE_MIN (10%) sub-floor → tremor (double-press / motor
  twitch / stuck-key).
* Otherwise median burst CV decides: < CV_STEADY_MAX → steady,
  else → variable. Confidence 0.70 / 0.60 / 0.65.
* No typing bursts or fewer than 5 within-burst IATs → skip emit.
* Calibration grid widened to include motor.motor_stability; green
  across all five shards.

Tests cover all three buckets + skip paths.
2026-05-03 21:25:54 -04:00

101 lines
5.6 KiB
Python

"""Numeric thresholds for BEHAVE-SHELL primitive classification.
Each constant cites its calibration source. When the registry's
``notes:`` field disagrees with a constant here, the registry is
authoritative — fix the constant, re-run the calibration grid.
Empirical thresholds inherited from the BEHAVE prototype extractor
(``BEHAVE/prototype_extractors/shell/extract.py``); see lines 40-90 of
that file for the calibration history. Any change here must keep the
five-class grid green.
"""
from __future__ import annotations
# ── paste-burst detection (Step 1) ──────────────────────────────────────────
# A single input event with ≥ PASTE_MIN_CHARS_PER_EVENT chars is the
# paste-class proxy used by the prototype; xterm-kitty / iTerm / VS Code
# pastes arrive as one bulk write.
PASTE_MIN_CHARS_PER_EVENT: int = 4
# Consecutive paste-class events arriving within this IAT collapse into
# one PasteBurst record. 200ms is the prototype's IKI burst cap.
PASTE_BURST_MAX_IAT_S: float = 0.20
# ── motor.input_modality (Step 2) ───────────────────────────────────────────
# Paste-event ratio thresholds. ≥ 40% paste events → "pasted" (LLM-driven);
# ≤ 5% → "typed" (human at the keyboard); in between → "mixed".
# Lowered from 0.5 after the 47.6% case in sessions-2026-05-02-with-llm.jsonl
# was clearly LLM-driven but missed the 0.5 floor.
MODALITY_PASTED_MIN: float = 0.40
MODALITY_TYPED_MAX: float = 0.05
# ── motor.paste_burst_rate (Step 3) ─────────────────────────────────────────
# Same paste-event ratio re-bucketed for the "how often does the operator
# paste" axis. Coarser than input_modality on purpose: this primitive is the
# habit signal, input_modality is the dominant-channel signal.
PASTE_RATE_HABITUAL_MIN: float = 0.50
PASTE_RATE_OCCASIONAL_MIN: float = 0.10
# ── cognitive.inter_command_latency_class (Step 5) ──────────────────────────
# Bucket edges (seconds) for the median inter-command IAT. Prototype
# values; v0.2 splits the original llm_roundtrip 2-8s band into
# llm_lightweight (orchestrated agents w/ small models / terse prompts) and
# llm_heavyweight (reasoning-class agents in tool loops with text
# generation between calls). Empirical anchor: Claude Opus driving recon
# via tmux send-keys produced a median of 15.5s.
INTER_CMD_INSTANT_MAX: float = 0.30
INTER_CMD_TYPING_MAX: float = 1.50
INTER_CMD_DELIBERATE_MAX: float = 2.00
INTER_CMD_LLM_LIGHTWEIGHT_MAX: float = 8.00
INTER_CMD_LLM_HEAVYWEIGHT_MAX: float = 30.00
# Sample-size floor for inter-command IAT primitives. Below this we
# halve the confidence per BEHAVE-EXTRACTOR.md "sample-size honesty".
MIN_COMMANDS_FOR_FULL_CONFIDENCE: int = 5
# ── cognitive.command_branch_diversity (Step 6) ─────────────────────────────
# unique_first_tokens / total_commands ratio. Prototype's empirical
# split (sessions-2026-05-02-* corpus): CLAUDE-CL chasing one finding
# ≈ 0.55-0.60 (adaptive), HUMAN exploring filesystem ≈ 0.65-0.70
# (adaptive), YOU-sim / CLAUDE-FF scripted recon ≈ 0.75+ (linear).
BRANCH_DIVERSITY_LINEAR_MIN: float = 0.70 # >= → linear_playbook
# ── cognitive.feedback_loop_engagement (Step 7) ─────────────────────────────
# Pearson r threshold for "the operator's pause grew with the volume of
# preceding output". |r| > this → significant; sign carries direction.
FEEDBACK_CORRELATION_MIN: float = 0.30
# Need at least this many (output_bytes, next_pause) pairs to even
# attempt a correlation. Below this the answer is "unknown".
FEEDBACK_MIN_PAIRS: int = 5
# ── cognitive.inter_command_consistency (Step 8) ────────────────────────────
# CV (stdev / mean) of inter-command IATs. Empirical (this corpus):
# human session CV=0.94 → variable; LLM-simulated CV=0.24 → metronomic;
# anything beyond 1.5 is heuristically "bimodal" (real bimodal detection
# via Hartigan dip is filed for v0.2).
PAUSE_CV_METRONOMIC_MAX: float = 0.40
PAUSE_CV_BIMODAL_MIN: float = 1.50
# ── motor.keystroke_cadence (Step B.1) ──────────────────────────────────────
# Typing bursts split at gaps > IKI_THINK_MAX_S so think-pauses between
# commands don't inflate the within-burst CV. Mirrors the prototype's
# _split_into_bursts (BEHAVE/prototype_extractors/shell/extract.py:275-286).
IKI_THINK_MAX_S: float = 1.50
# Sub-human floor for the "machine" classification — only paired with a
# pathologically uniform CV, since real humans never produce sub-5ms IATs
# in a sustained burst.
IKI_MACHINE_MAX_S: float = 0.005
CV_MACHINE_MAX: float = 0.05
CV_STEADY_MAX: float = 0.50
CV_BURSTY_MAX: float = 1.50
# Need this many input events before we'll claim a cadence at all.
MIN_INPUTS_FOR_CADENCE: int = 5
# ── motor.motor_stability (Step B.2) ────────────────────────────────────────
# Tremor proxy: fraction of within-burst IATs below TREMOR_FAST_FLOOR_S
# (30 ms — physiologically implausible double-press floor; humans can't
# reliably produce IATs below ~50 ms in sustained typing). High rate
# of sub-floor IATs flags double-press / motor twitch / stuck-key.
TREMOR_FAST_FLOOR_S: float = 0.030
TREMOR_RATE_MIN: float = 0.10 # ≥10% sub-floor → tremor