high_agitated when any of:
* caps_run_max ≥ 5
* bang_run_max ≥ 3
* fastest typing burst median IAT < 0.06s with ≥ 30 IATs total
low_calm when slowest qualifying burst median IAT > 0.30s with ≥ 30
IATs. Else medium_engaged. Confidence hard-capped at 0.5; 0.30 below
AROUSAL_MIN_IATS.
Compare median intra-command IATs of the two temporal halves of the
session. ≥ MULTI_ACTOR_HALF_MIN_COMMANDS (4) per half required;
relative delta > MULTI_ACTOR_HANDOFF_DELTA (0.5) → handoff_detected.
team_coordinated is Tier B (cross-session); never emitted from a
single session. Confidence 0.55 with both halves ≥ 8 commands; 0.40
otherwise.
* careful — operator hits OPSEC_HISTORY_TOKENS AND tail-K commands
include _CLEANUP_TOKEN_HASHES (re-imported from temporal.py).
* learning — history hit without cleanup-tail follow-through.
* careless — no history-clearing vocabulary at all.
Confidence 0.45 (small lexicon, soft); 0.30 below
MIN_COMMANDS_FOR_FULL_CONFIDENCE.
Phase G shared infrastructure (no primitive yet emitted):
* New `_intent.py` — five precomputed first-token-hash sets (recon /
exfil / persistence / lateral / destructive) with documented
precedence, plus opsec-history and three lexeme sets (positive /
negative / obscenity) for the typed-text counter pass. Stop words
that collide with registry value vocabulary (`no`, `hell`, `ok`)
are deliberately excluded — the PII regression test catches such
collisions.
* `_typed_char_histograms()` extended with five integer counters
populated in the same single-pass walk: `obscenity_hits`,
`positive_lex_hits`, `negative_lex_hits`, `caps_run_max`,
`bang_run_max`. Longest-suffix match against bounded lexicon
(`LEXEME_MAX_LEN`); paste-class events excluded.
* `SessionContext` widened by the same five fields. Drives G.5
(valence), G.6 (arousal), G.8 (frustration_venting) without retaining
raw operator text.
* Bump twisted >= 26.4.0rc2 to clear CVE-2026-42304 (pre-existing,
caught by pre-commit pip-audit). Adjust ftp template type-ignore
code from attr-defined to misc to match the new Twisted typing.
PII discipline: same shape as F.4 — fixed-vocabulary integer counters
on ctx, never on observations.
Resolves the E.4 hold from Phase E. F.0's Command.followed_by_prompt
gives us the exit-code proxy (prompt-after-last-command) we couldn't
get in Phase E.
Logic: last command without trailing prompt → abrupt; first_token_hash
in {exit, logout, quit, logoff} → graceful; any of the last K=3
commands' first_token_hash in {history, unset, rm, shred, clear, kill}
→ cleanup; else → graceful (clean Ctrl-D / window close).
Sliding-window scan over single-char digit input events. A run of
NUMPAD_RUN_MIN (4) consecutive digit events whose pairwise IATs are
all ≤ NUMPAD_FAST_IAT_S (50ms) → detected. Otherwise → not_detected.
Skips below NUMPAD_MIN_TYPED_CHARS (50) typed chars. Confidence cap
0.50 per the registry's weak-signal flag.
ANTI authorised dropping the PII boundary for this primitive. ctx
gains typed_unigram_counts / typed_bigram_counts / typed_letter_count
populated during the existing single-pass input walk (paste-class
events excluded).
Two-axis classifier:
* layout-artefact unigrams take priority — q rate above floor with
low English saturation → azerty; z above floor with y below → qwertz
* fallback to English-bigram saturation: ≥ floor → qwerty, else other
Sample-size floor 200 typed letters; bigram histogram capped at
top-64 to bound memory. Confidence cap stays moderate (0.40-0.55) —
heuristic discriminator.
Searches ANSI-stripped output for LANG / LC_ALL / LC_CTYPE envvar
substrings emitted by env / locale / printenv. Highest-priority key
wins (LC_ALL > LANG > LC_CTYPE); POSIX value normalised to BCP-47:
en_US.UTF-8 → en-US, pt_BR.UTF-8 → pt-BR, C/POSIX → und. Free-string
registry value emitted directly.
PII discipline: only the parsed locale value enters observations;
surrounding output is read once for matching and dropped.
Scans RAW output (multiplexer escapes are themselves ANSI; never
strip first) for tmux markers (DCS passthrough, focus-reporting,
window-title with tmux marker) and screen markers (DCS, screen-OSC).
Detected → tmux/screen at 0.85; otherwise → none at 0.55. Skips
emission entirely when no commands — silence on a pure-echo or
empty session, per the smoke gates.
When both detected (nested mux), prefer tmux.
Adds PromptLine dataclass + extract_prompt_lines() helper. PromptLine
carries ts, suffix_char ($/#/%/>), raw_line (ANSI-stripped, capped),
is_root flag. Populated during the existing single-pass output-window
walk; SessionContext gains prompt_lines, Command gains
followed_by_prompt.
PII trade-off (ANTI-authorised at Phase F): PS1 text retained on ctx
so F.1 / F.3 / E.4 can read it. Capped at PROMPT_LINE_MAX_CHARS=256.
Observations still only carry derived primitive values.
D.0's regex error helpers stay alongside (NOT subsumed) — they fire
even when PS1 echo is suppressed. F.0 enriches D.0 rather than
replacing it.
Inspect the first N commands; if at least K of their first_token_hashes
match the recon-survey vocabulary (uname/id/whoami/pwd/hostname/w/who),
emit present, else absent. Hashes precomputed at module load; PII-safe.
v0.1 N=5, K=2.
Bin commands into non-overlapping windows of width
max(ESCALATION_WINDOW_MIN_S, duration_s / ESCALATION_WINDOW_TARGET).
CV of per-window counts + zero-window fraction classify bursty /
sustained / erratic. v0.1; corpus re-tune deferred.
Bucket ctx.duration_s against SESSION_DURATION_SHORT_MAX (60s) /
MEDIUM_MAX (600s) / LONG_MAX (3600s); else marathon. Direct
measurement, confidence 0.85. Skip emission only when no commands
and zero duration. New _features/temporal.py module opens Phase E.
For each errored command, check whether the next command's
first_token_hash is in {man, help, info} (precomputed at module
load). At least one match → present, else absent. The --help / -h
flag forms aren't first tokens; v0.2 will reconsider once arg-token
hashing is justified by corpus.
Compares median within-command IAT for commands following an errored
command vs commands following a successful one. Relative absolute delta
buckets to low / moderate / high. Skips when either group is empty
(no errors, or no clean baseline). v0.1; D.8 re-tunes.
Modal response across Command.errored=True commands:
* same first_token_hash on next command → rerun
* different first_token_hash → switch
* no next command → abort
Tiebreak in registry order. The fourth registry value 'modify'
requires within-command arg diffing (PII boundary); deferred to v0.2.
Distribution of inter-command IATs bucketed against IKI_THINK_MAX_S
(deep) and INTER_CMD_INSTANT_MAX (reactive); fall-through is shallow.
v0.1 thresholds; D.8 re-tunes.
Two-axis classification over the first_token_hash sequence:
repetition_rate (drilling) vs backtrack_rate (jumping among prior
tools). chaotic/targeted/methodical buckets. v0.1 thresholds; D.8
re-tunes.
Composite over three [0, 1]-clipped sub-signals (chunking variance,
error rate from D.0's Command.errored, pace variability), mean-aggregated
and bucketed against COGNITIVE_LOAD_LOW_MAX / COGNITIVE_LOAD_MEDIUM_MAX.
Components missing data drop out of the mean rather than zeroing it.
v0.1 thresholds; D.8 re-tunes once D.2-D.7 are stable. Confidence
held at 0.60 (composite over soft sub-signals) and halved below the
5-command sample-size floor.
Lifts the error-signal slice of F.0 forward as a D.0 prelude. ANSI
strip + canonical bash/sh error fingerprints classify each command's
post-execution output window; Command gains errored / output_bytes
fields. PII discipline preserved — only a bool and an int leave the
helper, the stripped output text is dropped on return.
Drives D.1 (cognitive_load error_rate term) and D.5–D.7 (error_resilience
family). Phase F.0 will subsume this with PS1 + exit-code parsing.
BEHAVE-EXTRACTOR.md Phase B Step B.3. Replaces the prototype's
two-line "0 vs >0 backspaces" placeholder with a backspace-timing
classifier that honours the registry's full vocabulary.
* SessionContext gains backspace_count, backspace_iats (IAT from
each backspace back to the preceding non-backspace input event),
and kill_line_count (^U / ^W). Built by _scan_correction_signals,
which retains only counts and timing aggregates — no character
data leaves the helper, in line with the BEHAVE PII discipline.
* _features/motor.py:error_correction(ctx) emits one Observation
in {immediate, deferred, absent, route_around}.
- 0 backspaces + ≥1 ^U/^W → route_around (rewrite, not correct)
- 0 backspaces + 0 kill-lines → absent
- backspaces with median IAT ≤ 500 ms → immediate
- slower → deferred
Confidence 0.65 / 0.65 / 0.55 / 0.55.
* < 3 inputs → skip emit.
* Calibration grid widened to include motor.error_correction;
green across all five shards.
Tests cover all four buckets, the < 3 inputs skip, and the PII
regression (raw command body never appears in the serialised
observation).
BEHAVE-EXTRACTOR.md Phase B Step B.2. First principled
implementation — the prototype doesn't ship this primitive at all.
* _features/motor.py:motor_stability(ctx) emits one Observation
in {steady, variable, tremor}. Reuses ctx.typing_bursts from B.1.
* Tremor proxy: fraction of within-burst IATs below
TREMOR_FAST_FLOOR_S (30 ms — humans can't sustain sub-50 ms IATs).
≥ TREMOR_RATE_MIN (10%) sub-floor → tremor (double-press / motor
twitch / stuck-key).
* Otherwise median burst CV decides: < CV_STEADY_MAX → steady,
else → variable. Confidence 0.70 / 0.60 / 0.65.
* No typing bursts or fewer than 5 within-burst IATs → skip emit.
* Calibration grid widened to include motor.motor_stability; green
across all five shards.
Tests cover all three buckets + skip paths.
BEHAVE-EXTRACTOR.md Phase B Step B.1.
* SessionContext gains typing_bursts: tuple[tuple[float, ...], ...]
built by _split_typing_bursts(iats) — splits at gaps > IKI_THINK_MAX_S
(1.5s) and drops bursts of fewer than 3 IATs. Mirrors prototype's
_split_into_bursts at BEHAVE/prototype_extractors/shell/extract.py:275.
* _features/motor.py:keystroke_cadence(ctx) emits one Observation
in {steady, bursty, hunt_and_peck, machine}. Median CV across
typing bursts; mean IKI < IKI_MACHINE_MAX_S paired with CV <
CV_MACHINE_MAX → machine. Confidence 0.85/0.70/0.65/0.60 per the
prototype's calibration history.
* < MIN_INPUTS_FOR_CADENCE inputs or zero typing bursts → skip
emission. v0.1 emits only the burst-CV variant; the prototype's
NAIVE session-CV variant is parked for v0.2.
* Calibration grid widened (PHASE_A_PRIMITIVES → PHASE_AB_PRIMITIVES)
to include motor.keystroke_cadence. Grid green across all five
shards.
Tests: too-few-inputs → no emit, all-think-pauses → no burst → no
emit, uniform IATs → steady, sub-5ms → machine, mixed-pace → bursty,
extreme bimodal → hunt_and_peck.
BEHAVE-EXTRACTOR.md Phase A Step 9 — the gate. Runs the pure
engine against each of the five 2026-05-02 calibration shards and
pins the contract that all subsequent Phase B-G PRs must keep
green: every Phase A primitive (motor.input_modality,
motor.paste_burst_rate, cognitive.inter_command_latency_class,
cognitive.command_branch_diversity, cognitive.feedback_loop_engagement,
cognitive.inter_command_consistency) fires at least once per shard.
* tests/profiler/behave_shell/test_calibration_grid.py
parametrized over (shard_file, class_label) for HUMAN / YOU-sim /
LW-sim / CLAUDE-FF / CLAUDE-CL. Skips entirely when
BEHAVE_CALIBRATION_DIR is unset (CI provides the path; local dev
doesn't have to).
* Plus a discrimination-smoke check: at least one primitive
produces different majority values across present classes —
catches the "constant-output regression" failure mode where the
engine quietly degenerates to a stub.
Calibration tweak: BRANCH_DIVERSITY_LINEAR_MIN dropped from 0.80 to
0.70 to align with the prototype's empirical anchors (CLAUDE-CL ≈
0.55-0.60 adaptive; YOU-sim / CLAUDE-FF scripted recon ≈ 0.75+
linear). Test for the middle band re-pinned at the new boundary.
Per-class value pinning (e.g. HUMAN must emit
inter_command_consistency=bimodal) is intentionally NOT a hard gate
yet — v0.1 thresholds put real human sessions in "variable", and
true bimodal detection (Hartigan dip / two-peak) is registry-flagged
for v0.2. Tighter pinning lands as the corpus grows.
BEHAVE-EXTRACTOR.md Phase A Step 7. The orthogonal axis — does the
operator's pause-after-command correlate with bytes of output they
just saw? Splits HUMAN/CLAUDE-CL (closed_loop) from LW-sim/CLAUDE-FF
(fire_and_forget); cuts ACROSS the LLM/human axis.
* _features/cognitive.py:feedback_loop_engagement(ctx) emits one
Observation in {closed_loop, fire_and_forget, unknown}.
* Pearson correlation between ctx.output_per_cmd[i] and
ctx.inter_cmd_iats[i] (paired by construction in Step 4); via
statistics.correlation with constant-series fallback to "unknown".
* r > FEEDBACK_CORRELATION_MIN (0.30) → closed_loop; otherwise
(zero, negative, or undefined) → fire_and_forget.
* First primitive that depends on output events: zero output events
in the shard or fewer than FEEDBACK_MIN_PAIRS (5) pairs → emit
"unknown" at confidence 1.0 (the absence-of-data is itself a
high-confidence answer). Zero-command session skips entirely.
Tests: no-output → unknown, few-pairs → unknown, strong positive r
→ closed_loop, constant pace → fire_and_forget/unknown,
negative r → fire_and_forget.
BEHAVE-EXTRACTOR.md Phase A Step 6. Content-based playbook-vs-
adaptive split. Splits CLAUDE-FF (linear_playbook, ~10 distinct
tools) from CLAUDE-CL (adaptive_branching, 5-6 tools with curl
re-invoked) per the 2026-05-02 empirical anchor.
* _features/cognitive.py:command_branch_diversity(ctx) emits one
Observation in {linear_playbook, adaptive_branching, unknown}.
* unique_first_token_hashes / total_commands ratio. ≥ 0.80 →
linear_playbook, otherwise adaptive_branching (the doc instructs
bias-to-adaptive in the middle band — that's the discriminative
signal we actually want).
* < 5 commands → "unknown" at confidence 1.0 (the absence of data
is itself a high-confidence answer per the registry's allowed
vocabulary). Zero-command session skips emission entirely.
Tests cover unique-tokens → linear, repeated-tokens → adaptive,
middle band → adaptive (bias), under-floor → unknown @ 1.0, plus
PII regression: raw tokens never appear in the serialised
observation.
BEHAVE-EXTRACTOR.md Phase A Step 5. Classifies the operator's
thinking pace between commands. Splits LW-sim / CLAUDE-FF /
CLAUDE-CL.
* _features/cognitive.py:inter_command_latency_class(ctx) emits one
Observation in {instant, typing_speed, deliberate,
llm_lightweight, llm_heavyweight, long}, computed as the median
of ctx.inter_cmd_iats bucketed against the prototype thresholds
(v0.2 split: lightweight 2-8s, heavyweight 8-30s).
* Sample-size honesty: < 5 commands halves confidence (0.40 vs
0.80) per BEHAVE-EXTRACTOR.md.
* Threshold consts (INTER_CMD_*_MAX, MIN_COMMANDS_FOR_FULL_CONFIDENCE,
plus parked Step 6/7/8 thresholds for the next three commits)
added to _thresholds.py.
Tests cover all six buckets at empirically-anchored IATs (15s ≈
Claude Opus driving recon via tmux send-keys), plus the
single-command no-IAT and low-sample-count paths.
BEHAVE-EXTRACTOR.md Phase A Step 4. Pure refactor inside _ctx.py —
no new feature emits. Lays the shared utility for the three
cognitive primitives next in line (Steps 5-7).
* Command dataclass (frozen): start_ts, end_ts, first_token_hash.
PII-safe by construction — only the first whitespace-delimited
token of the command is retained, and only as a sha256 hash
(decnet/profiler/behave_shell/_parse.py:hash_token).
* _segment_commands walks input events char-by-char, splits on
\r / \n, hashes the first token, drops the rest.
* SessionContext gains commands, inter_cmd_iats, output_per_cmd.
output_per_cmd[i] counts bytes between commands[i].end_ts and
commands[i+1].start_ts — the natural pairing for Step 7
(feedback_loop_engagement).
Tests: empty / unterminated streams, single command (CR + LF
terminators), paste-with-newline, multi-command IAT pairing,
output-byte counting between boundaries, blank-line skip,
first-token-only PII discipline.
BEHAVE-EXTRACTOR.md Phase A Step 3. Same paste-event ratio as
motor.input_modality but coarser-bucketed: this is the *habit*
signal (does the operator reach for paste at all?), where
input_modality is the dominant-channel signal.
* _features/motor.py:paste_burst_rate(ctx) emits one Observation
per session in {none, occasional, habitual} with confidence
0.70 / 0.70 / 0.80.
* Thresholds: PASTE_RATE_OCCASIONAL_MIN=0.10,
PASTE_RATE_HABITUAL_MIN=0.50.
Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL — LLM-driven sessions
paste habitually, real humans rarely paste.
Tests: pure-typed → none; 1-paste-in-10 → occasional;
paste-majority → habitual; output-only → no observation; habitual
confidence > occasional confidence.
BEHAVE-EXTRACTOR.md Phase A Step 2. The first primitive — picked
first because it has the highest discriminative value (HUMAN vs
everyone) and the simplest implementation (paste-event ratio over
total inputs).
* _features/motor.py:input_modality(ctx) emits one Observation
per session in {typed, pasted, mixed} with confidence 0.75 / 0.70.
* _features/_emit.py centralises the make_observation helper so
every feature module gets the same Window/source/evidence_ref
boilerplate without copy-paste.
* Thresholds inherited from the prototype's calibration history
(MODALITY_PASTED_MIN=0.40, MODALITY_TYPED_MAX=0.05).
* Zero-input session skips emission — registry doesn't admit
"unknown" here.
Tests: pure-typed → typed, pure-pasted → pasted, mixed → mixed,
output-only session → no observation, full envelope round-trip.
BEHAVE-EXTRACTOR.md Phase A Step 1. Lays the shared primitives that
Steps 2-3 (motor.input_modality, motor.paste_burst_rate) will
consume:
* parse_shard_line / parse_shard turn a shard JSONL line/file into
AsciinemaEvents, skipping headers and malformed records.
* PasteBurst dataclass + _detect_paste_bursts group consecutive
paste-class input events (len(d) >= 4 chars per the prototype's
empirical floor) into contiguous bursts, splitting on IAT gaps
larger than PASTE_BURST_MAX_IAT_S (200ms).
* SessionContext now carries iats and paste_bursts derivations.
* Threshold constants harvested from
BEHAVE/prototype_extractors/shell/extract.py — calibrated against
the five 2026-05-02 shards.
Tests cover pure-typed, pure-pasted, mixed streams; close vs far
paste events; typed events breaking a burst; PasteBurst immutability;
and the JSON parser's junk handling.
BEHAVE-EXTRACTOR.md Phase A Step 0. Lays the package skeleton
(__init__/extract/_parse/_ctx/_thresholds/_features) with empty
FEATURES = (), so the worker plumbing in BEHAVE-INTEGRATION Phase 4
has a stable import path before any primitive lands.
extract_session() builds a SessionContext once and fans the
registered feature functions across it; at Step 0 that fan-out is
empty and the function yields nothing. Step 1 (asciinema parser +
paste-burst detector) and Step 2 (motor.input_modality) land next.
Smoke suite asserts the empty contract: empty stream → no
observations, single event → t_start == t_end, multi-event → events
routed into input_events / output_events by kind, evidence_ref
defaults to "session:<sid>" or honours an explicit override.