docs: reframe BEHAVE-SHELL as a spec, not a DECNET component — add stylometry/lexicometry scope, BEHAVE-TEXT/EYENET cross-reference

2026-05-10 04:31:36 -04:00
parent 58915d8115
commit 5a39211645
1 changed files with 254 additions and 377 deletions
--- a/BEHAVE-SHELL.md
+++ b/BEHAVE-SHELL.md
@@ -1,95 +1,106 @@
 # BEHAVE-SHELL

-BEHAVE-SHELL is DECNET's behavioural biometrics engine for interactive shell
-sessions.  It transforms raw PTY recordings into 37 attribution primitives
-that fingerprint *how* an operator works — their motor patterns, cognitive
-style, OPSEC habits, and emotional state — independently of what IP address
-or tooling they use.
+BEHAVE-SHELL is a **behavioural biometrics specification** for interactive
+shell sessions.  It defines a set of attribution primitives — observable,
+computable signals — that characterise *how* an operator works at a terminal,
+independently of what IP address, credential, or tooling they use.

-The primitives feed the [Identity-Resolution](Identity-Resolution) attribution
-state machine, which accumulates evidence across sessions to answer: *is this
-the same hands?*
+The spec was born out of DECNET's need to correlate attackers across sessions
+and IP changes, but it is not DECNET-specific.  Any system that records PTY
+sessions can implement BEHAVE-SHELL extraction and feed the resulting
+primitives into an attribution engine.  DECNET is the reference implementation.
+
+A sibling specification, **BEHAVE-TEXT**, defines equivalent primitives for
+written text — stylometry, lexicometry, and discourse structure — and is
+implemented by [EYENET](https://github.com/xmartlab/eyenet).
+
+---
+
+## Scope
+
+BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus.  The
+current specification covers three broad domains:
+
+| Domain | What it captures |
+|---|---|
+| **Motor biometrics** | Keystroke timing, error correction, paste vs. type habits, shell mastery signals |
+| **Cognitive / behavioural** | Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure |
+| **Stylometry / lexicometry** | Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions |
+
+The emotional valence cluster (`valence`, `arousal`, `stress_response`,
+`frustration_venting`) sits at the boundary of motor and stylometric signal —
+it measures both typing speed changes and lexical content after stress events.

 ---

 ## Design principles

- **Pure extraction library.** `extract_session()` takes an iterable of
-  asciinema events and yields `Observation` envelopes.  No I/O, no DB access,
-  no bus calls.  The worker owns all side effects.
- **PII by design.** Command text is never stored in plain form — only the
+- **Extraction is pure.** The spec defines a function
+  `extract_session(events) → Observations` that takes an iterable of timestamped
+  PTY events and yields structured observations.  No I/O.  No database.
+  No side effects.  Implementations are free to run this in any context.
+
+- **PII by design.** Command text is never stored in plain form.  Only the
  SHA-256 of the first token is retained.  Output is reduced to a byte count
  and an error verdict.  Prompt lines are ANSI-stripped and capped at 256
-  characters.
- **Idempotent persistence.** `UniqueConstraint(evidence_ref, primitive)`
-  on the observations table means replaying a shard never duplicates rows.
- **Confidence capping.** Emotional-valence features carry a hard confidence
-  cap of 0.50 — they contribute, but never dominate an attribution decision.
+  characters.  Layout fingerprinting uses only aggregate unigram and bigram
+  counts, never the typed text itself.
+
+- **Confidence is explicit.** Every observation carries a confidence value
+  [0.0–1.0].  Features that are inherently noisier have hard confidence caps
+  (emotional valence: 0.50).  Attribution engines must propagate confidence
+  rather than treating all observations as equal.
+
+- **Skip conditions over imputation.** A feature that cannot be computed on a
+  given session (e.g. `error_resilience` features when no errors occurred)
+  yields no observation rather than a default value.  Attribution engines
+  treat absence of an observation differently from an `unknown` state.
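The three principles above can be illustrated with a minimal Python sketch. The `Observation` dataclass and the helpers `observe`, `first_token_hash` and `fallback_to_man` are hypothetical names chosen for illustration, not DECNET's code. The sketch shows first-token hashing, a hard confidence cap, and a feature that yields nothing at all when its skip condition holds.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence, Tuple

EMOTIONAL_VALENCE_CONFIDENCE_CAP = 0.50


@dataclass(frozen=True)
class Observation:
    # Hypothetical envelope: primitive name, categorical value, confidence in [0, 1].
    primitive: str
    value: str
    confidence: float


def first_token_hash(command: str) -> Optional[str]:
    """Return the SHA-256 hex digest of the first token only."""
    tokens = command.split()
    if not tokens:
        return None
    return hashlib.sha256(tokens[0].encode("utf-8", "surrogateescape")).hexdigest()


def observe(primitive: str, value: str, confidence: float, cap: float = 1.0) -> Observation:
    # Clamp into [0, cap] so noisy feature families never exceed their ceiling.
    return Observation(primitive, value, max(0.0, min(confidence, cap)))


# Documentation commands are compared by hash, so plain command text is never needed.
DOC_HASHES = {first_token_hash(t) for t in ("man", "help", "info")}


def fallback_to_man(commands: Sequence[Tuple[Optional[str], bool]]) -> Iterator[Observation]:
    """commands: (first_token_hash, errored) pairs in session order."""
    if not any(errored for _, errored in commands):
        return  # skip condition: emit nothing rather than a default value
    followers = [nxt for (_, errored), (nxt, _) in zip(commands, commands[1:]) if errored]
    value = "present" if any(h in DOC_HASHES for h in followers) else "absent"
    # 0.6 is an illustrative confidence, not a calibrated value.
    yield observe("error_resilience.fallback_to_man", value, 0.6)
```

A session with no errored command produces an empty iterator, which an attribution engine can distinguish from an explicit `absent`.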

 ---

-## Data flow
+## Input format

-```
-PTY session
-    │
-    ▼
-sessrec.c — writes JSONL shard per session
-    │  {"sid": id, "t": ts, "ch": "i"|"o", "d": data}
-    │  Non-UTF-8 bytes handled via surrogateescape
-    ▼
-attacker.session.ended bus event
-    │
-    ▼
-_handler.handle_session_ended()
-    │  Reads shard from disk → parse_shard_line() → AsciinemaEvent tuples
-    ▼
-build_session_context()   (_ctx.py, ~573 lines)
-    │  Seven derivation steps (see below)
-    ▼
-extract_session()   (extract.py)
-    │  Fan-out across 37 registered feature functions (FEATURES registry)
-    │  Each yields 0..N Observation envelopes
-    ▼
-Upsert ObservationRow  →  publish attacker.observation.*
-    │
-    ▼
-attribution_worker  (attribution_worker.py)
-    │  Consumes attacker.observation.> bus events
-    │  Runs aggregate() per (identity_uuid, primitive)
-    ▼
-AttributionStateRow   state ∈ {unknown, stable, drifting, conflicted, multi_actor}
+BEHAVE-SHELL operates on **asciinema-compatible event streams**: sequences of
+`(t: float, ch: "i"|"o", d: str)` tuples representing timestamped input and
+output chunks from a PTY session.  `"i"` is operator input; `"o"` is terminal
+output.  Non-UTF-8 bytes are decoded with Python's `surrogateescape` error
+handler (PEP 383), so they round-trip losslessly to the original bytes.
+
+The DECNET implementation records these as JSONL shards via `sessrec.c`:
+
+```json
+{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"}
+{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."}
 ```
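A shard line of this shape can be turned into an `(t, ch, d)` event with the standard `json` module. The sketch below borrows the name `parse_shard_line` from DECNET's deleted data-flow description, but the body and the `Event`/`read_shard` names are assumptions. It reads the file with `errors="surrogateescape"` so undecodable bytes survive as lone surrogates.

```python
import json
from typing import List, NamedTuple


class Event(NamedTuple):
    t: float  # seconds since session start
    ch: str   # "i" = operator input, "o" = terminal output
    d: str    # data chunk; undecodable bytes appear as lone surrogates


def parse_shard_line(line: str) -> Event:
    """Decode one JSONL shard record into an (t, ch, d) event; `sid` is ignored."""
    rec = json.loads(line)
    if rec["ch"] not in ("i", "o"):
        raise ValueError(f"unknown channel: {rec['ch']!r}")
    return Event(float(rec["t"]), rec["ch"], rec["d"])


def read_shard(path: str) -> List[Event]:
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN instead of
    # raising, so .encode("utf-8", "surrogateescape") restores the exact bytes.
    with open(path, encoding="utf-8", errors="surrogateescape") as fh:
        return [parse_shard_line(line) for line in fh if line.strip()]
```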

 ---

 ## Session context derivation

-`build_session_context()` performs a single-pass walk over the raw event
-stream and produces a `SessionContext` that all 37 feature functions read.
-The seven derivation steps, in order:
+Before feature extraction, a single-pass walk over the event stream builds a
+`SessionContext` — a set of derived signals that all feature functions share.
+The derivation steps, in order:

-| Step | What it computes |
+| Step | Output |
 |---|---|
-| **Paste-burst detection** | Groups consecutive paste-class events (≥4 chars within 200 ms) into `paste_bursts` |
-| **Typing-burst segmentation** | Splits the keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]` (dropped if < 3 IATs) |
-| **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line sequences (`0x15`, `0x17`); records IATs between each backspace and the preceding keystroke |
-| **Per-command intra-typing IATs** | For each command, extracts keystroke inter-arrival times from that command's span only |
-| **Command segmentation** | Splits on `\r`/`\n`; per command records `first_token_hash` (SHA-256), tab count, readline shortcut count, and pipe count |
-| **Inter-command IAT gaps** | Time between consecutive commands |
-| **Error detection** | Scans output between commands for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` |
-| **PS1 prompt detection** | Regex for `$`, `#`, `%`, `>` suffix after ANSI stripping; caps at 256 chars |
-| **Keyboard layout fingerprinting** | Builds unigram and bigram histograms from typed letters |
+| **Paste-burst detection** | Treats input events carrying ≥4 characters as paste-class and groups consecutive paste-class events separated by ≤200 ms into `paste_bursts` |
+| **Typing-burst segmentation** | Splits the keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]`; drops bursts with fewer than 3 IKIs |
+| **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line (`0x15`, `0x17`); records IKI between each backspace and the preceding keystroke |
+| **Per-command intra-typing IKIs** | For each command, IKIs from that command's span only |
+| **Command segmentation** | Splits on `\r`/`\n`; per command: `first_token_hash` (SHA-256), tab count, readline shortcut count, pipe count |
+| **Inter-command gaps** | Time between consecutive commands |
+| **Error detection** | Scans each command's output for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` |
+| **PS1 prompt detection** | Regex for `$`, `#`, `%`, `>` suffix; ANSI-stripped, capped at 256 chars |
+| **Keyboard layout fingerprinting** | Unigram and bigram histograms from typed letters |
 | **Lexical counters** | Obscenity hits, positive/negative sentiment tokens, max caps run, max consecutive `!` run |
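The first two derivation steps can be sketched in Python as follows. This is a minimal illustration, assuming input events arrive in time order as hypothetical `InputEvent(t, d)` pairs, that any event shorter than the paste threshold is a single keystroke, and that a typed event between two paste events breaks a burst; the table does not pin these details down.

```python
from typing import List, NamedTuple

PASTE_MIN_CHARS_PER_EVENT = 4   # reference threshold values
PASTE_BURST_MAX_IAT_S = 0.20
IKI_THINK_MAX_S = 2.0
MIN_IKIS_PER_BURST = 3


class InputEvent(NamedTuple):
    t: float  # timestamp in seconds
    d: str    # input chunk


def paste_bursts(events: List[InputEvent]) -> List[List[InputEvent]]:
    """Group consecutive paste-class events whose gaps are <= 200 ms."""
    bursts: List[List[InputEvent]] = []
    current: List[InputEvent] = []
    for ev in events:
        if len(ev.d) < PASTE_MIN_CHARS_PER_EVENT:
            # A typed event means the paste events are no longer consecutive.
            if current:
                bursts.append(current)
            current = []
        elif current and ev.t - current[-1].t <= PASTE_BURST_MAX_IAT_S:
            current.append(ev)
        else:
            if current:
                bursts.append(current)
            current = [ev]
    if current:
        bursts.append(current)
    return bursts


def typing_bursts(events: List[InputEvent]) -> List[List[float]]:
    """Split typed-keystroke IKIs at think-pauses; drop bursts with < 3 IKIs."""
    typed = [ev.t for ev in events if len(ev.d) < PASTE_MIN_CHARS_PER_EVENT]
    bursts: List[List[float]] = []
    current: List[float] = []
    for prev, cur in zip(typed, typed[1:]):
        iki = cur - prev
        if iki > IKI_THINK_MAX_S:
            if len(current) >= MIN_IKIS_PER_BURST:
                bursts.append(current)
            current = []
        else:
            current.append(iki)
    if len(current) >= MIN_IKIS_PER_BURST:
        bursts.append(current)
    return bursts
```

In this sketch, a short `cd` typed after a think-pause contributes only one IKI and is therefore dropped, while two pasted lines 150 ms apart form a single paste burst.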

-### Key data structures
+### Key structures

 ```
 SessionContext
  sid: str
  t_start, t_end, duration_s: float
-  input_events, output_events: tuple[AsciinemaEvent]
-  iats: tuple[float]                      # inter-keystroke intervals
+  input_events, output_events: tuple[Event]
+  iats: tuple[float]                       # inter-keystroke intervals
  paste_bursts: tuple[PasteBurst]
  typing_bursts: tuple[tuple[float]]
  backspace_count, kill_line_count: int
@@ -104,7 +115,7 @@ SessionContext

 Command
  start_ts, end_ts: float
-  first_token_hash: str    # SHA-256 of first token only
+  first_token_hash: str    # SHA-256, first token only
  tab_count, shortcut_count, pipe_count: int
  errored: bool
  output_bytes: int
@@ -121,507 +132,373 @@ PromptLine

 ## The 37 primitives

-### Motor (9) — muscle memory and physical interaction style
+### Motor (9)

-These primitives capture *how* an operator's fingers interact with the
-keyboard — patterns that persist across sessions, accounts, and even
-operating systems.
+Motor primitives capture muscle memory and physical interaction patterns.
+They are among the most stable signals across sessions and across different
+machines used by the same operator.

-#### 1. `input_modality`
+#### `input_modality`
 Values: `typed` | `pasted` | `mixed`

-Ratio of paste events to total input events.  ≥40 % pasted and ≤5 %
-typed → `pasted`; ≤5 % pasted → `typed`; otherwise `mixed`.
+Ratio of paste events to total input events.  ≥40 % pasted and ≤5 % typed
+→ `pasted`.  ≤5 % pasted → `typed`.  Otherwise `mixed`.

 A script kiddie running pre-written one-liners pastes habitually.  A
 seasoned operator types most commands from memory.

-#### 2. `paste_burst_rate`
+#### `paste_burst_rate`
 Values: `none` | `occasional` | `habitual`

-Coarser bucketing of the paste ratio.  ≥50 % → `habitual`,
-≥10 % → `occasional`.
+Coarser paste-ratio bucketing.  ≥50 % → `habitual`, ≥10 % → `occasional`,
+otherwise `none`.

-#### 3. `keystroke_cadence`
+#### `keystroke_cadence`
 Values: `steady` | `bursty` | `hunt_and_peck` | `machine`

 Median coefficient of variation (CV) of within-burst inter-keystroke
-intervals (IKIs).
+intervals (IKIs):

 | CV | Mean IKI | Label |
 |---|---|---|
-| < 0.30 | < 30 ms | `machine` |
-| < 0.45 | any | `steady` |
-| < 0.70 | any | `bursty` |
+| < 0.30 | < 30 ms | `machine` — inhumanly uniform |
+| < 0.45 | any | `steady` — trained touch typist |
+| < 0.70 | any | `bursty` — thinks between phrases |
 | ≥ 0.70 | any | `hunt_and_peck` |
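The cadence table can be read as an ordered rule list, first match wins. A minimal sketch, assuming population standard deviation for the CV and that "Mean IKI" is the mean over all within-burst IKIs; both choices are assumptions, not part of the table.

```python
import statistics
from typing import List, Optional


def cv(xs: List[float]) -> float:
    """Coefficient of variation using the population standard deviation."""
    m = statistics.fmean(xs)
    return statistics.pstdev(xs) / m if m > 0 else 0.0


def keystroke_cadence(bursts: List[List[float]]) -> Optional[str]:
    if not bursts:
        return None  # skip: no typing bursts to measure
    med_cv = statistics.median(cv(b) for b in bursts)
    mean_iki = statistics.fmean(iki for b in bursts for iki in b)
    if med_cv < 0.30 and mean_iki < 0.030:
        return "machine"
    if med_cv < 0.45:
        return "steady"
    if med_cv < 0.70:
        return "bursty"
    return "hunt_and_peck"
```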

-`machine` catches automated input that passes as human visually but has
-inhumanly uniform inter-key timing.
-
-#### 4. `motor_stability`
+#### `motor_stability`
 Values: `steady` | `variable` | `tremor`

-Fraction of IKIs below the tremor floor (30 ms).  ≥20 % → `tremor`
-(physiological or tool-simulated).  Otherwise the median CV classifies
-`steady` vs `variable`.
+Fraction of IKIs below 30 ms.  ≥20 % → `tremor` (physiological or
+tool-simulated).  Otherwise the median within-burst CV classifies `steady`
+vs `variable`.

-#### 5. `error_correction`
+#### `error_correction`
 Values: `immediate` | `deferred` | `absent` | `route_around`

 Timing of backspace relative to the preceding keystroke.  Median ≤500 ms
-→ `immediate` (noticed fast, muscle-memory correction).  Median > 500 ms
-→ `deferred` (reads output then corrects).  Zero backspaces but kill-line
-present → `route_around` (ctrl-u / ctrl-w).  No corrections at all →
-`absent`.
+→ `immediate`.  Median > 500 ms → `deferred`.  No backspaces but kill-line
+present → `route_around` (ctrl-u / ctrl-w).  Nothing → `absent`.
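The four-way rule translates directly into code. A minimal sketch, where `ERROR_CORRECTION_IMMEDIATE_MAX_S` is a hypothetical constant name for the 500 ms boundary and `backspace_iats` holds the recorded backspace-to-preceding-keystroke intervals:

```python
import statistics
from typing import Sequence

ERROR_CORRECTION_IMMEDIATE_MAX_S = 0.5  # hypothetical name; 500 ms per the spec


def error_correction(backspace_iats: Sequence[float], kill_line_count: int) -> str:
    if backspace_iats:
        median = statistics.median(backspace_iats)
        return "immediate" if median <= ERROR_CORRECTION_IMMEDIATE_MAX_S else "deferred"
    if kill_line_count > 0:
        return "route_around"  # ctrl-u / ctrl-w instead of backspace
    return "absent"
```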

-#### 6. `command_chunking`
+#### `command_chunking`
 Values: `fluent` | `fragmented` | `single_command`

 Median CV of per-command intra-typing IKIs.  < 0.40 → `fluent` (commands
-typed as rehearsed phrases).  Otherwise `fragmented`.  Only one command
-in session → `single_command`.
+typed as rehearsed phrases).  ≥ 0.40 → `fragmented`.  A session with only
+one command → `single_command`.

-#### 7. `shell_mastery.tab_completion`
+#### `shell_mastery.tab_completion`
 Values: `none` | `occasional` | `habitual`

-Fraction of commands containing at least one `0x09` (tab) keystroke.
-0 → `none`, < 50 % → `occasional`, ≥ 50 % → `habitual`.
+Fraction of commands containing ≥1 tab keystroke.  0 → `none`,
+< 50 % → `occasional`, ≥50 % → `habitual`.

-Operators who tab-complete heavily know the filesystem; those who never do
-either memorise paths or are running a prepared script.
-
-#### 8. `shell_mastery.shortcut_usage`
+#### `shell_mastery.shortcut_usage`
 Values: `none` | `moderate` | `heavy`

-Readline control-byte count (ctrl-a, ctrl-e, ctrl-r, etc.) per command.
-< 0.05 → `none`, < 0.15 → `moderate`, ≥ 0.15 → `heavy`.
+Mean readline control-byte count per command (ctrl-a, ctrl-e, ctrl-r, etc.).
+< 0.05 → `none`, < 0.15 → `moderate`, ≥0.15 → `heavy`.

-#### 9. `shell_mastery.pipe_chaining_depth`
+#### `shell_mastery.pipe_chaining_depth`
 Values: `shallow` | `moderate` | `deep`

 Median pipe count per command.  ≤1 → `shallow`, 2 → `moderate`, ≥3 → `deep`.
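The three `shell_mastery.*` primitives read the per-command counters from command segmentation. A minimal sketch, assuming a trimmed-down `Command` dataclass, that "per command" means the session mean, and that a non-integer median between 1 and 3 (e.g. 2.5) counts as `moderate`:

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Command:
    tab_count: int
    shortcut_count: int
    pipe_count: int


def tab_completion(cmds: Sequence[Command]) -> str:
    frac = sum(c.tab_count > 0 for c in cmds) / len(cmds)
    if frac == 0:
        return "none"
    return "occasional" if frac < 0.5 else "habitual"


def shortcut_usage(cmds: Sequence[Command]) -> str:
    per_cmd = sum(c.shortcut_count for c in cmds) / len(cmds)
    if per_cmd < 0.05:
        return "none"
    return "moderate" if per_cmd < 0.15 else "heavy"


def pipe_chaining_depth(cmds: Sequence[Command]) -> str:
    med = statistics.median(c.pipe_count for c in cmds)
    if med <= 1:
        return "shallow"
    return "moderate" if med < 3 else "deep"
```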

 ---

-### Cognitive (11) — decision-making and planning style
+### Cognitive (11)

-These primitives capture *how* an operator thinks — their command repertoire,
-response to failure, and how much they read output before acting.
+Cognitive primitives capture decision-making style, planning depth, and how
+the operator processes feedback.

-#### 10. `inter_command_latency_class`
+#### `inter_command_latency_class`
 Values: `instant` | `typing_speed` | `deliberate` | `llm_lightweight` | `llm_heavyweight` | `long`

 Median inter-command pause bucketed against calibrated thresholds:

-| Threshold | Label | What it suggests |
+| Threshold | Label | Interpretation |
 |---|---|---|
 | ≤ 0.30 s | `instant` | Scripted or replay |
 | ≤ 1.50 s | `typing_speed` | Commands prepared, typing only |
-| ≤ 2.00 s | `deliberate` | Reads output before next command |
-| ≤ 8.00 s | `llm_lightweight` | May be consulting a fast LLM / notes |
+| ≤ 2.00 s | `deliberate` | Reads output before acting |
+| ≤ 8.00 s | `llm_lightweight` | Consulting a fast LLM or notes |
 | ≤ 30.00 s | `llm_heavyweight` | Consulting a slow LLM or manual reference |
-| > 30.00 s | `long` | Long pauses — possibly interrupted or cautious |
+| > 30.00 s | `long` | Interrupted or cautious |

-`llm_lightweight` and `llm_heavyweight` were calibrated against Claude
-Free (fast) and Claude (slow) assisted operator sessions — a novel class
-of adversary DECNET is designed to detect.
+The `llm_*` thresholds were calibrated against real sessions of Claude-assisted
+operators — a novel adversary class BEHAVE-SHELL is explicitly designed to
+detect.
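Because every bucket boundary in the latency table is an inclusive upper bound (≤), the lookup can be done with `bisect.bisect_left` over the sorted thresholds. A minimal sketch, assuming the input is the session's list of inter-command pauses in seconds:

```python
import bisect
import statistics
from typing import Sequence

THRESHOLDS_S = [0.30, 1.50, 2.00, 8.00, 30.00]
LABELS = ["instant", "typing_speed", "deliberate",
          "llm_lightweight", "llm_heavyweight", "long"]


def inter_command_latency_class(pauses: Sequence[float]) -> str:
    med = statistics.median(pauses)
    # bisect_left returns i for med == THRESHOLDS_S[i], giving "≤" semantics.
    return LABELS[bisect.bisect_left(THRESHOLDS_S, med)]
```

A median pause of exactly 2.0 s lands in `deliberate`; 2.01 s moves to `llm_lightweight`.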

-#### 11. `command_branch_diversity`
+#### `command_branch_diversity`
 Values: `linear_playbook` | `adaptive_branching` | `unknown`

-Unique first-token / total command ratio.  < 5 commands → `unknown`.
-≥ 70 % unique → `linear_playbook` (each command is different — following
-a prepared list).  < 70 % → `adaptive_branching` (repeating tools,
-iterating on a problem).
+Ratio of unique first tokens to total commands.  < 5 commands → `unknown`.
+≥70 % unique → `linear_playbook` (following a prepared list).  < 70 % →
+`adaptive_branching` (iterating on a problem).
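A minimal sketch of the ratio test. It operates on first-token hashes, since that is all the spec retains; plain strings stand in for hashes in the usage below.

```python
from typing import Sequence


def command_branch_diversity(first_token_hashes: Sequence[str]) -> str:
    n = len(first_token_hashes)
    if n < 5:
        return "unknown"
    unique_ratio = len(set(first_token_hashes)) / n
    return "linear_playbook" if unique_ratio >= 0.70 else "adaptive_branching"
```

For example, `["id", "whoami", "uname", "ls", "ps"]` has a unique ratio of 1.0 (`linear_playbook`), while four `nmap` runs and one `cat` give 0.4 (`adaptive_branching`).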

-#### 12. `feedback_loop_engagement`
+#### `feedback_loop_engagement`
 Values: `closed_loop` | `fire_and_forget` | `unknown`

 Pearson correlation between per-command output bytes and the following
 inter-command pause.  r > 0.30 → `closed_loop` (pauses longer when there
-is more output to read).  Otherwise `fire_and_forget`.  Requires ≥5
-command/output/pause triples.
+is more to read).  Otherwise `fire_and_forget`.  Requires ≥5
+command/output/pause triples.
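A minimal sketch of the correlation test, with Pearson's r written out by hand so it runs on any Python 3 (on 3.10+ `statistics.correlation` computes the same quantity). Returning `unknown` when either series is constant is an assumption; the spec does not cover that case.

```python
import math
import statistics
from typing import Optional, Sequence

FEEDBACK_CORRELATION_MIN = 0.30
MIN_TRIPLES = 5


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson r, or None when either series is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def feedback_loop_engagement(output_bytes: Sequence[int], next_pauses: Sequence[float]) -> str:
    if len(output_bytes) != len(next_pauses):
        raise ValueError("need exactly one following pause per command output")
    if len(output_bytes) < MIN_TRIPLES:
        return "unknown"
    r = pearson(output_bytes, next_pauses)
    if r is None:
        return "unknown"  # assumption: undefined correlation is treated as unknown
    return "closed_loop" if r > FEEDBACK_CORRELATION_MIN else "fire_and_forget"
```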

-#### 13. `inter_command_consistency`
+#### `inter_command_consistency`
 Values: `metronomic` | `variable` | `bimodal`

 CV of inter-command pauses.  < 0.40 → `metronomic` (scripts, beacons).
-> 1.50 → `bimodal` (two distinct paces — often short commands interleaved
-with long waits for a compile or download).  Otherwise `variable`.
+> 1.50 → `bimodal` (short commands interleaved with long waits for
+compiles or downloads).  Otherwise `variable`.
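A minimal sketch of the CV thresholds, assuming the population standard deviation over the session's inter-command pauses:

```python
import statistics
from typing import Sequence

PAUSE_CV_METRONOMIC_MAX = 0.40
PAUSE_CV_BIMODAL_MIN = 1.50


def inter_command_consistency(pauses: Sequence[float]) -> str:
    mean = statistics.fmean(pauses)
    cv = statistics.pstdev(pauses) / mean  # population SD: an assumption
    if cv < PAUSE_CV_METRONOMIC_MAX:
        return "metronomic"
    if cv > PAUSE_CV_BIMODAL_MIN:
        return "bimodal"
    return "variable"
```

Three quick 0.5 s pauses followed by one 60 s wait give a CV of about 1.68, which this rule labels `bimodal`.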

-#### 14. `cognitive_load`
+#### `cognitive_load`
 Values: `low` | `medium` | `high`

-Composite score: mean of (intra-typing CV / 1.0, error rate, pause CV / 1.5).
-< 0.33 → `low`, < 0.67 → `medium`, otherwise `high`.
+Composite: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5).
+< 0.33 → `low`, < 0.67 → `medium`, otherwise `high`.
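A worked example helps here: with an intra-typing CV of 0.5, 3 errored commands out of 10 (error rate 0.3) and a pause CV of 0.9, the score is (0.5 + 0.3 + 0.6) / 3 ≈ 0.47, i.e. `medium`. A minimal sketch of the composite, with no clipping of the individual terms (an assumption):

```python
def cognitive_load(intra_typing_cv: float, error_rate: float, pause_cv: float) -> str:
    # Each term is normalised by its divisor from the spec, then averaged.
    score = (intra_typing_cv / 1.0 + error_rate + pause_cv / 1.5) / 3
    if score < 0.33:
        return "low"
    if score < 0.67:
        return "medium"
    return "high"


print(cognitive_load(0.5, 0.3, 0.9))  # medium
```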

-High cognitive load across multiple sessions on the same identity is a
-signal of an operator working outside their comfort zone — new target OS,
-unfamiliar tooling, or time pressure.
-
-#### 15. `exploration_style`
+#### `exploration_style`
 Values: `methodical` | `targeted` | `chaotic`

-`repetition_rate` = 1 − unique/total commands.
-`backtrack_rate` = fraction of commands that jump back to a previously used
-tool category.  Backtrack ≥30 % → `chaotic`.  Repetition ≥50 % → `targeted`
-(narrow focus, known objective).  Otherwise `methodical`.
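+`repetition_rate` = 1 − unique/total commands; `backtrack_rate` = fraction
+of commands that return to a previously used tool category.
+`backtrack_rate` ≥30 % → `chaotic`.  `repetition_rate` ≥50 % → `targeted`.
+Otherwise `methodical`.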

-#### 16. `planning_depth`
+#### `planning_depth`
 Values: `deep` | `reactive` | `shallow`

-`deep_pause_frac` = fraction of inter-command IKIs > 2.0 s.
-`reactive_frac` = fraction ≤ 0.30 s.  ≥40 % deep pauses → `deep`.
-≥50 % reactive → `reactive`.  Otherwise `shallow`.
+Fraction of inter-command pauses > 2.0 s (deep) vs ≤ 0.30 s (reactive).
+≥40 % deep pauses → `deep`.  ≥50 % reactive → `reactive`.  Otherwise
+`shallow`.
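A minimal sketch of the rule, checking `deep` before `reactive` as the ordering above suggests (the precedence is an assumption when both thresholds could be met):

```python
from typing import Sequence


def planning_depth(pauses: Sequence[float]) -> str:
    n = len(pauses)
    deep_frac = sum(p > 2.0 for p in pauses) / n
    reactive_frac = sum(p <= 0.30 for p in pauses) / n
    if deep_frac >= 0.40:
        return "deep"
    if reactive_frac >= 0.50:
        return "reactive"
    return "shallow"
```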

-#### 17. `tool_vocabulary`
+#### `tool_vocabulary`
 Values: `narrow` | `moderate` | `broad`

-Distinct first-token count (absolute).  ≤3 → `narrow`, ≥10 → `broad`.
+Distinct first-token count.  ≤3 → `narrow`, ≥10 → `broad`, otherwise
+`moderate`.

-#### 18. `error_resilience.retry_tactic`
+#### `error_resilience.retry_tactic`
 Values: `retry_same` | `pivot` | `fallback`

-Post-error behaviour: does the operator retry the same command, switch to
-a different approach, or fall back to reconnaissance?  Skipped if no errors
-occurred in the session.
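+After an error, does the operator retry the same command (`retry_same`),
+switch to a different approach (`pivot`), or fall back to reconnaissance
+(`fallback`)?  Skipped if no errors occurred in the session.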

-#### 19. `error_resilience.frustration_typing`
+#### `error_resilience.frustration_typing`
 Values: `low` | `moderate` | `high`

Delta between median intra-IKI after an error vs. after a success.
 < 10 % delta → `low`, < 30 % → `moderate`, ≥30 % → `high`.

-Fast typing after errors suggests frustration; slow typing suggests
-deliberation.
-
-#### 20. `error_resilience.fallback_to_man`
+#### `error_resilience.fallback_to_man`
 Values: `present` | `absent`

-After an error, does the next command start with `man`, `help`, or `info`?
-Skipped if no errors.  `present` indicates an operator consulting
-documentation — less automated, less rehearsed.
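+After an error, does the next command start with `man`/`help`/`info`?
+Skipped if no errors.  `present` suggests an operator consulting
+documentation: less automated, less rehearsed.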

 ---

-### Temporal (4) — session rhythm and pacing
+### Temporal (4)

-#### 21. `session_duration`
+#### `session_duration`
 Values: `short` | `medium` | `long` | `marathon`

-| Duration | Label |
-|---|---|
-| < 60 s | `short` — single recon or scan |
-| < 600 s | `medium` — targeted interaction |
-| < 3600 s | `long` — sustained operation |
-| ≥ 3600 s | `marathon` — extended presence / slow-burn APT |
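+`short` < 60 s (single recon or scan), `medium` < 600 s (targeted
+interaction), `long` < 3600 s (sustained operation), `marathon` ≥ 3600 s
+(extended presence or slow-burn APT).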

-#### 22. `escalation_pattern`
+#### `escalation_pattern`
 Values: `bursty` | `sustained`

-Dynamic window analysis (window width = max(10 s, duration / target)).
-CV and zero-window fraction classify whether activity clusters into bursts
-separated by idle periods, or maintains a consistent level throughout.
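+Dynamic window analysis of activity density over the session lifetime
+(window width = max(10 s, duration / target)).  The CV of per-window
+activity and the fraction of empty windows classify whether activity
+clusters into bursts separated by idle periods (`bursty`) or stays level
+throughout (`sustained`).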

-#### 23. `landing_ritual`
+#### `landing_ritual`
 Values: `cleanup` | `exploration` | `passive`

-First ~5 commands classified by intent tokens.  `cleanup` if the operator
-immediately starts removing evidence; `exploration` if they run
-reconnaissance commands (`id`, `whoami`, `uname`, `ls`); `passive` if
-they do nothing that reveals intent.
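+Intent of the first ~5 commands.  `cleanup` if the operator immediately
+starts removing evidence; `exploration` if they run reconnaissance commands
+(`id`, `whoami`, `uname`, `ls`); `passive` if nothing they do reveals intent.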

-#### 24. `exit_behavior`
+#### `exit_behavior`
 Values: `cleanup` | `standard` | `anomalous`

-Last ~5 commands.  `cleanup` if history/log deletion or `exit`/`logout`
-appears.  `anomalous` if the session ends abruptly with no recognisable
-closing pattern.
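+Intent of the last ~5 commands.  `cleanup` if history/log deletion or
+`exit`/`logout` appears.  `anomalous` if the session ends abruptly with no
+recognisable closing pattern.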

 ---

-### Environmental (5) — operator's local setup
+### Environmental (5)

-These are stable across an operator's career and change only when they
-switch machines or retool.
+Environmental primitives are stable across an operator's career — they change
+only when the operator switches machines or deliberately retools.

-#### 25. `shell_type`
+#### `shell_type`
 Values: `bash` | `sh` | `zsh` | `fish` | `unknown`

-Detected from PS1 prompt regex patterns after ANSI stripping.
+Detected from PS1 prompt regex patterns.

-#### 26. `terminal_multiplexer`
+#### `terminal_multiplexer`
 Values: `tmux` | `screen` | `none`

-Detected from PS1 markers and characteristic escape sequences.
+Detected from PS1 markers and escape sequences.

-#### 27. `locale`
+#### `locale`
 Values: `en-US` | `en` | `other` | `unknown`

 Detected from language-specific keywords in prompt lines and error messages.

-#### 28. `keyboard_layout`
+#### `keyboard_layout`
 Values: `qwerty` | `dvorak` | `colemak` | `other`

-Bigram frequency analysis of the typed character stream.  Operators who
-touch-type on Dvorak produce a statistically distinct bigram distribution
-that persists even when typing non-English commands.
+Bigram frequency analysis of the typed character stream.  An operator who
+touch-types on Dvorak produces a statistically distinct bigram distribution
+that persists even when typing non-English commands, making this a
+stylometric signal rooted in motor habit.

-#### 29. `numpad_usage`
+#### `numpad_usage`
 Values: `occasional` | `frequent` | `none`

 Keystroke pattern detection for numpad-originated digits.
-
 ---

-### Operational (4) — mission and OPSEC posture
+### Operational (4)

-#### 30. `objective`
+#### `objective`
 Values: `recon` | `exfil` | `persistence` | `lateral` | `destructive`

-Token-based intent classification of command first-tokens.  Majority vote
-across classified tokens; precedence order applied for ties.  Skipped if
-fewer than 3 classified tokens.
+Token-based intent classification of command tokens.  Majority vote across
+classified tokens, with a fixed precedence order breaking ties; skipped if
+fewer than 3 classified tokens.

 Example token mappings:
 - `recon`: `id`, `whoami`, `uname`, `cat`, `find`, `ls`, `ps`, `netstat`
 - `exfil`: `scp`, `curl`, `wget`, `base64`, `nc`, `rsync`
- `persistence`: `crontab`, `echo`, `tee`, `systemctl`, `rc.local`
+- `persistence`: `crontab`, `echo >> ~/.bashrc`, `systemctl enable`
 - `lateral`: `ssh`, `xfreerdp`, `psexec`, `wmiexec`
 - `destructive`: `rm`, `shred`, `dd`, `mkfs`, `kill`
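A minimal sketch of the majority vote, restricted to the single-word example tokens above. Two assumptions are labelled in the code: the spec states that a precedence order breaks ties but does not give it, so `PRECEDENCE` is hypothetical, and plain tokens are used where a PII-preserving implementation would compare first-token hashes.

```python
from collections import Counter
from typing import Optional, Sequence

# Single-word example mappings from the list above (plain tokens for readability).
INTENT_TOKENS = {
    **dict.fromkeys(["id", "whoami", "uname", "cat", "find", "ls", "ps", "netstat"], "recon"),
    **dict.fromkeys(["scp", "curl", "wget", "base64", "nc", "rsync"], "exfil"),
    **dict.fromkeys(["crontab", "systemctl"], "persistence"),
    **dict.fromkeys(["ssh", "xfreerdp", "psexec", "wmiexec"], "lateral"),
    **dict.fromkeys(["rm", "shred", "dd", "mkfs", "kill"], "destructive"),
}
# Hypothetical tie-break order: earlier entries win ties.
PRECEDENCE = ("destructive", "exfil", "persistence", "lateral", "recon")
MIN_CLASSIFIED = 3


def objective(first_tokens: Sequence[str]) -> Optional[str]:
    hits = [INTENT_TOKENS[t] for t in first_tokens if t in INTENT_TOKENS]
    if len(hits) < MIN_CLASSIFIED:
        return None  # skip condition: too few classified tokens
    counts = Counter(hits)
    top = max(counts.values())
    tied = [label for label, c in counts.items() if c == top]
    return min(tied, key=PRECEDENCE.index)
```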

-#### 31. `opsec_discipline`
+#### `opsec_discipline`
 Values: `careful` | `learning` | `careless`

-Presence of history-disabling tokens (`unset HISTFILE`, `HISTSIZE=0`,
-`history -c`) and cleanup activity in the session tail.  Both → `careful`.
-History-only → `learning` (knows to cover tracks but forgets cleanup).
-Neither → `careless`.
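+Presence of history-disabling tokens (`unset HISTFILE`, `HISTSIZE=0`,
+`history -c`) and cleanup activity in the session tail.  Both → `careful`.
+History only → `learning` (knows to cover tracks but forgets cleanup).
+Neither → `careless`.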

-#### 32. `cleanup_behavior`
+#### `cleanup_behavior`
 Values: `thorough` | `partial` | `none`

-Distinct cleanup tokens in the last 5 commands.  ≥3 → `thorough`,
-1–2 → `partial`, 0 → `none`.
+Distinct cleanup tokens in the session tail (last 5 commands).  ≥3 →
+`thorough`, 1–2 → `partial`, 0 → `none`.

-#### 33. `multi_actor_indicators`
+#### `multi_actor_indicators`
 Values: `solo` | `handoff_detected`

-Splits commands at the session's temporal midpoint and compares the median
-intra-IKI of each half.  If the delta exceeds 50 % and both halves have
-≥4 commands, `handoff_detected` is emitted — the session was likely shared
-between two operators (e.g. initial access handed to a post-exploitation
-specialist).
+Splits commands at the session's temporal midpoint and compares the median
+intra-IKI of each half.  A delta > 50 % with both halves having ≥4 commands
+→ `handoff_detected`, suggesting the session was shared between two
+operators (initial access handed to a post-exploitation specialist, or a
+shared credential).
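A minimal sketch of the midpoint split. The spec does not say what the 50 % delta is relative to, so this version assumes the smaller of the two medians (which keeps the test symmetric), and it returns `None` as a skip when a half has too few commands. Both choices are assumptions.

```python
import statistics
from typing import Optional, Sequence, Tuple

HANDOFF_DELTA_MIN = 0.50
MIN_COMMANDS_PER_HALF = 4


def multi_actor_indicator(
    commands: Sequence[Tuple[float, Sequence[float]]], t_start: float, t_end: float
) -> Optional[str]:
    """commands: (start_ts, intra-typing IKIs) for each command."""
    mid = (t_start + t_end) / 2
    first = [ikis for start, ikis in commands if start < mid]
    second = [ikis for start, ikis in commands if start >= mid]
    if len(first) < MIN_COMMANDS_PER_HALF or len(second) < MIN_COMMANDS_PER_HALF:
        return None  # assumption: too little data is a skip, not "solo"
    a = statistics.median(i for ikis in first for i in ikis)
    b = statistics.median(i for ikis in second for i in ikis)
    if min(a, b) <= 0:
        return None
    # Assumption: delta is relative to the faster (smaller) half.
    delta = abs(a - b) / min(a, b)
    return "handoff_detected" if delta > HANDOFF_DELTA_MIN else "solo"
```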

 ---

-### Emotional valence (4) — stress and cognitive state
+### Emotional valence (4)

-These features have a hard confidence cap of **0.50** — they contribute to
-attribution but cannot dominate it.  They require ≥80 typed letters to emit.
+These primitives sit at the boundary of motor and stylometric signal.  They
+require ≥80 typed letters and carry a hard confidence cap of **0.50** —
+they contribute to attribution but cannot dominate it.

-#### 34. `valence`
+#### `valence`
 Values: `positive` | `neutral` | `negative`

-Lexical positive/negative token counts.  `positive` if positive count >
-(negative + obscenity) and ≥2 positive tokens.
+Lexical positive/negative token counts.  `positive` requires positive count
+> (negative + obscenity) with ≥2 positive tokens.

-#### 35. `arousal`
+#### `arousal`
 Values: `low_calm` | `medium_engaged` | `high_agitated`

 `high_agitated` if ≥5 consecutive caps, ≥3 consecutive `!`, or fastest
-IKI < 60 ms on ≥30 keystrokes.  `low_calm` if slowest IKI > 300 ms.
-Otherwise `medium_engaged`.
+IKI < 60 ms on ≥30 keystrokes.  `low_calm` if slowest IKI > 300 ms.
+Otherwise `medium_engaged`.

-#### 36. `stress_response`
+#### `stress_response`
 Values: `none` | `eustress_positive` | `distress_negative`

-Post-error vs baseline typing speed ratio.  ≥1.20 → `eustress` (types
-faster under pressure — experienced).  ≤ 1/1.20 → `distress` (types
-slower — less experienced or genuinely stressed).
+Post-error vs baseline typing speed ratio.  ≥1.20 → `eustress_positive`
+(experienced, types faster under pressure).  ≤ 1/1.20 → `distress_negative`.
+Between the two → `none`.

-#### 37. `frustration_venting`
+#### `frustration_venting`
 Values: `low` | `moderate` | `high`

-Post-error frustration token count plus obscenity count.
+Post-error frustration token count plus obscenity count.  A purely
+lexicometric signal.

 ---

-## Attribution state machine
+## Attribution

-Primitives feed a per-`(identity_uuid, primitive)` state machine in
-`decnet/correlation/attribution/aggregate.py`.
+BEHAVE-SHELL does not define how observations are aggregated — that is the
+responsibility of the implementing system's attribution engine.  The DECNET
+reference implementation uses a five-state machine per
+`(identity_uuid, primitive)`:

-### States
+| State | Condition |
+|---|---|
+| `unknown` | < 3 observations |
+| `stable` | Recent N agree, no drift from older N |
+| `drifting` | Recent N agree but differ from older N |
+| `conflicted` | Recent N are split |
+| `multi_actor` | `conflicted` + cross-session alternation |

-| State | Meaning | Condition |
-|---|---|---|
-| `unknown` | Insufficient data | < 3 observations |
-| `stable` | Consistent value | Recent N agree AND no drift from older N |
-| `drifting` | Recently changed | Recent N agree BUT differ from older N |
-| `conflicted` | Contradictory values | Recent N are split (high CV) |
-| `multi_actor` | Multiple operators | `conflicted` + cross-session alternation |
-
-Window size N = 5 (categorical primitives).  EWMA is used for numeric
-primitives (Phase 3).
-
-### Multi-actor detection
-
-The attribution worker runs a `_multi_actor_tick` every 60 seconds.  For
-every `(identity, primitive)` pair in `conflicted` state, it checks whether
-the alternation pattern across sessions is consistent with a credential
-being shared between two distinct operators.  When ≥2 primitives
-independently flag `multi_actor` for the same identity, the bus emits:
-
-```
-attribution.profile.multi_actor_suspected
-  {identity_uuid, primitives: [...], evidence_summary, confidence, ts}
-```
-
-`confidence` is capped at 0.60 — cross-primitive agreement is the real
-signal, but a hard cap prevents over-alarming on noisy primitives.
-
---
-
-## Database tables
-
-### `ObservationRow`
-
-One row per `(evidence_ref, primitive)`.  `evidence_ref` is the session
-shard identifier — the `UniqueConstraint` makes re-processing idempotent.
-
-| Column | Type | Description |
-|---|---|---|
-| `id` | UUID PK | |
-| `identity_uuid` | FK → `attacker_identities` | |
-| `attacker_uuid` | FK → `attackers` | Direct link for pre-clusterer path |
-| `evidence_ref` | TEXT | Shard ID |
-| `primitive` | TEXT | e.g. `keystroke_cadence` |
-| `value` | TEXT | Categorical label or serialised numeric |
-| `confidence` | FLOAT | 0.0–1.0 |
-| `observed_at` | DATETIME | Session end time |
-
-### `AttributionStateRow`
-
-One row per `(identity_uuid, primitive)`.  Updated by the attribution
-worker each time a new observation arrives.
-
-| Column | Type | Description |
-|---|---|---|
-| `identity_uuid` | FK → `attacker_identities` | |
-| `primitive` | TEXT | |
-| `state` | TEXT | `unknown`/`stable`/`drifting`/`conflicted`/`multi_actor` |
-| `current_value` | TEXT | Most recent or EWMA value |
-| `confidence` | FLOAT | |
-| `observation_count` | INT | Total observations aggregated |
-| `last_observation_ts` | DATETIME | |
-
---
-
-## Key thresholds
-
-All calibration constants live in `decnet/profiler/behave_shell/_thresholds.py`
-(416 lines).  The values below are the defaults; they can be overridden per
-deployment without touching feature code.
-
-| Constant | Value | Used by |
-|---|---|---|
-| `PASTE_MIN_CHARS_PER_EVENT` | 4 | Paste detection |
-| `PASTE_BURST_MAX_IAT_S` | 0.20 | Paste burst grouping |
-| `MODALITY_PASTED_MIN` | 0.40 | `input_modality` |
-| `CV_STEADY_MAX` | 0.45 | `keystroke_cadence` |
-| `TREMOR_FAST_FLOOR_S` | 0.030 | `motor_stability` |
-| `IKI_THINK_MAX_S` | 2.0 | Typing-burst split |
-| `INTER_CMD_INSTANT_MAX` | 0.30 s | `inter_command_latency_class` |
-| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s | LLM-assisted detection |
-| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s | LLM-assisted detection |
-| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 | `command_branch_diversity` |
-| `FEEDBACK_CORRELATION_MIN` | 0.30 | `feedback_loop_engagement` |
-| `PAUSE_CV_METRONOMIC_MAX` | 0.40 | `inter_command_consistency` |
-| `PAUSE_CV_BIMODAL_MIN` | 1.50 | `inter_command_consistency` |
-| `SESSION_DURATION_SHORT_MAX` | 60 s | `session_duration` |
-| `SESSION_DURATION_MEDIUM_MAX` | 600 s | `session_duration` |
-| `SESSION_DURATION_LONG_MAX` | 3600 s | `session_duration` |
-| `MIN_OBSERVATIONS_FOR_STATE` | 3 | Attribution state machine |
-| `CATEGORICAL_WINDOW_N` | 5 | Attribution window |
-| `MULTI_ACTOR_TICK_SECS` | 60 | Multi-actor tick |
-| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 | All `emotional_valence` features |
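+Window N = 5 for categorical primitives.  When ≥2 primitives independently
+reach `multi_actor` for the same identity, the engine emits a
+`multi_actor_suspected` signal, a strong indicator of a shared credential
+or a compromised operator account.  Its confidence is capped at 0.60:
+cross-primitive agreement is the real signal, but the cap prevents
+over-alarming on noisy primitives.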
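The five-state table can be sketched as a function over one primitive's history, one categorical value per session, oldest first. This is not DECNET's `aggregate()`. "Agree" is assumed to mean at least 80 % of the window shares one value, and "cross-session alternation" is assumed to mean exactly two values swapping at least three times inside the recent window; both thresholds (`AGREEMENT_MIN`, `MIN_SWITCHES`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

MIN_OBSERVATIONS_FOR_STATE = 3
CATEGORICAL_WINDOW_N = 5
AGREEMENT_MIN = 0.8  # hypothetical: 4 of 5 values must match
MIN_SWITCHES = 3     # hypothetical alternation threshold


def dominant(values: Sequence[str]) -> Tuple[str, float]:
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def alternates(values: Sequence[str]) -> bool:
    # Two values that keep swapping from one session to the next.
    switches = sum(a != b for a, b in zip(values, values[1:]))
    return len(set(values)) == 2 and switches >= MIN_SWITCHES


def attribution_state(history: List[str]) -> str:
    if len(history) < MIN_OBSERVATIONS_FOR_STATE:
        return "unknown"
    n = CATEGORICAL_WINDOW_N
    recent, older = history[-n:], history[-2 * n:-n]  # older may be partial or empty
    value, share = dominant(recent)
    if share >= AGREEMENT_MIN:
        if older and dominant(older)[0] != value:
            return "drifting"
        return "stable"
    return "multi_actor" if alternates(recent) else "conflicted"


def multi_actor_suspected(states: Dict[str, str]) -> bool:
    """states: primitive -> current state for a single identity."""
    return sum(s == "multi_actor" for s in states.values()) >= 2
```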

 ---

 ## Calibration

-The system was calibrated against five behavioural classes across 15 sessions
-(424 total observations):
+The reference thresholds were calibrated against five behavioural classes
+across 15 sessions (424 total observations):

 | Class | Sessions | Observations | Description |
 |---|---|---|---|
-| `HUMAN` | 1 | 34 | Human operator, no assistance |
+| `HUMAN` | 1 | 34 | Human operator, unassisted |
 | `YOU-sim` | 2 | 59 | Human-simulated scripted attacker |
 | `LW-sim` | 5 | 136 | Lightweight LLM-assisted operator |
-| `CLAUDE-FF` | 3 | 84 | Claude (fast/free tier) assisted |
-| `CLAUDE-CL` | 4 | 111 | Claude (standard tier) assisted |
+| `CLAUDE-FF` | 3 | 84 | Claude (fast) assisted |
+| `CLAUDE-CL` | 4 | 111 | Claude (standard) assisted |

-All classes emit ≥27 distinct primitives (pass threshold).
-
-The `inter_command_latency_class` thresholds `llm_lightweight` (≤8 s) and
-`llm_heavyweight` (≤30 s) were derived from timing measurements of these
-sessions — DECNET can distinguish a human-with-fast-LLM from an unassisted
-human in a single session with moderate confidence, and with high confidence
-across 3+ sessions.
+All classes emit ≥27 distinct primitives, the pass threshold.  The
+`inter_command_latency_class` LLM buckets, `llm_lightweight` (≤ 8 s) and
+`llm_heavyweight` (≤ 30 s), are the primary discriminator between unassisted
+and LLM-assisted operators in single-session analysis; cross-session
+attribution uses the full primitive set.
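
A deliberately simplified sketch of how the latency buckets could be assigned from the reference values of `INTER_CMD_INSTANT_MAX`, `INTER_CMD_LLM_LIGHTWEIGHT_MAX` and `INTER_CMD_LLM_HEAVYWEIGHT_MAX`.  It assumes the class is taken from the median gap between consecutive commands and that each threshold is an inclusive upper bound.  The `instant` label follows the constant name; `slow` is a hypothetical label for gaps above the heavyweight ceiling, and the real feature may weigh additional context.

```python
import statistics
from bisect import bisect_left

# Reference values (seconds) from the thresholds table.
INTER_CMD_INSTANT_MAX = 0.30
INTER_CMD_LLM_LIGHTWEIGHT_MAX = 8.0
INTER_CMD_LLM_HEAVYWEIGHT_MAX = 30.0

# Inclusive upper bounds and the label for each band; "slow" is hypothetical.
_BOUNDS = (INTER_CMD_INSTANT_MAX, INTER_CMD_LLM_LIGHTWEIGHT_MAX, INTER_CMD_LLM_HEAVYWEIGHT_MAX)
_LABELS = ("instant", "llm_lightweight", "llm_heavyweight", "slow")


def inter_command_latency_class(gaps_s: list[float]) -> str:
    """Bucket a session by the median gap (seconds) between commands."""
    if not gaps_s:
        raise ValueError("need at least one inter-command gap")
    median = statistics.median(gaps_s)
    # bisect_left returns the index of the first bound >= median,
    # which makes every bound inclusive.
    return _LABELS[bisect_left(_BOUNDS, median)]


print(inter_command_latency_class([5.0, 6.5, 7.0]))  # llm_lightweight
```

Using the median rather than the mean keeps one long pause (for example, the operator stepping away) from pushing a session into a slower bucket.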

 ---

-## Testing
+## Key thresholds (reference implementation)

-```bash
-# Offline smoke test — 5 shards, mock bus, must emit ≥27 distinct per class
-scripts/behave_shell/smoke.sh
+All constants live in `_thresholds.py`.

-# Live round-trip — replay calibration shards through a running DECNET
-scripts/behave_shell/replay_calibration.py
-```
+| Constant | Value |
+|---|---|
+| `PASTE_MIN_CHARS_PER_EVENT` | 4 chars |
+| `PASTE_BURST_MAX_IAT_S` | 0.20 s |
+| `IKI_THINK_MAX_S` | 2.0 s (typing-burst split) |
+| `TREMOR_FAST_FLOOR_S` | 0.030 s |
+| `CV_STEADY_MAX` | 0.45 |
+| `INTER_CMD_INSTANT_MAX` | 0.30 s |
+| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s |
+| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s |
+| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 |
+| `FEEDBACK_CORRELATION_MIN` | 0.30 |
+| `PAUSE_CV_METRONOMIC_MAX` | 0.40 |
+| `PAUSE_CV_BIMODAL_MIN` | 1.50 |
+| `SESSION_DURATION_SHORT_MAX` | 60 s |
+| `SESSION_DURATION_MEDIUM_MAX` | 600 s |
+| `SESSION_DURATION_LONG_MAX` | 3600 s |
+| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 |
+| `MIN_OBSERVATIONS_FOR_STATE` | 3 |
+| `CATEGORICAL_WINDOW_N` | 5 |
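
The pause and duration thresholds can be read as simple classifiers.  The sketch below assumes `inter_command_consistency` uses the population coefficient of variation (standard deviation divided by mean) of the pauses between commands, and that `session_duration` buckets wall-clock length with inclusive upper bounds.  The `metronomic`, `bimodal`, `short`, `medium` and `long` labels follow the constant names; `variable` and `extended` are hypothetical.

```python
import statistics

# Reference values from the thresholds table (durations in seconds).
PAUSE_CV_METRONOMIC_MAX = 0.40
PAUSE_CV_BIMODAL_MIN = 1.50
SESSION_DURATION_SHORT_MAX = 60
SESSION_DURATION_MEDIUM_MAX = 600
SESSION_DURATION_LONG_MAX = 3600


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("pauses must have a positive mean")
    return statistics.pstdev(values) / mean


def inter_command_consistency(pauses_s: list[float]) -> str:
    if len(pauses_s) < 2:
        raise ValueError("need at least two pauses to measure spread")
    cv = coefficient_of_variation(pauses_s)
    if cv <= PAUSE_CV_METRONOMIC_MAX:
        return "metronomic"  # near-constant rhythm between commands
    if cv >= PAUSE_CV_BIMODAL_MIN:
        return "bimodal"  # very uneven: short bursts split by long gaps
    return "variable"  # hypothetical label for the middle band


def session_duration(duration_s: float) -> str:
    if duration_s < 0:
        raise ValueError("duration cannot be negative")
    if duration_s <= SESSION_DURATION_SHORT_MAX:
        return "short"
    if duration_s <= SESSION_DURATION_MEDIUM_MAX:
        return "medium"
    if duration_s <= SESSION_DURATION_LONG_MAX:
        return "long"
    return "extended"  # hypothetical label above one hour


print(inter_command_consistency([2.0, 2.0, 2.0]), session_duration(1800))  # metronomic long
```

The coefficient of variation is scale-free, so a fast operator and a slow one with equally regular rhythms land in the same band.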

 ---

-## File reference
+## DECNET implementation

-```
-decnet/profiler/behave_shell/
-  __init__.py               Public API: extract_session()
-  extract.py                Entry point — fans out to FEATURES registry (51 lines)
-  _ctx.py                   SessionContext builder (573 lines)
-  _parse.py                 Asciinema JSONL parsing (272 lines)
-  _handler.py               Bus subscriber — disk I/O, persistence, publish (235 lines)
-  _intent.py                Token → intent classification (115 lines)
-  _thresholds.py            All calibration constants (416 lines)
-  _features/
-    __init__.py             FEATURES registry — list of 37 functions (104 lines)
-    motor.py                Primitives 1–9 (422 lines)
-    cognitive.py            Primitives 10–20 (593 lines)
-    temporal.py             Primitives 21–24 (237 lines)
-    environmental.py        Primitives 25–29 (352 lines)
-    operational.py          Primitives 30–33 (218 lines)
-    emotional_valence.py    Primitives 34–37 (223 lines)
+In DECNET, BEHAVE-SHELL extraction is invoked by the profiler worker on every
+`attacker.session.ended` bus event.  The worker reads the PTY shard from disk,
+runs `extract_session()`, and upserts one `ObservationRow` per primitive per
+session.  A `UniqueConstraint(evidence_ref, primitive)` makes re-processing
+idempotent.
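
The idempotency guarantee comes from the unique constraint alone: re-running extraction for the same session overwrites rows instead of duplicating them.  A sketch of the same semantics using the standard-library `sqlite3` module rather than DECNET's ORM models; the `observations` table, its columns, and the `upsert_observations()` helper are hypothetical, and `INSERT ... ON CONFLICT DO UPDATE` requires SQLite 3.24 or later.

```python
import sqlite3

# Hypothetical minimal schema mirroring UniqueConstraint(evidence_ref, primitive).
SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    id           INTEGER PRIMARY KEY,
    evidence_ref TEXT NOT NULL,
    primitive    TEXT NOT NULL,
    value        TEXT NOT NULL,
    UNIQUE (evidence_ref, primitive)
)
"""


def upsert_observations(conn: sqlite3.Connection, evidence_ref: str,
                        primitives: dict[str, str]) -> None:
    """Write one row per primitive; a repeat run updates instead of inserting."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            """
            INSERT INTO observations (evidence_ref, primitive, value)
            VALUES (?, ?, ?)
            ON CONFLICT (evidence_ref, primitive) DO UPDATE SET value = excluded.value
            """,
            [(evidence_ref, name, value) for name, value in primitives.items()],
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
obs = {"inter_command_latency_class": "llm_lightweight", "session_duration": "medium"}
upsert_observations(conn, "session-0001", obs)
upsert_observations(conn, "session-0001", obs)  # re-processing the same session
rows = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
print(rows)  # 2
```

Processing `session-0001` twice leaves exactly one row per primitive, so a replayed `attacker.session.ended` event is harmless.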

-decnet/correlation/
-  attribution_worker.py     Bus loop: consume observations, run tick
-  attribution/
-    aggregate.py            State machine: unknown→stable→drifting→conflicted→multi_actor
-    _thresholds.py          Attribution-layer thresholds
+The attribution worker consumes `attacker.observation.*` bus events and
+maintains one `AttributionStateRow` per `(identity_uuid, primitive)`.

-decnet/web/db/models/
-  observations.py           ObservationRow schema
-  attribution_state.py      AttributionStateRow schema
-```
+Source: `decnet/profiler/behave_shell/` (~3 868 lines across 12 files).

 ---

-## Related pages
+## See also

- [Fingerprinting](Fingerprinting) — all fingerprint layers, including the
-  BEHAVE-SHELL summary
- [Identity-Resolution](Identity-Resolution) — how observations are clustered
-  into attacker identities and how state machine transitions propagate
- [Service-Personas](Service-Personas) — enabling session recording and
-  BEHAVE-SHELL per service
+- **BEHAVE-TEXT** — sibling spec for written-text stylometry and lexicometry,
+  implemented by [EYENET](https://github.com/xmartlab/eyenet)
+- [Fingerprinting](Fingerprinting) — all DECNET fingerprint layers
+- [Identity-Resolution](Identity-Resolution) — how observations feed the
+  identity clusterer