From 5a39211645dc25c5269dd567c5dbcb8494ae247f Mon Sep 17 00:00:00 2001
From: anti
Date: Sun, 10 May 2026 04:31:36 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20reframe=20BEHAVE-SHELL=20as=20a=20spec,?=
 =?UTF-8?q?=20not=20a=20DECNET=20component=20=E2=80=94=20add=20stylometry/?=
 =?UTF-8?q?lexicometry=20scope,=20BEHAVE-TEXT/EYENET=20cross-reference?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 BEHAVE-SHELL.md | 631 +++++++++++++++++++-----------------------------
 1 file changed, 254 insertions(+), 377 deletions(-)

diff --git a/BEHAVE-SHELL.md b/BEHAVE-SHELL.md
index a916b53..4ccd88b 100644
--- a/BEHAVE-SHELL.md
+++ b/BEHAVE-SHELL.md
@@ -1,95 +1,106 @@
 # BEHAVE-SHELL
 
-BEHAVE-SHELL is DECNET's behavioural biometrics engine for interactive shell
-sessions. It transforms raw PTY recordings into 37 attribution primitives
-that fingerprint *how* an operator works — their motor patterns, cognitive
-style, OPSEC habits, and emotional state — independently of what IP address
-or tooling they use.
+BEHAVE-SHELL is a **behavioural biometrics specification** for interactive
+shell sessions. It defines a set of attribution primitives — observable,
+computable signals — that characterise *how* an operator works at a terminal,
+independently of what IP address, credential, or tooling they use.
 
-The primitives feed the [Identity-Resolution](Identity-Resolution) attribution
-state machine, which accumulates evidence across sessions to answer: *is this
-the same hands?*
+The spec was born out of DECNET's need to correlate attackers across sessions
+and IP changes, but it is not DECNET-specific. Any system that records PTY
+sessions can implement BEHAVE-SHELL extraction and feed the resulting
+primitives into an attribution engine. DECNET is the reference implementation.
+ +A sibling specification, **BEHAVE-TEXT**, defines equivalent primitives for +written text — stylometry, lexicometry, and discourse structure — and is +implemented by [EYENET](https://github.com/xmartlab/eyenet). + +--- + +## Scope + +BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus. The +current specification covers three broad domains: + +| Domain | What it captures | +|---|---| +| **Motor biometrics** | Keystroke timing, error correction, paste vs. type habits, shell mastery signals | +| **Cognitive / behavioural** | Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure | +| **Stylometry / lexicometry** | Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions | + +The emotional valence cluster (`valence`, `arousal`, `stress_response`, +`frustration_venting`) sits at the boundary of motor and stylometric signal — +it measures both typing speed changes and lexical content after stress events. --- ## Design principles -- **Pure extraction library.** `extract_session()` takes an iterable of - asciinema events and yields `Observation` envelopes. No I/O, no DB access, - no bus calls. The worker owns all side effects. -- **PII by design.** Command text is never stored in plain form — only the +- **Extraction is pure.** The spec defines a function + `extract_session(events) → Observations` that takes an iterable of timestamped + PTY events and yields structured observations. No I/O. No database. + No side effects. Implementations are free to run this in any context. + +- **PII by design.** Command text is never stored in plain form. Only the SHA-256 of the first token is retained. Output is reduced to a byte count and an error verdict. Prompt lines are ANSI-stripped and capped at 256 - characters. -- **Idempotent persistence.** `UniqueConstraint(evidence_ref, primitive)` - on the observations table means replaying a shard never duplicates rows. 
-- **Confidence capping.** Emotional-valence features carry a hard confidence - cap of 0.50 — they contribute, but never dominate an attribution decision. + characters. Raw bigram/unigram counts are used for layout fingerprinting — + not the text itself. + +- **Confidence is explicit.** Every observation carries a confidence value + [0.0–1.0]. Features that are inherently noisier have hard confidence caps + (emotional valence: 0.50). Attribution engines must propagate confidence + rather than treating all observations as equal. + +- **Skip conditions over imputation.** A feature that cannot be computed on a + given session (e.g. `error_resilience` features when no errors occurred) + yields no observation rather than a default value. Attribution engines + treat absence of an observation differently from an `unknown` state. --- -## Data flow +## Input format -``` -PTY session - │ - ▼ -sessrec.c — writes JSONL shard per session - │ {"sid": id, "t": ts, "ch": "i"|"o", "d": data} - │ Non-UTF-8 bytes handled via surrogateescape - ▼ -attacker.session.ended bus event - │ - ▼ -_handler.handle_session_ended() - │ Reads shard from disk → parse_shard_line() → AsciinemaEvent tuples - ▼ -build_session_context() (_ctx.py, ~573 lines) - │ Seven derivation steps (see below) - ▼ -extract_session() (extract.py) - │ Fan-out across 37 registered feature functions (FEATURES registry) - │ Each yields 0..N Observation envelopes - ▼ -Upsert ObservationRow → publish attacker.observation.* - │ - ▼ -attribution_worker (attribution_worker.py) - │ Consumes attacker.observation.> bus events - │ Runs aggregate() per (identity_uuid, primitive) - ▼ -AttributionStateRow state ∈ {unknown, stable, drifting, conflicted, multi_actor} +BEHAVE-SHELL operates on **asciinema-compatible event streams**: sequences of +`(t: float, ch: "i"|"o", d: str)` tuples representing timestamped input and +output chunks from a PTY session. `"i"` is operator input; `"o"` is terminal +output. 
Non-UTF-8 bytes are handled via surrogateescape. + +The DECNET implementation records these as JSONL shards via `sessrec.c`: + +```json +{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"} +{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."} ``` --- ## Session context derivation -`build_session_context()` performs a single-pass walk over the raw event -stream and produces a `SessionContext` that all 37 feature functions read. -The seven derivation steps, in order: +Before feature extraction, a single-pass walk over the event stream builds a +`SessionContext` — a set of derived signals that all feature functions share. +The derivation steps, in order: -| Step | What it computes | +| Step | Output | |---|---| -| **Paste-burst detection** | Groups consecutive paste-class events (≥4 chars within 200 ms) into `paste_bursts` | -| **Typing-burst segmentation** | Splits the keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]` (dropped if < 3 IATs) | -| **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line sequences (`0x15`, `0x17`); records IATs between each backspace and the preceding keystroke | -| **Per-command intra-typing IATs** | For each command, extracts keystroke inter-arrival times from that command's span only | -| **Command segmentation** | Splits on `\r`/`\n`; per command records `first_token_hash` (SHA-256), tab count, readline shortcut count, and pipe count | -| **Inter-command IAT gaps** | Time between consecutive commands | -| **Error detection** | Scans output between commands for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` | -| **PS1 prompt detection** | Regex for `$`, `#`, `%`, `>` suffix after ANSI stripping; caps at 256 chars | -| **Keyboard layout fingerprinting** | Builds unigram and bigram histograms from typed letters | +| **Paste-burst detection** | Groups consecutive paste-class events (≥4 chars, within 200 ms) into 
`paste_bursts` | +| **Typing-burst segmentation** | Splits keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]`; drops bursts < 3 IKIs | +| **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line (`0x15`, `0x17`); records IKI between each backspace and the preceding keystroke | +| **Per-command intra-typing IKIs** | For each command, IKIs from that command's span only | +| **Command segmentation** | Splits on `\r`/`\n`; per command: `first_token_hash` (SHA-256), tab count, readline shortcut count, pipe count | +| **Inter-command IKI gaps** | Time between consecutive commands | +| **Error detection** | Scans output for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` | +| **PS1 prompt detection** | Regex for `$`, `#`, `%`, `>` suffix; ANSI-stripped, capped at 256 chars | +| **Keyboard layout fingerprinting** | Unigram and bigram histograms from typed letters | | **Lexical counters** | Obscenity hits, positive/negative sentiment tokens, max caps run, max consecutive `!` run | -### Key data structures +### Key structures ``` SessionContext sid: str t_start, t_end, duration_s: float - input_events, output_events: tuple[AsciinemaEvent] - iats: tuple[float] # inter-keystroke intervals + input_events, output_events: tuple[Event] + iats: tuple[float] # inter-keystroke intervals paste_bursts: tuple[PasteBurst] typing_bursts: tuple[tuple[float]] backspace_count, kill_line_count: int @@ -104,7 +115,7 @@ SessionContext Command start_ts, end_ts: float - first_token_hash: str # SHA-256 of first token only + first_token_hash: str # SHA-256, first token only tab_count, shortcut_count, pipe_count: int errored: bool output_bytes: int @@ -121,507 +132,373 @@ PromptLine ## The 37 primitives -### Motor (9) — muscle memory and physical interaction style +### Motor (9) -These primitives capture *how* an operator's fingers interact with the -keyboard — patterns that persist across sessions, 
accounts, and even -operating systems. +Motor primitives capture muscle memory and physical interaction patterns. +They are among the most stable signals across sessions and across different +machines used by the same operator. -#### 1. `input_modality` +#### `input_modality` Values: `typed` | `pasted` | `mixed` -Ratio of paste events to total input events. ≥40 % pasted and ≤5 % -typed → `pasted`; ≤5 % pasted → `typed`; otherwise `mixed`. +Ratio of paste events to total input events. ≥40 % pasted and ≤5 % typed +→ `pasted`. ≤5 % pasted → `typed`. Otherwise `mixed`. A script kiddie running pre-written one-liners pastes habitually. A seasoned operator types most commands from memory. -#### 2. `paste_burst_rate` +#### `paste_burst_rate` Values: `none` | `occasional` | `habitual` -Coarser bucketing of the paste ratio. ≥50 % → `habitual`, -≥10 % → `occasional`. +Coarser paste-ratio bucketing. ≥50 % → `habitual`, ≥10 % → `occasional`. -#### 3. `keystroke_cadence` +#### `keystroke_cadence` Values: `steady` | `bursty` | `hunt_and_peck` | `machine` Median coefficient of variation (CV) of within-burst inter-keystroke -intervals (IKIs). +intervals (IKIs): | CV | Mean IKI | Label | |---|---|---| -| < 0.30 | < 30 ms | `machine` | -| < 0.45 | any | `steady` | -| < 0.70 | any | `bursty` | +| < 0.30 | < 30 ms | `machine` — inhumanly uniform | +| < 0.45 | any | `steady` — trained touch typist | +| < 0.70 | any | `bursty` — thinks between phrases | | ≥ 0.70 | any | `hunt_and_peck` | -`machine` catches automated input that passes as human visually but has -inhumanly uniform inter-key timing. - -#### 4. `motor_stability` +#### `motor_stability` Values: `steady` | `variable` | `tremor` -Fraction of IKIs below the tremor floor (30 ms). ≥20 % → `tremor` -(physiological or tool-simulated). Otherwise the median CV classifies -`steady` vs `variable`. +Fraction of IKIs below 30 ms. ≥20 % → `tremor` (physiological or +tool-simulated). Otherwise CV classifies `steady` vs `variable`. -#### 5. 
`error_correction` +#### `error_correction` Values: `immediate` | `deferred` | `absent` | `route_around` Timing of backspace relative to the preceding keystroke. Median ≤500 ms -→ `immediate` (noticed fast, muscle-memory correction). Median > 500 ms -→ `deferred` (reads output then corrects). Zero backspaces but kill-line -present → `route_around` (ctrl-u / ctrl-w). No corrections at all → -`absent`. +→ `immediate`. Median > 500 ms → `deferred`. No backspaces but kill-line +present → `route_around` (ctrl-u / ctrl-w). Nothing → `absent`. -#### 6. `command_chunking` +#### `command_chunking` Values: `fluent` | `fragmented` | `single_command` Median CV of per-command intra-typing IKIs. < 0.40 → `fluent` (commands -typed as rehearsed phrases). Otherwise `fragmented`. Only one command -in session → `single_command`. +typed as rehearsed phrases). -#### 7. `shell_mastery.tab_completion` +#### `shell_mastery.tab_completion` Values: `none` | `occasional` | `habitual` -Fraction of commands containing at least one `0x09` (tab) keystroke. -0 → `none`, < 50 % → `occasional`, ≥ 50 % → `habitual`. +Fraction of commands containing ≥1 tab keystroke. 0 → `none`, +< 50 % → `occasional`, ≥50 % → `habitual`. -Operators who tab-complete heavily know the filesystem; those who never do -either memorise paths or are running a prepared script. - -#### 8. `shell_mastery.shortcut_usage` +#### `shell_mastery.shortcut_usage` Values: `none` | `moderate` | `heavy` -Readline control-byte count (ctrl-a, ctrl-e, ctrl-r, etc.) per command. -< 0.05 → `none`, < 0.15 → `moderate`, ≥ 0.15 → `heavy`. +Readline control-byte count per command. < 0.05 → `none`, +< 0.15 → `moderate`, ≥0.15 → `heavy`. -#### 9. `shell_mastery.pipe_chaining_depth` +#### `shell_mastery.pipe_chaining_depth` Values: `shallow` | `moderate` | `deep` Median pipe count per command. ≤1 → `shallow`, 2 → `moderate`, ≥3 → `deep`. 
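
The `keystroke_cadence` thresholds in the Motor section above can be illustrated with a short sketch. This is not the DECNET code: the function names, the use of the population standard deviation, taking the mean IKI over all retained bursts, and returning `None` when no burst qualifies are all assumptions made for illustration. Only the CV and mean-IKI cut-offs come from the table.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(ikis):
    """CV = standard deviation / mean (population std dev is an assumption)."""
    m = mean(ikis)
    return pstdev(ikis) / m if m > 0 else 0.0


def keystroke_cadence(typing_bursts):
    """Classify cadence from bursts of inter-keystroke intervals, in seconds.

    Hypothetical helper: returns None (no observation) when no burst has at
    least 3 IKIs, mirroring the spec's skip-over-imputation principle.
    """
    bursts = [b for b in typing_bursts if len(b) >= 3]
    if not bursts:
        return None
    cv = median([coefficient_of_variation(b) for b in bursts])
    mean_iki = mean([iki for b in bursts for iki in b])
    # Thresholds taken from the keystroke_cadence table.
    if cv < 0.30 and mean_iki < 0.030:
        return "machine"
    if cv < 0.45:
        return "steady"
    if cv < 0.70:
        return "bursty"
    return "hunt_and_peck"
```

A low CV with a mean IKI of 30 ms or more falls through to `steady`, which matches the order of the rows in the table.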
--- -### Cognitive (11) — decision-making and planning style +### Cognitive (11) -These primitives capture *how* an operator thinks — their command repertoire, -response to failure, and how much they read output before acting. +Cognitive primitives capture decision-making style, planning depth, and how +the operator processes feedback. -#### 10. `inter_command_latency_class` +#### `inter_command_latency_class` Values: `instant` | `typing_speed` | `deliberate` | `llm_lightweight` | `llm_heavyweight` | `long` Median inter-command pause bucketed against calibrated thresholds: -| Threshold | Label | What it suggests | +| Threshold | Label | Interpretation | |---|---|---| | ≤ 0.30 s | `instant` | Scripted or replay | | ≤ 1.50 s | `typing_speed` | Commands prepared, typing only | -| ≤ 2.00 s | `deliberate` | Reads output before next command | -| ≤ 8.00 s | `llm_lightweight` | May be consulting a fast LLM / notes | +| ≤ 2.00 s | `deliberate` | Reads output before acting | +| ≤ 8.00 s | `llm_lightweight` | Consulting a fast LLM or notes | | ≤ 30.00 s | `llm_heavyweight` | Consulting a slow LLM or manual reference | -| > 30.00 s | `long` | Long pauses — possibly interrupted or cautious | +| > 30.00 s | `long` | Interrupted or cautious | -`llm_lightweight` and `llm_heavyweight` were calibrated against Claude -Free (fast) and Claude (slow) assisted operator sessions — a novel class -of adversary DECNET is designed to detect. +The `llm_*` thresholds were calibrated against real sessions of Claude-assisted +operators — a novel adversary class BEHAVE-SHELL is explicitly designed to +detect. -#### 11. `command_branch_diversity` +#### `command_branch_diversity` Values: `linear_playbook` | `adaptive_branching` | `unknown` -Unique first-token / total command ratio. < 5 commands → `unknown`. -≥ 70 % unique → `linear_playbook` (each command is different — following -a prepared list). < 70 % → `adaptive_branching` (repeating tools, -iterating on a problem). +Unique first-token ratio. 
< 5 commands → `unknown`. ≥70 % unique → +`linear_playbook` (following a prepared list). < 70 % → +`adaptive_branching` (iterating on a problem). -#### 12. `feedback_loop_engagement` +#### `feedback_loop_engagement` Values: `closed_loop` | `fire_and_forget` | `unknown` Pearson correlation between per-command output bytes and the following inter-command pause. r > 0.30 → `closed_loop` (pauses longer when there -is more output to read). Otherwise `fire_and_forget`. Requires ≥5 -command/output/pause triples. +is more to read). Requires ≥5 triples. -#### 13. `inter_command_consistency` +#### `inter_command_consistency` Values: `metronomic` | `variable` | `bimodal` CV of inter-command IKIs. < 0.40 → `metronomic` (scripts, beacons). -> 1.50 → `bimodal` (two distinct paces — often short commands interleaved -with long waits for a compile or download). Otherwise `variable`. +> 1.50 → `bimodal` (short commands interleaved with long waits for +compiles or downloads). -#### 14. `cognitive_load` +#### `cognitive_load` Values: `low` | `medium` | `high` -Composite score: mean of (intra-typing CV / 1.0, error rate, pause CV / 1.5). -< 0.33 → `low`, < 0.67 → `medium`, otherwise `high`. +Composite: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5). -High cognitive load across multiple sessions on the same identity is a -signal of an operator working outside their comfort zone — new target OS, -unfamiliar tooling, or time pressure. - -#### 15. `exploration_style` +#### `exploration_style` Values: `methodical` | `targeted` | `chaotic` -`repetition_rate` = 1 − unique/total commands. -`backtrack_rate` = fraction of commands that jump back to a previously used -tool category. Backtrack ≥30 % → `chaotic`. Repetition ≥50 % → `targeted` -(narrow focus, known objective). Otherwise `methodical`. +`backtrack_rate` ≥30 % → `chaotic`. `repetition_rate` ≥50 % → `targeted`. -#### 16. 
`planning_depth` +#### `planning_depth` Values: `deep` | `reactive` | `shallow` -`deep_pause_frac` = fraction of inter-command IKIs > 2.0 s. -`reactive_frac` = fraction ≤ 0.30 s. ≥40 % deep pauses → `deep`. -≥50 % reactive → `reactive`. Otherwise `shallow`. +Fraction of inter-command IKIs > 2.0 s (deep) vs ≤ 0.30 s (reactive). -#### 17. `tool_vocabulary` +#### `tool_vocabulary` Values: `narrow` | `moderate` | `broad` -Distinct first-token count (absolute). ≤3 → `narrow`, ≥10 → `broad`. +Distinct first-token count. ≤3 → `narrow`, ≥10 → `broad`. -#### 18. `error_resilience.retry_tactic` +#### `error_resilience.retry_tactic` Values: `retry_same` | `pivot` | `fallback` -Post-error behaviour: does the operator retry the same command, switch to -a different approach, or fall back to reconnaissance? Skipped if no errors -occurred in the session. +Post-error behaviour pattern. Skipped if no errors. -#### 19. `error_resilience.frustration_typing` +#### `error_resilience.frustration_typing` Values: `low` | `moderate` | `high` Delta between median intra-IKI after an error vs. after a success. -< 10 % delta → `low`, < 30 % → `moderate`, ≥30 % → `high`. -Fast typing after errors suggests frustration; slow typing suggests -deliberation. - -#### 20. `error_resilience.fallback_to_man` +#### `error_resilience.fallback_to_man` Values: `present` | `absent` -After an error, does the next command start with `man`, `help`, or `info`? -Skipped if no errors. `present` indicates an operator consulting -documentation — less automated, less rehearsed. +After an error, does the next command start with `man`/`help`/`info`? --- -### Temporal (4) — session rhythm and pacing +### Temporal (4) -#### 21. 
`session_duration` +#### `session_duration` Values: `short` | `medium` | `long` | `marathon` -| Duration | Label | -|---|---| -| < 60 s | `short` — single recon or scan | -| < 600 s | `medium` — targeted interaction | -| < 3600 s | `long` — sustained operation | -| ≥ 3600 s | `marathon` — extended presence / slow-burn APT | +< 60 s / < 600 s / < 3600 s / ≥ 3600 s. -#### 22. `escalation_pattern` +#### `escalation_pattern` Values: `bursty` | `sustained` -Dynamic window analysis (window width = max(10 s, duration / target)). -CV and zero-window fraction classify whether activity clusters into bursts -separated by idle periods, or maintains a consistent level throughout. +Dynamic window analysis of activity density over the session lifetime. -#### 23. `landing_ritual` +#### `landing_ritual` Values: `cleanup` | `exploration` | `passive` -First ~5 commands classified by intent tokens. `cleanup` if the operator -immediately starts removing evidence; `exploration` if they run -reconnaissance commands (`id`, `whoami`, `uname`, `ls`); `passive` if -they do nothing that reveals intent. +Intent of the first ~5 commands. -#### 24. `exit_behavior` +#### `exit_behavior` Values: `cleanup` | `standard` | `anomalous` -Last ~5 commands. `cleanup` if history/log deletion or `exit`/`logout` -appears. `anomalous` if the session ends abruptly with no recognisable -closing pattern. +Intent of the last ~5 commands. --- -### Environmental (5) — operator's local setup +### Environmental (5) -These are stable across an operator's career and change only when they -switch machines or retool. +Environmental primitives are stable across an operator's career — they change +only when the operator switches machines or deliberately retools. -#### 25. `shell_type` +#### `shell_type` Values: `bash` | `sh` | `zsh` | `fish` | `unknown` -Detected from PS1 prompt regex patterns after ANSI stripping. +Detected from PS1 prompt regex patterns. -#### 26. 
`terminal_multiplexer` +#### `terminal_multiplexer` Values: `tmux` | `screen` | `none` -Detected from PS1 markers and characteristic escape sequences. +Detected from PS1 markers and escape sequences. -#### 27. `locale` +#### `locale` Values: `en-US` | `en` | `other` | `unknown` Language-specific keywords in prompt lines and error messages. -#### 28. `keyboard_layout` +#### `keyboard_layout` Values: `qwerty` | `dvorak` | `colemak` | `other` -Bigram frequency analysis of the typed character stream. Operators who -touch-type on Dvorak produce a statistically distinct bigram distribution -that persists even when typing non-English commands. +Bigram frequency analysis of the typed character stream. An operator who +touch-types on Dvorak produces a statistically distinct bigram distribution +that persists even when typing non-English commands — this is a pure +stylometric signal derived from motor habit. -#### 29. `numpad_usage` +#### `numpad_usage` Values: `occasional` | `frequent` | `none` -Keystroke pattern detection for numpad-originated digits. - --- -### Operational (4) — mission and OPSEC posture +### Operational (4) -#### 30. `objective` +#### `objective` Values: `recon` | `exfil` | `persistence` | `lateral` | `destructive` -Token-based intent classification of command first-tokens. Majority vote -across classified tokens; precedence order applied for ties. Skipped if -fewer than 3 classified tokens. +Token-based intent classification. Majority vote; skipped if < 3 +classified tokens. Example token mappings: - `recon`: `id`, `whoami`, `uname`, `cat`, `find`, `ls`, `ps`, `netstat` - `exfil`: `scp`, `curl`, `wget`, `base64`, `nc`, `rsync` -- `persistence`: `crontab`, `echo`, `tee`, `systemctl`, `rc.local` +- `persistence`: `crontab`, `echo >> ~/.bashrc`, `systemctl enable` - `lateral`: `ssh`, `xfreerdp`, `psexec`, `wmiexec` - `destructive`: `rm`, `shred`, `dd`, `mkfs`, `kill` -#### 31. 
`opsec_discipline` +#### `opsec_discipline` Values: `careful` | `learning` | `careless` -Presence of history-disabling tokens (`unset HISTFILE`, `HISTSIZE=0`, -`history -c`) and cleanup activity in the session tail. Both → `careful`. -History-only → `learning` (knows to cover tracks but forgets cleanup). -Neither → `careless`. +Presence of history-disabling tokens and cleanup activity. Both → +`careful`. History only → `learning`. Neither → `careless`. -#### 32. `cleanup_behavior` +#### `cleanup_behavior` Values: `thorough` | `partial` | `none` -Distinct cleanup tokens in the last 5 commands. ≥3 → `thorough`, -1–2 → `partial`, 0 → `none`. +Distinct cleanup tokens in the session tail. ≥3 → `thorough`, +1–2 → `partial`. -#### 33. `multi_actor_indicators` +#### `multi_actor_indicators` Values: `solo` | `handoff_detected` -Splits commands at the session's temporal midpoint and compares the median -intra-IKI of each half. If the delta exceeds 50 % and both halves have -≥4 commands, `handoff_detected` is emitted — the session was likely shared -between two operators (e.g. initial access handed to a post-exploitation -specialist). +Splits commands at the session midpoint and compares median intra-IKI of +each half. Delta > 50 % with both halves having ≥4 commands → +`handoff_detected`. Suggests the session was shared between two operators +(initial access handed to a post-exploitation specialist, or a shared +credential). --- -### Emotional valence (4) — stress and cognitive state +### Emotional valence (4) -These features have a hard confidence cap of **0.50** — they contribute to -attribution but cannot dominate it. They require ≥80 typed letters to emit. +These primitives sit at the boundary of motor and stylometric signal. They +require ≥80 typed letters and carry a hard confidence cap of **0.50** — +they contribute to attribution but cannot dominate it. -#### 34. 
`valence` +#### `valence` Values: `positive` | `neutral` | `negative` -Lexical positive/negative token counts. `positive` if positive count > -(negative + obscenity) and ≥2 positive tokens. +Lexical positive/negative token counts. `positive` requires positive count +> (negative + obscenity) with ≥2 positive tokens. -#### 35. `arousal` +#### `arousal` Values: `low_calm` | `medium_engaged` | `high_agitated` `high_agitated` if ≥5 consecutive caps, ≥3 consecutive `!`, or fastest -IKI < 60 ms on ≥30 keystrokes. `low_calm` if slowest IKI > 300 ms. -Otherwise `medium_engaged`. +IKI < 60 ms on ≥30 keystrokes. -#### 36. `stress_response` +#### `stress_response` Values: `none` | `eustress_positive` | `distress_negative` -Post-error vs baseline typing speed ratio. ≥1.20 → `eustress` (types -faster under pressure — experienced). ≤ 1/1.20 → `distress` (types -slower — less experienced or genuinely stressed). +Post-error vs baseline typing speed ratio. ≥1.20 → `eustress` (experienced, +types faster under pressure). ≤ 1/1.20 → `distress`. -#### 37. `frustration_venting` +#### `frustration_venting` Values: `low` | `moderate` | `high` -Post-error frustration token count plus obscenity count. +Post-error frustration token count plus obscenity count. A purely +lexicometric signal. --- -## Attribution state machine +## Attribution -Primitives feed a per-`(identity_uuid, primitive)` state machine in -`decnet/correlation/attribution/aggregate.py`. +BEHAVE-SHELL does not define how observations are aggregated — that is the +responsibility of the implementing system's attribution engine. 
The DECNET +reference implementation uses a five-state machine per +`(identity_uuid, primitive)`: -### States +| State | Condition | +|---|---| +| `unknown` | < 3 observations | +| `stable` | Recent N agree, no drift from older N | +| `drifting` | Recent N agree but differ from older N | +| `conflicted` | Recent N are split | +| `multi_actor` | `conflicted` + cross-session alternation | -| State | Meaning | Condition | -|---|---|---| -| `unknown` | Insufficient data | < 3 observations | -| `stable` | Consistent value | Recent N agree AND no drift from older N | -| `drifting` | Recently changed | Recent N agree BUT differ from older N | -| `conflicted` | Contradictory values | Recent N are split (high CV) | -| `multi_actor` | Multiple operators | `conflicted` + cross-session alternation | - -Window size N = 5 (categorical primitives). EWMA is used for numeric -primitives (Phase 3). - -### Multi-actor detection - -The attribution worker runs a `_multi_actor_tick` every 60 seconds. For -every `(identity, primitive)` pair in `conflicted` state, it checks whether -the alternation pattern across sessions is consistent with a credential -being shared between two distinct operators. When ≥2 primitives -independently flag `multi_actor` for the same identity, the bus emits: - -``` -attribution.profile.multi_actor_suspected - {identity_uuid, primitives: [...], evidence_summary, confidence, ts} -``` - -`confidence` is capped at 0.60 — cross-primitive agreement is the real -signal, but a hard cap prevents over-alarming on noisy primitives. - ---- - -## Database tables - -### `ObservationRow` - -One row per `(evidence_ref, primitive)`. `evidence_ref` is the session -shard identifier — the `UniqueConstraint` makes re-processing idempotent. 
- -| Column | Type | Description | -|---|---|---| -| `id` | UUID PK | | -| `identity_uuid` | FK → `attacker_identities` | | -| `attacker_uuid` | FK → `attackers` | Direct link for pre-clusterer path | -| `evidence_ref` | TEXT | Shard ID | -| `primitive` | TEXT | e.g. `keystroke_cadence` | -| `value` | TEXT | Categorical label or serialised numeric | -| `confidence` | FLOAT | 0.0–1.0 | -| `observed_at` | DATETIME | Session end time | - -### `AttributionStateRow` - -One row per `(identity_uuid, primitive)`. Updated by the attribution -worker each time a new observation arrives. - -| Column | Type | Description | -|---|---|---| -| `identity_uuid` | FK → `attacker_identities` | | -| `primitive` | TEXT | | -| `state` | TEXT | `unknown`/`stable`/`drifting`/`conflicted`/`multi_actor` | -| `current_value` | TEXT | Most recent or EWMA value | -| `confidence` | FLOAT | | -| `observation_count` | INT | Total observations aggregated | -| `last_observation_ts` | DATETIME | | - ---- - -## Key thresholds - -All calibration constants live in `decnet/profiler/behave_shell/_thresholds.py` -(416 lines). The values below are the defaults; they can be overridden per -deployment without touching feature code. 
- -| Constant | Value | Used by | -|---|---|---| -| `PASTE_MIN_CHARS_PER_EVENT` | 4 | Paste detection | -| `PASTE_BURST_MAX_IAT_S` | 0.20 | Paste burst grouping | -| `MODALITY_PASTED_MIN` | 0.40 | `input_modality` | -| `CV_STEADY_MAX` | 0.45 | `keystroke_cadence` | -| `TREMOR_FAST_FLOOR_S` | 0.030 | `motor_stability` | -| `IKI_THINK_MAX_S` | 2.0 | Typing-burst split | -| `INTER_CMD_INSTANT_MAX` | 0.30 s | `inter_command_latency_class` | -| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s | LLM-assisted detection | -| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s | LLM-assisted detection | -| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 | `command_branch_diversity` | -| `FEEDBACK_CORRELATION_MIN` | 0.30 | `feedback_loop_engagement` | -| `PAUSE_CV_METRONOMIC_MAX` | 0.40 | `inter_command_consistency` | -| `PAUSE_CV_BIMODAL_MIN` | 1.50 | `inter_command_consistency` | -| `SESSION_DURATION_SHORT_MAX` | 60 s | `session_duration` | -| `SESSION_DURATION_MEDIUM_MAX` | 600 s | `session_duration` | -| `SESSION_DURATION_LONG_MAX` | 3600 s | `session_duration` | -| `MIN_OBSERVATIONS_FOR_STATE` | 3 | Attribution state machine | -| `CATEGORICAL_WINDOW_N` | 5 | Attribution window | -| `MULTI_ACTOR_TICK_SECS` | 60 | Multi-actor tick | -| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 | All `emotional_valence` features | +Window N = 5 for categorical primitives. When ≥2 primitives independently +reach `multi_actor` for the same identity, the engine emits a +`multi_actor_suspected` signal — a strong indicator of a shared credential +or a compromised operator account. 
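
The five-state table above could be approximated for categorical primitives as follows. This is a hedged sketch rather than the logic in `aggregate.py`: `classify_state` is a hypothetical name, "agree" is read as unanimous within the window, "split" as any disagreement, and the cross-session alternation check is reduced to a boolean supplied by the caller. Only the minimum of 3 observations and the window size N = 5 come from the text.

```python
from collections import Counter

MIN_OBSERVATIONS_FOR_STATE = 3   # from the spec
CATEGORICAL_WINDOW_N = 5         # from the spec


def classify_state(values, alternates_across_sessions=False):
    """Return the attribution state for one (identity, primitive) pair.

    `values` is the chronological list of observed labels. The flag
    `alternates_across_sessions` stands in for the real alternation
    analysis, which is not specified here (assumption).
    """
    if len(values) < MIN_OBSERVATIONS_FOR_STATE:
        return "unknown"
    n = CATEGORICAL_WINDOW_N
    recent = values[-n:]
    older = values[-2 * n:-n]          # empty when history is short
    if len(set(recent)) > 1:           # recent window is split
        return "multi_actor" if alternates_across_sessions else "conflicted"
    if older and Counter(older).most_common(1)[0][0] != recent[0]:
        return "drifting"              # recent agrees, but differs from older
    return "stable"
```

For example, five `bursty` observations followed by five `steady` ones would be reported as `drifting`, while an alternating sequence inside the recent window would be `conflicted`.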
--- ## Calibration -The system was calibrated against five behavioural classes across 15 sessions -(424 total observations): +The reference thresholds were calibrated against five behavioural classes +across 15 sessions (424 total observations): | Class | Sessions | Observations | Description | |---|---|---|---| -| `HUMAN` | 1 | 34 | Human operator, no assistance | +| `HUMAN` | 1 | 34 | Human operator, unassisted | | `YOU-sim` | 2 | 59 | Human-simulated scripted attacker | | `LW-sim` | 5 | 136 | Lightweight LLM-assisted operator | -| `CLAUDE-FF` | 3 | 84 | Claude (fast/free tier) assisted | -| `CLAUDE-CL` | 4 | 111 | Claude (standard tier) assisted | +| `CLAUDE-FF` | 3 | 84 | Claude (fast) assisted | +| `CLAUDE-CL` | 4 | 111 | Claude (standard) assisted | -All classes emit ≥27 distinct primitives (pass threshold). - -The `inter_command_latency_class` thresholds `llm_lightweight` (≤8 s) and -`llm_heavyweight` (≤30 s) were derived from timing measurements of these -sessions — DECNET can distinguish a human-with-fast-LLM from an unassisted -human in a single session with moderate confidence, and with high confidence -across 3+ sessions. +All classes emit ≥27 distinct primitives. The `inter_command_latency_class` +LLM buckets are the primary discriminator between unassisted and +LLM-assisted operators in single-session analysis; cross-session attribution +uses the full primitive set. --- -## Testing +## Key thresholds (reference implementation) -```bash -# Offline smoke test — 5 shards, mock bus, must emit ≥27 distinct per class -scripts/behave_shell/smoke.sh +All constants live in `_thresholds.py`. 

-# Live round-trip — replay calibration shards through a running DECNET
-scripts/behave_shell/replay_calibration.py
-```
+| Constant | Value |
+|---|---|
+| `PASTE_MIN_CHARS_PER_EVENT` | 4 |
+| `PASTE_BURST_MAX_IAT_S` | 0.20 |
+| `IKI_THINK_MAX_S` | 2.0 (typing-burst split) |
+| `TREMOR_FAST_FLOOR_S` | 0.030 |
+| `CV_STEADY_MAX` | 0.45 |
+| `INTER_CMD_INSTANT_MAX` | 0.30 s |
+| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s |
+| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s |
+| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 |
+| `FEEDBACK_CORRELATION_MIN` | 0.30 |
+| `PAUSE_CV_METRONOMIC_MAX` | 0.40 |
+| `PAUSE_CV_BIMODAL_MIN` | 1.50 |
+| `SESSION_DURATION_SHORT_MAX` | 60 s |
+| `SESSION_DURATION_MEDIUM_MAX` | 600 s |
+| `SESSION_DURATION_LONG_MAX` | 3600 s |
+| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 |
+| `MIN_OBSERVATIONS_FOR_STATE` | 3 |
+| `CATEGORICAL_WINDOW_N` | 5 |

 ---

-## File reference
+## DECNET implementation

-```
-decnet/profiler/behave_shell/
-  __init__.py            Public API: extract_session()
-  extract.py             Entry point — fans out to FEATURES registry (51 lines)
-  _ctx.py                SessionContext builder (573 lines)
-  _parse.py              Asciinema JSONL parsing (272 lines)
-  _handler.py            Bus subscriber — disk I/O, persistence, publish (235 lines)
-  _intent.py             Token → intent classification (115 lines)
-  _thresholds.py         All calibration constants (416 lines)
-  _features/
-    __init__.py          FEATURES registry — list of 37 functions (104 lines)
-    motor.py             Primitives 1–9 (422 lines)
-    cognitive.py         Primitives 10–20 (593 lines)
-    temporal.py          Primitives 21–24 (237 lines)
-    environmental.py     Primitives 25–29 (352 lines)
-    operational.py       Primitives 30–33 (218 lines)
-    emotional_valence.py Primitives 34–37 (223 lines)
+In DECNET, BEHAVE-SHELL extraction is invoked by the profiler worker on every
+`attacker.session.ended` bus event. The worker reads the PTY shard from disk,
+runs `extract_session()`, and upserts one `ObservationRow` per primitive per
+session. A `UniqueConstraint(evidence_ref, primitive)` makes re-processing
+idempotent.

-decnet/correlation/
-  attribution_worker.py  Bus loop: consume observations, run tick
-  attribution/
-    aggregate.py         State machine: unknown→stable→drifting→conflicted→multi_actor
-    _thresholds.py       Attribution-layer thresholds
+The attribution worker consumes `attacker.observation.*` bus events and
+maintains one `AttributionStateRow` per `(identity_uuid, primitive)`.

-decnet/web/db/models/
-  observations.py        ObservationRow schema
-  attribution_state.py   AttributionStateRow schema
-```
+Source: `decnet/profiler/behave_shell/` (~3 868 lines across 12 files).

 ---

-## Related pages
+## See also

-- [Fingerprinting](Fingerprinting) — all fingerprint layers, including the
-  BEHAVE-SHELL summary
-- [Identity-Resolution](Identity-Resolution) — how observations are clustered
-  into attacker identities and how state machine transitions propagate
-- [Service-Personas](Service-Personas) — enabling session recording and
-  BEHAVE-SHELL per service
+- **BEHAVE-TEXT** — sibling spec for written-text stylometry and lexicometry,
+  implemented by [EYENET](https://github.com/xmartlab/eyenet)
+- [Fingerprinting](Fingerprinting) — all DECNET fingerprint layers
+- [Identity-Resolution](Identity-Resolution) — how observations feed the
+  identity clusterer