docs: reframe BEHAVE-SHELL as a spec, not a DECNET component — add stylometry/lexicometry scope, BEHAVE-TEXT/EYENET cross-reference

2026-05-10 04:31:36 -04:00
parent 58915d8115
commit 5a39211645

@@ -1,95 +1,106 @@
# BEHAVE-SHELL # BEHAVE-SHELL
BEHAVE-SHELL is DECNET's behavioural biometrics engine for interactive shell BEHAVE-SHELL is a **behavioural biometrics specification** for interactive
sessions. It transforms raw PTY recordings into 37 attribution primitives shell sessions. It defines a set of attribution primitives — observable,
that fingerprint *how* an operator works — their motor patterns, cognitive computable signals — that characterise *how* an operator works at a terminal,
style, OPSEC habits, and emotional state — independently of what IP address independently of what IP address, credential, or tooling they use.
or tooling they use.
The primitives feed the [Identity-Resolution](Identity-Resolution) attribution The spec was born out of DECNET's need to correlate attackers across sessions
state machine, which accumulates evidence across sessions to answer: *is this and IP changes, but it is not DECNET-specific. Any system that records PTY
the same hands?* sessions can implement BEHAVE-SHELL extraction and feed the resulting
primitives into an attribution engine. DECNET is the reference implementation.
A sibling specification, **BEHAVE-TEXT**, defines equivalent primitives for
written text — stylometry, lexicometry, and discourse structure — and is
implemented by [EYENET](https://github.com/xmartlab/eyenet).
---
## Scope
BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus. The
current specification covers three broad domains:
| Domain | What it captures |
|---|---|
| **Motor biometrics** | Keystroke timing, error correction, paste vs. type habits, shell mastery signals |
| **Cognitive / behavioural** | Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure |
| **Stylometry / lexicometry** | Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions |
The emotional valence cluster (`valence`, `arousal`, `stress_response`,
`frustration_venting`) sits at the boundary of motor and stylometric signal —
it measures both typing speed changes and lexical content after stress events.
--- ---
## Design principles ## Design principles
- **Pure extraction library.** `extract_session()` takes an iterable of - **Extraction is pure.** The spec defines a function
asciinema events and yields `Observation` envelopes. No I/O, no DB access, `extract_session(events) → Observations` that takes an iterable of timestamped
no bus calls. The worker owns all side effects. PTY events and yields structured observations. No I/O. No database.
- **PII by design.** Command text is never stored in plain form — only the No side effects. Implementations are free to run this in any context.
- **PII by design.** Command text is never stored in plain form. Only the
SHA-256 of the first token is retained. Output is reduced to a byte count SHA-256 of the first token is retained. Output is reduced to a byte count
and an error verdict. Prompt lines are ANSI-stripped and capped at 256 and an error verdict. Prompt lines are ANSI-stripped and capped at 256
characters. characters. Raw bigram/unigram counts are used for layout fingerprinting —
- **Idempotent persistence.** `UniqueConstraint(evidence_ref, primitive)` not the text itself.
on the observations table means replaying a shard never duplicates rows.
- **Confidence capping.** Emotional-valence features carry a hard confidence - **Confidence is explicit.** Every observation carries a confidence value
cap of 0.50; they contribute, but never dominate an attribution decision. [0.0–1.0]. Features that are inherently noisier have hard confidence caps
(emotional valence: 0.50). Attribution engines must propagate confidence
rather than treating all observations as equal.
- **Skip conditions over imputation.** A feature that cannot be computed on a
given session (e.g. `error_resilience` features when no errors occurred)
yields no observation rather than a default value. Attribution engines
treat absence of an observation differently from an `unknown` state.
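
As an illustration of the "PII by design" principle, the first-token hashing could look roughly like the sketch below. The helper name `first_token_hash` mirrors the field name used in this document, but the whitespace tokenisation and the handling of empty commands are assumptions, not the reference behaviour.

```python
import hashlib
from typing import Optional


def first_token_hash(command: str) -> Optional[str]:
    """Return the SHA-256 hex digest of the first whitespace-delimited token.

    Assumption: tokens are split on whitespace, and an empty command
    yields None so that no observation is derived from it.
    """
    tokens = command.split()
    if not tokens:
        return None
    # surrogateescape keeps non-UTF-8 bytes round-trippable, as the spec requires
    raw = tokens[0].encode("utf-8", "surrogateescape")
    return hashlib.sha256(raw).hexdigest()
```

Only the digest leaves the extractor, so two sessions that both start commands with `ls` produce the same hash without the command text being stored.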
--- ---
## Data flow ## Input format
``` BEHAVE-SHELL operates on **asciinema-compatible event streams**: sequences of
PTY session `(t: float, ch: "i"|"o", d: str)` tuples representing timestamped input and
output chunks from a PTY session. `"i"` is operator input; `"o"` is terminal
output. Non-UTF-8 bytes are handled via surrogateescape.
sessrec.c — writes JSONL shard per session
│ {"sid": id, "t": ts, "ch": "i"|"o", "d": data} The DECNET implementation records these as JSONL shards via `sessrec.c`:
│ Non-UTF-8 bytes handled via surrogateescape
```json
attacker.session.ended bus event {"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"}
{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."}
_handler.handle_session_ended()
│ Reads shard from disk → parse_shard_line() → AsciinemaEvent tuples
build_session_context() (_ctx.py, ~573 lines)
│ Seven derivation steps (see below)
extract_session() (extract.py)
│ Fan-out across 37 registered feature functions (FEATURES registry)
│ Each yields 0..N Observation envelopes
Upsert ObservationRow → publish attacker.observation.*
attribution_worker (attribution_worker.py)
│ Consumes attacker.observation.> bus events
│ Runs aggregate() per (identity_uuid, primitive)
AttributionStateRow state ∈ {unknown, stable, drifting, conflicted, multi_actor}
``` ```
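
A minimal sketch of turning one JSONL shard line into an event tuple, assuming the field names shown in the example records (`t`, `ch`, `d`). The `Event` type and the validation rules are hypothetical; the reference parser may differ.

```python
import json
from typing import NamedTuple


class Event(NamedTuple):
    t: float   # seconds since session start
    ch: str    # "i" = operator input, "o" = terminal output
    d: str     # data chunk


def parse_shard_line(line: str) -> Event:
    """Parse one JSONL record into an Event (illustrative, not the reference parser)."""
    rec = json.loads(line)
    if rec["ch"] not in ("i", "o"):
        raise ValueError("unexpected channel: %r" % (rec["ch"],))
    return Event(float(rec["t"]), rec["ch"], rec["d"])
```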
--- ---
## Session context derivation ## Session context derivation
`build_session_context()` performs a single-pass walk over the raw event Before feature extraction, a single-pass walk over the event stream builds a
stream and produces a `SessionContext` that all 37 feature functions read. `SessionContext` — a set of derived signals that all feature functions share.
The seven derivation steps, in order: The derivation steps, in order:
| Step | What it computes | | Step | Output |
|---|---| |---|---|
| **Paste-burst detection** | Groups consecutive paste-class events (≥4 chars within 200 ms) into `paste_bursts` | | **Paste-burst detection** | Groups consecutive paste-class events (≥4 chars, within 200 ms) into `paste_bursts` |
| **Typing-burst segmentation** | Splits the keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]` (dropped if < 3 IATs) | | **Typing-burst segmentation** | Splits keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]`; drops bursts < 3 IKIs |
| **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line sequences (`0x15`, `0x17`); records IATs between each backspace and the preceding keystroke | | **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line (`0x15`, `0x17`); records IKI between each backspace and the preceding keystroke |
| **Per-command intra-typing IATs** | For each command, extracts keystroke inter-arrival times from that command's span only | | **Per-command intra-typing IKIs** | For each command, IKIs from that command's span only |
| **Command segmentation** | Splits on `\r`/`\n`; per command records `first_token_hash` (SHA-256), tab count, readline shortcut count, and pipe count | | **Command segmentation** | Splits on `\r`/`\n`; per command: `first_token_hash` (SHA-256), tab count, readline shortcut count, pipe count |
| **Inter-command IAT gaps** | Time between consecutive commands | | **Inter-command IKI gaps** | Time between consecutive commands |
| **Error detection** | Scans output between commands for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` | | **Error detection** | Scans output for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` |
| **PS1 prompt detection** | Regex for `$`, `#`, `%`, `>` suffix after ANSI stripping; caps at 256 chars | | **PS1 prompt detection** | Regex for `$`, `#`, `%`, `>` suffix; ANSI-stripped, capped at 256 chars |
| **Keyboard layout fingerprinting** | Builds unigram and bigram histograms from typed letters | | **Keyboard layout fingerprinting** | Unigram and bigram histograms from typed letters |
| **Lexical counters** | Obscenity hits, positive/negative sentiment tokens, max caps run, max consecutive `!` run | | **Lexical counters** | Obscenity hits, positive/negative sentiment tokens, max caps run, max consecutive `!` run |
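
The typing-burst segmentation step can be sketched as follows. This assumes the input is a flat sequence of inter-keystroke intervals in seconds and that the think-pause itself is discarded rather than assigned to either burst; the function and constant names are illustrative.

```python
THINK_PAUSE_S = 2.0    # pauses longer than this split the stream
MIN_BURST_IKIS = 3     # bursts with fewer intervals are dropped


def segment_typing_bursts(ikis):
    """Split a flat IKI sequence into bursts at think-pauses."""
    bursts, current = [], []
    for iki in ikis:
        if iki > THINK_PAUSE_S:
            # Close the current burst; the pause itself is not kept (assumption).
            if len(current) >= MIN_BURST_IKIS:
                bursts.append(tuple(current))
            current = []
        else:
            current.append(iki)
    if len(current) >= MIN_BURST_IKIS:
        bursts.append(tuple(current))
    return tuple(bursts)
```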
### Key data structures ### Key structures
``` ```
SessionContext SessionContext
sid: str sid: str
t_start, t_end, duration_s: float t_start, t_end, duration_s: float
input_events, output_events: tuple[AsciinemaEvent] input_events, output_events: tuple[Event]
iats: tuple[float] # inter-keystroke intervals iats: tuple[float] # inter-keystroke intervals
paste_bursts: tuple[PasteBurst] paste_bursts: tuple[PasteBurst]
typing_bursts: tuple[tuple[float]] typing_bursts: tuple[tuple[float]]
backspace_count, kill_line_count: int backspace_count, kill_line_count: int
@@ -104,7 +115,7 @@ SessionContext
Command Command
start_ts, end_ts: float start_ts, end_ts: float
first_token_hash: str # SHA-256 of first token only first_token_hash: str # SHA-256, first token only
tab_count, shortcut_count, pipe_count: int tab_count, shortcut_count, pipe_count: int
errored: bool errored: bool
output_bytes: int output_bytes: int
@@ -121,507 +132,373 @@ PromptLine
## The 37 primitives ## The 37 primitives
### Motor (9) — muscle memory and physical interaction style ### Motor (9)
These primitives capture *how* an operator's fingers interact with the Motor primitives capture muscle memory and physical interaction patterns.
keyboard — patterns that persist across sessions, accounts, and even They are among the most stable signals across sessions and across different
operating systems. machines used by the same operator.
#### 1. `input_modality` #### `input_modality`
Values: `typed` | `pasted` | `mixed` Values: `typed` | `pasted` | `mixed`
Ratio of paste events to total input events. ≥40 % pasted and ≤5 % Ratio of paste events to total input events. ≥40 % pasted and ≤5 % typed
typed → `pasted`; ≤5 % pasted → `typed`; otherwise `mixed`. → `pasted`. ≤5 % pasted → `typed`. Otherwise `mixed`.
A script kiddie running pre-written one-liners pastes habitually. A A script kiddie running pre-written one-liners pastes habitually. A
seasoned operator types most commands from memory. seasoned operator types most commands from memory.
#### 2. `paste_burst_rate` #### `paste_burst_rate`
Values: `none` | `occasional` | `habitual` Values: `none` | `occasional` | `habitual`
Coarser bucketing of the paste ratio. ≥50 % → `habitual`, Coarser paste-ratio bucketing. ≥50 % → `habitual`, ≥10 % → `occasional`.
≥10 % → `occasional`.
#### 3. `keystroke_cadence` #### `keystroke_cadence`
Values: `steady` | `bursty` | `hunt_and_peck` | `machine` Values: `steady` | `bursty` | `hunt_and_peck` | `machine`
Median coefficient of variation (CV) of within-burst inter-keystroke Median coefficient of variation (CV) of within-burst inter-keystroke
intervals (IKIs). intervals (IKIs):
| CV | Mean IKI | Label | | CV | Mean IKI | Label |
|---|---|---| |---|---|---|
| < 0.30 | < 30 ms | `machine` | | < 0.30 | < 30 ms | `machine` — inhumanly uniform |
| < 0.45 | any | `steady` | | < 0.45 | any | `steady` — trained touch typist |
| < 0.70 | any | `bursty` | | < 0.70 | any | `bursty` — thinks between phrases |
| ≥ 0.70 | any | `hunt_and_peck` | | ≥ 0.70 | any | `hunt_and_peck` |
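
A hedged sketch of the cadence table as a classifier. It assumes IKIs are in seconds, that CV is computed with the population standard deviation, and that the mean IKI is taken over all bursts combined; none of these choices is confirmed by the spec text.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def keystroke_cadence(typing_bursts):
    """Classify cadence from per-burst IKI tuples; None if there is no data."""
    bursts = [b for b in typing_bursts if b]
    if not bursts:
        return None  # skip condition: nothing to measure
    cv = median([coefficient_of_variation(b) for b in bursts])
    mean_iki = mean([iki for b in bursts for iki in b])
    if cv < 0.30 and mean_iki < 0.030:
        return "machine"
    if cv < 0.45:
        return "steady"
    if cv < 0.70:
        return "bursty"
    return "hunt_and_peck"
```

Note that `machine` is checked first: a uniform, very fast stream would otherwise be labelled `steady`.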
`machine` catches automated input that passes as human visually but has #### `motor_stability`
inhumanly uniform inter-key timing.
#### 4. `motor_stability`
Values: `steady` | `variable` | `tremor` Values: `steady` | `variable` | `tremor`
Fraction of IKIs below the tremor floor (30 ms). ≥20 % → `tremor` Fraction of IKIs below 30 ms. ≥20 % → `tremor` (physiological or
(physiological or tool-simulated). Otherwise the median CV classifies tool-simulated). Otherwise CV classifies `steady` vs `variable`.
`steady` vs `variable`.
#### 5. `error_correction` #### `error_correction`
Values: `immediate` | `deferred` | `absent` | `route_around` Values: `immediate` | `deferred` | `absent` | `route_around`
Timing of backspace relative to the preceding keystroke. Median ≤500 ms Timing of backspace relative to the preceding keystroke. Median ≤500 ms
→ `immediate` (noticed fast, muscle-memory correction). Median > 500 ms → `immediate`. Median > 500 ms → `deferred`. No backspaces but kill-line
→ `deferred` (reads output then corrects). Zero backspaces but kill-line present → `route_around` (ctrl-u / ctrl-w). Nothing → `absent`.
present → `route_around` (ctrl-u / ctrl-w). No corrections at all →
`absent`.
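
The `error_correction` rules can be expressed as a small decision function. The sketch assumes backspace latencies are supplied in seconds and that kill-line usage is passed as a count; both parameter names are illustrative.

```python
from statistics import median

IMMEDIATE_MAX_S = 0.5  # the 500 ms boundary from the description above


def error_correction(backspace_latencies, kill_line_count):
    """Classify correction style from backspace timing and kill-line usage."""
    if backspace_latencies:
        if median(backspace_latencies) <= IMMEDIATE_MAX_S:
            return "immediate"
        return "deferred"
    if kill_line_count > 0:
        return "route_around"  # ctrl-u / ctrl-w instead of backspace
    return "absent"
```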
#### 6. `command_chunking` #### `command_chunking`
Values: `fluent` | `fragmented` | `single_command` Values: `fluent` | `fragmented` | `single_command`
Median CV of per-command intra-typing IKIs. < 0.40 → `fluent` (commands Median CV of per-command intra-typing IKIs. < 0.40 → `fluent` (commands
typed as rehearsed phrases). Otherwise `fragmented`. Only one command typed as rehearsed phrases).
in session → `single_command`.
#### 7. `shell_mastery.tab_completion` #### `shell_mastery.tab_completion`
Values: `none` | `occasional` | `habitual` Values: `none` | `occasional` | `habitual`
Fraction of commands containing at least one `0x09` (tab) keystroke. Fraction of commands containing ≥1 tab keystroke. 0 → `none`,
0 → `none`, < 50 % → `occasional`, ≥ 50 % → `habitual`. < 50 % → `occasional`, ≥50 % → `habitual`.
Operators who tab-complete heavily know the filesystem; those who never do #### `shell_mastery.shortcut_usage`
either memorise paths or are running a prepared script.
#### 8. `shell_mastery.shortcut_usage`
Values: `none` | `moderate` | `heavy` Values: `none` | `moderate` | `heavy`
Readline control-byte count (ctrl-a, ctrl-e, ctrl-r, etc.) per command. Readline control-byte count per command. < 0.05 → `none`,
< 0.05 → `none`, < 0.15 → `moderate`, ≥ 0.15 → `heavy`. < 0.15 → `moderate`, ≥0.15 → `heavy`.
#### 9. `shell_mastery.pipe_chaining_depth` #### `shell_mastery.pipe_chaining_depth`
Values: `shallow` | `moderate` | `deep` Values: `shallow` | `moderate` | `deep`
Median pipe count per command. ≤1 → `shallow`, 2 → `moderate`, ≥3 → `deep`. Median pipe count per command. ≤1 → `shallow`, 2 → `moderate`, ≥3 → `deep`.
--- ---
### Cognitive (11) — decision-making and planning style ### Cognitive (11)
These primitives capture *how* an operator thinks — their command repertoire, Cognitive primitives capture decision-making style, planning depth, and how
response to failure, and how much they read output before acting. the operator processes feedback.
#### 10. `inter_command_latency_class` #### `inter_command_latency_class`
Values: `instant` | `typing_speed` | `deliberate` | `llm_lightweight` | `llm_heavyweight` | `long` Values: `instant` | `typing_speed` | `deliberate` | `llm_lightweight` | `llm_heavyweight` | `long`
Median inter-command pause bucketed against calibrated thresholds: Median inter-command pause bucketed against calibrated thresholds:
| Threshold | Label | What it suggests | | Threshold | Label | Interpretation |
|---|---|---| |---|---|---|
| ≤ 0.30 s | `instant` | Scripted or replay | | ≤ 0.30 s | `instant` | Scripted or replay |
| ≤ 1.50 s | `typing_speed` | Commands prepared, typing only | | ≤ 1.50 s | `typing_speed` | Commands prepared, typing only |
| ≤ 2.00 s | `deliberate` | Reads output before next command | | ≤ 2.00 s | `deliberate` | Reads output before acting |
| ≤ 8.00 s | `llm_lightweight` | May be consulting a fast LLM / notes | | ≤ 8.00 s | `llm_lightweight` | Consulting a fast LLM or notes |
| ≤ 30.00 s | `llm_heavyweight` | Consulting a slow LLM or manual reference | | ≤ 30.00 s | `llm_heavyweight` | Consulting a slow LLM or manual reference |
| > 30.00 s | `long` | Long pauses — possibly interrupted or cautious | | > 30.00 s | `long` | Interrupted or cautious |
`llm_lightweight` and `llm_heavyweight` were calibrated against Claude The `llm_*` thresholds were calibrated against real sessions of Claude-assisted
Free (fast) and Claude (slow) assisted operator sessions — a novel class operators — a novel adversary class BEHAVE-SHELL is explicitly designed to
of adversary DECNET is designed to detect. detect.
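
The threshold table maps directly onto a sorted-boundary lookup. The sketch below assumes pauses are in seconds and that an empty session yields no observation; the function and constant names are illustrative.

```python
from bisect import bisect_left
from statistics import median

# Upper bounds (inclusive) for every label except the last one.
_BOUNDS = (0.30, 1.50, 2.00, 8.00, 30.00)
_LABELS = (
    "instant",
    "typing_speed",
    "deliberate",
    "llm_lightweight",
    "llm_heavyweight",
    "long",
)


def inter_command_latency_class(pauses):
    """Bucket the median inter-command pause; None if there are no pauses."""
    if not pauses:
        return None
    # bisect_left keeps a value equal to a bound in the lower bucket (<=).
    return _LABELS[bisect_left(_BOUNDS, median(pauses))]
```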
#### 11. `command_branch_diversity` #### `command_branch_diversity`
Values: `linear_playbook` | `adaptive_branching` | `unknown` Values: `linear_playbook` | `adaptive_branching` | `unknown`
Unique first-token / total command ratio. < 5 commands → `unknown`. Unique first-token ratio. < 5 commands → `unknown`. ≥70 % unique →
≥ 70 % unique → `linear_playbook` (each command is different — following `linear_playbook` (following a prepared list). < 70 % →
a prepared list). < 70 % → `adaptive_branching` (repeating tools, `adaptive_branching` (iterating on a problem).
iterating on a problem).
#### 12. `feedback_loop_engagement` #### `feedback_loop_engagement`
Values: `closed_loop` | `fire_and_forget` | `unknown` Values: `closed_loop` | `fire_and_forget` | `unknown`
Pearson correlation between per-command output bytes and the following Pearson correlation between per-command output bytes and the following
inter-command pause. r > 0.30 → `closed_loop` (pauses longer when there inter-command pause. r > 0.30 → `closed_loop` (pauses longer when there
is more output to read). Otherwise `fire_and_forget`. Requires ≥5 is more to read). Requires ≥5 triples.
command/output/pause triples.
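
A minimal sketch of the correlation test, with Pearson's r written out by hand to avoid depending on a particular Python version. Returning `unknown` below five triples and treating a zero-variance series as r = 0 are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def feedback_loop_engagement(output_bytes, following_pauses):
    """Classify whether pauses grow with the amount of output to read."""
    if len(output_bytes) != len(following_pauses):
        raise ValueError("inputs must be aligned per command")
    if len(output_bytes) < 5:
        return "unknown"
    r = pearson_r(output_bytes, following_pauses)
    return "closed_loop" if r > 0.30 else "fire_and_forget"
```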
#### 13. `inter_command_consistency` #### `inter_command_consistency`
Values: `metronomic` | `variable` | `bimodal` Values: `metronomic` | `variable` | `bimodal`
CV of inter-command IKIs. < 0.40 → `metronomic` (scripts, beacons). CV of inter-command IKIs. < 0.40 → `metronomic` (scripts, beacons).
> 1.50 → `bimodal` (two distinct paces — often short commands interleaved > 1.50 → `bimodal` (short commands interleaved with long waits for
with long waits for a compile or download). Otherwise `variable`. compiles or downloads).
#### 14. `cognitive_load` #### `cognitive_load`
Values: `low` | `medium` | `high` Values: `low` | `medium` | `high`
Composite score: mean of (intra-typing CV / 1.0, error rate, pause CV / 1.5). Composite: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5).
< 0.33 → `low`, < 0.67 → `medium`, otherwise `high`.
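
The composite score is a plain mean of three normalised terms. In this sketch the inputs are assumed to be precomputed (intra-typing CV, error rate as a fraction, pause CV) and no clamping is applied, since the text does not mention any.

```python
def cognitive_load(intra_typing_cv, error_rate, pause_cv):
    """Mean of the three normalised terms, bucketed into low / medium / high."""
    score = (intra_typing_cv / 1.0 + error_rate + pause_cv / 1.5) / 3.0
    if score < 0.33:
        return "low"
    if score < 0.67:
        return "medium"
    return "high"
```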
High cognitive load across multiple sessions on the same identity is a #### `exploration_style`
signal of an operator working outside their comfort zone — new target OS,
unfamiliar tooling, or time pressure.
#### 15. `exploration_style`
Values: `methodical` | `targeted` | `chaotic` Values: `methodical` | `targeted` | `chaotic`
`repetition_rate` = 1 − unique/total commands. `backtrack_rate` ≥30 % → `chaotic`. `repetition_rate` ≥50 % → `targeted`.
`backtrack_rate` = fraction of commands that jump back to a previously used
tool category. Backtrack ≥30 % → `chaotic`. Repetition ≥50 % → `targeted`
(narrow focus, known objective). Otherwise `methodical`.
#### 16. `planning_depth` #### `planning_depth`
Values: `deep` | `reactive` | `shallow` Values: `deep` | `reactive` | `shallow`
`deep_pause_frac` = fraction of inter-command IKIs > 2.0 s. Fraction of inter-command IKIs > 2.0 s (deep) vs ≤ 0.30 s (reactive).
`reactive_frac` = fraction ≤ 0.30 s. ≥40 % deep pauses → `deep`.
≥50 % reactive → `reactive`. Otherwise `shallow`.
#### 17. `tool_vocabulary` #### `tool_vocabulary`
Values: `narrow` | `moderate` | `broad` Values: `narrow` | `moderate` | `broad`
Distinct first-token count (absolute). ≤3 → `narrow`, ≥10 → `broad`. Distinct first-token count. ≤3 → `narrow`, ≥10 → `broad`.
#### 18. `error_resilience.retry_tactic` #### `error_resilience.retry_tactic`
Values: `retry_same` | `pivot` | `fallback` Values: `retry_same` | `pivot` | `fallback`
Post-error behaviour: does the operator retry the same command, switch to Post-error behaviour pattern. Skipped if no errors.
a different approach, or fall back to reconnaissance? Skipped if no errors
occurred in the session.
#### 19. `error_resilience.frustration_typing` #### `error_resilience.frustration_typing`
Values: `low` | `moderate` | `high` Values: `low` | `moderate` | `high`
Delta between median intra-IKI after an error vs. after a success. Delta between median intra-IKI after an error vs. after a success.
< 10 % delta → `low`, < 30 % → `moderate`, ≥30 % → `high`.
Fast typing after errors suggests frustration; slow typing suggests #### `error_resilience.fallback_to_man`
deliberation.
#### 20. `error_resilience.fallback_to_man`
Values: `present` | `absent` Values: `present` | `absent`
After an error, does the next command start with `man`, `help`, or `info`? After an error, does the next command start with `man`/`help`/`info`?
Skipped if no errors. `present` indicates an operator consulting
documentation — less automated, less rehearsed.
--- ---
### Temporal (4) — session rhythm and pacing ### Temporal (4)
#### 21. `session_duration` #### `session_duration`
Values: `short` | `medium` | `long` | `marathon` Values: `short` | `medium` | `long` | `marathon`
| Duration | Label | < 60 s / < 600 s / < 3600 s / ≥ 3600 s.
|---|---|
| < 60 s | `short` — single recon or scan |
| < 600 s | `medium` — targeted interaction |
| < 3600 s | `long` — sustained operation |
| ≥ 3600 s | `marathon` — extended presence / slow-burn APT |
#### 22. `escalation_pattern` #### `escalation_pattern`
Values: `bursty` | `sustained` Values: `bursty` | `sustained`
Dynamic window analysis (window width = max(10 s, duration / target)). Dynamic window analysis of activity density over the session lifetime.
CV and zero-window fraction classify whether activity clusters into bursts
separated by idle periods, or maintains a consistent level throughout.
#### 23. `landing_ritual` #### `landing_ritual`
Values: `cleanup` | `exploration` | `passive` Values: `cleanup` | `exploration` | `passive`
First ~5 commands classified by intent tokens. `cleanup` if the operator Intent of the first ~5 commands.
immediately starts removing evidence; `exploration` if they run
reconnaissance commands (`id`, `whoami`, `uname`, `ls`); `passive` if
they do nothing that reveals intent.
#### 24. `exit_behavior` #### `exit_behavior`
Values: `cleanup` | `standard` | `anomalous` Values: `cleanup` | `standard` | `anomalous`
Last ~5 commands. `cleanup` if history/log deletion or `exit`/`logout` Intent of the last ~5 commands.
appears. `anomalous` if the session ends abruptly with no recognisable
closing pattern.
--- ---
### Environmental (5) — operator's local setup ### Environmental (5)
These are stable across an operator's career and change only when they Environmental primitives are stable across an operator's career — they change
switch machines or retool. only when the operator switches machines or deliberately retools.
#### 25. `shell_type` #### `shell_type`
Values: `bash` | `sh` | `zsh` | `fish` | `unknown` Values: `bash` | `sh` | `zsh` | `fish` | `unknown`
Detected from PS1 prompt regex patterns after ANSI stripping. Detected from PS1 prompt regex patterns.
#### 26. `terminal_multiplexer` #### `terminal_multiplexer`
Values: `tmux` | `screen` | `none` Values: `tmux` | `screen` | `none`
Detected from PS1 markers and characteristic escape sequences. Detected from PS1 markers and escape sequences.
#### 27. `locale` #### `locale`
Values: `en-US` | `en` | `other` | `unknown` Values: `en-US` | `en` | `other` | `unknown`
Language-specific keywords in prompt lines and error messages. Language-specific keywords in prompt lines and error messages.
#### 28. `keyboard_layout` #### `keyboard_layout`
Values: `qwerty` | `dvorak` | `colemak` | `other` Values: `qwerty` | `dvorak` | `colemak` | `other`
Bigram frequency analysis of the typed character stream. Operators who Bigram frequency analysis of the typed character stream. An operator who
touch-type on Dvorak produce a statistically distinct bigram distribution touch-types on Dvorak produces a statistically distinct bigram distribution
that persists even when typing non-English commands. that persists even when typing non-English commands — this is a pure
stylometric signal derived from motor habit.
#### 29. `numpad_usage` #### `numpad_usage`
Values: `occasional` | `frequent` | `none` Values: `occasional` | `frequent` | `none`
Keystroke pattern detection for numpad-originated digits.
--- ---
### Operational (4) — mission and OPSEC posture ### Operational (4)
#### 30. `objective` #### `objective`
Values: `recon` | `exfil` | `persistence` | `lateral` | `destructive` Values: `recon` | `exfil` | `persistence` | `lateral` | `destructive`
Token-based intent classification of command first-tokens. Majority vote Token-based intent classification. Majority vote; skipped if < 3
across classified tokens; precedence order applied for ties. Skipped if classified tokens.
fewer than 3 classified tokens.
Example token mappings: Example token mappings:
- `recon`: `id`, `whoami`, `uname`, `cat`, `find`, `ls`, `ps`, `netstat` - `recon`: `id`, `whoami`, `uname`, `cat`, `find`, `ls`, `ps`, `netstat`
- `exfil`: `scp`, `curl`, `wget`, `base64`, `nc`, `rsync` - `exfil`: `scp`, `curl`, `wget`, `base64`, `nc`, `rsync`
- `persistence`: `crontab`, `echo`, `tee`, `systemctl`, `rc.local` - `persistence`: `crontab`, `echo >> ~/.bashrc`, `systemctl enable`
- `lateral`: `ssh`, `xfreerdp`, `psexec`, `wmiexec` - `lateral`: `ssh`, `xfreerdp`, `psexec`, `wmiexec`
- `destructive`: `rm`, `shred`, `dd`, `mkfs`, `kill` - `destructive`: `rm`, `shred`, `dd`, `mkfs`, `kill`
#### 31. `opsec_discipline` #### `opsec_discipline`
Values: `careful` | `learning` | `careless` Values: `careful` | `learning` | `careless`
Presence of history-disabling tokens (`unset HISTFILE`, `HISTSIZE=0`, Presence of history-disabling tokens and cleanup activity. Both →
`history -c`) and cleanup activity in the session tail. Both → `careful`. `careful`. History only → `learning`. Neither → `careless`.
History-only → `learning` (knows to cover tracks but forgets cleanup).
Neither → `careless`.
#### 32. `cleanup_behavior` #### `cleanup_behavior`
Values: `thorough` | `partial` | `none` Values: `thorough` | `partial` | `none`
Distinct cleanup tokens in the last 5 commands. ≥3 → `thorough`, Distinct cleanup tokens in the session tail. ≥3 → `thorough`,
1–2 → `partial`, 0 → `none`. 1–2 → `partial`.
#### 33. `multi_actor_indicators` #### `multi_actor_indicators`
Values: `solo` | `handoff_detected` Values: `solo` | `handoff_detected`
Splits commands at the session's temporal midpoint and compares the median Splits commands at the session midpoint and compares median intra-IKI of
intra-IKI of each half. If the delta exceeds 50 % and both halves have each half. Delta > 50 % with both halves having ≥4 commands →
≥4 commands, `handoff_detected` is emitted — the session was likely shared `handoff_detected`. Suggests the session was shared between two operators
between two operators (e.g. initial access handed to a post-exploitation (initial access handed to a post-exploitation specialist, or a shared
specialist). credential).
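
A rough sketch of the midpoint comparison. It assumes each command is given as a `(start_ts, ikis)` pair, that the delta is measured relative to the first half's median, and that a session with too little data reports `solo`; the spec may instead skip the observation in that case.

```python
from statistics import median


def multi_actor_indicators(commands, t_start, t_end):
    """Compare median intra-command IKI between the two temporal halves."""
    midpoint = (t_start + t_end) / 2.0
    halves = ([], [])
    for start_ts, ikis in commands:
        halves[0 if start_ts < midpoint else 1].append(ikis)
    if any(len(half) < 4 for half in halves):
        return "solo"  # assumption: not enough commands to compare
    medians = []
    for half in halves:
        flat = [iki for ikis in half for iki in ikis]
        if not flat:
            return "solo"
        medians.append(median(flat))
    first, second = medians
    if first <= 0:
        return "solo"
    delta = abs(second - first) / first
    return "handoff_detected" if delta > 0.50 else "solo"
```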
--- ---
### Emotional valence (4) — stress and cognitive state ### Emotional valence (4)
These features have a hard confidence cap of **0.50** — they contribute to These primitives sit at the boundary of motor and stylometric signal. They
attribution but cannot dominate it. They require ≥80 typed letters to emit. require ≥80 typed letters and carry a hard confidence cap of **0.50**
they contribute to attribution but cannot dominate it.
#### 34. `valence` #### `valence`
Values: `positive` | `neutral` | `negative` Values: `positive` | `neutral` | `negative`
Lexical positive/negative token counts. `positive` if positive count > Lexical positive/negative token counts. `positive` requires positive count
(negative + obscenity) and ≥2 positive tokens. > (negative + obscenity) with ≥2 positive tokens.
#### 35. `arousal` #### `arousal`
Values: `low_calm` | `medium_engaged` | `high_agitated` Values: `low_calm` | `medium_engaged` | `high_agitated`
`high_agitated` if ≥5 consecutive caps, ≥3 consecutive `!`, or fastest `high_agitated` if ≥5 consecutive caps, ≥3 consecutive `!`, or fastest
IKI < 60 ms on ≥30 keystrokes. `low_calm` if slowest IKI > 300 ms. IKI < 60 ms on ≥30 keystrokes.
Otherwise `medium_engaged`.
#### 36. `stress_response` #### `stress_response`
Values: `none` | `eustress_positive` | `distress_negative` Values: `none` | `eustress_positive` | `distress_negative`
Post-error vs baseline typing speed ratio. ≥1.20 → `eustress` (types Post-error vs baseline typing speed ratio. ≥1.20 → `eustress` (experienced,
faster under pressure — experienced). ≤ 1/1.20 → `distress` (types types faster under pressure). ≤ 1/1.20 → `distress`.
slower — less experienced or genuinely stressed).
#### 37. `frustration_venting` #### `frustration_venting`
Values: `low` | `moderate` | `high` Values: `low` | `moderate` | `high`
Post-error frustration token count plus obscenity count. Post-error frustration token count plus obscenity count. A purely
lexicometric signal.
--- ---
## Attribution state machine ## Attribution
Primitives feed a per-`(identity_uuid, primitive)` state machine in BEHAVE-SHELL does not define how observations are aggregated — that is the
`decnet/correlation/attribution/aggregate.py`. responsibility of the implementing system's attribution engine. The DECNET
reference implementation uses a five-state machine per
`(identity_uuid, primitive)`:
### States | State | Condition |
|---|---|
| `unknown` | < 3 observations |
| `stable` | Recent N agree, no drift from older N |
| `drifting` | Recent N agree but differ from older N |
| `conflicted` | Recent N are split |
| `multi_actor` | `conflicted` + cross-session alternation |
| State | Meaning | Condition | Window N = 5 for categorical primitives. When ≥2 primitives independently
|---|---|---| reach `multi_actor` for the same identity, the engine emits a
| `unknown` | Insufficient data | < 3 observations | `multi_actor_suspected` signal — a strong indicator of a shared credential
| `stable` | Consistent value | Recent N agree AND no drift from older N | or a compromised operator account.
| `drifting` | Recently changed | Recent N agree BUT differ from older N |
| `conflicted` | Contradictory values | Recent N are split (high CV) |
| `multi_actor` | Multiple operators | `conflicted` + cross-session alternation |
Window size N = 5 (categorical primitives). EWMA is used for numeric
primitives (Phase 3).
### Multi-actor detection
The attribution worker runs a `_multi_actor_tick` every 60 seconds. For
every `(identity, primitive)` pair in `conflicted` state, it checks whether
the alternation pattern across sessions is consistent with a credential
being shared between two distinct operators. When ≥2 primitives
independently flag `multi_actor` for the same identity, the bus emits:
```
attribution.profile.multi_actor_suspected
{identity_uuid, primitives: [...], evidence_summary, confidence, ts}
```
`confidence` is capped at 0.60 — cross-primitive agreement is the real
signal, but a hard cap prevents over-alarming on noisy primitives.
---
## Database tables
### `ObservationRow`
One row per `(evidence_ref, primitive)`. `evidence_ref` is the session
shard identifier — the `UniqueConstraint` makes re-processing idempotent.
| Column | Type | Description |
|---|---|---|
| `id` | UUID PK | |
| `identity_uuid` | FK → `attacker_identities` | |
| `attacker_uuid` | FK → `attackers` | Direct link for pre-clusterer path |
| `evidence_ref` | TEXT | Shard ID |
| `primitive` | TEXT | e.g. `keystroke_cadence` |
| `value` | TEXT | Categorical label or serialised numeric |
| `confidence` | FLOAT | 0.0–1.0 |
| `observed_at` | DATETIME | Session end time |
### `AttributionStateRow`
One row per `(identity_uuid, primitive)`. Updated by the attribution
worker each time a new observation arrives.
| Column | Type | Description |
|---|---|---|
| `identity_uuid` | FK → `attacker_identities` | |
| `primitive` | TEXT | |
| `state` | TEXT | `unknown`/`stable`/`drifting`/`conflicted`/`multi_actor` |
| `current_value` | TEXT | Most recent or EWMA value |
| `confidence` | FLOAT | |
| `observation_count` | INT | Total observations aggregated |
| `last_observation_ts` | DATETIME | |
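For numeric primitives, `current_value` holds an EWMA rather than the latest raw value. The following sketch shows one way such an update could look. The dataclass, the function name, and the smoothing factor `alpha = 0.3` are hypothetical, since no smoothing factor is given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumericState:
    """Subset of an attribution-state row relevant to numeric primitives."""
    current_value: float
    observation_count: int


def ewma_update(
    state: Optional[NumericState],
    observed: float,
    alpha: float = 0.3,  # hypothetical smoothing factor
) -> NumericState:
    """Fold one numeric observation into the running state."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if state is None:
        # First observation seeds the average.
        return NumericState(current_value=observed, observation_count=1)
    blended = alpha * observed + (1.0 - alpha) * state.current_value
    return NumericState(blended, state.observation_count + 1)
```

A larger `alpha` tracks recent sessions more closely; a smaller one makes the stored value more resistant to a single atypical session.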
---
## Key thresholds
All calibration constants live in `decnet/profiler/behave_shell/_thresholds.py`
(416 lines). The values below are the defaults; they can be overridden per
deployment without touching feature code.
| Constant | Value | Used by |
|---|---|---|
| `PASTE_MIN_CHARS_PER_EVENT` | 4 | Paste detection |
| `PASTE_BURST_MAX_IAT_S` | 0.20 | Paste burst grouping |
| `MODALITY_PASTED_MIN` | 0.40 | `input_modality` |
| `CV_STEADY_MAX` | 0.45 | `keystroke_cadence` |
| `TREMOR_FAST_FLOOR_S` | 0.030 | `motor_stability` |
| `IKI_THINK_MAX_S` | 2.0 | Typing-burst split |
| `INTER_CMD_INSTANT_MAX` | 0.30 s | `inter_command_latency_class` |
| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s | LLM-assisted detection |
| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s | LLM-assisted detection |
| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 | `command_branch_diversity` |
| `FEEDBACK_CORRELATION_MIN` | 0.30 | `feedback_loop_engagement` |
| `PAUSE_CV_METRONOMIC_MAX` | 0.40 | `inter_command_consistency` |
| `PAUSE_CV_BIMODAL_MIN` | 1.50 | `inter_command_consistency` |
| `SESSION_DURATION_SHORT_MAX` | 60 s | `session_duration` |
| `SESSION_DURATION_MEDIUM_MAX` | 600 s | `session_duration` |
| `SESSION_DURATION_LONG_MAX` | 3600 s | `session_duration` |
| `MIN_OBSERVATIONS_FOR_STATE` | 3 | Attribution state machine |
| `CATEGORICAL_WINDOW_N` | 5 | Attribution window |
| `MULTI_ACTOR_TICK_SECS` | 60 | Multi-actor tick |
| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 | All `emotional_valence` features |
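The `*_MAX` constants act as upper bounds of ordered buckets. As a rough sketch of how the three `SESSION_DURATION_*` defaults could drive a classifier, and how a deployment might swap in different bounds without changing the classifier itself, consider the following. The labels `short`, `medium` and `long` are inferred from the constant names, `extended` is a hypothetical label for anything above the last bound, and treating each bound as inclusive is an assumption.

```python
from bisect import bisect_left
from typing import Sequence

# Defaults from the table above, in seconds.
SESSION_DURATION_BOUNDS_S = (60.0, 600.0, 3600.0)
# "extended" is a hypothetical label for sessions longer than the last bound.
SESSION_DURATION_LABELS = ("short", "medium", "long", "extended")


def session_duration_class(
    duration_s: float,
    bounds: Sequence[float] = SESSION_DURATION_BOUNDS_S,
) -> str:
    """Return the bucket whose upper bound is the first one >= duration_s."""
    # bisect_left treats each bound as inclusive: 60.0 s still counts as "short".
    return SESSION_DURATION_LABELS[bisect_left(bounds, duration_s)]
```

Calling `session_duration_class(90, bounds=(120.0, 600.0, 3600.0))` shows the override path: the same session moves from `medium` to `short` when only the bounds change.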
---

## Calibration

The system was calibrated against five behavioural classes across 15 sessions
(424 total observations):

| Class | Sessions | Observations | Description |
|---|---|---|---|
| `HUMAN` | 1 | 34 | Human operator, no assistance |
| `YOU-sim` | 2 | 59 | Human-simulated scripted attacker |
| `LW-sim` | 5 | 136 | Lightweight LLM-assisted operator |
| `CLAUDE-FF` | 3 | 84 | Claude (fast/free tier) assisted |
| `CLAUDE-CL` | 4 | 111 | Claude (standard tier) assisted |

All classes emit ≥27 distinct primitives (pass threshold).

The `inter_command_latency_class` thresholds `llm_lightweight` (≤8 s) and
`llm_heavyweight` (≤30 s) were derived from timing measurements of these
sessions — DECNET can distinguish a human-with-fast-LLM from an unassisted
human in a single session with moderate confidence, and with high confidence
across 3+ sessions.

---

## Testing

```bash
# Offline smoke test — 5 shards, mock bus, must emit ≥27 distinct per class
scripts/behave_shell/smoke.sh

# Live round-trip — replay calibration shards through a running DECNET
scripts/behave_shell/replay_calibration.py
```

---

## File reference

```
decnet/profiler/behave_shell/
  __init__.py            Public API: extract_session()
  extract.py             Entry point — fans out to FEATURES registry (51 lines)
  _ctx.py                SessionContext builder (573 lines)
  _parse.py              Asciinema JSONL parsing (272 lines)
  _handler.py            Bus subscriber — disk I/O, persistence, publish (235 lines)
  _intent.py             Token → intent classification (115 lines)
  _thresholds.py         All calibration constants (416 lines)
  _features/
    __init__.py          FEATURES registry — list of 37 functions (104 lines)
    motor.py             Primitives 1–9 (422 lines)
    cognitive.py         Primitives 10–20 (593 lines)
    temporal.py          Primitives 21–24 (237 lines)
    environmental.py     Primitives 25–29 (352 lines)
    operational.py       Primitives 30–33 (218 lines)
    emotional_valence.py Primitives 34–37 (223 lines)

decnet/correlation/
  attribution_worker.py  Bus loop: consume observations, run tick
  attribution/
    aggregate.py         State machine: unknown→stable→drifting→conflicted→multi_actor
    _thresholds.py       Attribution-layer thresholds

decnet/web/db/models/
  observations.py        ObservationRow schema
  attribution_state.py   AttributionStateRow schema
```

---

## Related pages

- [Fingerprinting](Fingerprinting) — all fingerprint layers, including the
  BEHAVE-SHELL summary
- [Identity-Resolution](Identity-Resolution) — how observations are clustered
  into attacker identities and how state machine transitions propagate
- [Service-Personas](Service-Personas) — enabling session recording and
  BEHAVE-SHELL per service

---

## Calibration

The reference thresholds were calibrated against five behavioural classes
across 15 sessions (424 total observations):

| Class | Sessions | Observations | Description |
|---|---|---|---|
| `HUMAN` | 1 | 34 | Human operator, unassisted |
| `YOU-sim` | 2 | 59 | Human-simulated scripted attacker |
| `LW-sim` | 5 | 136 | Lightweight LLM-assisted operator |
| `CLAUDE-FF` | 3 | 84 | Claude (fast) assisted |
| `CLAUDE-CL` | 4 | 111 | Claude (standard) assisted |

All classes emit ≥27 distinct primitives. The `inter_command_latency_class`
LLM buckets are the primary discriminator between unassisted and
LLM-assisted operators in single-session analysis; cross-session attribution
uses the full primitive set.

---

## Key thresholds (reference implementation)

All constants live in `_thresholds.py`.

| Constant | Value |
|---|---|
| `PASTE_MIN_CHARS_PER_EVENT` | 4 |
| `PASTE_BURST_MAX_IAT_S` | 0.20 |
| `IKI_THINK_MAX_S` | 2.0 (typing-burst split) |
| `TREMOR_FAST_FLOOR_S` | 0.030 |
| `CV_STEADY_MAX` | 0.45 |
| `INTER_CMD_INSTANT_MAX` | 0.30 s |
| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s |
| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s |
| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 |
| `FEEDBACK_CORRELATION_MIN` | 0.30 |
| `PAUSE_CV_METRONOMIC_MAX` | 0.40 |
| `PAUSE_CV_BIMODAL_MIN` | 1.50 |
| `SESSION_DURATION_SHORT_MAX` | 60 s |
| `SESSION_DURATION_MEDIUM_MAX` | 600 s |
| `SESSION_DURATION_LONG_MAX` | 3600 s |
| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 |
| `MIN_OBSERVATIONS_FOR_STATE` | 3 |
| `CATEGORICAL_WINDOW_N` | 5 |

---

## DECNET implementation

In DECNET, BEHAVE-SHELL extraction is invoked by the profiler worker on every
`attacker.session.ended` bus event. The worker reads the PTY shard from disk,
runs `extract_session()`, and upserts one `ObservationRow` per primitive per
session. A `UniqueConstraint(evidence_ref, primitive)` makes re-processing
idempotent.

The attribution worker consumes `attacker.observation.*` bus events and
maintains one `AttributionStateRow` per `(identity_uuid, primitive)`.

Source: `decnet/profiler/behave_shell/` (~3 868 lines across 12 files).

---

## See also

- **BEHAVE-TEXT** — sibling spec for written-text stylometry and lexicometry,
  implemented by [EYENET](https://github.com/xmartlab/eyenet)
- [Fingerprinting](Fingerprinting) — all DECNET fingerprint layers
- [Identity-Resolution](Identity-Resolution) — how observations feed the
  identity clusterer