docs: reframe BEHAVE-SHELL as a spec, not a DECNET component — add stylometry/lexicometry scope, BEHAVE-TEXT/EYENET cross-reference

2026-05-10 04:31:36 -04:00
parent 58915d8115
commit 5a39211645

# BEHAVE-SHELL
BEHAVE-SHELL is a **behavioural biometrics specification** for interactive
shell sessions. It defines a set of attribution primitives (observable,
computable signals) that characterise *how* an operator works at a terminal,
independently of what IP address, credential, or tooling they use.

The primitives feed the [Identity-Resolution](Identity-Resolution) attribution
state machine, which accumulates evidence across sessions to answer one
question: *are these the same hands?*

The spec grew out of DECNET's need to correlate attackers across sessions
and IP changes, but it is not DECNET-specific. Any system that records PTY
sessions can implement BEHAVE-SHELL extraction and feed the resulting
primitives into an attribution engine. DECNET is the reference implementation.

A sibling specification, **BEHAVE-TEXT**, defines equivalent primitives for
written text (stylometry, lexicometry, and discourse structure) and is
implemented by [EYENET](https://github.com/xmartlab/eyenet).

---
## Scope

BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus. The
current specification covers three broad domains:

| Domain | What it captures |
|---|---|
| **Motor biometrics** | Keystroke timing, error correction, paste vs. type habits, shell mastery signals |
| **Cognitive / behavioural** | Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure |
| **Stylometry / lexicometry** | Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions |

The emotional valence cluster (`valence`, `arousal`, `stress_response`,
`frustration_venting`) sits at the boundary of motor and stylometric signal:
it measures both typing-speed changes and lexical content after stress events.

---
## Design principles

- **Extraction is pure.** The spec defines a function
  `extract_session(events) → Observations` that takes an iterable of
  timestamped PTY events and yields structured observations. No I/O, no
  database, no side effects; implementations are free to run it in any context.
- **PII by design.** Command text is never stored in plain form. Only the
  SHA-256 of the first token is retained. Output is reduced to a byte count
  and an error verdict. Prompt lines are ANSI-stripped and capped at 256
  characters. Layout fingerprinting uses raw unigram/bigram counts, not the
  text itself.
- **Confidence is explicit.** Every observation carries a confidence value in
  [0.0, 1.0]. Inherently noisier features have hard confidence caps
  (emotional valence: 0.50). Attribution engines must propagate confidence
  rather than treat all observations as equal.
- **Skip conditions over imputation.** A feature that cannot be computed on a
  given session (e.g. the `error_resilience` features when no errors occurred)
  yields no observation rather than a default value. Attribution engines
  treat the absence of an observation differently from an `unknown` state.

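
To make these principles concrete, here is a minimal Python sketch of an
observation envelope. `Observation`, `first_token_hash()`, `emit()` and the
`CONFIDENCE_CAPS` table are hypothetical names chosen for illustration, not
DECNET's API. The sketch hashes the first token with SHA-256 and drops the
rest of the command. It clamps confidence to [0.0, 1.0] and then caps it per
feature family. A skipped feature yields `None` rather than a default label.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-family caps; the spec fixes 0.50 for emotional valence.
CONFIDENCE_CAPS = {"emotional_valence": 0.50}


@dataclass(frozen=True)
class Observation:
    primitive: str      # e.g. "keystroke_cadence"
    value: str          # categorical label
    confidence: float   # always within [0.0, 1.0] once built by emit()


def first_token_hash(command: str) -> str:
    """SHA-256 of the first whitespace-separated token; the rest is discarded."""
    tokens = command.split()
    first = tokens[0] if tokens else ""
    return hashlib.sha256(first.encode("utf-8", "surrogateescape")).hexdigest()


def emit(primitive: str, value: Optional[str], confidence: float,
         family: Optional[str] = None) -> Optional[Observation]:
    """Build an observation, or return None when the feature was skipped."""
    if value is None:
        # Skip condition: no observation at all, never a default value.
        return None
    confidence = min(max(confidence, 0.0), 1.0)
    cap = CONFIDENCE_CAPS.get(family)
    if cap is not None:
        confidence = min(confidence, cap)
    return Observation(primitive, value, confidence)
```

The engine can then separate "not computable on this session" from
"computed, but inconclusive", because `None` stays distinct from an
`unknown` label.
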
---
## Input format

BEHAVE-SHELL operates on **asciinema-compatible event streams**: sequences of
`(t: float, ch: "i"|"o", d: str)` tuples representing timestamped input and
output chunks from a PTY session. `"i"` is operator input; `"o"` is terminal
output. Non-UTF-8 bytes are preserved with the `surrogateescape` error
handler rather than rejected.

The DECNET implementation records these as JSONL shards via `sessrec.c`:

```json
{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"}
{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."}
```
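
A minimal reader for such shards might look like the following Python sketch.
`Event`, `parse_shard_line()` and `read_shard()` are illustrative names, not
DECNET's actual parser. The sketch assumes one JSON object per line with the
keys shown above. It decodes the file with `surrogateescape`, so stray
non-UTF-8 bytes survive as lone surrogates, and it silently drops blank or
malformed lines instead of raising.

```python
import json
from typing import Iterator, NamedTuple, Optional


class Event(NamedTuple):
    t: float   # seconds since session start
    ch: str    # "i" = operator input, "o" = terminal output
    d: str     # data chunk; undecodable bytes survive as lone surrogates


def parse_shard_line(line: str) -> Optional[Event]:
    """Parse one JSONL shard line; return None for blank or malformed lines."""
    line = line.strip()
    if not line:
        return None
    try:
        rec = json.loads(line)
        event = Event(float(rec["t"]), rec["ch"], rec["d"])
    except (ValueError, KeyError, TypeError):
        return None
    return event if event.ch in ("i", "o") and isinstance(event.d, str) else None


def read_shard(path: str) -> Iterator[Event]:
    # surrogateescape maps each non-UTF-8 byte to a lone surrogate instead of failing.
    with open(path, encoding="utf-8", errors="surrogateescape") as fh:
        for line in fh:
            event = parse_shard_line(line)
            if event is not None:
                yield event
```
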
---
## Session context derivation

Before feature extraction, a single-pass walk over the event stream builds a
`SessionContext`: a set of derived signals that all feature functions share.
The derivation steps, in order:

| Step | Output |
|---|---|
| **Paste-burst detection** | Groups consecutive paste-class events (≥4 chars, within 200 ms) into `paste_bursts` |
| **Typing-burst segmentation** | Splits the keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]`; drops bursts with fewer than 3 IKIs |
| **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line / kill-word bytes (`0x15`, `0x17`); records the IKI between each backspace and the preceding keystroke |
| **Per-command intra-typing IKIs** | For each command, the IKIs within that command's span only |
| **Command segmentation** | Splits on `\r`/`\n`; per command: `first_token_hash` (SHA-256), tab count, readline shortcut count, pipe count |
| **Inter-command gaps** | Time between consecutive commands |
| **Error detection** | Scans output for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` |
| **PS1 prompt detection** | Regex for a `$`, `#`, `%`, or `>` suffix; ANSI-stripped, capped at 256 chars |
| **Keyboard layout fingerprinting** | Unigram and bigram histograms from typed letters |
| **Lexical counters** | Obscenity hits, positive/negative sentiment tokens, longest caps run, longest run of consecutive `!` |

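
The first two derivation steps can be sketched in Python with the reference
values from the table: ≥4 characters per paste-class event, ≤200 ms between
grouped paste events, a 2.0 s think-pause, and at least 3 IKIs per typing
burst. This sketch treats every input event of 4+ characters as a paste and
every shorter input event as a keystroke. That rule and the function names
are simplifying assumptions, not the reference implementation.

```python
from typing import List, Optional, Sequence, Tuple

# (t, ch, d) tuples as described in the input format.
Event = Tuple[float, str, str]

# Reference values from the derivation table.
PASTE_MIN_CHARS_PER_EVENT = 4
PASTE_BURST_MAX_IAT_S = 0.20
IKI_THINK_MAX_S = 2.0
MIN_IKIS_PER_BURST = 3


def is_paste_class(d: str) -> bool:
    # Assumption: one input event carrying >= 4 characters was pasted, not typed.
    return len(d) >= PASTE_MIN_CHARS_PER_EVENT


def paste_bursts(events: Sequence[Event]) -> List[List[Event]]:
    """Group consecutive paste-class input events that arrive <= 200 ms apart."""
    bursts: List[List[Event]] = []
    prev_t: Optional[float] = None
    for ev in events:
        t, ch, d = ev
        if ch != "i":
            continue                      # output does not affect paste grouping
        if not is_paste_class(d):
            prev_t = None                 # a typed keystroke closes the open burst
            continue
        if prev_t is not None and t - prev_t <= PASTE_BURST_MAX_IAT_S:
            bursts[-1].append(ev)
        else:
            bursts.append([ev])
        prev_t = t
    return bursts


def typing_bursts(events: Sequence[Event]) -> List[List[float]]:
    """Split keystroke IKIs at think-pauses > 2.0 s; drop bursts with < 3 IKIs."""
    times = [t for t, ch, d in events if ch == "i" and not is_paste_class(d)]
    bursts: List[List[float]] = [[]]
    for prev, cur in zip(times, times[1:]):
        iki = cur - prev
        if iki > IKI_THINK_MAX_S:
            bursts.append([])             # think-pause: start a new burst
        else:
            bursts[-1].append(iki)
    return [b for b in bursts if len(b) >= MIN_IKIS_PER_BURST]
```

The sketch ignores output events when grouping pastes. The table does not
say whether terminal output between two pastes should close a burst, so
treat that as a tunable choice.
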
### Key structures
```
SessionContext
sid: str
t_start, t_end, duration_s: float
input_events, output_events: tuple[Event]
iats: tuple[float] # inter-keystroke intervals
paste_bursts: tuple[PasteBurst]
typing_bursts: tuple[tuple[float]]
backspace_count, kill_line_count: int
...
Command
start_ts, end_ts: float
first_token_hash: str # SHA-256, first token only
tab_count, shortcut_count, pipe_count: int
errored: bool
output_bytes: int
...
```

## The 37 primitives
### Motor (9)

Motor primitives capture muscle memory and physical interaction patterns.
They are among the most stable signals across sessions and across different
machines used by the same operator.

#### `input_modality`

Values: `typed` | `pasted` | `mixed`

Ratio of paste events to total input events. ≥40 % pasted and ≤5 % typed →
`pasted`. ≤5 % pasted → `typed`. Otherwise `mixed`.

A script kiddie running pre-written one-liners pastes habitually. A
seasoned operator types most commands from memory.
#### `paste_burst_rate`

Values: `none` | `occasional` | `habitual`

Coarser bucketing of the paste ratio: ≥50 % → `habitual`, ≥10 % →
`occasional`, otherwise `none`.

#### `keystroke_cadence`

Values: `steady` | `bursty` | `hunt_and_peck` | `machine`

Median coefficient of variation (CV) of within-burst inter-keystroke
intervals (IKIs). The mean IKI is a second criterion for `machine`. Rows are
evaluated top to bottom:

| CV | Mean IKI | Label | Interpretation |
|---|---|---|---|
| < 0.30 | < 30 ms | `machine` | Inhumanly uniform |
| < 0.45 | any | `steady` | Trained touch typist |
| < 0.70 | any | `bursty` | Thinks between phrases |
| ≥ 0.70 | any | `hunt_and_peck` | |

`machine` catches automated input that looks human on screen but has
inhumanly uniform inter-key timing.

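
The table translates directly into code. This Python sketch assumes `bursts`
is the list of within-burst IKIs (in seconds) produced by typing-burst
segmentation. It computes the CV with the population standard deviation and
uses the mean of all within-burst IKIs for the `machine` check. Both choices
are assumptions, because the spec does not pin either one down.

```python
from statistics import mean, median, pstdev
from typing import Optional, Sequence


def burst_cv(ikis: Sequence[float]) -> float:
    """Coefficient of variation (population stdev / mean) of one burst's IKIs."""
    m = mean(ikis)
    return pstdev(ikis) / m if m > 0 else 0.0


def classify_cadence(bursts: Sequence[Sequence[float]]) -> Optional[str]:
    """Map within-burst IKIs (seconds) to a keystroke_cadence label."""
    if not bursts:
        return None                       # skip condition: nothing to measure
    cv = median(burst_cv(b) for b in bursts)
    mean_iki = mean(x for b in bursts for x in b)
    if cv < 0.30 and mean_iki < 0.030:
        return "machine"
    if cv < 0.45:                         # CV_STEADY_MAX
        return "steady"
    if cv < 0.70:
        return "bursty"
    return "hunt_and_peck"
```
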
#### `motor_stability`

Values: `steady` | `variable` | `tremor`

Fraction of IKIs below the 30 ms tremor floor. ≥20 % → `tremor`
(physiological or tool-simulated). Otherwise the median CV separates
`steady` from `variable`.

#### `error_correction`

Values: `immediate` | `deferred` | `absent` | `route_around`

Timing of each backspace relative to the preceding keystroke. Median ≤500 ms
→ `immediate` (noticed fast; a muscle-memory correction). Median > 500 ms →
`deferred` (the operator reads output, then corrects). No backspaces but
kill-line bytes present → `route_around` (ctrl-u / ctrl-w). No corrections
at all → `absent`.

#### `command_chunking`

Values: `fluent` | `fragmented` | `single_command`

Median CV of per-command intra-typing IKIs. < 0.40 → `fluent` (commands
typed as rehearsed phrases); otherwise `fragmented`. A session containing
only one command → `single_command`.

#### `shell_mastery.tab_completion`

Values: `none` | `occasional` | `habitual`

Fraction of commands containing at least one tab (`0x09`) keystroke.
0 → `none`, < 50 % → `occasional`, ≥50 % → `habitual`.

Operators who tab-complete heavily know the filesystem; those who never do
either memorise paths or are running a prepared script.
#### `shell_mastery.shortcut_usage`

Values: `none` | `moderate` | `heavy`

Readline control bytes (ctrl-a, ctrl-e, ctrl-r, etc.) per command.
< 0.05 → `none`, < 0.15 → `moderate`, ≥0.15 → `heavy`.

#### `shell_mastery.pipe_chaining_depth`

Values: `shallow` | `moderate` | `deep`

Median pipe count per command. ≤1 → `shallow`, 2 → `moderate`, ≥3 → `deep`.

---
### Cognitive (11)

Cognitive primitives capture decision-making style, planning depth, and how
the operator processes feedback.

#### `inter_command_latency_class`

Values: `instant` | `typing_speed` | `deliberate` | `llm_lightweight` | `llm_heavyweight` | `long`

Median inter-command pause, bucketed against calibrated thresholds:

| Threshold | Label | Interpretation |
|---|---|---|
| ≤ 0.30 s | `instant` | Scripted input or replay |
| ≤ 1.50 s | `typing_speed` | Commands prepared; only typing time |
| ≤ 2.00 s | `deliberate` | Reads output before acting |
| ≤ 8.00 s | `llm_lightweight` | Consulting a fast LLM or notes |
| ≤ 30.00 s | `llm_heavyweight` | Consulting a slow LLM or manual reference |
| > 30.00 s | `long` | Interrupted or cautious |

The `llm_*` thresholds were calibrated against real sessions of
Claude-assisted operators. BEHAVE-SHELL is explicitly designed to detect
this novel adversary class.

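
The table uses inclusive upper bounds. A sorted bound list plus `bisect_left`
from Python's standard library can therefore express the bucketing as a
single lookup. The constant names in this sketch are illustrative; the values
are the ones in the table.

```python
from bisect import bisect_left
from statistics import median
from typing import Optional, Sequence

# Inclusive upper bounds in seconds and their labels, from the table above.
BOUNDS = (0.30, 1.50, 2.00, 8.00, 30.00)
LABELS = ("instant", "typing_speed", "deliberate",
          "llm_lightweight", "llm_heavyweight", "long")


def latency_class(gaps: Sequence[float]) -> Optional[str]:
    """Bucket the median inter-command gap; bisect_left keeps each bound inclusive."""
    if not gaps:
        return None
    return LABELS[bisect_left(BOUNDS, median(gaps))]
```

`bisect_left` returns the first index whose bound is ≥ the median. A median
of exactly 8.0 s therefore lands in `llm_lightweight`, matching the ≤
thresholds in the table.
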
#### `command_branch_diversity`

Values: `linear_playbook` | `adaptive_branching` | `unknown`

Ratio of unique first tokens to total commands. Fewer than 5 commands →
`unknown`. ≥70 % unique → `linear_playbook` (every command differs,
consistent with following a prepared list). < 70 % → `adaptive_branching`
(tools are reused while iterating on a problem).

#### `feedback_loop_engagement`

Values: `closed_loop` | `fire_and_forget` | `unknown`

Pearson correlation between each command's output size in bytes and the
inter-command pause that follows it. r > 0.30 → `closed_loop` (the operator
pauses longer when there is more to read); otherwise `fire_and_forget`.
Requires at least 5 command/output/pause triples.

#### `inter_command_consistency`

Values: `metronomic` | `variable` | `bimodal`

Coefficient of variation of the inter-command gaps. CV < 0.40 → `metronomic`
(scripts, beacons). CV > 1.50 → `bimodal` (two distinct paces, typically
short commands interleaved with long waits for compiles or downloads).
Otherwise `variable`.

#### `cognitive_load`

Values: `low` | `medium` | `high`

Composite score: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5).
Score < 0.33 → `low`, < 0.67 → `medium`, otherwise `high`.

High cognitive load across multiple sessions on the same identity is a
signal of an operator working outside their comfort zone — new target OS,
unfamiliar tooling, or time pressure.
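
Written out in Python, the composite is a plain mean of three normalised
terms. This sketch assumes the inputs are already computed for the session,
with the error rate expressed as the fraction of errored commands. It does
not clamp any term to [0, 1]. Both points are assumptions rather than
documented behaviour.

```python
from statistics import mean


def cognitive_load(intra_typing_cv: float, error_rate: float, pause_cv: float) -> str:
    """Composite of normalised typing variability, error rate and pause variability."""
    score = mean((intra_typing_cv / 1.0, error_rate, pause_cv / 1.5))
    if score < 0.33:
        return "low"
    if score < 0.67:
        return "medium"
    return "high"
```
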
#### `exploration_style`

Values: `methodical` | `targeted` | `chaotic`

`repetition_rate` = 1 − unique/total commands. `backtrack_rate` = fraction
of commands that jump back to a previously used tool category.
`backtrack_rate` ≥30 % → `chaotic`. Otherwise `repetition_rate` ≥50 % →
`targeted` (narrow focus, known objective). Otherwise `methodical`.

#### `planning_depth`

Values: `deep` | `reactive` | `shallow`

`deep_pause_frac` = fraction of inter-command gaps > 2.0 s;
`reactive_frac` = fraction ≤ 0.30 s. `deep_pause_frac` ≥40 % → `deep`.
`reactive_frac` ≥50 % → `reactive`. Otherwise `shallow`.

#### `tool_vocabulary`

Values: `narrow` | `moderate` | `broad`

Number of distinct first tokens (absolute count). ≤3 → `narrow`,
≥10 → `broad`, otherwise `moderate`.

#### `error_resilience.retry_tactic`

Values: `retry_same` | `pivot` | `fallback`

Post-error behaviour: does the operator retry the same command
(`retry_same`), switch to a different approach (`pivot`), or fall back to
reconnaissance (`fallback`)? Skipped if no errors occurred in the session.

#### `error_resilience.frustration_typing`

Values: `low` | `moderate` | `high`

Relative difference between the median intra-command IKI after an error and
after a success. < 10 % → `low`, < 30 % → `moderate`, ≥30 % → `high`.
Fast typing after errors suggests frustration; slow typing suggests
deliberation.

#### `error_resilience.fallback_to_man`

Values: `present` | `absent`

After an error, does the next command start with `man`, `help`, or `info`?
Skipped if no errors occurred. `present` indicates an operator consulting
documentation: less automated, less rehearsed.

---
### Temporal (4)

#### `session_duration`

Values: `short` | `medium` | `long` | `marathon`

| Duration | Label | Typical pattern |
|---|---|---|
| < 60 s | `short` | Single recon or scan |
| < 600 s | `medium` | Targeted interaction |
| < 3600 s | `long` | Sustained operation |
| ≥ 3600 s | `marathon` | Extended presence or slow-burn APT |

#### `escalation_pattern`

Values: `bursty` | `sustained`

Dynamic window analysis of activity density over the session lifetime
(window width = max(10 s, duration / target)). Two measures decide the
label: the CV of per-window activity and the fraction of empty windows.
Activity that clusters into bursts separated by idle periods → `bursty`.
Activity that stays at a consistent level throughout → `sustained`.

#### `landing_ritual`

Values: `cleanup` | `exploration` | `passive`

Intent of the first ~5 commands, classified by intent tokens. `cleanup` if
the operator immediately starts removing evidence; `exploration` if they run
reconnaissance commands (`id`, `whoami`, `uname`, `ls`); `passive` if
nothing they do reveals intent.

#### `exit_behavior`

Values: `cleanup` | `standard` | `anomalous`

Intent of the last ~5 commands. `cleanup` if history/log deletion or
`exit`/`logout` appears. `anomalous` if the session ends abruptly with no
recognisable closing pattern.

---
### Environmental (5)

Environmental primitives are stable across an operator's career: they change
only when the operator switches machines or deliberately retools.

#### `shell_type`

Values: `bash` | `sh` | `zsh` | `fish` | `unknown`

Detected from PS1 prompt regex patterns after ANSI stripping.

#### `terminal_multiplexer`

Values: `tmux` | `screen` | `none`

Detected from PS1 markers and characteristic escape sequences.

#### `locale`

Values: `en-US` | `en` | `other` | `unknown`

Detected from language-specific keywords in prompt lines and error messages.

#### `keyboard_layout`

Values: `qwerty` | `dvorak` | `colemak` | `other`

Bigram frequency analysis of the typed character stream. An operator who
touch-types on Dvorak produces a statistically distinct bigram distribution
that persists even when typing non-English commands. This is a pure
stylometric signal derived from motor habit.

#### `numpad_usage`

Values: `occasional` | `frequent` | `none`

Keystroke pattern detection for numpad-originated digits.

---
### Operational (4)

#### `objective`

Values: `recon` | `exfil` | `persistence` | `lateral` | `destructive`

Token-based intent classification. Majority vote across classified tokens,
with a precedence order applied to break ties. Skipped if fewer than 3
tokens are classified.

Example token mappings:

- `recon`: `id`, `whoami`, `uname`, `cat`, `find`, `ls`, `ps`, `netstat`
- `exfil`: `scp`, `curl`, `wget`, `base64`, `nc`, `rsync`
- `persistence`: `crontab`, `echo >> ~/.bashrc`, `systemctl enable`
- `lateral`: `ssh`, `xfreerdp`, `psexec`, `wmiexec`
- `destructive`: `rm`, `shred`, `dd`, `mkfs`, `kill`

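
A Python sketch of the majority vote. `INTENT_TOKENS` holds only the
single-word first tokens from the example mappings above. `PRECEDENCE` is a
hypothetical tie-break order: the spec says a precedence order is applied
but does not list it here.

```python
from collections import Counter
from typing import Iterable, Optional

# Illustrative subset of the example mappings (single-word first tokens only).
INTENT_TOKENS = {
    **dict.fromkeys(("id", "whoami", "uname", "cat", "find", "ls", "ps", "netstat"), "recon"),
    **dict.fromkeys(("scp", "curl", "wget", "base64", "nc", "rsync"), "exfil"),
    **dict.fromkeys(("crontab", "systemctl"), "persistence"),
    **dict.fromkeys(("ssh", "xfreerdp", "psexec", "wmiexec"), "lateral"),
    **dict.fromkeys(("rm", "shred", "dd", "mkfs", "kill"), "destructive"),
}
# Hypothetical tie-break order (earlier wins); not taken from the spec.
PRECEDENCE = ("destructive", "persistence", "lateral", "exfil", "recon")


def classify_objective(first_tokens: Iterable[str], min_classified: int = 3) -> Optional[str]:
    votes = Counter(INTENT_TOKENS[t] for t in first_tokens if t in INTENT_TOKENS)
    if sum(votes.values()) < min_classified:
        return None                       # skipped: too few classified tokens
    top = max(votes.values())
    tied = [intent for intent, n in votes.items() if n == top]
    return min(tied, key=PRECEDENCE.index)
```
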
#### `opsec_discipline`

Values: `careful` | `learning` | `careless`

Presence of history-disabling tokens (`unset HISTFILE`, `HISTSIZE=0`,
`history -c`) and of cleanup activity in the session tail. Both → `careful`.
History-disabling only → `learning` (knows to cover tracks but forgets
cleanup). Neither → `careless`.

#### `cleanup_behavior`

Values: `thorough` | `partial` | `none`

Distinct cleanup tokens in the session tail (the last 5 commands).
≥3 → `thorough`, 1–2 → `partial`, 0 → `none`.

#### `multi_actor_indicators`

Values: `solo` | `handoff_detected`

Splits commands at the session's temporal midpoint and compares the median
intra-command IKI of each half. A delta above 50 % with ≥4 commands in each
half → `handoff_detected`. This suggests the session was shared between two
operators: initial access handed to a post-exploitation specialist, or a
shared credential. Otherwise `solo`.

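
A Python sketch of the midpoint split. It assumes each command is given as
`(start_ts, ikis)`, pools the IKIs of each half before taking the median,
and measures the delta relative to the first half. All three choices are
assumptions; the spec only fixes the 50 % threshold and the 4-command
minimum per half.

```python
from statistics import median
from typing import List, Sequence, Tuple

# (command start timestamp, intra-command IKIs in seconds)
Command = Tuple[float, Sequence[float]]


def multi_actor_indicator(commands: Sequence[Command], t_start: float, t_end: float,
                          min_per_half: int = 4, max_delta: float = 0.50) -> str:
    midpoint = (t_start + t_end) / 2
    halves: List[List[float]] = [[], []]
    counts = [0, 0]
    for start_ts, ikis in commands:
        side = 0 if start_ts < midpoint else 1
        counts[side] += 1
        halves[side].extend(ikis)
    if min(counts) < min_per_half or not all(halves):
        return "solo"                     # not enough evidence in one of the halves
    first, second = median(halves[0]), median(halves[1])
    if first <= 0:
        return "solo"
    delta = abs(second - first) / first   # relative to the first half (assumption)
    return "handoff_detected" if delta > max_delta else "solo"
```
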
---
### Emotional valence (4)

These primitives sit at the boundary of motor and stylometric signal. They
require ≥80 typed letters and carry a hard confidence cap of **0.50**, so
they contribute to attribution but cannot dominate it.

#### `valence`

Values: `positive` | `neutral` | `negative`

Lexical positive/negative token counts. `positive` requires at least 2
positive tokens and more positive tokens than negative and obscenity tokens
combined.

#### `arousal`

Values: `low_calm` | `medium_engaged` | `high_agitated`

`high_agitated` if any of these holds: the text contains ≥5 consecutive
capitals, it contains ≥3 consecutive `!`, or the fastest IKI is below 60 ms
over ≥30 keystrokes. `low_calm` if the slowest IKI exceeds 300 ms.
Otherwise `medium_engaged`.

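
The arousal rules in Python, applying the ≥80-letter gate shared by the
emotional valence features. The hypothetical `longest_run()` helper stands
in for the lexical counters from session-context derivation (longest caps
run, longest `!` run). Reading "≥30 keystrokes" as "at least 30 IKIs" is an
assumption of this sketch.

```python
import re
from typing import Optional, Sequence


def longest_run(text: str, pattern: str) -> int:
    """Length of the longest match of `pattern` in `text` (0 if none)."""
    return max((len(m) for m in re.findall(pattern, text)), default=0)


def arousal(typed: str, ikis: Sequence[float], min_letters: int = 80) -> Optional[str]:
    if sum(c.isalpha() for c in typed) < min_letters:
        return None                       # gate shared by all emotional valence features
    caps_run = longest_run(typed, r"[A-Z]+")
    bang_run = longest_run(typed, r"!+")
    # Assumption: "≥30 keystrokes" is read as at least 30 measured IKIs.
    fast = len(ikis) >= 30 and min(ikis) < 0.060
    if caps_run >= 5 or bang_run >= 3 or fast:
        return "high_agitated"
    if ikis and max(ikis) > 0.300:
        return "low_calm"
    return "medium_engaged"
```
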
#### `stress_response`

Values: `none` | `eustress_positive` | `distress_negative`

Ratio of post-error to baseline typing speed. ≥1.20 → `eustress_positive`
(types faster under pressure; typical of experienced operators).
≤ 1/1.20 (about 0.83) → `distress_negative` (types slower; less experienced
or genuinely stressed).

#### `frustration_venting`

Values: `low` | `moderate` | `high`

Post-error frustration token count plus obscenity count. A purely
lexicometric signal.

---
## Attribution

BEHAVE-SHELL does not define how observations are aggregated; that is the
responsibility of the implementing system's attribution engine. The DECNET
reference implementation uses a five-state machine per
`(identity_uuid, primitive)`:

### States

| State | Meaning | Condition |
|---|---|---|
| `unknown` | Insufficient data | < 3 observations |
| `stable` | Consistent value | Recent N agree AND no drift from older N |
| `drifting` | Recently changed | Recent N agree BUT differ from older N |
| `conflicted` | Contradictory values | Recent N are split (high CV) |
| `multi_actor` | Multiple operators | `conflicted` + cross-session alternation |

Window N = 5 for categorical primitives. When ≥2 primitives independently
reach `multi_actor` for the same identity, the engine emits a
`multi_actor_suspected` signal, a strong indicator of a shared credential
or a compromised operator account.

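
A compact Python sketch of the five-state classification for one
`(identity_uuid, primitive)` history of categorical values, oldest first.
Each entry is assumed to be one session's observation. `MIN_OBSERVATIONS`
and `WINDOW_N` mirror `MIN_OBSERVATIONS_FOR_STATE` and `CATEGORICAL_WINDOW_N`.
Two rules here are assumptions made for illustration, not DECNET's
aggregation rules: the 80 % agreement share, and the strict two-value
alternation test for `multi_actor`.

```python
from collections import Counter
from typing import Sequence, Tuple

MIN_OBSERVATIONS = 3   # mirrors MIN_OBSERVATIONS_FOR_STATE
WINDOW_N = 5           # mirrors CATEGORICAL_WINDOW_N
AGREE_SHARE = 0.8      # hypothetical: 4 of 5 recent values must match to "agree"


def _mode(values: Sequence[str]) -> Tuple[str, float]:
    """Most common value and the share of the window it occupies."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def _alternates(values: Sequence[str]) -> bool:
    """Hypothetical alternation test: exactly two values, never repeated back to back."""
    return len(set(values)) == 2 and all(a != b for a, b in zip(values, values[1:]))


def attribution_state(history: Sequence[str]) -> str:
    """Classify one (identity, primitive) history of per-session values, oldest first."""
    if len(history) < MIN_OBSERVATIONS:
        return "unknown"
    recent = history[-WINDOW_N:]
    older = history[-2 * WINDOW_N:-WINDOW_N]
    value, share = _mode(recent)
    if share >= AGREE_SHARE:
        if older and _mode(older)[0] != value:
            return "drifting"
        return "stable"
    return "multi_actor" if _alternates(recent) else "conflicted"
```
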
---
## Calibration

The reference thresholds were calibrated against five behavioural classes
across 15 sessions (424 total observations):

| Class | Sessions | Observations | Description |
|---|---|---|---|
| `HUMAN` | 1 | 34 | Human operator, unassisted |
| `YOU-sim` | 2 | 59 | Human-simulated scripted attacker |
| `LW-sim` | 5 | 136 | Lightweight LLM-assisted operator |
| `CLAUDE-FF` | 3 | 84 | Claude (fast) assisted |
| `CLAUDE-CL` | 4 | 111 | Claude (standard) assisted |

All classes emit ≥27 distinct primitives. In single-session analysis, the
`inter_command_latency_class` LLM buckets are the primary discriminator
between unassisted and LLM-assisted operators. Cross-session attribution
uses the full primitive set.

---
## Key thresholds (reference implementation)

All constants live in `_thresholds.py`.

| Constant | Value | Used by |
|---|---|---|
| `PASTE_MIN_CHARS_PER_EVENT` | 4 | Paste-burst detection |
| `PASTE_BURST_MAX_IAT_S` | 0.20 s | Paste-burst grouping |
| `MODALITY_PASTED_MIN` | 0.40 | `input_modality` |
| `IKI_THINK_MAX_S` | 2.0 s | Typing-burst segmentation |
| `TREMOR_FAST_FLOOR_S` | 0.030 s | `motor_stability` |
| `CV_STEADY_MAX` | 0.45 | `keystroke_cadence` |
| `INTER_CMD_INSTANT_MAX` | 0.30 s | `inter_command_latency_class` |
| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s | `inter_command_latency_class` |
| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s | `inter_command_latency_class` |
| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 | `command_branch_diversity` |
| `FEEDBACK_CORRELATION_MIN` | 0.30 | `feedback_loop_engagement` |
| `PAUSE_CV_METRONOMIC_MAX` | 0.40 | `inter_command_consistency` |
| `PAUSE_CV_BIMODAL_MIN` | 1.50 | `inter_command_consistency` |
| `SESSION_DURATION_SHORT_MAX` | 60 s | `session_duration` |
| `SESSION_DURATION_MEDIUM_MAX` | 600 s | `session_duration` |
| `SESSION_DURATION_LONG_MAX` | 3600 s | `session_duration` |
| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 | All emotional valence features |
| `MIN_OBSERVATIONS_FOR_STATE` | 3 | Attribution state machine |
| `CATEGORICAL_WINDOW_N` | 5 | Attribution window |

---
## DECNET implementation

In DECNET, the profiler worker invokes BEHAVE-SHELL extraction on every
`attacker.session.ended` bus event. The worker reads the PTY shard from disk,
runs `extract_session()`, and upserts one `ObservationRow` per primitive per
session. A `UniqueConstraint(evidence_ref, primitive)` makes re-processing
idempotent.

The attribution worker consumes `attacker.observation.*` bus events and
maintains one `AttributionStateRow` per `(identity_uuid, primitive)`.

Source: `decnet/profiler/behave_shell/` (~3 868 lines across 12 files).

---
## See also

- **BEHAVE-TEXT**: sibling spec for written-text stylometry and lexicometry,
  implemented by [EYENET](https://github.com/xmartlab/eyenet)
- [Fingerprinting](Fingerprinting): all DECNET fingerprint layers
- [Identity-Resolution](Identity-Resolution): how observations feed the
  identity clusterer