# BEHAVE-SHELL

BEHAVE-SHELL is a **behavioural biometrics specification** for interactive
shell sessions. It defines a set of attribution primitives (observable,
computable signals) that characterise *how* an operator works at a terminal,
independently of what IP address, credential, or tooling they use.

The spec was born out of DECNET's need to correlate attackers across sessions
and IP changes, but it is not DECNET-specific. Any system that records PTY
sessions can implement BEHAVE-SHELL extraction and feed the resulting
primitives into an attribution engine. DECNET is the reference implementation.
There, the primitives feed the [Identity-Resolution](Identity-Resolution)
attribution state machine, which accumulates evidence across sessions to
answer: *are these the same hands?*

A sibling specification, **BEHAVE-TEXT**, defines equivalent primitives for
written text (stylometry, lexicometry, and discourse structure) and is
implemented by [EYENET](https://github.com/xmartlab/eyenet).

---
## Scope

BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus. The
current specification covers three broad domains:

| Domain | What it captures |
|---|---|
| **Motor biometrics** | Keystroke timing, error correction, paste vs. type habits, shell mastery signals |
| **Cognitive / behavioural** | Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure |
| **Stylometry / lexicometry** | Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions |

The emotional valence cluster (`valence`, `arousal`, `stress_response`,
`frustration_venting`) sits at the boundary of motor and stylometric signal:
it measures both typing-speed changes and lexical content after stress events.

---
## Design principles

- **Extraction is pure.** The spec defines a function
  `extract_session(events) → Observations` that takes an iterable of
  timestamped PTY events and yields structured observations. No I/O, no
  database, no side effects. Implementations are free to run this in any
  context.

- **PII by design.** Command text is never stored in plain form. Only the
  SHA-256 of the first token is retained. Output is reduced to a byte count
  and an error verdict. Prompt lines are ANSI-stripped and capped at 256
  characters. Layout fingerprinting uses raw unigram/bigram counts, not the
  text itself.

- **Confidence is explicit.** Every observation carries a confidence value
  in [0.0, 1.0]. Inherently noisier features have hard confidence caps
  (emotional valence: 0.50). Attribution engines must propagate confidence
  rather than treating all observations as equal.

- **Skip conditions over imputation.** A feature that cannot be computed on a
  given session (e.g. the `error_resilience` features when no errors occurred)
  yields no observation rather than a default value. Attribution engines
  treat the absence of an observation differently from an `unknown` state.
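The sketch below shows how the confidence and skip principles might combine when an observation is emitted. `Observation`, `CONFIDENCE_CAPS` and `emit` are hypothetical names. They are not part of the spec or of DECNET's API. Only the 0.50 cap on the emotional-valence cluster comes from this document.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

# Hypothetical per-primitive caps. The spec only fixes 0.50 for the
# emotional-valence cluster; every other primitive defaults to 1.0 here.
CONFIDENCE_CAPS = {
    "valence": 0.50,
    "arousal": 0.50,
    "stress_response": 0.50,
    "frustration_venting": 0.50,
}


@dataclass(frozen=True)
class Observation:
    primitive: str
    value: str
    confidence: float  # always within [0.0, 1.0]


def emit(primitive: str, value: Optional[str], confidence: float) -> Iterator[Observation]:
    """Yield zero or one observation.

    value=None means the feature's skip condition fired, so nothing is
    yielded instead of imputing a default label.
    """
    if value is None:
        return
    cap = CONFIDENCE_CAPS.get(primitive, 1.0)
    yield Observation(primitive, value, max(0.0, min(confidence, cap)))
```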
---

## Input format

BEHAVE-SHELL operates on **asciinema-compatible event streams**: sequences of
`(t: float, ch: "i"|"o", d: str)` tuples representing timestamped input and
output chunks from a PTY session. `"i"` is operator input; `"o"` is terminal
output. Non-UTF-8 bytes are preserved with the `surrogateescape` error handler.

The DECNET implementation records these as JSONL shards via `sessrec.c`:

```json
{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"}
{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."}
```
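A minimal Python reader for the JSONL shard format above might look like this. `Event`, `read_shard` and `load_shard` are hypothetical names. The sketch assumes one JSON object per line with the `t`, `ch` and `d` keys shown. It opens the file with Python's `surrogateescape` error handler, so undecodable bytes survive the round trip instead of raising.

```python
import json
from typing import Iterable, Iterator, List, NamedTuple


class Event(NamedTuple):
    t: float  # seconds since session start
    ch: str   # "i" = operator input, "o" = terminal output
    d: str    # chunk payload


def read_shard(lines: Iterable[str]) -> Iterator[Event]:
    """Parse JSONL shard lines into (t, ch, d) events, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec["ch"] not in ("i", "o"):
            raise ValueError(f"unknown channel {rec['ch']!r}")
        yield Event(float(rec["t"]), rec["ch"], rec["d"])


def load_shard(path: str) -> List[Event]:
    # surrogateescape maps undecodable bytes to lone surrogates instead of failing.
    with open(path, encoding="utf-8", errors="surrogateescape") as fh:
        return list(read_shard(fh))
```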
---

## Session context derivation

Before feature extraction, a single-pass walk over the event stream builds a
`SessionContext`, a set of derived signals that all feature functions share.
The derivation steps, in order:

| Step | Output |
|---|---|
| **Paste-burst detection** | Groups consecutive paste-class events (≥4 chars, within 200 ms) into `paste_bursts` |
| **Typing-burst segmentation** | Splits the keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]`; drops bursts with < 3 IKIs |
| **Correction signals** | Counts backspaces (`0x7f`, `0x08`) and kill-line bytes (`0x15` ctrl-u, `0x17` ctrl-w); records the IKI between each backspace and the preceding keystroke |
| **Per-command intra-typing IKIs** | For each command, IKIs from that command's span only |
| **Command segmentation** | Splits on `\r`/`\n`; per command: `first_token_hash` (SHA-256), tab count, readline shortcut count, pipe count |
| **Inter-command gaps** | Time between consecutive commands |
| **Error detection** | Scans the output between commands for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored` |
| **PS1 prompt detection** | Regex for a `$`, `#`, `%` or `>` suffix; ANSI-stripped, capped at 256 chars |
| **Keyboard layout fingerprinting** | Unigram and bigram histograms from typed letters |
| **Lexical counters** | Obscenity hits, positive/negative sentiment tokens, longest run of capitals, longest run of consecutive `!` |
### Key structures

```
SessionContext
  sid: str
  t_start, t_end, duration_s: float
  input_events, output_events: tuple[Event]
  iats: tuple[float]                # inter-keystroke intervals
  paste_bursts: tuple[PasteBurst]
  typing_bursts: tuple[tuple[float]]
  backspace_count, kill_line_count: int
  …

Command
  start_ts, end_ts: float
  first_token_hash: str             # SHA-256, first token only
  tab_count, shortcut_count, pipe_count: int
  errored: bool
  output_bytes: int
  …

PromptLine
  …
```
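The first two derivation steps can be sketched in Python as follows. The constants mirror the values in the derivation table and in the key-thresholds table further down. The following are simplifying assumptions:

- the function names;
- the `(timestamp, chunk)` input shape;
- the rule that an input event is paste-class purely because it carries ≥4 characters.

```python
from typing import List, Sequence, Tuple

PASTE_MIN_CHARS_PER_EVENT = 4  # an input event this long counts as paste-class
PASTE_BURST_MAX_IAT_S = 0.20   # max gap between events inside one paste burst
IKI_THINK_MAX_S = 2.0          # longer gaps split typing bursts
MIN_IKIS_PER_BURST = 3         # hypothetical name; shorter bursts are dropped

InputEvent = Tuple[float, str]  # (timestamp in seconds, input chunk)


def paste_bursts(events: Sequence[InputEvent]) -> List[List[InputEvent]]:
    """Group consecutive paste-class input events into bursts."""
    bursts: List[List[InputEvent]] = []
    current: List[InputEvent] = []
    for t, d in events:
        if len(d) < PASTE_MIN_CHARS_PER_EVENT:
            # A typed keystroke closes any open paste burst.
            if current:
                bursts.append(current)
                current = []
            continue
        if current and t - current[-1][0] > PASTE_BURST_MAX_IAT_S:
            bursts.append(current)
            current = []
        current.append((t, d))
    if current:
        bursts.append(current)
    return bursts


def typing_bursts(keystroke_ts: Sequence[float]) -> List[List[float]]:
    """Split keystroke timestamps at think-pauses; return the IKIs of each burst."""
    bursts: List[List[float]] = []
    current: List[float] = []
    for prev, cur in zip(keystroke_ts, keystroke_ts[1:]):
        iki = cur - prev
        if iki > IKI_THINK_MAX_S:
            if len(current) >= MIN_IKIS_PER_BURST:
                bursts.append(current)
            current = []
        else:
            current.append(iki)
    if len(current) >= MIN_IKIS_PER_BURST:
        bursts.append(current)
    return bursts
```

Both helpers only read their arguments and return new lists. They therefore stay within the spec's no-I/O, no-side-effect contract.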
## The 37 primitives

### Motor (9)

Motor primitives capture muscle memory and physical interaction patterns:
*how* an operator's fingers interact with the keyboard. They are among the
most stable signals across sessions, accounts, and the different machines
used by the same operator.

#### `input_modality`

Values: `typed` | `pasted` | `mixed`

Ratio of paste events to total input events. ≥40 % pasted and ≤5 % typed
→ `pasted`. ≤5 % pasted → `typed`. Otherwise `mixed`.

A script kiddie running pre-written one-liners pastes habitually. A
seasoned operator types most commands from memory.
#### `paste_burst_rate`

Values: `none` | `occasional` | `habitual`

Coarser bucketing of the paste ratio: ≥50 % → `habitual`, ≥10 % →
`occasional`, otherwise `none`.
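The two paste-based primitives are threshold ladders over the same ratio. In this sketch the caller supplies the paste and typed fractions (0.0 to 1.0). How those fractions are measured is left open here, and the function names are illustrative.

```python
def input_modality(paste_ratio: float, typed_ratio: float) -> str:
    """Bucket input modality from paste and typed fractions of input events."""
    if paste_ratio >= 0.40 and typed_ratio <= 0.05:
        return "pasted"
    if paste_ratio <= 0.05:
        return "typed"
    return "mixed"


def paste_burst_rate(paste_ratio: float) -> str:
    """Coarser bucketing of the same paste ratio."""
    if paste_ratio >= 0.50:
        return "habitual"
    if paste_ratio >= 0.10:
        return "occasional"
    return "none"
```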
#### `keystroke_cadence`

Values: `steady` | `bursty` | `hunt_and_peck` | `machine`

Median coefficient of variation (CV) of within-burst inter-keystroke
intervals (IKIs), checked top to bottom:

| CV | Mean IKI | Label | Interpretation |
|---|---|---|---|
| < 0.30 | < 30 ms | `machine` | Inhumanly uniform |
| < 0.45 | any | `steady` | Trained touch typist |
| < 0.70 | any | `bursty` | Thinks between phrases |
| ≥ 0.70 | any | `hunt_and_peck` | |

`machine` catches automated input that looks human on screen but has
inhumanly uniform inter-key timing.
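The cadence table can be read as an ordered rule list. This sketch makes several assumptions that the spec does not fix:

- the CV uses the population standard deviation;
- the mean IKI is taken across all bursts;
- bursts shorter than two IKIs are ignored.

The `cv` and `keystroke_cadence` names are illustrative.

```python
import statistics
from typing import List, Optional, Sequence


def cv(xs: Sequence[float]) -> float:
    """Coefficient of variation: population standard deviation over mean."""
    mean = statistics.fmean(xs)
    return statistics.pstdev(xs) / mean if mean > 0 else 0.0


def keystroke_cadence(bursts: List[List[float]]) -> Optional[str]:
    """Classify cadence from per-burst IKI lists (seconds); None = skip."""
    bursts = [b for b in bursts if len(b) >= 2]
    if not bursts:
        return None  # nothing measurable: skip rather than impute
    median_cv = statistics.median(cv(b) for b in bursts)
    mean_iki = statistics.fmean(i for b in bursts for i in b)
    if median_cv < 0.30 and mean_iki < 0.030:
        return "machine"
    if median_cv < 0.45:
        return "steady"
    if median_cv < 0.70:
        return "bursty"
    return "hunt_and_peck"
```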
#### `motor_stability`

Values: `steady` | `variable` | `tremor`

Fraction of IKIs below the tremor floor (30 ms). ≥20 % → `tremor`
(physiological or tool-simulated). Otherwise the median CV classifies
`steady` vs `variable`.
#### `error_correction`

Values: `immediate` | `deferred` | `absent` | `route_around`

Timing of each backspace relative to the preceding keystroke. Median ≤500 ms
→ `immediate` (noticed fast, muscle-memory correction). Median > 500 ms →
`deferred` (reads output, then corrects). No backspaces but kill-line bytes
present → `route_around` (ctrl-u / ctrl-w). No corrections at all → `absent`.
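The correction-signal derivation step and this classifier fit together as shown below. The `(timestamp, keystroke)` input shape and both function names are assumptions. So is returning `None` when the only backspace has no earlier keystroke to time against, a choice made in line with the skip-over-impute principle.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

BACKSPACE = {"\x7f", "\x08"}
KILL_LINE = {"\x15", "\x17"}  # ctrl-u, ctrl-w


def correction_signals(keys: Sequence[Tuple[float, str]]) -> Tuple[int, List[float], int]:
    """Return (backspace count, backspace IKIs in seconds, kill-line count).

    keys is a sequence of (timestamp_s, single-character keystroke).
    """
    backspaces, kills = 0, 0
    ikis: List[float] = []
    prev_t: Optional[float] = None
    for t, key in keys:
        if key in BACKSPACE:
            backspaces += 1
            if prev_t is not None:
                ikis.append(t - prev_t)  # gap to the preceding keystroke
        elif key in KILL_LINE:
            kills += 1
        prev_t = t
    return backspaces, ikis, kills


def error_correction(backspaces: int, ikis: Sequence[float], kills: int) -> Optional[str]:
    if backspaces == 0:
        return "route_around" if kills else "absent"
    if not ikis:
        return None  # assumption: nothing to time, so skip
    return "immediate" if statistics.median(ikis) <= 0.500 else "deferred"
```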
#### `command_chunking`

Values: `fluent` | `fragmented` | `single_command`

Median CV of per-command intra-typing IKIs. < 0.40 → `fluent` (commands
typed as rehearsed phrases). Otherwise `fragmented`. A session with only one
command → `single_command`.
#### `shell_mastery.tab_completion`

Values: `none` | `occasional` | `habitual`

Fraction of commands containing at least one tab keystroke (`0x09`).
0 → `none`, < 50 % → `occasional`, ≥50 % → `habitual`.

Operators who tab-complete heavily know the filesystem; those who never do
either memorise paths or are running a prepared script.
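Command segmentation and the tab-completion bucket can be sketched together. This is a simplified, hypothetical version with these limitations:

- it works on a plain string of typed input;
- it ignores backspace editing and the completion text the shell echoes back, so a tab inside the first word hashes only the typed prefix;
- it counts every `|`, including `||`, as a pipe.

It does show the PII rule in action: only the SHA-256 of the first token and the counts survive.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    first_token_hash: str  # SHA-256 hex of the first token only
    tab_count: int
    pipe_count: int


def segment_commands(typed: str) -> List[Command]:
    """Split typed input on CR/LF into records that keep no plain command text."""
    commands: List[Command] = []
    for raw in typed.replace("\r", "\n").split("\n"):
        tokens = raw.replace("\t", " ").split()
        if not tokens:
            continue
        digest = hashlib.sha256(tokens[0].encode("utf-8", "surrogateescape")).hexdigest()
        commands.append(Command(digest, raw.count("\t"), raw.count("|")))
    return commands


def tab_completion(commands: List[Command]) -> str:
    with_tab = sum(1 for c in commands if c.tab_count > 0)
    if with_tab == 0:
        return "none"
    return "habitual" if with_tab / len(commands) >= 0.5 else "occasional"
```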
#### `shell_mastery.shortcut_usage`

Values: `none` | `moderate` | `heavy`

Readline control bytes (ctrl-a, ctrl-e, ctrl-r, etc.) per command.
< 0.05 → `none`, < 0.15 → `moderate`, ≥0.15 → `heavy`.
#### `shell_mastery.pipe_chaining_depth`

Values: `shallow` | `moderate` | `deep`

Median pipe count per command. ≤1 → `shallow`, 2 → `moderate`, ≥3 → `deep`.

---
### Cognitive (11)

Cognitive primitives capture decision-making style, planning depth, and how
the operator processes feedback: their command repertoire, their response to
failure, and how much output they read before acting.

#### `inter_command_latency_class`

Values: `instant` | `typing_speed` | `deliberate` | `llm_lightweight` | `llm_heavyweight` | `long`

Median inter-command pause, bucketed against calibrated thresholds:

| Threshold | Label | Interpretation |
|---|---|---|
| ≤ 0.30 s | `instant` | Scripted or replay |
| ≤ 1.50 s | `typing_speed` | Commands prepared, typing only |
| ≤ 2.00 s | `deliberate` | Reads output before acting |
| ≤ 8.00 s | `llm_lightweight` | Possibly consulting a fast LLM or notes |
| ≤ 30.00 s | `llm_heavyweight` | Possibly consulting a slow LLM or manual reference |
| > 30.00 s | `long` | Interrupted or cautious |

The `llm_*` thresholds were calibrated against real sessions of
Claude-assisted operators, a novel adversary class that BEHAVE-SHELL is
explicitly designed to detect.
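The buckets are contiguous, inclusive upper bounds, so the lookup is a binary search over the threshold list. The sketch below uses Python's `bisect` module. Returning `None` for a session with no inter-command gaps is an assumption.

```python
import bisect
import statistics
from typing import Optional, Sequence

# Inclusive upper bounds (seconds) and their labels, from the table above.
BOUNDS = [0.30, 1.50, 2.00, 8.00, 30.00]
LABELS = ["instant", "typing_speed", "deliberate",
          "llm_lightweight", "llm_heavyweight", "long"]


def inter_command_latency_class(gaps_s: Sequence[float]) -> Optional[str]:
    """Bucket the median inter-command pause; None when there are no gaps."""
    if not gaps_s:
        return None
    median = statistics.median(gaps_s)
    # bisect_left keeps each bound inclusive: exactly 2.00 s maps to
    # "deliberate", matching the "≤" column.
    return LABELS[bisect.bisect_left(BOUNDS, median)]
```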
#### `command_branch_diversity`

Values: `linear_playbook` | `adaptive_branching` | `unknown`

Ratio of unique first tokens to total commands. < 5 commands → `unknown`.
≥70 % unique → `linear_playbook` (each command is different, as when
following a prepared list). < 70 % → `adaptive_branching` (repeating tools,
iterating on a problem).
#### `feedback_loop_engagement`

Values: `closed_loop` | `fire_and_forget` | `unknown`

Pearson correlation between each command's output bytes and the following
inter-command pause. r > 0.30 → `closed_loop` (the operator pauses longer
when there is more output to read). Otherwise `fire_and_forget`. Requires
≥5 command/output/pause triples.
#### `inter_command_consistency`

Values: `metronomic` | `variable` | `bimodal`

CV of inter-command gaps. < 0.40 → `metronomic` (scripts, beacons);
above 1.50 → `bimodal` (two distinct paces, typically short commands
interleaved with long waits for compiles or downloads); otherwise `variable`.
#### `cognitive_load`

Values: `low` | `medium` | `high`

Composite score: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5).
< 0.33 → `low`, < 0.67 → `medium`, otherwise `high`.

High cognitive load across multiple sessions on the same identity signals an
operator working outside their comfort zone: a new target OS, unfamiliar
tooling, or time pressure.
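The composite is a plain mean of three normalised terms. This sketch assumes `error_rate` is the fraction of commands flagged `errored`. The function name is illustrative.

```python
from statistics import fmean


def cognitive_load(intra_typing_cv: float, error_rate: float, pause_cv: float) -> str:
    """Mean of three normalised strain signals, bucketed into rough thirds."""
    score = fmean([intra_typing_cv / 1.0, error_rate, pause_cv / 1.5])
    if score < 0.33:
        return "low"
    if score < 0.67:
        return "medium"
    return "high"
```

No term is clamped here, so a very erratic typist can push the score above 1.0. The spec does not say whether implementations should clamp each term first.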
#### `exploration_style`

Values: `methodical` | `targeted` | `chaotic`

- `repetition_rate` = 1 − unique/total commands.
- `backtrack_rate` = fraction of commands that jump back to a previously used
  tool category.

`backtrack_rate` ≥30 % → `chaotic`. `repetition_rate` ≥50 % → `targeted`
(narrow focus, known objective). Otherwise `methodical`.
#### `planning_depth`

Values: `deep` | `reactive` | `shallow`

`deep_pause_frac` is the fraction of inter-command gaps > 2.0 s;
`reactive_frac` is the fraction ≤ 0.30 s. ≥40 % deep pauses → `deep`.
≥50 % reactive → `reactive`. Otherwise `shallow`.
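The two fractions are not mutually exclusive: a session can be 40 % deep and 50 % reactive at once, so rule order matters. This sketch applies the `deep` test first, in the order the rules are listed. Both that ordering and the function name are assumptions.

```python
from typing import Optional, Sequence


def planning_depth(gaps_s: Sequence[float]) -> Optional[str]:
    """Classify planning style from inter-command gaps (seconds)."""
    if not gaps_s:
        return None
    n = len(gaps_s)
    deep_pause_frac = sum(g > 2.0 for g in gaps_s) / n
    reactive_frac = sum(g <= 0.30 for g in gaps_s) / n
    if deep_pause_frac >= 0.40:  # checked first, as listed
        return "deep"
    if reactive_frac >= 0.50:
        return "reactive"
    return "shallow"
```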
#### `tool_vocabulary`

Values: `narrow` | `moderate` | `broad`

Distinct first-token count (absolute). ≤3 → `narrow`, ≥10 → `broad`,
otherwise `moderate`.
#### `error_resilience.retry_tactic`

Values: `retry_same` | `pivot` | `fallback`

Post-error behaviour: does the operator retry the same command, switch to a
different approach, or fall back to reconnaissance? Skipped if no errors
occurred in the session.
#### `error_resilience.frustration_typing`

Values: `low` | `moderate` | `high`

Delta between the median intra-command IKI after an error and after a
success. < 10 % delta → `low`, < 30 % → `moderate`, ≥30 % → `high`.

Fast typing after errors suggests frustration; slow typing suggests
deliberation.
#### `error_resilience.fallback_to_man`

Values: `present` | `absent`

After an error, does the next command start with `man`, `help`, or `info`?
Skipped if no errors. `present` indicates an operator consulting
documentation: less automated, less rehearsed.

---
### Temporal (4)

#### `session_duration`

Values: `short` | `medium` | `long` | `marathon`

| Duration | Label | Interpretation |
|---|---|---|
| < 60 s | `short` | Single recon or scan |
| < 600 s | `medium` | Targeted interaction |
| < 3600 s | `long` | Sustained operation |
| ≥ 3600 s | `marathon` | Extended presence / slow-burn APT |
#### `escalation_pattern`

Values: `bursty` | `sustained`

Dynamic window analysis of activity density over the session lifetime
(window width = max(10 s, duration / target)). The CV of per-window activity
and the fraction of empty windows classify whether activity clusters into
bursts separated by idle periods, or stays at a consistent level throughout.
#### `landing_ritual`

Values: `cleanup` | `exploration` | `passive`

Intent of the first ~5 commands, classified by intent tokens. `cleanup` if
the operator immediately starts removing evidence; `exploration` if they run
reconnaissance commands (`id`, `whoami`, `uname`, `ls`); `passive` if they
do nothing that reveals intent.
#### `exit_behavior`

Values: `cleanup` | `standard` | `anomalous`

Intent of the last ~5 commands. `cleanup` if history/log deletion or
`exit`/`logout` appears. `anomalous` if the session ends abruptly with no
recognisable closing pattern.

---
### Environmental (5)

Environmental primitives are stable across an operator's career: they change
only when the operator switches machines or deliberately retools.

#### `shell_type`

Values: `bash` | `sh` | `zsh` | `fish` | `unknown`

Detected from PS1 prompt regex patterns after ANSI stripping.
#### `terminal_multiplexer`

Values: `tmux` | `screen` | `none`

Detected from PS1 markers and characteristic escape sequences.
#### `locale`

Values: `en-US` | `en` | `other` | `unknown`

Language-specific keywords in prompt lines and error messages.
#### `keyboard_layout`

Values: `qwerty` | `dvorak` | `colemak` | `other`

Bigram frequency analysis of the typed character stream. An operator who
touch-types on Dvorak produces a statistically distinct bigram distribution
that persists even when typing non-English commands. This is a pure
stylometric signal derived from motor habit.
#### `numpad_usage`

Values: `occasional` | `frequent` | `none`

Keystroke pattern detection for numpad-originated digits.

---
### Operational (4)

#### `objective`

Values: `recon` | `exfil` | `persistence` | `lateral` | `destructive`

Token-based intent classification of command first tokens. Majority vote
across classified tokens, with a precedence order applied for ties. Skipped
if fewer than 3 tokens are classified.

Example token mappings:

- `recon`: `id`, `whoami`, `uname`, `cat`, `find`, `ls`, `ps`, `netstat`
- `exfil`: `scp`, `curl`, `wget`, `base64`, `nc`, `rsync`
- `persistence`: `crontab`, `echo >> ~/.bashrc`, `systemctl enable`
- `lateral`: `ssh`, `xfreerdp`, `psexec`, `wmiexec`
- `destructive`: `rm`, `shred`, `dd`, `mkfs`, `kill`
#### `opsec_discipline`

Values: `careful` | `learning` | `careless`

Presence of history-disabling tokens (`unset HISTFILE`, `HISTSIZE=0`,
`history -c`) and of cleanup activity in the session tail. Both → `careful`.
History-disabling only → `learning` (knows to cover tracks but forgets
cleanup). Neither → `careless`.
#### `cleanup_behavior`

Values: `thorough` | `partial` | `none`

Distinct cleanup tokens in the last 5 commands. ≥3 → `thorough`,
1–2 → `partial`, 0 → `none`.
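Command text is stored only as first-token hashes, so a tail-cleanup counter has to compare hashes, not strings. The sketch below works under that constraint. The cleanup vocabulary and the `sha256_hex` helper are hypothetical, since the spec does not enumerate its cleanup tokens.

```python
import hashlib
from typing import Sequence

TAIL = 5  # commands inspected at the end of the session


def sha256_hex(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# Hypothetical cleanup vocabulary, pre-hashed so it can be compared directly
# with each command's first_token_hash.
CLEANUP_HASHES = {sha256_hex(t) for t in ("history", "shred", "unset", "truncate", "wipe")}


def cleanup_behavior(first_token_hashes: Sequence[str]) -> str:
    """Count distinct cleanup tokens among the last TAIL commands."""
    distinct = {h for h in first_token_hashes[-TAIL:] if h in CLEANUP_HASHES}
    if len(distinct) >= 3:
        return "thorough"
    if distinct:
        return "partial"
    return "none"
```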
#### `multi_actor_indicators`

Values: `solo` | `handoff_detected`

Splits commands at the session's temporal midpoint and compares the median
intra-command IKI of each half. If the delta exceeds 50 % and both halves
have ≥4 commands, `handoff_detected` is emitted. This suggests the session
was shared between two operators (initial access handed to a
post-exploitation specialist, or a shared credential).

---
### Emotional valence (4)

These primitives sit at the boundary of motor and stylometric signal. They
require ≥80 typed letters and carry a hard confidence cap of **0.50**: they
contribute to attribution but cannot dominate it.
#### `valence`

Values: `positive` | `neutral` | `negative`

Lexical positive/negative token counts. `positive` requires the positive
count to exceed (negative + obscenity) with ≥2 positive tokens.
#### `arousal`

Values: `low_calm` | `medium_engaged` | `high_agitated`

`high_agitated` if there is a run of ≥5 consecutive capitals, a run of ≥3
consecutive `!`, or a fastest IKI < 60 ms on ≥30 keystrokes. `low_calm` if
the slowest IKI exceeds 300 ms. Otherwise `medium_engaged`.
#### `stress_response`

Values: `none` | `eustress_positive` | `distress_negative`

Ratio of post-error typing speed to baseline typing speed. ≥1.20 →
`eustress_positive` (types faster under pressure, typical of experienced
operators). ≤ 1/1.20 → `distress_negative` (types slower: less experienced
or genuinely stressed). Otherwise `none`.
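A sketch of the ratio test. Two choices here are assumptions, since the spec fixes only the 1.20 ratio:

- typing speed is taken as 1 / median IKI;
- the feature is skipped when either side has no usable IKIs.

In a full implementation the resulting observation would also be capped at 0.50 confidence.

```python
import statistics
from typing import Optional, Sequence

RATIO = 1.20


def stress_response(baseline_ikis: Sequence[float], post_error_ikis: Sequence[float]) -> Optional[str]:
    """Compare post-error typing speed with the session baseline (IKIs in seconds)."""
    if not baseline_ikis or not post_error_ikis:
        return None  # skip: no errors, or no baseline to compare against
    base_med = statistics.median(baseline_ikis)
    post_med = statistics.median(post_error_ikis)
    if base_med <= 0 or post_med <= 0:
        return None  # degenerate timing, nothing meaningful to compare
    ratio = (1.0 / post_med) / (1.0 / base_med)  # post-error speed / baseline speed
    if ratio >= RATIO:
        return "eustress_positive"
    if ratio <= 1.0 / RATIO:
        return "distress_negative"
    return "none"
```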
#### `frustration_venting`

Values: `low` | `moderate` | `high`

Post-error frustration token count plus obscenity count. A purely
lexicometric signal.

---
## Attribution

BEHAVE-SHELL does not define how observations are aggregated; that is the
responsibility of the implementing system's attribution engine. The DECNET
reference implementation (`decnet/correlation/attribution/aggregate.py`)
uses a five-state machine per `(identity_uuid, primitive)`:

| State | Meaning | Condition |
|---|---|---|
| `unknown` | Insufficient data | < 3 observations |
| `stable` | Consistent value | Recent N agree AND no drift from older N |
| `drifting` | Recently changed | Recent N agree BUT differ from older N |
| `conflicted` | Contradictory values | Recent N are split (high CV) |
| `multi_actor` | Multiple operators | `conflicted` + cross-session alternation |

Window size N = 5 (categorical primitives). EWMA is used for numeric
primitives (Phase 3).
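A sketch of the categorical branch of this state machine, for illustration only. It assumes two readings of the table:

- "agree" means every value in the recent window is identical;
- drift compares the recent value against the most common value of the preceding window.

`multi_actor` needs the cross-session alternation check described in the next section and is not modelled.

```python
from collections import Counter
from typing import Sequence

N = 5        # window size for categorical primitives
MIN_OBS = 3  # fewer observations than this means "unknown"


def categorical_state(values: Sequence[str]) -> str:
    """Classify one (identity, primitive) history, oldest value first."""
    if len(values) < MIN_OBS:
        return "unknown"
    recent = values[-N:]
    older = values[-2 * N:-N]  # empty until more than N observations exist
    if len(set(recent)) > 1:
        return "conflicted"
    if older and Counter(older).most_common(1)[0][0] != recent[0]:
        return "drifting"
    return "stable"
```

Under the unanimity reading, a single outlier in the recent window flips the state to `conflicted`. A majority threshold would be more forgiving.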
### Multi-actor detection

The DECNET attribution worker runs a `_multi_actor_tick` every 60 seconds.
For every `(identity, primitive)` pair in `conflicted` state, it checks
whether the alternation pattern across sessions is consistent with a
credential being shared between two distinct operators. When ≥2 primitives
independently flag `multi_actor` for the same identity, the bus emits:

```
attribution.profile.multi_actor_suspected
  {identity_uuid, primitives: [...], evidence_summary, confidence, ts}
```

`confidence` is capped at 0.60. Cross-primitive agreement is the real
signal, but a hard cap prevents over-alarming on noisy primitives.

---
|
||||
|
||||
## Database tables
|
||||
|
||||
### `ObservationRow`
|
||||
|
||||
One row per `(evidence_ref, primitive)`. `evidence_ref` is the session
|
||||
shard identifier — the `UniqueConstraint` makes re-processing idempotent.
|
||||
|
||||
| Column | Type | Description |
|
||||
|---|---|---|
|
||||
| `id` | UUID PK | |
|
||||
| `identity_uuid` | FK → `attacker_identities` | |
|
||||
| `attacker_uuid` | FK → `attackers` | Direct link for pre-clusterer path |
|
||||
| `evidence_ref` | TEXT | Shard ID |
|
||||
| `primitive` | TEXT | e.g. `keystroke_cadence` |
|
||||
| `value` | TEXT | Categorical label or serialised numeric |
|
||||
| `confidence` | FLOAT | 0.0–1.0 |
|
||||
| `observed_at` | DATETIME | Session end time |
|
||||
|
||||
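The sketch below shows why the `(evidence_ref, primitive)` constraint makes re-processing idempotent. It uses Python's built-in `sqlite3` instead of DECNET's ORM models and drops the foreign keys. The table and function names are hypothetical. `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    id            TEXT PRIMARY KEY,
    identity_uuid TEXT,
    attacker_uuid TEXT,
    evidence_ref  TEXT NOT NULL,
    primitive     TEXT NOT NULL,
    value         TEXT,
    confidence    REAL,
    observed_at   TEXT,
    UNIQUE (evidence_ref, primitive)
)
"""

# Re-processing the same shard hits the unique constraint and updates in place.
UPSERT = """
INSERT INTO observations
    (id, identity_uuid, attacker_uuid, evidence_ref, primitive, value, confidence, observed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (evidence_ref, primitive) DO UPDATE SET
    value = excluded.value,
    confidence = excluded.confidence,
    observed_at = excluded.observed_at
"""


def upsert_observations(conn, attacker_uuid, evidence_ref, observed_at, observations):
    """observations: mapping of primitive -> (value, confidence) for one session shard."""
    rows = [
        (str(uuid.uuid4()), None, attacker_uuid, evidence_ref, primitive, value, conf, observed_at)
        for primitive, (value, conf) in observations.items()
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
```

Running `upsert_observations` twice for the same shard still leaves exactly one row per primitive.
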
### `AttributionStateRow`

One row per `(identity_uuid, primitive)`, updated by the attribution worker
each time a new observation arrives.

| Column | Type | Description |
|---|---|---|
| `identity_uuid` | FK → `attacker_identities` | |
| `primitive` | TEXT | |
| `state` | TEXT | `unknown`/`stable`/`drifting`/`conflicted`/`multi_actor` |
| `current_value` | TEXT | Most recent or EWMA value |
| `confidence` | FLOAT | |
| `observation_count` | INT | Total observations aggregated |
| `last_observation_ts` | DATETIME | |

---

## Key thresholds (reference implementation)

All calibration constants live in `decnet/profiler/behave_shell/_thresholds.py`
(416 lines). The values below are the defaults. They can be overridden per
deployment without touching feature code.

| Constant | Value | Used by |
|---|---|---|
| `PASTE_MIN_CHARS_PER_EVENT` | 4 chars | Paste detection |
| `PASTE_BURST_MAX_IAT_S` | 0.20 s | Paste burst grouping |
| `MODALITY_PASTED_MIN` | 0.40 | `input_modality` |
| `CV_STEADY_MAX` | 0.45 | `keystroke_cadence` |
| `TREMOR_FAST_FLOOR_S` | 0.030 s | `motor_stability` |
| `IKI_THINK_MAX_S` | 2.0 s | Typing-burst split |
| `INTER_CMD_INSTANT_MAX` | 0.30 s | `inter_command_latency_class` |
| `INTER_CMD_LLM_LIGHTWEIGHT_MAX` | 8.0 s | LLM-assisted detection |
| `INTER_CMD_LLM_HEAVYWEIGHT_MAX` | 30.0 s | LLM-assisted detection |
| `BRANCH_DIVERSITY_LINEAR_MIN` | 0.70 | `command_branch_diversity` |
| `FEEDBACK_CORRELATION_MIN` | 0.30 | `feedback_loop_engagement` |
| `PAUSE_CV_METRONOMIC_MAX` | 0.40 | `inter_command_consistency` |
| `PAUSE_CV_BIMODAL_MIN` | 1.50 | `inter_command_consistency` |
| `SESSION_DURATION_SHORT_MAX` | 60 s | `session_duration` |
| `SESSION_DURATION_MEDIUM_MAX` | 600 s | `session_duration` |
| `SESSION_DURATION_LONG_MAX` | 3600 s | `session_duration` |
| `MIN_OBSERVATIONS_FOR_STATE` | 3 | Attribution state machine |
| `CATEGORICAL_WINDOW_N` | 5 | Attribution window |
| `MULTI_ACTOR_TICK_SECS` | 60 s | Multi-actor tick |
| `EMOTIONAL_VALENCE_CONFIDENCE_CAP` | 0.50 | All `emotional_valence` features |

`CATEGORICAL_WINDOW_N` (N = 5) applies to categorical primitives. When ≥2
primitives independently reach `multi_actor` for the same identity, the engine
emits a `multi_actor_suspected` signal. This is a strong indicator of a shared
credential or a compromised operator account.

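The three paste constants can be read as a small pipeline. The sketch below is one plausible reading, not the reference code. It assumes that:

- each input event is one PTY read;
- any read of at least `PASTE_MIN_CHARS_PER_EVENT` characters counts as pasted;
- consecutive pasted reads separated by no more than `PASTE_BURST_MAX_IAT_S` form one burst;
- a session is labelled pasted when pasted characters make up at least `MODALITY_PASTED_MIN` of all input.

The `InputEvent` type and the `typed`/`pasted`/`unknown` labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List

PASTE_MIN_CHARS_PER_EVENT = 4
PASTE_BURST_MAX_IAT_S = 0.20
MODALITY_PASTED_MIN = 0.40


@dataclass
class InputEvent:
    t: float   # seconds since session start
    data: str  # characters delivered by one read from the PTY (hypothetical shape)


def paste_bursts(events: List[InputEvent]) -> List[List[InputEvent]]:
    """Group consecutive paste-sized reads whose gaps are <= PASTE_BURST_MAX_IAT_S."""
    bursts: List[List[InputEvent]] = []
    prev_paste_t = None  # time of the previous read, if it was paste-sized
    for ev in events:
        if len(ev.data) < PASTE_MIN_CHARS_PER_EVENT:
            prev_paste_t = None  # a typed keystroke closes any open burst
            continue
        if prev_paste_t is not None and ev.t - prev_paste_t <= PASTE_BURST_MAX_IAT_S:
            bursts[-1].append(ev)
        else:
            bursts.append([ev])
        prev_paste_t = ev.t
    return bursts


def input_modality(events: List[InputEvent]) -> str:
    """Label a session by the share of its input characters that arrived in paste bursts."""
    total = sum(len(ev.data) for ev in events)
    if total == 0:
        return "unknown"
    pasted = sum(len(ev.data) for burst in paste_bursts(events) for ev in burst)
    return "pasted" if pasted / total >= MODALITY_PASTED_MIN else "typed"
```

A 4-character floor conveniently leaves 3-character escape sequences such as arrow keys (`\x1b[A`) on the typed side.
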
---

## Calibration

The reference thresholds were calibrated against five behavioural classes
across 15 sessions (424 total observations):

| Class | Sessions | Observations | Description |
|---|---|---|---|
| `HUMAN` | 1 | 34 | Human operator, unassisted |
| `YOU-sim` | 2 | 59 | Human-simulated scripted attacker |
| `LW-sim` | 5 | 136 | Lightweight LLM-assisted operator |
| `CLAUDE-FF` | 3 | 84 | Claude (fast) assisted |
| `CLAUDE-CL` | 4 | 111 | Claude (standard) assisted |

All classes emit ≥27 distinct primitives, which is the pass threshold.

The `inter_command_latency_class` thresholds `llm_lightweight` (≤8 s) and
`llm_heavyweight` (≤30 s) were derived from timing measurements of these
sessions. These LLM buckets are the primary discriminator between unassisted
and LLM-assisted operators in single-session analysis. DECNET can distinguish
a human working with a fast LLM from an unassisted human in a single session
with moderate confidence, and with high confidence across 3+ sessions.
Cross-session attribution uses the full primitive set.

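Here is a sketch of bucketing an inter-command gap against the three `INTER_CMD_*` upper bounds, using `bisect` so that each bound is inclusive.

Several parts are assumptions rather than documented behaviour:

- the gap definition (for example, from the end of one command's output to the start of the next command's input);
- classifying a whole session by its median gap;
- the `instant` and `above_heavyweight` labels.

The sketch also ignores any other signals the reference implementation may combine with these buckets.

```python
import bisect
from statistics import median
from typing import List

# Inclusive upper bounds in seconds: INTER_CMD_INSTANT_MAX, _LLM_LIGHTWEIGHT_MAX, _LLM_HEAVYWEIGHT_MAX.
_BOUNDS = [0.30, 8.0, 30.0]
# The llm_* names appear on this page; "instant" and "above_heavyweight" are hypothetical.
_LABELS = ["instant", "llm_lightweight", "llm_heavyweight", "above_heavyweight"]


def latency_bucket(gap_s: float) -> str:
    """Return the first bucket whose upper bound the gap does not exceed."""
    return _LABELS[bisect.bisect_left(_BOUNDS, gap_s)]


def inter_command_latency_class(gaps_s: List[float]) -> str:
    """Classify a session by the bucket of its median inter-command gap (an assumption)."""
    if not gaps_s:
        return "unknown"
    return latency_bucket(median(gaps_s))
```

`bisect_left` places a value equal to a bound in that bound's bucket, which matches the "≤" wording above.
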
---

## Testing

```bash
# Offline smoke test: 5 shards, mock bus; must emit ≥27 distinct primitives per class
scripts/behave_shell/smoke.sh

# Live round-trip: replay calibration shards through a running DECNET
scripts/behave_shell/replay_calibration.py
```

---

## DECNET implementation

In DECNET, BEHAVE-SHELL extraction is invoked by the profiler worker on every
`attacker.session.ended` bus event. The worker reads the PTY shard from disk,
runs `extract_session()`, and upserts one `ObservationRow` per primitive per
session. A `UniqueConstraint(evidence_ref, primitive)` makes re-processing
idempotent.

The attribution worker consumes `attacker.observation.*` bus events and
maintains one `AttributionStateRow` per `(identity_uuid, primitive)`.

```
decnet/profiler/behave_shell/
  __init__.py              Public API: extract_session()
  extract.py               Entry point; fans out to FEATURES registry (51 lines)
  _ctx.py                  SessionContext builder (573 lines)
  _parse.py                Asciinema JSONL parsing (272 lines)
  _handler.py              Bus subscriber: disk I/O, persistence, publish (235 lines)
  _intent.py               Token → intent classification (115 lines)
  _thresholds.py           All calibration constants (416 lines)
  _features/
    __init__.py            FEATURES registry: list of 37 functions (104 lines)
    motor.py               Primitives 1–9 (422 lines)
    cognitive.py           Primitives 10–20 (593 lines)
    temporal.py            Primitives 21–24 (237 lines)
    environmental.py       Primitives 25–29 (352 lines)
    operational.py         Primitives 30–33 (218 lines)
    emotional_valence.py   Primitives 34–37 (223 lines)

decnet/correlation/
  attribution_worker.py    Bus loop: consume observations, run tick
  attribution/
    aggregate.py           State machine: unknown→stable→drifting→conflicted→multi_actor
    _thresholds.py         Attribution-layer thresholds

decnet/web/db/models/
  observations.py          ObservationRow schema
  attribution_state.py     AttributionStateRow schema
```

Source: `decnet/profiler/behave_shell/` (~3 868 lines across 12 files).

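The extraction flow described above has three steps: parse a shard, build one shared context, and fan it out to the `FEATURES` registry. It can be sketched in a few dozen lines, assuming shards use the asciinema v2 format (a JSON header line followed by one `[time, code, data]` array per line).

`SessionContext`, `parse_cast`, the two feature functions, their labels and confidences, and this `extract_session` signature are all hypothetical stand-ins named after the file tree. They are not DECNET's code. Only the 60/600/3600 s, 4-character and 0.40 thresholds come from this page.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins: names mirror the DECNET file tree, fields and signatures are invented.
Observation = Tuple[str, float]  # (categorical value, confidence)


@dataclass
class SessionContext:
    header: dict
    inputs: List[Tuple[float, str]] = field(default_factory=list)   # asciinema "i" events
    outputs: List[Tuple[float, str]] = field(default_factory=list)  # asciinema "o" events


def parse_cast(text: str) -> SessionContext:
    """Parse asciinema v2 JSONL: a header object, then one [time, code, data] array per line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ctx = SessionContext(header=json.loads(lines[0]))
    for ln in lines[1:]:
        t, code, data = json.loads(ln)
        if code == "i":
            ctx.inputs.append((float(t), data))
        elif code == "o":
            ctx.outputs.append((float(t), data))
        # Other event codes (e.g. resize) are ignored in this sketch.
    return ctx


def session_duration(ctx: SessionContext) -> Optional[Observation]:
    times = [t for t, _ in ctx.inputs + ctx.outputs]
    if not times:
        return None  # nothing recorded: the feature abstains
    d = max(times)  # asciinema v2 timestamps are seconds since recording start
    label = "short" if d <= 60 else "medium" if d <= 600 else "long" if d <= 3600 else "extended"
    return label, 1.0  # hypothetical labels and confidence


def input_modality(ctx: SessionContext) -> Optional[Observation]:
    total = sum(len(d) for _, d in ctx.inputs)
    if total == 0:
        return None
    pasted = sum(len(d) for _, d in ctx.inputs if len(d) >= 4)  # PASTE_MIN_CHARS_PER_EVENT
    return ("pasted" if pasted / total >= 0.40 else "typed"), 0.5  # MODALITY_PASTED_MIN


FEATURES: List[Callable[[SessionContext], Optional[Observation]]] = [session_duration, input_modality]


def extract_session(cast_text: str) -> Dict[str, Observation]:
    """Parse once, then fan the shared context out to every registered feature."""
    ctx = parse_cast(cast_text)
    observations: Dict[str, Observation] = {}
    for feature in FEATURES:
        result = feature(ctx)
        if result is not None:
            observations[feature.__name__] = result
    return observations
```

A feature that cannot decide returns `None` and is simply left out. This keeps the registry loop free of per-feature special cases.
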
---

## See also

- [Fingerprinting](Fingerprinting): all DECNET fingerprint layers, including
  the BEHAVE-SHELL summary
- [Identity-Resolution](Identity-Resolution): how observations are clustered
  into attacker identities and how state-machine transitions propagate
- [Service-Personas](Service-Personas): enabling session recording and
  BEHAVE-SHELL per service
- **BEHAVE-TEXT**: sibling spec for written-text stylometry and lexicometry,
  implemented by [EYENET](https://github.com/xmartlab/eyenet)
