BEHAVE SHELL
anti edited this page 2026-05-10 04:35:57 -04:00

BEHAVE-SHELL

BEHAVE-SHELL is a behavioural biometrics specification for interactive shell sessions. It defines a set of attribution primitives — observable, computable signals — that characterise how an operator works at a terminal, independently of what IP address, credential, or tooling they use.

The spec was born out of DECNET's need to correlate attackers across sessions and IP changes, but it is not DECNET-specific. Any system that records PTY sessions can implement BEHAVE-SHELL extraction and feed the resulting primitives into an attribution engine. DECNET is the reference implementation.

A sibling specification, BEHAVE-TEXT, defines equivalent primitives for written text: stylometry, lexicometry, and discourse structure. It is being developed in a separate project.


Scope

BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus. The current specification covers three broad domains:

| Domain | What it captures |
| --- | --- |
| Motor biometrics | Keystroke timing, error correction, paste vs. type habits, shell mastery signals |
| Cognitive / behavioural | Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure |
| Stylometry / lexicometry | Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions |

The emotional valence cluster (valence, arousal, stress_response, frustration_venting) sits at the boundary of motor and stylometric signal — it measures both typing speed changes and lexical content after stress events.


Design principles

  • Extraction is pure. The spec defines a function extract_session(events) → Observations that takes an iterable of timestamped PTY events and yields structured observations. No I/O. No database. No side effects. Implementations are free to run this in any context.

  • PII-safe by design. Command text is never stored in plain form. Only the SHA-256 of the first token is retained. Output is reduced to a byte count and an error verdict. Prompt lines are ANSI-stripped and capped at 256 characters. Layout fingerprinting uses raw unigram/bigram counts, not the text itself.

  • Confidence is explicit. Every observation carries a confidence value in [0.0–1.0]. Features that are inherently noisier have hard confidence caps (emotional valence: 0.50). Attribution engines must propagate confidence rather than treating all observations as equal.

  • Skip conditions over imputation. A feature that cannot be computed on a given session (e.g. error_resilience features when no errors occurred) yields no observation rather than a default value. Attribution engines treat absence of an observation differently from an unknown state.
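
The confidence and skip-condition principles above can be illustrated with a minimal sketch. The `Observation` dataclass, the `make_observation` helper, and the `CONFIDENCE_CAPS` table are hypothetical names chosen for illustration; only the 0.50 cap on the emotional valence cluster comes from the spec.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

# Hypothetical cap table. Only the 0.50 value for the emotional
# valence cluster is taken from the spec.
CONFIDENCE_CAPS = {
    "valence": 0.50,
    "arousal": 0.50,
    "stress_response": 0.50,
    "frustration_venting": 0.50,
}


@dataclass(frozen=True)
class Observation:
    primitive: str
    value: str
    confidence: float  # always within [0.0, 1.0]


def make_observation(
    primitive: str, value: Optional[str], confidence: float
) -> Iterator[Observation]:
    """Yield zero or one observation; nothing is imputed when value is None."""
    if value is None:
        return  # skip condition: the feature could not be computed
    cap = CONFIDENCE_CAPS.get(primitive, 1.0)
    yield Observation(primitive, value, max(0.0, min(confidence, cap)))
```

Because the helper is a generator, a skipped feature simply contributes nothing to the output stream, which keeps "no observation" distinct from any explicit value.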


Input format

BEHAVE-SHELL operates on asciinema-compatible event streams: sequences of (t: float, ch: "i"|"o", d: str) tuples representing timestamped input and output chunks from a PTY session. "i" is operator input; "o" is terminal output. Non-UTF-8 bytes are handled via surrogateescape.

The DECNET implementation records these as JSONL shards via sessrec.c:

{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"}
{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."}
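
As a rough illustration, shard lines like the two above could be turned into `(t, ch, d)` tuples as follows. The `Event` type and `parse_shard` function are hypothetical names, and the handling of blank lines and unknown channels is an assumption rather than documented behaviour.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    t: float  # seconds since session start
    ch: str   # "i" = operator input, "o" = terminal output
    d: str    # data chunk


def parse_shard(lines: Iterable[str]) -> Iterator[Event]:
    """Parse JSONL shard lines into events (sketch, not the DECNET parser)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        rec = json.loads(line)
        if rec["ch"] not in ("i", "o"):
            continue  # assumption: unknown channels are dropped
        yield Event(float(rec["t"]), rec["ch"], rec["d"])


shard = [
    '{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\\r"}',
    '{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\\r\\n..."}',
]
events = list(parse_shard(shard))
```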

Session context derivation

Before feature extraction, a single-pass walk over the event stream builds a SessionContext — a set of derived signals that all feature functions share. The derivation steps, in order:

| Step | Output |
| --- | --- |
| Paste-burst detection | Groups consecutive paste-class events (≥4 chars, within 200 ms) into paste_bursts |
| Typing-burst segmentation | Splits keystroke stream at think-pauses > 2.0 s into typing_bursts[][]; drops bursts < 3 IKIs |
| Correction signals | Counts backspaces (0x7f, 0x08) and kill-line (0x15, 0x17); records IKI between each backspace and the preceding keystroke |
| Per-command intra-typing IKIs | For each command, IKIs from that command's span only |
| Command segmentation | Splits on \r/\n; per command: first_token_hash (SHA-256), tab count, readline shortcut count, pipe count |
| Inter-command IKI gaps | Time between consecutive commands |
| Error detection | Scans output for canonical error patterns ("command not found", "Permission denied", "No such file") to set command.errored |
| PS1 prompt detection | Regex for $, #, %, > suffix; ANSI-stripped, capped at 256 chars |
| Keyboard layout fingerprinting | Unigram and bigram histograms from typed letters |
| Lexical counters | Obscenity hits, positive/negative sentiment tokens, max caps run, max consecutive ! run |
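
The typing-burst segmentation step can be sketched as below. This is an illustrative reading of the step description, not the reference code: `segment_typing_bursts` and `MIN_IKIS_PER_BURST` are hypothetical names, and the input is assumed to be the timestamps of non-paste keystrokes only.

```python
from typing import List, Sequence

IKI_THINK_MAX_S = 2.0    # think-pause threshold from the spec
MIN_IKIS_PER_BURST = 3   # hypothetical name for "drops bursts < 3 IKIs"


def segment_typing_bursts(key_times: Sequence[float]) -> List[List[float]]:
    """Split keystroke timestamps into bursts of inter-keystroke intervals."""
    bursts: List[List[float]] = []
    current: List[float] = []
    for prev, cur in zip(key_times, key_times[1:]):
        iki = cur - prev
        if iki > IKI_THINK_MAX_S:
            # A think-pause closes the current burst; the pause is not kept.
            if len(current) >= MIN_IKIS_PER_BURST:
                bursts.append(current)
            current = []
        else:
            current.append(iki)
    if len(current) >= MIN_IKIS_PER_BURST:
        bursts.append(current)
    return bursts
```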

Key structures

SessionContext
  sid: str
  t_start, t_end, duration_s: float
  input_events, output_events: tuple[Event]
  iats: tuple[float]                       # inter-keystroke intervals
  paste_bursts: tuple[PasteBurst]
  typing_bursts: tuple[tuple[float]]
  backspace_count, kill_line_count: int
  intra_command_iats: tuple[tuple[float]]
  commands: tuple[Command]
  inter_cmd_iats: tuple[float]
  prompt_lines: tuple[PromptLine]
  typed_unigram_counts, typed_bigram_counts: Mapping[str, int]
  typed_letter_count: int
  obscenity_hits, positive_lex_hits, negative_lex_hits: int
  caps_run_max, bang_run_max: int

Command
  start_ts, end_ts: float
  first_token_hash: str    # SHA-256, first token only
  tab_count, shortcut_count, pipe_count: int
  errored: bool
  output_bytes: int
  followed_by_prompt: bool

PromptLine
  ts: float
  suffix_char: str         # $ # % >
  raw_line: str            # ANSI-stripped, ≤256 chars
  is_root: bool
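
One plausible way to derive `first_token_hash` is sketched here. The function below and its whitespace tokenisation are assumptions; the spec only states that the SHA-256 of the first token is retained.

```python
import hashlib


def first_token_hash(command_line: str) -> str:
    """Return the hex SHA-256 of the first whitespace-delimited token.

    Assumption: an empty command line hashes the empty string.
    """
    tokens = command_line.split()
    first = tokens[0] if tokens else ""
    # surrogateescape mirrors how the spec handles non-UTF-8 input bytes
    return hashlib.sha256(first.encode("utf-8", "surrogateescape")).hexdigest()
```

Two commands that share a tool but differ in arguments hash to the same value, so tool-level features such as tool_vocabulary can still be computed without retaining the command text.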

The 37 primitives

Motor (9)

Motor primitives capture muscle memory and physical interaction patterns. They are among the most stable signals across sessions and across different machines used by the same operator.

input_modality

Values: typed | pasted | mixed

Ratio of paste events to total input events. ≥40 % pasted and ≤5 % typed → pasted. ≤5 % pasted → typed. Otherwise mixed.

A script kiddie running pre-written one-liners pastes habitually. A seasoned operator types most commands from memory.

paste_burst_rate

Values: none | occasional | habitual

Coarser paste-ratio bucketing. ≥50 % → habitual, ≥10 % → occasional, otherwise none.

keystroke_cadence

Values: steady | bursty | hunt_and_peck | machine

Median coefficient of variation (CV) of within-burst inter-keystroke intervals (IKIs):

| CV | Mean IKI | Label |
| --- | --- | --- |
| < 0.30 | < 30 ms | machine (inhumanly uniform) |
| < 0.45 | any | steady (trained touch typist) |
| < 0.70 | any | bursty (thinks between phrases) |
| ≥ 0.70 | any | hunt_and_peck |
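
A minimal sketch of this classification follows, assuming the population standard deviation is used for the CV and that the mean IKI is taken over all bursts (neither detail is stated in the spec). All constant names except `CV_STEADY_MAX` are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Optional, Sequence

CV_MACHINE_MAX = 0.30            # hypothetical name
MEAN_IKI_MACHINE_MAX_S = 0.030   # hypothetical name
CV_STEADY_MAX = 0.45
CV_BURSTY_MAX = 0.70             # hypothetical name


def burst_cv(ikis: Sequence[float]) -> float:
    """Coefficient of variation of one burst (assumption: population stdev)."""
    m = mean(ikis)
    return pstdev(ikis) / m if m > 0 else 0.0


def keystroke_cadence(typing_bursts: Sequence[Sequence[float]]) -> Optional[str]:
    if not typing_bursts:
        return None  # skip condition: nothing to measure
    cv = median([burst_cv(b) for b in typing_bursts])
    mean_iki = mean([iki for b in typing_bursts for iki in b])
    if cv < CV_MACHINE_MAX and mean_iki < MEAN_IKI_MACHINE_MAX_S:
        return "machine"
    if cv < CV_STEADY_MAX:
        return "steady"
    if cv < CV_BURSTY_MAX:
        return "bursty"
    return "hunt_and_peck"
```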

motor_stability

Values: steady | variable | tremor

Fraction of IKIs below 30 ms. ≥20 % → tremor (physiological or tool-simulated). Otherwise CV classifies steady vs variable.

error_correction

Values: immediate | deferred | absent | route_around

Timing of backspace relative to the preceding keystroke. Median ≤500 ms → immediate. Median > 500 ms → deferred. No backspaces but kill-line present → route_around (ctrl-u / ctrl-w). Nothing → absent.
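
The decision order described above might look like this in code. The function signature is an assumption: it takes the backspace IKIs and the kill-line count that the session context already records.

```python
from statistics import median
from typing import Sequence

IMMEDIATE_MAX_S = 0.5  # hypothetical constant name for the 500 ms cut-off


def error_correction(backspace_ikis: Sequence[float], kill_line_count: int) -> str:
    """Classify correction style from backspace timing and kill-line usage."""
    if backspace_ikis:
        if median(backspace_ikis) <= IMMEDIATE_MAX_S:
            return "immediate"
        return "deferred"
    if kill_line_count > 0:
        return "route_around"  # ctrl-u / ctrl-w instead of backspace
    return "absent"
```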

command_chunking

Values: fluent | fragmented | single_command

Median CV of per-command intra-typing IKIs. < 0.40 → fluent (commands typed as rehearsed phrases).

shell_mastery.tab_completion

Values: none | occasional | habitual

Fraction of commands containing ≥1 tab keystroke. 0 → none, < 50 % → occasional, ≥50 % → habitual.

shell_mastery.shortcut_usage

Values: none | moderate | heavy

Readline control-byte count per command. < 0.05 → none, < 0.15 → moderate, ≥0.15 → heavy.

shell_mastery.pipe_chaining_depth

Values: shallow | moderate | deep

Median pipe count per command. ≤1 → shallow, 2 → moderate, ≥3 → deep.


Cognitive (11)

Cognitive primitives capture decision-making style, planning depth, and how the operator processes feedback.

inter_command_latency_class

Values: instant | typing_speed | deliberate | llm_lightweight | llm_heavyweight | long

Median inter-command pause bucketed against calibrated thresholds:

| Threshold | Label | Interpretation |
| --- | --- | --- |
| ≤ 0.30 s | instant | Scripted or replay |
| ≤ 1.50 s | typing_speed | Commands prepared, typing only |
| ≤ 2.00 s | deliberate | Reads output before acting |
| ≤ 8.00 s | llm_lightweight | Consulting a fast LLM or notes |
| ≤ 30.00 s | llm_heavyweight | Consulting a slow LLM or manual reference |
| > 30.00 s | long | Interrupted or cautious |

The llm_* thresholds were calibrated against real sessions of Claude-assisted operators — a novel adversary class BEHAVE-SHELL is explicitly designed to detect.
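
The bucketing can be expressed compactly with a sorted list of upper bounds. This is a sketch under the assumption that every bound is inclusive (as the ≤ signs in the table suggest); the function and list names are illustrative.

```python
from bisect import bisect_left
from statistics import median
from typing import Optional, Sequence

# Inclusive upper bounds in seconds, taken from the threshold table.
_BOUNDS = [0.30, 1.50, 2.00, 8.00, 30.00]
_LABELS = [
    "instant",
    "typing_speed",
    "deliberate",
    "llm_lightweight",
    "llm_heavyweight",
    "long",
]


def inter_command_latency_class(pauses: Sequence[float]) -> Optional[str]:
    """Bucket the median inter-command pause; None when there are no pauses."""
    if not pauses:
        return None
    # bisect_left returns the index of the first bound >= median,
    # which implements the inclusive "≤" comparisons.
    return _LABELS[bisect_left(_BOUNDS, median(pauses))]
```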

command_branch_diversity

Values: linear_playbook | adaptive_branching | unknown

Unique first-token ratio. < 5 commands → unknown. ≥70 % unique → linear_playbook (following a prepared list). < 70 % → adaptive_branching (iterating on a problem).

feedback_loop_engagement

Values: closed_loop | fire_and_forget | unknown

Pearson correlation between per-command output bytes and the following inter-command pause. r > 0.30 → closed_loop (pauses longer when there is more to read). Requires ≥5 triples.
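
A sketch of the correlation test follows. Mapping "too few samples" and "zero variance" to unknown is an assumption; the spec only states the r > 0.30 threshold and the minimum sample count.

```python
from math import sqrt
from typing import Optional, Sequence

FEEDBACK_CORRELATION_MIN = 0.30
MIN_SAMPLES = 5  # hypothetical name for the "≥5" requirement


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Plain Pearson correlation; None when either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def feedback_loop_engagement(
    output_bytes: Sequence[float], next_pauses: Sequence[float]
) -> str:
    if len(output_bytes) != len(next_pauses) or len(output_bytes) < MIN_SAMPLES:
        return "unknown"
    r = pearson_r(output_bytes, next_pauses)
    if r is None:
        return "unknown"
    return "closed_loop" if r > FEEDBACK_CORRELATION_MIN else "fire_and_forget"
```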

inter_command_consistency

Values: metronomic | variable | bimodal

CV of inter-command IKIs. < 0.40 → metronomic (scripts, beacons). > 1.50 → bimodal (short commands interleaved with long waits for compiles or downloads). Otherwise variable.
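
In code, the consistency check reduces to a single CV computation. The sketch assumes the population standard deviation, a strict comparison against the bimodal threshold, and variable as the fallback label between the two thresholds.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence

PAUSE_CV_METRONOMIC_MAX = 0.40
PAUSE_CV_BIMODAL_MIN = 1.50


def inter_command_consistency(pauses: Sequence[float]) -> Optional[str]:
    """Classify the spread of inter-command pauses by coefficient of variation."""
    if len(pauses) < 2:
        return None  # assumption: a CV needs at least two pauses
    m = mean(pauses)
    if m <= 0:
        return None
    cv = pstdev(pauses) / m
    if cv < PAUSE_CV_METRONOMIC_MAX:
        return "metronomic"
    if cv > PAUSE_CV_BIMODAL_MIN:
        return "bimodal"
    return "variable"
```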

cognitive_load

Values: low | medium | high

Composite: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5).

exploration_style

Values: methodical | targeted | chaotic

backtrack_rate ≥30 % → chaotic. repetition_rate ≥50 % → targeted.

planning_depth

Values: deep | reactive | shallow

Fraction of inter-command IKIs > 2.0 s (deep) vs ≤ 0.30 s (reactive).

tool_vocabulary

Values: narrow | moderate | broad

Distinct first-token count. ≤3 → narrow, ≥10 → broad.

error_resilience.retry_tactic

Values: retry_same | pivot | fallback

Post-error behaviour pattern. Skipped if no errors.

error_resilience.frustration_typing

Values: low | moderate | high

Delta between median intra-IKI after an error vs. after a success.

error_resilience.fallback_to_man

Values: present | absent

After an error, does the next command start with man/help/info?


Temporal (4)

session_duration

Values: short | medium | long | marathon

< 60 s / < 600 s / < 3600 s / ≥ 3600 s.

escalation_pattern

Values: bursty | sustained

Dynamic window analysis of activity density over the session lifetime.

landing_ritual

Values: cleanup | exploration | passive

Intent of the first ~5 commands.

exit_behavior

Values: cleanup | standard | anomalous

Intent of the last ~5 commands.


Environmental (5)

Environmental primitives are stable across an operator's career — they change only when the operator switches machines or deliberately retools.

shell_type

Values: bash | sh | zsh | fish | unknown

Detected from PS1 prompt regex patterns.

terminal_multiplexer

Values: tmux | screen | none

Detected from PS1 markers and escape sequences.

locale

Values: en-US | en | other | unknown

Language-specific keywords in prompt lines and error messages.

keyboard_layout

Values: qwerty | dvorak | colemak | other

Bigram frequency analysis of the typed character stream. An operator who touch-types on Dvorak produces a statistically distinct bigram distribution that persists even when typing non-English commands — this is a pure stylometric signal derived from motor habit.

numpad_usage

Values: occasional | frequent | none


Operational (4)

objective

Values: recon | exfil | persistence | lateral | destructive

Token-based intent classification. Majority vote; skipped if < 3 classified tokens.

Example token mappings:

  • recon: id, whoami, uname, cat, find, ls, ps, netstat
  • exfil: scp, curl, wget, base64, nc, rsync
  • persistence: crontab, echo >> ~/.bashrc, systemctl enable
  • lateral: ssh, xfreerdp, psexec, wmiexec
  • destructive: rm, shred, dd, mkfs, kill
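
The majority vote can be sketched as follows. The table only covers the single-token examples listed above (multi-token patterns such as `echo >> ~/.bashrc` are left out), and the tie-breaking rule, first category encountered wins, is an assumption. The sketch also assumes classification runs on plain first tokens before they are hashed.

```python
from collections import Counter
from typing import Optional, Sequence

# Single-token examples from the list above; not the full reference table.
TOKEN_OBJECTIVE = {
    **dict.fromkeys(["id", "whoami", "uname", "cat", "find", "ls", "ps", "netstat"], "recon"),
    **dict.fromkeys(["scp", "curl", "wget", "base64", "nc", "rsync"], "exfil"),
    **dict.fromkeys(["crontab"], "persistence"),
    **dict.fromkeys(["ssh", "xfreerdp", "psexec", "wmiexec"], "lateral"),
    **dict.fromkeys(["rm", "shred", "dd", "mkfs", "kill"], "destructive"),
}
MIN_CLASSIFIED_TOKENS = 3  # hypothetical name for the "< 3" skip condition


def classify_objective(first_tokens: Sequence[str]) -> Optional[str]:
    """Majority vote over classified first tokens; None means 'skipped'."""
    votes = Counter(TOKEN_OBJECTIVE[t] for t in first_tokens if t in TOKEN_OBJECTIVE)
    if sum(votes.values()) < MIN_CLASSIFIED_TOKENS:
        return None
    return votes.most_common(1)[0][0]
```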

opsec_discipline

Values: careful | learning | careless

Presence of history-disabling tokens and cleanup activity. Both → careful. History only → learning. Neither → careless.

cleanup_behavior

Values: thorough | partial | none

Distinct cleanup tokens in the session tail. ≥3 → thorough, 1–2 → partial, 0 → none.

multi_actor_indicators

Values: solo | handoff_detected

Splits commands at the session midpoint and compares median intra-IKI of each half. Delta > 50 % with both halves having ≥4 commands → handoff_detected. Suggests the session was shared between two operators (initial access handed to a post-exploitation specialist, or a shared credential).
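
A sketch of the half-session comparison is shown below. Several details are assumptions not fixed by the text: the midpoint is taken in time rather than by command index, the delta is measured relative to the first half's median, and a session with too few commands in either half yields no observation.

```python
from statistics import median
from typing import Optional, Sequence, Tuple

MIN_COMMANDS_PER_HALF = 4
HANDOFF_DELTA_MIN = 0.50

# Each command is (start_ts, intra-command IKIs); a hypothetical shape.
CommandIkis = Tuple[float, Sequence[float]]


def multi_actor_indicator(
    commands: Sequence[CommandIkis], t_start: float, t_end: float
) -> Optional[str]:
    mid = (t_start + t_end) / 2.0
    first = [ikis for ts, ikis in commands if ts < mid and ikis]
    second = [ikis for ts, ikis in commands if ts >= mid and ikis]
    if len(first) < MIN_COMMANDS_PER_HALF or len(second) < MIN_COMMANDS_PER_HALF:
        return None  # assumption: skip rather than report "solo"
    m1 = median([i for ikis in first for i in ikis])
    m2 = median([i for ikis in second for i in ikis])
    if m1 <= 0:
        return None
    delta = abs(m2 - m1) / m1
    return "handoff_detected" if delta > HANDOFF_DELTA_MIN else "solo"
```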


Emotional valence (4)

These primitives sit at the boundary of motor and stylometric signal. They require ≥80 typed letters and carry a hard confidence cap of 0.50 — they contribute to attribution but cannot dominate it.

valence

Values: positive | neutral | negative

Lexical positive/negative token counts. positive requires positive count > (negative + obscenity) with ≥2 positive tokens.

arousal

Values: low_calm | medium_engaged | high_agitated

high_agitated if ≥5 consecutive caps, ≥3 consecutive !, or fastest IKI < 60 ms on ≥30 keystrokes.

stress_response

Values: none | eustress_positive | distress_negative

Post-error vs baseline typing speed ratio. ≥1.20 → eustress (experienced, types faster under pressure). ≤ 1/1.20 → distress.
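
The ratio test might be implemented along these lines. The sketch assumes typing speed is the reciprocal of the median IKI, so the speed ratio equals baseline IKI divided by post-error IKI; the function signature is hypothetical, and the 80-letter minimum is the cluster-wide requirement stated above.

```python
from statistics import median
from typing import Optional, Sequence

STRESS_RATIO = 1.20           # hypothetical constant name
MIN_TYPED_LETTERS = 80        # cluster-wide requirement


def stress_response(
    baseline_ikis: Sequence[float],
    post_error_ikis: Sequence[float],
    typed_letter_count: int,
) -> Optional[str]:
    if typed_letter_count < MIN_TYPED_LETTERS:
        return None
    if not baseline_ikis or not post_error_ikis:
        return None  # skip condition, e.g. no errors in the session
    base, post = median(baseline_ikis), median(post_error_ikis)
    if base <= 0 or post <= 0:
        return None
    speed_ratio = base / post  # > 1 means faster typing after errors
    if speed_ratio >= STRESS_RATIO:
        return "eustress_positive"
    if speed_ratio <= 1 / STRESS_RATIO:
        return "distress_negative"
    return "none"
```

Whatever value is produced, the observation's confidence would still be capped at 0.50 like the rest of this cluster.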

frustration_venting

Values: low | moderate | high

Post-error frustration token count plus obscenity count. A purely lexicometric signal.


Attribution

BEHAVE-SHELL does not define how observations are aggregated — that is the responsibility of the implementing system's attribution engine. The DECNET reference implementation uses a five-state machine per (identity_uuid, primitive):

| State | Condition |
| --- | --- |
| unknown | < 3 observations |
| stable | Recent N agree, no drift from older N |
| drifting | Recent N agree but differ from older N |
| conflicted | Recent N are split |
| multi_actor | conflicted + cross-session alternation |

Window N = 5 for categorical primitives. When ≥2 primitives independently reach multi_actor for the same identity, the engine emits a multi_actor_suspected signal — a strong indicator of a shared credential or a compromised operator account.
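
The state table can be read as a small classifier over the chronological values observed for one (identity_uuid, primitive) pair. The sketch below is not the DECNET engine: the drift test (majority of the older window differs from the recent value) and the alternation heuristic (at least three flips inside the recent window) are assumptions made to keep the example concrete.

```python
from collections import Counter
from typing import Sequence

MIN_OBSERVATIONS_FOR_STATE = 3
CATEGORICAL_WINDOW_N = 5
ALTERNATION_MIN_FLIPS = 3  # hypothetical heuristic, not from the spec


def attribution_state(values: Sequence[str], n: int = CATEGORICAL_WINDOW_N) -> str:
    """Classify one primitive's history, oldest value first."""
    if len(values) < MIN_OBSERVATIONS_FOR_STATE:
        return "unknown"
    recent = list(values[-n:])
    older = list(values[-2 * n:-n])
    if len(set(recent)) > 1:
        # Recent window is split; repeated flipping suggests alternation.
        flips = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
        return "multi_actor" if flips >= ALTERNATION_MIN_FLIPS else "conflicted"
    if older and Counter(older).most_common(1)[0][0] != recent[0]:
        return "drifting"
    return "stable"
```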


Calibration

The reference thresholds were calibrated against five behavioural classes across 15 sessions (424 total observations):

| Class | Sessions | Observations | Description |
| --- | --- | --- | --- |
| HUMAN | 1 | 34 | Human operator, unassisted |
| YOU-sim | 2 | 59 | Human-simulated scripted attacker |
| LW-sim | 5 | 136 | Lightweight LLM-assisted operator |
| CLAUDE-FF | 3 | 84 | Claude (fast) assisted |
| CLAUDE-CL | 4 | 111 | Claude (standard) assisted |

All classes emit ≥27 distinct primitives. The inter_command_latency_class LLM buckets are the primary discriminator between unassisted and LLM-assisted operators in single-session analysis; cross-session attribution uses the full primitive set.


Key thresholds (reference implementation)

All constants live in _thresholds.py.

| Constant | Value |
| --- | --- |
| PASTE_MIN_CHARS_PER_EVENT | 4 |
| PASTE_BURST_MAX_IAT_S | 0.20 s |
| IKI_THINK_MAX_S | 2.0 s (typing-burst split) |
| TREMOR_FAST_FLOOR_S | 0.030 s |
| CV_STEADY_MAX | 0.45 |
| INTER_CMD_INSTANT_MAX | 0.30 s |
| INTER_CMD_LLM_LIGHTWEIGHT_MAX | 8.0 s |
| INTER_CMD_LLM_HEAVYWEIGHT_MAX | 30.0 s |
| BRANCH_DIVERSITY_LINEAR_MIN | 0.70 |
| FEEDBACK_CORRELATION_MIN | 0.30 |
| PAUSE_CV_METRONOMIC_MAX | 0.40 |
| PAUSE_CV_BIMODAL_MIN | 1.50 |
| SESSION_DURATION_SHORT_MAX | 60 s |
| SESSION_DURATION_MEDIUM_MAX | 600 s |
| SESSION_DURATION_LONG_MAX | 3600 s |
| EMOTIONAL_VALENCE_CONFIDENCE_CAP | 0.50 |
| MIN_OBSERVATIONS_FOR_STATE | 3 |
| CATEGORICAL_WINDOW_N | 5 |

DECNET implementation

In DECNET, BEHAVE-SHELL extraction is invoked by the profiler worker on every attacker.session.ended bus event. The worker reads the PTY shard from disk, runs extract_session(), and upserts one ObservationRow per primitive per session. A UniqueConstraint(evidence_ref, primitive) makes re-processing idempotent.

The attribution worker consumes attacker.observation.* bus events and maintains one AttributionStateRow per (identity_uuid, primitive).

Source: decnet/profiler/behave_shell/ (~3 868 lines across 12 files).


See also

  • BEHAVE-TEXT — sibling spec for written-text stylometry and lexicometry
  • Fingerprinting — all DECNET fingerprint layers
  • Identity-Resolution — how observations feed the identity clusterer