This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

BEHAVE-SHELL

BEHAVE-SHELL is a behavioural biometrics specification for interactive shell sessions. It defines a set of attribution primitives — observable, computable signals — that characterise how an operator works at a terminal, independently of what IP address, credential, or tooling they use.

The spec was born out of DECNET's need to correlate attackers across sessions and IP changes, but it is not DECNET-specific. Any system that records PTY sessions can implement BEHAVE-SHELL extraction and feed the resulting primitives into an attribution engine. DECNET is the reference implementation.

A sibling specification, BEHAVE-TEXT, defines equivalent primitives for written text: stylometry, lexicometry, and discourse structure. It's in being worked on in a different project.

Scope

BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus. The current specification covers three broad domains:

Domain	What it captures
Motor biometrics	Keystroke timing, error correction, paste vs. type habits, shell mastery signals
Cognitive / behavioural	Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure
Stylometry / lexicometry	Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions

The emotional valence cluster (valence, arousal, stress_response, frustration_venting) sits at the boundary of motor and stylometric signal — it measures both typing speed changes and lexical content after stress events.

Design principles

Extraction is pure. The spec defines a function extract_session(events) → Observations that takes an iterable of timestamped PTY events and yields structured observations. No I/O. No database. No side effects. Implementations are free to run this in any context.
PII by design. Command text is never stored in plain form. Only the SHA-256 of the first token is retained. Output is reduced to a byte count and an error verdict. Prompt lines are ANSI-stripped and capped at 256 characters. Raw bigram/unigram counts are used for layout fingerprinting — not the text itself.
Confidence is explicit. Every observation carries a confidence value [0.0–1.0]. Features that are inherently noisier have hard confidence caps (emotional valence: 0.50). Attribution engines must propagate confidence rather than treating all observations as equal.
Skip conditions over imputation. A feature that cannot be computed on a given session (e.g. error_resilience features when no errors occurred) yields no observation rather than a default value. Attribution engines treat absence of an observation differently from an unknown state.

Input format

BEHAVE-SHELL operates on asciinema-compatible event streams: sequences of (t: float, ch: "i"|"o", d: str) tuples representing timestamped input and output chunks from a PTY session. "i" is operator input; "o" is terminal output. Non-UTF-8 bytes are handled via surrogateescape.

The DECNET implementation records these as JSONL shards via sessrec.c:

{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"}
{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."}

Session context derivation

Before feature extraction, a single-pass walk over the event stream builds a SessionContext — a set of derived signals that all feature functions share. The derivation steps, in order:

Step	Output
Paste-burst detection	Groups consecutive paste-class events (≥4 chars, within 200 ms) into `paste_bursts`
Typing-burst segmentation	Splits keystroke stream at think-pauses > 2.0 s into `typing_bursts[][]`; drops bursts < 3 IKIs
Correction signals	Counts backspaces (`0x7f`, `0x08`) and kill-line (`0x15`, `0x17`); records IKI between each backspace and the preceding keystroke
Per-command intra-typing IKIs	For each command, IKIs from that command's span only
Command segmentation	Splits on `\r`/`\n`; per command: `first_token_hash` (SHA-256), tab count, readline shortcut count, pipe count
Inter-command IKI gaps	Time between consecutive commands
Error detection	Scans output for canonical error patterns (`"command not found"`, `"Permission denied"`, `"No such file"`) to set `command.errored`
PS1 prompt detection	Regex for `$`, `#`, `%`, `>` suffix; ANSI-stripped, capped at 256 chars
Keyboard layout fingerprinting	Unigram and bigram histograms from typed letters
Lexical counters	Obscenity hits, positive/negative sentiment tokens, max caps run, max consecutive `!` run

Key structures

SessionContext
  sid: str
  t_start, t_end, duration_s: float
  input_events, output_events: tuple[Event]
  iats: tuple[float]                       # inter-keystroke intervals
  paste_bursts: tuple[PasteBurst]
  typing_bursts: tuple[tuple[float]]
  backspace_count, kill_line_count: int
  intra_command_iats: tuple[tuple[float]]
  commands: tuple[Command]
  inter_cmd_iats: tuple[float]
  prompt_lines: tuple[PromptLine]
  typed_unigram_counts, typed_bigram_counts: Mapping[str, int]
  typed_letter_count: int
  obscenity_hits, positive_lex_hits, negative_lex_hits: int
  caps_run_max, bang_run_max: int

Command
  start_ts, end_ts: float
  first_token_hash: str    # SHA-256, first token only
  tab_count, shortcut_count, pipe_count: int
  errored: bool
  output_bytes: int
  followed_by_prompt: bool

PromptLine
  ts: float
  suffix_char: str         # $ # % >
  raw_line: str            # ANSI-stripped, ≤256 chars
  is_root: bool

The 37 primitives

Motor (9)

Motor primitives capture muscle memory and physical interaction patterns. They are among the most stable signals across sessions and across different machines used by the same operator.

`input_modality`

Values: typed | pasted | mixed

Ratio of paste events to total input events. ≥40 % pasted and ≤5 % typed → pasted. ≤5 % pasted → typed. Otherwise mixed.

A script kiddie running pre-written one-liners pastes habitually. A seasoned operator types most commands from memory.

`paste_burst_rate`

Values: none | occasional | habitual

Coarser paste-ratio bucketing. ≥50 % → habitual, ≥10 % → occasional.

`keystroke_cadence`

Values: steady | bursty | hunt_and_peck | machine

Median coefficient of variation (CV) of within-burst inter-keystroke intervals (IKIs):

CV	Mean IKI	Label
< 0.30	< 30 ms	`machine` — inhumanly uniform
< 0.45	any	`steady` — trained touch typist
< 0.70	any	`bursty` — thinks between phrases
≥ 0.70	any	`hunt_and_peck`

`motor_stability`

Values: steady | variable | tremor

Fraction of IKIs below 30 ms. ≥20 % → tremor (physiological or tool-simulated). Otherwise CV classifies steady vs variable.

`error_correction`

Values: immediate | deferred | absent | route_around

Timing of backspace relative to the preceding keystroke. Median ≤500 ms → immediate. Median > 500 ms → deferred. No backspaces but kill-line present → route_around (ctrl-u / ctrl-w). Nothing → absent.

`command_chunking`

Values: fluent | fragmented | single_command

Median CV of per-command intra-typing IKIs. < 0.40 → fluent (commands typed as rehearsed phrases).

`shell_mastery.tab_completion`

Values: none | occasional | habitual

Fraction of commands containing ≥1 tab keystroke. 0 → none, < 50 % → occasional, ≥50 % → habitual.

`shell_mastery.shortcut_usage`

Values: none | moderate | heavy

Readline control-byte count per command. < 0.05 → none, < 0.15 → moderate, ≥0.15 → heavy.

`shell_mastery.pipe_chaining_depth`

Values: shallow | moderate | deep

Median pipe count per command. ≤1 → shallow, 2 → moderate, ≥3 → deep.

Cognitive (11)

Cognitive primitives capture decision-making style, planning depth, and how the operator processes feedback.

`inter_command_latency_class`

Median inter-command pause bucketed against calibrated thresholds:

Threshold	Label	Interpretation
≤ 0.30 s	`instant`	Scripted or replay
≤ 1.50 s	`typing_speed`	Commands prepared, typing only
≤ 2.00 s	`deliberate`	Reads output before acting
≤ 8.00 s	`llm_lightweight`	Consulting a fast LLM or notes
≤ 30.00 s	`llm_heavyweight`	Consulting a slow LLM or manual reference
> 30.00 s	`long`	Interrupted or cautious

The llm_* thresholds were calibrated against real sessions of Claude-assisted operators — a novel adversary class BEHAVE-SHELL is explicitly designed to detect.

`command_branch_diversity`

Values: linear_playbook | adaptive_branching | unknown

Unique first-token ratio. < 5 commands → unknown. ≥70 % unique → linear_playbook (following a prepared list). < 70 % → adaptive_branching (iterating on a problem).

`feedback_loop_engagement`

Values: closed_loop | fire_and_forget | unknown

Pearson correlation between per-command output bytes and the following inter-command pause. r > 0.30 → closed_loop (pauses longer when there is more to read). Requires ≥5 triples.

`inter_command_consistency`

Values: metronomic | variable | bimodal

CV of inter-command IKIs. < 0.40 → metronomic (scripts, beacons).

1.50 → bimodal (short commands interleaved with long waits for compiles or downloads).

`cognitive_load`

Values: low | medium | high

Composite: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5).

`exploration_style`

Values: methodical | targeted | chaotic

backtrack_rate ≥30 % → chaotic. repetition_rate ≥50 % → targeted.

`planning_depth`

Values: deep | reactive | shallow

Fraction of inter-command IKIs > 2.0 s (deep) vs ≤ 0.30 s (reactive).

`tool_vocabulary`

Values: narrow | moderate | broad

Distinct first-token count. ≤3 → narrow, ≥10 → broad.

`error_resilience.retry_tactic`

Values: retry_same | pivot | fallback

Post-error behaviour pattern. Skipped if no errors.

`error_resilience.frustration_typing`

Values: low | moderate | high

Delta between median intra-IKI after an error vs. after a success.

`error_resilience.fallback_to_man`

Values: present | absent

After an error, does the next command start with man/help/info?

Temporal (4)

`session_duration`

Values: short | medium | long | marathon

< 60 s / < 600 s / < 3600 s / ≥ 3600 s.

`escalation_pattern`

Values: bursty | sustained

Dynamic window analysis of activity density over the session lifetime.

`landing_ritual`

Values: cleanup | exploration | passive

Intent of the first ~5 commands.

`exit_behavior`

Values: cleanup | standard | anomalous

Intent of the last ~5 commands.

Environmental (5)

Environmental primitives are stable across an operator's career — they change only when the operator switches machines or deliberately retools.

`shell_type`

Values: bash | sh | zsh | fish | unknown

Detected from PS1 prompt regex patterns.

`terminal_multiplexer`

Values: tmux | screen | none

Detected from PS1 markers and escape sequences.

`locale`

Values: en-US | en | other | unknown

Language-specific keywords in prompt lines and error messages.

`keyboard_layout`

Values: qwerty | dvorak | colemak | other

Bigram frequency analysis of the typed character stream. An operator who touch-types on Dvorak produces a statistically distinct bigram distribution that persists even when typing non-English commands — this is a pure stylometric signal derived from motor habit.

`numpad_usage`

Values: occasional | frequent | none

Operational (4)

`objective`

Values: recon | exfil | persistence | lateral | destructive

Token-based intent classification. Majority vote; skipped if < 3 classified tokens.

Example token mappings:

recon: id, whoami, uname, cat, find, ls, ps, netstat
exfil: scp, curl, wget, base64, nc, rsync
persistence: crontab, echo >> ~/.bashrc, systemctl enable
lateral: ssh, xfreerdp, psexec, wmiexec
destructive: rm, shred, dd, mkfs, kill

`opsec_discipline`

Values: careful | learning | careless

Presence of history-disabling tokens and cleanup activity. Both → careful. History only → learning. Neither → careless.

`cleanup_behavior`

Values: thorough | partial | none

Distinct cleanup tokens in the session tail. ≥3 → thorough, 1–2 → partial.

`multi_actor_indicators`

Values: solo | handoff_detected

Splits commands at the session midpoint and compares median intra-IKI of each half. Delta > 50 % with both halves having ≥4 commands → handoff_detected. Suggests the session was shared between two operators (initial access handed to a post-exploitation specialist, or a shared credential).

Emotional valence (4)

These primitives sit at the boundary of motor and stylometric signal. They require ≥80 typed letters and carry a hard confidence cap of 0.50 — they contribute to attribution but cannot dominate it.

`valence`

Values: positive | neutral | negative

Lexical positive/negative token counts. positive requires positive count

(negative + obscenity) with ≥2 positive tokens.

`arousal`

Values: low_calm | medium_engaged | high_agitated

high_agitated if ≥5 consecutive caps, ≥3 consecutive !, or fastest IKI < 60 ms on ≥30 keystrokes.

`stress_response`

Values: none | eustress_positive | distress_negative

Post-error vs baseline typing speed ratio. ≥1.20 → eustress (experienced, types faster under pressure). ≤ 1/1.20 → distress.

`frustration_venting`

Values: low | moderate | high

Post-error frustration token count plus obscenity count. A purely lexicometric signal.

Attribution

BEHAVE-SHELL does not define how observations are aggregated — that is the responsibility of the implementing system's attribution engine. The DECNET reference implementation uses a five-state machine per (identity_uuid, primitive):

State	Condition
`unknown`	< 3 observations
`stable`	Recent N agree, no drift from older N
`drifting`	Recent N agree but differ from older N
`conflicted`	Recent N are split
`multi_actor`	`conflicted` + cross-session alternation

Window N = 5 for categorical primitives. When ≥2 primitives independently reach multi_actor for the same identity, the engine emits a multi_actor_suspected signal — a strong indicator of a shared credential or a compromised operator account.

Calibration

The reference thresholds were calibrated against five behavioural classes across 15 sessions (424 total observations):

Class	Sessions	Observations	Description
`HUMAN`	1	34	Human operator, unassisted
`YOU-sim`	2	59	Human-simulated scripted attacker
`LW-sim`	5	136	Lightweight LLM-assisted operator
`CLAUDE-FF`	3	84	Claude (fast) assisted
`CLAUDE-CL`	4	111	Claude (standard) assisted

All classes emit ≥27 distinct primitives. The inter_command_latency_class LLM buckets are the primary discriminator between unassisted and LLM-assisted operators in single-session analysis; cross-session attribution uses the full primitive set.

Key thresholds (reference implementation)

All constants live in _thresholds.py.

Constant	Value
`PASTE_MIN_CHARS_PER_EVENT`	4
`PASTE_BURST_MAX_IAT_S`	0.20
`IKI_THINK_MAX_S`	2.0 (typing-burst split)
`TREMOR_FAST_FLOOR_S`	0.030
`CV_STEADY_MAX`	0.45
`INTER_CMD_INSTANT_MAX`	0.30 s
`INTER_CMD_LLM_LIGHTWEIGHT_MAX`	8.0 s
`INTER_CMD_LLM_HEAVYWEIGHT_MAX`	30.0 s
`BRANCH_DIVERSITY_LINEAR_MIN`	0.70
`FEEDBACK_CORRELATION_MIN`	0.30
`PAUSE_CV_METRONOMIC_MAX`	0.40
`PAUSE_CV_BIMODAL_MIN`	1.50
`SESSION_DURATION_SHORT_MAX`	60 s
`SESSION_DURATION_MEDIUM_MAX`	600 s
`SESSION_DURATION_LONG_MAX`	3600 s
`EMOTIONAL_VALENCE_CONFIDENCE_CAP`	0.50
`MIN_OBSERVATIONS_FOR_STATE`	3
`CATEGORICAL_WINDOW_N`	5

DECNET implementation

In DECNET, BEHAVE-SHELL extraction is invoked by the profiler worker on every attacker.session.ended bus event. The worker reads the PTY shard from disk, runs extract_session(), and upserts one ObservationRow per primitive per session. A UniqueConstraint(evidence_ref, primitive) makes re-processing idempotent.

The attribution worker consumes attacker.observation.* bus events and maintains one AttributionStateRow per (identity_uuid, primitive).

Source: decnet/profiler/behave_shell/ (~3 868 lines across 12 files).

DECNET

Home

User docs

Developer docs

Project meta

DECNET — honeypot deception-network framework. Pre-1.0, active development — use with caution. See Sponsors to support the project. Contact: samuel@securejump.cl

BEHAVE-SHELL

Scope

Design principles

Input format

Session context derivation

Key structures

The 37 primitives

Motor (9)

input_modality

paste_burst_rate

keystroke_cadence

motor_stability

error_correction

command_chunking

shell_mastery.tab_completion

shell_mastery.shortcut_usage

shell_mastery.pipe_chaining_depth

Cognitive (11)

inter_command_latency_class

command_branch_diversity

feedback_loop_engagement

inter_command_consistency

cognitive_load

exploration_style

planning_depth

tool_vocabulary

error_resilience.retry_tactic

error_resilience.frustration_typing

error_resilience.fallback_to_man