Table of Contents
- BEHAVE-SHELL
- Scope
- Design principles
- Input format
- Session context derivation
- The 37 primitives
- Motor (9)
- input_modality
- paste_burst_rate
- keystroke_cadence
- motor_stability
- error_correction
- command_chunking
- shell_mastery.tab_completion
- shell_mastery.shortcut_usage
- shell_mastery.pipe_chaining_depth
- Cognitive (11)
- inter_command_latency_class
- command_branch_diversity
- feedback_loop_engagement
- inter_command_consistency
- cognitive_load
- exploration_style
- planning_depth
- tool_vocabulary
- error_resilience.retry_tactic
- error_resilience.frustration_typing
- error_resilience.fallback_to_man
- Temporal (4)
- Environmental (5)
- Operational (4)
- Emotional valence (4)
- Attribution
- Calibration
- Key thresholds (reference implementation)
- DECNET implementation
- See also
BEHAVE-SHELL
BEHAVE-SHELL is a behavioural biometrics specification for interactive shell sessions. It defines a set of attribution primitives — observable, computable signals — that characterise how an operator works at a terminal, independently of what IP address, credential, or tooling they use.
The spec was born out of DECNET's need to correlate attackers across sessions and IP changes, but it is not DECNET-specific. Any system that records PTY sessions can implement BEHAVE-SHELL extraction and feed the resulting primitives into an attribution engine. DECNET is the reference implementation.
A sibling specification, BEHAVE-TEXT, defines equivalent primitives for written text: stylometry, lexicometry, and discourse structure. It is being developed in a separate project.
Scope
BEHAVE-SHELL has grown beyond its original keystroke-dynamics focus. The current specification covers three broad domains:
| Domain | What it captures |
|---|---|
| Motor biometrics | Keystroke timing, error correction, paste vs. type habits, shell mastery signals |
| Cognitive / behavioural | Command planning depth, feedback loop engagement, tool vocabulary, exploration style, response to failure |
| Stylometry / lexicometry | Lexical choices, sentiment, OPSEC vocabulary, keyboard layout fingerprinting from bigram distributions |
The emotional valence cluster (valence, arousal, stress_response,
frustration_venting) sits at the boundary of motor and stylometric signal —
it measures both typing speed changes and lexical content after stress events.
Design principles
- Extraction is pure. The spec defines a function extract_session(events) → Observations that takes an iterable of timestamped PTY events and yields structured observations. No I/O. No database. No side effects. Implementations are free to run this in any context.
- PII by design. Command text is never stored in plain form. Only the SHA-256 of the first token is retained. Output is reduced to a byte count and an error verdict. Prompt lines are ANSI-stripped and capped at 256 characters. Raw bigram/unigram counts are used for layout fingerprinting, not the text itself.
- Confidence is explicit. Every observation carries a confidence value [0.0–1.0]. Features that are inherently noisier have hard confidence caps (emotional valence: 0.50). Attribution engines must propagate confidence rather than treating all observations as equal.
- Skip conditions over imputation. A feature that cannot be computed on a given session (e.g. error_resilience features when no errors occurred) yields no observation rather than a default value. Attribution engines treat absence of an observation differently from an unknown state.
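The principles above can be sketched as a small pure generator. This is an illustration under stated assumptions, not the DECNET code: the Observation fields, the features and caps parameters, and the choice of confidence 1.0 for the toy session_duration feature are all hypothetical. Only the name extract_session and the session_duration thresholds come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[float, str, str]  # (t, ch, d) as described under "Input format"


@dataclass(frozen=True)
class Observation:
    # Hypothetical shape: the spec only requires a value plus a confidence.
    primitive: str
    value: str
    confidence: float


# A feature returns (value, confidence), or None when its skip condition holds.
Feature = Callable[[Tuple[Event, ...]], Optional[Tuple[str, float]]]


def session_duration(events: Tuple[Event, ...]) -> Optional[Tuple[str, float]]:
    if len(events) < 2:
        return None  # nothing to measure: emit no observation
    span = events[-1][0] - events[0][0]
    for limit, label in ((60, "short"), (600, "medium"), (3600, "long")):
        if span < limit:
            return label, 1.0  # confidence 1.0 is an illustrative choice
    return "marathon", 1.0


def extract_session(
    events: Iterable[Event],
    features: Dict[str, Feature],
    caps: Optional[Dict[str, float]] = None,
) -> Iterator[Observation]:
    frozen = tuple(events)  # materialise once; no I/O, no shared state
    for name, feature in features.items():
        result = feature(frozen)
        if result is None:
            continue  # skip rather than impute a default
        value, confidence = result
        ceiling = (caps or {}).get(name, 1.0)
        yield Observation(name, value, max(0.0, min(confidence, ceiling)))
```

Passing a caps mapping such as {"valence": 0.50} is one way to enforce the hard confidence caps without every feature function having to know about them.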
Input format
BEHAVE-SHELL operates on asciinema-compatible event streams: sequences of
(t: float, ch: "i"|"o", d: str) tuples representing timestamped input and
output chunks from a PTY session. "i" is operator input; "o" is terminal
output. Non-UTF-8 bytes are handled via surrogateescape.
The DECNET implementation records these as JSONL shards via sessrec.c:
{"sid": "abc123", "t": 1.234, "ch": "i", "d": "ls -la\r"}
{"sid": "abc123", "t": 1.891, "ch": "o", "d": "total 48\r\n..."}
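A minimal sketch of reading such a shard into (t, ch, d) events, assuming each line is one JSON object with the four keys shown above. The function name read_shard and the Event tuple are hypothetical; decoding with surrogateescape is what lets non-UTF-8 bytes survive as lone surrogates instead of raising.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    t: float
    ch: str  # "i" = operator input, "o" = terminal output
    d: str


def read_shard(lines: Iterable[bytes], sid: str) -> Iterator[Event]:
    """Yield the events of one session from raw JSONL lines."""
    for raw in lines:
        if not raw.strip():
            continue  # tolerate blank lines
        record = json.loads(raw.decode("utf-8", errors="surrogateescape"))
        if record.get("sid") != sid or record.get("ch") not in ("i", "o"):
            continue  # other sessions or unknown channels are ignored
        yield Event(float(record["t"]), record["ch"], record["d"])
```

Encoding an event's d field back with errors="surrogateescape" recovers the original bytes, so nothing is lost for byte-level features such as control-character counts.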
Session context derivation
Before feature extraction, a single-pass walk over the event stream builds a
SessionContext — a set of derived signals that all feature functions share.
The derivation steps, in order:
| Step | Output |
|---|---|
| Paste-burst detection | Groups consecutive paste-class events (≥4 chars, within 200 ms) into paste_bursts |
| Typing-burst segmentation | Splits keystroke stream at think-pauses > 2.0 s into typing_bursts[][]; drops bursts < 3 IKIs |
| Correction signals | Counts backspaces (0x7f, 0x08) and kill-line (0x15, 0x17); records IKI between each backspace and the preceding keystroke |
| Per-command intra-typing IKIs | For each command, IKIs from that command's span only |
| Command segmentation | Splits on \r/\n; per command: first_token_hash (SHA-256), tab count, readline shortcut count, pipe count |
| Inter-command IKI gaps | Time between consecutive commands |
| Error detection | Scans output for canonical error patterns ("command not found", "Permission denied", "No such file") to set command.errored |
| PS1 prompt detection | Regex for $, #, %, > suffix; ANSI-stripped, capped at 256 chars |
| Keyboard layout fingerprinting | Unigram and bigram histograms from typed letters |
| Lexical counters | Obscenity hits, positive/negative sentiment tokens, max caps run, max consecutive ! run |
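The first two derivation steps can be sketched as follows. This is an assumption-laden illustration: the function names, the tuple layouts, and the rule that an ordinary keystroke closes a paste run are my reading of "consecutive paste-class events", and MIN_IKIS_PER_BURST is a hypothetical name for the "drops bursts < 3 IKIs" rule.

```python
from typing import List, Sequence, Tuple

PASTE_MIN_CHARS_PER_EVENT = 4
PASTE_BURST_MAX_IAT_S = 0.20
IKI_THINK_MAX_S = 2.0
MIN_IKIS_PER_BURST = 3  # hypothetical constant name


def paste_bursts(inputs: Sequence[Tuple[float, str]]) -> List[Tuple[float, float, int]]:
    """Return (start_t, end_t, total_chars) for each run of paste-class events."""
    bursts: List[Tuple[float, float, int]] = []
    run_open = False
    for t, d in inputs:
        if len(d) < PASTE_MIN_CHARS_PER_EVENT:
            run_open = False  # an ordinary keystroke ends the current run
            continue
        if run_open and t - bursts[-1][1] <= PASTE_BURST_MAX_IAT_S:
            start, _, chars = bursts[-1]
            bursts[-1] = (start, t, chars + len(d))
        else:
            bursts.append((t, t, len(d)))
        run_open = True
    return bursts


def typing_bursts(key_times: Sequence[float]) -> Tuple[Tuple[float, ...], ...]:
    """Split inter-keystroke intervals at think-pauses; drop short bursts."""
    bursts: List[Tuple[float, ...]] = []
    current: List[float] = []
    for prev, cur in zip(key_times, key_times[1:]):
        iki = cur - prev
        if iki > IKI_THINK_MAX_S:
            if len(current) >= MIN_IKIS_PER_BURST:
                bursts.append(tuple(current))
            current = []  # the pause itself belongs to no burst
        else:
            current.append(iki)
    if len(current) >= MIN_IKIS_PER_BURST:
        bursts.append(tuple(current))
    return tuple(bursts)
```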
Key structures
SessionContext
    sid: str
    t_start, t_end, duration_s: float
    input_events, output_events: tuple[Event]
    iats: tuple[float] # inter-keystroke intervals
    paste_bursts: tuple[PasteBurst]
    typing_bursts: tuple[tuple[float]]
    backspace_count, kill_line_count: int
    intra_command_iats: tuple[tuple[float]]
    commands: tuple[Command]
    inter_cmd_iats: tuple[float]
    prompt_lines: tuple[PromptLine]
    typed_unigram_counts, typed_bigram_counts: Mapping[str, int]
    typed_letter_count: int
    obscenity_hits, positive_lex_hits, negative_lex_hits: int
    caps_run_max, bang_run_max: int

Command
    start_ts, end_ts: float
    first_token_hash: str # SHA-256, first token only
    tab_count, shortcut_count, pipe_count: int
    errored: bool
    output_bytes: int
    followed_by_prompt: bool

PromptLine
    ts: float
    suffix_char: str # $ # % >
    raw_line: str # ANSI-stripped, ≤256 chars
    is_root: bool
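As a rough illustration of how the per-command fields could be filled without retaining command text, the sketch below splits typed input on \r/\n and keeps only the SHA-256 of the first token plus two counters. The function name and the dict layout are hypothetical, and the handling is simplified: backspaces are not replayed and every "|" character counts as a pipe, including those inside "||" or quotes.

```python
import hashlib
import re
from typing import Dict, List, Union


def segment_commands(typed: str) -> List[Dict[str, Union[str, int]]]:
    """Split typed input into commands; keep only hashed and counted fields."""
    commands: List[Dict[str, Union[str, int]]] = []
    for line in re.split(r"[\r\n]+", typed):
        tokens = line.split()  # split() also treats tab keystrokes as separators
        if not tokens:
            continue  # empty line: the operator just pressed Enter
        digest = hashlib.sha256(
            tokens[0].encode("utf-8", errors="surrogateescape")
        ).hexdigest()
        commands.append(
            {
                "first_token_hash": digest,
                "tab_count": line.count("\t"),
                "pipe_count": line.count("|"),
            }
        )
    return commands
```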
The 37 primitives
Motor (9)
Motor primitives capture muscle memory and physical interaction patterns. They are among the most stable signals across sessions and across different machines used by the same operator.
input_modality
Values: typed | pasted | mixed
Ratio of paste events to total input events. ≥40 % pasted and ≤5 % typed
→ pasted. ≤5 % pasted → typed. Otherwise mixed.
A script kiddie running pre-written one-liners pastes habitually. A seasoned operator types most commands from memory.
paste_burst_rate
Values: none | occasional | habitual
Coarser paste-ratio bucketing. ≥50 % → habitual, ≥10 % → occasional.
keystroke_cadence
Values: steady | bursty | hunt_and_peck | machine
Median coefficient of variation (CV) of within-burst inter-keystroke intervals (IKIs):
| CV | Mean IKI | Label |
|---|---|---|
| < 0.30 | < 30 ms | machine — inhumanly uniform |
| < 0.45 | any | steady — trained touch typist |
| < 0.70 | any | bursty — thinks between phrases |
| ≥ 0.70 | any | hunt_and_peck |
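The table can be read as a first-match cascade. The sketch below assumes population standard deviation for the CV and a mean IKI taken over all bursts together; the spec text does not say which variants the reference implementation uses, so treat both as assumptions.

```python
from statistics import mean, median, pstdev
from typing import Optional, Sequence


def coefficient_of_variation(ikis: Sequence[float]) -> float:
    m = mean(ikis)
    return pstdev(ikis) / m if m > 0 else 0.0


def keystroke_cadence(bursts: Sequence[Sequence[float]]) -> Optional[str]:
    """Classify typing cadence from within-burst IKIs (in seconds)."""
    usable = [b for b in bursts if len(b) >= 3]
    if not usable:
        return None  # skip condition: no burst long enough to measure
    median_cv = median([coefficient_of_variation(b) for b in usable])
    mean_iki = mean([iki for b in usable for iki in b])
    if median_cv < 0.30 and mean_iki < 0.030:
        return "machine"
    if median_cv < 0.45:
        return "steady"
    if median_cv < 0.70:
        return "bursty"
    return "hunt_and_peck"
```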
motor_stability
Values: steady | variable | tremor
Fraction of IKIs below 30 ms. ≥20 % → tremor (physiological or
tool-simulated). Otherwise CV classifies steady vs variable.
error_correction
Values: immediate | deferred | absent | route_around
Timing of backspace relative to the preceding keystroke. Median ≤500 ms
→ immediate. Median > 500 ms → deferred. No backspaces but kill-line
present → route_around (ctrl-u / ctrl-w). Nothing → absent.
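A sketch of the decision order described above, working from a list of (timestamp, character) keystrokes. The function name and input layout are hypothetical; the byte values for backspace and kill-line are the ones listed under session context derivation.

```python
from statistics import median
from typing import List, Sequence, Tuple

BACKSPACES = {"\x7f", "\x08"}
KILL_LINE = {"\x15", "\x17"}  # ctrl-u, ctrl-w


def error_correction(keys: Sequence[Tuple[float, str]]) -> str:
    """Classify correction habit from timestamped single-character keystrokes."""
    backspace_ikis: List[float] = []
    for (t_prev, _), (t, ch) in zip(keys, keys[1:]):
        if ch in BACKSPACES:
            backspace_ikis.append(t - t_prev)  # gap to the preceding keystroke
    kill_count = sum(1 for _, ch in keys if ch in KILL_LINE)
    if backspace_ikis:
        return "immediate" if median(backspace_ikis) <= 0.5 else "deferred"
    return "route_around" if kill_count > 0 else "absent"
```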
command_chunking
Values: fluent | fragmented | single_command
Median CV of per-command intra-typing IKIs. < 0.40 → fluent (commands
typed as rehearsed phrases).
shell_mastery.tab_completion
Values: none | occasional | habitual
Fraction of commands containing ≥1 tab keystroke. 0 → none,
< 50 % → occasional, ≥50 % → habitual.
shell_mastery.shortcut_usage
Values: none | moderate | heavy
Readline control-byte count per command. < 0.05 → none,
< 0.15 → moderate, ≥0.15 → heavy.
shell_mastery.pipe_chaining_depth
Values: shallow | moderate | deep
Median pipe count per command. ≤1 → shallow, 2 → moderate, ≥3 → deep.
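The three shell_mastery bucketings are simple enough to sketch together. The function names and the list-of-counts inputs are hypothetical. Returning None for an empty session follows the skip-condition principle, and treating a non-integer median such as 1.5 or 2.5 as moderate is an assumption the text does not settle.

```python
from statistics import median
from typing import Optional, Sequence


def tab_completion(tab_counts: Sequence[int]) -> Optional[str]:
    if not tab_counts:
        return None
    fraction = sum(1 for c in tab_counts if c >= 1) / len(tab_counts)
    if fraction == 0:
        return "none"
    return "occasional" if fraction < 0.50 else "habitual"


def shortcut_usage(shortcut_counts: Sequence[int]) -> Optional[str]:
    if not shortcut_counts:
        return None
    per_command = sum(shortcut_counts) / len(shortcut_counts)
    if per_command < 0.05:
        return "none"
    return "moderate" if per_command < 0.15 else "heavy"


def pipe_chaining_depth(pipe_counts: Sequence[int]) -> Optional[str]:
    if not pipe_counts:
        return None
    m = median(pipe_counts)
    if m <= 1:
        return "shallow"
    return "deep" if m >= 3 else "moderate"
```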
Cognitive (11)
Cognitive primitives capture decision-making style, planning depth, and how the operator processes feedback.
inter_command_latency_class
Values: instant | typing_speed | deliberate | llm_lightweight | llm_heavyweight | long
Median inter-command pause bucketed against calibrated thresholds:
| Threshold | Label | Interpretation |
|---|---|---|
| ≤ 0.30 s | instant | Scripted or replay |
| ≤ 1.50 s | typing_speed | Commands prepared, typing only |
| ≤ 2.00 s | deliberate | Reads output before acting |
| ≤ 8.00 s | llm_lightweight | Consulting a fast LLM or notes |
| ≤ 30.00 s | llm_heavyweight | Consulting a slow LLM or manual reference |
| > 30.00 s | long | Interrupted or cautious |
The llm_* thresholds were calibrated against real sessions of Claude-assisted
operators — a novel adversary class BEHAVE-SHELL is explicitly designed to
detect.
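Because every bound is an inclusive upper limit, the bucketing reduces to a binary search over the sorted thresholds. The sketch below uses the values from the table; the function name and the None return for a session with no inter-command gaps are assumptions.

```python
from bisect import bisect_left
from statistics import median
from typing import Optional, Sequence

UPPER_BOUNDS_S = (0.30, 1.50, 2.00, 8.00, 30.00)
LABELS = (
    "instant",
    "typing_speed",
    "deliberate",
    "llm_lightweight",
    "llm_heavyweight",
    "long",
)


def inter_command_latency_class(gaps_s: Sequence[float]) -> Optional[str]:
    """Bucket the median inter-command pause (in seconds)."""
    if not gaps_s:
        return None  # fewer than two commands: nothing to bucket
    # bisect_left returns the first index whose bound is >= the median,
    # which matches the inclusive "≤" semantics of the table.
    return LABELS[bisect_left(UPPER_BOUNDS_S, median(gaps_s))]
```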
command_branch_diversity
Values: linear_playbook | adaptive_branching | unknown
Unique first-token ratio. < 5 commands → unknown. ≥70 % unique →
linear_playbook (following a prepared list). < 70 % →
adaptive_branching (iterating on a problem).
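Since only first-token hashes are retained, the ratio can be computed directly on them. A minimal sketch, with a hypothetical function name:

```python
from typing import Sequence


def command_branch_diversity(first_token_hashes: Sequence[str]) -> str:
    """Unique first-token ratio over the session's commands."""
    total = len(first_token_hashes)
    if total < 5:
        return "unknown"  # too few commands for a meaningful ratio
    unique_ratio = len(set(first_token_hashes)) / total
    return "linear_playbook" if unique_ratio >= 0.70 else "adaptive_branching"
```

For example, ten commands with seven distinct first tokens sit exactly on the 0.70 boundary and are labelled linear_playbook.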
feedback_loop_engagement
Values: closed_loop | fire_and_forget | unknown
Pearson correlation between per-command output bytes and the following
inter-command pause. r > 0.30 → closed_loop (pauses longer when there
is more to read). Requires ≥5 triples.
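A sketch of the correlation test, written without third-party dependencies. It takes (output_bytes, following_pause) pairs, one per command that has a following pause; returning unknown when either series is constant (correlation undefined) is an assumption, as is the function naming.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def feedback_loop_engagement(pairs: Sequence[Tuple[float, float]]) -> str:
    """pairs: (output_bytes, pause_before_next_command_s) per command."""
    if len(pairs) < 5:
        return "unknown"
    r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    if r is None:
        return "unknown"
    return "closed_loop" if r > 0.30 else "fire_and_forget"
```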
inter_command_consistency
Values: metronomic | variable | bimodal
CV of inter-command IKIs. < 0.40 → metronomic (scripts, beacons). CV > 1.50 → bimodal (short commands interleaved with long waits for compiles or downloads). Anything in between is variable.
cognitive_load
Values: low | medium | high
Composite: mean(intra-typing CV / 1.0, error rate, pause CV / 1.5).
exploration_style
Values: methodical | targeted | chaotic
backtrack_rate ≥30 % → chaotic. repetition_rate ≥50 % → targeted.
planning_depth
Values: deep | reactive | shallow
Fraction of inter-command IKIs > 2.0 s (deep) vs ≤ 0.30 s (reactive).
tool_vocabulary
Values: narrow | moderate | broad
Distinct first-token count. ≤3 → narrow, ≥10 → broad.
error_resilience.retry_tactic
Values: retry_same | pivot | fallback
Post-error behaviour pattern. Skipped if no errors.
error_resilience.frustration_typing
Values: low | moderate | high
Delta between median intra-IKI after an error vs. after a success.
error_resilience.fallback_to_man
Values: present | absent
After an error, does the next command start with man/help/info?
Temporal (4)
session_duration
Values: short | medium | long | marathon
< 60 s / < 600 s / < 3600 s / ≥ 3600 s.
escalation_pattern
Values: bursty | sustained
Dynamic window analysis of activity density over the session lifetime.
landing_ritual
Values: cleanup | exploration | passive
Intent of the first ~5 commands.
exit_behavior
Values: cleanup | standard | anomalous
Intent of the last ~5 commands.
Environmental (5)
Environmental primitives are stable across an operator's career — they change only when the operator switches machines or deliberately retools.
shell_type
Values: bash | sh | zsh | fish | unknown
Detected from PS1 prompt regex patterns.
terminal_multiplexer
Values: tmux | screen | none
Detected from PS1 markers and escape sequences.
locale
Values: en-US | en | other | unknown
Language-specific keywords in prompt lines and error messages.
keyboard_layout
Values: qwerty | dvorak | colemak | other
Bigram frequency analysis of the typed character stream. An operator who touch-types on Dvorak produces a statistically distinct bigram distribution that persists even when typing non-English commands — this is a pure stylometric signal derived from motor habit.
numpad_usage
Values: occasional | frequent | none
Operational (4)
objective
Values: recon | exfil | persistence | lateral | destructive
Token-based intent classification. Majority vote; skipped if < 3 classified tokens.
Example token mappings:
- recon: id, whoami, uname, cat, find, ls, ps, netstat
- exfil: scp, curl, wget, base64, nc, rsync
- persistence: crontab, echo >> ~/.bashrc, systemctl enable
- lateral: ssh, xfreerdp, psexec, wmiexec
- destructive: rm, shred, dd, mkfs, kill
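A sketch of the majority vote. Because command text is only retained as a first-token SHA-256, the sketch assumes the classifier matches on hashes of known tokens; that lookup-by-hash design is my inference, not something the spec states. The multi-word examples (echo >> ~/.bashrc, systemctl enable) cannot be expressed as first tokens and are left out here. Ties are broken arbitrarily by first occurrence.

```python
import hashlib
from collections import Counter
from typing import Optional, Sequence

# Single-token entries from the example mappings only.
EXAMPLE_TOKENS = {
    "recon": ("id", "whoami", "uname", "cat", "find", "ls", "ps", "netstat"),
    "exfil": ("scp", "curl", "wget", "base64", "nc", "rsync"),
    "persistence": ("crontab",),
    "lateral": ("ssh", "xfreerdp", "psexec", "wmiexec"),
    "destructive": ("rm", "shred", "dd", "mkfs", "kill"),
}


def sha256_hex(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


HASH_TO_OBJECTIVE = {
    sha256_hex(token): label
    for label, tokens in EXAMPLE_TOKENS.items()
    for token in tokens
}


def objective(first_token_hashes: Sequence[str]) -> Optional[str]:
    votes = Counter(
        HASH_TO_OBJECTIVE[h] for h in first_token_hashes if h in HASH_TO_OBJECTIVE
    )
    if sum(votes.values()) < 3:
        return None  # skipped: fewer than 3 classified tokens
    return votes.most_common(1)[0][0]
```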
opsec_discipline
Values: careful | learning | careless
Presence of history-disabling tokens and cleanup activity. Both →
careful. History only → learning. Neither → careless.
cleanup_behavior
Values: thorough | partial | none
Distinct cleanup tokens in the session tail. ≥3 → thorough,
1–2 → partial.
multi_actor_indicators
Values: solo | handoff_detected
Splits commands at the session midpoint and compares median intra-IKI of
each half. Delta > 50 % with both halves having ≥4 commands →
handoff_detected. Suggests the session was shared between two operators
(initial access handed to a post-exploitation specialist, or a shared
credential).
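A sketch of the midpoint comparison under two assumptions the text leaves open: "session midpoint" is taken as the midpoint in time, and the 50 % delta is measured relative to the first half's median. Each command is represented as (start_ts, intra_command_ikis); that layout and the function name are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def multi_actor_indicators(
    commands: Sequence[Tuple[float, Sequence[float]]],
    t_start: float,
    t_end: float,
    min_commands: int = 4,
    max_delta: float = 0.50,
) -> str:
    midpoint = (t_start + t_end) / 2.0
    ikis: Tuple[List[float], List[float]] = ([], [])
    counts = [0, 0]
    for start_ts, command_ikis in commands:
        half = 0 if start_ts < midpoint else 1
        counts[half] += 1
        ikis[half].extend(command_ikis)
    if min(counts) < min_commands or not ikis[0] or not ikis[1]:
        return "solo"  # not enough evidence on one side
    first, second = median(ikis[0]), median(ikis[1])
    if first <= 0:
        return "solo"
    delta = abs(second - first) / first
    return "handoff_detected" if delta > max_delta else "solo"
```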
Emotional valence (4)
These primitives sit at the boundary of motor and stylometric signal. They require ≥80 typed letters and carry a hard confidence cap of 0.50 — they contribute to attribution but cannot dominate it.
valence
Values: positive | neutral | negative
Lexical positive/negative token counts. positive requires positive count > (negative + obscenity) with ≥2 positive tokens.
arousal
Values: low_calm | medium_engaged | high_agitated
high_agitated if ≥5 consecutive caps, ≥3 consecutive !, or fastest
IKI < 60 ms on ≥30 keystrokes.
stress_response
Values: none | eustress_positive | distress_negative
Post-error vs baseline typing speed ratio. ≥1.20 → eustress (experienced,
types faster under pressure). ≤ 1/1.20 → distress.
frustration_venting
Values: low | moderate | high
Post-error frustration token count plus obscenity count. A purely lexicometric signal.
Attribution
BEHAVE-SHELL does not define how observations are aggregated — that is the
responsibility of the implementing system's attribution engine. The DECNET
reference implementation uses a five-state machine per
(identity_uuid, primitive):
| State | Condition |
|---|---|
| unknown | < 3 observations |
| stable | Recent N agree, no drift from older N |
| drifting | Recent N agree but differ from older N |
| conflicted | Recent N are split |
| multi_actor | conflicted + cross-session alternation |
Window N = 5 for categorical primitives. When ≥2 primitives independently
reach multi_actor for the same identity, the engine emits a
multi_actor_suspected signal — a strong indicator of a shared credential
or a compromised operator account.
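One possible reading of the five states, sketched for a single (identity_uuid, primitive) history ordered oldest to newest. Several details are my interpretation rather than the DECNET logic: "agree" means all recent values are equal, drift is judged against the majority value of the older window, and "cross-session alternation" means a value reappears after having been replaced.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Sequence


def _alternates(values: Sequence[str]) -> bool:
    # Collapse runs; a value that shows up in two separate runs came back.
    collapsed = [value for value, _ in groupby(values)]
    return len(collapsed) > len(set(collapsed))


def attribution_state(values: Sequence[str], n: int = 5, min_obs: int = 3) -> str:
    if len(values) < min_obs:
        return "unknown"
    recent = list(values[-n:])
    older = list(values[-2 * n:-n])  # empty until more than n observations exist
    if len(set(recent)) > 1:
        return "multi_actor" if _alternates(recent) else "conflicted"
    if older and Counter(older).most_common(1)[0][0] != recent[0]:
        return "drifting"
    return "stable"


def multi_actor_suspected(states: Dict[str, str]) -> bool:
    """states: primitive name -> current state for one identity."""
    return sum(1 for s in states.values() if s == "multi_actor") >= 2
```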
Calibration
The reference thresholds were calibrated against five behavioural classes across 15 sessions (424 total observations):
| Class | Sessions | Observations | Description |
|---|---|---|---|
| HUMAN | 1 | 34 | Human operator, unassisted |
| YOU-sim | 2 | 59 | Human-simulated scripted attacker |
| LW-sim | 5 | 136 | Lightweight LLM-assisted operator |
| CLAUDE-FF | 3 | 84 | Claude (fast) assisted |
| CLAUDE-CL | 4 | 111 | Claude (standard) assisted |
All classes emit ≥27 distinct primitives. The inter_command_latency_class
LLM buckets are the primary discriminator between unassisted and
LLM-assisted operators in single-session analysis; cross-session attribution
uses the full primitive set.
Key thresholds (reference implementation)
All constants live in _thresholds.py.
| Constant | Value |
|---|---|
| PASTE_MIN_CHARS_PER_EVENT | 4 |
| PASTE_BURST_MAX_IAT_S | 0.20 |
| IKI_THINK_MAX_S | 2.0 (typing-burst split) |
| TREMOR_FAST_FLOOR_S | 0.030 |
| CV_STEADY_MAX | 0.45 |
| INTER_CMD_INSTANT_MAX | 0.30 s |
| INTER_CMD_LLM_LIGHTWEIGHT_MAX | 8.0 s |
| INTER_CMD_LLM_HEAVYWEIGHT_MAX | 30.0 s |
| BRANCH_DIVERSITY_LINEAR_MIN | 0.70 |
| FEEDBACK_CORRELATION_MIN | 0.30 |
| PAUSE_CV_METRONOMIC_MAX | 0.40 |
| PAUSE_CV_BIMODAL_MIN | 1.50 |
| SESSION_DURATION_SHORT_MAX | 60 s |
| SESSION_DURATION_MEDIUM_MAX | 600 s |
| SESSION_DURATION_LONG_MAX | 3600 s |
| EMOTIONAL_VALENCE_CONFIDENCE_CAP | 0.50 |
| MIN_OBSERVATIONS_FOR_STATE | 3 |
| CATEGORICAL_WINDOW_N | 5 |
DECNET implementation
In DECNET, BEHAVE-SHELL extraction is invoked by the profiler worker on every
attacker.session.ended bus event. The worker reads the PTY shard from disk,
runs extract_session(), and upserts one ObservationRow per primitive per
session. A UniqueConstraint(evidence_ref, primitive) makes re-processing
idempotent.
The attribution worker consumes attacker.observation.* bus events and
maintains one AttributionStateRow per (identity_uuid, primitive).
Source: decnet/profiler/behave_shell/ (~3 868 lines across 12 files).
See also
- BEHAVE-TEXT — sibling spec for written-text stylometry and lexicometry
- Fingerprinting — all DECNET fingerprint layers
- Identity-Resolution — how observations feed the identity clusterer
DECNET — honeypot deception-network framework. Pre-1.0, active development — use with caution. See Sponsors to support the project. Contact: samuel@securejump.cl