BEHAVE/BEHAVE-SHELL/README.md

<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
# behave-shell

[← repo](../README.md)

Shell-session behavioral observation registry. Defines what can be observed
about an operator through their terminal interaction — typing mechanics, cognitive
style, operational patterns, infrastructure fingerprints, and cultural timing signals.

BEHAVE-SHELL does not read command content. It measures *how* someone operates a
terminal, not *what* they type. The observations are categorical labels, numeric
aggregates, and cryptographic hashes — never raw keystrokes or command text.

## Install

```bash
pip install behave-shell
```

For local development (pytest + ruff):

```bash
pip install -e ../core/ -e ".[dev]"
```

## Quickstart

```python
from behave_shell.spec import Observation, Window, TOPIC_PREFIX, event_topic_for

obs = Observation(
    primitive="motor.keystroke_cadence",
    value="bursty",
    confidence=0.87,
    window=Window(start_ts=1714000000.0, end_ts=1714003600.0),
    source="behave/shell-sensor/timing.py",
)
# Serialize to an event bus topic + payload:
topic = event_topic_for("motor.keystroke_cadence")
# → "attacker.observation.shell.motor.keystroke_cadence"
```

## Public API (`behave_shell.spec`)

| Symbol | Description |
|---|---|
| `Observation` | Registry-aware subclass of `behave_core.spec.Observation`. Validates `primitive` against `PRIMITIVE_REGISTRY` and `value` against the primitive's type spec. |
| `Window` | Re-exported from `behave_core` — measurement time window. |
| `ObservationValue` | Re-exported union type for valid value shapes. |
| `PRIMITIVE_REGISTRY` | `dict[str, ValueTypeSpec]` — the full primitive catalog (76 entries). |
| `ValueKind` | Enum: `CATEGORICAL`, `NUMERIC`, `HASH`, `ARRAY`, `FREE_STRING`, `BOOL`. |
| `ValueTypeSpec` | Pydantic model holding a primitive's kind, allowed values, bounds, and notes. |
| `is_known(primitive)` | `bool` — whether a primitive path is registered. |
| `get(primitive)` | Returns the `ValueTypeSpec` for a primitive; raises `KeyError` if unknown. |
| `TOPIC_PREFIX` | `"attacker.observation.shell"` |
| `event_topic_for(primitive)` | Returns the full event bus topic string. |
| `to_event_payload(obs)` | Serializes an `Observation` to a bus-ready `dict`. |
| `from_event_payload(payload)` | Reconstructs an `Observation` from a bus payload. |

## Primitives

76 primitives across 9 categories. Each observation captures one measured value
for one primitive over one time window. A behavioral profile is built by collecting
many observations across many sessions.

---

### `motor.*` — Physical typing mechanics (9 primitives)

Motor primitives capture the physical mechanics of keyboard interaction: rhythm,
precision, and habitual movements that are hard to fake and stable across sessions
even when operators change tools or objectives. These are the closest BEHAVE comes
to biometrics — they exploit the fact that typing style is unconscious and consistent.

| Primitive | Kind | Description |
|---|---|---|
| `motor.keystroke_cadence` | categorical | Overall rhythm of key input. `steady` = metronomic confident typist. `bursty` = fast bursts with thinking pauses. `hunt_and_peck` = search-first-type. `machine` = mechanically regular, suggesting scripted input. |
| `motor.motor_stability` | categorical | Consistency of key hold/flight times. `steady` = low variance. `variable` = high variance (cognitive load or unfamiliar keyboard). `tremor` = rhythmic instability distinct from load-induced variance. |
| `motor.error_correction` | categorical | Response to typing mistakes. `immediate` = backspace within ~1s (automatic monitoring). `deferred` = corrects after reading output. `absent` = proceeds despite errors (scripted behavior). `route_around` = uses history or rewrites rather than backspacing. |
| `motor.command_chunking` | categorical | Flow of command composition. `fluent` = typed in one pass from memory. `fragmented` = chunks with mid-command pauses (composing while typing). `single_command` = one complete command at a time, no inline pipelines. |
| `motor.paste_burst_rate` | categorical | Frequency of large clipboard-paste events relative to typed input. `habitual` = primarily works by pasting pre-prepared blocks. |
| `motor.input_modality` | categorical | Dominant input mode. `typed` = character-by-character. `pasted` = pre-prepared blocks. `mixed` = both substantially. |
| `motor.shell_mastery.tab_completion` | categorical | Tab completion usage. `habitual` = operator relies on it constantly (inferred from short pause then rapid continuation). Strong indicator of shell familiarity. |
| `motor.shell_mastery.shortcut_usage` | categorical | Use of shell shortcuts (Ctrl+R, Ctrl+A/E, Ctrl+L, Alt+.). `heavy` = deep shell muscle memory. |
| `motor.shell_mastery.pipe_chaining_depth` | categorical | Maximum pipeline depth (cmd \| cmd \| cmd). `shallow` = 0-1 pipes. `deep` = 4+. Reflects tool-composition fluency. |

---

### `cognitive.*` — Decision-making and cognition (11 primitives)

Cognitive primitives capture how the operator thinks: their planning style, how they
respond to uncertainty and failure, and whether their timing patterns are consistent
with a human, a script, or an LLM agent. These are among the most attribution-relevant
primitives — they're stable per-operator and hard to sustain as deliberate deception.

| Primitive | Kind | Description |
|---|---|---|
| `cognitive.cognitive_load` | categorical | Inferred mental workload from timing, error rate, and inter-command variance. `high` = long pauses, frequent error-retry cycles, fragmented chunking. Composite feature for downstream attribution. |
| `cognitive.exploration_style` | categorical | Navigation style in unfamiliar environments. `methodical` = systematic enumeration (ls→cat→id→uname). `chaotic` = non-sequential jumps. `targeted` = straight to objective without exploring. |
| `cognitive.planning_depth` | categorical | Whether the operator works from a pre-formed plan. `deep` = visible logical sequence (recon→pivot→exfil). `shallow` = opportunistic. `reactive` = responds only to errors. |
| `cognitive.tool_vocabulary` | categorical | Breadth of tools used. `narrow` = fixed small toolset. `broad` = reaches for the best tool per subtask. |
| `cognitive.inter_command_latency_class` | categorical | Time between commands. `instant` (<200ms), `typing_speed` (200ms-2s), `deliberate` (2s), `llm_lightweight` (2-8s, small model agent), `llm_heavyweight` (8-30s, reasoning-class agent), `long` (>30s, human-supervised LLM). |
| `cognitive.inter_command_consistency` | categorical | Dispersion of inter-command pauses. `metronomic` = LLM-pure. `variable` = human. `bimodal` = LLM-assisted human (LLM-paced bursts + human thinking gaps). |
| `cognitive.command_branch_diversity` | categorical | Content-based script vs. adaptive discriminator. `linear_playbook` = low first-token repetition (each step uses a different tool). `adaptive_branching` = high repetition of the same tool with varying arguments (operator following a thread). |
| `cognitive.feedback_loop_engagement` | categorical | Whether pace correlates with output volume. `closed_loop` = pause grows with preceding output (reading before continuing). `fire_and_forget` = paces independently of output (scripted or unread). Cuts across the LLM/human axis. |
| `cognitive.error_resilience.retry_tactic` | categorical | Response to command failure. `rerun` = identical retry. `modify` = adjusts before retrying. `switch` = tries a different tool. `abort` = gives up on objective. |
| `cognitive.error_resilience.frustration_typing` | categorical | Speed/error spike immediately after failure. `high` = sharp burst post-failure. Strong human indicator; absent in scripts. |
| `cognitive.error_resilience.fallback_to_man` | categorical | Whether the operator invokes `man`/`--help` when stuck. `present` signals unfamiliarity with the specific tool. |

---

### `temporal.*` — Session timing and lifecycle (7 primitives)

When and how long an operator works. These signals are stable per-campaign and
hard to fake consistently across many sessions, because they reflect biological and
social rhythms (sleep, work hours, habits) rather than conscious technical choices.

| Primitive | Kind | Description |
|---|---|---|
| `temporal.session_timing` | categorical | Hour-of-day distribution. `diurnal` = business-hours peaks. `nocturnal` = late-night peaks. `irregular` = no discernible daily pattern. Requires a known timezone from `cultural.*` to interpret. |
| `temporal.session_duration` | categorical | Typical session length. `short` <15min, `medium` 15-90min, `long` 90min-4hr, `marathon` >4hr. Stable per-operator characteristic. |
| `temporal.escalation_pattern` | categorical | Activity intensity across a session. `sustained` = constant rate. `bursty` = concentrated activity then silence (waiting for long-running processes). `erratic` = unpredictable spikes. |
| `temporal.persistence` | categorical | Cross-session return behavior. `hit_and_run` = few sessions then disappears. `return_visitor` = periodic return. `resident` = near-continuous presence. |
| `temporal.lifecycle_markers.landing_ritual` | categorical | Whether a recognizable start-of-session sequence is detected (whoami → id → uname → hostname → ip addr). `present` = fingerprinted checklist habit. |
| `temporal.lifecycle_markers.exit_behavior` | categorical | Session end pattern. `graceful` = explicit logout. `abrupt` = dropped connection. `cleanup` = deletes logs/tools before exiting — strongest opsec signal in this category. |
| `temporal.lifecycle_markers.idle_periodicity` | categorical | Whether in-session idle gaps (>30s) are statistically periodic or random. `periodic` = heartbeat-like — may indicate an LLM polling loop, an automated keepalive, or a human following a timed workflow. |

---

### `operational.*` — Mission and opsec (4 primitives)

Operational primitives are coarser inferences from command patterns — what the
operator is trying to accomplish and how carefully they're hiding their footprint.

| Primitive | Kind | Description |
|---|---|---|
| `operational.opsec_discipline` | categorical | Forensic footprint management. `careful` = history disabled, tools removed, proxy/VPN confirmed. `careless` = no precautions. `learning` = inconsistent and improving mid-campaign. |
| `operational.cleanup_behavior` | categorical | Artifact handling at session end. `thorough` = removes tools, temp files, bash history. `partial` = removes some but misses others. `none` = leaves everything. |
| `operational.objective` | categorical | Inferred mission from command patterns: `recon`, `exfil`, `persistence`, `lateral` (pivoting), `destructive`. |
| `operational.multi_actor_indicators` | categorical | Signs of multiple operators. `handoff_detected` = detectable style break mid-session. `team_coordinated` = multiple signatures interleaved or simultaneous. |

---

### `environmental.*` — Physical and software context (5 primitives)

Environmental primitives describe where the operator works from. Stable per-campaign;
often reveals national origin or infrastructure choices.

| Primitive | Kind | Description |
|---|---|---|
| `environmental.keyboard_layout` | categorical | Inferred layout from characteristic key-sequence errors. An AZERTY-trained typist on QWERTY makes specific substitutions (q↔a, z↔w, m→,) that are statistically distinguishable from random errors. Reliable when error volume is sufficient (>50 errors). |
| `environmental.locale` | free_string | BCP-47 tag (e.g. `en-US`, `pt-BR`). Inferred from layout, cultural timing, and command-line encoding artifacts. Free string — locale is not a closed enum. |
| `environmental.numpad_usage` | categorical | Numeric keypad use inferred from keycode patterns. `detected` signals a desktop keyboard. |
| `environmental.terminal_multiplexer` | categorical | Presence of tmux/screen, inferred from escape sequences (Ctrl+B / Ctrl+A prefixes) and window-switching patterns. |
| `environmental.shell_type` | categorical | Shell environment inferred from syntax (array syntax, quoting style, builtin names). `powershell`/`cmd.exe` immediately flags a Windows-native operator. |

---

### `cultural.*` — Social and biological rhythms (5 primitives)

Cultural primitives exploit the fact that human work patterns are shaped by local
time, religion, and social convention. These signals are hard to sustain as deliberate
deception across a long campaign because they reflect unconscious biological rhythms.

| Primitive | Kind | Description |
|---|---|---|
| `cultural.meal_break_gaps` | categorical | Whether activity gaps align with regional meal times (`morning`, `midday`, `evening`, `late_night`). Requires a known timezone to interpret. |
| `cultural.periodic_micro_pauses` | categorical | Short rhythmic pauses of 5-15 min recurring at consistent intervals. May correspond to Salah prayer times (5 daily, spaced ~2-3hr), smoke breaks, or other cultural micro-rituals. `regular_intervals_detected` rejects the null hypothesis of random pauses at p<0.05. |
| `cultural.dst_behavior` | categorical | Whether the operator's active hours shift by 1 hour at DST transitions. `shifts_with_dst` = follows local civil time. `anchored_to_utc` = schedule is clock-fixed (automated infrastructure or deliberate counter-analysis). |
| `cultural.weekend_cadence` | categorical | Which two-day block is low-activity. `fri_sat` = Middle Eastern/Israeli pattern. `sat_sun` = Western/East Asian. Reliable national-origin signal across multiple weeks. |
| `cultural.holiday_gaps` | categorical | Whether multi-day inactivity gaps align with public holiday calendars. Requires a multi-session corpus spanning calendar events. |

---

### `emotional_valence.*` — Affective state (4 primitives)

Emotional valence primitives infer affective state from **typing dynamics** — pace,
error rate, and key-input aggression. BEHAVE-SHELL is content-blind; these
observations are derived entirely from timing and motor signals, not from what
was typed.

| Primitive | Kind | Description |
|---|---|---|
| `emotional_valence.valence` | categorical | Overall affective tone: `positive` (fluent, low-error), `neutral`, `negative` (error-heavy, erratic). Coarse aggregate; see `arousal` and `stress_response` for finer breakdown. |
| `emotional_valence.arousal` | categorical | Activation level. `low_calm` = slow deliberate pace. `high_agitated` = fast error-prone bursts. Orthogonal to valence — a calm script and a calm professional are both `low_calm`. |
| `emotional_valence.stress_response` | categorical | Whether high arousal is positive (`eustress_positive` = speed-up with low error rate, operator in the zone) or negative (`distress_negative` = speed-up with rising errors, panic). |
| `emotional_valence.frustration_venting` | categorical | Transient outburst signal: sudden speed spike or rapid backspace/delete bursts after command failures. Absent in scripted runs; strong human indicator. |

---

### `toolchain.*` — Infrastructure fingerprints (19 primitives)

Toolchain primitives fingerprint the software stack the operator uses, from TLS
handshake parameters to SSH key exchange preferences to C2 beaconing behavior.
Even fully encrypted traffic leaves structural fingerprints that identify specific
tools, libraries, and operator configurations.

#### `toolchain.tls.*` — TLS fingerprints (6)

TLS fingerprints identify the client and server stacks by their handshake parameters.
Each tool, library, and OS produces recognizable fingerprints even when the payload
is encrypted.

| Primitive | Kind | Description |
|---|---|---|
| `toolchain.tls.ja3_client` | hash | MD5 hash of TLS ClientHello parameters (SSLVersion, Ciphers, Extensions, EllipticCurves, EllipticCurvePointFormats). Salesforce, 2017. Each tool stack (curl, Metasploit, Cobalt Strike) produces a distinct hash. Searchable against databases like ja3er.com. |
| `toolchain.tls.ja3s_server` | hash | MD5 hash of TLS ServerHello (SSLVersion, Cipher, Extensions). Fingerprints the server stack — useful for identifying C2 servers by TLS response even when IPs rotate. |
| `toolchain.tls.ja4_client` | hash | FoxIO JA4 (2023): human-readable format (e.g. `t13d1516h2_8daaf6152771_e5627efa2ab1`) robust to TLS extension order randomization. Encodes TLS version, cipher count, extension count, ALPN, cipher hash, extension hash. Preferred over JA3 for new sensors. |
| `toolchain.tls.ja4s_server` | hash | JA4 server-side: fingerprints ServerHello using chosen cipher, extension list, and ALPN. More stable than JA3S when cipher ordering is randomized server-side. |
| `toolchain.tls.jarm_server` | hash | 62-char JARM hash (Salesforce, 2020). Actively probes the server with 10 crafted ClientHellos and hashes the responses. Reliably detects Cobalt Strike, Metasploit, and major C2 frameworks even with custom certificates. |
| `toolchain.tls.tls_cert_simhash` | hash | SHA-256 hex of the leaf certificate DER bytes. Tracks a specific certificate across infrastructure — useful for correlating C2 that reuses self-signed certs. |

#### `toolchain.transport.*` — Network stack fingerprints (3)

| Primitive | Kind | Description |
|---|---|---|
| `toolchain.transport.tcp_stack` | free_string | p0f OS label (e.g. `Linux 5.x`). Inferred from TCP header quirks (TTL, window size, options order, DF bit). Identifies the connecting OS before any application protocol is visible. |
| `toolchain.transport.h2_akamai_fingerprint` | free_string | HTTP/2 SETTINGS + priority + pseudo-header order hash. Different HTTP/2 libraries emit distinct SETTINGS combinations (curl vs. Python requests vs. Go net/http). `status: planned` |
| `toolchain.transport.quic_client` | free_string | QUIC initial packet fingerprint from transport parameters and connection ID length. `status: planned` |

#### `toolchain.ssh.*` — SSH fingerprints (4)

| Primitive | Kind | Description |
|---|---|---|
| `toolchain.ssh.hassh_client` | hash | MD5 hash of SSH client KEX parameters (kex_algorithms, encryption_algorithms, mac_algorithms, compression_algorithms). Salesforce, 2018. Each SSH library (OpenSSH, PuTTY, Paramiko, Impacket) produces a distinct HASSH. |
| `toolchain.ssh.hassh_server` | hash | MD5 hash of SSH server KEX parameters. Fingerprints the SSH daemon — detects honeypots, implants, or non-standard servers. `status: partial` |
| `toolchain.ssh.ssh_client_banner` | free_string | RFC 4253 protocol version string (e.g. `SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6`). Often unmodified even in offensive tooling. |
| `toolchain.ssh.kex_algorithm_order` | array[free_string] | Ordered KEX algorithm list from the SSH ClientHello. Different clients (OpenSSH, PuTTY, Impacket smbexec) advertise distinct orderings — secondary fingerprint beyond HASSH. |

#### `toolchain.http.*` — HTTP fingerprints (3)

| Primitive | Kind | Description |
|---|---|---|
| `toolchain.http.user_agent_tool_class` | categorical | Tool class from User-Agent and HTTP behavior. Known offensive tools use default or absent User-Agents. Values: `nmap_nse`, `sqlmap`, `nuclei`, `masscan`, `curl`, `metasploit`, `ffuf`, `gobuster`, `feroxbuster`, `nikto`, `wpscan`, `evilwinrm`, `impacket`, `unknown`. |
| `toolchain.http.header_order_fingerprint` | free_string | Hash of HTTP request header name order. Different libraries emit distinct sequences. `status: planned` |
| `toolchain.http.body_oddities` | array[free_string] | Anomalous body characteristics (e.g. `multipart_boundary_static`, `json_key_order_fixed`). `status: planned` |

#### `toolchain.c2.*` — C2 beaconing (6)

C2 primitives characterize implant beaconing behavior. Even fully encrypted C2
traffic leaves timing and structural fingerprints.

| Primitive | Kind | Description |
|---|---|---|
| `toolchain.c2.beacon_family` | categorical | C2 framework identified from traffic fingerprints: `cobalt_strike`, `sliver`, `havoc`, `mythic`, `merlin` *(planned)*, `brc4` *(planned)*, `nighthawk` *(planned)*, `unknown`. |
| `toolchain.c2.beacon_interval_ms` | numeric | Median IAT between callbacks, in milliseconds. Cobalt Strike default is 60000ms. Very short intervals (<1000ms) suggest an interactive shell, not a beacon. |
| `toolchain.c2.beacon_jitter_cv` | numeric | Coefficient of variation (std/mean) of beacon IATs. Higher CV = more randomized jitter. Cobalt Strike default jitter is 0% (CV≈0); operators who understand detection set it to 20-50%. |
| `toolchain.c2.sleep_skew` | categorical | Jitter type applied to sleep intervals. `none` = fixed (detectable). `gaussian` = normally distributed. `uniform` = flat random range. `walk` = random-walk drift. `status: partial` |
| `toolchain.c2.c2_callback_endpoint` | free_string | URL or `host:port` of the C2 callback endpoint. |
| `toolchain.c2.attack_software_id` | free_string | MITRE ATT&CK Software ID (e.g. `S0154` for Cobalt Strike). |

#### `toolchain.protocol_abuse.*` — Protocol abuse (6)

Non-standard or offensive use of standard protocols.

| Primitive | Kind | Description |
|---|---|---|
| `toolchain.protocol_abuse.dns_exfil_tool` | categorical | DNS tunneling tool. `iodine` = base32-encoded data in subdomains with TYPE NULL queries. `dnscat2` = TYPE TXT queries with specific entropy patterns. `custom_high_entropy` = tunneling-consistent but no known-tool match. `status: planned` |
| `toolchain.protocol_abuse.smb_dialect` | categorical | SMB dialect negotiated by the client. SMB1 in 2024+ is a strong indicator of legacy tooling or deliberate EternalBlue-era downgrade. `status: planned` |
| `toolchain.protocol_abuse.kerberos_etype_offer` | hash | Hash of the Kerberos AS-REQ etype list. Clients offering RC4-HMAC (etype 23) alongside modern etypes are candidates for Kerberoasting (Rubeus, Impacket GetUserSPNs). `status: planned` |
| `toolchain.protocol_abuse.ldap_bind_pattern` | categorical | LDAP bind mechanism. `simple` = cleartext (immediately suspicious). `sasl_gssapi` = Kerberos-backed (normal). `ntlm`, `ntlmssp_v1`, `responder_like` = NTLM and Responder-class MITM. `status: partial` |
| `toolchain.protocol_abuse.responder_signature` | free_string | Responder detection. Convention: `'false'` or `'true:llmnr'` / `'true:nbtns'` / `'true:mdns'`. Responder poisons LLMNR/NBNS/mDNS broadcasts to capture Net-NTLMv2 hashes. `status: planned` |
| `toolchain.protocol_abuse.mitm6_signature` | bool | Whether mitm6 activity is detected. mitm6 abuses IPv6 router advertisement on IPv4-only networks to hijack DNS and enable credential relay attacks. `status: planned` |

#### `toolchain.payload.*` — Payload analysis (3)

| Primitive | Kind | Description |
|---|---|---|
| `toolchain.payload.payload_simhash` | hash | 64-bit SimHash of the payload binary/shellcode. Preserves near-duplicate relationships: payloads that are 90% similar have low Hamming distance (<4 bits on 64-bit), enabling family clustering despite minor obfuscation. 16-char hex. |
| `toolchain.payload.payload_entropy_class` | categorical | Shannon entropy of payload bytes. `packed` >7.2 bits/byte (UPX, encrypted shellcode, base64-compressed). `high` 6.5-7.2 (unencrypted compiled code). `low` <5.5 (scripts, plaintext). `status: planned` |
| `toolchain.payload.loader_family` | categorical | Shellcode/loader family from structural signatures. `donut` = Donut framework (TheWover), converts .NET/PE to PIC shellcode. `sgn` = Shikata-Ga-Nai XOR encoder (Metasploit), recognizable feedback register pattern. `pe2sh` = PE-to-shellcode. `nimcrypt` = Nim-based loader with AES-encrypted payload. `status: planned` |

---

## Schema

Machine-readable JSON Schema for the observation envelope:
[`json/observation.schema.json`](json/observation.schema.json)

Regenerate after model changes:
```bash
python scripts/generate_schema.py
```

## Tests

```bash
pytest tests/
```

## Attribution recipes

[`attribution-recipes.md`](attribution-recipes.md) — out-of-scope reference document
describing how an external attribution engine might consume `attacker.observation.shell.*`
topics to build operator profiles. Not part of the BEHAVE spec.

## License

Code and schemas: [GPL-3.0-or-later](../LICENSE)
Spec prose (this file, attribution-recipes.md): [CC-BY-SA-4.0](../LICENSE.docs)