anti 7f585027b3 docs: per-package READMEs with full primitive catalog and registry notes backfill
- core/README.md: envelope contract, field table, PII discipline, quickstart
- BEHAVE-SHELL/README.md: all 76 primitives documented across 9 categories;
  TLS/SSH/C2 fingerprint sections with [DRAFT — verify] markers on uncertain entries
- BEHAVE-TEXT/README.md: all 35 primitives across 6 categories; Rutify calibration
  notes inline; content.* layer marked EXPERIMENTAL throughout
- primitives.py (SHELL): backfilled notes for all previously undocumented primitives
- primitives.py (TEXT): backfilled notes for capitalization_habit, emoji_*, length,
  linebreak_style, sentence_complexity_class, question_formation_style,
  imperative_style, response_latency_class, message_burst_rate

License: CC-BY-SA-4.0 (prose) / GPL-3.0-or-later (code)
2026-05-10 08:33:02 -04:00

behave-shell

Shell-session behavioral observation registry. Defines what can be observed about an operator through their terminal interaction — typing mechanics, cognitive style, operational patterns, infrastructure fingerprints, and cultural timing signals.

BEHAVE-SHELL does not read command content. It measures how someone operates a terminal, not what they type. The observations are categorical labels, numeric aggregates, and cryptographic hashes — never raw keystrokes or command text.

Install

pip install -e ../core/ -e .
# development (pytest + ruff):
pip install -e ../core/ -e ".[dev]"

Quickstart

from behave_shell.spec import (
    Observation,
    Window,
    TOPIC_PREFIX,
    event_topic_for,
    to_event_payload,
)

obs = Observation(
    primitive="motor.keystroke_cadence",
    value="bursty",
    confidence=0.87,
    window=Window(start_ts=1714000000.0, end_ts=1714003600.0),
    source="behave/shell-sensor/timing.py",
)
# Serialize to an event bus topic + payload:
topic = event_topic_for("motor.keystroke_cadence")
# → "attacker.observation.shell.motor.keystroke_cadence"
payload = to_event_payload(obs)  # bus-ready dict

Public API (behave_shell.spec)

| Symbol | Description |
| --- | --- |
| Observation | Registry-aware subclass of behave_core.spec.Observation. Validates primitive against PRIMITIVE_REGISTRY and value against the primitive's type spec. |
| Window | Re-exported from behave_core — measurement time window. |
| ObservationValue | Re-exported union type for valid value shapes. |
| PRIMITIVE_REGISTRY | dict[str, ValueTypeSpec] — the full primitive catalog (76 entries). |
| ValueKind | Enum: CATEGORICAL, NUMERIC, HASH, ARRAY, FREE_STRING, BOOL. |
| ValueTypeSpec | Pydantic model holding a primitive's kind, allowed values, bounds, and notes. |
| is_known(primitive) | bool — whether a primitive path is registered. |
| get(primitive) | Returns the ValueTypeSpec for a primitive; raises KeyError if unknown. |
| TOPIC_PREFIX | "attacker.observation.shell" |
| event_topic_for(primitive) | Returns the full event bus topic string. |
| to_event_payload(obs) | Serializes an Observation to a bus-ready dict. |
| from_event_payload(payload) | Reconstructs an Observation from a bus payload. |
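
The registry lookup and topic naming in the table above can be illustrated without installing the package. The following is a standalone sketch, not the behave_shell implementation: `MiniSpec`, `MINI_REGISTRY`, and `validate` are hypothetical names, the enum string values are assumed, and the allowed cadence values are only the ones the motor table mentions. The topic format is the one shown in the quickstart.

```python
from dataclasses import dataclass
from enum import Enum


class ValueKind(Enum):
    # Member names come from the API table; the string values are assumed.
    CATEGORICAL = "categorical"
    NUMERIC = "numeric"


@dataclass(frozen=True)
class MiniSpec:
    """Hypothetical stand-in for ValueTypeSpec: a kind plus allowed values."""
    kind: ValueKind
    allowed: frozenset = frozenset()


# Hypothetical two-entry registry; the real PRIMITIVE_REGISTRY has 76 entries.
MINI_REGISTRY = {
    "motor.keystroke_cadence": MiniSpec(
        ValueKind.CATEGORICAL,
        frozenset({"steady", "bursty", "hunt_and_peck", "machine"}),
    ),
    "toolchain.c2.beacon_interval_ms": MiniSpec(ValueKind.NUMERIC),
}

TOPIC_PREFIX = "attacker.observation.shell"


def event_topic_for(primitive: str) -> str:
    """Join the prefix and the primitive path, as in the quickstart."""
    return f"{TOPIC_PREFIX}.{primitive}"


def validate(primitive: str, value) -> None:
    """Raise KeyError for unknown primitives, ValueError for bad values."""
    spec = MINI_REGISTRY[primitive]  # KeyError if the path is not registered
    if spec.kind is ValueKind.CATEGORICAL and value not in spec.allowed:
        raise ValueError(f"{value!r} is not allowed for {primitive}")
    if spec.kind is ValueKind.NUMERIC and (
        isinstance(value, bool) or not isinstance(value, (int, float))
    ):
        raise ValueError(f"{primitive} expects a number, got {value!r}")


validate("motor.keystroke_cadence", "bursty")  # passes silently
print(event_topic_for("motor.keystroke_cadence"))
```

Rejecting unknown primitives and out-of-range values at construction time keeps malformed observations off the event bus, which is presumably why the real Observation subclass validates against the registry.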

Primitives

76 primitives across the 8 top-level categories documented below. Each observation captures one measured value for one primitive over one time window. A behavioral profile is built by collecting many observations across many sessions.


motor.* — Physical typing mechanics (9 primitives)

Motor primitives capture the physical mechanics of keyboard interaction: rhythm, precision, and habitual movements that are hard to fake and stable across sessions even when operators change tools or objectives. These are the closest BEHAVE comes to biometrics — they exploit the fact that typing style is unconscious and consistent.

| Primitive | Kind | Description |
| --- | --- | --- |
| motor.keystroke_cadence | categorical | Overall rhythm of key input. steady = metronomic confident typist. bursty = fast bursts with thinking pauses. hunt_and_peck = search-first-type. machine = mechanically regular, suggesting scripted input. |
| motor.motor_stability | categorical | Consistency of key hold/flight times. steady = low variance. variable = high variance (cognitive load or unfamiliar keyboard). tremor = rhythmic instability distinct from load-induced variance. |
| motor.error_correction | categorical | Response to typing mistakes. immediate = backspace within ~1s (automatic monitoring). deferred = corrects after reading output. absent = proceeds despite errors (scripted behavior). route_around = uses history or rewrites rather than backspacing. |
| motor.command_chunking | categorical | Flow of command composition. fluent = typed in one pass from memory. fragmented = chunks with mid-command pauses (composing while typing). single_command = one complete command at a time, no inline pipelines. |
| motor.paste_burst_rate | categorical | Frequency of large clipboard-paste events relative to typed input. habitual = primarily works by pasting pre-prepared blocks. |
| motor.input_modality | categorical | Dominant input mode. typed = character-by-character. pasted = pre-prepared blocks. mixed = both substantially. |
| motor.shell_mastery.tab_completion | categorical | Tab completion usage. habitual = operator relies on it constantly (inferred from short pause then rapid continuation). Strong indicator of shell familiarity. |
| motor.shell_mastery.shortcut_usage | categorical | Use of shell shortcuts (Ctrl+R, Ctrl+A/E, Ctrl+L, Alt+.). heavy = deep shell muscle memory. |
| motor.shell_mastery.pipe_chaining_depth | categorical | Maximum pipeline depth (cmd \| cmd \| cmd). shallow = 0-1 pipes. deep = 4+. Reflects tool-composition fluency. |

cognitive.* — Decision-making and cognition (11 primitives)

Cognitive primitives capture how the operator thinks: their planning style, how they respond to uncertainty and failure, and whether their timing patterns are consistent with a human, a script, or an LLM agent. These are among the most attribution-relevant primitives — they're stable per-operator and hard to sustain as deliberate deception.

| Primitive | Kind | Description |
| --- | --- | --- |
| cognitive.cognitive_load | categorical | Inferred mental workload from timing, error rate, and inter-command variance. high = long pauses, frequent error-retry cycles, fragmented chunking. Composite feature for downstream attribution. |
| cognitive.exploration_style | categorical | Navigation style in unfamiliar environments. methodical = systematic enumeration (ls→cat→id→uname). chaotic = non-sequential jumps. targeted = straight to objective without exploring. |
| cognitive.planning_depth | categorical | Whether the operator works from a pre-formed plan. deep = visible logical sequence (recon→pivot→exfil). shallow = opportunistic. reactive = responds only to errors. |
| cognitive.tool_vocabulary | categorical | Breadth of tools used. narrow = fixed small toolset. broad = reaches for the best tool per subtask. |
| cognitive.inter_command_latency_class | categorical | Time between commands. instant (<200ms), typing_speed (200ms-2s), deliberate (2s), llm_lightweight (2-8s, small model agent), llm_heavyweight (8-30s, reasoning-class agent), long (>30s, human-supervised LLM). |
| cognitive.inter_command_consistency | categorical | Dispersion of inter-command pauses. metronomic = LLM-pure. variable = human. bimodal = LLM-assisted human (LLM-paced bursts + human thinking gaps). |
| cognitive.command_branch_diversity | categorical | Content-based script vs. adaptive discriminator. linear_playbook = low first-token repetition (each step uses a different tool). adaptive_branching = high repetition of the same tool with varying arguments (operator following a thread). |
| cognitive.feedback_loop_engagement | categorical | Whether pace correlates with output volume. closed_loop = pause grows with preceding output (reading before continuing). fire_and_forget = paces independently of output (scripted or unread). Cuts across the LLM/human axis. |
| cognitive.error_resilience.retry_tactic | categorical | Response to command failure. rerun = identical retry. modify = adjusts before retrying. switch = tries a different tool. abort = gives up on objective. |
| cognitive.error_resilience.frustration_typing | categorical | Speed/error spike immediately after failure. high = sharp burst post-failure. Strong human indicator; absent in scripts. |
| cognitive.error_resilience.fallback_to_man | categorical | Whether the operator invokes man/--help when stuck. present signals unfamiliarity with the specific tool. |
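
The latency bands listed for cognitive.inter_command_latency_class can be sketched as a simple binning function. This is an illustration only: the function names are hypothetical, the half-open boundaries are an assumption, and the `deliberate` label is left out because the table does not give a range that separates it from `llm_lightweight`.

```python
from collections import Counter


def latency_class(gap_seconds: float) -> str:
    """Map one inter-command gap (in seconds) to a latency label."""
    if gap_seconds < 0:
        raise ValueError("gap must be non-negative")
    if gap_seconds < 0.2:
        return "instant"
    if gap_seconds < 2.0:
        return "typing_speed"
    if gap_seconds < 8.0:
        return "llm_lightweight"
    if gap_seconds <= 30.0:
        return "llm_heavyweight"
    return "long"


def dominant_latency_class(command_times: list[float]) -> str:
    """Return the most frequent label over consecutive command timestamps."""
    gaps = [b - a for a, b in zip(command_times, command_times[1:])]
    if not gaps:
        raise ValueError("need at least two command timestamps")
    return Counter(latency_class(g) for g in gaps).most_common(1)[0][0]


# Gaps of 3s, 4s, 5s and 0.1s: three of the four fall in the 2-8s band.
print(dominant_latency_class([0.0, 3.0, 7.0, 12.0, 12.1]))
```

Only timestamps are needed here, so a classifier along these lines stays within the content-blind constraint.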

temporal.* — Session timing and lifecycle (7 primitives)

Temporal primitives capture when and for how long an operator works. These signals are stable per-campaign and hard to fake consistently across many sessions, because they reflect biological and social rhythms (sleep, work hours, habits) rather than conscious technical choices.

| Primitive | Kind | Description |
| --- | --- | --- |
| temporal.session_timing | categorical | Hour-of-day distribution. diurnal = business-hours peaks. nocturnal = late-night peaks. irregular = no discernible daily pattern. Requires a known timezone from cultural.* to interpret. |
| temporal.session_duration | categorical | Typical session length. short <15min, medium 15-90min, long 90min-4hr, marathon >4hr. Stable per-operator characteristic. |
| temporal.escalation_pattern | categorical | Activity intensity across a session. sustained = constant rate. bursty = concentrated activity then silence (waiting for long-running processes). erratic = unpredictable spikes. |
| temporal.persistence | categorical | Cross-session return behavior. hit_and_run = few sessions then disappears. return_visitor = periodic return. resident = near-continuous presence. |
| temporal.lifecycle_markers.landing_ritual | categorical | Whether a recognizable start-of-session sequence is detected (whoami → id → uname → hostname → ip addr). present = fingerprinted checklist habit. |
| temporal.lifecycle_markers.exit_behavior | categorical | Session end pattern. graceful = explicit logout. abrupt = dropped connection. cleanup = deletes logs/tools before exiting — strongest opsec signal in this category. |
| temporal.lifecycle_markers.idle_periodicity | categorical | Whether in-session idle gaps (>30s) are statistically periodic or random. periodic = heartbeat-like — may indicate an LLM polling loop, an automated keepalive, or a human following a timed workflow. |
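
As a small worked example, the temporal.session_duration bands can be applied to a window's start and end timestamps. The function name is hypothetical and the treatment of exact boundary values (15, 90, and 240 minutes) is an assumption; the table only gives the ranges.

```python
def session_duration_class(start_ts: float, end_ts: float) -> str:
    """Classify a session by length, using the bands from the table above."""
    minutes = (end_ts - start_ts) / 60.0
    if minutes < 0:
        raise ValueError("end_ts must not precede start_ts")
    if minutes < 15:
        return "short"
    if minutes < 90:
        return "medium"
    if minutes <= 240:
        return "long"
    return "marathon"


# The quickstart window spans 3600 seconds (60 minutes).
print(session_duration_class(1714000000.0, 1714003600.0))  # medium
```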

operational.* — Mission and opsec (4 primitives)

Operational primitives are coarser inferences from command patterns — what the operator is trying to accomplish and how carefully they're hiding their footprint.

| Primitive | Kind | Description |
| --- | --- | --- |
| operational.opsec_discipline | categorical | Forensic footprint management. careful = history disabled, tools removed, proxy/VPN confirmed. careless = no precautions. learning = inconsistent and improving mid-campaign. |
| operational.cleanup_behavior | categorical | Artifact handling at session end. thorough = removes tools, temp files, bash history. partial = removes some but misses others. none = leaves everything. |
| operational.objective | categorical | Inferred mission from command patterns: recon, exfil, persistence, lateral (pivoting), destructive. |
| operational.multi_actor_indicators | categorical | Signs of multiple operators. handoff_detected = detectable style break mid-session. team_coordinated = multiple signatures interleaved or simultaneous. |

environmental.* — Physical and software context (5 primitives)

Environmental primitives describe where the operator works from. They are stable per-campaign and often reveal national origin or infrastructure choices.

| Primitive | Kind | Description |
| --- | --- | --- |
| environmental.keyboard_layout | categorical | Inferred layout from characteristic key-sequence errors. An AZERTY-trained typist on QWERTY makes specific substitutions (q↔a, z↔w, m→,) that are statistically distinguishable from random errors. Reliable when error volume is sufficient (>50 errors). |
| environmental.locale | free_string | BCP-47 tag (e.g. en-US, pt-BR). Inferred from layout, cultural timing, and command-line encoding artifacts. Free string — locale is not a closed enum. |
| environmental.numpad_usage | categorical | Numeric keypad use inferred from keycode patterns. detected signals a desktop keyboard. |
| environmental.terminal_multiplexer | categorical | Presence of tmux/screen, inferred from escape sequences (Ctrl+B / Ctrl+A prefixes) and window-switching patterns. |
| environmental.shell_type | categorical | Shell environment inferred from syntax (array syntax, quoting style, builtin names). powershell/cmd.exe immediately flags a Windows-native operator. |

cultural.* — Social and biological rhythms (5 primitives)

Cultural primitives exploit the fact that human work patterns are shaped by local time, religion, and social convention. These signals are hard to sustain as deliberate deception across a long campaign because they reflect unconscious biological rhythms.

| Primitive | Kind | Description |
| --- | --- | --- |
| cultural.meal_break_gaps | categorical | Whether activity gaps align with regional meal times (morning, midday, evening, late_night). Requires a known timezone to interpret. |
| cultural.periodic_micro_pauses | categorical | Short rhythmic pauses of 5-15 min recurring at consistent intervals. May correspond to Salah prayer times (5 daily, spaced ~2-3hr), smoke breaks, or other cultural micro-rituals. regular_intervals_detected rejects the null hypothesis of random pauses at p<0.05. |
| cultural.dst_behavior | categorical | Whether the operator's active hours shift by 1 hour at DST transitions. shifts_with_dst = follows local civil time. anchored_to_utc = schedule is clock-fixed (automated infrastructure or deliberate counter-analysis). |
| cultural.weekend_cadence | categorical | Which two-day block is low-activity. fri_sat = Middle Eastern/Israeli pattern. sat_sun = Western/East Asian. Reliable national-origin signal across multiple weeks. |
| cultural.holiday_gaps | categorical | Whether multi-day inactivity gaps align with public holiday calendars. Requires a multi-session corpus spanning calendar events. |
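
One way to derive cultural.weekend_cadence is to count sessions per weekday and find the quietest pair of adjacent days. The sketch below assumes session start times are epoch seconds and that the operator's timezone is already known; the function name and the `other` fallback label are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def weekend_cadence(session_starts, tz=timezone.utc) -> str:
    """Return the label of the quietest two-day block across all sessions."""
    if not session_starts:
        raise ValueError("need at least one session")
    # weekday(): Monday = 0 ... Sunday = 6
    counts = Counter(
        datetime.fromtimestamp(ts, tz).weekday() for ts in session_starts
    )
    # First day of the adjacent pair with the fewest sessions (wraps Sun→Mon).
    quiet_start = min(range(7), key=lambda d: counts[d] + counts[(d + 1) % 7])
    return {4: "fri_sat", 5: "sat_sun"}.get(quiet_start, "other")
```

A real sensor would want several weeks of data before emitting a label, since a single quiet Friday is not a pattern.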

emotional_valence.* — Affective state (4 primitives)

Emotional valence primitives infer affective state from typing dynamics — pace, error rate, and key-input aggression. BEHAVE-SHELL is content-blind; these observations are derived entirely from timing and motor signals, not from what was typed.

| Primitive | Kind | Description |
| --- | --- | --- |
| emotional_valence.valence | categorical | Overall affective tone: positive (fluent, low-error), neutral, negative (error-heavy, erratic). Coarse aggregate; see arousal and stress_response for finer breakdown. |
| emotional_valence.arousal | categorical | Activation level. low_calm = slow deliberate pace. high_agitated = fast error-prone bursts. Orthogonal to valence — a calm script and a calm professional are both low_calm. |
| emotional_valence.stress_response | categorical | Whether high arousal is positive (eustress_positive = speed-up with low error rate, operator in the zone) or negative (distress_negative = speed-up with rising errors, panic). |
| emotional_valence.frustration_venting | categorical | Transient outburst signal: sudden speed spike or rapid backspace/delete bursts after command failures. Absent in scripted runs; strong human indicator. |

toolchain.* — Infrastructure fingerprints (31 primitives)

Toolchain primitives fingerprint the software stack the operator uses, from TLS handshake parameters to SSH key exchange preferences to C2 beaconing behavior. Even fully encrypted traffic leaves structural fingerprints that identify specific tools, libraries, and operator configurations.

toolchain.tls.* — TLS fingerprints (6)

TLS fingerprints identify the client and server stacks by their handshake parameters. Each tool, library, and OS produces recognizable fingerprints even when the payload is encrypted.

| Primitive | Kind | Description |
| --- | --- | --- |
| toolchain.tls.ja3_client | hash | MD5 hash of TLS ClientHello parameters (SSLVersion, Ciphers, Extensions, EllipticCurves, EllipticCurvePointFormats). Salesforce, 2017. Each tool stack (curl, Metasploit, Cobalt Strike) produces a distinct hash. Searchable against databases like ja3er.com. [DRAFT — verify] |
| toolchain.tls.ja3s_server | hash | MD5 hash of TLS ServerHello (SSLVersion, Cipher, Extensions). Fingerprints the server stack — useful for identifying C2 servers by TLS response even when IPs rotate. [DRAFT — verify] |
| toolchain.tls.ja4_client | hash | FoxIO JA4 (2023): human-readable format (e.g. t13d1516h2_8daaf6152771_e5627efa2ab1) robust to TLS extension order randomization. Encodes TLS version, cipher count, extension count, ALPN, cipher hash, extension hash. Preferred over JA3 for new sensors. [DRAFT — verify] |
| toolchain.tls.ja4s_server | hash | JA4 server-side: fingerprints ServerHello using chosen cipher, extension list, and ALPN. More stable than JA3S when cipher ordering is randomized server-side. [DRAFT — verify] |
| toolchain.tls.jarm_server | hash | 62-char JARM hash (Salesforce, 2020). Actively probes the server with 10 crafted ClientHellos and hashes the responses. Reliably detects Cobalt Strike, Metasploit, and major C2 frameworks even with custom certificates. |
| toolchain.tls.tls_cert_simhash | hash | SHA-256 hex of the leaf certificate DER bytes. Tracks a specific certificate across infrastructure — useful for correlating C2 that reuses self-signed certs. |
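
To make the JA3 entry concrete, here is a minimal sketch of how a JA3 client hash is assembled from already-parsed ClientHello fields, following the public JA3 description: five comma-separated fields, each holding dash-separated decimal values, hashed with MD5. The function names are hypothetical, the sample values are illustrative rather than captured traffic, and GREASE filtering (which JA3 requires) is omitted for brevity.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 input string from parsed ClientHello values."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    # Empty lists yield empty fields, so the four commas are always present.
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """Return the 32-character MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Illustrative values: TLS 1.2 record version (771) and a few numeric IDs.
print(ja3_string(771, [4865, 4866], [0, 10, 11], [29, 23], [0]))
# → 771,4865-4866,0-10-11,29-23,0
print(ja3_hash(771, [4865, 4866], [0, 10, 11], [29, 23], [0]))
```

Because the extension list is hashed in wire order, clients that shuffle extensions produce a different JA3 on every connection, which is the weakness the JA4 entry refers to.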

toolchain.transport.* — Network stack fingerprints (3)

| Primitive | Kind | Description |
| --- | --- | --- |
| toolchain.transport.tcp_stack | free_string | p0f OS label (e.g. Linux 5.x). Inferred from TCP header quirks (TTL, window size, options order, DF bit). Identifies the connecting OS before any application protocol is visible. |
| toolchain.transport.h2_akamai_fingerprint | free_string | HTTP/2 SETTINGS + priority + pseudo-header order hash. Different HTTP/2 libraries emit distinct SETTINGS combinations (curl vs. Python requests vs. Go net/http). status: planned |
| toolchain.transport.quic_client | free_string | QUIC initial packet fingerprint from transport parameters and connection ID length. status: planned |

toolchain.ssh.* — SSH fingerprints (4)

| Primitive | Kind | Description |
| --- | --- | --- |
| toolchain.ssh.hassh_client | hash | MD5 hash of SSH client KEX parameters (kex_algorithms, encryption_algorithms, mac_algorithms, compression_algorithms). Salesforce, 2018. Each SSH library (OpenSSH, PuTTY, Paramiko, Impacket) produces a distinct HASSH. |
| toolchain.ssh.hassh_server | hash | MD5 hash of SSH server KEX parameters. Fingerprints the SSH daemon — detects honeypots, implants, or non-standard servers. status: partial |
| toolchain.ssh.ssh_client_banner | free_string | RFC 4253 protocol version string (e.g. SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6). Often unmodified even in offensive tooling. |
| toolchain.ssh.kex_algorithm_order | array[free_string] | Ordered KEX algorithm list from the client's SSH_MSG_KEXINIT message (SSH has no ClientHello; KEXINIT is its counterpart). Different clients (OpenSSH, PuTTY, Impacket smbexec) advertise distinct orderings — secondary fingerprint beyond HASSH. [DRAFT — verify] |
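
The HASSH construction follows the same idea as JA3: join the four algorithm lists from the client's key-exchange init message and take the MD5. A minimal sketch, assuming the lists have already been parsed from the packet; the function name and the sample algorithm lists are illustrative, not taken from a specific client.

```python
import hashlib


def hassh(kex, encryption, mac, compression) -> str:
    """MD5 over 'kex;enc;mac;cmp', with each list comma-joined in wire order."""
    hassh_input = ";".join(
        ",".join(algorithms) for algorithms in (kex, encryption, mac, compression)
    )
    return hashlib.md5(hassh_input.encode("ascii")).hexdigest()


# Illustrative client-to-server algorithm lists.
fingerprint = hassh(
    kex=["curve25519-sha256", "diffie-hellman-group14-sha256"],
    encryption=["chacha20-poly1305@openssh.com", "aes256-ctr"],
    mac=["hmac-sha2-256"],
    compression=["none"],
)
print(fingerprint)
```

Order is preserved on purpose: two clients offering the same algorithms in a different preference order get different fingerprints, which is also what makes kex_algorithm_order useful as a secondary signal.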

toolchain.http.* — HTTP fingerprints (3)

| Primitive | Kind | Description |
| --- | --- | --- |
| toolchain.http.user_agent_tool_class | categorical | Tool class from User-Agent and HTTP behavior. Known offensive tools use default or absent User-Agents. Values: nmap_nse, sqlmap, nuclei, masscan, curl, metasploit, ffuf, gobuster, feroxbuster, nikto, wpscan, evilwinrm, impacket, unknown. |
| toolchain.http.header_order_fingerprint | free_string | Hash of HTTP request header name order. Different libraries emit distinct sequences. status: planned |
| toolchain.http.body_oddities | array[free_string] | Anomalous body characteristics (e.g. multipart_boundary_static, json_key_order_fixed). status: planned |

toolchain.c2.* — C2 beaconing (6)

C2 primitives characterize implant beaconing behavior. Even fully encrypted C2 traffic leaves timing and structural fingerprints.

| Primitive | Kind | Description |
| --- | --- | --- |
| toolchain.c2.beacon_family | categorical | C2 framework identified from traffic fingerprints: cobalt_strike, sliver, havoc, mythic, merlin (planned), brc4 (planned), nighthawk (planned), unknown. |
| toolchain.c2.beacon_interval_ms | numeric | Median IAT between callbacks, in milliseconds. Cobalt Strike default is 60000ms. Very short intervals (<1000ms) suggest an interactive shell, not a beacon. |
| toolchain.c2.beacon_jitter_cv | numeric | Coefficient of variation (std/mean) of beacon IATs. Higher CV = more randomized jitter. Cobalt Strike default jitter is 0% (CV≈0); operators who understand detection set it to 20-50%. |
| toolchain.c2.sleep_skew | categorical | Jitter type applied to sleep intervals. none = fixed (detectable). gaussian = normally distributed. uniform = flat random range. walk = random-walk drift. status: partial |
| toolchain.c2.c2_callback_endpoint | free_string | URL or host:port of the C2 callback endpoint. |
| toolchain.c2.attack_software_id | free_string | MITRE ATT&CK Software ID (e.g. S0154 for Cobalt Strike). |
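
The two numeric beacon primitives can be computed directly from callback timestamps. The sketch below derives the median inter-arrival time (IAT) and the coefficient of variation as defined in the table (std/mean). The function name is hypothetical, and using the population standard deviation rather than the sample one is an assumption.

```python
import statistics


def beacon_stats(callback_times_ms):
    """Return (median IAT in ms, coefficient of variation) for callbacks."""
    iats = [b - a for a, b in zip(callback_times_ms, callback_times_ms[1:])]
    if len(iats) < 2:
        raise ValueError("need at least three callbacks")
    interval_ms = statistics.median(iats)
    mean = statistics.fmean(iats)
    cv = statistics.pstdev(iats) / mean if mean else 0.0
    return interval_ms, cv


# A fixed 60-second sleep with no jitter: median 60000 ms, CV 0.
print(beacon_stats([0, 60_000, 120_000, 180_000]))
# The same mean interval with jitter gives a clearly non-zero CV.
print(beacon_stats([0, 50_000, 120_000, 180_000]))
```

Using the median for the interval keeps a single missed or delayed callback from skewing the estimate, while the CV deliberately stays sensitive to spread.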

toolchain.protocol_abuse.* — Protocol abuse (6)

Non-standard or offensive use of standard protocols.

| Primitive | Kind | Description |
| --- | --- | --- |
| toolchain.protocol_abuse.dns_exfil_tool | categorical | DNS tunneling tool. iodine = base32-encoded data in subdomains with TYPE NULL queries. dnscat2 = TYPE TXT queries with specific entropy patterns. custom_high_entropy = tunneling-consistent but no known-tool match. status: planned |
| toolchain.protocol_abuse.smb_dialect | categorical | SMB dialect negotiated by the client. SMB1 in 2024+ is a strong indicator of legacy tooling or deliberate EternalBlue-era downgrade. status: planned |
| toolchain.protocol_abuse.kerberos_etype_offer | hash | Hash of the Kerberos AS-REQ etype list. Clients offering RC4-HMAC (etype 23) alongside modern etypes are candidates for Kerberoasting (Rubeus, Impacket GetUserSPNs). status: planned [DRAFT — verify] |
| toolchain.protocol_abuse.ldap_bind_pattern | categorical | LDAP bind mechanism. simple = cleartext (immediately suspicious). sasl_gssapi = Kerberos-backed (normal). ntlm, ntlmssp_v1, responder_like = NTLM and Responder-class MITM. status: partial |
| toolchain.protocol_abuse.responder_signature | free_string | Responder detection. Convention: 'false' or 'true:llmnr' / 'true:nbtns' / 'true:mdns'. Responder poisons LLMNR/NBNS/mDNS broadcasts to capture Net-NTLMv2 hashes. status: planned |
| toolchain.protocol_abuse.mitm6_signature | bool | Whether mitm6 activity is detected. mitm6 answers DHCPv6 requests on networks that otherwise run only IPv4, setting the attacker as the DNS server to hijack name resolution and enable credential relay attacks. status: planned |
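
Because responder_signature is a free string with a documented convention rather than a closed enum, a consumer may want to parse it defensively. A minimal sketch, assuming only the four values listed in the table are valid; the function name is hypothetical.

```python
KNOWN_PROTOCOLS = {"llmnr", "nbtns", "mdns"}


def parse_responder_signature(value: str):
    """Return (detected, protocol) for 'false' or 'true:<protocol>'."""
    if value == "false":
        return False, None
    flag, sep, protocol = value.partition(":")
    if flag == "true" and sep and protocol in KNOWN_PROTOCOLS:
        return True, protocol
    raise ValueError(f"unrecognized responder_signature: {value!r}")


print(parse_responder_signature("true:llmnr"))  # (True, 'llmnr')
print(parse_responder_signature("false"))       # (False, None)
```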

toolchain.payload.* — Payload analysis (3)

| Primitive | Kind | Description |
| --- | --- | --- |
| toolchain.payload.payload_simhash | hash | 64-bit SimHash of the payload binary/shellcode. Preserves near-duplicate relationships: payloads that are 90% similar have low Hamming distance (<4 bits on 64-bit), enabling family clustering despite minor obfuscation. 16-char hex. |
| toolchain.payload.payload_entropy_class | categorical | Shannon entropy of payload bytes. packed >7.2 bits/byte (UPX, encrypted shellcode, base64-compressed). high 6.5-7.2 (unencrypted compiled code). low <5.5 (scripts, plaintext). status: planned |
| toolchain.payload.loader_family | categorical | Shellcode/loader family from structural signatures. donut = Donut framework (TheWover), converts .NET/PE to PIC shellcode. sgn = Shikata-Ga-Nai XOR encoder (Metasploit), recognizable feedback register pattern. pe2sh = PE-to-shellcode. nimcrypt = Nim-based loader with AES-encrypted payload. status: planned |
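
Two of the payload primitives reduce to short calculations: Shannon entropy in bits per byte, and Hamming distance between two 16-character SimHash hex strings. The sketch below uses the thresholds from the table. The function names are hypothetical, and because the table does not name the 5.5-6.5 band, the `unclassified` label is a placeholder assumption.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(data).values()
    )


def entropy_class(data: bytes) -> str:
    """Apply the payload_entropy_class thresholds from the table."""
    bits = shannon_entropy(data)
    if bits > 7.2:
        return "packed"
    if bits >= 6.5:
        return "high"
    if bits < 5.5:
        return "low"
    return "unclassified"  # placeholder: the table leaves 5.5-6.5 unnamed


def simhash_distance(a_hex: str, b_hex: str) -> int:
    """Hamming distance between two hex-encoded SimHash values."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


print(entropy_class(bytes(range(256))))  # every byte value once: packed
print(simhash_distance("000000000000000f", "0000000000000000"))  # 4
```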

Schema

Machine-readable JSON Schema for the observation envelope: json/observation.schema.json

Regenerate after model changes:

python scripts/generate_schema.py

Tests

pytest tests/

Attribution recipes

attribution-recipes.md — out-of-scope reference document describing how an external attribution engine might consume attacker.observation.shell.* topics to build operator profiles. Not part of the BEHAVE spec.

License

Code and schemas: GPL-3.0-or-later
Spec prose (this file, attribution-recipes.md): CC-BY-SA-4.0