docs: expand Fingerprinting page — all 6 layers, BEHAVE-SHELL primitives, SMTP signals, TTP detection

2026-05-10 04:19:09 -04:00
parent d45fb08b6d
commit e7d3353bfe
1 changed files with 306 additions and 88 deletions
--- a/Fingerprinting.md
+++ b/Fingerprinting.md
@@ -1,37 +1,48 @@
 # Fingerprinting

-DECNET builds a multi-layer fingerprint of every attacker from three
-independent sources: **passive wire capture**, **active probing**, and
-**inline HTTP inspection**.  Each layer contributes distinct evidence;
-together they let you tell a curl script from a Metasploit operator from a
+DECNET builds a multi-layer fingerprint of every attacker from four
+independent sources: **passive wire capture**, **active probing**,
+**inline HTTP/protocol inspection**, and **behavioural profiling** of
+interactive sessions.  Each layer contributes distinct evidence; together
+they let you tell a curl script from a Metasploit operator from a
 nation-state implant even when the source IP changes.

-All fingerprint data is stored as `bounty` rows in the DECNET database and
-surfaces in the **Attacker detail** page under the *Fingerprints* tab.
+All fingerprint data is stored as `bounty` rows (type `fingerprint`) or
+`ObservationRow` entries in the DECNET database and surfaces in the
+**Attacker detail** page.

 ---

-## Layer 1 — Passive sniffer (network layer)
+## Layer 1 — Passive sniffer (network / TLS layer)

 The sniffer runs fleet-wide on the host interface and reads raw packets
 without touching any decky service.  It fires on the first packet of each
 connection, so it captures the attacker's stack signature before any
 application-level exchange.

-| Fingerprint | What it captures | Algorithm |
+### TLS ClientHello fingerprints
+
+| Fingerprint | Description | Key fields |
 |---|---|---|
-| **JA3 / JA3S** | TLS ClientHello / ServerHello cipher suite and extension order | MD5 of normalised fields per Salesforce spec |
-| **JA4 / JA4S / JA4L** | TLS 1.3-aware version; JA4L adds latency timing | FoxIO JA4 spec |
-| **TCP SYN OS** | MSS, window scale, TCP option order from the SYN | Mini-p0f classifier (`decnet/sniffer/p0f.py`) |
-| **JA4-QUIC** | QUIC Initial ClientHello — QUIC-specific extensions and transport params | FoxIO JA4-QUIC spec |
-| **Flow timing** | Round-trip latency and inter-packet timing | Raw timestamps from the sniffer |
+| **JA3** | MD5 of normalised ClientHello fields (cipher suites, extensions, elliptic curves) | `ja3`, `tls_version`, `sni`, `raw_ciphers`, `raw_extensions` |
+| **JA3S** | MD5 of the ServerHello response | `ja3s` |
+| **JA4** | TLS 1.3-aware successor to JA3 (FoxIO spec) | `ja4`, `alpn`, `dst_port` |
+| **JA4S** | ServerHello counterpart to JA4 | `ja4s` |
+| **JA4L** | JA4 + latency: client TTL and measured RTT | `ja4l`, `rtt_ms`, `client_ttl` |
+| **TLS certificate** | Server cert metadata — useful when the attacker runs their own TLS service | `subject_cn`, `issuer`, `self_signed`, `not_before`, `not_after`, `sans`, `cert_sha256`, `sni`, `target_ip`, `target_port` |
+| **TLS resumption** | Session resumption mechanisms advertised (tickets, session IDs) | `mechanisms` |
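
As a rough illustration of the JA3 row above, the sketch below joins already-parsed ClientHello fields into the JA3 input string and hashes it with MD5. It assumes the numeric values have been extracted elsewhere; `ja3_string` and `ja3_hash` are hypothetical helper names, not DECNET functions.

```python
import hashlib

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 ignores them.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from already-parsed ClientHello fields."""
    def join(values):
        # Decimal values in wire order, '-' separated, GREASE dropped.
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string (32 hex characters)."""
    material = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(material.encode("ascii")).hexdigest()


# Illustrative values only, not taken from a real capture.
print(ja3_string(771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0]))
```

Because the hash depends on field order as sent, two clients offering the same ciphers in a different order produce different JA3 values.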

-Sniffer events land as `attacker.observed` or `attacker.fingerprinted` bus
-events consumed by the correlator and ingester.
+### Network stack fingerprints

-> **Limitation:** the sniffer only sees the TLS handshake — it cannot read
-> HTTP headers or QUIC stream frames inside an encrypted session.  Layers 2
-> and 3 fill that gap.
+| Fingerprint | Description | Key fields |
+|---|---|---|
+| **TCP SYN OS** | Passive OS classifier from SYN options (mini-p0f) | `os_guess`, `mss`, `window_scale`, `sack_ok`, `timestamp`, `options_order` |
+| **JA4-QUIC** | QUIC Initial ClientHello — QUIC-specific extensions and transport params | `ja4_quic`, `sni`, `alpn`, `raw_ciphers` |
+| **Flow timing** | Inter-packet timing and RTT from the first few packets | stored as `tcp_flow_timing` event |
+
+> The sniffer only sees the TLS handshake — it cannot read HTTP headers or
+> QUIC stream frames inside an encrypted session.  Layers 3 and 4 fill
+> that gap.

 ---

@@ -39,127 +50,334 @@ events consumed by the correlator and ingester.

 After a new attacker is first observed, the prober worker reaches back
 out to the attacker's IP on a set of default ports to collect
-application-level fingerprints.
+application-level fingerprints.  Probes are stealthy — no DECNET banner,
+ordinary client behaviour.  See [Security-and-Stealth](Security-and-Stealth).

-| Fingerprint | Protocol | Ports probed |
-|---|---|---|
-| **JARM** | TLS (any HTTPS-ish service) | 443, 8443, 8080, 4443, 50050, 2222, 993, 995, 8888, 9001 |
-| **HASSH** | SSH server | 22, 2222, 22222, 2022 |
-| **TCP fingerprint** | TCP SYN response analysis | 22, 80, 443, 8080, 8443, 445, 3389 |
+| Fingerprint | Protocol | Ports probed | Key fields |
+|---|---|---|---|
+| **JARM** | TLS server fingerprint — 10 hand-crafted ClientHellos, 62-char hash of the responses | 443, 8443, 8080, 4443, 50050, 2222, 993, 995, 8888, 9001 | `hash`, `target_ip`, `target_port` |
+| **HASSH** | SSH server fingerprint — MD5 of `kex;encryption;mac;compression` from the server `KEXINIT` | 22, 2222, 22222, 2022 | `hash`, `ssh_banner`, `kex_algorithms`, `encryption_s2c`, `mac_s2c`, `compression_s2c`, `target_ip`, `target_port` |
+| **TCP fingerprint** | TCP/IP stack OS probe — SYN response TTL, window, options | 22, 80, 443, 8080, 8443, 445, 3389 | `hash`, `raw`, `ttl`, `window_size`, `df_bit`, `mss`, `window_scale`, `options_order` |
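
The HASSH row can be sketched as follows, assuming the four server-to-client name-lists have already been parsed out of the server `KEXINIT`. The function name and the returned tuple are illustrative assumptions, not the prober's actual interface.

```python
import hashlib


def hassh_server(kex, enc_s2c, mac_s2c, comp_s2c):
    """Return (md5_hex, material) for a server KEXINIT.

    Each argument is a list of algorithm names in the order the server
    advertised them. Lists are comma-joined, then joined with ';'.
    """
    material = ";".join(",".join(names)
                        for names in (kex, enc_s2c, mac_s2c, comp_s2c))
    return hashlib.md5(material.encode("ascii")).hexdigest(), material


# Illustrative algorithm lists, not taken from a real server.
digest, material = hassh_server(
    ["curve25519-sha256", "diffie-hellman-group14-sha256"],
    ["chacha20-poly1305@openssh.com", "aes256-ctr"],
    ["hmac-sha2-256"],
    ["none", "zlib@openssh.com"],
)
print(material)
```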

-Active probes are stealthy: they look like ordinary clients, carry no
-DECNET-specific banner, and use the same port-rotation patterns an
-informed scanner would use.  See [Security-and-Stealth](Security-and-Stealth).
-
-When a fingerprint changes between probes, a `attacker.fingerprint_rotated`
-bus event fires — that is a strong signal of infrastructure churn (VPS
-swap, cert rotation, banner rewrite).
+When any fingerprint changes between probes, an `attacker.fingerprint_rotated`
+bus event fires — a strong signal of infrastructure churn (VPS swap, cert
+rotation, banner rewrite).

 ---

-## Layer 3 — Inline HTTP fingerprinting (Caddy fp module)
+## Layer 3 — Inline protocol inspection (decky services)
+
+### HTTP header fingerprinting (Caddy `decnet_fp` module)

 The `http` and `https` decky templates ship with a custom Caddy module
-(`decnet_fp`) that intercepts connections at the byte level, before
-Caddy's HTTP parser sees them.  This gives wire-accurate fingerprints
-that cannot be faked by HTTP-level header manipulation.
+that intercepts connections at the **byte level**, before Caddy's HTTP
+parser.  This gives wire-accurate fingerprints that cannot be faked by
+HTTP-level middleware.

-### JA4H (HTTP request header order)
+#### JA4H (HTTP request header order)

-The `decnet_fp` listener wrapper taps the raw TLS stream and buffers the
-first request headers of each connection before replaying them to Caddy's
-parser.
+The listener wrapper taps the raw TLS stream:

- **h1:** headers are split by `\r\n` in arrival order.
- **h2:** a per-connection HPACK decoder maintains the dynamic table and
-  emits headers in HPACK decode order — pseudo-headers
-  (`:method`, `:path`, `:scheme`, `:authority`) appear first, then regular
-  headers in the order the client encoded them.
+- **HTTP/1.1:** headers split by `\r\n` in arrival order.
+- **HTTP/2:** a per-connection HPACK decoder maintains the dynamic table
+  and emits headers in HPACK decode order — pseudo-headers (`:method`,
+  `:path`, `:scheme`, `:authority`) appear first, then regular headers in
+  the order the client encoded them.

-The ordered list feeds `_compute_ja4h` in `syslog_bridge.py`, which
-produces a JA4H hash per the FoxIO spec.
+The ordered list feeds `_compute_ja4h` in `syslog_bridge.py`, producing a
+JA4H hash per the FoxIO spec.  Stored with: `ja4h`, `protocol`, `method`,
+`path`, `remote_port`.

 > Map-iteration order in Go is randomised; DECNET captures order at the
-> *byte level*, not from `http.Header`, so the JA4H is reproducible and
-> meaningful.
+> *byte level*, so the JA4H is reproducible and meaningful.

-### H2 SETTINGS
+#### Header order and header quirks

-During the h2 connection preface, the client sends a `SETTINGS` frame
-listing its implementation parameters.  The fp module parses the raw
-6-byte `(id, value)` tuples in wire order and records:
+Beyond the JA4H hash, the raw ordered list of header names is stored
+(`headers_ordered`).  This lets you cluster:

- `settings` — map of setting name → value
-  (e.g. `HEADER_TABLE_SIZE`, `MAX_CONCURRENT_STREAMS`, `INITIAL_WINDOW_SIZE`)
- `frame_order` — setting IDs in the exact order the client sent them
+- **Presence/absence of headers** — curl sends no `Accept-Encoding` on
+  a plain invocation (without `--compressed`); browsers send it by default.
+- **Header ordering** — different HTTP clients and frameworks have
+  characteristic orderings even when they send the same headers.
+- **Header casing** — some tools send `content-type` (lowercase), others
+  send `Content-Type`; stored verbatim before normalisation.
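
A minimal sketch of the HTTP/1.1 case, assuming the raw request bytes are available as buffered off the wire. It keeps header names in arrival order with their original casing; `ordered_header_names` is a hypothetical helper, and HTTP/2 (HPACK) is not covered here.

```python
def ordered_header_names(raw_request: bytes) -> list[str]:
    """Header names from a raw HTTP/1.1 request, in wire order, verbatim."""
    head, _, _ = raw_request.partition(b"\r\n\r\n")
    names = []
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, _ = line.partition(b":")
        if sep:  # ignore malformed lines without a colon
            names.append(name.decode("latin-1"))
    return names


raw = (b"GET / HTTP/1.1\r\n"
       b"Host: example.test\r\n"
       b"user-agent: curl/8.0\r\n"
       b"Accept: */*\r\n\r\n")
names = ordered_header_names(raw)
print(names)
# Presence check is case-insensitive; the stored list keeps original casing.
print("accept-encoding" in {n.lower() for n in names})
```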

-Different HTTP/2 implementations (curl, Chrome, Firefox, Go net/http,
-Java HttpClient) have characteristic SETTINGS maps and orderings.
+#### HTTP/2 SETTINGS frame

-### H3 SETTINGS
+During the h2 connection preface the client sends a `SETTINGS` frame.
+Stored: `settings` (map of name → value) and `frame_order` (IDs in wire
+order).  Different h2 implementations have characteristic SETTINGS maps
+and orderings.

-For HTTP/3, the QUIC server is Caddy with native h3 support.  Caddy
-exposes the client's h3 SETTINGS frame via the `http3.Settingser`
-interface on the `ResponseWriter`.  The fp module captures:
+Known settings captured by name:
+`HEADER_TABLE_SIZE`, `ENABLE_PUSH`, `MAX_CONCURRENT_STREAMS`,
+`INITIAL_WINDOW_SIZE`, `MAX_FRAME_SIZE`, `MAX_HEADER_LIST_SIZE`.
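
For illustration, a SETTINGS payload is a sequence of 6-byte entries: a 16-bit identifier followed by a 32-bit value. The sketch below decodes such a payload into the two stored shapes. The function name and the `UNKNOWN_0x....` label for unrecognised identifiers are assumptions, not the module's actual naming.

```python
import struct

# Identifiers defined by the HTTP/2 specification.
SETTING_NAMES = {
    1: "HEADER_TABLE_SIZE",
    2: "ENABLE_PUSH",
    3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE",
    5: "MAX_FRAME_SIZE",
    6: "MAX_HEADER_LIST_SIZE",
}


def parse_settings_payload(payload: bytes):
    """Return (settings, frame_order) from a raw SETTINGS frame payload."""
    if len(payload) % 6:
        raise ValueError("SETTINGS payload must be a multiple of 6 bytes")
    settings, frame_order = {}, []
    # '!HI' = network byte order, 16-bit id, 32-bit value (6 bytes, unpadded).
    for ident, value in struct.iter_unpack("!HI", payload):
        settings[SETTING_NAMES.get(ident, f"UNKNOWN_0x{ident:04x}")] = value
        frame_order.append(ident)
    return settings, frame_order


payload = struct.pack("!HIHIHI", 3, 100, 4, 65535, 1, 4096)
print(parse_settings_payload(payload))
```

Keeping `frame_order` separate from the map matters because two clients can send identical values in a different order.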

- `EnableDatagrams` — whether the client advertised H3 datagram support
- `EnableExtendedConnect` — extended CONNECT (used by WebTransport)
- `Other` — any additional settings (including GREASE entries)
+#### HTTP/3 SETTINGS

-### Source port as fingerprint signal
+For HTTP/3, the module reads client SETTINGS via the `http3.Settingser`
+interface: `EnableDatagrams`, `EnableExtendedConnect`, and any additional
+settings (including GREASE entries stored as `GREASE_<hex>`).

-`remote_addr` in every fp record is the full `host:port` string from
-Go's network layer.  The collector strips the port before resolving
-attacker identity (so 50 connections from the same IP do not produce 50
-attackers), but preserves it as `remote_port` in the structured fields.
+#### User-Agent classification

-An attacker whose tooling consistently originates from the same source
-port (or a narrow range) is a meaningful signal — some NAT devices, VPN
-clients, and C2 frameworks exhibit this behaviour.  `remote_port` is
-stored in the `fingerprint` bounty payload and visible in the Attacker
-detail page.
+Every HTTP request captures the `User-Agent` header and classifies it:
+
+| Signal | Description |
+|---|---|
+| Tool category | browser, scanner, curl, python-requests, Go net/http, Java, custom, unknown |
+| Tool name | specific tool if detectable (e.g. `Nikto`, `sqlmap`, `Masscan`) |
+| Signals | flags such as `headless_browser`, `vuln_scanner`, `exploit_framework` |
+
+Stored as bounty type `fingerprint`, `fingerprint_type: "http_useragent"`.
+
+#### IP leak / source IP signals
+
+Proxy and forwarding headers are inspected on every HTTP request:
+
+- **`ip_leak`** — the attacker's real public IP appeared in `X-Forwarded-For`,
+  `Forwarded`, `X-Real-IP`, `CF-Connecting-IP`, or `True-Client-IP`.  This
+  happens when an attacker routes through a misconfigured proxy.
+  Fields: `claimed_ip`, `header_name`, `source_ip`.
+
+- **`spoofed_source`** — a non-routable IP (RFC1918, loopback, link-local,
+  reserved) appeared in a proxy header, usually a sign of a WAF bypass
+  attempt.
+  Fields: `claimed_ip`, `header_name`, `category`.
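
One way to separate the two signals, sketched with the standard `ipaddress` module. The category labels and the rule "a public claimed IP that differs from the TCP source is a leak" are assumptions made for this example; the actual classifier may apply different rules.

```python
import ipaddress


def classify_claimed_ip(claimed: str, source_ip: str):
    """Return (signal, category) for one proxy-header value, or None."""
    try:
        ip = ipaddress.ip_address(claimed.strip())
    except ValueError:
        return None  # not an IP literal, nothing to classify

    # Most specific checks first: is_private also covers loopback/link-local.
    checks = (
        ("loopback", ip.is_loopback),
        ("link_local", ip.is_link_local),
        ("reserved", ip.is_reserved),
        ("private", ip.is_private),
    )
    for category, matched in checks:
        if matched:
            return ("spoofed_source", category)

    if str(ip) != source_ip:
        return ("ip_leak", None)  # routable IP that is not the TCP peer
    return None


print(classify_claimed_ip("10.0.0.5", "198.51.100.7"))
print(classify_claimed_ip("8.8.8.8", "1.1.1.1"))
```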
+
+#### Source port as fingerprint signal
+
+`remote_addr` from Go's network layer is `host:port`.  The collector
+strips the port before resolving attacker identity (so 50 connections from
+the same IP do not produce 50 attacker rows), but preserves it as
+`remote_port` in the bounty payload.  An attacker whose tooling
+consistently originates from the same source port is a meaningful signal
+(some NAT devices, VPN clients, and C2 frameworks exhibit this behaviour).
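
The split described above might look like this, assuming `remote_addr` uses Go's `host:port` form with IPv6 hosts in square brackets. `split_remote_addr` is a hypothetical name.

```python
def split_remote_addr(remote_addr: str):
    """Split 'host:port' (or '[v6]:port') into (host, port)."""
    host, _, port = remote_addr.rpartition(":")
    return host.strip("[]"), int(port)


# The host resolves attacker identity; the port is kept as remote_port.
print(split_remote_addr("203.0.113.7:51234"))
print(split_remote_addr("[2001:db8::1]:443"))
```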
+
+### VNC
+
+| Signal | Description | Field |
+|---|---|---|
+| **VNC client version** | RFB protocol version string from the VNC client's greeting | `value` |
+
+### SSH / Telnet — session recording and keystroke dynamics
+
+The `sessrec` module records the full PTY byte stream of every interactive
+shell session.  Two signals are extracted:
+
+#### Commands executed
+
+Every command entered at the shell prompt is captured with:
+- `command` — the raw command string
+- `timestamp`, `session_id`, `attacker_ip`, `decky`, `service`
+- Aggregated on session end into a command list on the `session_recorded`
+  event.
+
+Command content reveals intent directly: reconnaissance (`id`, `whoami`,
+`uname -a`, `cat /etc/passwd`), lateral movement (`ssh`, `scp`),
+persistence (`crontab -e`, `echo >> ~/.bashrc`), exfiltration
+(`curl`, `wget`, `base64`, `scp`).
+
+#### Keystroke dynamics (BEHAVE-SHELL spec)
+
+The BEHAVE-SHELL spec (`decnet/profiler/behave_shell/`) extracts
+fine-grained typing and session behaviour from the PTY stream.  These
+become **attribution primitives** — per-`(identity_uuid, primitive)`
+state-machine entries that accumulate evidence across sessions.
+
+**Motor patterns** (muscle memory, latency):
+
+| Primitive | Description |
+|---|---|
+| `interarrival_mean_sec` | Mean time between keystrokes/commands |
+| `interarrival_p75_sec`, `interarrival_p99_sec` | Tail latency — distinguishes human from bot |
+| `flow_rate_cmd_per_sec` | Command execution rate |
+| `burst_event_count` | Clustering in time (burst size) |
+| `typing_speed_wpm` | Estimated words per minute |
+| `error_correction_ratio` | Backspace and correction frequency |
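
To make the inter-arrival primitives concrete, here is a small sketch that derives them from a list of event timestamps in seconds. The percentile method and the minimum sample size are assumptions; the BEHAVE-SHELL spec may define them differently.

```python
import statistics


def interarrival_stats(timestamps):
    """Mean, p75 and p99 of the gaps between consecutive timestamps."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None  # too few events to say anything useful
    # quantiles(n=100) returns 99 cut points; index 74 is p75, index 98 is p99.
    pct = statistics.quantiles(gaps, n=100, method="inclusive")
    return {
        "interarrival_mean_sec": statistics.fmean(gaps),
        "interarrival_p75_sec": pct[74],
        "interarrival_p99_sec": pct[98],
    }


print(interarrival_stats([0.0, 0.4, 1.1, 1.3, 6.0, 6.2]))
```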
+
+**Cognitive patterns** (decision-making):
+
+| Primitive | Description |
+|---|---|
+| `command_error_rate` | Fraction of commands that fail |
+| `retry_on_failure_ratio` | Persistence on error |
+| `command_redo_rate` | Repeating the same failed command |
+| `pipeline_breadth`, `pipeline_depth` | Command composition style |
+| `distinct_tools_used` | Toolkit diversity per session |
+| `tool_switch_frequency` | How often the operator changes tool |
+| `verbose_flag_usage` | `-v`/`-vv` flag frequency (confidence proxy) |
+
+**Temporal patterns** (working hours, rhythm):
+
+| Primitive | Description |
+|---|---|
+| `activity_hour_of_day_entropy` | Consistency of working hours |
+| `activity_day_of_week_entropy` | Weekly routine |
+| `session_duration_p50_sec`, `p95_sec` | Session length distribution |
+| `gaps_between_sessions_p50_sec` | Rest period / tool pacing |
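
The entropy primitives can be read as Shannon entropy over activity buckets. The sketch below computes it for hour of day, assuming UTC epoch timestamps; the bucket choice and timezone handling are assumptions for illustration.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def hour_of_day_entropy(epoch_seconds):
    """Shannon entropy (bits) of activity across the 24 UTC hours."""
    hours = Counter(datetime.fromtimestamp(t, tz=timezone.utc).hour
                    for t in epoch_seconds)
    total = sum(hours.values())
    return -sum((c / total) * math.log2(c / total) for c in hours.values())


# Two events in hour 0 and two in hour 1: evenly split over two buckets.
print(hour_of_day_entropy([0, 60, 3600, 3660]))
```

Low entropy means activity concentrates in a few hours (a consistent schedule); the maximum for 24 buckets is `log2(24)`, roughly 4.58 bits.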
+
+**Environmental patterns** (operator setup):
+
+| Primitive | Description |
+|---|---|
+| `shell_type` | bash / sh / zsh / fish / etc. |
+| `environment_vars_entropy` | Degree of environment customisation |
+| `working_directory_volatility` | Directory-jumping frequency |
+| `tty_capabilities` | Terminal rows, cols, and `$TERM` value |
+
+**Operational patterns** (technique selection):
+
+| Primitive | Description |
+|---|---|
+| `privilege_escalation_attempts` | `sudo` / `su` frequency |
+| `lateral_movement_attempts` | SSH/RDP connection attempts |
+| `data_exfiltration_indicators` | `scp`, `curl`, `wget`, `base64`, `zcat` |
+| `credential_access_attempts` | Grepping for passwords, SSH key files |
+| `persistence_technique_count` | Crontab edits, `.bashrc` modifications |
+
+Each primitive has a state machine: `unknown → stable → drifting →
+conflicted → multi_actor`.  When two or more primitives independently flag
+`multi_actor` (e.g. two distinct shell types alternating per session),
+an `attribution.profile.multi_actor_suspected` bus event fires — a strong
+indicator of a shared credential or a compromised operator account.
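
The "two or more primitives" rule can be sketched as below. Only the state names and the event name come from the text above; the function, the input mapping, and the payload shape are hypothetical.

```python
STATES = ("unknown", "stable", "drifting", "conflicted", "multi_actor")
EVENT = "attribution.profile.multi_actor_suspected"


def multi_actor_event(primitive_states, threshold=2):
    """Return an event dict when enough primitives are in 'multi_actor'.

    primitive_states maps primitive name -> current state for one identity.
    """
    flagged = sorted(name for name, state in primitive_states.items()
                     if state == "multi_actor")
    if len(flagged) >= threshold:
        return {"event": EVENT, "primitives": flagged}
    return None


print(multi_actor_event({
    "shell_type": "multi_actor",
    "typing_speed_wpm": "multi_actor",
    "pipeline_depth": "stable",
}))
```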
+
+---
+
+## Layer 4 — SMTP / email identity signals
+
+Every inbound email to an `smtp` or `smtp_relay` decky produces a rich set
+of identity signals:
+
+### Attacker domains and sender identity
+
+| Signal | Description |
+|---|---|
+| `mail_from_domain` | Domain in the SMTP envelope `MAIL FROM` |
+| `from_domain` | Domain in the `From:` header (may differ from envelope) |
+| `return_path_domain` | `Return-Path:` domain |
+| `x_mailer` | `X-Mailer` header — identifies the mail client or framework |
+| `dkim_signed` | DKIM signature present (bool) |
+| `spf_pass` | SPF check result (bool) |
+
+### Victim domain targeting
+
+| Signal | Description |
+|---|---|
+| `rcpt_domains` | Set of unique domains in the `RCPT TO` list |
+| `rcpt_count` | Number of recipients (bulk vs. targeted) |
+
+### Payload and attachment fingerprints
+
+| Signal | Description |
+|---|---|
+| `body_simhash` | 16-hex similarity hash of the email body — clusters phishing campaigns |
+| `body_sha256` | Exact body hash |
+| `attachment_sha256s` | Per-attachment SHA-256 list |
+| `attachment_extensions` | File extension set |
+| `attachment_macros` | Macro-bearing Office documents detected (bool) |
+| `attachment_password_protected` | Encrypted attachment (evasion signal) |
+| `html_smuggling` | HTML obfuscation / JS blob smuggling detected (bool) |
+| `mal_hash_match` | Any attachment hash matched the MalwareBazaar bulk feed (bool) |
+| `urls` | Extracted URLs from body |
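
A 16-hex similarity hash corresponds to a 64-bit value. The sketch below shows a generic word-level simhash of that width; the tokenisation and the per-token hash (BLAKE2b) are assumptions and will not reproduce DECNET's `body_simhash` values.

```python
import hashlib
import re


def body_simhash(body: str) -> str:
    """64-bit simhash of the body's words, as 16 hex characters."""
    weights = [0] * 64
    for token in re.findall(r"\w+", body.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # A bit is set when more tokens voted 1 than 0 at that position.
    value = sum(1 << bit for bit, w in enumerate(weights) if w > 0)
    return f"{value:016x}"


def hamming_distance(a: str, b: str) -> int:
    """Number of differing bits between two hex simhashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


a = body_simhash("Your invoice is attached, please review before Friday")
b = body_simhash("Your invoice is attached, please review before Monday")
print(a, b, hamming_distance(a, b))
```

Unlike `body_sha256`, which changes completely on any edit, near-identical bodies tend to land a small Hamming distance apart, which is what makes clustering possible.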
+
+---
+
+## Layer 5 — TTP and tool detection
+
+The TTP engine (`decnet/ttp/`) maps collected events onto MITRE ATT&CK
+techniques.  Detected techniques are stored as `ttp_tag` rows and surfaced
+in the Attacker detail page.
+
+**Detected tools** are inferred from:
+- Command strings matched against known-tool signatures (nmap, Metasploit,
+  BloodHound, Mimikatz, linpeas, pspy, etc.)
+- User-Agent strings for HTTP tools
+- SSH banner strings from the HASSH probe
+- TLS fingerprints matching known C2 frameworks (Cobalt Strike JARM, etc.)
+
+---
+
+## Layer 6 — Inter-event timing and phase sequence
+
+The correlator and attribution engine track **how** an attacker behaves
+across an entire engagement, not just individual connections.
+
+### Inter-event timing
+
+Time deltas between successive events of the same type reveal automation
+vs. human operation:
+
+- Sub-second, uniform intervals → scripted scanner or bot.
+- Variable intervals with human-range pauses (2–30 s) → interactive
+  operator.
+- Long gaps between sessions with consistent inter-session intervals →
+  scheduled beacon or cron-driven implant.
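
The three cases above could be approximated with a coefficient-of-variation test on the gaps. Every threshold and label in this sketch is an assumption chosen for illustration, not a value used by the correlator.

```python
import statistics


def classify_timing(gaps_sec, uniform_cv=0.1, long_gap_sec=300.0):
    """Rough label for a list of inter-event gaps in seconds."""
    if len(gaps_sec) < 2:
        return "unknown"
    mean = statistics.fmean(gaps_sec)
    # Coefficient of variation: spread relative to the mean gap.
    cv = statistics.pstdev(gaps_sec) / mean if mean > 0 else 0.0
    if cv <= uniform_cv:
        if mean < 1.0:
            return "scripted"          # sub-second, uniform
        if mean >= long_gap_sec:
            return "scheduled_beacon"  # long, regular gaps
    elif 2.0 <= statistics.median(gaps_sec) <= 30.0:
        return "interactive"           # variable, human-range pauses
    return "unknown"


print(classify_timing([0.1, 0.1, 0.1, 0.1]))
print(classify_timing([3.0, 12.0, 5.0, 25.0]))
print(classify_timing([3600.0, 3601.0, 3599.0, 3600.0]))
```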
+
+These are captured as attribution primitives (`interarrival_*`) via the
+BEHAVE-SHELL profiler and as raw timestamps on `bounty` rows.
+
+### Phase sequence
+
+The correlator classifies each event into an engagement phase:
+`reconnaissance`, `exploitation`, `post-exploitation`, `exfiltration`,
+`persistence`, `lateral movement`.  The sequence of phases across a
+session is a fingerprint in itself — some toolkits always run
+reconnaissance before exploitation; human operators often skip phases or
+return to earlier ones.
+
+Phase-sequence analysis drives the `phase_sequence` attribution primitive
+and feeds the campaign clusterer.

 ---

 ## Where fingerprints are stored

-Every fingerprint event produces a `bounty` row:
-
 | Bounty `fingerprint_type` | Source | Key discriminating fields |
 |---|---|---|
-| `ja3` / `ja4` / `ja4s` | Sniffer | `hash`, `tls_version`, `ciphers` |
+| `ja3` / `ja3s` / `ja4` / `ja4s` | Sniffer | `hash`, `tls_version`, `sni`, `raw_ciphers` |
+| `ja4l` | Sniffer | `rtt_ms`, `client_ttl` |
 | `ja4_quic` | Sniffer | `ja4_quic`, `sni`, `alpn` |
-| `tcp_os` | Sniffer | `os_guess`, `mss`, `window_scale` |
-| `jarm` | Prober | `jarm_hash`, `port` |
-| `hassh` | Prober | `hassh_server`, `port` |
-| `tcpfp` | Prober | `tcp_fp_hash`, `port` |
+| `tls_certificate` | Sniffer + prober | `cert_sha256`, `subject_cn`, `sans` |
+| `tls_resumption` | Sniffer | `mechanisms` |
+| `tcp_os` | Sniffer | `os_guess`, `mss`, `window_scale`, `options_order` |
+| `jarm` | Prober | `hash`, `target_port` |
+| `hassh_server` | Prober | `hash`, `ssh_banner`, `kex_algorithms` |
+| `tcpfp` | Prober | `hash`, `ttl`, `window_size`, `df_bit` |
 | `ja4h` | Caddy fp module | `ja4h`, `protocol`, `method`, `remote_port` |
 | `http2_settings` | Caddy fp module | `settings`, `frame_order`, `remote_port` |
 | `http3_settings` | Caddy fp module | `settings`, `remote_port` |
+| `http_useragent` | Ingester (HTTP events) | `category`, `tool`, `signals` |
+| `http_header_quirks` | Ingester (HTTP events) | `headers_ordered` |
+| `vnc_client_version` | Ingester (VNC events) | `value` |

 Bounties are deduplicated per `(attacker_uuid, fingerprint_type, hash)` so
-repeated connections from the same attacker produce one row, not thousands.
+repeated connections produce one row, not thousands.
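
The deduplication key can be illustrated with plain dictionaries. This is a sketch that assumes each bounty is a dict carrying the three key fields; the real rows live in the database and may enforce this differently.

```python
def dedupe_bounties(bounties):
    """Keep the first bounty per (attacker_uuid, fingerprint_type, hash)."""
    seen = {}
    for bounty in bounties:
        key = (bounty["attacker_uuid"],
               bounty["fingerprint_type"],
               bounty["hash"])
        seen.setdefault(key, bounty)  # later duplicates are ignored
    return list(seen.values())


rows = [
    {"attacker_uuid": "a1", "fingerprint_type": "ja3", "hash": "h1"},
    {"attacker_uuid": "a1", "fingerprint_type": "ja3", "hash": "h1"},
    {"attacker_uuid": "a1", "fingerprint_type": "ja3", "hash": "h2"},
]
print(len(dedupe_bounties(rows)))
```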
+
+Non-fingerprint bounty types: `ip_leak`, `spoofed_source`, `artifact`
+(captured files and emails), `credential` (harvested secrets).

 ---

 ## Enabling inline HTTP fingerprinting

 The Caddy fp module is **built into the `http` and `https` decky templates
-automatically** — no extra configuration is needed.  The module activates
-when the template is deployed.
+automatically** — no configuration is needed.  For HTTP/3, ensure `http/3`
+is listed in the service's `http_versions` setting.

-For HTTP/3, ensure `http/3` is listed in the service's `http_versions`
-setting.  Caddy's native h3 stack handles UDP/443; the fp module hooks into
-it via the `http3.Settingser` interface.
+SSH/Telnet keystroke dynamics require the `behave_shell` feature to be
+enabled on the service (see [Service-Personas](Service-Personas)).

 ---

 ## Related pages

 - [Identity-Resolution](Identity-Resolution) — how fingerprints are
-  clustered into attacker identities
+  clustered into attacker identities and campaigns
 - [OS-Fingerprint-Spoofing](OS-Fingerprint-Spoofing) — how DECNET spoofs
  *its own* OS fingerprint to look like the target OS
 - [Security-and-Stealth](Security-and-Stealth) — probe stealth measures
 - [Logging-and-Syslog](Logging-and-Syslog) — how fp socket records flow
  through syslog_bridge to the collector
+- [Service-Personas](Service-Personas) — configuring BEHAVE-SHELL and
+  session recording per service