docs: expand Fingerprinting page — all 6 layers, BEHAVE-SHELL primitives, SMTP signals, TTP detection

2026-05-10 04:19:09 -04:00
parent d45fb08b6d
commit e7d3353bfe
1 changed files with 306 additions and 88 deletions
--- a/Fingerprinting.md
+++ b/Fingerprinting.md
@@ -1,37 +1,48 @@
 # Fingerprinting
-DECNET builds a multi-layer fingerprint of every attacker from three
+DECNET builds a multi-layer fingerprint of every attacker from four
-independent sources: **passive wire capture**, **active probing**, and
+independent sources: **passive wire capture**, **active probing**,
-**inline HTTP inspection**.  Each layer contributes distinct evidence;
+**inline HTTP/protocol inspection**, and **behavioural profiling** of
-together they let you tell a curl script from a Metasploit operator from a
+interactive sessions.  Each layer contributes distinct evidence; together
 they let you tell a curl script from a Metasploit operator from a
 nation-state implant even when the source IP changes.
-All fingerprint data is stored as `bounty` rows in the DECNET database and
+All fingerprint data is stored as `bounty` rows (type `fingerprint`) or
-surfaces in the **Attacker detail** page under the *Fingerprints* tab.
+`ObservationRow` entries in the DECNET database and surfaces in the
 **Attacker detail** page.
 ---
-## Layer 1 — Passive sniffer (network layer)
+## Layer 1 — Passive sniffer (network / TLS layer)
 The sniffer runs fleet-wide on the host interface and reads raw packets
 without touching any decky service.  It fires on the first packet of each
 connection, so it captures the attacker's stack signature before any
 application-level exchange.
-| Fingerprint | What it captures | Algorithm |
+### TLS ClientHello fingerprints
 | Fingerprint | Description | Key fields |
 |---|---|---|
-| **JA3 / JA3S** | TLS ClientHello / ServerHello cipher suite and extension order | MD5 of normalised fields per Salesforce spec |
+| **JA3** | MD5 of normalised ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, EC point formats) | `ja3`, `tls_version`, `sni`, `raw_ciphers`, `raw_extensions` |
-| **JA4 / JA4S / JA4L** | TLS 1.3-aware version; JA4L adds latency timing | FoxIO JA4 spec |
+| **JA3S** | MD5 of the ServerHello response | `ja3s` |
-| **TCP SYN OS** | MSS, window scale, TCP option order from the SYN | Mini-p0f classifier (`decnet/sniffer/p0f.py`) |
+| **JA4** | TLS 1.3-aware successor to JA3 (FoxIO spec) | `ja4`, `alpn`, `dst_port` |
-| **JA4-QUIC** | QUIC Initial ClientHello — QUIC-specific extensions and transport params | FoxIO JA4-QUIC spec |
+| **JA4S** | ServerHello counterpart to JA4 | `ja4s` |
-| **Flow timing** | Round-trip latency and inter-packet timing | Raw timestamps from the sniffer |
+| **JA4L** | JA4 + latency: client TTL and measured RTT | `ja4l`, `rtt_ms`, `client_ttl` |
 | **TLS certificate** | Server cert metadata — useful when the attacker runs their own TLS service | `subject_cn`, `issuer`, `self_signed`, `not_before`, `not_after`, `sans`, `cert_sha256`, `sni`, `target_ip`, `target_port` |
 | **TLS resumption** | Session resumption mechanisms advertised (tickets, session IDs) | `mechanisms` |
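To make the JA3 row above concrete, here is a minimal sketch of the hash construction as described in the public JA3 spec: five comma-separated fields, list values joined with dashes, GREASE values dropped, then MD5. The function names and the pre-parsed integer inputs are assumptions for illustration; this is not DECNET's sniffer code.

```python
import hashlib

# GREASE placeholders (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa.
GREASE = {(n << 12) | (n << 4) | 0x0A0A for n in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the comma-separated JA3 string; list values are dash-joined."""
    def join(values):
        # GREASE values are ignored so they do not randomise the hash.
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

Keeping the raw string alongside the digest (as the `raw_ciphers` and `raw_extensions` fields suggest) makes it possible to compare clients that differ in only one field.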
-Sniffer events land as `attacker.observed` or `attacker.fingerprinted` bus
+### Network stack fingerprints
 events consumed by the correlator and ingester.
-> **Limitation:** the sniffer only sees the TLS handshake — it cannot read
+| Fingerprint | Description | Key fields |
-> HTTP headers or QUIC stream frames inside an encrypted session.  Layers 2
+|---|---|---|
-> and 3 fill that gap.
+| **TCP SYN OS** | Passive OS classifier from SYN options (mini-p0f) | `os_guess`, `mss`, `window_scale`, `sack_ok`, `timestamp`, `options_order` |
 | **JA4-QUIC** | QUIC Initial ClientHello — QUIC-specific extensions and transport params | `ja4_quic`, `sni`, `alpn`, `raw_ciphers` |
 | **Flow timing** | Inter-packet timing and RTT from the first few packets | stored as `tcp_flow_timing` event |
 > The sniffer only sees the TLS handshake — it cannot read HTTP headers or
 > QUIC stream frames inside an encrypted session.  Layers 3 and 4 fill
 > that gap.
 ---
@@ -39,127 +50,334 @@ events consumed by the correlator and ingester.
 After a new attacker is first observed, the prober worker reaches back
 out to the attacker's IP on a set of default ports to collect
-application-level fingerprints.
+application-level fingerprints.  Probes are stealthy — no DECNET banner,
 ordinary client behaviour.  See [Security-and-Stealth](Security-and-Stealth).
-| Fingerprint | Protocol | Ports probed |
+| Fingerprint | Protocol | Ports probed | Key fields |
-|---|---|---|
+|---|---|---|---|
-| **JARM** | TLS (any HTTPS-ish service) | 443, 8443, 8080, 4443, 50050, 2222, 993, 995, 8888, 9001 |
+| **JARM** | TLS server fingerprint — 10 hand-crafted ClientHellos, 62-char hash of the responses | 443, 8443, 8080, 4443, 50050, 2222, 993, 995, 8888, 9001 | `hash`, `target_ip`, `target_port` |
-| **HASSH** | SSH server | 22, 2222, 22222, 2022 |
+| **HASSH** | SSH server fingerprint — MD5 of `kex;encryption;mac;compression` from the server `KEXINIT` | 22, 2222, 22222, 2022 | `hash`, `ssh_banner`, `kex_algorithms`, `encryption_s2c`, `mac_s2c`, `compression_s2c`, `target_ip`, `target_port` |
-| **TCP fingerprint** | TCP SYN response analysis | 22, 80, 443, 8080, 8443, 445, 3389 |
+| **TCP fingerprint** | TCP/IP stack OS probe — SYN response TTL, window, options | 22, 80, 443, 8080, 8443, 445, 3389 | `hash`, `raw`, `ttl`, `window_size`, `df_bit`, `mss`, `window_scale`, `options_order` |
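The HASSH row above can be sketched as follows. This assumes the server `KEXINIT` has already been parsed into four name lists (the function and parameter names are hypothetical); each list is comma-joined, the four lists are joined with semicolons, and the result is hashed with MD5.

```python
import hashlib


def hassh_server(kex, encryption_s2c, mac_s2c, compression_s2c):
    """Return (md5_hex, raw_string) for a server-side HASSH fingerprint.

    Each argument is a list of algorithm names in the order the server
    advertised them in its KEXINIT message.
    """
    raw = ";".join(
        ",".join(names) for names in (kex, encryption_s2c, mac_s2c, compression_s2c)
    )
    return hashlib.md5(raw.encode("ascii")).hexdigest(), raw
```

Because algorithm order is preserved, two servers offering the same algorithms in a different order produce different hashes.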
-Active probes are stealthy: they look like ordinary clients, carry no
+When any fingerprint changes between probes, an `attacker.fingerprint_rotated`
-DECNET-specific banner, and use the same port-rotation patterns an
+bus event fires — a strong signal of infrastructure churn (VPS swap, cert
-informed scanner would use.  See [Security-and-Stealth](Security-and-Stealth).
+rotation, banner rewrite).
 When a fingerprint changes between probes, an `attacker.fingerprint_rotated`
 bus event fires — that is a strong signal of infrastructure churn (VPS
 swap, cert rotation, banner rewrite).
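A rotation check of this kind could look like the sketch below. It assumes each probe run yields a mapping of fingerprint type to hash; the function name and data shape are hypothetical, and how DECNET actually emits the bus event is not shown.

```python
def rotated_fingerprints(previous, current):
    """Return {fingerprint_type: (old_hash, new_hash)} for every changed hash.

    Only types present in both probe runs are compared, so a port that
    stopped answering is not reported as a rotation.
    """
    return {
        kind: (previous[kind], current[kind])
        for kind in previous.keys() & current.keys()
        if previous[kind] != current[kind]
    }
```

A non-empty result is the condition under which an `attacker.fingerprint_rotated` event would be worth publishing.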
 ---
-## Layer 3 — Inline HTTP fingerprinting (Caddy fp module)
+## Layer 3 — Inline protocol inspection (decky services)
 ### HTTP header fingerprinting (Caddy `decnet_fp` module)
 The `http` and `https` decky templates ship with a custom Caddy module
-(`decnet_fp`) that intercepts connections at the byte level, before
+that intercepts connections at the **byte level**, before Caddy's HTTP
-Caddy's HTTP parser sees them.  This gives wire-accurate fingerprints
+parser.  This gives wire-accurate fingerprints that cannot be faked by
-that cannot be faked by HTTP-level header manipulation.
+HTTP-level middleware.
-### JA4H (HTTP request header order)
+#### JA4H (HTTP request header order)
-The `decnet_fp` listener wrapper taps the raw TLS stream and buffers the
+The listener wrapper taps the raw TLS stream:
 first request headers of each connection before replaying them to Caddy's
 parser.
- **h1:** headers are split by `\r\n` in arrival order.
+- **HTTP/1.1:** headers split by `\r\n` in arrival order.
- **h2:** a per-connection HPACK decoder maintains the dynamic table and
+- **HTTP/2:** a per-connection HPACK decoder maintains the dynamic table
-  emits headers in HPACK decode order — pseudo-headers
+  and emits headers in HPACK decode order — pseudo-headers (`:method`,
-  (`:method`, `:path`, `:scheme`, `:authority`) appear first, then regular
+  `:path`, `:scheme`, `:authority`) appear first, then regular headers in
-  headers in the order the client encoded them.
+  the order the client encoded them.
-The ordered list feeds `_compute_ja4h` in `syslog_bridge.py`, which
+The ordered list feeds `_compute_ja4h` in `syslog_bridge.py`, producing a
-produces a JA4H hash per the FoxIO spec.
+JA4H hash per the FoxIO spec.  Stored with: `ja4h`, `protocol`, `method`,
 `path`, `remote_port`.
 > Map-iteration order in Go is randomised; DECNET captures order at the
-> *byte level*, not from `http.Header`, so the JA4H is reproducible and
+> *byte level*, so the JA4H is reproducible and meaningful.
 > meaningful.
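The HTTP/1.1 case can be illustrated with a short sketch that pulls header names out of raw request bytes in arrival order, with casing untouched. It is a simplified stand-in for the byte-level capture described above, not the `decnet_fp` module or `_compute_ja4h`; the function name is an assumption.

```python
def ordered_header_names(raw_request: bytes) -> list[str]:
    """Return header names in wire order from a raw HTTP/1.1 request."""
    # Everything after the first blank line is body and is ignored.
    head, _, _ = raw_request.partition(b"\r\n\r\n")
    names = []
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, _ = line.partition(b":")
        if sep and name:
            # latin-1 decodes any byte, so odd header names survive verbatim.
            names.append(name.decode("latin-1"))
    return names
```

Working from the raw bytes rather than a parsed header map is what preserves both order and casing.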
-### H2 SETTINGS
+#### Header order and header quirks
-During the h2 connection preface, the client sends a `SETTINGS` frame
+Beyond the JA4H hash, the raw ordered list of header names is stored
-listing its implementation parameters.  The fp module parses the raw
+(`headers_ordered`).  This lets you cluster:
 6-byte `(id, value)` tuples in wire order and records:
- `settings` — map of setting name → value
+- **Presence/absence of headers:** curl sends no `Accept-Encoding`
-  (e.g. `HEADER_TABLE_SIZE`, `MAX_CONCURRENT_STREAMS`, `INITIAL_WINDOW_SIZE`)
+  unless invoked with `--compressed`; browsers always send it.
- `frame_order` — setting IDs in the exact order the client sent them
+- **Header ordering** — different HTTP clients and frameworks have
  characteristic orderings even when they send the same headers.
 - **Header casing** — some tools send `content-type` (lowercase), others
  send `Content-Type`; stored verbatim before normalisation.
-Different HTTP/2 implementations (curl, Chrome, Firefox, Go net/http,
+#### HTTP/2 SETTINGS frame
 Java HttpClient) have characteristic SETTINGS maps and orderings.
-### H3 SETTINGS
+During the h2 connection preface the client sends a `SETTINGS` frame.
 Stored: `settings` (map of name → value) and `frame_order` (IDs in wire
 order).  Different h2 implementations have characteristic SETTINGS maps
 and orderings.
-For HTTP/3, the QUIC server is Caddy with native h3 support.  Caddy
+Known settings captured by name:
-exposes the client's h3 SETTINGS frame via the `http3.Settingser`
+`HEADER_TABLE_SIZE`, `ENABLE_PUSH`, `MAX_CONCURRENT_STREAMS`,
-interface on the `ResponseWriter`.  The fp module captures:
+`INITIAL_WINDOW_SIZE`, `MAX_FRAME_SIZE`, `MAX_HEADER_LIST_SIZE`.
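As an illustration of the SETTINGS capture, the sketch below decodes a SETTINGS frame payload into the `settings` map and `frame_order` list. Each entry on the wire is a 16-bit identifier followed by a 32-bit value, both big-endian. The function name and the `UNKNOWN_0x....` label for unrecognised identifiers are assumptions, not the module's actual naming.

```python
import struct

# Identifiers defined by the HTTP/2 specification.
SETTING_NAMES = {
    0x1: "HEADER_TABLE_SIZE",
    0x2: "ENABLE_PUSH",
    0x3: "MAX_CONCURRENT_STREAMS",
    0x4: "INITIAL_WINDOW_SIZE",
    0x5: "MAX_FRAME_SIZE",
    0x6: "MAX_HEADER_LIST_SIZE",
}


def parse_h2_settings(payload: bytes):
    """Return (settings, frame_order) from a SETTINGS frame payload."""
    if len(payload) % 6:
        raise ValueError("SETTINGS payload must be a multiple of 6 bytes")
    settings, frame_order = {}, []
    for ident, value in struct.iter_unpack("!HI", payload):
        name = SETTING_NAMES.get(ident, f"UNKNOWN_0x{ident:04x}")
        settings[name] = value
        frame_order.append(ident)  # wire order is itself a signal
    return settings, frame_order
```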
- `EnableDatagrams` — whether the client advertised H3 datagram support
+#### HTTP/3 SETTINGS
 - `EnableExtendedConnect` — extended CONNECT (used by WebTransport)
 - `Other` — any additional settings (including GREASE entries)
-### Source port as fingerprint signal
+For HTTP/3, the module reads client SETTINGS via the `http3.Settingser`
 interface: `EnableDatagrams`, `EnableExtendedConnect`, and any additional
 settings (including GREASE entries stored as `GREASE_<hex>`).
-`remote_addr` in every fp record is the full `host:port` string from
+#### User-Agent classification
 Go's network layer.  The collector strips the port before resolving
 attacker identity (so 50 connections from the same IP do not produce 50
 attackers), but preserves it as `remote_port` in the structured fields.
-An attacker whose tooling consistently originates from the same source
+Every HTTP request captures the `User-Agent` header and classifies it:
-port (or a narrow range) is a meaningful signal — some NAT devices, VPN
+
-clients, and C2 frameworks exhibit this behaviour.  `remote_port` is
+| Signal | Description |
-stored in the `fingerprint` bounty payload and visible in the Attacker
+|---|---|
-detail page.
+| Tool category | browser, scanner, curl, python-requests, Go net/http, Java, custom, unknown |
 | Tool name | specific tool if detectable (e.g. `Nikto`, `sqlmap`, `Masscan`) |
 | Signals | flags such as `headless_browser`, `vuln_scanner`, `exploit_framework` |
 Stored as bounty type `fingerprint`, `fingerprint_type: "http_useragent"`.
 #### IP leak / source IP signals
 Proxy and forwarding headers are inspected on every HTTP request:
 - **`ip_leak`** — the attacker's real public IP appeared in `X-Forwarded-For`,
  `Forwarded`, `X-Real-IP`, `CF-Connecting-IP`, or `True-Client-IP`.  This
  happens when an attacker routes through a misconfigured proxy.
  Fields: `claimed_ip`, `header_name`, `source_ip`.
 - **`spoofed_source`** — a non-routable IP (RFC1918, loopback, link-local,
  reserved) appeared in a proxy header — a WAF bypass attempt.
  Fields: `claimed_ip`, `header_name`, `category`.
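The two signals above can be approximated with Python's standard `ipaddress` module. In this sketch the category labels, the function name, and the rule "a globally routable claimed IP that differs from the connecting IP counts as a leak" are all assumptions made for illustration; only the field names come from the text.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_claimed_ip(claimed_ip, header_name, source_ip):
    """Return an `ip_leak` or `spoofed_source` record, or None."""
    try:
        ip = ipaddress.ip_address(claimed_ip.strip())
    except ValueError:
        return None  # not a single IP literal

    if ip.is_loopback:
        category = "loopback"
    elif ip.is_link_local:
        category = "link_local"
    elif ip.version == 4 and any(ip in net for net in RFC1918):
        category = "rfc1918"
    elif ip.is_reserved:
        category = "reserved"
    else:
        category = None

    if category:
        return {"type": "spoofed_source", "claimed_ip": str(ip),
                "header_name": header_name, "category": category}
    if ip.is_global and str(ip) != source_ip:
        return {"type": "ip_leak", "claimed_ip": str(ip),
                "header_name": header_name, "source_ip": source_ip}
    return None
```

Headers such as `X-Forwarded-For` may carry a comma-separated list; a caller would split the value and classify each element separately.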
 #### Source port as fingerprint signal
 `remote_addr` from Go's network layer is `host:port`.  The collector
 strips the port before resolving attacker identity (so 50 connections from
 the same IP do not produce 50 attacker rows), but preserves it as
 `remote_port` in the bounty payload.  Tooling that consistently
 originates from the same source port is a meaningful signal; some NAT
 devices, VPN clients, and C2 frameworks exhibit this behaviour.
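A minimal sketch of the port-stripping step, assuming `remote_addr` follows Go's `host:port` convention with IPv6 hosts wrapped in square brackets. The function name is hypothetical.

```python
def split_remote_addr(remote_addr: str) -> tuple[str, int]:
    """Split 'host:port' or '[v6host]:port' into (host, port)."""
    # rpartition splits on the last colon, so IPv6 colons stay in the host.
    host, _, port = remote_addr.rpartition(":")
    return host.strip("[]"), int(port)
```

Identity resolution would key on the returned host, while the port is kept as `remote_port` for later clustering.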
 ### VNC
 | Signal | Description | Field |
 |---|---|---|
 | **VNC client version** | RFB protocol version string from the VNC client's greeting | `value` |
 ### SSH / Telnet — session recording and keystroke dynamics
 The `sessrec` module records the full PTY byte stream of every interactive
 shell session.  Two signals are extracted:
 #### Commands executed
 Every command entered at the shell prompt is captured with:
 - `command` — the raw command string
 - `timestamp`, `session_id`, `attacker_ip`, `decky`, `service`
 - Aggregated on session end into a command list on the `session_recorded`
  event.
 Command content reveals intent directly: reconnaissance (`id`, `whoami`,
 `uname -a`, `cat /etc/passwd`), lateral movement (`ssh`, `scp`),
 persistence (`crontab -e`, `echo >> ~/.bashrc`), exfiltration
 (`curl`, `wget`, `base64`, `scp`).
 #### Keystroke dynamics (BEHAVE-SHELL spec)
 The BEHAVE-SHELL spec (`decnet/profiler/behave_shell/`) extracts
 fine-grained typing and session behaviour from the PTY stream.  These
 become **attribution primitives** — per-`(identity_uuid, primitive)`
 state-machine entries that accumulate evidence across sessions.
 **Motor patterns** (muscle memory, latency):
 | Primitive | Description |
 |---|---|
 | `interarrival_mean_sec` | Mean time between keystrokes/commands |
 | `interarrival_p75_sec`, `interarrival_p99_sec` | Tail latency — distinguishes human from bot |
 | `flow_rate_cmd_per_sec` | Command execution rate |
 | `burst_event_count` | Clustering in time (burst size) |
 | `typing_speed_wpm` | Estimated words per minute |
 | `error_correction_ratio` | Backspace and correction frequency |
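The inter-arrival primitives in the table above could be derived from event timestamps roughly as follows. The nearest-rank percentile method and the function names are assumptions; the BEHAVE-SHELL spec may define these statistics differently.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def interarrival_primitives(timestamps):
    """Compute mean, p75 and p99 gaps from ascending event timestamps (seconds)."""
    gaps = sorted(b - a for a, b in zip(timestamps, timestamps[1:]))
    if not gaps:
        return {}  # fewer than two events: nothing to measure
    return {
        "interarrival_mean_sec": statistics.fmean(gaps),
        "interarrival_p75_sec": percentile(gaps, 75),
        "interarrival_p99_sec": percentile(gaps, 99),
    }
```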
 **Cognitive patterns** (decision-making):
 | Primitive | Description |
 |---|---|
 | `command_error_rate` | Fraction of commands that fail |
 | `retry_on_failure_ratio` | Persistence on error |
 | `command_redo_rate` | Repeating the same failed command |
 | `pipeline_breadth`, `pipeline_depth` | Command composition style |
 | `distinct_tools_used` | Toolkit diversity per session |
 | `tool_switch_frequency` | How often the operator changes tool |
 | `verbose_flag_usage` | `-v`/`-vv` flag frequency (confidence proxy) |
 **Temporal patterns** (working hours, rhythm):
 | Primitive | Description |
 |---|---|
 | `activity_hour_of_day_entropy` | Consistency of working hours |
 | `activity_day_of_week_entropy` | Weekly routine |
 | `session_duration_p50_sec`, `p95_sec` | Session length distribution |
 | `gaps_between_sessions_p50_sec` | Rest period / tool pacing |
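One plausible reading of `activity_hour_of_day_entropy` is the Shannon entropy of the hour-of-day histogram, sketched below. Using UTC hours and base-2 logarithms are assumptions; the profiler may bucket or normalise differently.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def hour_of_day_entropy(epoch_seconds):
    """Shannon entropy (bits) of the UTC hour-of-day distribution.

    0.0 means all activity falls in one hour; log2(24) (about 4.58)
    would mean activity is spread evenly across the whole day.
    """
    hours = Counter(
        datetime.fromtimestamp(t, tz=timezone.utc).hour for t in epoch_seconds
    )
    total = sum(hours.values())
    return -sum((n / total) * math.log2(n / total) for n in hours.values())
```

Low entropy suggests a consistent working window; high entropy is more typical of automation that runs around the clock.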
 **Environmental patterns** (operator setup):
 | Primitive | Description |
 |---|---|
 | `shell_type` | bash / sh / zsh / fish / etc. |
 | `environment_vars_entropy` | Degree of environment customisation |
 | `working_directory_volatility` | Directory-jumping frequency |
 | `tty_capabilities` | Terminal rows, cols, and `$TERM` value |
 **Operational patterns** (technique selection):
 | Primitive | Description |
 |---|---|
 | `privilege_escalation_attempts` | `sudo` / `su` frequency |
 | `lateral_movement_attempts` | SSH/RDP connection attempts |
 | `data_exfiltration_indicators` | `scp`, `curl`, `wget`, `base64`, `zcat` |
 | `credential_access_attempts` | Grepping for passwords, SSH key files |
 | `persistence_technique_count` | Crontab edits, `.bashrc` modifications |
 Each primitive has a state machine: `unknown → stable → drifting →
 conflicted → multi_actor`.  When two or more primitives independently flag
 `multi_actor` (e.g. two distinct shell types alternating per session),
 an `attribution.profile.multi_actor_suspected` bus event fires — a strong
 indicator of a shared credential or a compromised operator account.
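The "two or more primitives" rule can be expressed as a small check over per-primitive states. The input shape (primitive name mapped to its current state) and the function name are hypothetical; only the state names and the threshold come from the text.

```python
def multi_actor_suspected(primitive_states, threshold=2):
    """Return the primitives flagged `multi_actor` if at least `threshold` agree.

    An empty list means the event should not fire.
    """
    flagged = sorted(
        name for name, state in primitive_states.items() if state == "multi_actor"
    )
    return flagged if len(flagged) >= threshold else []
```

Requiring agreement between independent primitives keeps a single noisy measurement from triggering the event.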
 ---
 ## Layer 4 — SMTP / email identity signals
 Every inbound email to an `smtp` or `smtp_relay` decky produces a rich set
 of identity signals:
 ### Attacker domains and sender identity
 | Signal | Description |
 |---|---|
 | `mail_from_domain` | Domain in the SMTP envelope `MAIL FROM` |
 | `from_domain` | Domain in the `From:` header (may differ from envelope) |
 | `return_path_domain` | `Return-Path:` domain |
 | `x_mailer` | `X-Mailer` header — identifies the mail client or framework |
 | `dkim_signed` | DKIM signature present (bool) |
 | `spf_pass` | SPF check result (bool) |
 ### Victim domain targeting
 | Signal | Description |
 |---|---|
 | `rcpt_domains` | Set of unique domains in the `RCPT TO` list |
 | `rcpt_count` | Number of recipients (bulk vs. targeted) |
 ### Payload and attachment fingerprints
 | Signal | Description |
 |---|---|
 | `body_simhash` | 16-hex similarity hash of the email body — clusters phishing campaigns |
 | `body_sha256` | Exact body hash |
 | `attachment_sha256s` | Per-attachment SHA-256 list |
 | `attachment_extensions` | File extension set |
 | `attachment_macros` | Macro-bearing Office documents detected (bool) |
 | `attachment_password_protected` | Encrypted attachment (evasion signal) |
 | `html_smuggling` | HTML obfuscation / JS blob smuggling detected (bool) |
 | `mal_hash_match` | Any attachment hash matched MalwareBazaar bulk feed (bool) |
 | `urls` | Extracted URLs from body |
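To show how a 16-hex similarity hash such as `body_simhash` can cluster near-duplicate bodies, here is a generic 64-bit SimHash sketch. The tokenisation (lowercased word tokens) and the per-token hash (first 8 bytes of MD5) are assumptions; DECNET's actual parameters are not documented here.

```python
import hashlib
import re


def body_simhash(text: str) -> str:
    """Return a 64-bit SimHash of `text` as 16 hex characters."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            # Each token votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    value = sum(1 << bit for bit in range(64) if weights[bit] > 0)
    return f"{value:016x}"


def hamming_distance(a: str, b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")
```

Bodies that share most of their tokens tend to land within a small Hamming distance of each other, whereas `body_sha256` changes completely on any edit.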
 ---
 ## Layer 5 — TTP and tool detection
 The TTP engine (`decnet/ttp/`) maps collected events onto MITRE ATT&CK
 techniques.  Detected techniques are stored as `ttp_tag` rows and surfaced
 in the Attacker detail page.
 **Detected tools** are inferred from:
 - Command strings matched against known-tool signatures (nmap, Metasploit,
  BloodHound, Mimikatz, linpeas, pspy, etc.)
 - User-Agent strings for HTTP tools
 - SSH banner strings from the HASSH probe
 - TLS fingerprints matching known C2 frameworks (Cobalt Strike JARM, etc.)
 ---
 ## Layer 6 — Inter-event timing and phase sequence
 The correlator and attribution engine track **how** an attacker behaves
 across an entire engagement, not just individual connections.
 ### Inter-event timing
 Time deltas between successive events of the same type reveal automation
 vs. human operation:
 - Sub-second, uniform intervals → scripted scanner or bot.
 - Variable intervals with human-range pauses (2–30 s) → interactive
  operator.
 - Long gaps between sessions with consistent inter-session intervals →
  scheduled beacon or cron-driven implant.
 These are captured as attribution primitives (`interarrival_*`) via the
 BEHAVE-SHELL profiler and as raw timestamps on `bounty` rows.
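The three timing patterns above can be turned into a rough classifier. The 2 to 30 second human range comes from the text; the coefficient-of-variation cut-off of 0.1, the "longer than 30 s" rule for scheduled activity, and the label strings are assumptions chosen only to make the sketch concrete.

```python
import statistics


def classify_timing(intervals_sec):
    """Label a series of inter-event gaps (seconds)."""
    if len(intervals_sec) < 2:
        return "unknown"
    mean = statistics.fmean(intervals_sec)
    # Coefficient of variation: spread relative to the mean.
    cv = statistics.pstdev(intervals_sec) / mean if mean else 0.0
    uniform = cv < 0.1
    if mean < 1.0 and uniform:
        return "scripted"      # sub-second, uniform
    if 2.0 <= statistics.median(intervals_sec) <= 30.0 and not uniform:
        return "interactive"   # variable, human-range pauses
    if mean > 30.0 and uniform:
        return "scheduled"     # long, consistent gaps
    return "unknown"
```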
 ### Phase sequence
 The correlator classifies each event into an engagement phase:
 `reconnaissance`, `exploitation`, `post-exploitation`, `exfiltration`,
 `persistence`, `lateral movement`.  The sequence of phases across a
 session is a fingerprint in itself — some toolkits always run
 reconnaissance before exploitation; human operators often skip phases or
 return to earlier ones.
 Phase-sequence analysis drives the `phase_sequence` attribution primitive
 and feeds the campaign clusterer.
 ---
 ## Where fingerprints are stored
 Every fingerprint event produces a `bounty` row:
 | Bounty `fingerprint_type` | Source | Key discriminating fields |
 |---|---|---|
-| `ja3` / `ja4` / `ja4s` | Sniffer | `hash`, `tls_version`, `ciphers` |
+| `ja3` / `ja3s` / `ja4` / `ja4s` | Sniffer | `hash`, `tls_version`, `sni`, `raw_ciphers` |
 | `ja4l` | Sniffer | `rtt_ms`, `client_ttl` |
 | `ja4_quic` | Sniffer | `ja4_quic`, `sni`, `alpn` |
-| `tcp_os` | Sniffer | `os_guess`, `mss`, `window_scale` |
+| `tls_certificate` | Sniffer + prober | `cert_sha256`, `subject_cn`, `sans` |
-| `jarm` | Prober | `jarm_hash`, `port` |
+| `tls_resumption` | Sniffer | `mechanisms` |
-| `hassh` | Prober | `hassh_server`, `port` |
+| `tcp_os` | Sniffer | `os_guess`, `mss`, `window_scale`, `options_order` |
-| `tcpfp` | Prober | `tcp_fp_hash`, `port` |
+| `jarm` | Prober | `hash`, `target_port` |
 | `hassh_server` | Prober | `hash`, `ssh_banner`, `kex_algorithms` |
 | `tcpfp` | Prober | `hash`, `ttl`, `window_size`, `df_bit` |
 | `ja4h` | Caddy fp module | `ja4h`, `protocol`, `method`, `remote_port` |
 | `http2_settings` | Caddy fp module | `settings`, `frame_order`, `remote_port` |
 | `http3_settings` | Caddy fp module | `settings`, `remote_port` |
 | `http_useragent` | Ingester (HTTP events) | `category`, `tool`, `signals` |
 | `http_header_quirks` | Ingester (HTTP events) | `headers_ordered` |
 | `vnc_client_version` | Ingester (VNC events) | `value` |
 Bounties are deduplicated per `(attacker_uuid, fingerprint_type, hash)` so
-repeated connections from the same attacker produce one row, not thousands.
+repeated connections produce one row, not thousands.
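The deduplication rule can be sketched as a first-seen filter on the `(attacker_uuid, fingerprint_type, hash)` key. Representing bounties as dictionaries and the function name are assumptions; a real deployment would more likely enforce this with a database uniqueness constraint.

```python
def dedupe_bounties(bounties):
    """Keep the first bounty for each (attacker_uuid, fingerprint_type, hash)."""
    seen = set()
    unique = []
    for bounty in bounties:
        key = (bounty["attacker_uuid"], bounty["fingerprint_type"], bounty["hash"])
        if key not in seen:
            seen.add(key)
            unique.append(bounty)
    return unique
```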
 Non-fingerprint bounty types: `ip_leak`, `spoofed_source`, `artifact`
 (captured files and emails), `credential` (harvested secrets).
 ---
 ## Enabling inline HTTP fingerprinting
 The Caddy fp module is **built into the `http` and `https` decky templates
-automatically** — no extra configuration is needed.  The module activates
+automatically** — no configuration is needed.  For HTTP/3, ensure `http/3`
-when the template is deployed.
+is listed in the service's `http_versions` setting.
-For HTTP/3, ensure `http/3` is listed in the service's `http_versions`
+SSH/Telnet keystroke dynamics require the `behave_shell` feature to be
-setting.  Caddy's native h3 stack handles UDP/443; the fp module hooks into
+enabled on the service (see [Service-Personas](Service-Personas)).
 it via the `http3.Settingser` interface.
 ---
 ## Related pages
 - [Identity-Resolution](Identity-Resolution) — how fingerprints are
-  clustered into attacker identities
+  clustered into attacker identities and campaigns
 - [OS-Fingerprint-Spoofing](OS-Fingerprint-Spoofing) — how DECNET spoofs
  *its own* OS fingerprint to look like the target OS
 - [Security-and-Stealth](Security-and-Stealth) — probe stealth measures
 - [Logging-and-Syslog](Logging-and-Syslog) — how fp socket records flow
  through syslog_bridge to the collector
 - [Service-Personas](Service-Personas) — configuring BEHAVE-SHELL and
  session recording per service