docs: expand Fingerprinting page — all 6 layers, BEHAVE-SHELL primitives, SMTP signals, TTP detection

2026-05-10 04:19:09 -04:00
parent d45fb08b6d
commit e7d3353bfe

@@ -1,37 +1,48 @@
# Fingerprinting
DECNET builds a multi-layer fingerprint of every attacker from three
independent sources: **passive wire capture**, **active probing**, and
**inline HTTP inspection**. Each layer contributes distinct evidence;
together they let you tell a curl script from a Metasploit operator from a
DECNET builds a multi-layer fingerprint of every attacker from four
independent sources: **passive wire capture**, **active probing**,
**inline HTTP/protocol inspection**, and **behavioural profiling** of
interactive sessions. Each layer contributes distinct evidence; together
they let you tell a curl script from a Metasploit operator from a
nation-state implant even when the source IP changes.
All fingerprint data is stored as `bounty` rows in the DECNET database and
surfaces in the **Attacker detail** page under the *Fingerprints* tab.
All fingerprint data is stored as `bounty` rows (type `fingerprint`) or
`ObservationRow` entries in the DECNET database and surfaces in the
**Attacker detail** page.
---
## Layer 1 — Passive sniffer (network layer)
## Layer 1 — Passive sniffer (network / TLS layer)
The sniffer runs fleet-wide on the host interface and reads raw packets
without touching any decky service. It fires on the first packet of each
connection, so it captures the attacker's stack signature before any
application-level exchange.
| Fingerprint | What it captures | Algorithm |
### TLS ClientHello fingerprints
| Fingerprint | Description | Key fields |
|---|---|---|
| **JA3 / JA3S** | TLS ClientHello / ServerHello cipher suite and extension order | MD5 of normalised fields per Salesforce spec |
| **JA4 / JA4S / JA4L** | TLS 1.3-aware version; JA4L adds latency timing | FoxIO JA4 spec |
| **TCP SYN OS** | MSS, window scale, TCP option order from the SYN | Mini-p0f classifier (`decnet/sniffer/p0f.py`) |
| **JA4-QUIC** | QUIC Initial ClientHello — QUIC-specific extensions and transport params | FoxIO JA4-QUIC spec |
| **Flow timing** | Round-trip latency and inter-packet timing | Raw timestamps from the sniffer |
| **JA3** | MD5 of normalised ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, EC point formats) | `ja3`, `tls_version`, `sni`, `raw_ciphers`, `raw_extensions` |
| **JA3S** | MD5 of the ServerHello response | `ja3s` |
| **JA4** | TLS 1.3-aware successor to JA3 (FoxIO spec) | `ja4`, `alpn`, `dst_port` |
| **JA4S** | ServerHello counterpart to JA4 | `ja4s` |
| **JA4L** | JA4 + latency: client TTL and measured RTT | `ja4l`, `rtt_ms`, `client_ttl` |
| **TLS certificate** | Server cert metadata — useful when the attacker runs their own TLS service | `subject_cn`, `issuer`, `self_signed`, `not_before`, `not_after`, `sans`, `cert_sha256`, `sni`, `target_ip`, `target_port` |
| **TLS resumption** | Session resumption mechanisms advertised (tickets, session IDs) | `mechanisms` |
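
As a rough illustration of how a JA3 value is derived, the sketch below follows the public JA3 recipe: five ClientHello fields rendered as decimal numbers, values joined with `-`, fields joined with `,`, GREASE values dropped, then MD5. The function names and argument layout are assumptions made for this example and are not taken from the DECNET sniffer.

```python
import hashlib

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the pre-hash JA3 string from already-parsed ClientHello fields."""
    def join(values):
        # Decimal values in wire order, GREASE entries removed.
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string (32 hex characters)."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()
```

Keeping the pre-hash string alongside the digest makes it possible to compare two clients field by field when their hashes differ.
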
Sniffer events land as `attacker.observed` or `attacker.fingerprinted` bus
events consumed by the correlator and ingester.
### Network stack fingerprints
> **Limitation:** the sniffer only sees the TLS handshake — it cannot read
> HTTP headers or QUIC stream frames inside an encrypted session. Layers 2
> and 3 fill that gap.
| Fingerprint | Description | Key fields |
|---|---|---|
| **TCP SYN OS** | Passive OS classifier from SYN options (mini-p0f) | `os_guess`, `mss`, `window_scale`, `sack_ok`, `timestamp`, `options_order` |
| **JA4-QUIC** | QUIC Initial ClientHello — QUIC-specific extensions and transport params | `ja4_quic`, `sni`, `alpn`, `raw_ciphers` |
| **Flow timing** | Inter-packet timing and RTT from the first few packets | stored as `tcp_flow_timing` event |
> The sniffer only sees the TLS handshake — it cannot read HTTP headers or
> QUIC stream frames inside an encrypted session. Layers 3 and 4 fill
> that gap.
---
@@ -39,127 +50,334 @@ events consumed by the correlator and ingester.
After a new attacker is first observed, the prober worker reaches back
out to the attacker's IP on a set of default ports to collect
application-level fingerprints.
application-level fingerprints. Probes are stealthy — no DECNET banner,
ordinary client behaviour. See [Security-and-Stealth](Security-and-Stealth).
| Fingerprint | Protocol | Ports probed |
|---|---|---|
| **JARM** | TLS (any HTTPS-ish service) | 443, 8443, 8080, 4443, 50050, 2222, 993, 995, 8888, 9001 |
| **HASSH** | SSH server | 22, 2222, 22222, 2022 |
| **TCP fingerprint** | TCP SYN response analysis | 22, 80, 443, 8080, 8443, 445, 3389 |
| Fingerprint | Protocol | Ports probed | Key fields |
|---|---|---|---|
| **JARM** | TLS server fingerprint — 10 hand-crafted ClientHellos, 62-char hash of the responses | 443, 8443, 8080, 4443, 50050, 2222, 993, 995, 8888, 9001 | `hash`, `target_ip`, `target_port` |
| **HASSH** | SSH server fingerprint — MD5 of `kex;encryption;mac;compression` from the server `KEXINIT` | 22, 2222, 22222, 2022 | `hash`, `ssh_banner`, `kex_algorithms`, `encryption_s2c`, `mac_s2c`, `compression_s2c`, `target_ip`, `target_port` |
| **TCP fingerprint** | TCP/IP stack OS probe — SYN response TTL, window, options | 22, 80, 443, 8080, 8443, 445, 3389 | `hash`, `raw`, `ttl`, `window_size`, `df_bit`, `mss`, `window_scale`, `options_order` |
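
The HASSH row above can be sketched in a few lines. This assumes the four server-to-client algorithm lists have already been parsed out of the `KEXINIT` message; the function name and return shape are illustrative only.

```python
import hashlib


def hassh_server(kex, enc_s2c, mac_s2c, comp_s2c):
    """Return (pre-hash string, MD5 hex) for a server KEXINIT.

    Each argument is a list of algorithm names in the order the
    server advertised them.
    """
    # Names within a list are comma-joined; the four lists are ';'-joined.
    s = ";".join(",".join(algos) for algos in (kex, enc_s2c, mac_s2c, comp_s2c))
    return s, hashlib.md5(s.encode("ascii")).hexdigest()
```

Because the algorithm order is preserved, two SSH servers that support the same algorithms but prefer them differently produce different hashes.
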
Active probes are stealthy: they look like ordinary clients, carry no
DECNET-specific banner, and use the same port-rotation patterns an
informed scanner would use. See [Security-and-Stealth](Security-and-Stealth).
When a fingerprint changes between probes, a `attacker.fingerprint_rotated`
bus event fires — that is a strong signal of infrastructure churn (VPS
swap, cert rotation, banner rewrite).
When any fingerprint changes between probes, an `attacker.fingerprint_rotated`
bus event fires — a strong signal of infrastructure churn (VPS swap, cert
rotation, banner rewrite).
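
A minimal sketch of rotation detection, assuming the prober remembers the last hash per `(target_ip, fingerprint_type, target_port)`. The `RotationTracker` class and the event payload keys other than the event name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RotationTracker:
    """Remembers the last hash seen per (ip, fingerprint type, port)."""
    last_seen: dict = field(default_factory=dict)

    def observe(self, target_ip, fp_type, target_port, fp_hash):
        """Record a probe result; return an event dict if the hash changed."""
        key = (target_ip, fp_type, target_port)
        previous = self.last_seen.get(key)
        self.last_seen[key] = fp_hash
        if previous is not None and previous != fp_hash:
            return {
                "event": "attacker.fingerprint_rotated",
                "fingerprint_type": fp_type,
                "target_ip": target_ip,
                "target_port": target_port,
                "old_hash": previous,
                "new_hash": fp_hash,
            }
        return None  # first sighting or unchanged
```
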
---
## Layer 3 — Inline HTTP fingerprinting (Caddy fp module)
## Layer 3 — Inline protocol inspection (decky services)
### HTTP header fingerprinting (Caddy `decnet_fp` module)
The `http` and `https` decky templates ship with a custom Caddy module
(`decnet_fp`) that intercepts connections at the byte level, before
Caddy's HTTP parser sees them. This gives wire-accurate fingerprints
that cannot be faked by HTTP-level header manipulation.
that intercepts connections at the **byte level**, before Caddy's HTTP
parser. This gives wire-accurate fingerprints that cannot be faked by
HTTP-level middleware.
### JA4H (HTTP request header order)
#### JA4H (HTTP request header order)
The `decnet_fp` listener wrapper taps the raw TLS stream and buffers the
first request headers of each connection before replaying them to Caddy's
parser.
The listener wrapper taps the raw TLS stream:
- **h1:** headers are split by `\r\n` in arrival order.
- **h2:** a per-connection HPACK decoder maintains the dynamic table and
emits headers in HPACK decode order — pseudo-headers
(`:method`, `:path`, `:scheme`, `:authority`) appear first, then regular
headers in the order the client encoded them.
- **HTTP/1.1:** headers split by `\r\n` in arrival order.
- **HTTP/2:** a per-connection HPACK decoder maintains the dynamic table
and emits headers in HPACK decode order — pseudo-headers (`:method`,
`:path`, `:scheme`, `:authority`) appear first, then regular headers in
the order the client encoded them.
The ordered list feeds `_compute_ja4h` in `syslog_bridge.py`, which
produces a JA4H hash per the FoxIO spec.
The ordered list feeds `_compute_ja4h` in `syslog_bridge.py`, producing a
JA4H hash per the FoxIO spec. Stored with: `ja4h`, `protocol`, `method`,
`path`, `remote_port`.
> Map-iteration order in Go is randomised; DECNET captures order at the
> *byte level*, not from `http.Header`, so the JA4H is reproducible and
> meaningful.
> *byte level*, so the JA4H is reproducible and meaningful.
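
To illustrate why byte-level capture matters, the sketch below recovers HTTP/1.1 header names in arrival order, with their original casing, from a raw request buffer. It is a simplified stand-in written for this page, not the `decnet_fp` implementation, and it does not compute the JA4H hash itself.

```python
def h1_header_names(raw: bytes) -> list[str]:
    """Return header names in wire order from a raw HTTP/1.1 request."""
    head, _, _ = raw.partition(b"\r\n\r\n")   # drop any body
    lines = head.split(b"\r\n")[1:]           # skip the request line
    names = []
    for line in lines:
        name, sep, _ = line.partition(b":")
        if sep:                               # ignore malformed lines
            names.append(name.decode("latin-1"))  # keep original casing
    return names
```

A parsed header map would lose both the ordering and the casing that this list retains.
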
### H2 SETTINGS
#### Header order and header quirks
During the h2 connection preface, the client sends a `SETTINGS` frame
listing its implementation parameters. The fp module parses the raw
6-byte `(id, value)` tuples in wire order and records:
Beyond the JA4H hash, the raw ordered list of header names is stored
(`headers_ordered`). This lets you cluster:
- `settings` — map of setting name → value
(e.g. `HEADER_TABLE_SIZE`, `MAX_CONCURRENT_STREAMS`, `INITIAL_WINDOW_SIZE`)
- `frame_order` — setting IDs in the exact order the client sent them
- **Presence/absence of headers** — curl sends no `Accept-Encoding` on
  default invocations (it adds one with `--compressed`); browsers always send it.
- **Header ordering** — different HTTP clients and frameworks have
characteristic orderings even when they send the same headers.
- **Header casing** — some tools send `content-type` (lowercase), others
send `Content-Type`; stored verbatim before normalisation.
Different HTTP/2 implementations (curl, Chrome, Firefox, Go net/http,
Java HttpClient) have characteristic SETTINGS maps and orderings.
#### HTTP/2 SETTINGS frame
### H3 SETTINGS
During the h2 connection preface the client sends a `SETTINGS` frame.
Stored: `settings` (map of name → value) and `frame_order` (IDs in wire
order). Different h2 implementations have characteristic SETTINGS maps
and orderings.
For HTTP/3, the QUIC server is Caddy with native h3 support. Caddy
exposes the client's h3 SETTINGS frame via the `http3.Settingser`
interface on the `ResponseWriter`. The fp module captures:
Known settings captured by name:
`HEADER_TABLE_SIZE`, `ENABLE_PUSH`, `MAX_CONCURRENT_STREAMS`,
`INITIAL_WINDOW_SIZE`, `MAX_FRAME_SIZE`, `MAX_HEADER_LIST_SIZE`.
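
An HTTP/2 SETTINGS payload is a sequence of 6-byte entries: a 16-bit identifier followed by a 32-bit value. The sketch below decodes such a payload into the `settings` and `frame_order` fields described above. The function name and the `UNKNOWN_<hex>` label for unrecognised identifiers are assumptions for illustration.

```python
import struct

# Identifiers defined by the HTTP/2 specification.
SETTING_NAMES = {
    1: "HEADER_TABLE_SIZE",
    2: "ENABLE_PUSH",
    3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE",
    5: "MAX_FRAME_SIZE",
    6: "MAX_HEADER_LIST_SIZE",
}


def parse_h2_settings(payload: bytes) -> dict:
    """Decode a SETTINGS frame payload, preserving wire order."""
    if len(payload) % 6:
        raise ValueError("SETTINGS payload must be a multiple of 6 bytes")
    settings, frame_order = {}, []
    # ">HI" = big-endian 16-bit id + 32-bit value, no padding.
    for setting_id, value in struct.iter_unpack(">HI", payload):
        frame_order.append(setting_id)
        name = SETTING_NAMES.get(setting_id, f"UNKNOWN_{setting_id:#06x}")
        settings[name] = value
    return {"settings": settings, "frame_order": frame_order}
```
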
- `EnableDatagrams` — whether the client advertised H3 datagram support
- `EnableExtendedConnect` — extended CONNECT (used by WebTransport)
- `Other` — any additional settings (including GREASE entries)
#### HTTP/3 SETTINGS
### Source port as fingerprint signal
For HTTP/3, the module reads client SETTINGS via the `http3.Settingser`
interface: `EnableDatagrams`, `EnableExtendedConnect`, and any additional
settings (including GREASE entries stored as `GREASE_<hex>`).
`remote_addr` in every fp record is the full `host:port` string from
Go's network layer. The collector strips the port before resolving
attacker identity (so 50 connections from the same IP do not produce 50
attackers), but preserves it as `remote_port` in the structured fields.
#### User-Agent classification
An attacker whose tooling consistently originates from the same source
port (or a narrow range) is a meaningful signal — some NAT devices, VPN
clients, and C2 frameworks exhibit this behaviour. `remote_port` is
stored in the `fingerprint` bounty payload and visible in the Attacker
detail page.
Every HTTP request captures the `User-Agent` header and classifies it:
| Signal | Description |
|---|---|
| Tool category | browser, scanner, curl, python-requests, Go net/http, Java, custom, unknown |
| Tool name | specific tool if detectable (e.g. `Nikto`, `sqlmap`, `Masscan`) |
| Signals | flags such as `headless_browser`, `vuln_scanner`, `exploit_framework` |
Stored as bounty type `fingerprint`, `fingerprint_type: "http_useragent"`.
#### IP leak / source IP signals
Proxy and forwarding headers are inspected on every HTTP request:
- **`ip_leak`** — the attacker's real public IP appeared in `X-Forwarded-For`,
`Forwarded`, `X-Real-IP`, `CF-Connecting-IP`, or `True-Client-IP`. This
happens when an attacker routes through a misconfigured proxy.
Fields: `claimed_ip`, `header_name`, `source_ip`.
- **`spoofed_source`** — a non-routable IP (RFC1918, loopback, link-local,
reserved) appeared in a proxy header — a WAF bypass attempt.
Fields: `claimed_ip`, `header_name`, `category`.
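
The two signals can be approximated with Python's `ipaddress` module. This is a sketch under stated assumptions: header lookup is case-sensitive on a plain dict, the `Forwarded` header is omitted because its `for=` syntax needs separate parsing, and the category labels and function names are hypothetical.

```python
import ipaddress

PROXY_HEADERS = ("X-Forwarded-For", "X-Real-IP", "CF-Connecting-IP", "True-Client-IP")
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def non_routable_category(ip):
    """Return a category label for non-routable addresses, else None."""
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link_local"
    if any(ip in net for net in RFC1918):
        return "rfc1918"
    if ip.is_reserved:
        return "reserved"
    return None


def inspect_proxy_headers(source_ip: str, headers: dict) -> list:
    """Classify every IP claimed in a proxy header."""
    findings = []
    for name in PROXY_HEADERS:
        for token in headers.get(name, "").split(","):
            token = token.strip()
            try:
                claimed = ipaddress.ip_address(token)
            except ValueError:
                continue  # empty or not an IP literal
            category = non_routable_category(claimed)
            if category:
                findings.append({"type": "spoofed_source", "claimed_ip": token,
                                 "header_name": name, "category": category})
            elif claimed.is_global and token != source_ip:
                findings.append({"type": "ip_leak", "claimed_ip": token,
                                 "header_name": name, "source_ip": source_ip})
    return findings
```
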
#### Source port as fingerprint signal
`remote_addr` from Go's network layer is `host:port`. The collector
strips the port before resolving attacker identity (so 50 connections from
the same IP do not produce 50 attacker rows), but preserves it as
`remote_port` in the bounty payload. An attacker whose tooling
consistently originates from the same source port is a meaningful signal
(some NAT devices, VPN clients, and C2 frameworks exhibit this behaviour).
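
A small sketch of the split described above: the host keys the attacker identity and the port is kept as a separate field. Both helper names are hypothetical, and the input is assumed to be a well-formed `host:port` or `[v6]:port` string.

```python
from collections import defaultdict


def split_remote_addr(remote_addr: str) -> tuple[str, int]:
    """Split 'host:port' or '[v6addr]:port' into (host, port)."""
    host, _, port = remote_addr.rpartition(":")
    return host.strip("[]"), int(port)


def ports_by_host(remote_addrs) -> dict:
    """Group observed source ports under one key per source host."""
    ports = defaultdict(list)
    for addr in remote_addrs:
        host, port = split_remote_addr(addr)
        ports[host].append(port)
    return dict(ports)
```

A host whose port list is constant or confined to a narrow range stands out against the wide ephemeral-port spread of an ordinary client stack.
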
### VNC
| Signal | Description | Field |
|---|---|---|
| **VNC client version** | RFB protocol version string from the VNC client's greeting | `value` |
### SSH / Telnet — session recording and keystroke dynamics
The `sessrec` module records the full PTY byte stream of every interactive
shell session. Two signals are extracted:
#### Commands executed
Every command entered at the shell prompt is captured with:
- `command` — the raw command string
- `timestamp`, `session_id`, `attacker_ip`, `decky`, `service`
- Aggregated on session end into a command list on the `session_recorded`
event.
Command content reveals intent directly: reconnaissance (`id`, `whoami`,
`uname -a`, `cat /etc/passwd`), lateral movement (`ssh`, `scp`),
persistence (`crontab -e`, `echo >> ~/.bashrc`), exfiltration
(`curl`, `wget`, `base64`, `scp`).
#### Keystroke dynamics (BEHAVE-SHELL spec)
The BEHAVE-SHELL spec (`decnet/profiler/behave_shell/`) extracts
fine-grained typing and session behaviour from the PTY stream. These
become **attribution primitives** — per-`(identity_uuid, primitive)`
state-machine entries that accumulate evidence across sessions.
**Motor patterns** (muscle memory, latency):
| Primitive | Description |
|---|---|
| `interarrival_mean_sec` | Mean time between keystrokes/commands |
| `interarrival_p75_sec`, `interarrival_p99_sec` | Tail latency — distinguishes human from bot |
| `flow_rate_cmd_per_sec` | Command execution rate |
| `burst_event_count` | Clustering in time (burst size) |
| `typing_speed_wpm` | Estimated words per minute |
| `error_correction_ratio` | Backspace and correction frequency |
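
As an illustration of the inter-arrival primitives, the sketch below computes the mean and two percentiles from a list of event timestamps. The nearest-rank percentile method is a choice made for this example; the BEHAVE-SHELL profiler may use a different estimator.

```python
import math
import statistics


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def interarrival_primitives(timestamps):
    """Compute inter-arrival statistics (seconds) from event timestamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    ts = sorted(timestamps)
    gaps = sorted(b - a for a, b in zip(ts, ts[1:]))
    return {
        "interarrival_mean_sec": statistics.fmean(gaps),
        "interarrival_p75_sec": nearest_rank(gaps, 75),
        "interarrival_p99_sec": nearest_rank(gaps, 99),
    }
```
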
**Cognitive patterns** (decision-making):
| Primitive | Description |
|---|---|
| `command_error_rate` | Fraction of commands that fail |
| `retry_on_failure_ratio` | Persistence on error |
| `command_redo_rate` | Repeating the same failed command |
| `pipeline_breadth`, `pipeline_depth` | Command composition style |
| `distinct_tools_used` | Toolkit diversity per session |
| `tool_switch_frequency` | How often the operator changes tool |
| `verbose_flag_usage` | `-v`/`-vv` flag frequency (confidence proxy) |
**Temporal patterns** (working hours, rhythm):
| Primitive | Description |
|---|---|
| `activity_hour_of_day_entropy` | Consistency of working hours |
| `activity_day_of_week_entropy` | Weekly routine |
| `session_duration_p50_sec`, `p95_sec` | Session length distribution |
| `gaps_between_sessions_p50_sec` | Rest period / tool pacing |
**Environmental patterns** (operator setup):
| Primitive | Description |
|---|---|
| `shell_type` | bash / sh / zsh / fish / etc. |
| `environment_vars_entropy` | Degree of environment customisation |
| `working_directory_volatility` | Directory-jumping frequency |
| `tty_capabilities` | Terminal rows, cols, and `$TERM` value |
**Operational patterns** (technique selection):
| Primitive | Description |
|---|---|
| `privilege_escalation_attempts` | `sudo` / `su` frequency |
| `lateral_movement_attempts` | SSH/RDP connection attempts |
| `data_exfiltration_indicators` | `scp`, `curl`, `wget`, `base64`, `zcat` |
| `credential_access_attempts` | Grepping for passwords, SSH key files |
| `persistence_technique_count` | Crontab edits, `.bashrc` modifications |
Each primitive has a state machine: `unknown → stable → drifting →
conflicted → multi_actor`. When two or more primitives independently flag
`multi_actor` (e.g. two distinct shell types alternating per session),
an `attribution.profile.multi_actor_suspected` bus event fires — a strong
indicator of a shared credential or a compromised operator account.
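
The "two or more primitives" rule can be sketched as follows. Only the threshold check is shown; the transition logic between states is not described on this page and is not modelled here. The enum, function name, and payload keys other than the event name are hypothetical.

```python
from enum import Enum


class PrimitiveState(Enum):
    UNKNOWN = "unknown"
    STABLE = "stable"
    DRIFTING = "drifting"
    CONFLICTED = "conflicted"
    MULTI_ACTOR = "multi_actor"


def check_multi_actor(identity_uuid, states, threshold=2):
    """Return an event dict when enough primitives flag multi_actor.

    `states` maps primitive name -> PrimitiveState for one identity.
    """
    flagged = sorted(
        name for name, state in states.items()
        if state is PrimitiveState.MULTI_ACTOR
    )
    if len(flagged) >= threshold:
        return {
            "event": "attribution.profile.multi_actor_suspected",
            "identity_uuid": identity_uuid,
            "primitives": flagged,
        }
    return None
```
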
---
## Layer 4 — SMTP / email identity signals
Every inbound email to an `smtp` or `smtp_relay` decky produces a rich set
of identity signals:
### Attacker domains and sender identity
| Signal | Description |
|---|---|
| `mail_from_domain` | Domain in the SMTP envelope `MAIL FROM` |
| `from_domain` | Domain in the `From:` header (may differ from envelope) |
| `return_path_domain` | `Return-Path:` domain |
| `x_mailer` | `X-Mailer` header — identifies the mail client or framework |
| `dkim_signed` | DKIM signature present (bool) |
| `spf_pass` | SPF check result (bool) |
### Victim domain targeting
| Signal | Description |
|---|---|
| `rcpt_domains` | Set of unique domains in the `RCPT TO` list |
| `rcpt_count` | Number of recipients (bulk vs. targeted) |
### Payload and attachment fingerprints
| Signal | Description |
|---|---|
| `body_simhash` | 16-hex similarity hash of the email body — clusters phishing campaigns |
| `body_sha256` | Exact body hash |
| `attachment_sha256s` | Per-attachment SHA-256 list |
| `attachment_extensions` | File extension set |
| `attachment_macros` | Macro-bearing Office documents detected (bool) |
| `attachment_password_protected` | Encrypted attachment (evasion signal) |
| `html_smuggling` | HTML obfuscation / JS blob smuggling detected (bool) |
| `mal_hash_match` | Any attachment hash matched MalwareBazaar bulk feed (bool) |
| `urls` | Extracted URLs from body |
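
A 16-hex similarity hash corresponds to a 64-bit value. The sketch below shows one common way to build such a hash (SimHash over word tokens). The tokenisation and the per-token hash function are assumptions made for this example, so its output will not match DECNET's `body_simhash` values.

```python
import hashlib
import re


def simhash64(text: str) -> str:
    """64-bit SimHash of word tokens, as 16 lowercase hex characters."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    value = sum(1 << bit for bit, w in enumerate(weights) if w > 0)
    return f"{value:016x}"


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")
```

Unlike `body_sha256`, which changes completely on any edit, near-duplicate bodies tend to yield hashes with a small Hamming distance, which is what makes clustering by campaign possible.
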
---
## Layer 5 — TTP and tool detection
The TTP engine (`decnet/ttp/`) maps collected events onto MITRE ATT&CK
techniques. Detected techniques are stored as `ttp_tag` rows and surfaced
in the Attacker detail page.
**Detected tools** are inferred from:
- Command strings matched against known-tool signatures (nmap, Metasploit,
BloodHound, Mimikatz, linpeas, pspy, etc.)
- User-Agent strings for HTTP tools
- SSH banner strings from the HASSH probe
- TLS fingerprints matching known C2 frameworks (Cobalt Strike JARM, etc.)
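
The first inference source, command strings matched against tool signatures, might look like the sketch below. The signature table is a small hypothetical sample for illustration and does not reflect the contents of `decnet/ttp/`.

```python
import re

# Hypothetical signature table: tool name -> pattern over a command line.
TOOL_SIGNATURES = {
    "nmap": re.compile(r"\bnmap\b"),
    "linpeas": re.compile(r"\blinpeas(\.sh)?\b"),
    "pspy": re.compile(r"\bpspy(32|64)?\b"),
    "mimikatz": re.compile(r"\bmimikatz\b", re.IGNORECASE),
}


def detect_tools(commands):
    """Return the sorted set of tools whose signature matches any command."""
    found = set()
    for command in commands:
        for tool, pattern in TOOL_SIGNATURES.items():
            if pattern.search(command):
                found.add(tool)
    return sorted(found)
```
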
---
## Layer 6 — Inter-event timing and phase sequence
The correlator and attribution engine track **how** an attacker behaves
across an entire engagement, not just individual connections.
### Inter-event timing
Time deltas between successive events of the same type reveal automation
vs. human operation:
- Sub-second, uniform intervals → scripted scanner or bot.
- Variable intervals with human-range pauses (2–30 s) → interactive
operator.
- Long gaps between sessions with consistent inter-session intervals →
scheduled beacon or cron-driven implant.
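
The three cases above can be turned into a rough classifier using the mean interval and its coefficient of variation. Every threshold here (0.1 for "uniform", 1 s, 60 s) and the label names are illustrative assumptions, not values taken from the correlator.

```python
import statistics


def classify_timing(intervals_sec):
    """Label a sequence of inter-event intervals (in seconds)."""
    if len(intervals_sec) < 2:
        return "unknown"
    mean = statistics.fmean(intervals_sec)
    spread = statistics.pstdev(intervals_sec)
    cv = spread / mean if mean else 0.0   # coefficient of variation
    uniform = cv < 0.1                    # assumed cut-off for "uniform"
    if uniform and mean < 1.0:
        return "scripted"                 # sub-second, regular
    if uniform and mean >= 60.0:
        return "scheduled"                # long, regular gaps
    if not uniform:
        return "interactive"              # irregular pacing
    return "unknown"
```
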
These are captured as attribution primitives (`interarrival_*`) via the
BEHAVE-SHELL profiler and as raw timestamps on `bounty` rows.
### Phase sequence
The correlator classifies each event into an engagement phase:
`reconnaissance`, `exploitation`, `post-exploitation`, `exfiltration`,
`persistence`, `lateral movement`. The sequence of phases across a
session is a fingerprint in itself — some toolkits always run
reconnaissance before exploitation; human operators often skip phases or
return to earlier ones.
Phase-sequence analysis drives the `phase_sequence` attribution primitive
and feeds the campaign clusterer.
---
## Where fingerprints are stored
Every fingerprint event produces a `bounty` row:
| Bounty `fingerprint_type` | Source | Key discriminating fields |
|---|---|---|
| `ja3` / `ja4` / `ja4s` | Sniffer | `hash`, `tls_version`, `ciphers` |
| `ja3` / `ja3s` / `ja4` / `ja4s` | Sniffer | `hash`, `tls_version`, `sni`, `raw_ciphers` |
| `ja4l` | Sniffer | `rtt_ms`, `client_ttl` |
| `ja4_quic` | Sniffer | `ja4_quic`, `sni`, `alpn` |
| `tcp_os` | Sniffer | `os_guess`, `mss`, `window_scale` |
| `jarm` | Prober | `jarm_hash`, `port` |
| `hassh` | Prober | `hassh_server`, `port` |
| `tcpfp` | Prober | `tcp_fp_hash`, `port` |
| `tls_certificate` | Sniffer + prober | `cert_sha256`, `subject_cn`, `sans` |
| `tls_resumption` | Sniffer | `mechanisms` |
| `tcp_os` | Sniffer | `os_guess`, `mss`, `window_scale`, `options_order` |
| `jarm` | Prober | `hash`, `target_port` |
| `hassh_server` | Prober | `hash`, `ssh_banner`, `kex_algorithms` |
| `tcpfp` | Prober | `hash`, `ttl`, `window_size`, `df_bit` |
| `ja4h` | Caddy fp module | `ja4h`, `protocol`, `method`, `remote_port` |
| `http2_settings` | Caddy fp module | `settings`, `frame_order`, `remote_port` |
| `http3_settings` | Caddy fp module | `settings`, `remote_port` |
| `http_useragent` | Ingester (HTTP events) | `category`, `tool`, `signals` |
| `http_header_quirks` | Ingester (HTTP events) | `headers_ordered` |
| `vnc_client_version` | Ingester (VNC events) | `value` |
Bounties are deduplicated per `(attacker_uuid, fingerprint_type, hash)` so
repeated connections from the same attacker produce one row, not thousands.
repeated connections produce one row, not thousands.
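
The deduplication key can be illustrated in memory as below. In a real deployment this would more plausibly be enforced by a uniqueness constraint in the database; the function and the dict-shaped events are assumptions for this sketch.

```python
def dedupe_bounties(events):
    """Keep the first event per (attacker_uuid, fingerprint_type, hash)."""
    seen = set()
    rows = []
    for event in events:
        key = (event["attacker_uuid"], event["fingerprint_type"], event["hash"])
        if key not in seen:
            seen.add(key)
            rows.append(event)
    return rows
```
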
Non-fingerprint bounty types: `ip_leak`, `spoofed_source`, `artifact`
(captured files and emails), `credential` (harvested secrets).
---
## Enabling inline HTTP fingerprinting
The Caddy fp module is **built into the `http` and `https` decky templates
automatically** — no extra configuration is needed. The module activates
when the template is deployed.
automatically** — no configuration is needed. For HTTP/3, ensure `http/3`
is listed in the service's `http_versions` setting.
For HTTP/3, ensure `http/3` is listed in the service's `http_versions`
setting. Caddy's native h3 stack handles UDP/443; the fp module hooks into
it via the `http3.Settingser` interface.
SSH/Telnet keystroke dynamics require the `behave_shell` feature to be
enabled on the service (see [Service-Personas](Service-Personas)).
---
## Related pages
- [Identity-Resolution](Identity-Resolution) — how fingerprints are
clustered into attacker identities
clustered into attacker identities and campaigns
- [OS-Fingerprint-Spoofing](OS-Fingerprint-Spoofing) — how DECNET spoofs
*its own* OS fingerprint to look like the target OS
- [Security-and-Stealth](Security-and-Stealth) — probe stealth measures
- [Logging-and-Syslog](Logging-and-Syslog) — how fp socket records flow
through syslog_bridge to the collector
- [Service-Personas](Service-Personas) — configuring BEHAVE-SHELL and
session recording per service