DECNET/development/HARDENING.md
anti 62a67f3d1d docs(HARDENING): rewrite roadmap based on live scan findings
Phase 1 is complete. Live testing revealed:
- Window size (64240) is already correct — Phase 2 window mangling unnecessary
- TI=Z (IP ID = 0) is the single remaining blocker for Windows spoofing
- ip_no_pmtu_disc does NOT fix TI=Z (tested and confirmed)

Revised phase plan:
- Phase 2: ICMP tuning (icmp_ratelimit + icmp_ratemask sysctls)
- Phase 3: NFQUEUE daemon for IP ID rewriting (fixes TI=Z)
- Phase 4: diminishing returns, not recommended

Added detailed NFQUEUE architecture, TCPOPTSTRIP notes, and
note clarifying P= field in nmap output.
2026-04-10 16:38:27 -04:00

# OS Fingerprint Spoofing — Hardening Roadmap
This document describes the current state of OS fingerprint spoofing in DECNET
and the planned improvements to make `nmap -O`, `p0f`, and similar passive/active
scanners see the intended OS rather than a generic Linux kernel.
---
## Current State (Post-Phase 1)
Phase 1 is **implemented and tested against live scans**. Each archetype declares
an `nmap_os` slug (e.g. `"windows"`, `"linux"`, `"embedded"`). The **composer**
resolves that slug via `os_fingerprint.get_os_sysctls()` and injects the resulting
kernel parameters into the **base container** as Docker `sysctls`. Service
containers inherit the same network namespace via `network_mode: "service:<base>"`
and therefore appear identical to outside scanners.
### Implemented sysctls (8 per OS profile)
| Sysctl | Purpose | Win | Linux | Embedded |
|---|---|---|---|---|
| `net.ipv4.ip_default_ttl` | TTL discriminator | `128` | `64` | `255` |
| `net.ipv4.tcp_syn_retries` | SYN retransmit count | `2` | `6` | `3` |
| `net.ipv4.tcp_timestamps` | TCP timestamp option (OPS probes) | `0` | `1` | `0` |
| `net.ipv4.tcp_window_scaling` | Window scale option | `1` | `1` | `0` |
| `net.ipv4.tcp_sack` | Selective ACK option | `1` | `1` | `0` |
| `net.ipv4.tcp_ecn` | ECN negotiation | `0` | `2` | `0` |
| `net.ipv4.ip_no_pmtu_disc` | DF bit in ICMP replies | `0` | `0` | `1` |
| `net.ipv4.tcp_fin_timeout` | FIN_WAIT_2 timeout (seconds) | `30` | `60` | `15` |
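To make the mapping from table to container concrete, here is a minimal sketch of how the profiles could be stored and injected. The names `OS_SYSCTLS` and `get_os_sysctls()` come from this document, but the dict layout, the fallback to the `"linux"` profile, and the `base_service()` helper are assumptions for illustration, not the actual DECNET code.

```python
from typing import Dict

SYSCTL_KEYS = (
    "net.ipv4.ip_default_ttl",
    "net.ipv4.tcp_syn_retries",
    "net.ipv4.tcp_timestamps",
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_sack",
    "net.ipv4.tcp_ecn",
    "net.ipv4.ip_no_pmtu_disc",
    "net.ipv4.tcp_fin_timeout",
)

# Values copied from the table above, in SYSCTL_KEYS order.
_PROFILE_VALUES = {
    "windows":  ("128", "2", "0", "1", "1", "0", "0", "30"),
    "linux":    ("64", "6", "1", "1", "1", "2", "0", "60"),
    "embedded": ("255", "3", "0", "0", "0", "0", "1", "15"),
}

# Assumed layout: slug -> {sysctl name -> value as string}.
OS_SYSCTLS: Dict[str, Dict[str, str]] = {
    slug: dict(zip(SYSCTL_KEYS, values)) for slug, values in _PROFILE_VALUES.items()
}


def get_os_sysctls(nmap_os: str) -> Dict[str, str]:
    """Return a copy of the profile; unknown slugs fall back to linux (assumption)."""
    return dict(OS_SYSCTLS.get(nmap_os, OS_SYSCTLS["linux"]))


def base_service(image: str, nmap_os: str) -> dict:
    """Hypothetical helper: Compose service entry for a base container."""
    return {
        "image": image,
        "sysctls": get_os_sysctls(nmap_os),
        "cap_add": ["NET_ADMIN"],
    }
```

Service containers would then only need `network_mode: "service:<base>"` to share the namespace these sysctls apply to.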
### Live scan results (Windows decky, 2026-04-10)
**What works:**
| nmap field | Expected | Got | Status |
|---|---|---|---|
| TTL (`T=`) | `80` (128 dec) | `T=80` | ✅ |
| TCP timestamps (`TS=`) | `U` (unsupported) | `TS=U` | ✅ |
| ECN (`CC=`) | `N` | `CC=N` | ✅ |
| TCP window (`W1=`) | `FAF0` (64240) | `W1=FAF0` | ✅ |
| TCP options (`O1=`) | `M5B4NNSNWA` | `O1=M5B4NNSNWA` | ✅ |
| SACK | present | present | ✅ |
| DF bit | `DF=Y` | `DF=Y` | ✅ |
**What fails:**
| nmap field | Expected (Win) | Got | Impact |
|---|---|---|---|
| IP ID (`TI=`) | `I` (incremental) | `Z` (all zeros) | **Critical** — no Windows fingerprint in nmap's DB has `TI=Z`. This alone causes 91% confidence "Linux 2.4/2.6 embedded" |
| ICMP rate limiting | unlimited | Linux default rate | Minor — affects `IE`/`U1` probe groups |
**Key finding:** `TI=Z` is the **single remaining blocker** for a convincing
Windows fingerprint. Everything else (TTL, window, timestamps, ECN, SACK, DF)
is already correct. The Phase 2 window mangling originally planned is
**unnecessary** — the kernel already produces the correct 64240 value.
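The values in these tables are hexadecimal as printed by nmap, which is easy to misread. The helper below is an illustrative sketch (the function names are hypothetical) that decodes the numeric fields and the option string, assuming nmap's usual letter codes: `M` = MSS, `N` = NOP, `S` = SACK permitted, `W` = window scale, `T` = timestamp, `L` = end of options.

```python
import re
from typing import List, Optional, Tuple, Union

OptionValue = Optional[Union[int, str]]

_NAMES = {
    "M": "MSS",
    "W": "WSCALE",
    "T": "TIMESTAMP",
    "N": "NOP",
    "S": "SACK_PERMITTED",
    "L": "EOL",
}


def decode_hex_field(value: str) -> int:
    """Decode a hex field such as T=80 or W1=FAF0 into an integer."""
    return int(value, 16)


def decode_tcp_options(opts: str) -> List[Tuple[str, OptionValue]]:
    """Split an nmap option string such as 'M5B4NNSNWA' into (name, value) pairs."""
    decoded: List[Tuple[str, OptionValue]] = []
    for token in re.findall(r"[MWT][0-9A-F]*|[NSL]", opts):
        kind, arg = token[0], token[1:]
        value: OptionValue
        if kind in "MW":
            value = int(arg, 16) if arg else None  # hex argument
        elif kind == "T":
            value = arg or None  # two flag characters, kept as text
        else:
            value = None
        decoded.append((_NAMES[kind], value))
    return decoded
```

For the scan above, `decode_hex_field("80")` gives 128, `decode_hex_field("FAF0")` gives 64240, and the option string starts with an MSS of 1460.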
---
## Remaining Improvement Phases
### Phase 2 — ICMP Tuning via Sysctls (Low effort, Medium impact)
Two additional namespace-scoped sysctls control ICMP error rate limiting.
nmap's `IE` and `U1` probe groups measure how quickly the target responds to
ICMP and UDP-to-closed-port probes.
**Changes required:** add to `OS_SYSCTLS` in `decnet/os_fingerprint.py`.
| Sysctl | What it controls | Windows | Linux | Embedded |
|---|---|---|---|---|
| `net.ipv4.icmp_ratelimit` | Minimum ms between ICMP error messages | `0` (none) | `1000` (1/sec) | `1000` |
| `net.ipv4.icmp_ratemask` | Bitmask of ICMP types subject to rate limiting | `0` | `6168` | `6168` |
**Why:** Windows does not rate-limit ICMP error responses. Linux defaults to
1000ms between ICMP errors (effectively 1 per second per destination). When
nmap sends rapid-fire UDP probes to closed ports, a Windows machine replies to
all of them instantly while a Linux machine throttles responses. Setting
`icmp_ratelimit=0` for Windows makes the `U1` probe response timing match.
**Estimated effort:** 15 min — same pattern as Phase 1, just two more entries.
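As a sketch of what the two new entries could look like, the snippet below lists the Phase 2 values per profile and decodes `icmp_ratemask` into the ICMP types it covers. The `ICMP_SYSCTLS` name and the idea of merging it into the existing profiles are assumptions; the type names follow the standard ICMP type numbers.

```python
from typing import Dict, List

# Phase 2 values from the table above (dict name is hypothetical).
ICMP_SYSCTLS: Dict[str, Dict[str, str]] = {
    "windows":  {"net.ipv4.icmp_ratelimit": "0",    "net.ipv4.icmp_ratemask": "0"},
    "linux":    {"net.ipv4.icmp_ratelimit": "1000", "net.ipv4.icmp_ratemask": "6168"},
    "embedded": {"net.ipv4.icmp_ratelimit": "1000", "net.ipv4.icmp_ratemask": "6168"},
}

ICMP_TYPE_NAMES = {
    0: "echo-reply",
    3: "destination-unreachable",
    4: "source-quench",
    5: "redirect",
    8: "echo-request",
    11: "time-exceeded",
    12: "parameter-problem",
}


def ratemask_types(mask: int) -> List[int]:
    """Return the ICMP type numbers whose bit is set in icmp_ratemask."""
    return [icmp_type for icmp_type in range(32) if mask & (1 << icmp_type)]


def describe_ratemask(mask: int) -> List[str]:
    """Human-readable names for the rate-limited ICMP types."""
    return [ICMP_TYPE_NAMES.get(t, f"type-{t}") for t in ratemask_types(mask)]
```

The Linux default of `6168` (`0x1818`) covers types 3, 4, 11 and 12, so the destination-unreachable replies that the `U1` probe elicits are among the throttled types; a mask of `0` rate-limits nothing.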
---
### Phase 3 — NFQUEUE IP ID Rewriting (Medium effort, Very high impact)
This is the **highest-priority remaining item** and the only way to fix `TI=Z`.
#### Root cause of `TI=Z`
The Linux kernel's `ip_select_ident()` logic sets the IP Identification
field to `0` for TCP packets sent with DF=1 (don't-fragment bit set) outside an
established socket, which includes the SYN-ACK replies that nmap's `TI` test
samples. This is correct behavior per RFC 6864 ("IP ID is meaningless when
DF=1") but no Windows fingerprint in nmap's database has `TI=Z`. **No
namespace-scoped sysctl can change this**; it is hardcoded in the kernel's TCP stack.
Note: `ip_no_pmtu_disc` does NOT fix this. That sysctl controls Path MTU
Discovery for UDP/ICMP paths only, not TCP IP ID generation. Setting it to 1
for Windows was tested and confirmed to have no effect on `TI=Z`.
#### Solution: NFQUEUE userspace packet rewriting
Use `iptables -t mangle` to send outgoing TCP packets to an NFQUEUE, where a
small Python daemon rewrites the IP ID field before release.
```
                 ┌──────────────────────────┐
TCP SYN-ACK ───► │ iptables mangle/OUTPUT   │
                 │ -j NFQUEUE --queue-num 0 │
                 └───────────┬──────────────┘
                             ▼
                 ┌──────────────────────────┐
                 │ Python NFQUEUE daemon    │
                 │ 1. Read IP ID field      │
                 │ 2. Replace with target   │
                 │    pattern (sequential   │
                 │    for Windows, zero     │
                 │    for embedded, etc.)   │
                 │ 3. Recalculate checksum  │
                 │ 4. Accept packet         │
                 └───────────┬──────────────┘
                             ▼
                      Packet goes out
```
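Steps 1 to 4 of the diagram can be sketched as follows. This is a minimal illustration, not the planned `nfq_spoofer.py`: the function names are hypothetical, the rewrite itself uses only the standard library, and `run_queue()` assumes the third-party `netfilterqueue` package with its `bind()`, `run()`, `get_payload()`, `set_payload()` and `accept()` calls. Only the IPv4 header checksum needs recalculating, because the TCP checksum does not cover the IP ID field.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over an IPv4 header (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_ip_id(packet: bytes, new_id: int) -> bytes:
    """Return a copy of an IPv4 packet with a new ID and a fresh header checksum."""
    ihl = (packet[0] & 0x0F) * 4                 # header length in bytes
    header = bytearray(packet[:ihl])
    struct.pack_into("!H", header, 4, new_id & 0xFFFF)   # Identification field
    struct.pack_into("!H", header, 10, 0)                # zero checksum before summing
    struct.pack_into("!H", header, 10, ipv4_checksum(bytes(header)))
    return bytes(header) + packet[ihl:]


def run_queue(queue_num: int = 0) -> None:
    """Bind to an NFQUEUE and stamp sequential IDs (needs root and netfilterqueue)."""
    from netfilterqueue import NetfilterQueue  # third-party, imported lazily

    counter = 0

    def handle(pkt) -> None:
        nonlocal counter
        counter = (counter + 1) & 0xFFFF
        pkt.set_payload(rewrite_ip_id(pkt.get_payload(), counter))
        pkt.accept()

    nfq = NetfilterQueue()
    nfq.bind(queue_num, handle)
    try:
        nfq.run()
    finally:
        nfq.unbind()
```

Keeping the rewrite as a pure function over bytes means it can be unit-tested without a kernel queue.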
**Target IP ID patterns by OS:**
| OS | nmap label | Pattern | Implementation |
|---|---|---|---|
| Windows | `TI=I` | Sequential, incrementing by 1 per packet | Global atomic counter |
| Linux 3.x+ | `TI=Z` | Zero (DF=1) or randomized | Leave untouched (already correct) |
| Embedded/Cisco | `TI=I` or `TI=Z` | Varies by device | Sequential or zero |
| BSD | `TI=RI` | Randomized incremental | Counter + small random delta |
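The patterns in the table could be modelled as simple ID generators, sketched below. The generator names and the `make_id_source()` selector are hypothetical, as are the pattern strings it accepts. The delta bounds for the randomized variant are deliberately left to the caller, since the step size needed for nmap to report a given label would have to be tuned against live scans.

```python
import random
from typing import Iterator, Optional


def sequential_ids(start: int = 1) -> Iterator[int]:
    """Increment by 1 per packet, wrapping at 16 bits (Windows-style row)."""
    value = start & 0xFFFF
    while True:
        yield value
        value = (value + 1) & 0xFFFF


def zero_ids() -> Iterator[int]:
    """Always 0, matching what the Linux kernel already emits with DF=1."""
    while True:
        yield 0


def random_incremental_ids(
    start: int,
    min_delta: int,
    max_delta: int,
    rng: Optional[random.Random] = None,
) -> Iterator[int]:
    """Counter plus a random positive delta per packet (untuned sketch)."""
    rng = rng or random.Random()
    value = start & 0xFFFF
    while True:
        yield value
        value = (value + rng.randint(min_delta, max_delta)) & 0xFFFF


def make_id_source(pattern: str) -> Iterator[int]:
    """Hypothetical selector keyed by an `ip_id_pattern` value."""
    if pattern == "sequential":
        return sequential_ids()
    if pattern == "zero":
        return zero_ids()
    raise ValueError(f"unknown ip_id_pattern: {pattern!r}")
```

A single generator is enough when one daemon processes packets in one thread; a locked or atomic counter only becomes necessary if several workers share the sequence.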
**Two possible approaches:**
1. **TCPOPTSTRIP + NFQUEUE (comprehensive)**
   - `TCPOPTSTRIP` can strip TCP options (window scale, SACK, etc.) by
     replacing them with NOPs, via pure iptables rules, no userspace needed
   - `NFQUEUE` handles IP-layer rewriting (IP ID) in userspace
   - Combined: full control over the TCP/IP fingerprint
2. **NFQUEUE only (simpler)**
   - Single Python daemon handles everything: IP ID rewriting, and optionally
     TCP option/window manipulation if ever needed
   - Fewer moving parts, one daemon to monitor
**Required changes:**
- `templates/base/Dockerfile` — new, installs `iptables` + `python3-netfilterqueue`
- `templates/base/entrypoint.sh` — new, sets up iptables rules + launches daemon
- `templates/base/nfq_spoofer.py` — new, the NFQUEUE packet rewriting daemon
- `os_fingerprint.py` — add `ip_id_pattern` field to each OS profile
- `composer.py` — pass `SPOOF_IP_ID` env var + use `templates/base/Dockerfile`
instead of bare distro images for base containers
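For reference, the rule that `entrypoint.sh` would need to install can be expressed as the argument list below. This is a sketch in Python rather than shell, with hypothetical function names; `--queue-bypass` is an optional iptables flag that lets packets pass if no daemon is bound to the queue, and using it here is an assumption rather than part of the plan above.

```python
import subprocess
from typing import List


def nfqueue_rule(queue_num: int = 0, bypass: bool = True) -> List[str]:
    """Build the iptables command that diverts outgoing TCP to an NFQUEUE."""
    argv = [
        "iptables", "-t", "mangle", "-A", "OUTPUT",
        "-p", "tcp",
        "-j", "NFQUEUE", "--queue-num", str(queue_num),
    ]
    if bypass:
        # Fail open: without a listener, packets are accepted instead of dropped.
        argv.append("--queue-bypass")
    return argv


def install_rule(queue_num: int = 0) -> None:
    """Apply the rule inside the container (needs NET_ADMIN)."""
    subprocess.run(nfqueue_rule(queue_num), check=True)
```

Failing open matters for a decoy host: if the daemon crashes, the container keeps answering, only with the unmodified `TI=Z` behaviour.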
**Dependencies on the host kernel:**
- `nfnetlink_queue` module (`modprobe nfnetlink_queue`)
- `xt_NFQUEUE` module (shipped with most distro kernels)
- `NET_ADMIN` capability (already granted)
**Dependencies in the base container image:**
- `iptables` package
- `python3` + `python3-netfilterqueue` (or `scapy` with `NetfilterQueue`)
**Estimated effort:** 4-6 hours + tests
---
### Phase 4 — Full Fingerprint Database Matching (Hard, Low marginal impact)
After Phases 2-3, the remaining fingerprint differences are increasingly minor:
| Signal | Current | Notes |
|---|---|---|
| TCP initial sequence number (ISN) pattern (`SP=`, `ISR=`) | Linux kernel default | Kernel-level, not spoofable without userspace TCP |
| TCP window variance across probes | Constant (`FAF0` × 6) | Real Windows sometimes varies slightly |
| T2/T3 responses | `R=N` (no response) | Correct for some Windows, wrong for others |
| ICMP data payload echo | Linux default | Difficult to control per-namespace |
These are diminishing returns. With Phases 1-3 complete, `nmap -O` should
correctly identify the OS family in >90% of scans.
> Phase 4 is **not recommended** for the near term. Effort is measured in days
> for single-digit percentage improvements.
---
## Implementation Priority (revised)
```
Phase 1 ✅ DONE ─────────────────────────────
└─ 8 sysctls per OS in os_fingerprint.py
└─ Verified: TTL, window, timestamps, ECN, SACK all correct
Phase 2 ──────────────────────────────── (implement next)
└─ 2 more sysctls: icmp_ratelimit + icmp_ratemask
└─ Estimated effort: 15 min
Phase 3 ──────────────────────────────── (high priority)
└─ NFQUEUE daemon in templates/base/
└─ Fix TI=Z for Windows (THE remaining blocker)
└─ Estimated effort: 4-6 hours + tests
Phase 4 ──────────────────────────────── (not recommended)
└─ ISN pattern, T2/T3, ICMP payload echo
└─ Estimated effort: days, diminishing returns
```
---
## Testing Strategy
After each phase, validate with:
```bash
# Active OS fingerprint scan against a deployed decky
sudo nmap -O --osscan-guess <decky_ip>
# Aggressive scan with version detection
sudo nmap -sV -O -A --osscan-guess <decky_ip>
# Passive fingerprinting (run on host while generating traffic to decky)
sudo p0f -i <macvlan_interface> -p
# Quick TTL + window check
hping3 -S -p 445 <decky_ip> # inspect TTL and window in reply
# Test INI (all OS families, 10 deckies)
sudo .venv/bin/decnet deploy --config arche-test.ini --interface eth0
```
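To turn the manual nmap check into a repeatable one, the fingerprint block that nmap prints can be parsed and compared with the expected Windows values from this document. The sketch below is illustrative: the function names and the `EXPECTED_WINDOWS` table are hypothetical, and the parser assumes the usual `NAME(K=V%K=V)` layout with optional `OS:` line prefixes.

```python
import re
from typing import Dict, List, Tuple

# Expected values taken from the live-scan tables in this document.
EXPECTED_WINDOWS: Dict[Tuple[str, str], str] = {
    ("SEQ", "TI"): "I",
    ("SEQ", "TS"): "U",
    ("WIN", "W1"): "FAF0",
    ("ECN", "CC"): "N",
}


def parse_fingerprint(text: str) -> Dict[str, Dict[str, str]]:
    """Parse 'SEQ(TI=Z%TS=U)WIN(W1=FAF0)...' into {test: {field: value}}."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        # nmap wraps the fingerprint and prefixes each wrapped line with "OS:".
        parts.append(line[3:] if line.startswith("OS:") else line)
    joined = "".join(parts)
    result: Dict[str, Dict[str, str]] = {}
    for name, body in re.findall(r"([A-Z][A-Z0-9]*)\(([^)]*)\)", joined):
        result[name] = dict(kv.split("=", 1) for kv in body.split("%") if "=" in kv)
    return result


def check_fingerprint(
    fp: Dict[str, Dict[str, str]],
    expected: Dict[Tuple[str, str], str],
) -> List[str]:
    """Return one message per field that differs from the expected value."""
    failures = []
    for (test, field), want in expected.items():
        got = fp.get(test, {}).get(field)
        if got != want:
            failures.append(f"{test}.{field}: expected {want}, got {got}")
    return failures
```

Run against a post-Phase 1 Windows decky, this would be expected to report only the `SEQ.TI` mismatch until Phase 3 lands.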
### Expected outcomes by phase
| Check | Pre-Phase 1 | Post-Phase 1 ✅ | Post-Phase 2 | Post-Phase 3 |
|---|---|---|---|---|
| TTL | ✅ | ✅ | ✅ | ✅ |
| TCP timestamps | ❌ | ✅ | ✅ | ✅ |
| TCP window size | ❌ | ✅ (kernel default OK) | ✅ | ✅ |
| ECN | ❌ | ✅ | ✅ | ✅ |
| ICMP rate limiting | ❌ | ❌ | ✅ | ✅ |
| IP ID sequence (`TI=`) | ❌ | ❌ | ❌ | ✅ |
| `nmap -O` family match | ⚠️ | ⚠️ (TI=Z blocks) | ⚠️ | ✅ |
| `p0f` match | ⚠️ | ⚠️ | ✅ | ✅ |
### Note on `P=` field in nmap output
The `P=x86_64-redhat-linux-gnu` that appears in the `SCAN(...)` block is the
**GNU build triple of the nmap binary itself**, not a fingerprint of the target.
It cannot be changed and is not relevant to OS spoofing.