feat(attackers): XFF mismatch detection — attacker IP leak bounties

Attackers routinely front their scanners with VPNs/proxies, so the TCP source we log is the proxy egress, not the real host. But a surprising number of attacker setups are misconfigured: the proxy forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP / CDN-variant) header. From our side that's a free attribution leak.

New _detect_ip_leak extractor in decnet/web/ingester.py fires at ingest time per HTTP request. Logic:

1. Require service=http, source_ip present, headers present.
2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or CIDRs), treat it as legitimate reverse-proxy forwarding and skip.
3. Walk proxy-family headers in priority order: Forwarded (RFC 7239) → X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP.
4. Extract the left-most parseable IP from the winning header.
5. If that IP differs from the TCP source, emit a bounty with bounty_type="ip_leak" carrying {source_ip, real_ip_claim, source_header, headers_seen, path, method}.

Storage is the existing Bounty table, so there is no schema change. De-dup is handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so repeat requests with the same leaked IP don't spam.

AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN listing distinct real_ip_claim values; a hover tooltip shows the source header + path of the most recent leak. The row is only shown when at least one ip_leak bounty exists.

The RFC 7239 Forwarded parser handles the full vocabulary (bare IPv4, IPv4:port, quoted, IPv6 in brackets, IPv6 with port) and returns only IPs that actually parse.

Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For mismatches". Phase 3 of the three-phase Attacker Intelligence series (phase 1, scanned-vs-interacted, and phase 2, PTR records, have already shipped).

The DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's "revisit when verified-proxy config lands" note: it is the same token set that future rate-limit work will consume.
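
The detection steps above can be sketched roughly as follows. This is an illustrative sketch, not the actual _detect_ip_leak from decnet/web/ingester.py: the helper names (parse_ip, leftmost_ip, is_trusted, detect_ip_leak) and the `trusted` parameter are hypothetical, and it assumes the "winning" header is the first one in priority order that yields a parseable IP. It also splits Forwarded elements on plain commas, which is a simplification of RFC 7239 quoting, and returns only a subset of the payload keys.

```python
import ipaddress
from typing import Mapping, Optional

# Priority order from the commit message; compared case-insensitively.
PROXY_HEADERS = (
    "forwarded", "x-forwarded-for", "x-real-ip",
    "true-client-ip", "cf-connecting-ip",
)


def parse_ip(token: str) -> Optional[str]:
    """Normalize 1.2.3.4, 1.2.3.4:80, [v6], "[v6]:port"; None if unparseable."""
    token = token.strip().strip('"')
    if token.startswith("["):        # bracketed IPv6, optional :port after ]
        token = token[1:].split("]", 1)[0]
    elif token.count(":") == 1:      # IPv4:port
        token = token.split(":", 1)[0]
    try:
        return str(ipaddress.ip_address(token))
    except ValueError:
        return None


def leftmost_ip(name: str, value: str) -> Optional[str]:
    """Return the left-most parseable IP in a proxy header value."""
    for element in value.split(","):
        if name == "forwarded":
            # Forwarded elements are ;-separated pairs; only for= is relevant.
            candidates = [
                pair.split("=", 1)[1]
                for pair in element.split(";")
                if pair.strip().lower().startswith("for=")
            ]
        else:
            candidates = [element]
        for candidate in candidates:
            ip = parse_ip(candidate)
            if ip:
                return ip
    return None


def is_trusted(source_ip: str, trusted: str) -> bool:
    """True if source_ip falls in a comma-separated list of IPs or CIDRs."""
    addr = ipaddress.ip_address(source_ip)
    for entry in (e.strip() for e in trusted.split(",")):
        if not entry:
            continue
        try:
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue                 # ignore malformed entries
    return False


def detect_ip_leak(
    source_ip: str, headers: Mapping[str, str], trusted: str = ""
) -> Optional[dict]:
    """Return a partial ip_leak payload, or None when nothing leaked."""
    if not source_ip or not headers:
        return None
    try:
        src = str(ipaddress.ip_address(source_ip))
    except ValueError:
        return None
    if is_trusted(src, trusted):
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in PROXY_HEADERS:
        claim = leftmost_ip(name, lowered.get(name, ""))
        if claim is None:
            continue
        if claim == src:
            return None              # header agrees with the TCP source
        return {"source_ip": src, "real_ip_claim": claim, "source_header": name}
    return None
```

In this sketch the trusted list is passed in explicitly; a caller could supply it from the environment, for example `detect_ip_leak(ip, headers, os.environ.get("DECNET_TRUSTED_PROXIES", ""))`.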
2026-04-24 17:39:03 -04:00
parent 5a34371009
commit 2a0c5ca410
7 changed files with 518 additions and 1 deletions
--- a/decnet/web/db/repository.py
+++ b/decnet/web/db/repository.py
@@ -257,6 +257,15 @@ class BaseRepository(ABC):
        query."""
        raise NotImplementedError

+    async def get_attacker_ip_leaks(
+        self, attacker_uuid: str
+    ) -> list[dict[str, Any]]:
+        """Return ``bounty_type='ip_leak'`` rows for the attacker, newest
+        first. Each row's payload carries the TCP source IP, the header
+        that leaked, and the claimed real IP — see the XFF-mismatch
+        extractor in ``decnet.web.ingester`` for the shape."""
+        raise NotImplementedError
+
    @abstractmethod
    async def get_session_log(self, sid: str) -> Optional[dict[str, Any]]:
        """Look up the `session_recorded` Log row for a given session UUID."""
--- a/decnet/web/db/sqlmodel_repo.py
+++ b/decnet/web/db/sqlmodel_repo.py
@@ -907,6 +907,39 @@ class SQLModelRepository(BaseRepository):
            )
            return [(svc, evt) for svc, evt in rows.all()]

+    async def get_attacker_ip_leaks(
+        self, attacker_uuid: str
+    ) -> list[dict[str, Any]]:
+        """Return ``bounty_type='ip_leak'`` rows for this attacker, newest
+        first.  Shape matches the XFF-mismatch payload emitted by the
+        ingester: keys include ``real_ip_claim``, ``source_header``,
+        ``headers_seen``, ``path``, ``method``."""
+        async with self._session() as session:
+            ip_res = await session.execute(
+                select(Attacker.ip).where(Attacker.uuid == attacker_uuid)
+            )
+            ip = ip_res.scalar_one_or_none()
+            if not ip:
+                return []
+            rows = await session.execute(
+                select(Bounty)
+                .where(Bounty.attacker_ip == ip)
+                .where(Bounty.bounty_type == "ip_leak")
+                .order_by(desc(Bounty.timestamp))
+            )
+            out: list[dict[str, Any]] = []
+            for row in rows.scalars().all():
+                rec = row.model_dump(mode="json")
+                # Bounty.payload is stored JSON-encoded; pre-decode for UX.
+                raw = rec.get("payload")
+                if isinstance(raw, str):
+                    try:
+                        rec["payload"] = json.loads(raw)
+                    except (ValueError, TypeError):
+                        rec["payload"] = {}
+                out.append(rec)
+            return out
+
    async def get_attacker_artifacts(self, uuid: str) -> list[dict[str, Any]]:
        """Return `file_captured` logs for the attacker identified by UUID.