fix(bounties): strip per-request fields from fingerprint payloads

add_bounty dedups on (attacker_ip, bounty_type, full payload JSON).
Three fingerprint-family bounties (http_useragent, ip_leak,
http_quirks) were including method, path, or header_count in their
payloads (fields that vary per request), so a scanner hitting 100
paths produced 100 rows instead of 1, which is what was swelling
AttackerDetail.
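
The row blow-up described above can be reproduced with a minimal sketch. Everything here is hypothetical illustration: `dedup_key`, `count_rows`, and the exact shape of the old payload are assumptions, and the real add_bounty may serialize the payload differently (sorted keys are assumed here).

```python
import json

def dedup_key(attacker_ip, bounty_type, payload):
    # Assumption: the payload is compared as serialized JSON. Sorting keys
    # makes the key independent of dict insertion order.
    return (attacker_ip, bounty_type, json.dumps(payload, sort_keys=True))

def count_rows(requests, build_payload):
    """Count the distinct bounty rows a stream of requests would create."""
    seen = set()
    for req in requests:
        seen.add(dedup_key(req["attacker_ip"], "http_useragent", build_payload(req)))
    return len(seen)

# A scanner hitting 100 paths with a single User-Agent.
requests = [
    {"attacker_ip": "203.0.113.42", "method": "GET",
     "path": f"/p{i}", "user_agent": "curl/8.0"}
    for i in range(100)
]

def old_payload(req):
    # Hypothetical pre-fix shape: per-request fields ride along in the payload.
    return {"fingerprint_type": "http_useragent", "value": req["user_agent"],
            "method": req["method"], "path": req["path"]}

def new_payload(req):
    # Identity-only shape from this commit.
    return {"fingerprint_type": "http_useragent", "value": req["user_agent"]}

rows_before = count_rows(requests, old_payload)
rows_after = count_rows(requests, new_payload)
```

With the per-request fields in place every path yields a new key; with the identity-only payload all 100 requests share one key.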

Payloads now carry identity-only fields:

- http_useragent: {fingerprint_type, value}. The same UA on different
  paths no longer yields separate rows; one row per distinct
  User-Agent string.
- ip_leak: {source_ip, real_ip_claim, source_header, headers_seen}.
  One row per distinct (proxy source, leaked IP, leaking header)
  triple; repeat hits with the same header on different paths dedup.
- http_quirks: {fingerprint_type, order_hash, order, casing_hash,
  casing_category, stable_count, tool_guess}. header_count is gone: it
  counted volatile headers, so Cookie presence alone changed the
  payload and broke dedup.
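
To illustrate why stable_count dedups where header_count did not, here is a rough sketch of a fingerprint built only from non-volatile headers. The `VOLATILE` set, the `stable_quirks` helper, the lowercasing, and the truncated SHA-256 are all assumptions for illustration; the real _http_quirks_fingerprint also emits casing and tool-guess fields and may hash differently.

```python
import hashlib

# Assumed volatile set; the real list lives in the fingerprinting module.
VOLATILE = {"content-length", "cookie", "x-forwarded-for"}

def stable_quirks(headers):
    """Build an order fingerprint from headers that do not vary per request."""
    # dicts preserve insertion order, so this keeps the order headers arrived in.
    order = [name.lower() for name in headers if name.lower() not in VOLATILE]
    digest = hashlib.sha256(",".join(order).encode()).hexdigest()[:12]
    return {"order": order, "order_hash": digest, "stable_count": len(order)}

# Same client stack, once without and once with a Cookie header.
a = stable_quirks({"Host": "x", "User-Agent": "a", "Content-Length": "100"})
b = stable_quirks({"Host": "x", "User-Agent": "a",
                   "Content-Length": "7", "Cookie": "s=1"})
only_volatile = stable_quirks({"Content-Length": "10", "Cookie": "a=b"})
```

A total header count would have been 3 for the first request and 4 for the second, which is enough to split them into two rows; the stable-only payload is identical for both.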

Per-request context (path, method, etc.) was never load-bearing for
analysts — the logs table already answers "when + where" at
per-event resolution. The bounty table is for stable identity.

UI:
- FpHttpQuirks renderer drops the method/path footer line and the
  header_count/duplicates tags; shows stable_count instead.
- LEAKED-IPs tooltip on AttackerDetail swaps "X on GET /path" for
  "Leaked via X; source 203.0.113.42" — same information, stable.

Tests add a "payload stable across paths and methods" assertion on
http_quirks — locks the contract so a future regression that sneaks
a per-request field back in fails loudly.

Existing duplicate bounty rows don't retroactively collapse.
Dev: `decnet db-reset --i-know-what-im-doing drop-tables` and
restart. Prod: one SQL pass to dedup by (attacker_ip, bounty_type,
payload) — trivial but not automated.
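
A possible shape for the prod cleanup, sketched against an in-memory SQLite database. The table name `bounty`, its columns, and the `PER_REQUEST_KEYS` set are hypothetical; the real schema and database engine may differ. One caveat the sketch accounts for: rows written before this change still carry per-request keys, so an exact-match grouping on payload would not merge them. The sketch therefore normalizes each payload before grouping.

```python
import json
import sqlite3

# Hypothetical key names to strip from pre-fix payloads.
PER_REQUEST_KEYS = {"method", "path", "header_count"}

def normalize(payload_json):
    """Drop per-request keys and re-serialize with a stable key order."""
    payload = json.loads(payload_json)
    stable = {k: v for k, v in payload.items() if k not in PER_REQUEST_KEYS}
    return json.dumps(stable, sort_keys=True)

def dedup_bounties(conn):
    """Keep the lowest-id row per (attacker_ip, bounty_type, stable payload)."""
    rows = conn.execute(
        "SELECT id, attacker_ip, bounty_type, payload FROM bounty ORDER BY id"
    ).fetchall()
    seen = set()
    doomed = []
    for row_id, ip, btype, payload in rows:
        stable = normalize(payload)
        key = (ip, btype, stable)
        if key in seen:
            doomed.append((row_id,))
        else:
            seen.add(key)
            # Rewrite the survivor so it matches what new inserts look like.
            conn.execute("UPDATE bounty SET payload = ? WHERE id = ?",
                         (stable, row_id))
    conn.executemany("DELETE FROM bounty WHERE id = ?", doomed)
    conn.commit()
    return len(doomed)

# Demo data: two rows that differ only by path, plus one distinct UA.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bounty (id INTEGER PRIMARY KEY, attacker_ip TEXT, "
    "bounty_type TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO bounty (attacker_ip, bounty_type, payload) VALUES (?, ?, ?)",
    [
        ("203.0.113.42", "http_useragent", json.dumps(
            {"fingerprint_type": "http_useragent", "value": "curl/8.0", "path": "/a"})),
        ("203.0.113.42", "http_useragent", json.dumps(
            {"fingerprint_type": "http_useragent", "value": "curl/8.0", "path": "/b"})),
        ("203.0.113.42", "http_useragent", json.dumps(
            {"fingerprint_type": "http_useragent", "value": "Wget/1.21", "path": "/a"})),
    ],
)
removed = dedup_bounties(conn)
remaining = conn.execute("SELECT COUNT(*) FROM bounty").fetchone()[0]
```

Take a backup before running anything like this against prod; the delete is not reversible.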
2026-04-24 17:58:54 -04:00
parent dccb410bb3
commit 2c876b4d86
5 changed files with 54 additions and 39 deletions


@@ -268,8 +268,9 @@ class TestGetAttackerDetail:
"source_ip": "203.0.113.42",
"real_ip_claim": "198.51.100.7",
"source_header": "X-Forwarded-For",
"path": "/wp-admin/",
"method": "GET",
"headers_seen": {
"X-Forwarded-For": "198.51.100.7",
},
},
},
]


@@ -80,7 +80,9 @@ def test_different_casing_different_hash():
def test_volatile_headers_excluded_from_hash():
"""Content-Length, Cookie, XFF etc. are per-request; the identity
hash shouldn't depend on them."""
hash must not depend on them, otherwise two requests from the same
stack — one with Cookie, one without — would dedup-miss at the
bounty layer and spam the AttackerDetail page."""
row_a = _log_row({
"Host": "x", "User-Agent": "a", "Content-Length": "100",
})
@@ -90,13 +92,11 @@ def test_volatile_headers_excluded_from_hash():
})
fa = _http_quirks_fingerprint(row_a, row_a["fields"]["headers"])
fb = _http_quirks_fingerprint(row_b, row_b["fields"]["headers"])
assert fa["order_hash"] == fb["order_hash"]
# Count reflects ALL headers (the volatile ones WERE there).
assert fa["header_count"] == 3
assert fb["header_count"] == 4
# Stable count excludes the volatile ones.
# Whole payload must be identical — add_bounty dedups on the full
# serialized payload, so ANY per-request-varying field would spawn
# new rows. This assertion is the contract.
assert fa == fb
assert fa["stable_count"] == 2
assert fb["stable_count"] == 2
# ─── tool guesses ──────────────────────────────────────────────────────────
@@ -148,11 +148,10 @@ def test_empty_headers_skipped():
def test_only_volatile_headers_still_emits():
"""If every header is in the volatile set we still want a fingerprint,
just with empty order — header count alone is still a signal."""
just with empty order — "zero stable headers" is itself a signal."""
row = _log_row({"Content-Length": "10", "Cookie": "a=b"})
f = _http_quirks_fingerprint(row, row["fields"]["headers"])
assert f is not None
assert f["header_count"] == 2
assert f["stable_count"] == 0
assert f["order"] == []
@@ -198,6 +197,27 @@ async def test_extract_bounty_non_http_skips_quirks():
assert payload.get("fingerprint_type") != "http_quirks"
def test_payload_stable_across_paths_and_methods():
"""Two requests from the same stack hitting different paths/methods
must produce byte-identical payloads so (ip, type, payload) dedup
collapses them into one bounty row. If this test breaks, check
whether a per-request field snuck back into _http_quirks_fingerprint."""
headers = {"Host": "target", "User-Agent": "curl/8.0", "Accept": "*/*"}
row_get = {
"decky": "http-01", "service": "http", "attacker_ip": "1.2.3.4",
"event_type": "request",
"fields": {"method": "GET", "path": "/admin", "headers": headers},
}
row_post = {
"decky": "http-01", "service": "http", "attacker_ip": "1.2.3.4",
"event_type": "request",
"fields": {"method": "POST", "path": "/wp-login.php", "headers": headers},
}
fa = _http_quirks_fingerprint(row_get, headers)
fb = _http_quirks_fingerprint(row_post, headers)
assert fa == fb, "payload must not depend on request method/path"
# ─── hash stability across restarts ─────────────────────────────────────────
def test_short_hash_deterministic():


@@ -39,7 +39,10 @@ def test_xff_leftmost_differs_from_source_emits_leak():
assert result["source_ip"] == "203.0.113.42"
assert result["real_ip_claim"] == "198.51.100.7"
assert result["source_header"] == "X-Forwarded-For"
assert result["path"] == "/wp-admin/"
# Identity-only payload — method/path intentionally omitted so the
# bounty dedup collapses repeat hits from the same attacker.
assert "method" not in result
assert "path" not in result
def test_xff_matches_source_no_leak():