feat(prober-cert): roll up fingerprints onto AttackerIdentity

Brings the federation-gossip columns on AttackerIdentity to life —
ja3_hashes, hassh_hashes, and the new tls_cert_sha256 — by projecting
the union of every member observation's fingerprints JSON onto the
identity at clusterer create / link / merge time.

- decnet/profiler/identity_rollup.py: pure extract_fp_summaries()
  reads the production bounty shape (payload.fingerprint_type +
  payload.{ja3,hash,cert_sha256}) and returns deduped+sorted JSON
  list[str] per family, or None when a family has no signal so the
  column stays NULL instead of '[]'.
- BaseRepository.update_identity_fingerprints + SQLModel impl: one
  idempotent write that overwrites the three summary columns and
  bumps updated_at.
- ConnectedComponentsClusterer: after every per-component
  reconciliation (fresh-create OR existing-merge+link), recomputes
  and writes the rollup for the target identity. Wrapped in a
  best-effort helper so a write failure logs but never breaks the
  tick.
- Tests: extract_fp_summaries unit (dedup, sort determinism,
  unknown types ignored, malformed JSON, nested-stringified
  payloads, non-string values); end-to-end clusterer ticks
  populate the columns on create + on later observation links;
  no-fingerprint clusters keep the columns NULL.
This commit is contained in:
2026-04-28 11:28:54 -04:00
parent 9ab43b4ea4
commit 72cc928ebf
6 changed files with 416 additions and 2 deletions

View File

@@ -41,6 +41,7 @@ from decnet.clustering.impl.similarity import (
combined_edge_weight,
)
from decnet.logging import get_logger
from decnet.profiler.identity_rollup import extract_fp_summaries
from decnet.web.db.repository import BaseRepository
log = get_logger("clustering.connected_components")
@@ -217,6 +218,9 @@ class ConnectedComponentsClusterer(Clusterer):
"identity_uuid": identity_uuid,
"observation_uuids": linked,
})
await _roll_up_fingerprints(
repo, identity_uuid, [row_by_id[m] for m in member_ids],
)
continue
# Deterministic winner so two clusterer runs produce the
@@ -250,6 +254,14 @@ class ConnectedComponentsClusterer(Clusterer):
"observation_uuid": obs_id,
})
# Re-roll the winner's fingerprint summary across every
# observation now in this component (including the loser
# side — the merge unifies their evidence even though the
# loser's identity row stays FK'd via merged_into_uuid).
await _roll_up_fingerprints(
repo, winner_uuid, [row_by_id[m] for m in member_ids],
)
# Pass 2 — revocable-merge undo. For each currently-merged-out
# identity, check whether its observations still cluster with
# the winner's. If not, the merge is contradicted by new
@@ -341,6 +353,25 @@ async def _link(
return False
async def _roll_up_fingerprints(
repo: BaseRepository,
identity_uuid: str,
member_rows: list[dict[str, Any]],
) -> None:
"""Project member observations' fingerprint blobs onto the identity's
summary columns. Best-effort: a write failure is logged but never
breaks the clusterer tick — the columns just stay stale until the
next pass."""
summaries = extract_fp_summaries(member_rows)
try:
await repo.update_identity_fingerprints(identity_uuid, **summaries)
except Exception: # noqa: BLE001
log.exception(
"clusterer: failed to roll up fingerprints for identity=%s",
identity_uuid,
)
__all__ = [
"ConnectedComponentsClusterer",
"cluster_observations",