DECNET/decnet/clustering/impl/connected_components.py
anti e545f7d8d3 feat(clustering): identity clusterer worker skeleton
Adds the decnet clusterer master-only command + provider-subpackage
shape (base.py + factory.py + impl/connected_components.py) so
subsequent commits can land similarity-graph features without
churning callers.

The skeleton ConnectedComponentsClusterer.tick is a no-op; the
worker shell is fully wired (bus consumer on attacker.observed +
attacker.scored, slow-tick fallback, health heartbeat, control
listener, ClusterResult fan-out to identity.formed/observation.linked
/merged). Subscribers on identity.> see no events from this clusterer
until edge functions land, but the lifecycle is in place.
2026-04-26 08:09:11 -04:00


"""Connected-components identity clusterer (v1).

Builds a similarity graph over observations (per-IP attacker rows),
runs connected-components over edges that pass a confidence threshold,
and writes one ``attacker_identities`` row per component.

This module is the **skeleton**. The ``tick`` method is a no-op until
the similarity-graph features land in subsequent commits. Subscribers
on ``identity.>`` see no traffic from this clusterer until the edge
functions are wired in.

Subsequent commits add, in order:

1. Similarity-graph scaffolding (``impl/similarity.py``).
2. High-weight edges (JA3/JA4/HASSH/payload/C2 exact match).
3. Medium-weight edges (command-sequence Jaccard bucketed by UKC phase).
4. Phase-handoff edges (designed for fixture 5).
5. Low-weight edges (credential Jaccard, ASN); these must NOT cluster F1/F2 alone.
6. Revocable merges (``identity.merged`` / ``identity.unmerged``).

Edges MUST stay time-agnostic: fixture 7 proves recency-decay clustering
fragments multi-month APT campaigns.
"""

from __future__ import annotations

from decnet.clustering.base import Clusterer, ClusterResult
from decnet.logging import get_logger
from decnet.web.db.repository import BaseRepository

log = get_logger("clustering.connected_components")


class ConnectedComponentsClusterer(Clusterer):
    """Connected-components clusterer.

    Skeleton implementation: ``tick`` is a no-op. Wiring lands in
    subsequent commits.
    """

    name = "connected_components"

    async def tick(self, repo: BaseRepository) -> ClusterResult:
        # No similarity edges defined yet; produce an empty result.
        # Subsequent commits replace this with the real pass.
        return ClusterResult()


__all__ = ["ConnectedComponentsClusterer"]
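
The module docstring describes the eventual pass as connected components over edges that clear a confidence threshold. The sketch below illustrates that idea with a small union-find; it is not the project's implementation, and nothing in it comes from the decnet codebase. The function name `connected_components`, the `(a, b, confidence)` edge tuples, and the `0.5` threshold are assumptions made purely for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Iterable


def connected_components(
    nodes: Iterable[Hashable],
    edges: Iterable[tuple[Hashable, Hashable, float]],
    threshold: float,
) -> list[set[Hashable]]:
    """Group nodes joined by edges whose confidence is >= threshold.

    Hypothetical helper: the signature and edge shape are assumptions.
    """
    # Every node starts as the root of its own single-element set.
    parent: dict[Hashable, Hashable] = {n: n for n in nodes}

    def find(x: Hashable) -> Hashable:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, confidence in edges:
        # Tolerate edges that mention nodes missing from `nodes`.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if confidence < threshold:
            continue  # Weak edges never merge components.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[Hashable, set[Hashable]] = defaultdict(set)
    for n in parent:
        groups[find(n)].add(n)
    return list(groups.values())


# Four per-IP observations; only two edges clear the assumed threshold.
observations = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
edges = [
    ("10.0.0.1", "10.0.0.2", 0.9),  # strong: merged
    ("10.0.0.2", "10.0.0.3", 0.2),  # weak: ignored
    ("10.0.0.3", "10.0.0.4", 0.8),  # strong: merged
]
components = connected_components(observations, edges, threshold=0.5)
```

In this sketch the edge weights carry no timestamp, so the grouping is time-agnostic, in keeping with the constraint stated in the module docstring. Each resulting component would correspond to one identity row in the design the docstring outlines.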