Adds the four weight-tier edge functions as pure, time-agnostic
scoring primitives over an Observation projection. Each returns a
score in [0, 1]; the connected-components impl will combine + threshold
in subsequent commits.
Tier semantics (from IDENTITY_RESOLUTION.md):
- high — JA3/HASSH/payload-hash/C2-endpoint exact match
- medium — phase-bucketed command-sequence Jaccard
- low — credential-attempt-set Jaccard (defeated alone by F1)
- very low — ASN equality (defeated alone by F2)
Time-agnostic invariant is a static test: Observation has no time
fields, so no edge function can silently start using them. Fixture 7
forbids recency-decay clustering on multi-month APT campaigns.
A from_synthetic() adapter projects SyntheticAttacker corpora into
Observation; the production-row adapter lands when the clusterer
starts reading the attackers table.
Revocable merges (a contradiction-driven undo of identity.merged) ship
in the clusterer work; this reserves the topic up-front so identity.>
subscribers receive it day one without a re-subscribe.
The clusterer worker's ClusterResult fan-out now publishes on
identity.unmerged when populated. The skeleton clusterer never
populates it; the revocable-merge commit will.
Wiki update lives in wiki-checkout/Service-Bus.md (separate repo).
Adds the decnet clusterer master-only command + provider-subpackage
shape (base.py + factory.py + impl/connected_components.py) so
subsequent commits can land similarity-graph features without
churning callers.
The skeleton ConnectedComponentsClusterer.tick is a no-op; the
worker shell is fully wired (bus consumer on attacker.observed +
attacker.scored, slow-tick fallback, health heartbeat, control
listener, ClusterResult fan-out to identity.formed/observation.linked
/merged). Subscribers on identity.> see no events from this clusterer
until edge functions land, but the lifecycle is in place.
Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.
* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
with future MITRE ATT&CK tagging so synthetic data and runtime phase
inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
generator emitting truth-labeled SyntheticAttacker / SyntheticSession
records. Validates phase names, warns on unobservable phases, supports
multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
completeness / singleton_recall (no sklearn dep). Decided before any
algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
from the design doc; simplest of the six, exercises the full pipeline
with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
(concurrent phases, multi-actor per phase) deferred post-v1.