Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.
* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
with future MITRE ATT&CK tagging so synthetic data and runtime phase
inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
generator emitting truth-labeled SyntheticAttacker / SyntheticSession
records. Validates phase names, warns on unobservable phases, supports
multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
completeness / singleton_recall (no sklearn dep). Decided before any
algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
from the design doc; simplest of the six, exercises the full pipeline
with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
(concurrent phases, multi-actor per phase) deferred post-v1.
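For orientation, a minimal sketch of what two of the pure-Python metrics could look like. This is not the code in tests/clustering/metrics.py: the function signatures are assumptions, and the definition of singleton_recall used here (the fraction of truth singletons that the clusterer also leaves alone in their own cluster) is a guess at the intent, not taken from the design doc. ARI follows the standard pair-counting formula.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Pair-counting ARI between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: nothing to disagree about

    # Pairs that share a cluster in both labelings, in truth, and in pred.
    both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = in_truth * in_pred / total_pairs
    maximum = (in_truth + in_pred) / 2
    if maximum == expected:
        # Only happens when both labelings are all-singletons or both are
        # one big cluster, i.e. the partitions are identical.
        return 1.0
    return (both - expected) / (maximum - expected)


def singleton_recall(truth, pred):
    """ASSUMED definition: share of truth singletons kept as singletons."""
    truth_sizes = Counter(truth)
    pred_sizes = Counter(pred)
    singles = [i for i, label in enumerate(truth) if truth_sizes[label] == 1]
    if not singles:
        return 1.0  # no singletons to recall
    kept = sum(1 for i in singles if pred_sizes[pred[i]] == 1)
    return kept / len(singles)
```

The degenerate branch in adjusted_rand_index matters for an all-singleton fixture like lone_wolf: every pair count is zero there, so the usual formula would divide by zero, and an identity clusterer should still score 1.0.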
# Bounds for fixture 3 (lone_wolf).
#
# Every actor in this fixture is a singleton (the wolf itself, plus
# every background-noise scanner). A correct clusterer puts each in
# its own cluster; that's a perfect score.
#
# Bounds are deliberately loose at first; we ratchet them up as the
# algorithm matures. Loosening any bound to make CI pass requires
# justification in the PR description (per CAMPAIGN_CLUSTERING.md §2).
adjusted_rand_index:
  min: 0.85
homogeneity:
  min: 0.90
completeness:
  min: 0.80
singleton_recall:
  min: 0.95
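A rough sketch of how a test might enforce these bounds. The helper name check_bounds and the shape of the scores dict are hypothetical; the bounds are hard-coded here to keep the example self-contained, whereas a real test would presumably load the expected.yaml file from disk.

```python
# Mirrors the bounds file above. Hard-coded only so the sketch runs
# without a YAML parser; this dict is not part of the commit.
LONE_WOLF_BOUNDS = {
    "adjusted_rand_index": {"min": 0.85},
    "homogeneity": {"min": 0.90},
    "completeness": {"min": 0.80},
    "singleton_recall": {"min": 0.95},
}


def check_bounds(scores, bounds):
    """Return a list of human-readable violations; empty means pass."""
    failures = []
    for metric, limits in bounds.items():
        if metric not in scores:
            # A bound with no matching score is treated as a failure so a
            # silently dropped metric cannot pass CI.
            failures.append(f"{metric}: no score computed")
            continue
        minimum = limits.get("min")
        if minimum is not None and scores[metric] < minimum:
            failures.append(
                f"{metric}: {scores[metric]:.3f} < min {minimum:.3f}"
            )
    return failures
```

Returning every violation, instead of raising on the first one, lets a failing CI run show all regressed metrics at once, which makes any justification for loosening a bound easier to write.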