feat(clustering): UKC phase enum + synthetic campaign factory + metric harness

Pre-implementation scaffolding for campaign clustering. The simulator is
the spec: algorithm code follows once fixtures + metrics are stable.

* decnet/clustering/ukc.py: UKCPhase enum (19 phases across In/Through/Out
  stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
  with future MITRE ATT&CK tagging so synthetic data and runtime phase
  inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py: YAML DSL parser + deterministic
  generator emitting truth-labeled SyntheticAttacker / SyntheticSession
  records. Validates phase names, warns on unobservable phases, supports
  multi-campaign + noise corpora.
* tests/clustering/metrics.py: pure-Python ARI / homogeneity /
  completeness / singleton_recall (no sklearn dep). The metrics were
  chosen before any algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml}: fixture 3
  from the design doc; simplest of the six, exercises the full pipeline
  with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md: design spec for the feature.
* development/DEVELOPMENT_V2.md: note on DSL evolution path
  (concurrent phases, multi-actor per phase) deferred post-v1.
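
For readers who want a feel for what a sklearn-free metric harness involves, here is a minimal sketch of two of the four metrics. It is not the contents of tests/clustering/metrics.py: the function signatures are assumptions, and the definition of singleton_recall (the share of true singletons that also end up alone in the predicted clustering) is a guess at the intent rather than something this commit states.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: nothing to disagree about
    # Pair counts from the contingency table and from its two margins.
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_truth * same_pred / total_pairs
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:
        # Degenerate case: both labelings are all-singletons, or both are a
        # single cluster. They agree exactly, so score them as perfect.
        return 1.0
    return (same_both - expected) / (maximum - expected)


def singleton_recall(truth, pred):
    """Assumed definition: fraction of true singletons kept as singletons."""
    truth_sizes = Counter(truth)
    pred_sizes = Counter(pred)
    singles = [i for i, label in enumerate(truth) if truth_sizes[label] == 1]
    if not singles:
        return 1.0  # assumption: vacuously perfect when no singletons exist
    kept = sum(1 for i in singles if pred_sizes[pred[i]] == 1)
    return kept / len(singles)
```

The degenerate branch matters for lone_wolf in particular: when every actor is a singleton in both labelings, the usual ARI formula divides zero by zero, so an identity clusterer would otherwise fail to score a perfect 1.0.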
17
tests/fixtures/campaigns/lone_wolf.expected.yaml
vendored
Normal file
@@ -0,0 +1,17 @@
# Bounds for fixture 3 (lone_wolf).
#
# Every actor in this fixture is a singleton (the wolf itself, plus
# every background-noise scanner). A correct clusterer puts each in
# its own cluster; that's a perfect score.
#
# Bounds are deliberately loose at first; we ratchet them up as the
# algorithm matures. Loosening any bound to make CI pass requires
# justification in the PR description (per CAMPAIGN_CLUSTERING.md §2).
adjusted_rand_index:
  min: 0.85
homogeneity:
  min: 0.90
completeness:
  min: 0.80
singleton_recall:
  min: 0.95
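
The bounds file above implies a simple check: every listed metric must be reported and must meet its `min`. A minimal sketch of that check follows. The `check_bounds` helper and the `EXPECTED_BOUNDS` dict are hypothetical names, not part of this commit; the dict mirrors what a YAML loader would plausibly return for lone_wolf.expected.yaml, inlined here so the example has no dependencies.

```python
# Hypothetical in-memory form of lone_wolf.expected.yaml after parsing.
EXPECTED_BOUNDS = {
    "adjusted_rand_index": {"min": 0.85},
    "homogeneity": {"min": 0.90},
    "completeness": {"min": 0.80},
    "singleton_recall": {"min": 0.95},
}


def check_bounds(scores, bounds):
    """Return a list of human-readable failures; empty means all bounds hold."""
    failures = []
    for metric, limits in bounds.items():
        if metric not in scores:
            # A missing score is treated as a failure, not silently skipped.
            failures.append(f"{metric}: no score reported")
            continue
        if scores[metric] < limits["min"]:
            failures.append(
                f"{metric}: {scores[metric]:.3f} is below min {limits['min']:.2f}"
            )
    return failures


# An identity clusterer on an all-singleton fixture should score perfectly.
perfect = {name: 1.0 for name in EXPECTED_BOUNDS}
problems = check_bounds(perfect, EXPECTED_BOUNDS)
```

Returning the full list of failures rather than stopping at the first one makes it easier to see, in a single CI run, which bounds a change affected.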
32
tests/fixtures/campaigns/lone_wolf.yaml
vendored
Normal file
@@ -0,0 +1,32 @@
# Fixture 3 (lone_wolf): see development/CAMPAIGN_CLUSTERING.md §2.
#
# One opportunistic scanner, Delivery phase only, no follow-up, no shared
# signals with anyone else. Surrounded by background noise. The clusterer
# must keep the wolf and every noise scanner as their own singleton;
# none should be absorbed into anyone else.
#
# This is the simplest of the six fixtures and exists primarily to prove
# the end-to-end pipeline (DSL → factory → clusterer → metrics) before
# we invest in the harder scenarios.
corpus:
  campaigns:
    - campaign:
        id: lone-wolf-001
        actors:
          - id: wolf-a
            asn: 14061
            ip_pool: sticky
            ja3: null
            hassh: null
            hours_active_utc: [3, 4, 5]
            jitter_seconds: 30
        phases:
          - name: delivery
            actor: wolf-a
            target_selector:
              service: any
              count: 1
            dwell_seconds: 1
        duration_days: 1
  noise:
    scanner_count: 8
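
To illustrate what "deterministic generator" can mean for an actor like wolf-a, here is a small sketch that turns `hours_active_utc` and `jitter_seconds` into reproducible session start times. It is not the code in tests/factories/campaign_factory.py. The `session_start_times` function, the `seed` parameter, and the reading of `jitter_seconds` as a uniform offset from the top of each active hour are all assumptions made for illustration.

```python
import random
from datetime import date, datetime, timedelta, timezone

# Actor fields copied from the fixture; the dict shape is an assumption.
WOLF = {"id": "wolf-a", "hours_active_utc": [3, 4, 5], "jitter_seconds": 30}


def session_start_times(actor, day, seed):
    """One start time per active hour; same inputs always give same output."""
    # Seeding a private Random with a string keeps runs reproducible and
    # keeps each actor's stream independent of the global random state.
    rng = random.Random(f"{seed}:{actor['id']}")
    times = []
    for hour in actor["hours_active_utc"]:
        base = datetime(day.year, day.month, day.day, hour, tzinfo=timezone.utc)
        # Assumed semantics: jitter is a uniform offset in [0, jitter_seconds].
        offset = rng.uniform(0, actor["jitter_seconds"])
        times.append(base + timedelta(seconds=offset))
    return times


first = session_start_times(WOLF, date(2024, 1, 1), seed=7)
second = session_start_times(WOLF, date(2024, 1, 1), seed=7)
```

Deriving the seed from both a corpus-level value and the actor id means adding or reordering actors in the YAML does not shift the timestamps generated for the others, which keeps truth labels and expected bounds stable as fixtures evolve.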