test(clustering): fixture 5 multi_operator + c2/shift/composite refs

Three new reference clusterers in fixture_harness:

* c2_callback_clusterer: union-find on overlapping C2 callback sets
  across an attacker's sessions. Pass-clusterer for fixture 5, where
  two operators with distinct tooling share a C2 endpoint and that
  shared endpoint is the campaign signal.
* shift_clusterer: deliberately bad reference that buckets attackers
  by majority session-start hour into night/day/swing. Adversarial
  reference for fixture 5; proves that operational schedule is NOT a
  campaign signal.
* composite_signals_clusterer: union-find that links two attackers on
  a (ja3, hassh) match OR an overlapping C2 callback. Will serve as
  the pass-clusterer for fixture 6 (noise_floor), where multiple
  campaigns with heterogeneous signal types are scored together.

Also factored out a small _union_find helper for the new clusterers
(the existing time_window/credential_jaccard clusterers are left
untouched to avoid mixing refactoring with feature work).

Fixture 5 (multi_operator): one campaign, two operators with distinct
UKC roles.

* Actor A (broker, night shift): Delivery → Exploitation →
  Persistence → C2.
* Actor B (post-ex, day shift): Discovery → Lateral Movement →
  Collection → Exfiltration.

Distinct JA3/HASSH/ASN/IPs; shared C2 + payload hash.

Four tests:

* corpus shape (distinct fingerprints, shared C2, disjoint shifts);
* pipeline pass via c2_callback_clusterer;
* explicit harness sanity check that fingerprint_clusterer cannot
  resolve this fixture (documents which signal carries the campaign);
* adversarial shift_clusterer fragmentation.

Phase-handoff edges (the real load-bearing signal per the design doc)
wait for the production clusterer; this fixture will prove they are
needed when it ships.
2026-04-26 07:46:14 -04:00
parent 304592abfe
commit 27f7de9886
4 changed files with 428 additions and 0 deletions
--- a/tests/fixtures/campaigns/multi_operator.expected.yaml
+++ b/tests/fixtures/campaigns/multi_operator.expected.yaml
@@ -0,0 +1,25 @@
+# Bounds for fixture 5 (multi_operator).
+#
+# Ground truth at the campaign level: 1 campaign of 2 observation
+# rows (one per DSL actor). A correct algorithm scores 1.0 on every
+# metric for this fixture.
+#
+# Completeness is the load-bearing metric: a clusterer that splits
+# the two operators by shift, tooling, or ASN drives completeness
+# to 0 (the one true class is split across two predicted clusters)
+# while homogeneity stays at 1.0. The adversarial shift_clusterer
+# demonstrates this, and the bound below rejects it.
+#
+# Campaign-level fixture only — the two DSL actors model two
+# distinct identities (different tooling, different operators) by
+# design. See the YAML header for the modeling note.
+#
+# Bounds are loose at v1; tighten as the algorithm matures.
+adjusted_rand_index:
+  min: 0.85
+homogeneity:
+  min: 0.90
+completeness:
+  min: 0.80
+singleton_recall:
+  min: 0.95
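To make the bounds above concrete, here is a pure-stdlib sketch that scores the two possible fixture-5 outcomes against them. It assumes the harness uses the standard definitions of homogeneity, completeness, and adjusted Rand index (as in scikit-learn's clustering metrics); the `BOUNDS` dict is copied by hand from the YAML, `passes` is a hypothetical helper, and singleton_recall is left out because its definition is not part of this commit.

```python
import math
from collections import Counter
from math import comb

# Mirrored by hand from multi_operator.expected.yaml (singleton_recall omitted).
BOUNDS = {"adjusted_rand_index": 0.85, "homogeneity": 0.90, "completeness": 0.80}


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """H(A | B) for two parallel label lists."""
    n = len(a)
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bj])
                for (_, bj), c in Counter(zip(a, b)).items())


def homogeneity(truth, pred):
    h = _entropy(truth)
    return 1.0 if h == 0 else 1.0 - _conditional_entropy(truth, pred) / h


def completeness(truth, pred):
    h = _entropy(pred)
    return 1.0 if h == 0 else 1.0 - _conditional_entropy(pred, truth) / h


def adjusted_rand_index(truth, pred):
    total = comb(len(truth), 2)
    index = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions all-singleton or both single-cluster: identical
    return (index - expected) / (max_index - expected)


def passes(truth, pred):
    scores = {
        "adjusted_rand_index": adjusted_rand_index(truth, pred),
        "homogeneity": homogeneity(truth, pred),
        "completeness": completeness(truth, pred),
    }
    return all(scores[name] >= lo for name, lo in BOUNDS.items())


truth = ["campaign", "campaign"]  # fixture 5: one class, two observation rows
merged = ["c0", "c0"]             # e.g. what c2_callback_clusterer should produce
split = ["night", "day"]          # e.g. what shift_clusterer produces
print(passes(truth, merged), passes(truth, split))  # True False
```

The split labelling keeps homogeneity at 1.0 (each predicted cluster is pure) but scores 0 on both completeness and ARI, so it is the completeness and ARI floors, not the homogeneity floor, that reject operator-level fragmentation.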