3 Commits

Author SHA1 Message Date
304592abfe test(clustering): fixture 4 paused_campaign + active_days/time_window
Adds the actor.active_days primitive to the campaign factory so a
DSL actor can be bound to specific day indexes. Falls back to the
non-paused day pool when absent (existing fixtures unchanged).
Intersects with pause_windows so the campaign-wide silence still
wins if both are set.

Adds time_window_clusterer reference to fixture_harness — union-find
over attackers, edge if their session time-ranges are within
gap_days of each other. Deliberately-bad reference for fixture 4:
multi-day silent stretches fragment a single campaign because the
clusterer has no signal that bridges the gap.

Fixture 4 (paused_campaign): one campaign modeled as two DSL actors
representing the operator's two operational windows (active days
1-2 and 6-7), separated by a silent stretch (days 3-5). Both share
JA3 + HASSH + payload + C2 callback; only their active_days differ.

Five tests: corpus shape (rows in their windows, shared signals),
pipeline pass via fingerprint_clusterer at level=campaign,
adversarial fragmentation via time_window_clusterer (1-day union
threshold cannot bridge the 4-day silence → completeness collapses),
huge-gap sanity (gap_days=10 unions both halves), silent-stretch
invariant (no session leaks into the configured pause window).

Identity-level scoring is fixture 2's job; this fixture is
campaign-level only — modeling caveat documented in the YAML.
2026-04-26 07:39:46 -04:00
f6b83755eb test(clustering): factory honors ip_pool: rotating + 3-level truth labels
Fifth and final commit of the identity-resolution substrate. Unblocks
fixture 2 (vpn_hopping) by making the synthetic factory match
production shape: an actor rotating across N IPs produces N
SyntheticAttacker rows that share fingerprints + truth_identity_id but
differ on ip / asn — exactly the shape the future clusterer needs to
recover via JA3/HASSH match.

Factory:
* SyntheticSession + SyntheticAttacker gain truth_identity_id field.
* DSL: ip_pool: rotating + rotation_count: N produces N observation
  rows per actor. Optional rotation_asns: [...] cycles ASN per row;
  defaults to the actor's primary asn.
* Sessions distribute round-robin across the actor's rotated rows.
* Noise scanners get truth_identity_id == truth_actor_id ==
  truth_campaign_id (each is its own singleton at every level).
* GeneratedCorpus.truth_labels(level=) accepts "campaign" (default,
  back-compat), "identity", or "actor" — picks the oracle the
  metric harness scores against.

Harness:
* assert_fixture_bounds gains truth_level kwarg (default "campaign")
  so identity-resolution fixtures can score against truth_identity_id
  without churning the campaign-clustering test files.

Tests: 9 new (rotation_count emits N rows, shared identity +
fingerprints, distinct IPs, rotation_asns distribution + cycling,
round-robin session distribution, identity-level truth labels,
sticky default unchanged, sessions inherit identity label).
598 tests green across clustering / factories / db / web / bus /
profiler / correlation.
2026-04-26 07:19:39 -04:00
00254629f8 feat(clustering): UKC phase enum + synthetic campaign factory + metric harness
Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.

* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
  stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
  with future MITRE ATT&CK tagging so synthetic data and runtime phase
  inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
  generator emitting truth-labeled SyntheticAttacker / SyntheticSession
  records. Validates phase names, warns on unobservable phases, supports
  multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
  completeness / singleton_recall (no sklearn dep). Decided before any
  algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
  from the design doc; simplest of the six, exercises the full pipeline
  with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
  (concurrent phases, multi-actor per phase) deferred post-v1.
2026-04-26 06:29:10 -04:00