Commit Graph

4 Commits

Author SHA1 Message Date
304592abfe test(clustering): fixture 4 paused_campaign + active_days/time_window
Adds the actor.active_days primitive to the campaign factory so a
DSL actor can be bound to specific day indexes. Falls back to the
non-paused day pool when absent (existing fixtures unchanged).
Intersects with pause_windows so the campaign-wide silence still
wins if both are set.

Adds time_window_clusterer reference to fixture_harness — union-find
over attackers, edge if their session time-ranges are within
gap_days of each other. Deliberately-bad reference for fixture 4:
multi-day silent stretches fragment a single campaign because the
clusterer has no signal that bridges the gap.

Fixture 4 (paused_campaign): one campaign modeled as two DSL actors
representing the operator's two operational windows (active days
1-2 and 6-7), separated by a silent stretch (days 3-5). Both share
JA3 + HASSH + payload + C2 callback; only their active_days differ.

Five tests: corpus shape (rows in their windows, shared signals),
pipeline pass via fingerprint_clusterer at level=campaign,
adversarial fragmentation via time_window_clusterer (1-day union
threshold cannot bridge the 4-day silence → completeness collapses),
huge-gap sanity (gap_days=10 unions both halves), silent-stretch
invariant (no session leaks into the configured pause window).

Identity-level scoring is fixture 2's job; this fixture is
campaign-level only — modeling caveat documented in the YAML.
2026-04-26 07:39:46 -04:00
0def6f7e37 test(clustering): fixture 2 vpn_hopping + fingerprint/asn references
One campaign, one DSL actor, ip_pool: rotating + rotation_count: 5
across 5 synthetic private-use ASNs (RFC 6996 64512-64516). Stable
JA3, HASSH, and payload_hash across every rotation — these are the
"signals the attacker can't cheaply rotate" per IDENTITY_RESOLUTION.md
and the load-bearing reason all 5 observation rows must resolve to
one identity / one campaign.

Two new reference clusterers in fixture_harness.py:

* fingerprint_clusterer — groups by (ja3, hassh). Un-fingerprinted
  rows stay singleton so it doesn't trivially fuse all noise into one
  mega-cluster. Approximates the stable-signal arm of the planned
  similarity graph.

* asn_clusterer — deliberately-bad reference for fixture 2's
  adversarial test. Group-by-ASN shatters the campaign into 5
  singletons; completeness collapses to 0.

Four tests in test_vpn_hopping_fixture.py: corpus shape (5 rows, 1
identity, 1 campaign, 5 distinct ASNs/IPs, stable fingerprints),
pass at campaign level, pass at identity level (asserts ARI exactly
1.0), asn_clusterer breaches the completeness floor.
2026-04-26 07:34:18 -04:00
e80f3eec54 test(clustering): fixture 1 (shared_wordlist) + fixture-harness extraction
Two campaigns sharing a credential wordlist; everything else (ASN, IPs,
JA3, HASSH, active hours) divergent. Pass condition: clusterer must NOT
merge. Protects against the "credential overlap is identity" failure
mode that commodity wordlists invite.

* tests/clustering/fixture_harness.py — shared assert_fixture_bounds
  helper + identity_clusterer (placeholder, trivially correct on
  all-singleton fixtures) + credential_jaccard_clusterer (deliberately-
  bad reference used to PROVE the fixture catches what it should).
* tests/clustering/test_shared_wordlist_fixture.py — bounds pass with
  identity, bounds FAIL (homogeneity → 0) with the bad credential
  clusterer. The latter is the proof the fixture earns its keep.
* tests/fixtures/campaigns/shared_wordlist.{yaml,expected.yaml}.
* tests/clustering/test_lone_wolf_fixture.py — refactored onto the
  shared harness. No behavior change.
2026-04-26 06:38:17 -04:00
00254629f8 feat(clustering): UKC phase enum + synthetic campaign factory + metric harness
Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.

* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
  stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
  with future MITRE ATT&CK tagging so synthetic data and runtime phase
  inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
  generator emitting truth-labeled SyntheticAttacker / SyntheticSession
  records. Validates phase names, warns on unobservable phases, supports
  multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
  completeness / singleton_recall (no sklearn dep). Decided before any
  algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
  from the design doc; simplest of the six, exercises the full pipeline
  with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
  (concurrent phases, multi-actor per phase) deferred post-v1.
2026-04-26 06:29:10 -04:00