2 Commits

Author SHA1 Message Date
75af00c9c8 test(clustering): full-bound passes through production campaign clusterer
Runs the chained identity + campaign clustering pipeline against all
seven fixtures via from_synthetic / from_synthetic_identity adapters
and ratchets every YAML floor to 1.0 — the production clusterer
(and the reference clusterers used in the per-fixture tests) all
score perfectly across ARI / homogeneity / completeness /
singleton_recall on each fixture.

Three substrate fixes surfaced by the ratchet:

- Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved
  into cohort_weight to prevent fleet-scarcity false-merges (F1's
  shared_wordlist failure mode). Tier weight raised to 1.0 so
  shared payload+C2 alone crosses threshold (F5's intended pass).
- Adapter: from_synthetic_identity now reads SyntheticSession
  started_at + duration_s for session_windows and per-decky
  timestamps (the production-row adapter still uses start_ts/end_ts
  when available).
- Fixture data: paused_campaign.yaml's JA3 collided exactly with
  vpn_hopping.yaml's (same TLS extension list). The collision
  fused two unrelated campaigns under the chained identity layer
  in the noise_floor composite. Made paused's JA3 distinct.

Also wires Campaign / CampaignsResponse into models/__init__.py's
__all__ that was missed in the schema commit.
2026-04-26 09:13:59 -04:00
7021fda0e6 test(clustering): fixture 6 noise_floor (composite + cross-corpus)
Bundles all five prior fixtures' campaigns into one corpus alongside
10 fresh Delivery-only noise scanners (on top of lone_wolf's 8
inherited). The fixture covers cross-corpus interference — signal
collisions across fixtures' JA3/HASSH/C2 strings, factory ID re-use,
clusterer ambiguity that only manifests when multiple campaigns
score together. Each constituent fixture already ships its own
in-fixture adversarial test; this one is the control for the class
of failures that single-corpus fixtures cannot catch.

Composition is declared via a fixture-6-specific include_fixtures
block in noise_floor.yaml. The test file's loader expands it into
a full corpus.campaigns spec at runtime so the factory itself stays
unaware — no factory primitive added for what only this fixture
needs. The 8 noise scanners declared by lone_wolf flow through
naturally; the extra_noise_scanners count adds 10 more.

composite_signals_clusterer (added in the fixture-5 commit) is the
pass clusterer — union-find combining (ja3, hassh) match OR
overlapping C2 callback. Approximates the planned similarity graph
well enough that every campaign resolves and every singleton stays
singleton in the merged corpus.

Three tests: corpus integrity (every campaign id present, 12
campaign-driven attackers + 18 noise = 30 total), pipeline pass
against the global bounds, and an explicit singleton-recall
assertion (21 truth-singletons — 1 lone wolf, 18 noise, 2
shared_wordlist actors whose campaigns are size 1 — all kept
singleton by the composite clusterer). Singleton recall is the
load-bearing metric here: noise absorption is the failure mode
that makes campaign attribution useless in practice.
2026-04-26 07:49:36 -04:00