Runs the chained identity + campaign clustering pipeline against all
seven fixtures via from_synthetic / from_synthetic_identity adapters
and ratchets every YAML floor to 1.0 — the production clusterer
(and the reference clusterers used in the per-fixture tests) all
score perfectly across ARI / homogeneity / completeness /
singleton_recall on each fixture.
Three substrate fixes surfaced by the ratchet:
- Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved
into cohort_weight to prevent fleet-scarcity false-merges (F1's
shared_wordlist failure mode). Tier weight raised to 1.0 so
shared payload+C2 alone crosses threshold (F5's intended pass).
- Adapter: from_synthetic_identity now reads SyntheticSession
started_at + duration_s for session_windows and per-decky
timestamps (the production-row adapter still uses start_ts/end_ts
when available).
- Fixture data: paused_campaign.yaml's JA3 collided exactly with
vpn_hopping.yaml's (same TLS extension list). The collision
fused two unrelated campaigns under the chained identity layer
in the noise_floor composite. Made paused's JA3 distinct.
Also wires Campaign / CampaignsResponse into models/__init__.py's
__all__ that was missed in the schema commit.
Three new reference clusterers in fixture_harness:
* c2_callback_clusterer — union-find on overlapping C2 callback
sets across an attacker's sessions. Pass-clusterer for fixture 5
where two operators with distinct tooling share a C2 endpoint as
the campaign signal.
* shift_clusterer — deliberately-bad reference that buckets
attackers by majority session-start hour into night/day/swing.
Adversarial reference for fixture 5; proves operational schedule
is NOT a campaign signal.
* composite_signals_clusterer — union-find combining (ja3, hassh)
match OR overlapping C2 callback. Will serve as the pass-
clusterer for fixture 6 (noise_floor) where multiple campaigns
with heterogeneous signal types are scored together.
Also factored a small _union_find helper for the new clusterers
(existing time_window/credential_jaccard left untouched to avoid
mixing refactor with feature work).
Fixture 5 (multi_operator): one campaign, two operators with
distinct UKC roles. Actor A (broker, night shift): Delivery →
Exploitation → Persistence → C2. Actor B (post-ex, day shift):
Discovery → Lateral Movement → Collection → Exfiltration.
Distinct JA3/HASSH/ASN/IPs; shared C2 + payload hash.
Four tests: corpus shape (distinct fingerprints, shared C2,
disjoint shifts), pipeline pass via c2_callback_clusterer,
explicit harness sanity that fingerprint_clusterer cannot resolve
this fixture (documents which signal carries the campaign), and
adversarial shift_clusterer fragmentation.
Phase-handoff edges (the real load-bearing signal per the design
doc) wait for the production clusterer; this fixture will prove
they're needed when it ships.