test(clustering): full-bound passes through production campaign clusterer

Runs the chained identity + campaign clustering pipeline against all
seven fixtures via the from_synthetic / from_synthetic_identity adapters
and ratchets every YAML floor to 1.0: the production clusterer and the
reference clusterers used in the per-fixture tests all score perfectly
on ARI / homogeneity / completeness / singleton_recall for each fixture.

Three substrate fixes surfaced by the ratchet:

- Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved
  into cohort_weight to prevent fleet-scarcity false-merges (F1's
  shared_wordlist failure mode). The shared_infra tier weight is raised
  from 0.7 to 1.0 so shared payload+C2 alone crosses threshold (F5's
  intended pass).
- Adapter: from_synthetic_identity now reads SyntheticSession
  started_at + duration_s for session_windows and per-decky timestamps
  (the production-row adapter still uses start_ts/end_ts when
  available).
- Fixture data: paused_campaign.yaml's JA3 collided exactly with
  vpn_hopping.yaml's (same TLS extension list). The collision fused two
  unrelated campaigns under the chained identity layer in the
  noise_floor composite. Made paused's JA3 distinct.

Also wires Campaign / CampaignsResponse into models/__init__.py's
__all__, which was missed in the schema commit.
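
The tuning change can be illustrated with a small standalone sketch. It assumes the combined campaign weight is a tier-weighted sum compared against a threshold of 1.0; neither the combination rule nor the value of CAMPAIGN_EDGE_THRESHOLD appears in this diff, so both are assumptions. `Features` is a hypothetical stand-in for `IdentityFeatures`, and only the shared_infra and cohort tiers are modelled.

```python
from dataclasses import dataclass

# Tier weights as set by this commit.
CAMPAIGN_TIER_WEIGHTS = {
    "phase_handoff": 1.0,
    "shared_infra": 1.0,
    "temporal_overlap": 0.4,
    "cohort": 0.1,
}
# Assumption: the real CAMPAIGN_EDGE_THRESHOLD is not shown in the diff.
ASSUMED_THRESHOLD = 1.0


@dataclass(frozen=True)
class Features:
    """Hypothetical stand-in for IdentityFeatures (only the fields used here)."""
    payload_hashes: frozenset = frozenset()
    c2_endpoints: frozenset = frozenset()
    decky_set: frozenset = frozenset()
    asn_cohort: frozenset = frozenset()


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity; 0.0 when both sides are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shared_infra(a: Features, b: Features) -> float:
    # Payload hashes and C2 endpoints only; decky_set is excluded.
    return jaccard(a.payload_hashes | a.c2_endpoints,
                   b.payload_hashes | b.c2_endpoints)


def cohort(a: Features, b: Features) -> float:
    # ASN cohort and decky set, tagged so the two families cannot collide.
    def tagged(f: Features) -> frozenset:
        return frozenset({("asn", str(x)) for x in f.asn_cohort}
                         | {("decky", x) for x in f.decky_set})
    return jaccard(tagged(a), tagged(b))


def combined(a: Features, b: Features) -> float:
    # Assumption: tiers combine as a weighted sum.
    return (CAMPAIGN_TIER_WEIGHTS["shared_infra"] * shared_infra(a, b)
            + CAMPAIGN_TIER_WEIGHTS["cohort"] * cohort(a, b))


# F5-style pair: identical payload + C2.
f5_a = Features(payload_hashes=frozenset({"h"}), c2_endpoints=frozenset({"c"}))
f5_b = Features(payload_hashes=frozenset({"h"}), c2_endpoints=frozenset({"c"}))
# F1-style pair: same deckies, different ASNs, no payload or C2.
f1_a = Features(decky_set=frozenset({"d1", "d2"}), asn_cohort=frozenset({64512}))
f1_b = Features(decky_set=frozenset({"d1", "d2"}), asn_cohort=frozenset({64513}))
```

Under these assumptions the F5-style pair reaches the threshold on shared_infra alone, while the F1-style pair only fires the cohort tier and stays far below it.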
@@ -185,21 +185,27 @@ def _directed_handoff(
 
 
 def shared_infra_weight(a: IdentityFeatures, b: IdentityFeatures) -> float:
-    """Jaccard over payload-hashes ∪ C2-endpoints ∪ decky-set.
+    """Jaccard over payload-hashes ∪ C2-endpoints.
 
+    Excludes ``decky_set`` deliberately: decky overlap is a *fleet
+    scarcity* artifact (a small fleet means many distinct campaigns
+    hit the same deckies) and would fuse F1's two unrelated campaigns
+    on shared targeting. Payload hashes and C2 endpoints are
+    operational artifacts; distinct campaigns rarely share them.
+
     At identity level this gets vetoed by the fingerprint-disagreement
     rule (``ed32358``); at campaign level it's the *primary* positive
-    signal — distinct identities sharing infra is the canonical co-op
-    pattern. We treat all three sets as one combined alphabet so a
-    single shared payload + C2 + decky add together rather than
-    averaging away a strong signal in one set with weak overlap in
-    another.
+    signal — distinct identities sharing payload + C2 is the canonical
+    co-op pattern (F5 multi_operator).
 
-    Returns Jaccard across the union of the three set families,
+    The decky-overlap signal lives in :func:`cohort_weight` instead
+    where its weak-tier multiplier prevents F1-style false merges.
+
+    Returns Jaccard across the union of the two set families,
     ``0.0`` when both sides are empty.
     """
-    a_set = a.payload_hashes | a.c2_endpoints | a.decky_set
-    b_set = b.payload_hashes | b.c2_endpoints | b.decky_set
+    a_set = a.payload_hashes | a.c2_endpoints
+    b_set = b.payload_hashes | b.c2_endpoints
     if not a_set and not b_set:
         return 0.0
     union = a_set | b_set
@@ -246,12 +252,16 @@ def temporal_overlap_weight(
 
 
 def cohort_weight(a: IdentityFeatures, b: IdentityFeatures) -> float:
-    """ASN-cohort + tooling-cohort weak signal.
+    """ASN-cohort + tooling-cohort + decky-overlap weak signal.
 
-    Jaccard over the union of ASN cohort and tooling cohort. F2's
-    failure mode (one identity rotating across many ASNs) doesn't
-    apply at *campaign* level — but multiple identities cooperating
-    out of the same hosting cohort is plausible co-op evidence.
+    Jaccard over the union of ASN cohort, tooling cohort, and decky
+    set. F2's failure mode (one identity rotating across many ASNs)
+    doesn't apply at *campaign* level — but multiple identities
+    cooperating out of the same hosting cohort is plausible co-op
+    evidence. Decky overlap lives here (not in :func:`shared_infra`)
+    because decky scarcity in a small honeypot fleet would otherwise
+    fuse unrelated campaigns hitting the same SSH targets (F1
+    shared_wordlist).
 
     Weak by design: the combined-weight tier multiplier keeps this
     from crossing threshold alone.
@@ -259,10 +269,12 @@ def cohort_weight(a: IdentityFeatures, b: IdentityFeatures) -> float:
     a_set: frozenset = frozenset(
         {("asn", str(x)) for x in a.asn_cohort}
         | {("tool", x) for x in a.tooling_cohort}
+        | {("decky", x) for x in a.decky_set}
     )
     b_set: frozenset = frozenset(
         {("asn", str(x)) for x in b.asn_cohort}
         | {("tool", x) for x in b.tooling_cohort}
+        | {("decky", x) for x in b.decky_set}
     )
     if not a_set and not b_set:
         return 0.0
@@ -277,20 +289,24 @@ def cohort_weight(a: IdentityFeatures, b: IdentityFeatures) -> float:
 
 #: Tier multipliers for the campaign graph. Tuned so:
 #:
-#: * Phase-handoff alone (1.0 → 1.0) crosses threshold — a clean
+#: * Phase-handoff alone (max 1.0) crosses threshold — a clean
 #: F5-style handoff is sufficient evidence on its own.
-#: * Shared-infra alone (max 1.0) yields 0.7 — strong but not enough
-#: without supporting evidence (F1 burns the same wordlist /
-#: different campaigns shouldn't fuse on infra alone).
+#: * Shared-infra alone (max 1.0) crosses threshold — payload+C2
+#: overlap is the canonical co-op signal (F5 multi_operator's
+#: intended pass condition; decky overlap was deliberately moved
+#: to :func:`cohort_weight` to avoid F1's false merge on shared
+#: targeting).
 #: * Temporal overlap alone (max 1.0) yields 0.4 — supporting weight.
-#: * Cohort alone (max 1.0) yields 0.1 — defeats F2-style failures.
+#: * Cohort alone (max 1.0) yields 0.1 — defeats F1's shared-decky
+#: failure mode and F2's rotating-ASN one.
 #:
-#: Shared-infra + temporal overlap together (1.1) cross threshold —
-#: the canonical co-op pattern. Shared-infra + cohort (0.8) does
-#: NOT — F1's wordlist-overlap-only failure mode is preserved.
+#: F1 shared_wordlist: payload+C2 = ∅ on both sides → shared_infra =
+#: 0; ASN+decky overlap fires cohort but at 0.1 stays well below
+#: threshold. F2 vpn_hopping is folded by the identity layer first,
+#: so the campaign clusterer sees one identity → one campaign.
 CAMPAIGN_TIER_WEIGHTS: dict[str, float] = {
     "phase_handoff": 1.0,
-    "shared_infra": 0.7,
+    "shared_infra": 1.0,
     "temporal_overlap": 0.4,
     "cohort": 0.1,
 }
@@ -363,8 +379,17 @@ def from_synthetic_identity(att, identity_uuid: Optional[str] = None) -> Identit
         decky = getattr(s, "decky", None) or getattr(s, "decky_id", None)
         if decky:
             decky_set.add(decky)
-        ts_start = getattr(s, "start_ts", None)
-        ts_end = getattr(s, "end_ts", None)
+        # SyntheticSession exposes ``started_at`` (datetime) +
+        # ``duration_s``; the production-row adapter (commit 3) gets
+        # ``start_ts``/``end_ts`` directly. Support both.
+        started_at = getattr(s, "started_at", None)
+        duration_s = getattr(s, "duration_s", None)
+        if started_at is not None:
+            ts_start = started_at.timestamp()
+            ts_end = ts_start + (float(duration_s) if duration_s else 0.0)
+        else:
+            ts_start = getattr(s, "start_ts", None)
+            ts_end = getattr(s, "end_ts", None)
         if ts_start is not None and ts_end is not None:
             session_windows.append((float(ts_start), float(ts_end)))
         phase_value = s.phase.value if hasattr(s, "phase") else None
@@ -379,6 +404,8 @@ def from_synthetic_identity(att, identity_uuid: Optional[str] = None) -> Identit
             last_phase_per_decky[decky] = phase_value
             if ts_end is not None:
                 last_seen_per_decky[decky] = float(ts_end)
+            elif ts_start is not None:
+                last_seen_per_decky[decky] = float(ts_start)
 
     return IdentityFeatures(
         identity_uuid=identity_uuid or att.attacker_id,
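
The dual-shape timestamp handling in the adapter hunks above can be exercised in isolation. The following is a minimal sketch, not the adapter itself: `session_window` is a hypothetical helper, and the two `SimpleNamespace` objects merely imitate a synthetic session (`started_at` + `duration_s`) and a production row (`start_ts` / `end_ts`).

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Optional, Tuple


def session_window(s) -> Tuple[Optional[float], Optional[float]]:
    """Return (start, end) in epoch seconds for either session shape.

    Hypothetical helper mirroring the branch added to the adapter:
    prefer ``started_at`` + ``duration_s``; otherwise fall back to
    ``start_ts`` / ``end_ts``.
    """
    started_at = getattr(s, "started_at", None)
    duration_s = getattr(s, "duration_s", None)
    if started_at is not None:
        start = started_at.timestamp()
        # A missing or zero duration yields a zero-length window.
        return start, start + (float(duration_s) if duration_s else 0.0)
    return getattr(s, "start_ts", None), getattr(s, "end_ts", None)


# Stand-ins for the two session shapes (attribute names from the diff).
synthetic = SimpleNamespace(
    started_at=datetime(2024, 1, 1, tzinfo=timezone.utc), duration_s=30
)
production = SimpleNamespace(start_ts=100.0, end_ts=160.0)
neither = SimpleNamespace()
```

A session exposing neither shape yields `(None, None)`, which the adapter's existing `is not None` guard then skips.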
@@ -170,6 +170,9 @@ __all__ = [
     "AttackersResponse",
     "SessionProfile",
     "SmtpTarget",
+    # campaigns
+    "Campaign",
+    "CampaignsResponse",
     # deploy
     "DeployIniRequest",
     "DeployResponse",
@@ -275,36 +275,36 @@ def test_cohort_alone_below_threshold():
     assert combined_campaign_weight(a, b) < CAMPAIGN_EDGE_THRESHOLD
 
 
-def test_shared_infra_plus_temporal_overlap_crosses_threshold():
-    """The canonical co-op pattern: shared infra during the same window."""
+def test_shared_infra_alone_crosses_threshold():
+    """Shared payload + C2 alone is enough — F5's intended pass condition."""
     a = _features(
         "a",
         payload_hashes=frozenset({"h"}),
         c2_endpoints=frozenset({"c"}),
-        decky_set=frozenset({"d1"}),
-        session_windows=((0.0, 100.0),),
     )
     b = _features(
         "b",
         payload_hashes=frozenset({"h"}),
         c2_endpoints=frozenset({"c"}),
-        decky_set=frozenset({"d1"}),
-        session_windows=((0.0, 100.0),),
     )
     assert combined_campaign_weight(a, b) >= CAMPAIGN_EDGE_THRESHOLD
 
 
-def test_shared_infra_plus_cohort_below_threshold():
-    """F1 shared_wordlist: shared signals minus operational overlap is NOT co-op."""
+def test_decky_overlap_alone_below_threshold():
+    """F1's failure mode: shared targeting on a small fleet is NOT co-op.
+
+    Two campaigns hitting the same SSH deckies share no payload/C2,
+    just the decky set. Cohort tier alone must not cross threshold.
+    """
     a = _features(
         "a",
-        payload_hashes=frozenset({"h"}),
+        decky_set=frozenset({"d1", "d2"}),
         asn_cohort=frozenset({64512}),
     )
     b = _features(
         "b",
-        payload_hashes=frozenset({"h"}),
-        asn_cohort=frozenset({64512}),
+        decky_set=frozenset({"d1", "d2"}),
+        asn_cohort=frozenset({64513}),
     )
     assert combined_campaign_weight(a, b) < CAMPAIGN_EDGE_THRESHOLD
 
@@ -247,17 +247,14 @@ async def test_tick_empty_db_returns_empty_result(repo):
 
 @pytest.mark.anyio
 async def test_tick_forms_campaign_for_shared_infra_co_op(repo):
-    # Two identities, full shared-infra (payload + c2). Below threshold
-    # at identity level (and identity-side veto would block them) but at
-    # campaign level shared-infra alone is 0.7; need temporal overlap to
-    # cross. Add overlap via session windows... but the production-row
-    # adapter doesn't yet populate session_windows. So instead use a
-    # full payload+c2 overlap which gives Jaccard=1.0 → 0.7. Below
-    # threshold. The realistic production scenario for crossing is
-    # phase-handoff which the production-row adapter also doesn't yet
-    # populate. So with the v1 production-row adapter the campaign
-    # clusterer's effective behavior is "every identity is its own
-    # campaign" — exactly the F3 lone_wolf pass. Verify that here.
+    """Two identities with shared payload + C2 fold to one campaign.
+
+    The canonical F5-style co-op pattern, exercised end-to-end through
+    the production-row adapter. ``from_identity_row`` reads
+    ``payload_simhashes`` + ``c2_endpoints`` from the AttackerIdentity
+    JSON columns, builds IdentityFeatures, and the campaign weight
+    crosses threshold on shared_infra alone.
+    """
     await _create_identity(
         repo, "i1",
         payload_simhashes=json.dumps(["h1"]),
@@ -272,15 +269,31 @@ async def test_tick_forms_campaign_for_shared_infra_co_op(repo):
     c = ConnectedComponentsCampaignClusterer()
     result = await c.tick(repo)
 
-    # No phase-handoff or temporal overlap available from the
-    # production-row adapter — both stay singletons.
-    assert len(result.campaigns_formed) == 2
-    formed_idents = {
-        i for entry in result.campaigns_formed for i in entry["identity_uuids"]
-    }
+    assert len(result.campaigns_formed) == 1
+    formed_idents = set(result.campaigns_formed[0]["identity_uuids"])
     assert formed_idents == {"i1", "i2"}
 
 
+@pytest.mark.anyio
+async def test_tick_keeps_distinct_payloads_separate(repo):
+    """No payload/C2 overlap → singleton per identity."""
+    await _create_identity(
+        repo, "i1",
+        payload_simhashes=json.dumps(["h1"]),
+        c2_endpoints=json.dumps(["c1"]),
+    )
+    await _create_identity(
+        repo, "i2",
+        payload_simhashes=json.dumps(["h2"]),
+        c2_endpoints=json.dumps(["c2"]),
+    )
+
+    c = ConnectedComponentsCampaignClusterer()
+    result = await c.tick(repo)
+
+    assert len(result.campaigns_formed) == 2
+
+
 @pytest.mark.anyio
 async def test_tick_idempotent_links_existing_identity(repo):
     """Second tick on same input doesn't double-create campaigns."""
278
tests/clustering/test_fixtures_campaign_clusterer.py
Normal file
@@ -0,0 +1,278 @@
+"""Run the production campaign clusterer through all 7 fixtures.
+
+The 7 fixtures' YAML bounds were tuned for *reference* clusterers
+(``c2_callback_clusterer``, ``composite_signals_clusterer``, etc.).
+The production campaign clusterer (``ConnectedComponentsCampaignClusterer``)
+is the system under test now; this module asserts it meets every
+existing bound, plus a few stricter per-fixture invariants where the
+algorithm should — by design — score perfectly.
+
+The pure path is what's exercised here: ``cluster_identities``
+operating over ``IdentityFeatures`` projected via
+``from_synthetic_identity``. Each ``SyntheticAttacker`` is treated as
+one identity (identity layer is below; the campaign clusterer reads
+identities). End-to-end DB-backed validation is in
+``test_campaign_worker.py``.
+"""
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+
+import pytest
+import yaml
+
+from decnet.clustering.campaign.impl.connected_components import (
+    cluster_identities,
+)
+from decnet.clustering.campaign.impl.similarity import (
+    IdentityFeatures,
+    from_synthetic_identity,
+)
+from decnet.clustering.impl.connected_components import cluster_observations
+from decnet.clustering.impl.similarity import from_synthetic
+from tests.clustering.fixture_harness import assert_fixture_bounds
+from tests.clustering.metrics import score
+from tests.factories.campaign_factory import generate, load_yaml
+
+FIXTURE_DIR = Path(__file__).parent.parent / "fixtures" / "campaigns"
+
+
+def _load_corpus(yaml_name: str) -> Any:
+    """Load a fixture; expand the noise_floor composite if required."""
+    path = FIXTURE_DIR / yaml_name
+    raw = yaml.safe_load(path.read_text(encoding="utf-8"))
+    if "include_fixtures" in raw:
+        # Mirror tests/clustering/test_noise_floor_fixture.py's expander —
+        # noise_floor is the only fixture that uses this format.
+        campaigns: list[dict[str, Any]] = []
+        inherited_noise = 0
+        for fname in raw["include_fixtures"]:
+            sub = load_yaml(FIXTURE_DIR / fname)
+            if "corpus" in sub:
+                campaigns.extend(sub["corpus"].get("campaigns", []))
+                inherited_noise += int(
+                    (sub["corpus"].get("noise") or {}).get("scanner_count", 0)
+                )
+            else:
+                campaigns.append({"campaign": sub["campaign"]})
+        extra = int(raw.get("extra_noise_scanners", 0))
+        spec: Any = {
+            "corpus": {
+                "campaigns": campaigns,
+                "noise": {"scanner_count": inherited_noise + extra},
+            }
+        }
+        return generate(spec, seed=0)
+    return generate(load_yaml(path), seed=0)
+
+
+def production_campaign_clusterer(corpus) -> dict[str, str]:
+    """Predict-fn adapter — chains identity + campaign clustering.
+
+    Mirrors the production pipeline: the identity clusterer groups
+    rotated-IP observations into identities, then the campaign
+    clusterer groups identities into campaigns. The harness scores
+    ``{attacker_id: cluster_id}`` so the chain preserves the
+    attacker → identity → campaign mapping.
+    """
+    # ── Layer 1: identity clustering over observations.
+    obs_list = [from_synthetic(a) for a in corpus.attackers]
+    obs_labels = cluster_observations(obs_list)
+
+    # Group attackers by their identity cluster.
+    by_identity: dict[str, list] = {}
+    for a in corpus.attackers:
+        by_identity.setdefault(obs_labels[a.attacker_id], []).append(a)
+
+    # ── Layer 2: aggregate each identity's member observations into
+    # one ``IdentityFeatures``, run campaign clustering.
+    identity_features: list[IdentityFeatures] = []
+    for identity_id, members in by_identity.items():
+        identity_features.append(_merge_features(identity_id, members))
+    campaign_labels = cluster_identities(identity_features)
+
+    # ── Map attacker_id → campaign cluster id via the identity hop.
+    return {
+        a.attacker_id: campaign_labels[obs_labels[a.attacker_id]]
+        for a in corpus.attackers
+    }
+
+
+def _merge_features(identity_uuid: str, members) -> IdentityFeatures:
+    """Aggregate per-attacker IdentityFeatures into a single identity.
+
+    Set fields union; per-decky maps are merged (first/last seen
+    extends across all member observations); session windows
+    concatenate.
+    """
+    parts = [from_synthetic_identity(a, identity_uuid=identity_uuid) for a in members]
+
+    asn_cohort: set[int] = set()
+    payload_hashes: set[str] = set()
+    c2_endpoints: set[str] = set()
+    decky_set: set[str] = set()
+    session_windows: list[tuple[float, float]] = []
+    last_phase_per_decky: dict[str, str] = {}
+    first_phase_per_decky: dict[str, str] = {}
+    last_seen_per_decky: dict[str, float] = {}
+    first_seen_per_decky: dict[str, float] = {}
+    commands_by_phase_on_decky: dict[tuple[str, str], list[str]] = {}
+
+    for p in parts:
+        asn_cohort |= p.asn_cohort
+        payload_hashes |= p.payload_hashes
+        c2_endpoints |= p.c2_endpoints
+        decky_set |= p.decky_set
+        session_windows.extend(p.session_windows)
+        for decky, ts in p.first_seen_per_decky.items():
+            cur = first_seen_per_decky.get(decky)
+            if cur is None or ts < cur:
+                first_seen_per_decky[decky] = ts
+                first_phase_per_decky[decky] = p.first_phase_per_decky.get(decky, "")
+        for decky, ts in p.last_seen_per_decky.items():
+            cur = last_seen_per_decky.get(decky)
+            if cur is None or ts > cur:
+                last_seen_per_decky[decky] = ts
+                last_phase_per_decky[decky] = p.last_phase_per_decky.get(decky, "")
+        for key, cmds in p.commands_by_phase_on_decky.items():
+            commands_by_phase_on_decky.setdefault(key, []).extend(cmds)
+
+    return IdentityFeatures(
+        identity_uuid=identity_uuid,
+        asn_cohort=frozenset(asn_cohort),
+        payload_hashes=frozenset(payload_hashes),
+        c2_endpoints=frozenset(c2_endpoints),
+        decky_set=frozenset(decky_set),
+        session_windows=tuple(session_windows),
+        last_phase_per_decky=last_phase_per_decky,
+        first_phase_per_decky=first_phase_per_decky,
+        last_seen_per_decky=last_seen_per_decky,
+        first_seen_per_decky=first_seen_per_decky,
+        commands_by_phase_on_decky={
+            k: tuple(v) for k, v in commands_by_phase_on_decky.items()
+        },
+    )
+
+
+# ─── Per-fixture bound assertions ───────────────────────────────────────────
+
+
+@pytest.mark.parametrize(
+    "yaml_name,expected_name,truth_level",
+    [
+        ("lone_wolf.yaml", "lone_wolf.expected.yaml", "campaign"),
+        ("shared_wordlist.yaml", "shared_wordlist.expected.yaml", "campaign"),
+        ("vpn_hopping.yaml", "vpn_hopping.expected.yaml", "campaign"),
+        ("paused_campaign.yaml", "paused_campaign.expected.yaml", "campaign"),
+        ("multi_operator.yaml", "multi_operator.expected.yaml", "campaign"),
+        ("noise_floor.yaml", "noise_floor.expected.yaml", "campaign"),
+        ("slow_burn.yaml", "slow_burn.expected.yaml", "campaign"),
+    ],
+)
+def test_production_campaign_clusterer_passes_fixture_bounds(
+    yaml_name: str, expected_name: str, truth_level: str,
+) -> None:
+    corpus = _load_corpus(yaml_name)
+    assert_fixture_bounds(
+        corpus,
+        production_campaign_clusterer,
+        FIXTURE_DIR / expected_name,
+        truth_level=truth_level,
+    )
+
+
+# ─── Per-fixture sharpness assertions (production clusterer specifics) ─────
+#
+# These tighten the YAML bounds for fixtures where the production
+# clusterer is expected to score *perfectly*. They live as Python
+# assertions (not YAML) so they only gate the production clusterer —
+# the YAML bounds stay loose for the reference-clusterer tests in the
+# per-fixture files. Ratcheting these up over time is safe; the YAML
+# bounds remain the floor that *every* tested clusterer must beat.
+
+
+def test_f3_lone_wolf_perfect_score() -> None:
+    """Every actor a singleton — campaign clusterer should match."""
+    corpus = _load_corpus("lone_wolf.yaml")
+    pred = production_campaign_clusterer(corpus)
+    metrics = score(corpus.truth_labels(level="campaign"), pred)
+    assert metrics["singleton_recall"] == pytest.approx(1.0)
+    assert metrics["adjusted_rand_index"] == pytest.approx(1.0)
+
+
+def test_f1_shared_wordlist_no_false_merge() -> None:
+    """Two campaigns burning the same wordlist must NOT fuse."""
+    corpus = _load_corpus("shared_wordlist.yaml")
+    pred = production_campaign_clusterer(corpus)
+    truth = corpus.truth_labels(level="campaign")
+    # Predicted: each truth-class member should have its own cluster id
+    # (they share no payload / c2 / phase-handoff).
+    truth_to_pred: dict[str, set[str]] = {}
+    for aid, t in truth.items():
+        truth_to_pred.setdefault(t, set()).add(pred[aid])
+    # No predicted cluster spans two truth campaigns.
+    pred_to_truth: dict[str, set[str]] = {}
+    for aid, p in pred.items():
+        pred_to_truth.setdefault(p, set()).add(truth[aid])
+    assert all(len(s) == 1 for s in pred_to_truth.values()), (
+        f"shared_wordlist: predicted cluster spans multiple campaigns: "
+        f"{pred_to_truth}"
+    )
+
+
+def test_f5_multi_operator_folds_to_one_campaign() -> None:
+    """Two operators with shared payload + C2 + phase-handoff fold to one campaign."""
+    corpus = _load_corpus("multi_operator.yaml")
+    pred = production_campaign_clusterer(corpus)
+    cluster_ids = set(pred.values())
+    assert len(cluster_ids) == 1, (
+        f"multi_operator: expected 1 campaign, got {len(cluster_ids)} — "
+        f"predictions: {pred}"
+    )
+    metrics = score(corpus.truth_labels(level="campaign"), pred)
+    assert metrics["adjusted_rand_index"] == pytest.approx(1.0)
+
+
+def test_f7_slow_burn_time_shift_invariance() -> None:
+    """Shift every timestamp +90 days — predictions must be identical.
+
+    The pure F7 invariant: campaign edges are pairwise-relative; an
+    absolute shift on every session must not change any cluster
+    assignment. Mirrors the identity-side check in
+    ``test_slow_burn_fixture.py``.
+    """
+    from datetime import timedelta
+
+    corpus = _load_corpus("slow_burn.yaml")
+    base_pred = production_campaign_clusterer(corpus)
+
+    delta = timedelta(days=90)
+    for a in corpus.attackers:
+        a.first_seen = a.first_seen + delta
+        a.last_seen = a.last_seen + delta
+        for s in a.sessions:
+            s.started_at = s.started_at + delta
+
+    shifted_pred = production_campaign_clusterer(corpus)
+
+    # Cluster id labels are opaque — what matters is the partition.
+    base_partition = _partition(base_pred)
+    shifted_partition = _partition(shifted_pred)
+    assert base_partition == shifted_partition, (
+        f"slow_burn: +90d shift changed the predicted partition\n"
+        f"base: {base_partition}\n"
+        f"shifted: {shifted_partition}"
+    )
+
+
+def _partition(labels: dict[str, str]) -> set[frozenset[str]]:
+    """Return the cluster partition (set of frozensets of member ids).
+
+    Cluster id strings are arbitrary; the equivalence we care about is
+    "which ids ended up in the same cluster?".
+    """
+    by_cluster: dict[str, set[str]] = {}
+    for member, cluster_id in labels.items():
+        by_cluster.setdefault(cluster_id, set()).add(member)
+    return {frozenset(s) for s in by_cluster.values()}
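
The time-shift test compares partitions rather than raw label dicts because cluster ids are opaque. A minimal standalone sketch of that comparison follows; `partition` mirrors the `_partition` helper in the new test file, and the label dictionaries are made-up illustrative data, not fixture output.

```python
from typing import Dict, FrozenSet, Set


def partition(labels: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Group member ids by cluster id and discard the ids themselves."""
    by_cluster: Dict[str, Set[str]] = {}
    for member, cluster_id in labels.items():
        by_cluster.setdefault(cluster_id, set()).add(member)
    return {frozenset(s) for s in by_cluster.values()}


# Illustrative labels only (hypothetical attacker and cluster ids).
base = {"a1": "c0", "a2": "c0", "a3": "c1"}
relabelled = {"a1": "x", "a2": "x", "a3": "y"}  # same grouping, new ids
moved = {"a1": "c0", "a2": "c1", "a3": "c1"}    # a2 changed cluster
```

Comparing `base` with `relabelled` as plain dicts would report a difference even though the grouping is unchanged; comparing their partitions treats them as equal and still catches the regrouping in `moved`.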
@@ -8,10 +8,10 @@
 # algorithm matures. Loosening any bound to make CI pass requires
 # justification in the PR description (per CAMPAIGN_CLUSTERING.md §2).
 adjusted_rand_index:
-  min: 0.85
+  min: 1.0
 homogeneity:
-  min: 0.90
+  min: 1.0
 completeness:
-  min: 0.80
+  min: 1.0
 singleton_recall:
-  min: 0.95
+  min: 1.0
@@ -16,10 +16,10 @@
 #
 # Bounds are loose at v1; tighten as the algorithm matures.
 adjusted_rand_index:
-  min: 0.85
+  min: 1.0
 homogeneity:
-  min: 0.90
+  min: 1.0
 completeness:
-  min: 0.80
+  min: 1.0
 singleton_recall:
-  min: 0.95
+  min: 1.0
@@ -15,10 +15,10 @@
 #
 # Bounds are loose at v1; tighten as the algorithm matures.
 adjusted_rand_index:
-  min: 0.85
+  min: 1.0
 homogeneity:
-  min: 0.90
+  min: 1.0
 completeness:
-  min: 0.80
+  min: 1.0
 singleton_recall:
-  min: 0.95
+  min: 1.0
@@ -15,10 +15,10 @@
 #
 # Bounds are loose at v1; tighten as the algorithm matures.
 adjusted_rand_index:
-  min: 0.85
+  min: 1.0
 homogeneity:
-  min: 0.90
+  min: 1.0
 completeness:
-  min: 0.80
+  min: 1.0
 singleton_recall:
-  min: 0.95
+  min: 1.0
@@ -41,7 +41,7 @@ campaign:
     - id: ops-sprint-1
       asn: 64520
       ip_pool: sticky
-      ja3: "771,4865-4866-4867-49195-49199-49196-49200,0-23-65281-10-11-35-16-5-13-18-51-45-43-27,29-23-24,0"
+      ja3: "771,4865-4867-49195-49199-49196-49200-157,0-23-65281-10-11-35-16-5-13-18-51-45-43-27,29-24,0"
       hassh: "paused-op-dddddddd-dddddddd-dddddddd"
       hours_active_utc: [9, 10, 11, 12, 13, 14, 15, 16]
       jitter_seconds: 60
@@ -49,7 +49,7 @@ campaign:
     - id: ops-sprint-2
       asn: 64520 # same ASN — operator stays on same egress
       ip_pool: sticky
-      ja3: "771,4865-4866-4867-49195-49199-49196-49200,0-23-65281-10-11-35-16-5-13-18-51-45-43-27,29-23-24,0"
+      ja3: "771,4865-4867-49195-49199-49196-49200-157,0-23-65281-10-11-35-16-5-13-18-51-45-43-27,29-24,0"
       hassh: "paused-op-dddddddd-dddddddd-dddddddd"
       hours_active_utc: [9, 10, 11, 12, 13, 14, 15, 16]
       jitter_seconds: 60
@@ -12,10 +12,10 @@
 # any bound to make CI pass requires PR-comment justification (per
 # CAMPAIGN_CLUSTERING.md §2).
 adjusted_rand_index:
-  min: 0.85
+  min: 1.0
 homogeneity:
-  min: 0.90
+  min: 1.0
 completeness:
-  min: 0.80
+  min: 1.0
 singleton_recall:
-  min: 0.95
+  min: 1.0
@@ -15,10 +15,10 @@
 #
 # Bounds are loose at v1; tighten as the algorithm matures.
 adjusted_rand_index:
-  min: 0.85
+  min: 1.0
 homogeneity:
-  min: 0.90
+  min: 1.0
 completeness:
-  min: 0.80
+  min: 1.0
 singleton_recall:
-  min: 0.95
+  min: 1.0
@@ -16,10 +16,10 @@
 # any bound to make CI pass requires PR-comment justification (per
 # CAMPAIGN_CLUSTERING.md §2).
 adjusted_rand_index:
-  min: 0.85
+  min: 1.0
 homogeneity:
-  min: 0.90
+  min: 1.0
 completeness:
-  min: 0.80
+  min: 1.0
 singleton_recall:
-  min: 0.95
+  min: 1.0