feat(clustering): combined edge weight + medium-tier wiring
The clusterer now replaces the single high-tier function call with a
tier-weighted sum. Tier multipliers (high=1.0, medium=0.6, low=0.2,
very_low=0.05) are tuned so the threshold (1.0) admits high-tier
agreement alone while leaving every weaker tier, and every combination
of weaker tiers, under threshold (0.6 + 0.2 + 0.05 = 0.85).

Per-tier discipline tested:
- high alone clusters
- medium alone does NOT cluster (supporting signal only)
- low alone does NOT cluster (fixture 1's failure mode)
- very-low alone does NOT cluster (fixture 2's failure mode)
- all three weak tiers stacked still don't reach threshold
- high + medium clusters (high already saturates)

The combination is forward-compatible: low + very-low contributions are
computed today but always project to 0.0 because the production adapter
doesn't yet populate credentials / ASN-edge inputs into the fixture
path. Their contribution becomes load-bearing in commit 7, when the
low-tier landing tightens the F1 / F2 bounds.

Fixture 4 (paused_campaign) ratchet added: the high-tier signal carries
the multi-day-silence campaign into one identity. Time-agnostic
invariant: silence is irrelevant to the edge weight.
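A minimal sketch of the threshold arithmetic described above, not the production clusterer. The multipliers and the threshold come from this message. The names `combined_edge_weight` and `should_cluster`, the per-tier agreement scores in [0, 1], and the inclusive `>=` comparison are assumptions made for illustration.

```python
from typing import Mapping

# Tier multipliers and threshold as stated in the commit message.
TIER_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.2, "very_low": 0.05}
CLUSTER_THRESHOLD = 1.0


def combined_edge_weight(agreement: Mapping[str, float]) -> float:
    """Hypothetical helper: tier-weighted sum of per-tier agreement.

    `agreement` maps a tier name to a score, assumed to lie in [0, 1];
    out-of-range scores are clamped so no tier can exceed its multiplier.
    """
    total = 0.0
    for tier, score in agreement.items():
        clamped = min(max(score, 0.0), 1.0)
        total += TIER_WEIGHTS[tier] * clamped
    return total


def should_cluster(agreement: Mapping[str, float]) -> bool:
    """Assumed decision rule: fuse when the weight reaches the threshold."""
    return combined_edge_weight(agreement) >= CLUSTER_THRESHOLD


if __name__ == "__main__":
    cases = {
        "high alone": {"high": 1.0},
        "medium alone": {"medium": 1.0},
        "all weak tiers": {"medium": 1.0, "low": 1.0, "very_low": 1.0},
        "high + medium": {"high": 1.0, "medium": 1.0},
    }
    for name, agreement in cases.items():
        print(name, should_cluster(agreement))
```

Under these assumptions only the cases that include full high-tier agreement reach the threshold, which matches the per-tier discipline listed above.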
@@ -287,6 +287,36 @@ def test_shared_wordlist_passes_with_production_clusterer():
     )


+def test_paused_campaign_passes_with_production_clusterer():
+    """Fixture 4: one campaign split across two operational windows by
+    a multi-day silence. Both halves share JA3 + HASSH + payload + C2;
+    the production clusterer must fold them into one identity. Time-
+    agnostic invariant: the silence window is irrelevant to clustering."""
+    from tests.clustering.fixture_harness import assert_fixture_bounds
+    from tests.factories.campaign_factory import generate, load_yaml
+
+    corpus = generate(load_yaml(FIXTURE_DIR / "paused_campaign.yaml"), seed=0)
+    assert_fixture_bounds(
+        corpus, _production_clusterer_predict,
+        FIXTURE_DIR / "paused_campaign.expected.yaml",
+    )
+
+
+def test_cluster_observations_medium_alone_does_not_fuse():
+    """Two observations sharing only command-sequence (medium-tier)
+    must stay in distinct clusters — medium is a supporting signal."""
+    a = Observation(
+        observation_id="a",
+        commands_by_phase={"discovery": ("ls", "id", "uname")},
+    )
+    b = Observation(
+        observation_id="b",
+        commands_by_phase={"discovery": ("ls", "id", "uname")},
+    )
+    labels = cluster_observations([a, b])
+    assert labels["a"] != labels["b"]
+
+
 def test_vpn_hopping_passes_at_identity_level_with_production_clusterer():
     """Fixture 2: one rotating actor with stable JA3 + HASSH across
     5 ASNs. The production clusterer must fold all 5 observations into