feat(clustering): wire high-weight edges end-to-end

The connected-components clusterer now writes attacker_identities rows +
sets attackers.identity_id when high-weight signals (JA3 / HASSH /
payload-hash / C2-endpoint exact match) agree across observations.
Singletons stay un-fingerprinted and un-clustered.

Algorithm split:
- cluster_observations(observations): pure union-find over the
  high-weight edge function. Same code path for fixture validation and
  production tick.
- from_attacker_row(row): production-row adapter; recovers JA3 + HASSH
  from Attacker.fingerprints JSON. Payload + C2 join from logs in later
  commits; the function shape doesn't change.

Repo additions on BaseRepository + SQLModelRepository:
- list_attackers_for_clustering(limit=None)
- create_attacker_identity(row)
- set_attacker_identity_id(attacker_uuid, identity_uuid)

DummyRepo coverage stub updated.

v1 behavior is conservative: only assigns identities to observations
whose identity_id is currently NULL. Multi-identity components are
skipped this pass; merge / re-assign lands in commit 10 with revocable
merges.

Fixture bounds tightened against the production clusterer:
- lone_wolf (F3): singletons stay singletons
- shared_wordlist (F1): credential-only overlap doesn't cluster
  (high-weight tier doesn't include credentials)
- vpn_hopping (F2, identity-level): 5 rotated IPs with stable JA3 +
  HASSH fold into one identity, ARI = 1.0, completeness = 1.0
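
For readers without the source at hand, the clustering and the conservative v1 assignment rule described above can be sketched roughly as follows. This is an illustration, not the code from this commit: the `Observation` dataclass, its field names, the `plan_assignments` helper, and the return shapes are all hypothetical. It also assumes that an exact match on any single high-weight signal is enough to create an edge, and that a component with exactly one existing identity extends it to its NULL members; the commit message does not confirm either point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation shape; all field names are assumptions."""
    uuid: str
    ja3: Optional[str] = None
    hassh: Optional[str] = None
    payload_hash: Optional[str] = None
    c2_endpoint: Optional[str] = None
    identity_id: Optional[str] = None


# Credentials are deliberately absent: they are not in the high-weight tier.
HIGH_WEIGHT_FIELDS = ("ja3", "hassh", "payload_hash", "c2_endpoint")


def cluster_observations(observations):
    """Union-find over exact matches on any high-weight signal.

    Returns components as lists of observation uuids. Singletons are
    dropped, so they stay un-clustered.
    """
    parent = {o.uuid: o.uuid for o in observations}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every observation to the first one seen with the same signal value.
    first_seen = {}
    for o in observations:
        for field in HIGH_WEIGHT_FIELDS:
            value = getattr(o, field)
            if value is None:
                continue
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], o.uuid)
            else:
                first_seen[key] = o.uuid

    components = {}
    for o in observations:
        components.setdefault(find(o.uuid), []).append(o.uuid)
    return [members for members in components.values() if len(members) > 1]


def plan_assignments(observations, components):
    """Conservative v1 policy sketch.

    Yields (existing_identity_or_None, uuids_to_assign) pairs. Only
    observations with a NULL identity_id are ever assigned, and components
    that already span more than one identity are skipped entirely.
    """
    by_uuid = {o.uuid: o for o in observations}
    plan = []
    for members in components:
        existing = {by_uuid[u].identity_id for u in members} - {None}
        if len(existing) > 1:
            continue  # multi-identity component: merge is deferred
        unassigned = [u for u in members if by_uuid[u].identity_id is None]
        if unassigned:
            plan.append((next(iter(existing), None), unassigned))
    return plan
```

In this sketch, five observations from different IPs that share one JA3 and one HASSH value collapse into a single component, while observations that share only credentials never gain an edge because credentials are not among the indexed fields. A `None` in the first slot of a plan entry stands for "create a new attacker_identities row".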
@@ -66,6 +66,9 @@ class DummyRepo(BaseRepository):
     async def count_identities(self): await super().count_identities(); return 0
     async def list_observations_for_identity(self, u, limit=50, offset=0): await super().list_observations_for_identity(u, limit, offset); return []
     async def count_observations_for_identity(self, u): await super().count_observations_for_identity(u); return 0
+    async def list_attackers_for_clustering(self, limit=None): await super().list_attackers_for_clustering(limit); return []
+    async def create_attacker_identity(self, row): await super().create_attacker_identity(row); return ""
+    async def set_attacker_identity_id(self, a, i): await super().set_attacker_identity_id(a, i)
 
 @pytest.mark.asyncio
 async def test_base_repo_coverage():
@@ -133,6 +136,9 @@ async def test_base_repo_coverage():
     await dr.count_identities()
     await dr.list_observations_for_identity("a")
     await dr.count_observations_for_identity("a")
+    await dr.list_attackers_for_clustering()
+    await dr.create_attacker_identity({"uuid": "i"})
+    await dr.set_attacker_identity_id("a", "i")
 
     # Swarm methods: default NotImplementedError on BaseRepository. Covering
     # them here keeps the coverage contract honest for the swarm CRUD surface.