feat(clustering): revocable merges (merge + unmerge)

Reworks the clusterer's tick to handle multi-identity components and
re-evaluate prior merges. Two passes per tick:

Pass 1 — per-component reconciliation:
  * Fresh component → mint identity (commit 4 path).
  * Single-identity component → link unassigned observations.
  * Multi-identity component → soft-merge: pick the smallest-uuid
    winner deterministically, set merged_into_uuid on each loser,
    link unassigned observations to the winner. Observations stay
    FK'd to their original identity row — the merge is a soft
    pointer, not a re-point. Audit trail preserved; cached
    subscribers resolve through the chain.

Pass 2 — revocable-merge undo:
  * For each merged-out identity, check whether its observations
    still cluster with its winner's. If not, the merge is
    contradicted by new evidence — clear merged_into_uuid and emit
    identities_unmerged. The resurrected identity keeps its original
    uuid, so subscribers that cached it during the merged interval
    re-attach without a new lookup.

A pre-built merge-chain dict feeds Pass 1 so the effective-identity
lookup is O(1) per observation. The chain has a hop cap (paranoia
against accidental cycles in the underlying state).

Repo additions on BaseRepository + SQLModelRepository:
  * list_all_identities() — includes merged-out rows.
  * update_identity_merged_into(uuid, winner_or_None) — single
    setter for both merge and unmerge.
DummyRepo coverage stub updated.

Tests:
  * Two distinct identities bridged by a new observation merge with
    the smaller uuid as winner.
  * A pre-seeded soft-merge whose underlying observations diverge
    gets revoked; resurrected uuid emerges with merged_into_uuid
    cleared.
  * Tick is idempotent under no state changes.
This commit is contained in:
2026-04-26 08:33:32 -04:00
parent 87412da1ca
commit e364ef8859
5 changed files with 343 additions and 50 deletions

View File

@@ -69,6 +69,8 @@ class DummyRepo(BaseRepository):
async def list_attackers_for_clustering(self, limit=None): await super().list_attackers_for_clustering(limit); return []
async def create_attacker_identity(self, row): await super().create_attacker_identity(row); return ""
async def set_attacker_identity_id(self, a, i): await super().set_attacker_identity_id(a, i)
async def list_all_identities(self): await super().list_all_identities(); return []
async def update_identity_merged_into(self, u, w): await super().update_identity_merged_into(u, w)
@pytest.mark.asyncio
async def test_base_repo_coverage():
@@ -139,6 +141,9 @@ async def test_base_repo_coverage():
await dr.list_attackers_for_clustering()
await dr.create_attacker_identity({"uuid": "i"})
await dr.set_attacker_identity_id("a", "i")
await dr.list_all_identities()
await dr.update_identity_merged_into("a", "b")
await dr.update_identity_merged_into("a", None)
# Swarm methods: default NotImplementedError on BaseRepository. Covering
# them here keeps the coverage contract honest for the swarm CRUD surface.