feat(clustering): revocable merges (merge + unmerge)
Reworks the clusterer's tick to handle multi-identity components and
re-evaluate prior merges. Two passes per tick:
Pass 1 — per-component reconciliation:
* Fresh component → mint identity (commit 4 path).
* Single-identity component → link unassigned observations.
* Multi-identity component → soft-merge: pick the smallest-uuid
winner deterministically, set merged_into_uuid on each loser,
link unassigned observations to the winner. Observations stay
FK'd to their original identity row — the merge is a soft
pointer, not a re-point. Audit trail preserved; cached
subscribers resolve through the chain.
Pass 2 — revocable-merge undo:
* For each merged-out identity, check whether its observations
still cluster with its winner's. If not, the merge is
contradicted by new evidence — clear merged_into_uuid and emit
identities_unmerged. The resurrected identity keeps its original
uuid, so subscribers that cached it during the merged interval
re-attach without a new lookup.
A pre-built merge-chain dict feeds Pass 1 so the effective-identity
lookup is O(1) per observation. The chain has a hop cap (paranoia
against accidental cycles in the underlying state).
Repo additions on BaseRepository + SQLModelRepository:
* list_all_identities() — includes merged-out rows.
* update_identity_merged_into(uuid, winner_or_None) — single
setter for both merge and unmerge.
DummyRepo coverage stub updated.
Tests:
* Two distinct identities bridged by a new observation merge with
the smaller uuid as winner.
* A pre-seeded soft-merge whose underlying observations diverge
gets revoked; resurrected uuid emerges with merged_into_uuid
cleared.
* Tick is idempotent under no state changes.
This commit is contained in:
@@ -69,6 +69,8 @@ class DummyRepo(BaseRepository):
|
||||
async def list_attackers_for_clustering(self, limit=None): await super().list_attackers_for_clustering(limit); return []
|
||||
async def create_attacker_identity(self, row): await super().create_attacker_identity(row); return ""
|
||||
async def set_attacker_identity_id(self, a, i): await super().set_attacker_identity_id(a, i)
|
||||
async def list_all_identities(self): await super().list_all_identities(); return []
|
||||
async def update_identity_merged_into(self, u, w): await super().update_identity_merged_into(u, w)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_base_repo_coverage():
|
||||
@@ -139,6 +141,9 @@ async def test_base_repo_coverage():
|
||||
await dr.list_attackers_for_clustering()
|
||||
await dr.create_attacker_identity({"uuid": "i"})
|
||||
await dr.set_attacker_identity_id("a", "i")
|
||||
await dr.list_all_identities()
|
||||
await dr.update_identity_merged_into("a", "b")
|
||||
await dr.update_identity_merged_into("a", None)
|
||||
|
||||
# Swarm methods: default NotImplementedError on BaseRepository. Covering
|
||||
# them here keeps the coverage contract honest for the swarm CRUD surface.
|
||||
|
||||
Reference in New Issue
Block a user