feat(mazenet): step 7 — topology_mutations queue + mutator reconciler

Adds the live-mutation pipeline for active/degraded topologies:

* TopologyMutation table with composite index (state, topology_id)
  so the watch-loop guard query stays O(log n).
* claim_next_mutation is a single atomic UPDATE ... WHERE
  state='pending', so when reconcilers race exactly one of them wins
  the row; losers see rowcount=0 and skip.
* reconcile_topologies drains pending rows per live topology, applies
  each via decnet.mutator.ops.dispatch, and on failure marks the
  mutation failed and transitions the topology to degraded.
* run_watch_loop gains a gated branch: flat-fleet mutate_all still runs
  every tick unchanged; the reconciler only runs when the cheap
  has_pending_topology_mutation guard returns True.
* apply_* ops re-check hard invariants (names, IP collisions, subnet
  overlap, known services, service_config shape) after every mutation
  so the repo never lands in an invalid state.
* CLI: 'decnet topology mutate' / 'mutations' subcommands.
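The claim step described in the bullets above can be sketched with the standard-library sqlite3 module. This is a minimal, hypothetical sketch: only the table name, the (state, topology_id) composite index and the `WHERE state='pending'` guard come from the commit; the column names, the 'claimed' state, and the helper names are assumptions. It splits the claim into "pick the oldest pending row" and a compare-and-set UPDATE, which is where the rowcount=0 loser path appears.

```python
import sqlite3
import uuid
from typing import Any, Optional

# Hypothetical schema: only the table name and the (state, topology_id)
# composite index come from the commit; the columns are assumptions.
SCHEMA = """
CREATE TABLE topology_mutations (
    id          TEXT PRIMARY KEY,
    topology_id TEXT NOT NULL,
    op          TEXT NOT NULL,
    payload     TEXT NOT NULL,
    state       TEXT NOT NULL DEFAULT 'pending'
);
CREATE INDEX ix_topology_mutations_state_topology
    ON topology_mutations (state, topology_id);
"""


def enqueue(conn: sqlite3.Connection, topology_id: str, op: str, payload: str) -> str:
    mutation_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO topology_mutations (id, topology_id, op, payload) VALUES (?, ?, ?, ?)",
        (mutation_id, topology_id, op, payload),
    )
    conn.commit()
    return mutation_id


def try_claim(conn: sqlite3.Connection, mutation_id: str) -> bool:
    # Compare-and-set: the WHERE clause re-checks state, so at most one
    # caller can move a given row out of 'pending' ('claimed' is an assumed name).
    cur = conn.execute(
        "UPDATE topology_mutations SET state = 'claimed' "
        "WHERE id = ? AND state = 'pending'",
        (mutation_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def claim_next_mutation(conn: sqlite3.Connection, topology_id: str) -> Optional[dict[str, Any]]:
    # Oldest pending row for this topology; rowid follows insertion order here.
    row = conn.execute(
        "SELECT id, op, payload FROM topology_mutations "
        "WHERE state = 'pending' AND topology_id = ? ORDER BY rowid LIMIT 1",
        (topology_id,),
    ).fetchone()
    if row is None or not try_claim(conn, row[0]):
        return None  # nothing pending, or another reconciler won the race
    return {"id": row[0], "op": row[1], "payload": row[2]}


def has_pending_topology_mutation(conn: sqlite3.Connection) -> bool:
    # Equality on the index's leading column, so no full table scan is needed.
    (exists,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM topology_mutations WHERE state = 'pending')"
    ).fetchone()
    return exists == 1
```

Because the state check lives inside the UPDATE itself, two reconcilers that both read the same candidate id cannot both claim it: the second UPDATE matches zero rows and try_claim returns False.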
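The invariant re-check listed above (names, IP collisions, subnet overlap, known services, service_config shape) can be illustrated with the standard-library ipaddress module. This is a hypothetical sketch, not decnet's apply_* code: the topology shape (a dict of nodes and subnets), KNOWN_SERVICES, InvariantError and apply_add_node are all assumptions. The pattern is to apply the op to a copy, validate the copy, and only hand it back when every check passes.

```python
import copy
import ipaddress
from itertools import combinations
from typing import Any

# Hypothetical catalogue; decnet's real service registry is not shown in this commit.
KNOWN_SERVICES = {"ssh", "http", "smb"}


class InvariantError(ValueError):
    """Raised when a mutation would leave the topology invalid."""


def check_invariants(topology: dict[str, Any]) -> None:
    nodes = topology["nodes"]
    names = [n["name"] for n in nodes]
    if len(names) != len(set(names)):
        raise InvariantError("duplicate node name")
    ips = [ipaddress.ip_address(n["ip"]) for n in nodes]
    if len(ips) != len(set(ips)):
        raise InvariantError("IP collision")
    nets = [ipaddress.ip_network(s) for s in topology["subnets"]]
    for a, b in combinations(nets, 2):
        if a.overlaps(b):
            raise InvariantError(f"subnet overlap: {a} / {b}")
    for node in nodes:
        for service, config in node.get("services", {}).items():
            if service not in KNOWN_SERVICES:
                raise InvariantError(f"unknown service {service!r}")
            if not isinstance(config, dict):  # service_config shape check
                raise InvariantError(f"service_config for {service!r} must be a mapping")


def apply_add_node(topology: dict[str, Any], payload: dict[str, Any]) -> dict[str, Any]:
    # Mutate a deep copy and validate it; a rejected op raises before
    # anything is returned, leaving the caller's topology untouched.
    candidate = copy.deepcopy(topology)
    candidate["nodes"].append(payload)
    check_invariants(candidate)
    return candidate
```

Validating the whole candidate after every op, rather than only the fields the op touched, costs a little more work per mutation but means no op can bypass a check it did not anticipate.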
2026-04-20 18:02:37 -04:00
parent 91df57d36b
commit a76b9ecdf9
7 changed files with 1033 additions and 2 deletions


@@ -327,3 +327,41 @@ class BaseRepository(ABC):
        self, edge_id: str, *, expected_version: Optional[int] = None
    ) -> None:
        raise NotImplementedError

    # -------------------- live mutation queue (reconciler) --------------------

    async def enqueue_topology_mutation(
        self,
        topology_id: str,
        op: str,
        payload: dict[str, Any],
        *,
        expected_version: Optional[int] = None,
    ) -> str:
        raise NotImplementedError

    async def claim_next_mutation(
        self, topology_id: str
    ) -> Optional[dict[str, Any]]:
        raise NotImplementedError

    async def mark_mutation_applied(self, mutation_id: str) -> None:
        raise NotImplementedError

    async def mark_mutation_failed(
        self, mutation_id: str, reason: str
    ) -> None:
        raise NotImplementedError

    async def list_topology_mutations(
        self,
        topology_id: str,
        state: Optional[str] = None,
    ) -> list[dict[str, Any]]:
        raise NotImplementedError

    async def has_pending_topology_mutation(self) -> bool:
        return False

    async def list_live_topology_ids(self) -> list[str]:
        return []
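For tests or a local dry run, the queue methods declared in BaseRepository can be backed by a plain dict. The following is a hypothetical, single-process sketch, not the SQL repository from this commit: InMemoryMutationRepo, set_topology_status, run_one_tick, the 'claimed' state and the dispatch signature are assumptions, and expected_version is accepted but ignored. It also shows the drain-and-degrade reconciler and the gated watch tick described in the commit message.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Optional

# Assumed op-dispatch signature; decnet.mutator.ops.dispatch is not shown here.
Dispatch = Callable[[str, str, dict[str, Any]], Awaitable[None]]


class InMemoryMutationRepo:
    """Hypothetical dict-backed stand-in for the queue methods above."""

    def __init__(self) -> None:
        self.mutations: dict[str, dict[str, Any]] = {}  # dicts keep insertion order
        self.topology_status: dict[str, str] = {}

    async def set_topology_status(self, topology_id: str, status: str) -> None:
        self.topology_status[topology_id] = status  # hypothetical helper

    async def enqueue_topology_mutation(
        self, topology_id: str, op: str, payload: dict[str, Any],
        *, expected_version: Optional[int] = None,  # ignored in this sketch
    ) -> str:
        mutation_id = uuid.uuid4().hex
        self.mutations[mutation_id] = {
            "id": mutation_id, "topology_id": topology_id, "op": op,
            "payload": payload, "state": "pending", "reason": None,
        }
        return mutation_id

    async def claim_next_mutation(self, topology_id: str) -> Optional[dict[str, Any]]:
        for m in self.mutations.values():
            if m["topology_id"] == topology_id and m["state"] == "pending":
                # No await between check and set, so coroutines cannot interleave here.
                m["state"] = "claimed"
                return dict(m)
        return None

    async def mark_mutation_applied(self, mutation_id: str) -> None:
        self.mutations[mutation_id]["state"] = "applied"

    async def mark_mutation_failed(self, mutation_id: str, reason: str) -> None:
        self.mutations[mutation_id].update(state="failed", reason=reason)

    async def list_topology_mutations(
        self, topology_id: str, state: Optional[str] = None
    ) -> list[dict[str, Any]]:
        return [
            dict(m) for m in self.mutations.values()
            if m["topology_id"] == topology_id and (state is None or m["state"] == state)
        ]

    async def has_pending_topology_mutation(self) -> bool:
        return any(m["state"] == "pending" for m in self.mutations.values())

    async def list_live_topology_ids(self) -> list[str]:
        return [t for t, s in self.topology_status.items() if s in ("active", "degraded")]


async def reconcile_topologies(repo: InMemoryMutationRepo, dispatch: Dispatch) -> None:
    for topology_id in await repo.list_live_topology_ids():
        while (mutation := await repo.claim_next_mutation(topology_id)) is not None:
            try:
                await dispatch(topology_id, mutation["op"], mutation["payload"])
            except Exception as exc:
                await repo.mark_mutation_failed(mutation["id"], str(exc))
                await repo.set_topology_status(topology_id, "degraded")
                break  # assumption: resume this topology on the next tick
            await repo.mark_mutation_applied(mutation["id"])


async def watch_tick(
    repo: InMemoryMutationRepo, dispatch: Dispatch, mutate_all: Callable[[], Awaitable[None]]
) -> None:
    await mutate_all()  # flat-fleet path runs every tick, unchanged
    if await repo.has_pending_topology_mutation():  # cheap guard gates the reconciler
        await reconcile_topologies(repo, dispatch)


def run_one_tick(repo: InMemoryMutationRepo, dispatch: Dispatch, mutate_all) -> None:
    """Synchronous entry point for scripts and tests (hypothetical)."""
    asyncio.run(watch_tick(repo, dispatch, mutate_all))
```

Because a degraded topology still counts as live in this sketch, mutations queued behind a failed one are picked up again on the following tick rather than being dropped.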