fix(web): serialize live topology mutations + surface failures loudly

Live MazeNET edits submitted their mutations fire-and-forget: each canvas
action enqueued its mutation immediately and never awaited the result.
Two failures followed from that:

- expected_version is bumped at ENQUEUE (not at apply), so two ops fired
  back-to-back raced: the second carried a stale version and got a 409.
  Edits only worked when hand-paced, i.e. when an SSE refetch happened to
  land between them.
- A failed mutation degrades the topology, but the only signal was a 4s
  toast, so the user saw DEGRADED with no cause.
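The race in the first bullet fits in a few lines. This is a minimal synchronous model that assumes only the bump-at-enqueue rule and a 409 on a stale expected_version. ToyTopologyServer and fireAndForget are hypothetical names, and 202 is an arbitrary stand-in for a successful enqueue, not the real backend's status code.

```typescript
// Hypothetical toy model of the pre-fix race (not the real backend).
// The version is bumped when a mutation is ENQUEUED, and an enqueue whose
// expected_version no longer matches is rejected (standing in for the 409).
class ToyTopologyServer {
  version = 1;

  enqueue(expectedVersion: number, op: string): { status: number; op: string } {
    if (expectedVersion !== this.version) {
      return { status: 409, op }; // stale expected_version
    }
    this.version += 1; // bumped at enqueue, not at apply
    return { status: 202, op }; // 202 is an arbitrary "accepted" stand-in
  }
}

// The client only learns the new version from a refetch, so ops fired
// back-to-back without one all carry the same cached version.
function fireAndForget(server: ToyTopologyServer, cachedVersion: number, ops: string[]) {
  return ops.map((op) => server.enqueue(cachedVersion, op));
}

const server = new ToyTopologyServer();
const results = fireAndForget(server, server.version, ['detach', 'attach']);
console.log(results.map((r) => `${r.op}:${r.status}`).join(' '));
// prints "detach:202 attach:409"
```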

useTopologyEditor now routes every live op through a serialized submit
queue: one enqueue in flight at a time (submission order preserved), an
optimistic expected_version cursor advanced per enqueue so back-to-back
ops (e.g. reparent's detach+attach) don't need a refetch between them,
and each mutation awaited to a terminal state. A 'failed' row throws
MutationFailedError, which the page pins as a persistent UPDATE FAILED
banner instead of a vanishing toast.
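A minimal sketch of such a queue, assuming a backend with separate enqueue and poll calls and row statuses 'pending' | 'applying' | 'applied' | 'failed' (only 'failed' is named in this commit). SubmitQueue, EnqueueFn, PollFn and the fake backend are hypothetical, and the MutationFailedError here is a local stand-in for the one exported from useTopologyEditor, not its actual definition.

```typescript
// Hypothetical sketch; not the actual useTopologyEditor internals.
type MutationStatus = 'pending' | 'applying' | 'applied' | 'failed';

interface MutationRow {
  id: string;
  status: MutationStatus;
  error?: string;
}

class MutationFailedError extends Error {
  constructor(readonly mutationId: string, detail: string) {
    super(`mutation ${mutationId} failed: ${detail}`);
    this.name = 'MutationFailedError';
  }
}

// Assumed backend shape: enqueue checks expected_version, poll re-reads the row.
type EnqueueFn = (op: string, expectedVersion: number) => Promise<MutationRow>;
type PollFn = (id: string) => Promise<MutationRow>;

class SubmitQueue {
  private tail: Promise<unknown> = Promise.resolve();
  private failure: MutationFailedError | null = null;

  constructor(
    private readonly enqueue: EnqueueFn,
    private readonly poll: PollFn,
    private cursor: number, // optimistic expected_version
  ) {}

  submit(op: string): Promise<MutationRow> {
    // Chaining on `tail` keeps one op in flight and preserves submission order.
    const run = this.tail.then(async () => {
      // Halted: once a mutation has failed, later ops are never enqueued.
      if (this.failure) throw this.failure;
      let row = await this.enqueue(op, this.cursor);
      // The server bumped the version at enqueue; advance without a refetch.
      this.cursor += 1;
      // Await the mutation to a terminal state before the next op starts.
      while (row.status === 'pending' || row.status === 'applying') {
        row = await this.poll(row.id);
      }
      if (row.status === 'failed') {
        this.failure = new MutationFailedError(row.id, row.error ?? 'unknown error');
        throw this.failure;
      }
      return row;
    });
    // The next op waits for this one to settle, success or not.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

// Fake in-memory backend (hypothetical): bumps the version on enqueue and
// settles each row on its first poll, failing any op named "bad...".
let serverVersion = 7;
let nextId = 0;
const opsById = new Map<string, string>();

const fakeEnqueue: EnqueueFn = async (op, expected) => {
  if (expected !== serverVersion) throw new Error('409: stale expected_version');
  serverVersion += 1;
  const id = `m${++nextId}`;
  opsById.set(id, op);
  return { id, status: 'pending' };
};

const fakePoll: PollFn = async (id) =>
  opsById.get(id)?.startsWith('bad')
    ? { id, status: 'failed', error: 'materialisation failed' }
    : { id, status: 'applied' };

async function demo(): Promise<string[]> {
  const queue = new SubmitQueue(fakeEnqueue, fakePoll, serverVersion);
  // Fired back-to-back with no refetch, like reparent's detach+attach.
  const runs = ['detach', 'attach', 'bad-op', 'after'].map((op) => queue.submit(op));
  return Promise.all(runs.map((p) => p.then((row) => row.status, (err: Error) => err.name)));
}
```

Running `demo()` yields `applied, applied, MutationFailedError, MutationFailedError`: the cursor lets detach and attach enqueue without a 409, and 'after' is rejected without being enqueued. Serializing the whole enqueue-to-terminal cycle keeps the cursor simple, but a failed row halts everything queued behind it, which is why that failure needs to stay on screen.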

Slice 1 of the live-edit rework; stage+UPDATE-button batching and louder
backend materialisation reporting to follow.
2026-06-16 12:44:34 -04:00
parent 5505de782f
commit f18bfee746
5 changed files with 216 additions and 28 deletions


@@ -5,6 +5,7 @@ import type { Net, MazeNode, Edge } from './types';
import { DEFAULT_SERVICES, ARCHETYPES as DEFAULT_ARCHETYPES } from './data';
import type { Archetype, ServiceDef } from './data';
import type { MazeApi } from './useMazeApi';
import { MutationFailedError } from './useTopologyEditor';
import { useTopologyStream, type TopologyStreamEvent } from './useTopologyStream';
export interface TopoMeta {
@@ -42,6 +43,10 @@ export interface UseTopologyDataResult {
// Errors + transient banners
loadErr: string | null;
actionErr: string | null;
/** Persistent (no auto-clear) error from a failed live mutation —
* the topology likely went degraded. Dismissed via clearCommitErr. */
commitErr: string | null;
clearCommitErr: () => void;
flashErr: (err: unknown, fallback: string) => void;
// Deploy
@@ -77,9 +82,18 @@ export function useTopologyData(
const [loadErr, setLoadErr] = useState<string | null>(null);
const [actionErr, setActionErr] = useState<string | null>(null);
const [commitErr, setCommitErr] = useState<string | null>(null);
const [deploying, setDeploying] = useState(false);
const clearCommitErr = useCallback(() => setCommitErr(null), []);
const flashErr = useCallback((err: unknown, fallback: string) => {
// A failed live mutation is loud + persistent: the queue halted and
// the topology probably degraded — don't let it vanish in 4s.
if (err instanceof MutationFailedError) {
setCommitErr(err.message);
return;
}
const msg = (err as ApiError)?.response?.data?.detail ?? (err as ApiError)?.message ?? fallback;
setActionErr(msg);
setTimeout(() => setActionErr(null), 4000);
@@ -189,7 +203,7 @@ export function useTopologyData(
edges, setEdges,
topoMeta,
services, archetypes,
-loadErr, actionErr, flashErr,
+loadErr, actionErr, commitErr, clearCommitErr, flashErr,
deploying, onDeploy,
streamLive, lastEventAt, streamEnabled,
refetch,