feat(agent/collector): topology-label discovery and master-authoritative supersede

Legacy fleet deckies live in decnet-state.json; MazeNET topology
containers don't. Tag them at compose-time with
decnet.topology.service=true and let the collector match on that label.
Spin up the agent's log collector on the first successful /topology/apply
(not in the lifespan — that would break the no-docker-on-boot invariant)
and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE,
separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running
both roles can't forward its own ingest back to itself.
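
A minimal sketch of the label match described above, assuming the collector
receives each container's labels as a plain mapping. The helper name
is_topology_service and the constant TOPOLOGY_SERVICE_LABEL are hypothetical;
only the label key and its "true" value come from this commit.

```python
from __future__ import annotations

from typing import Any, Mapping

# Label key and value taken from the compose generator in this commit.
TOPOLOGY_SERVICE_LABEL = "decnet.topology.service"


def is_topology_service(labels: Mapping[str, Any] | None) -> bool:
    """Hypothetical helper: True when the container carries the service marker."""
    # Containers without any labels, or without the marker, are skipped.
    return bool(labels) and labels.get(TOPOLOGY_SERVICE_LABEL) == "true"
```

Under this sketch a base container, which carries decnet.topology.role=base
but no service marker, would not be matched.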

When master pushes a topology that differs from whatever is pinned
locally, tear down the predecessor and accept the new one. Refusing with
409 left the agent stranded after partial deploys. record_error now
persists the hydrated blob so a later teardown can still walk the LAN
list — otherwise a half-failed apply strands containers + bridges with
no breadcrumb back to them.
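
A rough sketch of the supersede flow, not the agent's actual code. The
TopologyPin class, its teardown callback, and the whole-dict comparison are
assumptions made for illustration; only the name record_error and the
behaviour (tear down the predecessor, persist the blob on failure) come from
the message above.

```python
from __future__ import annotations

from typing import Any, Callable


class TopologyPin:
    """Hypothetical holder for the topology currently pinned on the agent."""

    def __init__(self, teardown: Callable[[dict[str, Any]], None]) -> None:
        self.pinned: dict[str, Any] | None = None
        self.last_error: str | None = None
        self._teardown = teardown  # assumed callback that removes a deployment

    def apply(self, hydrated: dict[str, Any]) -> None:
        # Master is authoritative: a differing topology replaces the pinned
        # one instead of being refused with 409.
        if self.pinned is not None and self.pinned != hydrated:
            self._teardown(self.pinned)
        self.pinned = hydrated
        self.last_error = None

    def record_error(self, hydrated: dict[str, Any], error: str) -> None:
        # Keep the hydrated blob even on failure so that a later teardown
        # still knows which LANs and containers to walk.
        self.pinned = hydrated
        self.last_error = error
```

Re-applying an identical topology is a no-op for teardown in this sketch;
whether the real agent compares whole blobs or only an identifier is not
stated in the commit.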
2026-04-21 10:23:10 -04:00
parent 050607e00d
commit 0cdcfe2653
8 changed files with 221 additions and 20 deletions


@@ -83,6 +83,14 @@ def generate_topology_compose(hydrated: dict[str, Any]) -> dict:
"networks": nets,
"cap_add": ["NET_ADMIN"],
"logging": _DOCKER_LOGGING,
# Labels let the host collector discover topology containers
# without consulting decnet-state.json (which only knows about
# legacy fleet deckies). See decnet/collector/worker.py.
"labels": {
"decnet.topology.id": topology_id,
"decnet.topology.decky": name,
"decnet.topology.role": "base",
},
}
if forwards_l3:
base["sysctls"] = {"net.ipv4.ip_forward": 1}
@@ -120,6 +128,17 @@ def generate_topology_compose(hydrated: dict[str, Any]) -> dict:
fragment.pop("hostname", None)
fragment.pop("networks", None)
fragment["logging"] = _DOCKER_LOGGING
# ``decnet.topology.service=true`` is the marker the collector
# filters on — without it, log streams for this container are
# never attached.
labels = dict(fragment.get("labels") or {})
labels.update({
"decnet.topology.id": topology_id,
"decnet.topology.decky": name,
"decnet.topology.service_name": svc_name,
"decnet.topology.service": "true",
})
fragment["labels"] = labels
services[f"{name}-{svc_name}"] = fragment
networks: dict[str, dict] = {