Legacy fleet deckies live in decnet-state.json; MazeNET topology
containers don't. Tag them at compose-time with
decnet.topology.service=true and let the collector match on that label.
Spin up the agent's log collector on the first successful /topology/apply
(not in the lifespan — that would break the no-docker-on-boot invariant)
and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE,
separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running
both roles can't forward its own ingest back to itself.
When master pushes a topology that differs from whatever is pinned
locally, teardown the predecessor and accept the new one. Refusing with
409 left the agent stranded after partial deploys. record_error now
persists the hydrated blob so a later teardown can still walk the LAN
list — otherwise a half-failed apply strands containers + bridges with
no breadcrumb back to them.
Two small observability follow-ups to the phase-1 agent/topology wiring:
TopologySummary now carries needs_resync so operators can see the
heartbeat's resync flag via the topology list/detail API without
dropping into the DB.
TopologyStore.record_error becomes an upsert — when a docker/compose
failure fires during the first materialise (put() never reached), we
still land a marker row so GET /topology/state surfaces the error and
the next heartbeat carries an empty applied_version_hash. That empty
hash is what master's heartbeat check relies on to flag the topology
for resync instead of assuming the apply succeeded.
New mTLS-protected routes on the agent:
- POST /topology/apply — master pushes {hydrated, version_hash}.
Validates the hash matches locally (serialisation drift guard),
runs the topology through the same validator/composer pipeline
used master-side, then creates bridges + compose up + records the
apply in topology.db.
- POST /topology/teardown — dismantles compose, removes bridges,
clears topology.db. Idempotent.
- GET /topology/state — returns applied row + live docker
observation for the heartbeat.
Implementation lives in decnet/agent/topology_ops.py; it reuses the
private compose helpers from decnet.engine.deployer so we don't
duplicate compose/project-name plumbing. The apply path is sync
under the hood (docker SDK + subprocess); we hop to a thread so the
event loop keeps servicing other agent traffic.
v1 is one-topology-per-agent; cross-topology apply returns 409.
Step 4 of the agent <-> topology integration.
Single-row sqlite tracking which topology the agent last applied and
its version hash. Sync/stdlib, same pattern as the log-forwarder
offset store. v1 is one-topology-per-agent; attempting to apply a
different topology over a populated row raises AlreadyApplied so the
endpoint can return 409. observed() snapshots live docker state
(decnet-topology-* bridges + decnet-* containers) for the heartbeat.
The store is a cache, not authority — no auto-restore on boot.
Master remains the only source of truth.
Step 3 of the agent <-> topology integration.