3 Commits

Author SHA1 Message Date
0b1a17b4eb fix(agent): pass --always-recreate-deps so service netns shares stay fresh
Decky service containers join their base via `network_mode:
container:<base>` and Docker binds that share at service start time. If
`docker compose up` recreates a base (e.g. ports: changes after a
forwards_l3 toggle) but decides services are unchanged, services keep
a stale FD into the destroyed namespace and end up with only `lo` — so
external traffic hits a closed port on the live base and gets RST.

Hit live on the first VPS deploy: external SSH to the dmz-gateway was
refused while sshd was listening, because base and service netns
inodes had drifted apart. `--always-recreate-deps` makes compose
rebuild every dependent whenever its base is recreated, removing the
race entirely.
2026-04-27 22:55:48 -04:00
0cdcfe2653 feat(agent/collector): topology-label discovery and master-authoritative supersede
Legacy fleet deckies live in decnet-state.json; MazeNET topology
containers don't. Tag them at compose-time with
decnet.topology.service=true and let the collector match on that label.
Spin up the agent's log collector on the first successful /topology/apply
(not in the lifespan — that would break the no-docker-on-boot invariant)
and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE,
separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running
both roles can't forward its own ingest back to itself.

When master pushes a topology that differs from whatever is pinned
locally, teardown the predecessor and accept the new one. Refusing with
409 left the agent stranded after partial deploys. record_error now
persists the hydrated blob so a later teardown can still walk the LAN
list — otherwise a half-failed apply strands containers + bridges with
no breadcrumb back to them.
2026-04-21 10:23:10 -04:00
13cb0ff38e feat(agent): topology apply/teardown/state endpoints
New mTLS-protected routes on the agent:

- POST /topology/apply — master pushes {hydrated, version_hash}.
  Validates the hash matches locally (serialisation drift guard),
  runs the topology through the same validator/composer pipeline
  used master-side, then creates bridges + compose up + records the
  apply in topology.db.
- POST /topology/teardown — dismantles compose, removes bridges,
  clears topology.db.  Idempotent.
- GET /topology/state — returns applied row + live docker
  observation for the heartbeat.

Implementation lives in decnet/agent/topology_ops.py; it reuses the
private compose helpers from decnet.engine.deployer so we don't
duplicate compose/project-name plumbing.  The apply path is sync
under the hood (docker SDK + subprocess); we hop to a thread so the
event loop keeps servicing other agent traffic.

v1 is one-topology-per-agent; cross-topology apply returns 409.

Step 4 of the agent <-> topology integration.
2026-04-21 01:25:15 -04:00