feat(swarm): self-destruct agent on decommission

Decommissioning a worker from the dashboard (or swarm controller) now
asks the agent to wipe its own install before the master forgets it.
The agent stops decky containers + every decnet-* systemd unit, then
deletes /opt/decnet*, /etc/systemd/system/decnet-*, /var/lib/decnet/*,
and /usr/local/bin/decnet*. Logs under /var/log are preserved.

The reaper runs as a detached /tmp script (start_new_session=True) so
it survives the agent process being killed. Self-destruct dispatch is
best-effort — a dead worker doesn't block master-side cleanup.
This commit is contained in:
2026-04-19 20:47:09 -04:00
parent 9d68bb45c7
commit 14250cacad
7 changed files with 231 additions and 2 deletions

View File

@@ -87,6 +87,20 @@ async def teardown(req: TeardownRequest) -> dict:
return {"status": "torn_down", "decky_id": req.decky_id}
@app.post("/self-destruct")
async def self_destruct() -> dict:
"""Stop all DECNET services on this worker and delete the install
footprint. Called by the master during decommission. Logs under
/var/log/decnet* are preserved. Fire-and-forget — returns 202 before
the reaper starts deleting files."""
try:
await _exec.self_destruct()
except Exception as exc:
log.exception("agent.self_destruct failed")
raise HTTPException(status_code=500, detail=str(exc)) from exc
return {"status": "self_destruct_scheduled"}
@app.post("/mutate")
async def mutate(req: MutateRequest) -> dict:
# Service rotation is routed through the deployer's existing mutate path