feat(agent): /deploy and /mutate become 202 fire-and-forget
The wizard API used to hang because /deckies/deploy ran docker compose
build && up -d synchronously, holding the request thread for minutes.
The worker side of that pipeline now returns 202 Accepted immediately
and runs the deploy in a background task started with
asyncio.create_task.
On task completion (success or failure) the worker pushes a one-off
heartbeat carrying a lifecycle delta per decky:
    {decky_name, operation, status: succeeded|failed, error?, completed_at}
The master applies these to open DeckyLifecycle rows in the heartbeat
handler (next commit). The scheduled 30s heartbeat tick is the
fallback if the immediate push is dropped.
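A minimal sketch of building one such per-decky delta, using the field names from the shape above. The helper name build_lifecycle_delta, the ISO 8601 UTC timestamp format, and the example operation values are assumptions, not taken from the code in this commit.

```python
from datetime import datetime, timezone
from typing import Optional


def build_lifecycle_delta(
    decky_name: str,
    operation: str,
    error: Optional[str] = None,
) -> dict:
    """Build one delta; status is derived from whether an error was given."""
    delta = {
        "decky_name": decky_name,
        "operation": operation,  # e.g. "deploy" or "mutate" (assumed values)
        "status": "failed" if error is not None else "succeeded",
        # Timestamp format is an assumption: timezone-aware UTC, ISO 8601.
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }
    if error is not None:
        delta["error"] = error  # optional field, present only on failure
    return delta
```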
- decnet/agent/app.py: /deploy and /mutate return 202; dry_run mutate
still validates synchronously and returns 200.
- decnet/agent/executor.py: deploy_async + mutate_async wrap the work
and push the completion delta.
- decnet/agent/heartbeat.py: push_lifecycle_delta() helper builds a
one-off body and POSTs with the same mTLS context as the loop.
- decnet/swarm/client.py: revert deploy/mutate to control timeout
(master no longer holds the HTTP request open for compose work).
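The wrap-and-report behaviour described for executor.py can be sketched as below. This is an assumption-laden illustration: run_and_report, work and push are hypothetical names, and the real deploy_async, mutate_async and push_lifecycle_delta() may be structured differently.

```python
from datetime import datetime, timezone
from typing import Awaitable, Callable


async def run_and_report(
    decky_name: str,
    operation: str,
    work: Callable[[], Awaitable[None]],
    push: Callable[[dict], Awaitable[None]],
) -> bool:
    """Run the work, then push a terminal delta on success or failure.

    Returns True if the push was delivered, False if it was dropped.
    """
    delta = {"decky_name": decky_name, "operation": operation, "status": "succeeded"}
    try:
        await work()
    except Exception as exc:
        # Report the failure instead of letting the background task die silently.
        delta["status"] = "failed"
        delta["error"] = str(exc)
    delta["completed_at"] = datetime.now(timezone.utc).isoformat()
    try:
        await push(delta)
    except Exception:
        # Immediate push dropped; the scheduled heartbeat tick is the fallback.
        return False
    return True
```

Swallowing the push error is deliberate in this sketch: the task has nobody to raise to, and the periodic heartbeat covers the gap.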
Worker state.json gains no lifecycle field -- master DeckyLifecycle is
the source of truth; the master sweep handles crashed-mid-deploy
recovery.
@@ -246,13 +246,11 @@ class AgentClient:
             "dry_run": dry_run,
             "no_cache": no_cache,
         }
-        # Swap in a long-deploy timeout for this call only.
-        old = self._require_client().timeout
-        self._require_client().timeout = _TIMEOUT_DEPLOY
-        try:
-            resp = await self._require_client().post("/deploy", json=body)
-        finally:
-            self._require_client().timeout = old
+        # Worker /deploy is async (202 fire-and-forget): the response only
+        # acks acceptance; the real work runs in the agent's event loop
+        # and reports terminal state via heartbeat lifecycle deltas. No
+        # need for the long deploy timeout here.
+        resp = await self._require_client().post("/deploy", json=body)
         resp.raise_for_status()
         return resp.json()
 
@@ -268,14 +266,8 @@ class AgentClient:
             "services": list(services),
             "dry_run": dry_run,
         }
-        # Worker /mutate runs `compose up -d` which can pull/build; same
-        # long-tail latency as /deploy. Swap the deploy timeout in.
-        old = self._require_client().timeout
-        self._require_client().timeout = _TIMEOUT_DEPLOY
-        try:
-            resp = await self._require_client().post("/mutate", json=body)
-        finally:
-            self._require_client().timeout = old
+        # Worker /mutate is async (202): control-timeout is right.
+        resp = await self._require_client().post("/mutate", json=body)
         resp.raise_for_status()
         return resp.json()
 