feat(agent): /deploy and /mutate become 202 fire-and-forget
The wizard API used to hang because /deckies/deploy ran docker compose
build && up -d synchronously, holding the request thread for minutes.
The worker side of that pipeline now returns 202 Accepted immediately
and runs the deploy in a background task started with asyncio.create_task.
On task completion (success or failure) the worker pushes a one-off
heartbeat carrying a lifecycle delta per decky:
{decky_name, operation, status: succeeded|failed, error?, completed_at}
Master pivots these onto open DeckyLifecycle rows in the heartbeat
handler (next commit). The scheduled 30s heartbeat tick is the
fallback if the immediate push drops.
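
As an illustration of the delta shape listed above, here is a small sketch of how one entry might be built. The function name `build_lifecycle_delta` is hypothetical, and the ISO 8601 UTC format for `completed_at` is an assumption; the commit message does not specify the timestamp encoding.

```python
from datetime import datetime, timezone
from typing import Optional


def build_lifecycle_delta(
    decky_name: str, operation: str, error: Optional[str] = None
) -> dict:
    """Build one per-decky delta; `error` is present only on failure."""
    delta = {
        "decky_name": decky_name,
        "operation": operation,
        "status": "succeeded" if error is None else "failed",
        # Assumed encoding: ISO 8601 timestamp in UTC.
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }
    if error is not None:
        delta["error"] = error
    return delta
```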
- decnet/agent/app.py: /deploy and /mutate return 202; dry_run mutate
  still validates synchronously and returns 200.
- decnet/agent/executor.py: deploy_async + mutate_async wrap the work
  and push the completion delta.
- decnet/agent/heartbeat.py: push_lifecycle_delta() helper builds a
  one-off body and POSTs with the same mTLS context as the loop.
- decnet/swarm/client.py: revert deploy/mutate to control timeout
  (master no longer holds the HTTP request open for compose work).
Worker state.json gains no lifecycle field -- master DeckyLifecycle is
the source of truth; the master sweep handles crashed-mid-deploy
recovery.
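
The wrap-and-report behaviour attributed to deploy_async and mutate_async in the file list above could look roughly like the sketch below. It is not the actual executor code: `run_and_report`, `work`, and `push` are hypothetical names, and swallowing push errors is an assumption based on the statement that the scheduled heartbeat is the fallback.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable


async def run_and_report(
    decky_name: str,
    operation: str,
    work: Callable[[], Awaitable[None]],
    push: Callable[[dict], Awaitable[None]],
) -> dict:
    """Run `work`, then push one delta whether it succeeded or failed."""
    delta = {"decky_name": decky_name, "operation": operation}
    try:
        await work()
        delta["status"] = "succeeded"
    except Exception as exc:
        delta["status"] = "failed"
        delta["error"] = str(exc)
    delta["completed_at"] = datetime.now(timezone.utc).isoformat()
    try:
        await push(delta)
    except Exception:
        # Assumed behaviour: a dropped push is tolerated because the
        # scheduled heartbeat tick is the fallback.
        pass
    return delta
```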
@@ -142,8 +142,11 @@ async def test_client_mutate_unknown_decky_404(
         async with swarm_client.AgentClient(
             address="127.0.0.1", agent_port=port, identity=master_id,
         ) as agent:
+            # Only dry_run can surface 404 synchronously; the live path is
+            # 202 fire-and-forget and would surface failure via the
+            # heartbeat lifecycle delta.
             with pytest.raises(httpx.HTTPStatusError) as ei:
-                await agent.mutate("ghost", ["ssh"])
+                await agent.mutate("ghost", ["ssh"], dry_run=True)
             assert ei.value.response.status_code == 404
     finally:
         server.should_exit = True