fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right

Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts
TopologyDecky rows where state='running'.  No code path was flipping
that state away from the default 'pending', so the count read 0/N
even when every container was running fine — the dashboard was lying.

Two complementary fixes:

1. deploy_topology — after the post-deploy compose ps verification,
   reconcile each TopologyDecky.state from the corresponding base
   container's docker state.  running → 'running'; anything else →
   'failed'.  Reuses the ps_rows already gathered for the
   ACTIVE-vs-DEGRADED status decision; no extra docker hit.

2. apply_add_decky — _materialise_decky_spawn now returns True/False;
   on True the row is updated to state='running' before
   _assert_valid_after.  Catches the case where a decky added via the
   live mutator queue stays at 'pending' indefinitely (the deployer's
   reconcile only runs on a fresh deploy_topology pass).
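
A minimal sketch of the reconcile mapping described in fix 1, not the
actual implementation.  It assumes each ps row is a dict with "Name"
and "State" keys (the shape `docker compose ps --format json` emits)
and that a decky's base container shares the decky's name; the
function name reconcile_decky_states is hypothetical.

```python
from typing import Iterable, Mapping


def reconcile_decky_states(
    decky_names: Iterable[str],
    ps_rows: Iterable[Mapping[str, str]],
) -> dict[str, str]:
    """Map each decky name to the state its row should be set to."""
    # Assumption: rows carry "Name" and "State"; the base container is
    # named after the decky.  Both are illustrative, not confirmed.
    state_by_container = {row["Name"]: row.get("State", "") for row in ps_rows}
    return {
        # running -> 'running'; anything else, including a container
        # missing from the ps output, -> 'failed'.
        name: "running" if state_by_container.get(name) == "running" else "failed"
        for name in decky_names
    }
```

Because the mapping is built from rows that were already collected,
it needs no additional docker call.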

Deckies in already-active topologies will still read as 'pending'
until the next deploy_topology pass, since this fix is forward-only.
Operators can tear down and redeploy, or run the (forthcoming)
reconcile-on-startup pass.
2026-04-29 11:09:32 -04:00
parent 57e527534c
commit d314470d7f
3 changed files with 74 additions and 7 deletions

@@ -130,6 +130,23 @@ async def test_add_decky_spawns_base_and_service_containers(repo, stubs):
    assert "newbox-ssh" in args


@pytest.mark.anyio
async def test_add_decky_flips_state_to_running_after_spawn(repo, stubs):
    """Without this the dashboard's ACTIVE DECKIES count reads 0/N."""
    tid = await _make_active(repo)
    lans = await repo.list_lans_for_topology(tid)
    home_lan = lans[0]["name"]
    await apply_add_decky(repo, tid, {
        "name": "newrunner",
        "lan": home_lan,
        "services": [],
    })
    rows = await repo.list_topology_deckies(tid)
    new = next(r for r in rows if r["name"] == "newrunner")
    assert new["state"] == "running"


@pytest.mark.anyio
async def test_add_decky_skips_materialisation_when_pending(repo, stubs):
    """Pending topology gets DB write only — deploy_topology will spawn."""