feat(workers): bus-backed Workers panel (registry, control, installed flag)

Ships the backend half of Config → Workers:

* Worker registry aggregates `system.*.health` + `system.bus.health`
  heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop
  out of a 90s window (3× the 30s heartbeat interval).
* `GET /api/v1/workers` returns the snapshot plus `bus_connected`
  (so the UI can explain "all UNKNOWN" when the bus socket is down)
  and a per-row `installed` flag populated from
  `systemctl list-unit-files decnet-*.service` (cached 30s).
* `POST /api/v1/workers/{name}/stop` publishes a stop intent on
  `system.<name>.control`; workers listen via the shared control
  listener in `bus/publish.py`.
* Heartbeat + control listener wired into collector / profiler /
  sniffer / prober / mutator worker loops. API self-heartbeats too
  so the panel always has one ground-truth row.
* Topic helper `system_control(name)` + tests covering builder
  validation, control listener shutdown path, and the API surface
  (auth gating, bus-connected field, unknown-name 404).

Adds `StartFailure` / `StartAllResponse` models in anticipation of
the upcoming start endpoints (DEBT-034).
This commit is contained in:
2026-04-22 14:10:39 -04:00
parent fcaac648a4
commit 0fbb07c2ec
18 changed files with 863 additions and 10 deletions

View File

@@ -82,6 +82,16 @@ SYSTEM_BUS_HEALTH = "bus.health"
# :func:`system_health`. The leaf constant stays the same across workers;
# the worker name goes in the middle token.
SYSTEM_HEALTH = "health"
# Worker-control leaf — built per-worker as ``system.<worker>.control`` via
# :func:`system_control`. Admin-originated stop intents travel on this
# topic; each worker subscribes to its own.
SYSTEM_CONTROL = "control"
# Control payload ``action`` values — the wire vocabulary. Only ``stop`` is
# handled in v1; ``start`` is reserved because a stopped worker has no
# subscriber, so starting requires external supervision (systemd).
WORKER_CONTROL_STOP = "stop"
WORKER_CONTROL_START = "start"
# ─── Builders ────────────────────────────────────────────────────────────────
@@ -152,6 +162,23 @@ def system_health(worker: str) -> str:
return f"{SYSTEM}.{worker}.{SYSTEM_HEALTH}"
def system_control(worker: str) -> str:
"""Build ``system.<worker>.control``.
Admin-originated stop (and, eventually, start) intents are published
here; the worker in question subscribes to its own address and reacts.
Payload shape::
{"action": "stop", "requested_by": "<username>", "ts": <unix>}
*action* must be one of :data:`WORKER_CONTROL_STOP` /
:data:`WORKER_CONTROL_START`; any other value is ignored by the
listener. Same segment rules as :func:`system_health`.
"""
_reject_tokens(worker)
return f"{SYSTEM}.{worker}.{SYSTEM_CONTROL}"
def _reject_tokens(*parts: str) -> None:
"""Reject topic segments that would break NATS-style tokenization.