feat(workers): bus-backed Workers panel (registry, control, installed flag)

Ships the backend half of Config → Workers:

* Worker registry aggregates `system.*.health` + `system.bus.health`
  heartbeats into a last-seen dict; the OK / STALE / UNKNOWN tiers are
  derived from a 90s window (3× the 30s heartbeat interval).
* `GET /api/v1/workers` returns the snapshot plus `bus_connected`
  (so the UI can explain "all UNKNOWN" when the bus socket is down)
  and a per-row `installed` flag populated from
  `systemctl list-unit-files decnet-*.service` (cached 30s).
* `POST /api/v1/workers/{name}/stop` publishes a stop intent on
  `system.<name>.control`; workers listen via the shared control
  listener in `bus/publish.py`.
* Heartbeat + control listener wired into collector / profiler /
  sniffer / prober / mutator worker loops. API self-heartbeats too
  so the panel always has one ground-truth row.
* Topic helper `system_control(name)` + tests covering builder
  validation, control listener shutdown path, and the API surface
  (auth gating, bus-connected field, unknown-name 404).
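
The tiering described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the registry code from this commit: `classify`, `snapshot`, and the constant names are hypothetical, and treating exactly 90s as stale is an assumption based on the "90s+" wording in the model comments.

```python
import time
from typing import Dict, List, Optional

HEARTBEAT_INTERVAL_S = 30.0                 # heartbeat interval from the commit message
STALE_AFTER_S = 3 * HEARTBEAT_INTERVAL_S    # 90s window


def classify(last_seen: Optional[float], now: float) -> str:
    """Map a last-seen timestamp to one of the ok / stale / unknown tiers."""
    if last_seen is None:
        return "unknown"  # never received a heartbeat from this name
    # Assumption: a heartbeat exactly 90s old already counts as stale.
    return "ok" if (now - last_seen) < STALE_AFTER_S else "stale"


def snapshot(
    last_seen_by_name: Dict[str, float],
    known_names: List[str],
    now: Optional[float] = None,
) -> Dict[str, str]:
    """Classify every known worker name (hypothetical helper)."""
    now = time.time() if now is None else now
    return {name: classify(last_seen_by_name.get(name), now) for name in known_names}
```

Passing `now` explicitly keeps the classification deterministic in tests; a worker that has never pulsed simply has no entry in the last-seen dict.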

Adds `StartFailure` / `StartAllResponse` models in anticipation of
the upcoming start endpoints (DEBT-034).
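
As a rough sketch of how these models might be filled once the start endpoints exist: the start endpoints are not part of this commit, so `start_all` and its `start` callback are purely hypothetical, and plain dataclasses stand in for the Pydantic models so the snippet runs without dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StartFailure:  # stand-in mirroring the fields of the model in this commit
    name: str
    reason: str


@dataclass
class StartAllResponse:  # stand-in mirroring the fields of the model in this commit
    started: List[str] = field(default_factory=list)
    already_running: List[str] = field(default_factory=list)
    failed: List[StartFailure] = field(default_factory=list)


def start_all(statuses: Dict[str, str], start: Callable[[str], None]) -> StartAllResponse:
    """Hypothetical: try to start every worker that is not currently ``ok``."""
    resp = StartAllResponse()
    for name, status in statuses.items():
        if status == "ok":
            resp.already_running.append(name)
            continue
        try:
            start(name)  # assumed to raise on failure
        except Exception as exc:
            resp.failed.append(StartFailure(name=name, reason=str(exc)))
        else:
            resp.started.append(name)
    return resp
```

Reporting failures per worker, rather than aborting on the first error, would let a UI show a partial result for a "start all" action.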
2026-04-22 14:10:39 -04:00
parent fcaac648a4
commit 0fbb07c2ec
18 changed files with 863 additions and 10 deletions


@@ -1,5 +1,5 @@
from datetime import datetime, timezone
-from typing import Literal, Optional, Any, List, Annotated
+from typing import Dict, Literal, Optional, Any, List, Annotated
from uuid import uuid4
from sqlalchemy import Column, Index, Text, UniqueConstraint
from sqlalchemy.dialects.mysql import MEDIUMTEXT
@@ -452,6 +452,52 @@ class HealthResponse(BaseModel):
    components: dict[str, ComponentHealth]

# --- Workers panel (Config → Workers) ---
# Bus-backed health + control: workers heartbeat on ``system.<name>.health``
# and listen on ``system.<name>.control``. The API aggregates last-seen
# heartbeats via the worker registry; these are the HTTP-facing shapes.
class WorkerStatus(BaseModel):
    name: str
    # ``ok`` - heartbeat within 90s (3× 30s heartbeat interval)
    # ``stale`` - worker was seen before but hasn't pulsed in 90s+
    # ``unknown`` - we've never received a heartbeat from this name
    status: Literal["ok", "stale", "unknown"]
    last_heartbeat_ts: Optional[float] = None
    seconds_since: Optional[float] = None
    # Whatever the worker's ``extra()`` callback put in the heartbeat;
    # opaque to the panel, displayed only if the UI knows the key.
    extra: Dict[str, Any] = PydanticField(default_factory=dict)
    # True iff a ``decnet-<name>.service`` unit file is present on the
    # host. False flips the UI START button to disabled with a
    # "Unit not installed" tooltip. Default True for backwards compat
    # on clients that pre-date the field.
    installed: bool = True

class WorkersResponse(BaseModel):
    workers: List[WorkerStatus]
    generated_at: float
    bus_connected: bool

class WorkerControlResponse(BaseModel):
    accepted: bool
    worker: str
    action: str

class StartFailure(BaseModel):
    name: str
    reason: str

class StartAllResponse(BaseModel):
    started: List[str]
    already_running: List[str]
    failed: List[StartFailure]

# --- Swarm API DTOs ---
# Request/response contracts for the master-side swarm controller
# (decnet/web/swarm_api.py). The underlying SQLModel tables — SwarmHost and