feat(workers): bus-backed Workers panel (registry, control, installed flag)

Ships the backend half of Config → Workers:

* Worker registry aggregates `system.*.health` + `system.bus.health`
  heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop
  out of a 90s window (3× the 30s heartbeat interval).
* `GET /api/v1/workers` returns the snapshot plus `bus_connected`
  (so the UI can explain "all UNKNOWN" when the bus socket is down)
  and a per-row `installed` flag populated from
  `systemctl list-unit-files decnet-*.service` (cached 30s).
* `POST /api/v1/workers/{name}/stop` publishes a stop intent on
  `system.<name>.control`; workers listen via the shared control
  listener in `bus/publish.py`.
* Heartbeat + control listener wired into collector / profiler /
  sniffer / prober / mutator worker loops. API self-heartbeats too
  so the panel always has one ground-truth row.
* Topic helper `system_control(name)` + tests covering builder
  validation, control listener shutdown path, and the API surface
  (auth gating, bus-connected field, unknown-name 404).

Adds `StartFailure` / `StartAllResponse` models in anticipation of
the upcoming start endpoints (DEBT-034).
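
The OK / STALE / UNKNOWN tiering described in the commit message can be illustrated with a minimal sketch. This is not the registry code from the commit: the names `classify`, `snapshot`, `HEARTBEAT_INTERVAL_S`, and `STALE_AFTER_S` are hypothetical, and the only facts taken from the message are the 30s heartbeat interval, the 90s window, and the last-seen dict.

```python
from typing import Dict, List, Optional

# Values taken from the commit message; the constant names are assumptions.
HEARTBEAT_INTERVAL_S = 30.0
STALE_AFTER_S = 3 * HEARTBEAT_INTERVAL_S  # 90s window


def classify(last_seen: Optional[float], now: float) -> str:
    """Map a last-heartbeat timestamp to ok / stale / unknown."""
    if last_seen is None:
        return "unknown"  # no heartbeat was ever received for this name
    # A heartbeat exactly 90s old counts as stale ("90s+").
    return "ok" if (now - last_seen) < STALE_AFTER_S else "stale"


def snapshot(last_seen: Dict[str, float], expected: List[str], now: float) -> List[dict]:
    """Build one row per expected or previously seen worker (hypothetical helper)."""
    rows = []
    for name in sorted(set(expected) | set(last_seen)):
        ts = last_seen.get(name)
        rows.append({
            "name": name,
            "status": classify(ts, now),
            "last_heartbeat_ts": ts,
            "seconds_since": None if ts is None else now - ts,
        })
    return rows
```

Passing `now` explicitly rather than calling `time.time()` inside the helpers keeps the classification deterministic, which makes the 90s boundary easy to test.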
@@ -1,5 +1,5 @@
 from datetime import datetime, timezone
-from typing import Literal, Optional, Any, List, Annotated
+from typing import Dict, Literal, Optional, Any, List, Annotated
 from uuid import uuid4
 from sqlalchemy import Column, Index, Text, UniqueConstraint
 from sqlalchemy.dialects.mysql import MEDIUMTEXT
@@ -452,6 +452,52 @@ class HealthResponse(BaseModel):
     components: dict[str, ComponentHealth]
 
 
+# --- Workers panel (Config → Workers) ---
+# Bus-backed health + control: workers heartbeat on ``system.<name>.health``
+# and listen on ``system.<name>.control``. The API aggregates last-seen
+# heartbeats via the worker registry; these are the HTTP-facing shapes.
+
+class WorkerStatus(BaseModel):
+    name: str
+    # ``ok`` — heartbeat within 90s (3× 30s heartbeat interval)
+    # ``stale`` — worker was seen before but hasn't pulsed in 90s+
+    # ``unknown`` — we've never received a heartbeat from this name
+    status: Literal["ok", "stale", "unknown"]
+    last_heartbeat_ts: Optional[float] = None
+    seconds_since: Optional[float] = None
+    # Whatever the worker's ``extra()`` callback put in the heartbeat;
+    # opaque to the panel, displayed only if the UI knows the key.
+    extra: Dict[str, Any] = PydanticField(default_factory=dict)
+    # True iff a ``decnet-<name>.service`` unit file is present on the
+    # host. False flips the UI START button to disabled with a
+    # "Unit not installed" tooltip. Default True for backwards compat
+    # on clients that pre-date the field.
+    installed: bool = True
+
+
+class WorkersResponse(BaseModel):
+    workers: List[WorkerStatus]
+    generated_at: float
+    bus_connected: bool
+
+
+class WorkerControlResponse(BaseModel):
+    accepted: bool
+    worker: str
+    action: str
+
+
+class StartFailure(BaseModel):
+    name: str
+    reason: str
+
+
+class StartAllResponse(BaseModel):
+    started: List[str]
+    already_running: List[str]
+    failed: List[StartFailure]
+
+
 # --- Swarm API DTOs ---
 # Request/response contracts for the master-side swarm controller
 # (decnet/web/swarm_api.py). The underlying SQLModel tables — SwarmHost and
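
The `installed` field on `WorkerStatus` is described as reflecting whether a `decnet-<name>.service` unit file exists, populated from `systemctl list-unit-files decnet-*.service` and cached for 30s. A rough sketch of how such a lookup could work follows. The class `InstalledUnits`, the function `parse_unit_files`, and the injectable `runner` / `clock` parameters are assumptions made for illustration, not the code from this commit; `--no-legend` is a standard `systemctl` option that suppresses the header and footer lines.

```python
import subprocess
import time
from typing import Callable, Optional, Set

CACHE_TTL_S = 30.0  # cache lifetime from the commit message
_PREFIX, _SUFFIX = "decnet-", ".service"


def parse_unit_files(output: str) -> Set[str]:
    """Extract worker names from `systemctl list-unit-files` output."""
    names: Set[str] = set()
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        unit = fields[0]  # first column is the unit file name
        if unit.startswith(_PREFIX) and unit.endswith(_SUFFIX):
            names.add(unit[len(_PREFIX):-len(_SUFFIX)])
    return names


def _run_systemctl() -> str:
    """Shell out to systemctl; returns an empty string if nothing matches."""
    result = subprocess.run(
        ["systemctl", "list-unit-files", "decnet-*.service", "--no-legend"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


class InstalledUnits:
    """Caches the set of installed worker names for a fixed TTL (hypothetical)."""

    def __init__(self, runner: Callable[[], str] = _run_systemctl,
                 ttl: float = CACHE_TTL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._runner, self._ttl, self._clock = runner, ttl, clock
        self._cached: Optional[Set[str]] = None
        self._fetched_at = 0.0

    def names(self) -> Set[str]:
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = parse_unit_files(self._runner())
            self._fetched_at = now
        return self._cached

    def is_installed(self, name: str) -> bool:
        return name in self.names()
```

Injecting the runner and clock keeps the parsing and cache expiry testable on hosts without systemd.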