feat(workers): add start + start-all endpoints (systemd supervisor)

POST /api/v1/workers/{name}/start returns 202 on acceptance, 404 for an
unknown worker, 503 if the unit file is not installed, and 502 if
systemctl exits non-zero (stderr snippet in detail, full stack logged).
Admin only.
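
The status mapping above can be sketched as a small pure function. This is
an illustration only: `start_status`, its parameters, the order of the
checks, and the contents of `KNOWN_WORKERS` are assumptions, not the code
added by this commit.

```python
from http import HTTPStatus

# Hypothetical worker set; the real KNOWN_WORKERS contents are not shown here.
KNOWN_WORKERS = frozenset({"bus", "api", "data-plane"})


def start_status(name: str, unit_installed: bool, returncode: int) -> HTTPStatus:
    """Map the outcome of a start request to the HTTP status described above."""
    if name not in KNOWN_WORKERS:
        return HTTPStatus.NOT_FOUND            # 404: unknown worker
    if not unit_installed:
        return HTTPStatus.SERVICE_UNAVAILABLE  # 503: unit file not installed
    if returncode != 0:
        return HTTPStatus.BAD_GATEWAY          # 502: systemctl exited non-zero
    return HTTPStatus.ACCEPTED                 # 202: start request accepted
```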

POST /api/v1/workers/start-all is best-effort: it walks the worker list
in dependency order (bus → api → data-plane), skips already-active
and uninstalled units, aggregates outcomes into
{started, already_running, failed[]}. Returns 200 even on partial
failure; the caller reads the three lists.
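
A minimal sketch of the best-effort walk described above, assuming three
injected callables (`is_installed`, `is_active`, `start`) that stand in for
the systemd helper. The names in `START_ORDER` and the shape of each
`failed` entry are hypothetical.

```python
from typing import Callable

# Dependency order taken from the commit message; the names are illustrative.
START_ORDER = ("bus", "api", "data-plane")


def start_all(
    is_installed: Callable[[str], bool],
    is_active: Callable[[str], bool],
    start: Callable[[str], None],
) -> dict:
    """Start every installed, inactive unit in order and collect the outcomes."""
    result = {"started": [], "already_running": [], "failed": []}
    for name in START_ORDER:
        if not is_installed(name):
            continue  # uninstalled units are skipped without being reported
        if is_active(name):
            result["already_running"].append(name)
            continue
        try:
            start(name)
        except Exception as exc:  # best-effort: record the failure and keep going
            result["failed"].append({"name": name, "detail": str(exc)})
        else:
            result["started"].append(name)
    return result
```

Because a failure is recorded instead of raised, the endpoint can return 200
and leave interpretation of the three lists to the caller.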

Both endpoints delegate to the systemd_control helper, so the attack
surface for "what gets executed" is locked to
`decnet-<validated-name>.service` at two layers (router KNOWN_WORKERS +
helper regex).
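
The two-layer check can be illustrated as follows. This is a sketch under
assumptions: the allowlist contents, the regex pattern, and the function
names `unit_name` and `resolve_unit` are hypothetical and not taken from
the systemd_control helper.

```python
import re

# Layer 1 (router): allowlist of worker names. Contents are hypothetical.
KNOWN_WORKERS = frozenset({"bus", "api"})

# Layer 2 (helper): assumed pattern; the real helper regex is not shown.
_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,31}")


def unit_name(name: str) -> str:
    """Helper layer: refuse any name that is not a plain lowercase token."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid worker name: {name!r}")
    return f"decnet-{name}.service"


def resolve_unit(name: str) -> str:
    """Router layer: reject unknown workers before the helper is reached."""
    if name not in KNOWN_WORKERS:
        raise LookupError(f"unknown worker: {name!r}")
    return unit_name(name)
```

Even if a caller bypassed the allowlist, the helper would still refuse a
name containing path separators or shell metacharacters.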
2026-04-22 14:12:29 -04:00
parent 0fbb07c2ec
commit 13ea916943
4 changed files with 350 additions and 0 deletions

@@ -24,6 +24,8 @@ from .config.api_reinit import router as config_reinit_router
from .health.api_get_health import router as health_router
from .workers.api_list_workers import router as workers_list_router
from .workers.api_control_worker import router as workers_control_router
from .workers.api_start_worker import router as workers_start_router
from .workers.api_start_all_workers import router as workers_start_all_router
from .artifacts.api_get_artifact import router as artifacts_router
from .swarm_updates import swarm_updates_router
from .swarm_mgmt import swarm_mgmt_router
@@ -73,6 +75,8 @@ api_router.include_router(stream_router)
api_router.include_router(health_router)
api_router.include_router(workers_list_router)
api_router.include_router(workers_control_router)
api_router.include_router(workers_start_router)
api_router.include_router(workers_start_all_router)
# Configuration
api_router.include_router(config_get_router)