Commit Graph

13 Commits

Author SHA1 Message Date
ee24a7551f fix(types): T7 — eliminate all remaining 38 mypy errors; fix DeckyRow subscript in engine tests 2026-05-01 02:07:53 -04:00
f597ab2810 fix(types): T1 — remove 15 stale type: ignore comments confirmed unused by mypy 2026-05-01 01:26:24 -04:00
542d129d6f refactor(services_live): replace string-sniffed error dispatch with typed exception subclasses
ServiceNotFoundError (→ 404) and ServiceConflictError (→ 409) replace the
"not found" / "already on" / "not on" substring checks in _map_mutation_error;
base ServiceMutationError still maps to 422. Fixes three pre-existing test
status-code assertions (201 vs 200 on POST endpoints).
2026-04-30 20:49:29 -04:00
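The typed dispatch described in 542d129d6f can be pictured with a minimal sketch. The three exception names and the 404/409/422 mapping come from the commit message; the function name `map_mutation_error` and its integer return type are assumptions made for illustration, not the signature of the real `_map_mutation_error`.

```python
class ServiceMutationError(Exception):
    """Base error for a rejected service mutation (maps to 422)."""


class ServiceNotFoundError(ServiceMutationError):
    """Something the mutation refers to is missing (maps to 404)."""


class ServiceConflictError(ServiceMutationError):
    """Service is already on, or not on, the decky (maps to 409)."""


def map_mutation_error(exc: ServiceMutationError) -> int:
    # Hypothetical helper: pick the HTTP status from the exception type
    # instead of sniffing substrings in the message.
    # Subclasses are checked before the base-class fallback.
    if isinstance(exc, ServiceNotFoundError):
        return 404
    if isinstance(exc, ServiceConflictError):
        return 409
    return 422
```

Because the status is derived from the type, rewording an error message can no longer change the HTTP status by accident.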
c78ba6f698 fix(deploy): pre-remove container by name before force-recreate
Docker Compose tracks the previous container by internal ID. When that
container was already removed or renamed, --force-recreate fails with
"No such container". Remove by name first so Compose always starts clean.
2026-04-30 11:54:00 -04:00
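A rough sketch of the ordering described in c78ba6f698, assuming the deploy step shells out to the Docker CLI. The function name `recreate_clean`, the injectable `run` callable, and the `compose_file` parameter are hypothetical; only `docker rm -f` and `docker compose up -d --force-recreate` are taken as given.

```python
import subprocess
from typing import Callable, Sequence


def _run(argv: Sequence[str]) -> int:
    # Default runner: execute the command and hand back its exit code.
    return subprocess.run(list(argv), check=False).returncode


def recreate_clean(
    container_name: str,
    compose_file: str,
    run: Callable[[Sequence[str]], int] = _run,
) -> int:
    # Step 1: remove any leftover container by name. The exit code is
    # ignored on purpose, since the container may already be gone.
    run(["docker", "rm", "-f", container_name])
    # Step 2: let Compose recreate the services from a clean slate.
    return run(
        ["docker", "compose", "-f", compose_file, "up", "-d", "--force-recreate"]
    )
```

Passing the runner as a parameter keeps the ordering testable without a Docker daemon.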
eefab020d4 fix(swarm): propagate service mutations to worker agent via shard re-dispatch
Add/remove/update_config on a fleet decky living on a swarm worker (or on an
agent-pinned topology) used to run only the master's local docker-compose,
which has no containers for the remote decky. The mutation persisted on the
master and silently no-op'd on the worker.

- Fleet swarm: lookup DeckyShard.host_uuid; if found, rebuild a single-host
  shard from master state and call dispatch_decnet_config — same proven path
  as POST /swarm/deploy. Skip local _compose (no containers to touch).
- Topology agent-pinned: call decnet.engine.deployer.resync_agent_topology
  (existing helper) to push the latest hydrated blob to the worker.
- Local-only deckies: behaviour unchanged.
- Tests: 5 new in tests/engine/test_services_live_swarm.py covering all
  three mutations on a swarm fleet decky (no local _compose, dispatch fires
  with the right host's deckies), the apply=False save-only path (no
  dispatch), and a regression check that a local-only fleet add still runs
  local compose.

Bus signal `decky.{name}.service_config_changed` keeps publishing as an
audit trail; it is not the propagation trigger.
2026-04-29 12:51:16 -04:00
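The routing rules in eefab020d4 can be summarised as a small decision function. This is only a sketch under stated assumptions: the `MutationTarget` dataclass, the `choose_propagation` name, and the returned labels are hypothetical, and treating `apply=False` as save-only for every scope generalises what the commit states for the swarm fleet case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MutationTarget:
    # Hypothetical view of where a decky lives.
    host_uuid: Optional[str] = None  # set when a DeckyShard row exists
    agent_pinned: bool = False       # set for an agent-pinned topology


def choose_propagation(target: MutationTarget, apply: bool = True) -> str:
    if not apply:
        # Save-only: persist the change, touch no containers, dispatch nothing.
        return "save-only"
    if target.host_uuid is not None:
        # Fleet swarm: rebuild a single-host shard and re-dispatch it.
        return "dispatch-shard"
    if target.agent_pinned:
        # Topology pinned to an agent: push the latest blob to the worker.
        return "resync-agent-topology"
    # Local-only decky: behaviour unchanged, run local compose.
    return "local-compose"
```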
94b06ee862 feat(services): initial config on ADD SERVICE — schema modal in DeckyCard, MazeNET drag, and Inspector
- DeckyServiceAddRequest gains an optional `config: dict` field, validated
  against the service's config_schema before any state mutation (400 on
  bad type, no half-written rows).
- Engine: add_service threads `config` into _add_topology_service /
  _add_fleet_service, persisting validated cfg to decky_config.service_config
  BEFORE compose regen so the first `up -d --build` materialises the env on
  the new container. No follow-up apply needed.
- Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by:
    * DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component)
    * MazeNET Inspector's ADD SERVICE picker
    * MazeNET palette drag-drop onto a deployed decky
  Empty-schema services short-circuit to a one-click add (no modal flash).
  Operator can cancel; errors surface in the modal.
- Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent
  on bad types, back-compat empty-config.
- Drive-by: fix stale repo-method names in test_services_live.py
  (create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper,
  service.added → service_added topic).
2026-04-29 12:44:47 -04:00
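The validation behaviour listed in 94b06ee862 (persist valid keys, drop unknown keys, reject bad types, accept an empty config) might look like the sketch below. It assumes a simplified schema that maps field names to Python types; the real `config_schema` / `ServiceConfigField` structure is not shown in the commit, and `validate_config` and `ConfigValidationError` are hypothetical names.

```python
from typing import Any, Dict, Mapping, Optional


class ConfigValidationError(ValueError):
    """Raised on a bad value type; an API layer could map this to 400."""


def validate_config(
    config: Optional[Mapping[str, Any]],
    schema: Mapping[str, type],
) -> Dict[str, Any]:
    cleaned: Dict[str, Any] = {}
    # A missing or empty config is accepted for backwards compatibility.
    for key, value in (config or {}).items():
        expected = schema.get(key)
        if expected is None:
            continue  # unknown key: dropped rather than rejected
        # bool is a subclass of int, so refuse it for non-bool fields.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if not isinstance(value, expected) or is_stray_bool:
            raise ConfigValidationError(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
        cleaned[key] = value
    # Nothing has been mutated yet; the caller persists `cleaned` afterwards.
    return cleaned
```

Validating into a fresh dict before any write is one way to get the "no half-written rows" property the commit mentions.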
75b1ce3a31 feat(api): per-service config schema endpoint + PUT/POST update+apply for fleet & topology
- GET /topologies/services/{name}/schema serves the declared ServiceConfigField
  metadata so the Inspector can auto-render forms.
- PUT  /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the
  validated dict (DB + compose); container untouched (Save).
- POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then
  force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive).
- New engine helper update_service_config wires both fleet and topology paths
  through the existing _persist_fleet_change / _rerender_topology_compose
  machinery; emits decky.<name>.service_config_changed on the bus.
2026-04-29 11:38:06 -04:00
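The `/(topologies/{id}/)` notation in 75b1ce3a31 means each route exists in a fleet form and a topology form. A small sketch of how a client could build both, assuming hypothetical helper names `service_route` and `METHODS`; any mount prefix the API may add in front of these paths is left out.

```python
from typing import Optional, Tuple

# Save persists only; Apply persists and force-recreates the container.
METHODS = {"config": "PUT", "apply": "POST"}


def service_route(
    decky: str,
    svc: str,
    action: str,
    topology_id: Optional[str] = None,
) -> Tuple[str, str]:
    if action not in METHODS:
        raise ValueError(f"unknown action: {action!r}")
    # The topology segment is present only for topology-scoped deckies.
    prefix = f"/topologies/{topology_id}" if topology_id is not None else ""
    return METHODS[action], f"{prefix}/deckies/{decky}/services/{svc}/{action}"
```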
d314470d7f fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right
Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts
TopologyDecky rows where state='running'.  No code path was flipping
that state away from the default 'pending', so the count read 0/N
even when every container was running fine — the dashboard was lying.

Two complementary fixes:

1. deploy_topology — after the post-deploy compose ps verification,
   reconcile each TopologyDecky.state from the corresponding base
   container's docker state.  running → 'running'; anything else →
   'failed'.  Reuses the ps_rows already gathered for the
   ACTIVE-vs-DEGRADED status decision; no extra docker hit.

2. apply_add_decky — _materialise_decky_spawn now returns True/False;
   on True the row is updated to state='running' before
   _assert_valid_after.  Catches the case where a decky added via the
   live mutator queue stays at 'pending' indefinitely (the deployer's
   reconcile only runs on a fresh deploy_topology pass).

Topology deckies that already exist in active topologies will keep reading
'pending' until the next deploy_topology run, because the fix is
forward-only.  As an operator-side workaround, tear down and redeploy, or
run the (forthcoming) reconcile-on-startup pass.
2026-04-29 11:09:32 -04:00
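The reconcile rule from d314470d7f (running maps to 'running', anything else to 'failed') can be sketched as a pure function over rows that were already gathered. The `reconcile_states` name, the decky-to-container mapping, and the `Name` / `State` row keys are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, Mapping


def reconcile_states(
    base_containers: Mapping[str, str],
    ps_rows: Iterable[Mapping[str, Any]],
) -> Dict[str, str]:
    # Index the rows once; no extra docker call is made here.
    docker_state = {row.get("Name"): row.get("State") for row in ps_rows}
    return {
        decky: "running" if docker_state.get(container) == "running" else "failed"
        for decky, container in base_containers.items()
    }
```

A decky whose base container is missing from the rows is treated as 'failed' in this sketch, which follows from "anything else".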
d595240f55 fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash
deploy_topology was flipping to ACTIVE the moment 'compose up -d'
returned 0, but compose returns 0 as soon as containers are *started*.
A service that crashes on boot (port bind failure, bad image, missing
entrypoint) left the topology row sitting at ACTIVE indefinitely while
half the substrate was dead.

After compose returns, we now run 'compose ps --all --format json',
parse the newline-delimited per-container rows, and downgrade to
DEGRADED with a reason listing the first eight unhealthy containers if
anything isn't in state='running'.  Operators see real state on the
topology page instead of an optimistic flag.

_compose_ps swallows compose-level errors (returns []) so an unrelated
docker hiccup doesn't gate the success path — the existing in-flight
exception path still catches genuine deploy failures with FAILED.
2026-04-28 23:39:50 -04:00
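A minimal sketch of the verification step in d595240f55, assuming each line of the `compose ps --all --format json` output is one JSON object with `Name` and `State` keys. The function names and the wording of the reason string are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def parse_ps_output(output: str) -> List[Dict[str, Any]]:
    # Newline-delimited JSON: one container row per non-empty line.
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def topology_status(
    rows: List[Dict[str, Any]],
    limit: int = 8,
) -> Tuple[str, Optional[str]]:
    unhealthy = [r.get("Name", "?") for r in rows if r.get("State") != "running"]
    if not unhealthy:
        # An empty row list (e.g. a swallowed compose error) also lands here,
        # so it does not gate the success path.
        return "ACTIVE", None
    # Only the first `limit` unhealthy containers go into the reason.
    return "DEGRADED", "not running: " + ", ".join(unhealthy[:limit])
```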
6ac8cac908 feat(deckies): live service add/remove without full redeploy
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes.  The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:

* add: validate against decnet.services.registry (rejects unknown +
  fleet_singleton); persist the new services list; re-render the
  per-scope compose file (so future redeploys reflect the change);
  run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
  compose so a future up -d doesn't bring it back.

Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list.  Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).

Four new admin endpoints:

* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}

ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
2026-04-28 22:51:42 -04:00
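The per-service compose invocations named in 6ac8cac908 can be sketched as plain argument lists. The helper names are hypothetical, and the sketch omits whatever file or project flags the real `_compose()` wrapper supplies.

```python
from typing import List


def add_service_argv(decky: str, svc: str) -> List[str]:
    # Bring up only the new service container, without its dependencies.
    return ["docker", "compose", "up", "-d", "--no-deps", "--build", f"{decky}-{svc}"]


def remove_service_argvs(decky: str, svc: str) -> List[List[str]]:
    target = f"{decky}-{svc}"
    # Stop first, then remove without prompting.
    return [
        ["docker", "compose", "stop", target],
        ["docker", "compose", "rm", "-f", target],
    ]
```

Per the commit, these commands run alongside persisting the services list and re-rendering the compose file, so a later `up -d` matches the mutated state.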
5802de1f86 feat(canary): seed baseline canaries on MazeNET deckies
Topology deploys now plant the configured canary baseline set on every
decky in the topology, mirroring the fleet-deploy hook. Containers are
resolved via resolve_topology_container — <decky>-ssh when the decky
exposes an ssh service, else the topology base container
decnet_t_<id8>_<decky>.

The planter's plant/revoke/seed_baseline grow an optional container=
kwarg; default preserves the fleet <name>-ssh resolution.
2026-04-28 22:30:11 -04:00
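The container resolution rule in 5802de1f86 is simple enough to sketch. The signature below is hypothetical and is not that of the real `resolve_topology_container`; it also assumes that `<id8>` means the first eight characters of the topology id, which the commit does not spell out.

```python
from typing import Iterable


def resolve_container_name(decky: str, services: Iterable[str], topology_id: str) -> str:
    # Prefer the ssh service container when the decky exposes one.
    if "ssh" in set(services):
        return f"{decky}-ssh"
    # Otherwise fall back to the topology base container.
    # Assumption: <id8> is the first 8 characters of the topology id.
    return f"decnet_t_{topology_id[:8]}_{decky}"
```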
862e4dbb31 merge: testing → main (reconcile 2-week divergence) 2026-04-28 18:36:00 -04:00
c384a3103a refactor: separate engine, collector, mutator, and fleet into independent subpackages
- decnet/engine/ — container lifecycle (deploy, teardown, status); _kill_api removed
- decnet/collector/ — Docker log streaming (moved from web/collector.py)
- decnet/mutator/ — mutation engine (no longer imports from cli or duplicates deployer code)
- decnet/fleet.py — shared decky-building logic extracted from cli.py

Cross-contamination eliminated:
- web router no longer imports from decnet.cli
- mutator no longer imports from decnet.cli
- cli no longer imports from decnet.web
- _kill_api() moved to cli (process management, not engine concern)
- _compose_with_retry duplicate removed from mutator
2026-04-12 00:26:22 -04:00