DECNET

Author	SHA1	Message	Date
anti	c78ba6f698	fix(deploy): pre-remove container by name before force-recreate. Docker Compose tracks the previous container by internal ID. When that container was already removed or renamed, --force-recreate fails with "No such container". Remove by name first so Compose always starts clean.	2026-04-30 11:54:00 -04:00
anti	eefab020d4	fix(swarm): propagate service mutations to worker agent via shard re-dispatch. Add/remove/update_config on a fleet decky living on a swarm worker, and on an agent-pinned topology, used to run the master's local docker-compose only, which has no containers for the remote decky. The mutation persisted on master and silently no-op'd on the worker. - Fleet swarm: lookup DeckyShard.host_uuid; if found, rebuild a single-host shard from master state and call dispatch_decnet_config, the same proven path as POST /swarm/deploy. Skip local _compose (no containers to touch). - Topology agent-pinned: call decnet.engine.deployer.resync_agent_topology (existing helper) to push the latest hydrated blob to the worker. - Local-only deckies: behaviour unchanged. - Tests: 5 new in tests/engine/test_services_live_swarm.py covering all three mutations on a swarm fleet decky (no local _compose, dispatch fires with the right host's deckies), plus apply=False save-only path (no dispatch), plus regression that local-only fleet add still runs local compose. Bus signal `decky.{name}.service_config_changed` keeps publishing as an audit trail; it is not the propagation trigger.	2026-04-29 12:51:16 -04:00
anti	94b06ee862	feat(services): initial config on ADD SERVICE — schema modal in DeckyCard, MazeNET drag, and Inspector - DeckyServiceAddRequest gains an optional `config: dict` field, validated against the service's config_schema before any state mutation (400 on bad type, no half-written rows). - Engine: add_service threads `config` into _add_topology_service / _add_fleet_service, persisting validated cfg to decky_config.service_config BEFORE compose regen so the first `up -d --build` materialises the env on the new container. No follow-up apply needed. - Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by: * DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component) * MazeNET Inspector's ADD SERVICE picker * MazeNET palette drag-drop onto a deployed decky Empty-schema services short-circuit to a one-click add (no modal flash). Operator can cancel; errors surface in the modal. - Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent on bad types, back-compat empty-config. - Drive-by: fix stale repo-method names in test_services_live.py (create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper, service.added → service_added topic).	2026-04-29 12:44:47 -04:00
anti	75b1ce3a31	feat(api): per-service config schema endpoint + PUT/POST update+apply for fleet & topology - GET /topologies/services/{name}/schema serves the declared ServiceConfigField metadata so the Inspector can auto-render forms. - PUT /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the validated dict (DB + compose); container untouched (Save). - POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive). - New engine helper update_service_config wires both fleet and topology paths through the existing _persist_fleet_change / _rerender_topology_compose machinery; emits decky.<name>.service_config_changed on the bus.	2026-04-29 11:38:06 -04:00
anti	d314470d7f	fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right. Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts TopologyDecky rows where state='running'. No code path was flipping that state away from the default 'pending', so the count read 0/N even when every container was running fine; the dashboard was lying. Two complementary fixes: 1. deploy_topology: after the post-deploy compose ps verification, reconcile each TopologyDecky.state from the corresponding base container's docker state. running → 'running'; anything else → 'failed'. Reuses the ps_rows already gathered for the ACTIVE-vs-DEGRADED status decision; no extra docker hit. 2. apply_add_decky: _materialise_decky_spawn now returns True/False; on True the row is updated to state='running' before _assert_valid_after. Catches the case where a decky added via the live mutator queue stays at 'pending' indefinitely (the deployer's reconcile only runs on a fresh deploy_topology pass). Existing topology deckies in active topologies will still read as 'pending' until the next deploy_topology runs, since this is forward-only. An operator-side fix is to tear down and redeploy, or to run the (forthcoming) reconcile-on-startup pass.	2026-04-29 11:09:32 -04:00
anti	d595240f55	fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash. deploy_topology was flipping to ACTIVE the moment 'compose up -d' returned 0, but compose returns 0 as soon as containers are started. A service that crashes on boot (port bind failure, bad image, missing entrypoint) left the topology row sitting at ACTIVE indefinitely while half the substrate was dead. After compose returns, we now run 'compose ps --all --format json', parse the newline-delimited per-container rows, and downgrade to DEGRADED with a reason listing the first eight unhealthy containers if anything isn't in state='running'. Operators see real state on the topology page instead of an optimistic flag. _compose_ps swallows compose-level errors (returns []) so an unrelated docker hiccup doesn't gate the success path; the existing in-flight exception path still catches genuine deploy failures with FAILED.	2026-04-28 23:39:50 -04:00
anti	6ac8cac908	feat(deckies): live service add/remove without full redeploy. decnet.engine.services_live exposes add_service / remove_service for both fleet and topology decky scopes. The host's _compose() wrapper already supported per-service targeting (up --no-deps -d <svc>, stop, rm -f); what was missing was the orchestration around it: * add: validate against decnet.services.registry (rejects unknown + fleet_singleton); persist the new services list; re-render the per-scope compose file (so future redeploys reflect the change); run docker compose up -d --no-deps --build <decky>-<svc>. * remove: stop + rm -f the service container; persist; re-render compose so a future up -d doesn't bring it back. Both publish decky.<name>.service.added / .removed on the bus, with the post-mutation services list. Topic constants added to decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored). Four new admin endpoints: * POST/DELETE /api/v1/deckies/{name}/services{,/svc} * POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc} ServiceMutationError messages are mapped at the API boundary to 404 (decky/topology missing), 409 (idempotency violation), 422 (unknown or fleet_singleton service).	2026-04-28 22:51:42 -04:00
anti	5802de1f86	feat(canary): seed baseline canaries on MazeNET deckies. Topology deploys now plant the configured canary baseline set on every decky in the topology, mirroring the fleet-deploy hook. Containers are resolved via resolve_topology_container: <decky>-ssh when the decky exposes an ssh service, else the topology base container decnet_t_<id8>_<decky>. The planter's plant/revoke/seed_baseline grow an optional container= kwarg; default preserves the fleet <name>-ssh resolution.	2026-04-28 22:30:11 -04:00
anti	862e4dbb31	merge: testing → main (reconcile 2-week divergence)	2026-04-28 18:36:00 -04:00
anti	c384a3103a	refactor: separate engine, collector, mutator, and fleet into independent subpackages - decnet/engine/ — container lifecycle (deploy, teardown, status); _kill_api removed - decnet/collector/ — Docker log streaming (moved from web/collector.py) - decnet/mutator/ — mutation engine (no longer imports from cli or duplicates deployer code) - decnet/fleet.py — shared decky-building logic extracted from cli.py Cross-contamination eliminated: - web router no longer imports from decnet.cli - mutator no longer imports from decnet.cli - cli no longer imports from decnet.web - _kill_api() moved to cli (process management, not engine concern) - _compose_with_retry duplicate removed from mutator	2026-04-12 00:26:22 -04:00

10 Commits
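
The status-code mapping mentioned in 6ac8cac908 (404 for a missing decky or topology, 409 for an idempotency violation, 422 for an unknown or fleet_singleton service) might look something like the sketch below. The commit says the real mapping keys off ServiceMutationError messages; this version instead assumes an explicit `kind` field, and `MutationErrorKind` and `http_status_for` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class MutationErrorKind(Enum):
    """Hypothetical error categories; the real code may classify differently."""
    NOT_FOUND = "not_found"              # decky or topology missing
    CONFLICT = "conflict"                # idempotency violation
    INVALID_SERVICE = "invalid_service"  # unknown or fleet_singleton service


class ServiceMutationError(Exception):
    """Stand-in for the engine error; the `kind` attribute is an assumption."""

    def __init__(self, kind: MutationErrorKind, message: str) -> None:
        super().__init__(message)
        self.kind = kind


# Status codes as listed in the commit message.
_STATUS_BY_KIND = {
    MutationErrorKind.NOT_FOUND: 404,
    MutationErrorKind.CONFLICT: 409,
    MutationErrorKind.INVALID_SERVICE: 422,
}


def http_status_for(exc: ServiceMutationError) -> int:
    """Translate an engine error into the HTTP status returned at the API boundary."""
    return _STATUS_BY_KIND[exc.kind]
```

Doing the translation in one place at the API boundary keeps the engine free of HTTP concerns, so the fleet and topology routes can share it.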