Commit Graph

12 Commits

Author SHA1 Message Date
5b70a34c94 fix(routes): added undocumented responses 2026-04-20 01:23:07 -04:00
148e51011c feat(swarm): agent→master heartbeat with per-host cert pinning
New POST /swarm/heartbeat on the swarm controller. Workers post every
~30s with the output of executor.status(); the master bumps
SwarmHost.last_heartbeat and re-upserts each DeckyShard with a fresh
DeckyConfig snapshot and runtime-derived state (running/degraded).

Security: CA-signed mTLS alone is not sufficient — a decommissioned
worker's still-valid cert could resurrect ghost shards. The endpoint
extracts the presented peer cert (primary: scope["extensions"]["tls"],
fallback: transport.get_extra_info("ssl_object")) and SHA-256-pins it
to the SwarmHost.client_cert_fingerprint stored for the claimed
host_uuid. Extraction is factored into _extract_peer_fingerprint so
tests can exercise both uvicorn scope shapes and the both-unavailable
fail-closed path without mocking uvicorn's TLS pipeline.

Adds get_swarm_host_by_fingerprint to the repo interface (SQLModel
impl reuses the indexed client_cert_fingerprint column).
2026-04-19 21:37:15 -04:00
3ebd206bca feat(swarm): persist DeckyConfig snapshot per shard + enrich list API
Dispatch now writes the full serialised DeckyConfig into
DeckyShard.decky_config (plus decky_ip as a cheap extract), so the
master can render the same rich per-decky card the local-fleet view
uses — hostname, distro, archetype, service_config, mutate_interval,
last_mutated — without round-tripping to the worker on every page
render. DeckyShardView gains the corresponding fields; the repository
flattens the snapshot at read time. Pre-migration rows keep working
(fields fall through as None/defaults).

Columns are additive + nullable so SQLModel.metadata.create_all handles
the change on both SQLite and MySQL. Backfill happens organically on
the next dispatch or (in a follow-up) agent heartbeat.
2026-04-19 21:29:45 -04:00
14250cacad feat(swarm): self-destruct agent on decommission
Decommissioning a worker from the dashboard (or swarm controller) now
asks the agent to wipe its own install before the master forgets it.
The agent stops decky containers + every decnet-* systemd unit, then
deletes /opt/decnet*, /etc/systemd/system/decnet-*, /var/lib/decnet/*,
and /usr/local/bin/decnet*. Logs under /var/log are preserved.

The reaper runs as a detached /tmp script (start_new_session=True) so
it survives the agent process being killed. Self-destruct dispatch is
best-effort — a dead worker doesn't block master-side cleanup.
2026-04-19 20:47:09 -04:00
df18cb44cc fix(swarm): don't paint healthy deckies as failed when a shard-sibling fails
docker compose up is partial-success-friendly — a build failure on one
service doesn't roll back the others. But the master was catching the
agent's 500 and tagging every decky in the shard as 'failed' with the
same error message. From the UI that looked like all three deckies died
even though two were live on the worker.

On dispatch exception, probe the agent's /status to learn which deckies
actually have running containers, and upsert per-decky state accordingly.
Only fall back to marking the whole shard failed if the status probe
itself is unreachable.

Enhance agent.executor.status() to include a 'runtime' map keyed by
decky name with per-service container state, so the master has something
concrete to consult.
2026-04-19 20:11:08 -04:00
5df995fda1 feat(enroll): opt-in IPvlan per-agent for Wi-Fi-bridged VMs
Wi-Fi APs bind one MAC per associated station, so VirtualBox/VMware
guests bridged over Wi-Fi rotate the VM's DHCP lease when Docker's
macvlan starts emitting container-MAC frames through the vNIC. Adds a
`use_ipvlan` toggle on the Agent Enrollment tab (mirrors the updater
daemon checkbox): flips the flag on SwarmHost, bakes `ipvlan=true` into
the agent's decnet.ini, and `_worker_config` forces ipvlan=True on the
per-host shard at dispatch. Safe no-op on wired/bare-metal agents.
2026-04-19 17:57:45 -04:00
79db999030 feat(fleet): auto-swarm deploy — shard across enrolled workers when master
POST /deckies/deploy now branches on DECNET_MODE + enrolled host presence:
when the caller is a master with at least one reachable swarm host, round-
robin host_uuids are assigned over new deckies and the config is dispatched
via AgentClient. Falls back to local docker-compose otherwise.

Extracts the dispatch loop from api_deploy_swarm into dispatch_decnet_config
so both endpoints share the same shard/dispatch/persist path. Adds
GET /system/deployment-mode for the UI to show 'will shard across N hosts'
vs 'will deploy locally' before the operator clicks deploy.
2026-04-19 06:09:08 -04:00
7765b36c50 feat(updater): remote self-update daemon with auto-rollback
Adds a separate `decnet updater` daemon on each worker that owns the
agent's release directory and installs tarball pushes from the master
over mTLS. A normal `/update` never touches the updater itself, so the
updater is always a known-good rescuer if a bad agent push breaks
/health — the rotation is reversed and the agent restarted against the
previous release. `POST /update-self` handles updater upgrades
explicitly (no auto-rollback).

- decnet/updater/: executor, FastAPI app, uvicorn launcher
- decnet/swarm/updater_client.py, tar_tree.py: master-side push
- cli: `decnet updater`, `decnet swarm update [--host|--all]
  [--include-self] [--dry-run]`, `--updater` on `swarm enroll`
- enrollment API issues a second cert (CN=updater@<host>) signed by the
  same CA; SwarmHost records updater_cert_fingerprint
- tests: executor, app, CLI, tar tree, enroll-with-updater (37 new)
- wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link
2026-04-18 21:40:21 -04:00
8914c27220 feat(swarm): add decnet swarm deckies to list deployed shards by host
`swarm list` only shows enrolled workers — there was no way to see which
deckies are running and where. Adds GET /swarm/deckies on the controller
(joins DeckyShard with SwarmHost for name/address/status) plus the CLI
wrapper with --host / --state filters and --json.
2026-04-18 21:10:07 -04:00
e2d6f857b5 refactor(swarm): move router DTOs into decnet/web/db/models.py
_schemas.py was a local exception to the codebase convention. The rest
of the app keeps all API request/response DTOs in decnet/web/db/models.py
alongside UserResponse, DeployIniRequest, etc. — the swarm endpoints now
follow the same convention (SwarmEnrollRequest, SwarmHostView, etc).
Deletes decnet/web/router/swarm/_schemas.py.
2026-04-18 19:28:15 -04:00
811136e600 refactor(swarm): one file per endpoint, matching existing router layout
Splits the three grouped router files into eight api_<verb>_<resource>.py
modules under decnet/web/router/swarm/ to match the convention used by
router/fleet/ and router/config/. Shared request/response models live in
_schemas.py. Keeps each endpoint easy to locate and modify without
stepping on siblings.
2026-04-18 19:23:06 -04:00
63b0a58527 feat(swarm): master-side SWARM controller (swarmctl) + agent CLI
Adds decnet/web/swarm_api.py as an independent FastAPI app with routers
for host enrollment, deployment dispatch (sharding DecnetConfig across
enrolled workers via AgentClient), and active health probing. Runs as
its own uvicorn subprocess via 'decnet swarmctl', mirroring the isolation
pattern used by 'decnet api'. Also wires up 'decnet agent' CLI entry for
the worker side.

29 tests added under tests/swarm/test_swarm_api.py cover enrollment
(including bundle generation + duplicate rejection), host CRUD, sharding
correctness, non-swarm-mode rejection, teardown, and health probes with
a stubbed AgentClient.
2026-04-18 19:18:33 -04:00