Commit Graph

6 Commits

Author SHA1 Message Date
62f7c88b90 feat(swarmctl): --tls with auto-issued or BYOC server cert
swarmctl CLI gains --tls/--cert/--key/--client-ca flags. With --tls the
controller runs uvicorn under HTTPS + mTLS (CERT_REQUIRED) so worker
heartbeats can reach it cross-host. Default is still 127.0.0.1 plaintext
for backwards compat with the master-CLI enrollment flow.

Auto-issue path (no --cert/--key given): a server cert signed by the
existing DECNET CA is issued once and parked under ~/.decnet/swarmctl/.
Workers already ship that CA's ca.crt from the enroll bundle, so they
verify the endpoint with no extra trust config. BYOC via --cert/--key
when the operator wants a publicly-trusted or externally-managed cert.
The auto-cert path is idempotent across restarts to keep a stable
fingerprint for any long-lived mTLS sessions.
2026-04-19 21:46:32 -04:00
14250cacad feat(swarm): self-destruct agent on decommission
Decommissioning a worker from the dashboard (or swarm controller) now
asks the agent to wipe its own install before the master forgets it.
The agent stops decky containers + every decnet-* systemd unit, then
deletes /opt/decnet*, /etc/systemd/system/decnet-*, /var/lib/decnet/*,
and /usr/local/bin/decnet*. Logs under /var/log are preserved.

The reaper runs as a detached /tmp script (start_new_session=True) so
it survives the agent process being killed. Self-destruct dispatch is
best-effort — a dead worker doesn't block master-side cleanup.
2026-04-19 20:47:09 -04:00
7765b36c50 feat(updater): remote self-update daemon with auto-rollback
Adds a separate `decnet updater` daemon on each worker that owns the
agent's release directory and installs tarball pushes from the master
over mTLS. A normal `/update` never touches the updater itself, so the
updater is always a known-good rescuer if a bad agent push breaks
/health — the rotation is reversed and the agent restarted against the
previous release. `POST /update-self` handles updater upgrades
explicitly (no auto-rollback).

- decnet/updater/: executor, FastAPI app, uvicorn launcher
- decnet/swarm/updater_client.py, tar_tree.py: master-side push
- cli: `decnet updater`, `decnet swarm update [--host|--all]
  [--include-self] [--dry-run]`, `--updater` on `swarm enroll`
- enrollment API issues a second cert (CN=updater@<host>) signed by the
  same CA; SwarmHost records updater_cert_fingerprint
- tests: executor, app, CLI, tar tree, enroll-with-updater (37 new)
- wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link
2026-04-18 21:40:21 -04:00
39d2077a3a feat(swarm): syslog-over-TLS log pipeline (RFC 5425, TCP 6514)
Worker-side log_forwarder tails the local RFC 5424 log file and ships
each line as an octet-counted frame to the master over mTLS. Offset is
persisted in a tiny local SQLite so master outages never cause loss or
duplication — reconnect resumes from the exact byte where the previous
session left off. Impostor workers (cert not signed by DECNET CA) are
rejected at TLS handshake.

Master-side log_listener terminates mTLS on 0.0.0.0:6514, validates the
client cert, extracts the peer CN as authoritative worker provenance,
and appends each frame to the master's ingest log files. Attacker-
controlled syslog HOSTNAME field is ignored — the CA-controlled CN is
the only source of provenance.

7 tests added covering framing codec, offset persistence across
reopens, end-to-end mTLS delivery, crash-resilience (offset survives
restart, no duplicate shipping), and impostor-CA rejection.

DECNET_SWARM_SYSLOG_PORT / DECNET_SWARM_MASTER_HOST env bindings
added.
2026-04-18 19:33:58 -04:00
0c77cdab32 feat(swarm): master AgentClient — mTLS httpx wrapper around worker API
decnet.swarm.client exposes:
- MasterIdentity / ensure_master_identity(): the master's own CA-signed
  client bundle, issued once into ~/.decnet/ca/master/.
- AgentClient: async-context httpx wrapper that talks to a worker agent
  over mTLS. health/status/deploy/teardown methods mirror the agent API.

SSL context is built from a bare ssl.SSLContext(PROTOCOL_TLS_CLIENT)
instead of httpx.create_ssl_context — the latter layers on default-CA
and purpose logic that broke private-CA mTLS. Server cert is pinned by
CA + chain, not DNS (workers enroll with arbitrary SANs).

tests/swarm/test_client_agent_roundtrip.py spins uvicorn in-process
with real certs on disk and verifies:
- A CA-signed master client passes health + status calls.
- An impostor whose cert comes from a different CA cannot connect.
2026-04-18 19:08:36 -04:00
d3b90679c5 feat(swarm): PKI module — self-managed CA for master/worker mTLS
decnet.swarm.pki provides:
- generate_ca() / ensure_ca() — self-signed root, PKCS8 PEM, 4096-bit.
- issue_worker_cert() — per-worker keypair + cert signed by the CA with
  serverAuth + clientAuth EKU so the same identity backs the agent's
  HTTPS endpoint AND the syslog-over-TLS upstream.
- write_worker_bundle() / load_worker_bundle() — persist with 0600 on
  private keys.
- fingerprint() — SHA-256 DER hex for master-side pinning.

tests/swarm/test_pki.py covers:
- CA idempotency on disk.
- Signed chain validates against CA subject.
- SAN population (DNS + IP).
- Bundle roundtrip with 0600 key perms.
- End-to-end mTLS handshake between two CA-issued peers.
- Cross-CA client rejection (handshake fails).
2026-04-18 07:09:58 -04:00