From 2c75d6593f014417081bbc543030012ba4019133 Mon Sep 17 00:00:00 2001 From: anti Date: Sat, 18 Apr 2026 20:20:42 -0400 Subject: [PATCH] docs(swarm): add SWARM Mode page and cross-link from Deployment Modes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Comprehensive walkthrough for the newly-landed SWARM control plane: - Architecture diagram (master: swarmctl/listener/ingester/api; worker: agent/forwarder) with ports cheat sheet - Step-by-step setup (CA bootstrap, enrollment, bundle shipment, agent + forwarder startup, check, first swarm deploy) - Full command reference for swarmctl, listener, agent, forwarder, and the swarm enroll/list/check/decommission subcommands - Log-pipeline end-to-end story (RFC 5424 on worker → RFC 5425 mTLS on 6514 → master.json → ingester → dashboard), including tcpdump-based plaintext-leak check and source_worker provenance note - Operational concerns: master crash resume (no dup/loss), worker crash, CA rotation, cert rotation, teardown - Security posture summary - Known limitations (get_host_ip master-side bug, no web UI yet, round-robin only, single master) - Troubleshooting matrix Deployment-Modes: trimmed the old 'swarm is not implemented, drive it from Ansible' section and replaced with a link to the new page. _Sidebar: added SWARM-Mode under User docs. --- Deployment-Modes.md | 62 +++--- SWARM-Mode.md | 531 ++++++++++++++++++++++++++++++++++++++++++++ _Sidebar.md | 1 + 3 files changed, 559 insertions(+), 35 deletions(-) create mode 100644 SWARM-Mode.md diff --git a/Deployment-Modes.md b/Deployment-Modes.md index 811fff1..eca2bff 100644 --- a/Deployment-Modes.md +++ b/Deployment-Modes.md @@ -107,50 +107,42 @@ path calls `load_ini` and `build_deckies_from_ini`. \__________________|__________________/ | isolated mgmt / SIEM + | + [master] — swarmctl, listener, ingester, CA ``` -Each real host runs a UNIHOST-shaped deployment over its own slice of the IP -space. 
An external orchestrator (Ansible, sshpass-driven scripts, etc.)
-invokes `decnet deploy --mode swarm ...` on each host in turn. The CLI
-currently accepts `swarm` as a valid mode — the fleet-wide orchestration layer
-lives outside the DECNET binary and is the operator's responsibility. See the
-README's architecture section for the intended shape.
+SWARM has a dedicated page: **[SWARM Mode](SWARM-Mode)**. That page is the
+authoritative reference for setup, enrollment, the log pipeline, and
+troubleshooting.
 
-### CLI
+In brief: DECNET ships a **master** (`decnet swarmctl` + `decnet listener`)
+that orchestrates **workers** (`decnet agent` + `decnet forwarder`) over
+mTLS-protected HTTP (worker agents on port 8765; local swarmctl on 8770)
+and syslog-over-TLS (RFC 5425) on port 6514. A self-managed CA at
+`~/.decnet/ca/` signs every worker cert at enrollment.
 
-Run on each host, coordinating IP ranges so deckies do not collide:
+Typical first-time flow:
 
 ```
-# host-A
-sudo decnet deploy \
-  --mode swarm \
-  --deckies 3 \
-  --interface eth0 \
-  --ip-start 192.168.1.10 \
-  --randomize-services
+# On the master:
+decnet swarmctl --daemon
+decnet listener --daemon
+decnet swarm enroll --name decky-vm --address 192.168.1.13 \
+  --out-dir /tmp/decky-vm-bundle
 
-# host-B
-sudo decnet deploy \
-  --mode swarm \
-  --deckies 3 \
-  --interface eth0 \
-  --ip-start 192.168.1.20 \
-  --randomize-services
+# Ship the bundle to the worker, then on the worker:
+sudo decnet agent --daemon --agent-dir ~/.decnet/agent
+decnet forwarder --daemon --master-host <master-ip>
 
+# Back on the master:
+decnet swarm check
+decnet swarm list
+decnet deploy --mode swarm --deckies 6 --services ssh,smb
 ```
 
-`--ip-start` is the operator's primary tool for partitioning the subnet across
-hosts; `allocate_ips` in `decnet/network.py` starts sequentially from that
-address and skips reserved / in-use IPs.
- -### INI - -For reproducible swarm rollouts, give each host its own INI and drive the -rollout from Ansible (or similar): - -``` -decnet deploy --mode swarm --config ./host-A.ini -decnet deploy --mode swarm --config ./host-B.ini -``` +`deploy --mode swarm` round-robins deckies across all enrolled workers, +shards the compose config, and dispatches each shard to the matching +agent. See [SWARM Mode](SWARM-Mode) for the full walkthrough, command +reference, security posture, and troubleshooting matrix. --- diff --git a/SWARM-Mode.md b/SWARM-Mode.md new file mode 100644 index 0000000..435e39f --- /dev/null +++ b/SWARM-Mode.md @@ -0,0 +1,531 @@ +# SWARM Mode — Multi-host Deployment + +SWARM is DECNET's multi-host deployment posture. One **master** orchestrates +N **workers** (real hosts), each running a slice of the decky fleet. The +control plane speaks HTTP+mTLS; the log plane speaks RFC 5425 syslog over +mTLS on TCP 6514. Everything is signed by a single DECNET-managed CA on the +master. + +If you want a single-box deployment, stop here and read +[Deployment Modes](Deployment-Modes) → UNIHOST. SWARM has more moving parts +and is not the right starting point for first runs. + +See also: [CLI reference](CLI-Reference), +[Deployment modes](Deployment-Modes), +[Logging and syslog](Logging-and-Syslog), +[Networking: MACVLAN and IPVLAN](Networking-MACVLAN-IPVLAN), +[Teardown](Teardown-and-State). 
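The log plane's wire format is worth pinning down before the diagram: RFC 5425 octet-counting prefixes each syslog message with its ASCII byte length and a space, so one TLS stream can carry many messages back to back. A minimal sketch — `frame`/`unframe` are illustrative helpers, not DECNET code:

```python
def frame(msg: str) -> bytes:
    """RFC 5425 octet-counting: ASCII length, a space, then the message bytes."""
    data = msg.encode("utf-8")
    return str(len(data)).encode("ascii") + b" " + data

def unframe(buf: bytes) -> tuple[str, bytes]:
    """Read one frame; return (message, remaining buffer)."""
    sep = buf.index(b" ")            # digits run up to the first space
    n = int(buf[:sep])
    body = buf[sep + 1 : sep + 1 + n]
    return body.decode("utf-8"), buf[sep + 1 + n :]
```

Because the length is explicit, the receiver never has to scan for a delimiter inside the message — which is why RFC 5425 mandates octet counting rather than newline framing.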
+ +--- + +## Architecture in one picture + +``` +┌──────────────────────── MASTER ────────────────────────┐ ┌───────── WORKER ─────────┐ +│ │ │ │ +│ decnet api :8000 (dashboard / REST) │ │ decnet agent :8765 │ +│ decnet swarmctl :8770 (SWARM control plane) │◀mTLS▶│ (FastAPI/uvicorn) │ +│ decnet listener :6514 (syslog-over-TLS sink) │◀mTLS─│ decnet forwarder │ +│ decnet ingester (parses master.json) │ │ (tails local log file,│ +│ │ │ ships RFC 5425 over │ +│ SQLite/MySQL (shared repo, SwarmHost + DeckyShard) │ │ TCP 6514 to master) │ +│ │ │ │ +│ ~/.decnet/ca/ (self-signed CA — ca.crt, ca.key) │ │ ~/.decnet/agent/ │ +│ ~/.decnet/master/ (master client cert for swarmctl) │ │ (CA-issued bundle) │ +│ │ │ docker / compose │ +└────────────────────────────────────────────────────────┘ └──────────────────────────┘ +``` + +Four long-running processes on the master, two on each worker. Each process +is a separate supervised unit — if `swarmctl` crashes, the main `decnet api`, +the log listener, and the ingester keep running. This mirrors the +`start_new_session=True` subprocess pattern used everywhere else in DECNET +(`decnet/cli.py::api` and friends). + +### Ports cheat sheet + +| Port | Process | Host | Protocol | mTLS? | +|------|--------------------|---------|------------------|-------| +| 8000 | `decnet api` | master | HTTP | no | +| 8770 | `decnet swarmctl` | master | HTTP | no * | +| 6514 | `decnet listener` | master | syslog (RFC5425) | **yes** | +| 8765 | `decnet agent` | worker | HTTPS | **yes** | +| 5140 | local collector | worker | syslog | no (loopback) | + +\* `swarmctl` binds to `127.0.0.1` by default and is called by the local +`decnet` CLI. If you need to drive it from outside the master box, put it +behind a reverse proxy with your own auth — it is not hardened for public +exposure. + +--- + +## Prerequisites + +On the **master**: + +- DECNET installed (`pip install -e .`) in the venv you plan to run from. 
- Write access to `~/.decnet/` (CA and master bundle land here).
+- An address that workers can reach on port 6514 (log plane), plus outbound
+  reachability from the master to each worker's port 8765 (control plane).
+- Docker is **not** needed on the master unless the master is also a worker.
+
+On each **worker**:
+
+- DECNET installed.
+- Docker Engine + Compose plugin (the agent shells out to `docker compose`
+  exactly like UNIHOST).
+- `sudo` for the user running `decnet agent` (MACVLAN/IPVLAN needs root).
+  `NOPASSWD` is convenient for unattended daemons.
+- Outbound TCP to master:6514 (log forward) and inbound TCP on 8765 from
+  the master (deploy/teardown/health RPCs).
+
+Time sync is a hard requirement — mTLS cert validation fails if worker and
+master clocks differ by more than a few minutes. Run `chronyd`/`systemd-timesyncd`.
+
+---
+
+## Setup walkthrough
+
+This is a complete, literal walkthrough. Follow it top to bottom the first
+time. Every command is either run **on master** or **on worker** —
+annotated below each block.
+
+### 1. Master — start the control plane
+
+```bash
+# Start the SWARM controller. First run creates ~/.decnet/ca/ automatically
+# (self-signed CA, ca.crt/ca.key) and ~/.decnet/master/ (client cert for
+# the master process's own identity when talking to worker agents).
+decnet swarmctl --daemon --host 127.0.0.1 --port 8770
+
+# Start the log listener. First run creates master.log (RFC 5424 forensic
+# sink, every line verbatim) and master.json (one JSON object per event for
+# the ingester).
+mkdir -p ~/.decnet/master-logs
+decnet listener --daemon \
+  --host 0.0.0.0 --port 6514 \
+  --log-path ~/.decnet/master-logs/master.log \
+  --json-path ~/.decnet/master-logs/master.json
+
+# Confirm both are up.
+curl -sf http://127.0.0.1:8770/health && echo OK
+ss -tlnp | grep -E '8770|6514'
+```
+
+`--daemon` detaches to a new session (same `_daemonize()` as `decnet api`).
+Without it, the command stays in the foreground.
+
+At this point:
+- `~/.decnet/ca/ca.crt` is the CA every worker will trust.
+- `~/.decnet/ca/ca.key` **must never leave the master**. Treat it like an + SSH host key: losing it means re-enrolling every worker. +- `~/.decnet/master/` holds the master's own client certificate that + `swarmctl` uses to authenticate outbound RPCs to worker agents. + +### 2. Master — enroll a worker + +The enrollment command is a single call that does four things: + +1. Generates a worker keypair + CSR on the master (the private key is + written directly to the output bundle; it never touches the wire). +2. Signs the CSR with the CA, producing `worker.crt`. +3. Records a `SwarmHost` row in the shared repo with status `enrolled` and + the cert fingerprint. +4. Writes the bundle files to `--out-dir` for you to ship to the worker. + +```bash +decnet swarm enroll \ + --name decky-vm \ + --address 192.168.1.13 \ + --sans decky-vm.lan,192.168.1.13 \ + --out-dir /tmp/decky-vm-bundle +``` + +`--name` is the worker's DECNET identity — it becomes the cert CN and the +`source_worker` tag on every log line forwarded from that host. Pick names +you can grep for. Must be unique; re-enrolling the same name is rejected. + +`--address` is the worker's IP as reachable from the master. This is what +the master's control-plane client will connect to for deploy/teardown RPCs. + +`--sans` is a comma-separated list of Subject Alternative Names. Include +every DNS name and IP the master might use to reach the worker. At minimum, +include the IP you passed to `--address`. + +Output (`/tmp/decky-vm-bundle/`): + +``` +ca.crt # the DECNET CA certificate +worker.crt # CA-signed client+server cert for this worker +worker.key # worker private key (mode 0600) +``` + +### 3. Ship the bundle to the worker + +Any secure channel works — this is a plain file copy. 
`scp`, `rsync`,
+`sshpass` in a closet lab — pick your poison:
+
+```bash
+# From the master:
+scp -r /tmp/decky-vm-bundle/* anti@192.168.1.13:~/.decnet/agent/
+```
+
+On the worker, the bundle must land at `~/.decnet/agent/` of the user that
+will run `decnet agent`. **Watch out for `sudo`**: if you run the agent
+under `sudo`, `$HOME` expands to `/root`, not `/home/anti`. Either put the
+bundle under `/root/.decnet/agent/`, or pass `--agent-dir` to override.
+
+After copying, `chmod 600 ~/.decnet/agent/worker.key` and delete the master
+copy.
+
+### 4. Worker — start the agent + forwarder
+
+```bash
+# On the worker, as the user whose $HOME holds the bundle (or with --agent-dir).
+sudo decnet agent --daemon \
+  --host 0.0.0.0 --port 8765 \
+  --agent-dir /home/anti/.decnet/agent
+
+# The forwarder tails the worker's local decky log file and ships each
+# line, octet-framed and mTLS-wrapped, to the master listener.
+decnet forwarder --daemon \
+  --master-host <master-ip> \
+  --master-port 6514 \
+  --log-path /var/log/decnet/decnet.log \
+  --state-db ~/.decnet/agent/forwarder.db \
+  --agent-dir /home/anti/.decnet/agent
+```
+
+`--state-db` holds a single table that records the forwarder's byte offset
+into the log file. On reconnect after a master outage, the forwarder
+**resumes from the stored offset** — no duplicates, no gaps. Truncation
+(logrotate) is detected (`st_size < offset`) and resets the offset to 0.
+
+`--master-host` / `--master-port` can also be set via
+`DECNET_SWARM_MASTER_HOST` / `DECNET_SWARM_MASTER_PORT` so operators can
+bake them into a systemd unit or `.env` file.
+
+### 5. Master — confirm the worker is alive
+
+```bash
+# List enrolled workers. Fresh enrollments are status=enrolled until the
+# first successful health ping flips them to active.
+decnet swarm list
+
+# Poll worker agents. On success, flips SwarmHost.status to active and
+# stamps SwarmHost.last_heartbeat.
decnet swarm check
+
+decnet swarm list
+# name=decky-vm status=active last_heartbeat=2026-04-18T...
+```
+
+If `check` reports `reachable: false`, the usual suspects are: the agent
+isn't running, the master cannot reach worker:8765 (firewall / NAT),
+`--address` at enrollment doesn't match the worker's actual IP, or clock
+skew is breaking cert validity.
+
+### 6. Deploy deckies across the swarm
+
+```bash
+decnet deploy --mode swarm --deckies 6 --services ssh,smb --dry-run
+# Round-robins 6 deckies across all enrolled workers (with status IN
+# (enrolled, active)) and prints the compose-shard plan.
+
+decnet deploy --mode swarm --deckies 6 --services ssh,smb
+# Live run: POSTs each worker's shard to swarmctl, which fans out to each
+# agent's /deploy, which calls the same deployer.py used in UNIHOST.
+```
+
+Sharding is **round-robin** by enrollment order. If you have workers A and
+B and ask for 3 deckies, A gets 2 and B gets 1. There is no supported knob
+for a different distribution yet — per-deploy host filtering is a feature
+request (see Known Limitations).
+
+Empty swarm is a hard error: `deploy --mode swarm` with zero enrolled
+workers exits non-zero with `No enrolled workers`.
+
+---
+
+## Command reference
+
+All of these live in `decnet/cli.py`. Run `decnet --help` for the
+authoritative option list. What follows are the knobs you will actually
+care about.
+
+### `decnet swarmctl`
+
+Master-side SWARM control plane. FastAPI app at `decnet.web.swarm_api:app`,
+launched as a subprocess of `uvicorn`. Daemonizable.
+
+```
+--host 127.0.0.1   Bind address. Keep this on loopback unless you know
+                   what you're doing; swarmctl is called by the local CLI.
+--port 8770        Bind port.
+--daemon / -d      Detach to background.
+```
+
+### `decnet listener`
+
+Master-side RFC 5425 syslog-over-TLS sink.
Validates every client cert +against `~/.decnet/ca/ca.crt`, tags each line with the peer's CN as +`source_worker`, and writes two sinks: an RFC 5424 `.log` file and a parsed +`.json` file (one object per event). + +``` +--host 0.0.0.0 Bind address. +--port 6514 IANA-assigned syslog-TLS port. +--log-path Raw RFC 5424 sink. default: ./master.log +--json-path Parsed JSON sink. default: ./master.json +--ca-dir CA to verify peers. default: ~/.decnet/ca +--daemon / -d Detach to background. +``` + +### `decnet agent` + +Worker-side control-plane daemon. FastAPI on HTTPS with +`ssl.CERT_REQUIRED`. Endpoints: `/deploy`, `/teardown`, `/status`, +`/health`, `/mutate`. All require a CA-signed client cert (the master's). + +``` +--host 0.0.0.0 Bind address. +--port 8765 Bind port. +--agent-dir Override bundle location. Required when running under + sudo or any user whose $HOME doesn't hold the bundle. +--daemon / -d Detach to background. +``` + +### `decnet forwarder` + +Worker-side log shipper. Tails `--log-path` (default: +`DECNET_INGEST_LOG_FILE`, i.e. the same file the local collector writes), +frames each line per RFC 5425 octet-counting, and writes it to +master:6514 over mTLS. Offset state is persisted in SQLite so a master +crash is recoverable without data loss. + +``` +--master-host Master IP. env: DECNET_SWARM_MASTER_HOST +--master-port Listener port. default: 6514 +--log-path File to tail. default: DECNET_INGEST_LOG_FILE +--state-db Offset SQLite. default: ~/.decnet/agent/forwarder.db +--agent-dir Bundle dir. default: ~/.decnet/agent +--poll-interval File tail interval. default: 0.5 +--daemon / -d Detach to background. +``` + +### `decnet swarm enroll` + +Issues a worker bundle and records a `SwarmHost` row. + +``` +--name Worker identity (CN + source_worker tag). Required. +--address IP/hostname the master uses to reach the agent. Required. +--sans a,b,c Subject Alternative Names. default: [--address] +--out-dir Where to write the bundle. 
default: ./<name>-bundle
+--agent-port       Port to record on the host row. default: 8765
+--notes            Free-form annotation, shown in `swarm list`.
+```
+
+### `decnet swarm list`
+
+Prints the `SwarmHost` rows as a table.
+
+```
+--status <status>
+                   Filter. default: all except decommissioned.
+--json             Emit JSON, not a table. Useful for scripting.
+```
+
+### `decnet swarm check`
+
+Synchronously polls every active/enrolled agent's `/health`. On success,
+flips status to `active` and stamps `last_heartbeat`. On failure, flips to
+`unreachable` and records the error.
+
+### `decnet swarm decommission`
+
+Marks a host `decommissioned` in the repo, tears down any running deckies
+on it via the agent (if reachable), and **revokes** the worker's cert from
+the master's active-set. The worker's bundle files are not deleted from the
+worker — you are expected to wipe those out of band.
+
+```
+--name | --uuid    Identify by either. One is required.
+--yes              Skip confirmation prompt.
+--keep-deckies     Leave containers running on the worker.
+                   Use this when reassigning hardware.
+```
+
+### `decnet deploy --mode swarm`
+
+Round-robins the requested deckies across enrolled workers and dispatches
+to `swarmctl`, which POSTs each shard to the matching agent. Compose
+generation is shared with UNIHOST; only the **distribution** differs.
+
+```
+--deckies            Total fleet size across all workers.
+--services a,b,c     Fixed service set for every decky.
+--randomize-services Per-decky random subset from the catalog.
+--archetype          Pick from Archetypes (see wiki page).
+--dry-run            Print the shard plan; no RPC.
+```
+
+---
+
+## Log pipeline — what actually happens to an attack event
+
+1. Attacker hits a decky. The decky's in-container emit helper writes an
+   RFC 5424 line to `stdout` and to `/var/log/decnet/decnet.log` inside the
+   container. (See [Logging and syslog](Logging-and-Syslog).)
+2. Worker's local collector picks the event up over loopback syslog
+   (worker:5140).
Plaintext is fine here — it never leaves the host. +3. The collector appends the parsed RFC 5424 to + `DECNET_INGEST_LOG_FILE` on the worker. +4. `decnet forwarder` tails that file, octet-frames each line, connects + to `master:6514` over mTLS (trust root = DECNET CA), and sends it. +5. `decnet listener` on the master validates the peer cert, extracts + the CN as `source_worker`, enriches the structured data, and writes + to `master.log` + `master.json`. +6. `decnet ingester` tails `master.json` and inserts rows into the shared + repo — the same code path UNIHOST uses. +7. The web dashboard (`decnet api`) queries the repo; live-logs stream + over SSE. + +**Verify on the wire.** A `tcpdump -i any 'port 6514'` on the master +should show only `ClientHello` / `ServerHello` / `Certificate` / encrypted +`ApplicationData` records. No plaintext `<13>1 2026-...` — if you see +those, something is wrong and logs are leaking in the clear. The listener +refuses plaintext connections outright. + +**Verify provenance.** Every line in `master.log` has a `source_worker=...` +SD element populated from the peer cert's CN. The master does **not** +trust a `source_worker` the worker tries to set itself — it is always +overwritten with the authenticated cert identity. A compromised worker +cannot impersonate another worker's name. + +--- + +## Operational concerns + +### Master crash / restart + +Kill the listener mid-shipment. The forwarder detects the dropped +connection, retries with exponential backoff (capped at 30s), buffers +writes **into the worker's local log file** (not RAM), and on reconnect +resumes shipping from the last committed offset in `forwarder.db`. + +Guarantee: **no duplicates, no loss**, across any number of master +restarts, as long as the worker's disk is intact. Verified end-to-end in +`tests/swarm/test_forwarder_resilience.py`. 
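The resume behaviour above is easy to model: tail the file from a persisted offset, commit the offset only after lines are handed off, and reset on truncation. A sketch under assumed names (`read_new_lines`, an `offset` table) — illustrative, not DECNET's actual forwarder code:

```python
import os
import sqlite3

def read_new_lines(log_path: str, state_db: str):
    """Yield complete lines appended since the last committed byte offset."""
    db = sqlite3.connect(state_db)
    db.execute("CREATE TABLE IF NOT EXISTS offset (id INTEGER PRIMARY KEY, pos INTEGER)")
    row = db.execute("SELECT pos FROM offset WHERE id = 1").fetchone()
    pos = row[0] if row else 0
    if os.path.getsize(log_path) < pos:
        pos = 0                      # truncation (logrotate): start over
    with open(log_path, "rb") as f:
        f.seek(pos)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break                # EOF or partial line; pick it up next poll
            yield line[:-1].decode("utf-8", "replace")
            pos = f.tell()           # advance only past fully handed-off lines
    # Commit point: a crash before this line re-reads, never skips.
    db.execute("INSERT OR REPLACE INTO offset (id, pos) VALUES (1, ?)", (pos,))
    db.commit()
    db.close()
```

The "no duplicates, no loss" property falls out of two choices: the log file itself is the buffer (nothing queued in RAM), and the offset is committed only after shipment.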
+
+### Worker crash / restart
+
+The agent is stateless at the process level — all state lives in the
+bundle on disk plus whatever Docker has running. `systemctl restart
+decnet-agent` (or equivalent) is safe at any time. The forwarder picks
+up exactly where it left off.
+
+### Rotating the CA
+
+Don't. The CA key signs every worker cert. Replacing it means re-enrolling
+every worker. If the CA key is compromised, treat it as a full rebuild:
+decommission every worker, delete `~/.decnet/ca/`, restart `swarmctl` (it
+regenerates a fresh CA), re-enroll every worker with fresh bundles.
+
+### Rotating a single worker cert
+
+```
+decnet swarm decommission --name decky-old --yes
+decnet swarm enroll --name decky-new --address <worker-ip> \
+  --out-dir /tmp/decky-new-bundle
+# ship the new bundle, restart the agent pointed at it.
+```
+
+There is no in-place rotation — decommission + re-enroll is the path.
+
+### Teardown
+
+```bash
+# Master: tear down all deckies across all workers, then stop control plane.
+decnet teardown --all --mode swarm
+
+# On each worker, if you want to remove the bundle:
+systemctl stop decnet-agent decnet-forwarder
+rm -rf ~/.decnet/agent
+
+# Master, to fully wipe swarm state:
+decnet swarm decommission --name <name> --yes
+# This leaves ~/.decnet/ca/ intact so you can re-enroll later. To fully
+# wipe: rm -rf ~/.decnet/ca ~/.decnet/master
+```
+
+---
+
+## Security posture, briefly
+
+- **Every control-plane connection** is mTLS. No token auth, no HTTP
+  fallback, no "just for testing" plaintext knob.
+- **Every log-plane connection** is mTLS (RFC 5425 on 6514). Plaintext
+  syslog over the wire is refused.
+- The master CA signs both the master's own client cert and every worker
+  cert. Certs carry SANs so hostname verification actually works — the
+  master will reject a worker that presents a cert without the worker's
+  address in the SANs.
+- The listener tags every incoming line with the authenticated peer CN.
+ A worker cannot spoof another worker's identity. +- `swarmctl` binds to loopback by default. If you expose it, put real + auth in front. + +--- + +## Known limitations + +- **`deploy --mode swarm` runs `get_host_ip(--interface)` on the master** + before dispatching to workers. This means `--interface` must name a NIC + that exists on the master. If your workers have different NIC names + (common in heterogeneous fleets), this fails. Workaround: use INI + per-worker configs that hardcode the right subnet, and call deploy + once per worker. A proper fix (defer network detection to the worker + agent) is tracked in `Roadmap-and-Known-Debt`. +- **No web UI for swarm management yet.** CLI only. Dashboard integration + is on the roadmap. +- **No automatic discovery.** Workers don't broadcast; enrollment is + explicit and that's intentional. +- **Single master.** No HA. If the master dies, the control plane is gone + until it comes back. Workers keep buffering logs and keep serving + attackers — they don't need the master to stay up — but you can't issue + new deploys or tear anything down while the master is down. +- **Sharding is round-robin.** No weights, no affinity, no "run the + high-interaction HTTPS decky on the beefy box". Feature request. 
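For reference, the round-robin split described in step 6 (and in the last bullet above) can be modelled in a few lines — `shard_round_robin` is a hypothetical name; the real sharder lives in DECNET's deploy path:

```python
from itertools import cycle

def shard_round_robin(workers: list[str], n_deckies: int) -> dict[str, list[int]]:
    """Decky i lands on workers[i % len(workers)] — enrollment order, no weights."""
    if not workers:
        raise RuntimeError("No enrolled workers")   # mirrors the hard error on an empty swarm
    plan: dict[str, list[int]] = {name: [] for name in workers}
    for i, name in zip(range(n_deckies), cycle(workers)):
        plan[name].append(i)
    return plan
```

With workers A and B and 3 deckies, A gets deckies 0 and 2 and B gets decky 1 — the 2/1 split from the deploy walkthrough. Weights and affinity would slot in here, which is why they are listed as a feature request rather than a redesign.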
+
+---
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| `swarm check` says `reachable: false` | Agent not running, firewall, wrong `--address` at enrollment, or clock skew | `curl -k https://<worker-ip>:8765/health` from the master, check `ntpq`/`chronyc tracking`, re-enroll if the IP was wrong |
+| Forwarder logs `ssl.SSLCertVerificationError` | Bundle mismatch (ca.crt ≠ master's CA) or clock skew | Re-ship the bundle written by `swarm enroll`, check time sync |
+| Forwarder logs `ConnectionRefusedError` on 6514 | Listener not running, or binding to the wrong interface | `ss -tlnp \| grep 6514` on the master |
+| `swarm list` shows `status=enrolled` indefinitely | `swarm check` has never been run, or agent is unreachable | Run `swarm check`; see row 1 if that fails |
+| Lines appear in `master.log` but not the dashboard | Ingester not running, or pointed at the wrong JSON path | `systemctl status decnet-ingester`, confirm `DECNET_INGEST_LOG_FILE` matches `listener --json-path` |
+| `deploy --mode swarm` fails with `No enrolled workers` | Exactly what it says | `swarm enroll` at least one worker first |
+| `deploy --mode swarm` fails on `get_host_ip` | The NIC name you passed doesn't exist on the master | See Known Limitations; use per-host INI files |
+| Agent rejects master with `BAD_CERTIFICATE` | Master's own client cert (`~/.decnet/master/`) isn't in the worker's trust chain | Shouldn't happen if both sides were issued from the same CA; check you didn't re-init the CA between `swarmctl` starts |
+
+If things are really broken and you want a clean slate on the master:
+
+```bash
+systemctl stop decnet-swarmctl decnet-listener   # or your supervisor of choice
+rm -rf ~/.decnet/ca ~/.decnet/master ~/.decnet/master-logs
+# SwarmHost rows live in the shared repo; clear them if you want a clean DB.
+sqlite3 ~/.decnet/decnet.db 'DELETE FROM swarmhost; DELETE FROM deckyshard;' +``` + +And on every worker: + +```bash +systemctl stop decnet-agent decnet-forwarder +rm -rf ~/.decnet/agent +``` + +Then start from step 1 of [Setup walkthrough](#setup-walkthrough). diff --git a/_Sidebar.md b/_Sidebar.md index 1342d66..d946d0a 100644 --- a/_Sidebar.md +++ b/_Sidebar.md @@ -17,6 +17,7 @@ - [OS-Fingerprint-Spoofing](OS-Fingerprint-Spoofing) - [Networking-MACVLAN-IPVLAN](Networking-MACVLAN-IPVLAN) - [Deployment-Modes](Deployment-Modes) +- [SWARM-Mode](SWARM-Mode) - [Environment-Variables](Environment-Variables) - [Teardown-and-State](Teardown-and-State) - [Database-Drivers](Database-Drivers)