From 60710d8f2f76b455b869420ab3388065ca583222 Mon Sep 17 00:00:00 2001
From: anti
Date: Sat, 18 Apr 2026 20:47:19 -0400
Subject: [PATCH] docs(swarm): spell out Compose v2 plugin prerequisite

Caught on a fresh Debian trixie VM: 'Docker Engine + Compose plugin' as a
one-liner prerequisite is a common setup trap because trixie (and plenty
of other distros) ship only the legacy 'docker-compose' (v1), not the
'docker compose' subcommand that the DECNET deployer calls.

Adds two explicit install paths (Docker's apt repo for online boxes,
standalone binary via scp for lab/air-gapped networks), calls out why
legacy v1 does not work, and documents the exact failure signature (exit
125 + docker help text) so the next person who hits it on the worker
side knows immediately what's wrong. Cross-references from the
troubleshooting table.
---
 SWARM-Mode.md | 67 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/SWARM-Mode.md b/SWARM-Mode.md
index 3055e27..6d6f945 100644
--- a/SWARM-Mode.md
+++ b/SWARM-Mode.md
@@ -71,13 +71,75 @@ On the **master**:
 On each **worker**:
 
 - DECNET installed.
-- Docker Engine + Compose plugin (the agent shells out to `docker compose`
-  exactly like UNIHOST).
+- **Docker Engine + Compose v2 plugin** (the agent shells out to
+  `docker compose`, not the legacy `docker-compose`). This is the single
+  most common setup trap — verify with `docker compose version` before
+  enrolling. See [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker)
+  below if your distro ships the Docker engine but not the plugin
+  (Debian trixie's stock repos, for example, only carry v1).
 - `sudo` for the user running `decnet agent` (MACVLAN/IPVLAN needs root).
   `NOPASSWD` is convenient for unattended daemons.
 - Outbound TCP to master:6514 (log forward) and inbound TCP on 8765
   from the master (deploy/teardown/health RPCs).
 
+### Installing Compose v2 on a worker
+
+If `docker compose version` prints anything other than `Docker Compose
+version v2.x.y`, you need the plugin. Pick the path that matches your
+environment.
+
+**Option A — Docker's official apt repo (recommended when it's available):**
+
+```bash
+# Debian/Ubuntu. Adds Docker's own package source, then installs the
+# compose plugin alongside whatever docker-ce/docker.io you already have.
+sudo apt-get update
+sudo apt-get install -y ca-certificates curl
+sudo install -m 0755 -d /etc/apt/keyrings
+sudo curl -fsSL https://download.docker.com/linux/debian/gpg \
+  -o /etc/apt/keyrings/docker.asc
+sudo chmod a+r /etc/apt/keyrings/docker.asc
+echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
+  https://download.docker.com/linux/debian $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
+  | sudo tee /etc/apt/sources.list.d/docker.list
+sudo apt-get update
+sudo apt-get install -y docker-compose-plugin
+docker compose version  # expect v2.x.y
+```
+
+For Ubuntu, swap `debian` for `ubuntu` in both the keyring URL and the
+sources.list entry.
+
+**Option B — standalone binary (offline or restricted networks):**
+
+```bash
+# Drop the v2 binary into Docker's CLI plugin directory. Works on any
+# distro with the Docker engine already installed.
+sudo mkdir -p /usr/local/lib/docker/cli-plugins
+sudo curl -fsSL \
+  "https://github.com/docker/compose/releases/download/v2.29.7/docker-compose-linux-$(uname -m)" \
+  -o /usr/local/lib/docker/cli-plugins/docker-compose
+sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
+docker compose version
+```
+
+If the worker can't reach GitHub directly (closed lab network, air-gapped
+VM, etc.), download the binary on a box that *can* reach it and `scp` it
+to the worker's `/usr/local/lib/docker/cli-plugins/docker-compose` —
+that's the entire install.
+
+**Do not** install the legacy `docker-compose` (v1, the Python one) and
+call it a day. The DECNET deployer invokes `docker compose ...` as a
+subcommand, not `docker-compose ...` as a binary — they are different
+programs with different code paths, and v1 is end-of-life.
+
+**Symptom if you get this wrong.** `decnet deploy --mode swarm` returns a
+500 from the worker with
+`CalledProcessError: Command '['docker', 'compose', ...]' returned
+non-zero exit status 125`. The worker's agent log will show the
+`docker` CLI's own help text dumped into stderr because `docker` treats
+`compose` as an unknown positional when the plugin isn't installed.
+
 Time sync is a hard requirement — mTLS cert validation fails if worker and
 master clocks differ by more than a few minutes. Run
 `chronyd`/`systemd-timesyncd`.
@@ -503,6 +565,7 @@ decnet swarm decommission --name --yes
 | Lines appear in `master.log` but not the dashboard | Ingester not running, or pointed at the wrong JSON path | `systemctl status decnet-ingester`, confirm `DECNET_INGEST_LOG_FILE` matches `listener --json-path` |
 | `deploy --mode swarm` fails with `No enrolled workers` | Exactly what it says | `swarm enroll` at least one worker first |
 | Worker returns 500 on `/deploy` with `ip addr show ` error | The worker's agent is re-detecting its own NIC (this is the relocalize step) and can't find a usable interface | Run `ip route show default` on the worker — if empty, the default route is missing; fix the worker's networking before deploying |
+| Worker returns 500 on `/deploy` with `docker compose ... exit status 125` and docker help text in the log | Compose v2 plugin is not installed on the worker; the stock `docker` binary is treating `compose` as an unknown subcommand | `docker compose version` on the worker. If it doesn't print v2.x.y, see [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker) |
 | Agent rejects master with `BAD_CERTIFICATE` | Master's own client cert (`~/.decnet/master/`) isn't in the worker's trust chain | Never happens if both sides were issued from the same CA. Check you didn't re-init the CA between `swarmctl` starts |
 
 If things are really broken and you want a clean slate on the master: