docs(swarm): spell out Compose v2 plugin prerequisite

Caught on a fresh Debian trixie VM: 'Docker Engine + Compose plugin' as
a one-liner prerequisite is a common setup trap because trixie (and
plenty of other distros) ship only the legacy 'docker-compose' (v1), not
the 'docker compose' subcommand that the DECNET deployer calls.

Adds two explicit install paths (Docker's apt repo for online boxes,
standalone binary via scp for lab/air-gapped networks), calls out why
legacy v1 does not work, and documents the exact failure signature
(exit 125 + docker help text) so the next person who hits it on the
worker side knows immediately what's wrong. Cross-references from the
troubleshooting table.
2026-04-18 20:47:19 -04:00
parent 19ed1c2753
commit 60710d8f2f

@@ -71,13 +71,75 @@ On the **master**:
On each **worker**:
- DECNET installed.
-- Docker Engine + Compose plugin (the agent shells out to `docker compose`
-  exactly like UNIHOST).
- **Docker Engine + Compose v2 plugin** (the agent shells out to
`docker compose`, not the legacy `docker-compose`). This is the single
most common setup trap — verify with `docker compose version` before
enrolling. See [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker)
below if your distro ships the Docker engine but not the plugin
(Debian trixie's stock repos, for example, only carry v1).
- `sudo` for the user running `decnet agent` (MACVLAN/IPVLAN needs root).
`NOPASSWD` is convenient for unattended daemons.
- Outbound TCP to master:6514 (log forward) and inbound TCP on 8765 from
the master (deploy/teardown/health RPCs).
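The two port requirements above can be checked before enrolling. The following is a minimal sketch, not part of DECNET: `tcp_reachable` is a hypothetical helper name, and `MASTER_HOST` and `WORKER_HOST` in the usage comments are placeholders. It relies only on bash's `/dev/tcp` redirection and coreutils `timeout`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a TCP connection to host:port opens
# within 3 seconds, non-zero otherwise (refused, filtered, or timed out).
tcp_reachable() {
    local host="$1" port="$2"
    # $0 and $1 inside the child shell are the host and port passed below.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# On the worker (log forward):
#   tcp_reachable "$MASTER_HOST" 6514 && echo "6514 ok" || echo "6514 blocked"
# On the master (deploy/teardown/health RPCs; the agent must be running):
#   tcp_reachable "$WORKER_HOST" 8765 && echo "8765 ok" || echo "8765 blocked"
```

A successful connect only shows that something is listening and the path is open; it does not validate the mTLS handshake.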
### Installing Compose v2 on a worker
If `docker compose version` prints anything other than `Docker Compose
version v2.x.y`, you need the plugin. Pick the path that matches your
environment.
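The check described above can be scripted so it fits into a provisioning run. This is a sketch under assumptions: `is_compose_v2` and `check_compose` are hypothetical names, and the pattern also accepts a version printed without the leading `v`, in case a distro build formats it that way.

```bash
#!/usr/bin/env bash
# Returns 0 only if the given string looks like Compose v2 version output.
is_compose_v2() {
    local re='^Docker Compose version v?2\.[0-9]+\.[0-9]+'
    [[ "$1" =~ $re ]]
}

# Runs the real command and reports which case the worker is in.
check_compose() {
    local out
    out="$(docker compose version 2>/dev/null)"
    if is_compose_v2 "$out"; then
        echo "ok: $out"
    else
        echo "Compose v2 plugin missing or not v2 (got: '${out:-nothing}')" >&2
        return 1
    fi
}
```

Calling `check_compose` on a worker before enrolling it surfaces the problem locally instead of as a failed deploy later.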
**Option A — Docker's official apt repo (recommended when it's available):**
```bash
# Debian/Ubuntu. Adds Docker's own package source, then installs the
# compose plugin alongside whatever docker-ce/docker.io you already have.
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg \
    -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
    | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y docker-compose-plugin
docker compose version # expect v2.x.y
```
For Ubuntu, swap `debian` for `ubuntu` in both the keyring URL and the
sources.list entry.
**Option B — standalone binary (offline or restricted networks):**
```bash
# Drop the v2 binary into Docker's CLI plugin directory. Works on any
# distro with the Docker engine already installed.
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -fsSL \
    "https://github.com/docker/compose/releases/download/v2.29.7/docker-compose-linux-$(uname -m)" \
    -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
docker compose version
```
If the worker can't reach GitHub directly (closed lab network, air-gapped
VM, etc.), download the binary on a box that *can* reach it, `scp` it
to the worker's `/usr/local/lib/docker/cli-plugins/docker-compose`, and
make it executable with `chmod +x`. That's the entire install.
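The transfer for a worker without GitHub access might look like the sketch below, run from the box that does have internet access. `compose_url` and `stage_compose` are hypothetical helper names, the worker argument is whatever SSH target you normally use, and `/tmp/docker-compose` is an arbitrary staging path. Note that the architecture is read from the worker, not from the download box, since the two may differ.

```bash
#!/usr/bin/env bash
# Builds the release asset URL for a given Compose version and architecture.
compose_url() {
    local version="$1" arch="$2"
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-linux-%s\n' \
        "$version" "$arch"
}

# Usage: stage_compose user@worker [version]
stage_compose() {
    local worker="$1" version="${2:-v2.29.7}" arch tmp
    arch="$(ssh "$worker" uname -m)" || return 1
    tmp="$(mktemp)" || return 1
    curl -fsSL "$(compose_url "$version" "$arch")" -o "$tmp" || return 1
    scp "$tmp" "$worker:/tmp/docker-compose" || return 1
    rm -f "$tmp"
    # install -D creates the plugin directory if needed and sets the mode.
    ssh -t "$worker" 'sudo install -D -m 0755 /tmp/docker-compose \
        /usr/local/lib/docker/cli-plugins/docker-compose && docker compose version'
}
```

For a fully air-gapped worker with no SSH path, the same `compose_url` output can be fetched onto removable media and the `install` line run by hand.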
**Do not** install the legacy `docker-compose` (v1, the Python one) and
call it a day. The DECNET deployer invokes `docker compose ...` as a
subcommand, not `docker-compose ...` as a binary — they are different
programs with different code paths, and v1 is end-of-life.
**Symptom if you get this wrong.** `decnet deploy --mode swarm` returns a
500 from the worker with
`CalledProcessError: Command '['docker', 'compose', ...]' returned
non-zero exit status 125`. The worker's agent log will show the
`docker` CLI's own help text dumped into stderr because `docker` treats
`compose` as an unknown positional when the plugin isn't installed.
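When that symptom appears, it can help to confirm directly whether a plugin binary exists where the `docker` CLI looks for it. The sketch below assumes the CLI's usual default search directories; a host configured with extra plugin directories would need those added. `find_compose_plugin` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Prints the first executable docker-compose plugin found and returns 0,
# or returns 1 if none exists. Directories may be passed as arguments;
# with no arguments, the assumed default search path is used.
find_compose_plugin() {
    local dir
    local -a dirs=("$@")
    if [ "${#dirs[@]}" -eq 0 ]; then
        dirs=(
            "$HOME/.docker/cli-plugins"
            /usr/local/lib/docker/cli-plugins
            /usr/local/libexec/docker/cli-plugins
            /usr/lib/docker/cli-plugins
            /usr/libexec/docker/cli-plugins
        )
    fi
    for dir in "${dirs[@]}"; do
        if [ -x "$dir/docker-compose" ]; then
            printf '%s\n' "$dir/docker-compose"
            return 0
        fi
    done
    return 1
}
```

A file that is present but not executable is treated as missing, which matches the common case of a binary copied over without `chmod +x`. Run the function as the same user that runs `decnet agent`, since the first search directory depends on `$HOME`.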
Time sync is a hard requirement: mTLS cert validation fails if worker and
master clocks differ by more than a few minutes. Run `chronyd`/`systemd-timesyncd`.
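A rough way to check the clock requirement is to compare epoch seconds between the two hosts. In this sketch `skew_within` is a hypothetical helper, and the 120-second threshold in the usage comment is an assumption standing in for "a few minutes", not a documented DECNET limit.

```bash
#!/usr/bin/env bash
# Returns 0 if two epoch timestamps differ by at most max_seconds.
skew_within() {
    local a="$1" b="$2" max="$3" diff
    diff=$(( a - b ))
    if (( diff < 0 )); then
        diff=$(( -diff ))
    fi
    (( diff <= max ))
}

# On the worker, with MASTER_HOST as a placeholder SSH target:
#   skew_within "$(date +%s)" "$(ssh "$MASTER_HOST" date +%s)" 120 \
#       || echo "clock skew too large" >&2
```

The SSH round trip adds a small error of its own, so this only catches gross drift. On systemd hosts, `timedatectl show -p NTPSynchronized --value` printing `yes` is a quicker sign that a sync daemon is working.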
@@ -503,6 +565,7 @@ decnet swarm decommission --name <each-worker> --yes
| Lines appear in `master.log` but not the dashboard | Ingester not running, or pointed at the wrong JSON path | `systemctl status decnet-ingester`, confirm `DECNET_INGEST_LOG_FILE` matches `listener --json-path` |
| `deploy --mode swarm` fails with `No enrolled workers` | Exactly what it says | `swarm enroll` at least one worker first |
| Worker returns 500 on `/deploy` with `ip addr show <nic>` error | The worker's agent is re-detecting its own NIC (this is the relocalize step) and can't find a usable interface | Run `ip route show default` on the worker; if empty, the default route is missing. Fix the worker's networking before deploying |
| Worker returns 500 on `/deploy` with `docker compose ... exit status 125` and docker help text in the log | Compose v2 plugin is not installed on the worker; the stock `docker` binary is treating `compose` as an unknown subcommand | `docker compose version` on the worker. If it doesn't print v2.x.y, see [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker) |
| Agent rejects master with `BAD_CERTIFICATE` | Master's own client cert (`~/.decnet/master/`) isn't in the worker's trust chain | Never happens if both sides were issued from the same CA. Check you didn't re-init the CA between `swarmctl` starts |
If things are really broken and you want a clean slate on the master: