docs(swarm): spell out Compose v2 plugin prerequisite
Caught on a fresh Debian trixie VM: 'Docker Engine + Compose plugin' as a one-liner prerequisite is a common setup trap, because trixie (like plenty of other distros) ships only the legacy 'docker-compose' (v1), not the 'docker compose' subcommand that the DECNET deployer calls. Adds two explicit install paths (Docker's apt repo for online boxes, standalone binary via scp for lab/air-gapped networks), calls out why legacy v1 does not work, and documents the exact failure signature (exit 125 + docker help text) so the next person who hits it on the worker side knows immediately what's wrong. Cross-referenced from the troubleshooting table.
@@ -71,13 +71,75 @@ On the **master**:
On each **worker**:

- DECNET installed.
- **Docker Engine + Compose v2 plugin** (the agent shells out to
  `docker compose`, not the legacy `docker-compose`). This is the single
  most common setup trap: verify with `docker compose version` before
  enrolling. See [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker)
  below if your distro ships the Docker engine but not the plugin
  (Debian trixie's stock repos, for example, only carry v1).
- `sudo` for the user running `decnet agent` (MACVLAN/IPVLAN needs root).
  `NOPASSWD` is convenient for unattended daemons.
- Outbound TCP to master:6514 (log forward) and inbound TCP on 8765 from
  the master (deploy/teardown/health RPCs).

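
Two of the worker prerequisites above can be checked in one pass before enrolling: passwordless `sudo` and outbound TCP to the master on 6514. The sketch below is a hypothetical helper. Neither `worker_preflight` nor `tcp_reachable` ships with DECNET. It assumes `bash` and coreutils `timeout` exist on the worker. It does not test inbound 8765, because that has to be probed from the master.

```bash
# Hypothetical preflight helpers for a DECNET worker (not part of DECNET).
# Source this file, then run: worker_preflight <master-host>

# Succeed if a TCP connection to host $1, port $2 opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.
tcp_reachable() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

worker_preflight() {
    master="$1"
    rc=0

    # `sudo -n` never prompts: it fails unless the user has NOPASSWD.
    if sudo -n true 2>/dev/null; then
        echo "ok   passwordless sudo"
    else
        echo "warn sudo needs a password (fine interactively, not for unattended daemons)"
    fi

    # Log forwarding runs worker -> master on TCP 6514.
    if tcp_reachable "$master" 6514; then
        echo "ok   $master:6514 reachable"
    else
        echo "FAIL cannot reach $master:6514"
        rc=1
    fi

    return "$rc"
}
```

A non-zero return means the master is unreachable on 6514. A `sudo` password prompt is only reported as a warning, since it still works for interactive use.
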
### Installing Compose v2 on a worker

If `docker compose version` prints anything other than
`Docker Compose version v2.x.y`, you need the plugin. Pick the path that
matches your environment.

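
The version check above is easy to script, for example as a gate in a provisioning run. This is a minimal sketch in POSIX `sh`. `compose_is_v2` is a hypothetical helper name, and it matches only the `Docker Compose version v2.` prefix this guide expects.

```bash
# Hypothetical helper: succeed only if `docker compose version` reports v2.x.y.
compose_is_v2() {
    # Without the plugin, docker prints its own usage text and exits non-zero.
    out=$(docker compose version 2>&1) || return 1
    case "$out" in
        "Docker Compose version v2."*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Typical use: `compose_is_v2 || echo "install the Compose v2 plugin first" >&2`.
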
**Option A: Docker's official apt repo (recommended when it's available):**

```bash
# Debian/Ubuntu. Adds Docker's own package source, then installs the
# compose plugin alongside whatever docker-ce/docker.io you already have.
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg \
  -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/debian $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y docker-compose-plugin
docker compose version   # expect v2.x.y
```

For Ubuntu, swap `debian` for `ubuntu` in both the keyring URL and the
sources.list entry.

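
If one provisioning script has to cover both Debian and Ubuntu workers, you can derive the `debian`/`ubuntu` path segment from `/etc/os-release` instead of editing it by hand. This is a minimal sketch under that assumption, and `docker_repo_distro` is a hypothetical helper. Derivatives are mapped only through `ID_LIKE`. That picks the right path segment, but not necessarily a `VERSION_CODENAME` that Docker's repo publishes.

```bash
# Hypothetical helper: print "debian" or "ubuntu" for the download.docker.com path.
# $1 optionally names an os-release file (defaults to /etc/os-release).
docker_repo_distro() {
    osr="${1:-/etc/os-release}"
    # Source in a subshell so ID, ID_LIKE, etc. don't leak into the caller.
    (
        . "$osr"
        case " $ID ${ID_LIKE:-} " in
            *" ubuntu "*) echo ubuntu ;;
            *" debian "*) echo debian ;;
            *) echo "unsupported distro: $ID" >&2; exit 1 ;;
        esac
    )
}
```

Set `distro=$(docker_repo_distro)` and use `https://download.docker.com/linux/$distro/gpg` and `https://download.docker.com/linux/$distro` in the two places Option A hard-codes `debian`.
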
**Option B: standalone binary (offline or restricted networks):**

```bash
# Drop the v2 binary into Docker's CLI plugin directory. Works on any
# distro with the Docker engine already installed.
sudo mkdir -p /usr/local/lib/docker/cli-plugins
# $(uname -m) matches the release asset names on x86_64 and aarch64.
sudo curl -fsSL \
  "https://github.com/docker/compose/releases/download/v2.29.7/docker-compose-linux-$(uname -m)" \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
docker compose version   # expect v2.x.y
```

If the worker can't reach GitHub directly (closed lab network, air-gapped
VM, etc.), download the binary on a box that *can* reach it and `scp` it
to the worker's `/usr/local/lib/docker/cli-plugins/docker-compose`. As
long as the file ends up executable (the `chmod +x` step above), that's
the entire install.

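
For the air-gapped case, the download-then-copy step can be wrapped in a helper that runs on the connected box. This sketch rests on a few assumptions:

- `compose_asset` and `stage_compose` are hypothetical names.
- The architecture mapping assumes Compose's release asset naming for these architectures; check the release page.
- The binary goes to `/tmp` first and is moved into place with `sudo install`, so the `scp` user doesn't need write access to the root-owned plugin directory.

```bash
# Hypothetical helpers for the air-gapped install path (not part of DECNET).

# Map `uname -m` output to a Compose release asset name. 32-bit ARM reports
# armv7l/armv6l, while the (assumed) asset names drop the trailing "l".
compose_asset() {
    case "$1" in
        x86_64|aarch64|ppc64le|s390x) echo "docker-compose-linux-$1" ;;
        armv7l) echo "docker-compose-linux-armv7" ;;
        armv6l) echo "docker-compose-linux-armv6" ;;
        *) echo "no known Compose asset for arch: $1" >&2; return 1 ;;
    esac
}

# Run on a machine WITH internet access; downloads into the current directory.
#   $1 = ssh target for the worker (user@host)
#   $2 = the worker's `uname -m` output
#   $3 = Compose release tag, e.g. v2.29.7
stage_compose() {
    asset=$(compose_asset "$2") || return 1
    curl -fsSL -o "$asset" \
        "https://github.com/docker/compose/releases/download/$3/$asset" || return 1
    scp "$asset" "$1:/tmp/docker-compose" || return 1
    # install -D creates the plugin directory if needed and sets mode 0755.
    ssh -t "$1" "sudo install -D -m 0755 /tmp/docker-compose /usr/local/lib/docker/cli-plugins/docker-compose && rm -f /tmp/docker-compose && docker compose version"
}
```

For example, `stage_compose admin@worker-01 x86_64 v2.29.7`, where the worker address is a placeholder.
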
**Do not** install the legacy `docker-compose` (v1, the Python one) and
call it a day. The DECNET deployer invokes `docker compose ...` as a
subcommand, not `docker-compose ...` as a binary: they are different
programs with different code paths, and v1 is end-of-life.

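
`docker compose` and `docker-compose` are separate programs, so check which one a worker actually has before digging into DECNET itself. This is a minimal sketch. `diagnose_compose` is a hypothetical helper that only distinguishes the three cases this section cares about.

```bash
# Hypothetical diagnostic: report which Compose a host actually provides.
diagnose_compose() {
    if docker compose version >/dev/null 2>&1; then
        echo plugin            # `docker compose` works: what the deployer needs
    elif command -v docker-compose >/dev/null 2>&1; then
        echo standalone-only   # a docker-compose binary on PATH, but no plugin
    else
        echo none
    fi
}
```

A worker that has a standalone `docker-compose` on `PATH` but no plugin reports `standalone-only`. That is the setup that produces the failure described next.
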
**Symptom if you get this wrong.** `decnet deploy --mode swarm` returns a
500 from the worker with
`CalledProcessError: Command '['docker', 'compose', ...]' returned non-zero exit status 125`.
The worker's agent log will show the `docker` CLI's own help text dumped
into stderr because `docker` treats `compose` as an unknown positional
when the plugin isn't installed.

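
If you wrap `docker compose` calls in your own scripts, you can turn the exit-125 signature above into a pointed hint. `compose_or_hint` is a hypothetical wrapper, not the DECNET agent's code; the agent surfaces the failure as the `CalledProcessError` shown above.

```bash
# Hypothetical wrapper (not DECNET's agent code): run `docker compose` and, on
# exit status 125, print a hint about the missing plugin. docker also uses 125
# for other errors of its own, so treat this as a hint, not a diagnosis.
compose_or_hint() {
    docker compose "$@"
    rc=$?
    if [ "$rc" -eq 125 ]; then
        echo "docker exited 125: check that the Compose v2 plugin is installed (docker compose version)" >&2
    fi
    return "$rc"
}
```

The wrapper passes the original exit status through unchanged, so callers that check for failure behave as before.
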
Time sync is a hard requirement — mTLS cert validation fails if worker and
master clocks differ by more than a few minutes. Run `chronyd`/`systemd-timesyncd`.

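
You can check the time-sync requirement from the worker before enrolling. This sketch rests on a few assumptions:

- `clock_skew_ok` and `check_clock` are hypothetical helpers.
- The worker is assumed to have `ssh` access to the master.
- The 120-second default is a stand-in for "a few minutes", not DECNET's actual tolerance.

```bash
# Hypothetical helpers; the 120-second default is an assumed stand-in for
# "a few minutes", not a DECNET constant.

# Succeed if two epoch timestamps ($1 master, $2 worker) differ by <= $3 seconds.
clock_skew_ok() {
    max="${3:-120}"
    if [ "$1" -ge "$2" ]; then d=$(( $1 - $2 )); else d=$(( $2 - $1 )); fi
    [ "$d" -le "$max" ]
}

# Run on the worker; $1 is an ssh target for the master (user@host).
check_clock() {
    # NTPSynchronized reports whether the kernel considers the clock synced.
    if [ "$(timedatectl show -p NTPSynchronized --value)" != yes ]; then
        echo "warn: clock not NTP-synchronized" >&2
    fi
    clock_skew_ok "$(ssh "$1" date +%s)" "$(date +%s)"
}
```

The ssh round trip adds a second or so of apparent skew, which is negligible against a threshold measured in minutes.
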
@@ -503,6 +565,7 @@ decnet swarm decommission --name <each-worker> --yes
| Lines appear in `master.log` but not the dashboard | Ingester not running, or pointed at the wrong JSON path | `systemctl status decnet-ingester`, confirm `DECNET_INGEST_LOG_FILE` matches `listener --json-path` |
| `deploy --mode swarm` fails with `No enrolled workers` | Exactly what it says | `swarm enroll` at least one worker first |
| Worker returns 500 on `/deploy` with `ip addr show <nic>` error | The worker's agent is re-detecting its own NIC (this is the relocalize step) and can't find a usable interface | Run `ip route show default` on the worker — if empty, the default route is missing; fix the worker's networking before deploying |
| Worker returns 500 on `/deploy` with `docker compose ... exit status 125` and docker help text in the log | Compose v2 plugin is not installed on the worker; the stock `docker` binary is treating `compose` as an unknown subcommand | `docker compose version` on the worker. If it doesn't print v2.x.y, see [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker) |
| Agent rejects master with `BAD_CERTIFICATE` | Master's own client cert (`~/.decnet/master/`) isn't in the worker's trust chain | Never happens if both sides were issued from the same CA. Check you didn't re-init the CA between `swarmctl` starts |
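
The default-route check from the `ip addr show <nic>` row can be scripted on the worker. This is a minimal sketch. `default_nic` is a hypothetical helper that only reproduces the check the table recommends. It is not how the agent's relocalize step is implemented.

```bash
# Hypothetical helper mirroring the table's default-route check: print the
# interface carrying the default route, or fail if there is no default route.
default_nic() {
    ip route show default 2>/dev/null | awk '
        { for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); found = 1; exit } }
        END { exit !found }'
}
```

Empty output with a non-zero status means there is no default route. The table says to fix that before deploying.
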

If things are really broken and you want a clean slate on the master: