From 7c6fe3b57643b1f7a9f17f28957ad3757c04c862 Mon Sep 17 00:00:00 2001
From: anti
Date: Sat, 18 Apr 2026 21:00:26 -0400
Subject: [PATCH] docs(swarm): add buildx 0.17+ prereq alongside compose v2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Second Docker-side prereq uncovered running a real deploy on a fresh
Debian trixie VM: images pulled fine but 'docker compose up --build'
bailed with 'compose build requires buildx 0.17.0 or later'. Debian's
buildx is stuck at 0.13, so the compose-plugin install must be paired
with a buildx plugin install.

- Prereqs list now requires docker compose version AND docker buildx
  version to be verified before enrolling.
- Install section renamed to 'Installing Compose v2 and Buildx on a
  worker', covers both plugins with arch-aware curl incantations and
  uname -m → compose-arch / buildx-arch mapping (compose uses x86_64,
  buildx uses amd64 — footgun).
- Adds a troubleshooting row for the buildx-too-old case and one for
  wrong-arch binary ('Invalid Plugins: ... exec format error').

Both uncovered on the live VM run; docs now match reality.
---
 SWARM-Mode.md | 90 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 65 insertions(+), 25 deletions(-)

diff --git a/SWARM-Mode.md b/SWARM-Mode.md
index 6d6f945..0eea475 100644
--- a/SWARM-Mode.md
+++ b/SWARM-Mode.md
@@ -71,28 +71,36 @@ On the **master**:
 On each **worker**:
 
 - DECNET installed.
-- **Docker Engine + Compose v2 plugin** (the agent shells out to
-  `docker compose`, not the legacy `docker-compose`). This is the single
-  most common setup trap — verify with `docker compose version` before
-  enrolling. See [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker)
-  below if your distro ships the Docker engine but not the plugin
-  (Debian trixie's stock repos, for example, only carry v1).
+- **Docker Engine + Compose v2 plugin + Buildx ≥ 0.17** (the agent shells
+  out to `docker compose` with `--build`, which in turn invokes buildx
+  for image builds). Verify both before enrolling:
+  ```bash
+  docker compose version  # expect v2.x.y
+  docker buildx version   # expect v0.17.0 or newer
+  ```
+  This is the single most common setup trap. Distros vary wildly in what
+  they ship — Debian trixie's stock repos have neither the compose v2
+  plugin nor a recent-enough buildx, for example. See [Installing
+  Compose v2 and Buildx on a worker](#installing-compose-v2-and-buildx-on-a-worker)
+  below.
 - `sudo` for the user running `decnet agent` (MACVLAN/IPVLAN needs root).
   `NOPASSWD` is convenient for unattended daemons.
 - Outbound TCP to master:6514 (log forward) and inbound TCP on 8765 from
   the master (deploy/teardown/health RPCs).
 
-### Installing Compose v2 on a worker
+### Installing Compose v2 and Buildx on a worker
 
 If `docker compose version` prints anything other than `Docker Compose
-version v2.x.y`, you need the plugin. Pick the path that matches your
+version v2.x.y`, or `docker buildx version` prints older than `v0.17.0`,
+install the missing plugin(s). Pick the path that matches your
 environment.
 
 **Option A — Docker's official apt repo (recommended when it's available):**
 
 ```bash
 # Debian/Ubuntu. Adds Docker's own package source, then installs the
-# compose plugin alongside whatever docker-ce/docker.io you already have.
+# compose + buildx plugins alongside whatever docker-ce/docker.io you
+# already have.
 sudo apt-get update
 sudo apt-get install -y ca-certificates curl
 sudo install -m 0755 -d /etc/apt/keyrings
@@ -103,42 +111,72 @@ echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.
   https://download.docker.com/linux/debian $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
   | sudo tee /etc/apt/sources.list.d/docker.list
 sudo apt-get update
-sudo apt-get install -y docker-compose-plugin
-docker compose version # expect v2.x.y
+sudo apt-get install -y docker-compose-plugin docker-buildx-plugin
+docker compose version  # expect v2.x.y
+docker buildx version   # expect v0.17.0+
 ```
 
 For Ubuntu, swap `debian` for `ubuntu` in both the keyring URL and the
 sources.list entry.
 
-**Option B — standalone binary (offline or restricted networks):**
+**Option B — standalone binaries (offline or restricted networks):**
+
+Both plugins install the same way: download the binary for your
+architecture and drop it into Docker's CLI plugin directory.
 
 ```bash
-# Drop the v2 binary into Docker's CLI plugin directory. Works on any
-# distro with the Docker engine already installed.
+# Confirm the worker's architecture first — x86_64, aarch64, armv7l.
+ARCH=$(uname -m)
+case "$ARCH" in
+  x86_64)  COMPOSE_ARCH=x86_64;  BUILDX_ARCH=amd64 ;;
+  aarch64) COMPOSE_ARCH=aarch64; BUILDX_ARCH=arm64 ;;
+  armv7l)  COMPOSE_ARCH=armv7;   BUILDX_ARCH=arm-v7 ;;
+esac
+
 sudo mkdir -p /usr/local/lib/docker/cli-plugins
+
+# Compose v2
 sudo curl -fsSL \
-  "https://github.com/docker/compose/releases/download/v2.29.7/docker-compose-linux-$(uname -m)" \
+  "https://github.com/docker/compose/releases/download/v2.29.7/docker-compose-linux-${COMPOSE_ARCH}" \
   -o /usr/local/lib/docker/cli-plugins/docker-compose
 sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
+
+# Buildx
+sudo curl -fsSL \
+  "https://github.com/docker/buildx/releases/download/v0.18.0/buildx-v0.18.0.linux-${BUILDX_ARCH}" \
+  -o /usr/local/lib/docker/cli-plugins/docker-buildx
+sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-buildx
+
 docker compose version
+docker buildx version
 ```
 
 If the worker can't reach GitHub directly (closed lab network, air-gapped
-VM, etc.), download the binary on a box that *can* reach it and `scp` it
-to the worker's `/usr/local/lib/docker/cli-plugins/docker-compose` —
-that's the entire install.
+VM, etc.), download the binaries on a box that *can* reach it and `scp`
+them to the worker's `/usr/local/lib/docker/cli-plugins/` — that's the
+entire install.
+
+**Watch the architecture.** Downloading `linux-x86_64` onto an `aarch64`
+worker (or vice versa) gets you `exec format error: failed to fetch
+metadata` from the `docker` CLI and the plugin is listed under "Invalid
+Plugins" in `docker info`. `uname -m` is your friend.
 
 **Do not** install the legacy `docker-compose` (v1, the Python one) and
 call it a day. The DECNET deployer invokes `docker compose ...` as a
 subcommand, not `docker-compose ...` as a binary — they are different
 programs with different code paths, and v1 is end-of-life.
 
-**Symptom if you get this wrong.** `decnet deploy --mode swarm` returns a
-500 from the worker with
-`CalledProcessError: Command '['docker', 'compose', ...]' returned
-non-zero exit status 125`. The worker's agent log will show the
-`docker` CLI's own help text dumped into stderr because `docker` treats
-`compose` as an unknown positional when the plugin isn't installed.
+**Symptoms if you get this wrong.**
+
+- No compose plugin at all: `CalledProcessError: Command '['docker',
+  'compose', ...]' returned non-zero exit status 125`, agent log shows
+  the `docker` CLI's help text (because `compose` is an unknown
+  subcommand).
+- Compose plugin OK but buildx too old: `compose build requires buildx
+  0.17.0 or later` in the agent log, followed by `up --build` exit
+  status 1. Images pull fine, the build step is what fails.
+- Wrong-arch binary: `Invalid Plugins: compose failed to fetch metadata:
+  fork/exec ...: exec format error` in `docker info`.
 
 Time sync is a hard requirement — mTLS cert validation fails if worker
 and master clocks differ by more than a few minutes. Run `chronyd`/`systemd-timesyncd`.
@@ -565,7 +603,9 @@ decnet swarm decommission --name --yes
 | Lines appear in `master.log` but not the dashboard | Ingester not running, or pointed at the wrong JSON path | `systemctl status decnet-ingester`, confirm `DECNET_INGEST_LOG_FILE` matches `listener --json-path` |
 | `deploy --mode swarm` fails with `No enrolled workers` | Exactly what it says | `swarm enroll` at least one worker first |
 | Worker returns 500 on `/deploy` with `ip addr show ` error | The worker's agent is re-detecting its own NIC (this is the relocalize step) and can't find a usable interface | Run `ip route show default` on the worker — if empty, the default route is missing; fix the worker's networking before deploying |
-| Worker returns 500 on `/deploy` with `docker compose ... exit status 125` and docker help text in the log | Compose v2 plugin is not installed on the worker; the stock `docker` binary is treating `compose` as an unknown subcommand | `docker compose version` on the worker. If it doesn't print v2.x.y, see [Installing Compose v2 on a worker](#installing-compose-v2-on-a-worker) |
+| Worker returns 500 on `/deploy` with `docker compose ... exit status 125` and docker help text in the log | Compose v2 plugin is not installed on the worker; the stock `docker` binary is treating `compose` as an unknown subcommand | `docker compose version` on the worker. If it doesn't print v2.x.y, see [Installing Compose v2 and Buildx on a worker](#installing-compose-v2-and-buildx-on-a-worker) |
+| Worker returns 500 on `/deploy` with `compose build requires buildx 0.17.0 or later` | Buildx plugin missing or too old on the worker; images pull but the build step fails | `docker buildx version` on the worker. If it's below v0.17.0, see [Installing Compose v2 and Buildx on a worker](#installing-compose-v2-and-buildx-on-a-worker) |
+| `docker info` lists a CLI plugin under "Invalid Plugins: ... exec format error" | Wrong-architecture binary installed — e.g. x86_64 binary dropped onto an aarch64 host | Re-download the plugin binary matching `uname -m` and overwrite the file in `/usr/local/lib/docker/cli-plugins/` |
 | Agent rejects master with `BAD_CERTIFICATE` | Master's own client cert (`~/.decnet/master/`) isn't in the worker's trust chain | Never happens if both sides were issued from the same CA. Check you didn't re-init the CA between `swarmctl` starts |
 
 If things are really broken and you want a clean slate on the master: