diff --git a/Troubleshooting.md b/Troubleshooting.md
index d506828..7abc9ff 100644
--- a/Troubleshooting.md
+++ b/Troubleshooting.md
@@ -58,6 +58,44 @@ Under heavy concurrent event ingestion, SQLite can hit writer-lock contention. S
 
 See [[Database-Drivers]].
 
+## Docker
+
+### Buildx leaked mounts ("read-only file system")
+
+**Symptom.** A topology deploy dies during `docker compose up --build` with a misleading error like:
+
+```
+failed to update builder last activity time:
+open /home//.docker/buildx/activity/.tmp-default...:
+read-only file system
+```
+
+Your home filesystem is not actually read-only. The real problem is Docker's buildkit driver: it has leaked bind-mounts from a previous failed build and can no longer write its activity timestamp. Each retry accumulates more leaked mounts. You can confirm it with:
+
+```bash
+mount | grep -c '/var/lib/docker/tmp/buildkit-mount'
+```
+
+Anything past single digits is pathological. We've seen hosts sitting on hundreds after a few botched mass-scale topologies.
+
+**Fix.**
+
+```bash
+docker buildx prune -af
+sudo systemctl restart docker
+```
+
+Restart is the operative step: it drops every leaked mount. `prune -af` also discards the build cache so the next deploy rebuilds from scratch; skip it if you want the cache preserved.
+
+If the activity dir itself is corrupted (rare):
+
+```bash
+rm -rf ~/.docker/buildx/activity
+docker buildx create --use
+```
+
+**How DECNET handles it.** The engine's `_compose_with_retry` counts leaked buildkit mounts before every build and refuses to start if the count crosses 10, so you get the recovery recipe in the error payload instead of a cryptic EROFS surfaced three retries deep. Mid-build failures that match the known wedge signature also short-circuit the retry loop with the same hint.
+
 ---
 
 See also: [[Security-and-Stealth]] · [[Environment-Variables]] · [[Roadmap-and-Known-Debt]]
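
The preflight check described under "How DECNET handles it" can be approximated by hand before a deploy. The sketch below is not the engine's `_compose_with_retry` code. The function names `count_leaked_mounts`, `check_leak_count`, and `preflight_buildx` are hypothetical, and reading "crosses 10" as "strictly greater than 10" is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical preflight sketch; NOT DECNET's actual implementation.

# Threshold from the text above. Treating "crosses 10" as "> 10" is an assumption.
LEAK_THRESHOLD=10

# Count leaked buildkit bind-mounts in a mount table read from stdin.
count_leaked_mounts() {
    # grep -c prints 0 but exits 1 when nothing matches; `|| true` keeps
    # the function usable in scripts that run under `set -e`.
    grep -c '/var/lib/docker/tmp/buildkit-mount' || true
}

# Return 0 if the given count is at or under the threshold; otherwise
# print the recovery recipe to stderr and return 1.
check_leak_count() {
    local count="$1"
    if [ "$count" -gt "$LEAK_THRESHOLD" ]; then
        echo "buildx wedge: $count leaked buildkit mounts (limit $LEAK_THRESHOLD)" >&2
        echo "recover with: docker buildx prune -af && sudo systemctl restart docker" >&2
        return 1
    fi
    return 0
}

# Inspect the live mount table and decide whether a build is safe to start.
preflight_buildx() {
    check_leak_count "$(mount | count_leaked_mounts)"
}
```

After sourcing these functions, a guarded deploy would look like `preflight_buildx && docker compose up --build`. Separating the counting from the threshold check keeps each piece testable without touching the real mount table.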