Troubleshooting
anti edited this page 2026-04-24 22:07:20 -04:00

Common gotchas when deploying and running DECNET.

Networking

MACVLAN fails on WSL

WSL does not play nicely with MACVLAN drivers. Options:

  • Run DECNET on bare metal or inside a proper VM (preferred).
  • Fall back to IPVLAN by passing --ipvlan on the deploy command.

See Home for supported environments.

NIC not in promiscuous mode

Deckies and the sniffer need the host NIC in promiscuous mode to see decoy-directed traffic. If captures look empty:

sudo ip link set <iface> promisc on
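
To verify the setting from a script, one option is to read the interface flags that Linux exposes in sysfs. The sketch below is illustrative and not part of DECNET; the function names are hypothetical, and it assumes a Linux host where /sys/class/net/<iface>/flags holds the flags as a hex string.

```python
IFF_PROMISC = 0x100  # promiscuous-mode bit from linux/if.h


def flags_indicate_promisc(flags_text: str) -> bool:
    """Return True if a sysfs flags value (e.g. '0x1103') has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def iface_is_promisc(iface: str) -> bool:
    """Read the flags of a real interface from sysfs (Linux only)."""
    with open(f"/sys/class/net/{iface}/flags") as fh:
        return flags_indicate_promisc(fh.read())
```

Note that this only reflects promiscuous mode set on the interface itself (as ip link set does); a capture tool that enables it temporarily through its own socket may not show up in these flags.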

Auth and Startup

admin/admin rejected at startup

Intentional. DECNET refuses to boot with the trivial default. Set DECNET_ADMIN_USER and DECNET_ADMIN_PASSWORD to real values.
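
A minimal sketch of what such a startup guard might look like. This is not the actual DECNET code: the function name is hypothetical, and the fallback of unset variables to admin/admin is an assumption made for illustration.

```python
# Hypothetical set of credential pairs treated as trivial defaults.
TRIVIAL_CREDENTIALS = {("admin", "admin")}


def check_admin_credentials(env):
    """Return an error message if the admin credentials are trivial, else None."""
    # Assumption: unset variables fall back to the admin/admin default.
    user = env.get("DECNET_ADMIN_USER", "admin")
    password = env.get("DECNET_ADMIN_PASSWORD", "admin")
    if (user, password) in TRIVIAL_CREDENTIALS:
        return "refusing to start with the default admin/admin credentials"
    return None
```

Passing os.environ as env would run the same check against the real process environment.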

JWT secret too short

DECNET_JWT_SECRET must be at least 32 bytes for HS256 (RFC 7518 §3.2). Shorter secrets are rejected at startup with an explicit error. See decnet/env.py.
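
One way to produce a compliant secret is Python's standard secrets module. The helper names below are hypothetical and the length check only approximates the rule described above; the authoritative validation lives in decnet/env.py.

```python
import secrets

MIN_JWT_SECRET_BYTES = 32  # HS256 keys must be at least 256 bits long


def jwt_secret_is_long_enough(secret: str) -> bool:
    """Check the encoded byte length, not the character count."""
    return len(secret.encode("utf-8")) >= MIN_JWT_SECRET_BYTES


def generate_jwt_secret() -> str:
    """Encode 48 random bytes as a 64-character URL-safe string."""
    return secrets.token_urlsafe(48)
```

Printing generate_jwt_secret() once and exporting the result as DECNET_JWT_SECRET should clear the startup check with margin to spare.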

Embedded vs Standalone Workers

Running both the embedded profiler/sniffer and a standalone instance causes duplicate or skipped events.

Fix: pick one. Unset the embed flags when running standalone workers:

unset DECNET_EMBED_PROFILER
unset DECNET_EMBED_SNIFFER
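
A standalone worker wrapper could check for leftover flags before starting. The sketch below is hypothetical and not part of DECNET; it treats any flag present in the environment as set, which is deliberately conservative because the values DECNET accepts for these flags are not covered here.

```python
EMBED_FLAGS = ("DECNET_EMBED_PROFILER", "DECNET_EMBED_SNIFFER")


def embed_flags_still_set(env):
    """Return the embed flags that are present in env, in a fixed order."""
    return [name for name in EMBED_FLAGS if name in env]
```

Calling embed_flags_still_set(os.environ) and aborting on a non-empty result catches the double-run case early.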

See Environment-Variables.

Python Runtime

Python 3.14 GC instability under load

The 3.14 GC has surfaced crashes under DECNET's load profile. Pin to Python 3.11 through 3.13 until upstream stabilizes.
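
A launcher script can enforce the pin with a version guard. This is a sketch only: the function name is hypothetical, and it assumes the intended range is 3.11 through 3.13 inclusive.

```python
import sys


def runtime_is_supported(version_info=sys.version_info) -> bool:
    """Return True for Python 3.11 through 3.13 (assumed supported range)."""
    return (3, 11) <= tuple(version_info[:2]) <= (3, 13)
```

Exiting early when runtime_is_supported() is false gives a clearer failure than a GC crash under load.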

Database

SQLite write contention

Under heavy concurrent event ingestion, SQLite can hit writer-lock contention. Switch the backend to MySQL.

See Database-Drivers.

Docker

Buildx leaked mounts ("read-only file system")

Symptom. A topology deploy dies during docker compose up --build with a misleading error like:

failed to update builder last activity time:
open /home/<user>/.docker/buildx/activity/.tmp-default...:
read-only file system

Your home filesystem is not actually read-only. The real problem is Docker's buildkit driver: it has leaked bind-mounts from a previous failed build and can no longer write its activity timestamp. Each retry accumulates more leaked mounts. You can confirm it with:

mount | grep -c '/var/lib/docker/tmp/buildkit-mount'

Anything past single digits is pathological. We've seen hosts sitting on hundreds after a few botched mass-scale topologies.
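
The same count can be taken from Python, for example in a monitoring script. The sketch below mirrors the grep -c command above by counting matching lines of mount output; the function names are hypothetical and this is not DECNET's own pre-flight code.

```python
import subprocess

MOUNT_MARKER = "/var/lib/docker/tmp/buildkit-mount"


def count_leaked_mounts(mount_output: str) -> int:
    """Count lines of `mount` output that reference a buildkit mount."""
    return sum(1 for line in mount_output.splitlines() if MOUNT_MARKER in line)


def current_leaked_mounts() -> int:
    """Run `mount` on this host and count the leaked buildkit mounts."""
    result = subprocess.run(["mount"], capture_output=True, text=True, check=True)
    return count_leaked_mounts(result.stdout)
```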

Fix — sandboxed home (path under /home/.../.docker, count == 0).

The most common cause on a systemd-managed install: the API unit has ProtectHome=read-only, so the docker CLI can't write ~/.docker/buildx/activity/. The error on stderr will name a path under /home/.... Fix it by redirecting Docker's config root to a path that's already in ReadWritePaths (i.e. the install dir):

# /etc/systemd/system/decnet-api.service
Environment=DOCKER_CONFIG={{ install_dir }}/.docker
Environment=BUILDX_CONFIG={{ install_dir }}/.docker/buildx

Then sudo systemctl daemon-reload && sudo systemctl restart decnet-api. The shipped deploy/decnet-api.service.j2 already includes these env vars — re-run decnet init to refresh the installed unit if your copy predates this fix.

Fix — leaked mounts present (count > 0).

prune -af && systemctl restart docker is not enough — leaked mounts often outlive the daemon because zombie buildkitd / containerd-shim processes still hold them. Full recipe:

sudo systemctl stop docker.socket docker.service
sudo pkill -9 -f buildkitd
sudo pkill -9 -f containerd-shim
for m in $(mount | awk '$3 ~ /buildkit-mount/ {print $3}'); do
  sudo umount -l "$m"
done
rm -rf ~/.docker/buildx/activity
sudo systemctl start docker
docker buildx use default   # the bundled builder still exists — just point at it

The umount -l step is the one most recipes online miss.

Fix — driver corruption (count == 0).

If mount | grep -c buildkit-mount already prints 0, the sandboxed-home case above does not apply, and you still hit the wedge, the buildx driver state itself is inconsistent. Rebuild it under a non-reserved name:

rm -rf ~/.docker/buildx/activity ~/.docker/buildx/instances/*
docker buildx create --name decnet-builder --use --bootstrap
docker buildx inspect

default is reserved by Docker for the bundled builder — buildx create --name default errors out. Pick any other name.

How DECNET handles it. The engine's _compose_with_retry:

  • Pre-flights leaked mounts before every build; if the count crosses 10, refuses to start and emits the leaked-mount recipe.
  • Catches the wedge signature mid-build (failed to update builder last activity time + read-only file system) and short-circuits the retry loop, branching the recipe on whether mounts are 0 or >0.
  • Preserves the original compose stderr in the error so you can see what actually broke alongside the recipe.

Unrelated read-only file system errors (e.g. a config file mount) are NOT classified as a wedge — both sentinel phrases must match.
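
The two-sentinel rule can be illustrated with a short sketch. It is not the actual _compose_with_retry implementation; the function names and recipe labels are hypothetical, and only the two sentinel phrases come from the behaviour described above.

```python
WEDGE_SENTINELS = (
    "failed to update builder last activity time",
    "read-only file system",
)


def is_buildx_wedge(stderr: str) -> bool:
    """A wedge requires both sentinel phrases, not just one of them."""
    return all(phrase in stderr for phrase in WEDGE_SENTINELS)


def pick_recipe(stderr: str, leaked_mounts: int):
    """Return a hypothetical recipe label, or None if this is not a wedge."""
    if not is_buildx_wedge(stderr):
        return None
    return "leaked-mounts" if leaked_mounts > 0 else "zero-mounts"
```

Requiring both phrases keeps an ordinary read-only config mount from triggering the buildx recovery advice.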


See also: Security-and-Stealth · Environment-Variables · Roadmap-and-Known-Debt