Remote Updates
DECNET ships a self-updater daemon that runs on every worker alongside the
decnet agent. It lets the master push a new working tree to the worker
(tarball over mTLS), install it, restart the agent, health-probe it, and
auto-rollback to the previous release if the new one is unhealthy —
all without scp, sshpass, or any human SSH session.
This page covers architecture, enrollment, the command surface, and the failure modes you'll actually hit in practice. If you just want to ship code, jump to Pushing an update.
Why a separate daemon
The naive design puts the agent in charge of its own updates. That immediately bricks itself the first time you push broken code — the daemon you'd use to roll the fix back is the daemon you just broke. The updater dodges that paradox by being a completely separate process with its own venv and its own mTLS identity. A normal update does not touch the updater, so the updater is always a known-good rescuer.
A second explicit endpoint, POST /update-self, handles updater
upgrades. It has no auto-rollback and you must opt in — the contract is
"you have chosen to push to the thing that rescues you; don't break it."
Architecture
┌────────── MASTER ──────────┐ ┌──────────── WORKER ────────────┐
│ decnet swarm update ... │ │ │
│ tars working tree │ │ decnet updater :8766 ◀──────┐ │
│ POST /update ──mTLS──▶──┼────────┼─▶ snapshots, installs, probes │ │
│ │ │ restarts agent via exec │ │
│ │ │ │ │
│ │ │ decnet agent :8765 ◀──────┘ │
│ │ │ (managed by updater) │
└────────────────────────────┘ └─────────────────────────────────┘
Two daemons on each worker, each with a distinct cert (both signed by the same DECNET CA that already backs SWARM Mode). Certificate CN distinguishes identities:
| Identity | CN example | Used for |
|---|---|---|
| Agent | worker-01 | /deploy, /teardown, /status, /health on port 8765 |
| Updater | updater@worker-01 | /update, /update-self, /rollback, /releases on port 8766 |
Install layout on the worker
The updater owns the release directory:
/opt/decnet/ (default; override with --install-dir)
current -> releases/active (atomic symlink; flip == promotion)
venv/ shared venv — agent + updater run from here
releases/
active/ source tree of the live release
prev/ the last good source snapshot
active.new/ staging (only exists mid-update)
updater/ updater's own tree + venv + releases
— NEVER touched by a normal /update
agent.pid PID of the agent process we spawned
agent.spawn.log stdout/stderr of the most recent spawn
.env.local per-host overrides (JWT secret, DB URL, …)
~/.decnet/
agent/ worker.key, worker.crt, ca.crt
updater/ updater.key, updater.crt, ca.crt (CN=updater@<host>)
The venv is shared across releases (not per-slot). An update swaps the
source-tree symlink; pip reinstalls the decnet package into the same
venv with --force-reinstall --no-deps, so the slow work is the fresh
tarball unpack, not a full dep rebuild. On the very first update into a
brand-new venv the full dep tree is installed once — subsequent updates
are near-no-op if dependencies haven't changed.
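The bootstrap-vs-fast-path decision can be sketched as follows (the function and its paths are illustrative, not the actual decnet internals):

```python
import os


def pip_install_args(venv_dir: str, slot_dir: str) -> list[str]:
    """Build the pip invocation for installing a release slot into the shared venv.

    First run (no venv yet): full dependency resolution.
    Every later run: --no-deps, so only the decnet package itself is replaced.
    """
    pip = os.path.join(venv_dir, "bin", "pip")
    args = [pip, "install", "--force-reinstall"]
    if os.path.isdir(venv_dir):
        args.append("--no-deps")  # deps already present from the bootstrap
    args.append(slot_dir)
    return args
```

This is why the steady-state cost of an update is dominated by the tarball unpack: pip only replaces one package, never the dependency tree.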
The updater loads .env.local from its working directory, so the worker
can carry a persistent per-host .env.local (JWT secret, DB URL, log
paths) without editing site-packages. The updater spawns the agent with
cwd=/opt/decnet/ so the agent picks up the same file.
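A hypothetical .env.local, for illustration only (the keys below are examples, not a canonical list; use whatever names your agent actually reads):

```ini
# /opt/decnet/.env.local — per-host overrides, survives every update
DECNET_JWT_SECRET=change-me
DECNET_DB_URL=sqlite:////opt/decnet/state.db
DECNET_LOG_PATH=/var/log/decnet/agent.log
```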
Enrollment
Enrolling a host for remote updates is a single extra flag on the
existing decnet swarm enroll:
decnet swarm enroll \
--host 192.168.1.23 --address 192.168.1.23 \
--sans 192.168.1.23 \
--updater \
--out-dir ./enroll-bundle
The controller now issues two certs signed by the same CA:
- ./enroll-bundle/{ca.crt, worker.crt, worker.key} — goes to ~/.decnet/agent/ on the worker (same as before).
- ./enroll-bundle-updater/{ca.crt, updater.crt, updater.key} — goes to ~/.decnet/updater/ on the worker.
Ship both directories to the worker once (this is the last scp you'll do for this host), then on the worker:
sudo install -d -m 0700 ~/.decnet/agent ~/.decnet/updater
# ...scp the two bundles into place...
sudo decnet agent --daemon --agent-dir ~/.decnet/agent
sudo decnet updater --daemon --updater-dir ~/.decnet/updater \
--install-dir /opt/decnet
From this point on the master can push code without touching SSH.
Without --updater
If you forgot --updater at enrollment time, decommission and re-enroll
the host — that's the cleanest path. The alternative is running the
enrollment endpoint manually with issue_updater_bundle=true for an
already-enrolled host; this is currently a v2 concern.
Pushing an update
From the master (your dev box), make your changes, commit if you want (the tarball is the working tree, staged + unstaged + untracked), then:
# Push to one worker
decnet swarm update --host worker-01
# Push to every non-decommissioned worker
decnet swarm update --all
# Also ship the updater itself (explicit; no auto-rollback)
decnet swarm update --all --include-self
# Inspect what would ship — no network
decnet swarm update --all --dry-run
Output is a table per host:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ host ┃ address ┃ agent ┃ self ┃ detail ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ worker-01 │ 192.168.1.23 │ updated │ — │ 4b7e1a9... │
│ worker-02 │ 192.168.1.24 │ rolled- │ — │ probe failed;... │
│ │ │ back │ │ │
└────────────┴──────────────┴─────────┴──────┴────────────────────┘
- updated (green) — new release live, agent answered /health.
- rolled-back (yellow, exit 1) — new release failed its post-deploy probe; the updater already swapped the symlink back and restarted the agent against the previous release. The worker is functional; the attempted update is in releases/prev/ on the worker for forensics.
- error (red, exit 1) — transport or install failure before the rotation even happened; no state change on the worker.
What the updater actually does
For each /update request:
- Extract the tarball into /opt/decnet/releases/active.new/. Paths with .. or a leading / are rejected.
- Install: pip install --force-reinstall [--no-deps] <active.new> into the shared /opt/decnet/venv/. The first time this runs (venv doesn't exist yet) the updater bootstraps it with the full dep tree; every subsequent update uses --no-deps so only the decnet package is replaced. Non-zero exit → abort, return 500 with pip stderr, no rotation.
- Rotate: prev/ (if present) is removed, active/ → prev/, active.new/ → active/. The current symlink is flipped atomically via rename(2).
- Restart agent: SIGTERM the PID in agent.pid, wait up to 10 s, then SIGKILL if still alive. If agent.pid is missing (agent was started manually, not spawned by the updater), the updater scans /proc for any decnet agent process and SIGTERMs those instead — so restart is reliable regardless of how the agent was originally launched. Spawn a new agent through the shared venv's decnet entry point with cwd=/opt/decnet/.
- Probe: GET https://127.0.0.1:8765/health over mTLS up to 10 times with 1 s backoff.
- On probe success → return 200 with the new release manifest.
- On probe failure → swap active/ ↔ prev/ back, restart the agent again, re-probe, return 409 with both probe transcripts and rolled_back: true. The master CLI translates that to a yellow "rolled-back" row and exit code 1.
All of this runs inside the single POST handler. The update is atomic from the outside: either the worker is on the new release and the agent is healthy, or it's on the previous release (possibly because the new one failed) with a healthy agent.
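The promotion step hinges on rename(2) (os.replace in Python) being atomic. A minimal sketch of the rotate-and-flip, with simplified paths and none of the pip or probe machinery:

```python
import os
import shutil


def promote(base: str) -> None:
    """Rotate release slots and flip the `current` symlink atomically.

    Layout under `base`: releases/{active,prev,active.new} and a `current`
    symlink, mirroring the install layout described above. Error handling
    is elided; this is a sketch, not the real updater code.
    """
    releases = os.path.join(base, "releases")
    active = os.path.join(releases, "active")
    prev = os.path.join(releases, "prev")
    staged = os.path.join(releases, "active.new")

    if os.path.isdir(prev):
        shutil.rmtree(prev)          # drop the oldest snapshot
    if os.path.isdir(active):
        os.rename(active, prev)      # last good release becomes prev/
    os.rename(staged, active)        # staged tree becomes active/

    # Flip the symlink atomically: create the new link under a temp name,
    # then rename(2) it over `current` in one step. Readers see either the
    # old target or the new one, never a missing link.
    tmp_link = os.path.join(base, "current.tmp")
    os.symlink(active, tmp_link)
    os.replace(tmp_link, os.path.join(base, "current"))
```

The temp-link-then-replace dance is what makes the promotion safe against a crash mid-flip: a plain unlink-then-symlink would leave a window with no current at all.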
--include-self
Same protocol but targets /opt/decnet/updater/, which has its own
release slots and its own venv (/opt/decnet/updater/venv/). The
updater never touches this directory during a normal /update, which
is the whole point: a broken /update can't brick the thing that rolls
it back.
Mechanics:
- The master POSTs the tarball to POST /update-self with confirm_self=true.
- The updater extracts into /opt/decnet/updater/releases/active.new/ and runs pip install --force-reinstall <slot> against /opt/decnet/updater/venv/. The first self-update bootstraps this venv with the full dep tree (typer, fastapi, uvicorn, …) before installing decnet.
- On success, it rotates the updater's own active/prev slots, then os.execv's into the newly installed binary with a cleanly reconstructed argv. The argv is not sys.argv[1:] — inside the running process sys.argv is the uvicorn subprocess invocation (--ssl-keyfile …), which the decnet updater CLI does not understand. Instead the updater rebuilds decnet updater --host … --port … --updater-dir … --install-dir … --agent-dir … from env vars that decnet.updater.server.run stashes at startup (DECNET_UPDATER_HOST, DECNET_UPDATER_PORT, DECNET_UPDATER_BUNDLE_DIR, DECNET_UPDATER_INSTALL_DIR, DECNET_UPDATER_AGENT_DIR).
- The TCP connection drops mid-response. That is normal: the master waits up to 30 s for the updater's /health to come back with the new SHA and treats that as success. No auto-rollback — if the new updater can't import, the old one is gone and you'll need SSH to recover.
Ordering: agent first, updater second. A broken agent push should not lock you into shipping the updater through that broken agent's host.
Use sparingly. A bad self-update is the one case you will need
scp for — the wiki's promise of "no more scp" has one asterisk on it,
and this is it.
Manual rollback
If you want to roll back without pushing new code (you notice a regression after the probe already passed):
# No CLI yet for this in v1; hit the endpoint directly.
curl --cert ~/.decnet/ca/master/worker.crt \
--key ~/.decnet/ca/master/worker.key \
--cacert ~/.decnet/ca/master/ca.crt \
-X POST https://<worker-ip>:8766/rollback
Returns 404 if there's no prev/ slot (which is only the case on the
worker's very first release — a fresh install has nothing to roll back
to). A CLI wrapper (decnet swarm rollback) is planned for v2.
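Under the hood the rollback is the promotion in reverse: swap active/ and prev/ and re-point the symlink. A simplified sketch (the real handler also restarts and re-probes the agent):

```python
import os


def rollback(base: str) -> bool:
    """Swap active/ and prev/ and re-flip `current`.

    Returns False when there is no prev/ slot (a fresh install has
    nothing to roll back to, hence the endpoint's 404).
    """
    releases = os.path.join(base, "releases")
    active = os.path.join(releases, "active")
    prev = os.path.join(releases, "prev")
    if not os.path.isdir(prev):
        return False

    # Three renames swap the two directories without ever losing either.
    tmp = os.path.join(releases, "swap.tmp")
    os.rename(active, tmp)
    os.rename(prev, active)
    os.rename(tmp, prev)

    # Re-flip the symlink atomically, same dance as a normal promotion.
    tmp_link = os.path.join(base, "current.tmp")
    os.symlink(active, tmp_link)
    os.replace(tmp_link, os.path.join(base, "current"))
    return True
```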
Symptom table
| Symptom | Likely cause | Fix |
|---|---|---|
| curl: (35) error:...peer certificate required on 8766 | Updater is up but mTLS rejected the client cert. Using the wrong bundle. | Use the ~/.decnet/ca/master/ bundle, not the worker bundle. |
| swarm update hangs for >2 min on one host | pip install is slow on a very underpowered worker. | Bump _TIMEOUT_UPDATE in decnet/swarm/updater_client.py (temporary) or enroll more resources. |
| All hosts return error: ConnectTimeout | Updater isn't running on any worker. | On each worker: sudo decnet updater --daemon --updater-dir .... |
| rolled-back on every push | The agent is now importing something the worker doesn't have. Probe hits /health and gets 500. | Read the detail field — it contains the agent's traceback. Fix and push again. |
| rolled-back only on workers with a different OS | The Compose/Buildx/Python version on that worker differs from the master's. | See SWARM Mode prerequisites. The updater does not install OS packages. |
| After --include-self, /health on 8766 never returns | The new updater failed at import time. execv succeeded but Python died. | SSH in; look at the updater's journalctl / stderr; revert ~/.decnet/updater/ to the previous tree manually. |
| After --include-self, updater logs No such option: --ssl-keyfile and dies | Pre-fix bug: updater re-exec'd with sys.argv[1:] (uvicorn's argv) instead of the CLI argv. | Fixed in commit 40d3e86. Make sure the updater is running code that reconstructs argv from env — if not, SSH in and run sudo decnet updater --host 0.0.0.0 --port 8766 --updater-dir ... --install-dir ... manually. |
| After --include-self, updater dies with ModuleNotFoundError: No module named 'typer' | Pre-fix bug: a freshly bootstrapped updater venv installed decnet with --no-deps. | Fixed in commit 40d3e86. On an already-broken host: sudo rm -rf /opt/decnet/updater/{venv,releases,current} and restart the updater from /opt/decnet/venv/ — the next --include-self bootstraps a complete venv. |
| Agent keeps serving old code after /update returns 200 | Agent was started by hand (no agent.pid); pre-fix _stop_agent had nothing to kill. | Fixed in commit ebeaf08: _stop_agent now falls back to a /proc scan for any decnet agent process. On stale hosts, restart the agent once. |
| Master: FileNotFoundError: .../master/worker.key | Master identity was never materialized (no swarm enroll has run yet on this install). | Run decnet swarm list once — it materializes the master identity as a side effect of ensure_master_identity(). |
Out of scope (v1)
- Dependency changes. The updater pip installs the new tree, so adding a dep usually just works. But if the new version of a dep fails to resolve against the worker's Python / index / lockfile, the rollback path catches it. It is not the updater's job to fix package-layer drift — use scp and deploy manually once, then carry on.
- OS package installs. Upgrades to Docker / Compose / buildx are still manual on the worker. See SWARM Mode prerequisites.
- Systemd unit changes. Shipping a new unit file requires the old one to already work; you won't get rescue for that kind of change via the updater.
- Schema migrations on the worker's tiny SQLite (if any) — manual.
- Canary / A/B rollouts. --all is all-at-once. If you want a staged rollout, push to one host, observe, then the rest.
- Signed release artifacts. mTLS already authenticates the master; a detached signature is a v2 concern.