Agents run deckies locally and need to inspect their own state. Removed
`status` from MASTER_ONLY_COMMANDS so it survives the agent-mode gate.
Useful for validating remote updater pushes from the master.
_DB_RESET_TABLES was missing the swarm tables, so drop-tables mode left
them intact. create_all doesn't alter columns on existing tables, so any
schema change to SwarmHost (like use_ipvlan) never took effect after a
reset. Child FK first (decky_shards -> swarm_hosts).
Mirrors the agent→forwarder pattern: `decnet swarmctl` now fires the
syslog-TLS listener as a detached Popen sibling so a single master
invocation brings the full receive pipeline online. --no-listener opts
out for operators who want to run the listener on a different host (or
under their own systemd unit).
Listener bind host / port come from DECNET_LISTENER_HOST and
DECNET_SWARM_SYSLOG_PORT — both seedable from /etc/decnet/decnet.ini.
PID at $(pid_dir)/listener.pid so operators can kill/restart manually.
decnet.ini.example ships alongside env.config.example as the
documented surface for the new role-scoped config. Mode, forwarder
targets, listener bind, and master ports all live there — no more
memorizing flag trees.
Extends tests/test_auto_spawn.py with two swarmctl cases: listener is
spawned with the expected argv + PID file, and --no-listener suppresses.
New _spawn_detached(argv, pid_file) helper uses Popen with
start_new_session=True + close_fds=True + DEVNULL stdio to launch a
DECNET subcommand as a fully independent process. The parent does NOT
wait(); if it dies the child survives under init. This is deliberately
not a supervisor — if the child dies the operator restarts it manually.
_pid_dir() picks /opt/decnet when writable else ~/.decnet, so both
root-run production and non-root dev work without ceremony.
`decnet agent` now auto-spawns `decnet forwarder --daemon ...` as
that detached sibling, pulling master host + syslog port from
DECNET_SWARM_MASTER_HOST / DECNET_SWARM_SYSLOG_PORT. --no-forwarder
opts out. If DECNET_SWARM_MASTER_HOST is unset the auto-spawn is
silently skipped (single-host dev or operator wants to start the
forwarder separately).
tests/test_auto_spawn.py monkeypatches subprocess.Popen and verifies:
the detach kwargs are passed, the PID file exists and contains a
valid positive integer (PID-file corruption is a real operational
headache — catching bad writes at the test level is free), the
--no-forwarder flag suppresses the spawn, and the unset-master-host
path silently skips.
- MASTER_ONLY_COMMANDS / MASTER_ONLY_GROUPS frozensets enumerate every
command a worker host must not see. Comment block at the declaration
puts the maintenance obligation in front of anyone touching command
registration.
- _gate_commands_by_mode() filters both app.registered_commands (for
@app.command() registrations) and app.registered_groups (for
add_typer sub-apps) so the 'swarm' group disappears along with
'api', 'swarmctl', 'deploy', etc. on agent hosts.
- _require_master_mode() is the belt-and-braces in-function guard,
added to the four highest-risk commands (api, swarmctl, deploy,
teardown). Protects against direct function imports that would
bypass Typer.
- DECNET_DISALLOW_MASTER=false is the escape hatch for hybrid dev
hosts that legitimately play both roles.
tests/test_mode_gating.py exercises help-text listings via subprocess
and the defence-in-depth guard via direct import.
Adds a separate `decnet updater` daemon on each worker that owns the
agent's release directory and installs tarball pushes from the master
over mTLS. A normal `/update` never touches the updater itself, so the
updater is always a known-good rescuer if a bad agent push breaks
/health — the rotation is reversed and the agent restarted against the
previous release. `POST /update-self` handles updater upgrades
explicitly (no auto-rollback).
- decnet/updater/: executor, FastAPI app, uvicorn launcher
- decnet/swarm/updater_client.py, tar_tree.py: master-side push
- cli: `decnet updater`, `decnet swarm update [--host|--all]
[--include-self] [--dry-run]`, `--updater` on `swarm enroll`
- enrollment API issues a second cert (CN=updater@<host>) signed by the
same CA; SwarmHost records updater_cert_fingerprint
- tests: executor, app, CLI, tar tree, enroll-with-updater (37 new)
- wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link
`swarm list` only shows enrolled workers — there was no way to see which
deckies are running and where. Adds GET /swarm/deckies on the controller
(joins DeckyShard with SwarmHost for name/address/status) plus the CLI
wrapper with --host / --state filters and --json.
The swarmctl API already exposes POST /swarm/check — an active mTLS
probe that refreshes SwarmHost.status + last_heartbeat for every
enrolled worker. The CLI was missing a wrapper, so operators had to
curl the endpoint directly (which is how the VM validation run did it,
and how the wiki Deployment-Modes / SWARM-Mode pages ended up doc'ing
a command that didn't exist yet).
Matches the existing list/enroll/decommission pattern: typer subcommand
under swarm_app, --url override, Rich table output plus --json for
scripting. Three tests: populated table, empty-swarm path, and --json
emission.
New `decnet listener` command runs the master-side RFC 5425 syslog-TLS
receiver as a standalone process (mirrors `decnet api` / `decnet swarmctl`
pattern, SIGTERM/SIGINT handlers, --daemon support).
`decnet agent` now accepts --agent-dir so operators running the worker
agent under sudo/root can point at a bundle outside /root/.decnet/agent
(the HOME under sudo propagation).
Both flags were needed to stand up the full SWARM pipeline end-to-end on
a throwaway VM: mTLS control plane reachable, syslog-over-TLS wire
confirmed via tcpdump, master-crash/resume proved with zero loss and
zero duplication across 10 forwarded lines.
pyproject: bump asyncmy floor to 0.2.11 (resolver already pulled this in).
New sub-app talks HTTP to the local swarm controller (127.0.0.1:8770 by
default; override with --url or $DECNET_SWARMCTL_URL).
- enroll: POSTs /swarm/enroll, prints fingerprint, optionally writes
ca.crt/worker.crt/worker.key to --out-dir for scp to the worker.
- list: renders enrolled workers as a rich table (with --status filter).
- decommission: looks up uuid by --name, confirms, DELETEs.
deploy --mode swarm now:
1. fetches enrolled+active workers from the controller,
2. round-robin-assigns host_uuid to each decky,
3. POSTs the sharded DecnetConfig to /swarm/deploy,
4. renders per-worker pass/fail in a results table.
Exits non-zero if no workers exist or any worker's dispatch failed.
The forwarder module existed but had no runner — closes that gap so the
worker-side process can actually be launched and runs isolated from the
agent (asyncio.run + SIGTERM/SIGINT → stop_event).
Guards: refuses to start without a worker cert bundle or a resolvable
master host ($DECNET_SWARM_MASTER_HOST or --master-host).
Adds decnet/web/swarm_api.py as an independent FastAPI app with routers
for host enrollment, deployment dispatch (sharding DecnetConfig across
enrolled workers via AgentClient), and active health probing. Runs as
its own uvicorn subprocess via 'decnet swarmctl', mirroring the isolation
pattern used by 'decnet api'. Also wires up 'decnet agent' CLI entry for
the worker side.
29 tests added under tests/swarm/test_swarm_api.py cover enrollment
(including bundle generation + duplicate rejection), host CRUD, sharding
correctness, non-swarm-mode rejection, teardown, and health probes with
a stubbed AgentClient.
Popen moved inside the try so a missing uvicorn falls through to the
existing error message instead of crashing the CLI. test_cli was still
patching the old subprocess.run entrypoint; switched both api command
tests to patch subprocess.Popen / os.killpg to match the current path.
With --workers > 1, SIGINT from the terminal raced uvicorn's supervisor:
some workers got signaled directly, the supervisor respawned them, and
the result behaved like a forkbomb. Start uvicorn in its own session and
signal the whole process group (SIGTERM → 10s grace → SIGKILL) when we
catch KeyboardInterrupt.
Forwards straight to uvicorn's --workers. Default stays at 1 so the
single-worker efficiency direction is preserved; raising it is available
for threat-actor load scenarios where the honeypot needs to soak real
attack traffic without queueing on one event loop.
resp.read(4096) blocks until 4096 bytes accumulate, which stalls SSE
events (~100-500 bytes each) in the proxy buffer indefinitely. Switch
to read1() which returns bytes immediately available without waiting
for more. Also disable the 120s socket timeout for SSE connections.
- Refactor deploy command to support service randomization and selective service deployment
- Add --services flag to filter deployed services by name
- Improve status and teardown command output formatting
- Update help text for clarity
Extends the prober with two new active probe types alongside JARM:
- HASSHServer: SSH server fingerprinting via KEX_INIT algorithm ordering
(MD5 hash of kex;enc_s2c;mac_s2c;comp_s2c, pure stdlib)
- TCP/IP stack: OS/tool fingerprinting via SYN-ACK analysis using scapy
(TTL, window size, DF bit, MSS, TCP options ordering, SHA256 hash)
Worker probe cycle now runs three phases per IP with independent
per-type port tracking. Ingester extracts bounties for all three
fingerprint types.
The collector kept streaming stale container IDs after a redeploy,
causing new service logs to never reach decnet.log. Now _kill_api()
also matches and SIGTERMs any running decnet.cli collect process.
- Modify Rfc5424Formatter to read decnet_component from LogRecord
and use it as RFC 5424 APP-NAME field (falls back to 'decnet')
- Add get_logger(component) factory in decnet/logging/__init__.py
with _ComponentFilter that injects decnet_component on each record
- Wire all five layers to their component tag:
cli -> 'cli', engine -> 'engine', api -> 'api' (api.py, ingester,
routers), mutator -> 'mutator', collector -> 'collector'
- Add structured INFO/DEBUG/WARNING/ERROR log calls throughout each
layer per the defined vocabulary; DEBUG calls are suppressed unless
DECNET_DEVELOPER=true
- Add tests/test_logging.py covering factory, filter, formatter
component-awareness, fallback behaviour, and level gating
The collector subprocess was spawned via 'python3 -m decnet.cli collect'
but cli.py had no 'if __name__ == __main__: app()' guard. Python executed
the module, defined all functions, then exited cleanly with code 0 without
ever calling the collect command. No output, no log file, exit 0 — silent
non-start every time.
Also route collector stderr to <log_file>.collector.log so future crashes
are visible instead of disappearing into DEVNULL.
Collector and mutator watcher subprocesses were spawned without
start_new_session=True, leaving them in the parent's process group.
SIGHUP (sent when the controlling terminal closes) killed both
processes silently — stdout/stderr were DEVNULL so the crash was
invisible.
Also update test_services and test_composer to reflect the ssh plugin
no longer using Cowrie env vars (replaced with SSH_ROOT_PASSWORD /
SSH_HOSTNAME matching the real_ssh plugin).
Two bugs caused the log file to never be written:
1. is_service_container() used regex '^decky-\d+-\w' which only matched
the old decky-01-smtp naming style. Actual containers are named
omega-decky-smtp, relay-decky-smtp, etc. Fixed by using Docker Compose
labels instead: com.docker.compose.project=decnet + non-empty
depends_on discriminates service containers from base (sleep infinity)
containers reliably regardless of decky naming convention.
Added is_service_event() for the Docker events path.
2. The collector was only started when --api was used. Added a 'collect'
CLI subcommand (decnet collect --log-file <path>) and wired it into
deploy as an auto-started background process when --api is not in use.
Default log path: /var/log/decnet/decnet.log
When --parallel is set:
- DOCKER_BUILDKIT=1 is injected into the subprocess environment to
ensure BuildKit is active regardless of host daemon config
- docker compose build runs first (all images built concurrently)
- docker compose up -d follows without --build (no redundant checks)
Without --parallel the original up --build path is preserved.
--parallel and --no-cache compose correctly (build --no-cache).
Services now print RFC 5424 to stdout; Docker captures via json-file driver.
A new host-side collector (decnet.web.collector) streams docker logs from all
running decky service containers and writes RFC 5424 + parsed JSON to the host
log file. The existing ingester continues to tail the .json file unchanged.
rsyslog can consume the .log file independently — no DECNET involvement needed.
Removes: bind-mount volume injection, _LOG_NETWORK bridge, log_target config
field and --log-target CLI flag, TCP syslog forwarding from service templates.
- ini_loader.py: DeckySpec gains nmap_os field; load_ini parses nmap_os=
(also accepts nmap-os= hyphen alias) and propagates it to amount-expanded deckies
- cli.py: _build_deckies_from_ini resolves nmap_os with priority:
explicit INI key > archetype default > "linux"
- test-full.ini: every decky now carries nmap_os=; [windows-workstation]
gains archetype= so its OS family is set correctly; decky-winbox/fileserv/
ldapdc → windows, decky-iot → embedded, decky-legacy → bsd, rest → linux
- tests/test_ini_loader.py: 7 new tests covering nmap_os parsing, defaults,
hyphen alias, and amount= expansion propagation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each decky base container now receives a set of Linux kernel sysctls
(net.ipv4.ip_default_ttl, net.ipv4.tcp_syn_retries, etc.) tuned to
match the claimed OS family, making nmap OS detection return the
expected OS rather than the Linux host.
- decnet/os_fingerprint.py: OS profile table (linux/windows/bsd/embedded/cisco)
keyed by TTL and TCP tuning knobs
- decnet/archetypes.py: Archetype gains nmap_os field; windows-* → "windows",
printer/iot/industrial → "embedded", rest → "linux"
- decnet/config.py: DeckyConfig gains nmap_os field (default "linux")
- decnet/cli.py: nmap_os resolved from archetype → DeckyConfig in both CLI
and INI build paths
- decnet/composer.py: base container gets sysctls + cap_add: [NET_ADMIN];
service containers inherit via shared network namespace
- tests/test_os_fingerprint.py: 48 new tests covering profiles, compose
injection, archetype coverage, and CLI propagation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces archetype profiles (windows-workstation, linux-server,
domain-controller, printer, iot-device, etc.) so users get a realistic
service+distro combination without knowing which services to pick.
Adds amount= to INI config (and CLI --archetype) so a single section
can spawn N identical deckies without copy-paste. Per-service subsections
(e.g. [group.ssh]) propagate to all expanded instances automatically.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MACVLAN assigns unique MACs per container; WiFi APs typically reject
frames from unregistered MACs, making deckies unreachable from other
LAN devices. IPvlan L2 shares the host's MAC, so all traffic passes
through the AP normally.
- network.py: add create_ipvlan_network, setup_host_ipvlan,
teardown_host_ipvlan, HOST_IPVLAN_IFACE
- config.py: add ipvlan: bool = False to DecnetConfig (persisted to
state so teardown uses the right driver)
- deployer.py: branch on config.ipvlan for create/setup/teardown
- cli.py: add --ipvlan flag, wire into DecnetConfig
- tests/test_network.py: new test module covering ips_to_range,
create_macvlan/ipvlan, setup/teardown for both drivers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>