Agents already exposed POST /teardown; the master was missing the plumbing
to reach it. Add:
- POST /api/v1/swarm/hosts/{uuid}/teardown — admin-gated. Body
{decky_id: str|null}: null tears the whole host, a value tears one decky.
On worker failure the master returns 502 and leaves DB shards intact so
master and agent stay aligned.
- BaseRepository.delete_decky_shard(name) + sqlmodel impl for per-decky
cleanup after a single-decky teardown.
- SwarmHosts page: "Teardown all" button (keeps host enrolled).
- SwarmDeckies page: per-row "Teardown" button.
Also exclude setuptools' build/ staging dir from the enrollment tarball —
`pip install -e` on the master generates build/lib/decnet_web/node_modules
and the bundle walker was leaking it to agents. Align pyproject's bandit
exclude with the git-hook invocation so both skip decnet/templates/.
The docker build contexts and syslog_bridge.py lived at repo root, which
meant setuptools (include = ["decnet*"]) never shipped them. Agents
installed via `pip install $RELEASE_DIR` got site-packages/decnet/** but no
templates/, so every deploy blew up in deployer._sync_logging_helper with
FileNotFoundError on templates/syslog_bridge.py.
Move templates/ -> decnet/templates/ and declare it as setuptools
package-data. Path resolutions in services/*.py and engine/deployer.py drop
one .parent since templates now lives beside the code. Test fixtures,
bandit exclude path, and coverage omit glob updated to match.
If the agent was started outside the updater (manually, during dev,
or from a prior systemd unit), there is no agent.pid for _stop_agent
to target, so a successful code install leaves the old in-memory
agent process still serving requests. Scan /proc for any decnet agent
command and SIGTERM all matches so restart is reliable regardless of
how the agent was originally launched.
Adds a separate `decnet updater` daemon on each worker that owns the
agent's release directory and installs tarball pushes from the master
over mTLS. A normal `/update` never touches the updater itself, so the
updater is always a known-good rescuer if a bad agent push breaks
/health — the rotation is reversed and the agent restarted against the
previous release. `POST /update-self` handles updater upgrades
explicitly (no auto-rollback).
- decnet/updater/: executor, FastAPI app, uvicorn launcher
- decnet/swarm/updater_client.py, tar_tree.py: master-side push
- cli: `decnet updater`, `decnet swarm update [--host|--all]
[--include-self] [--dry-run]`, `--updater` on `swarm enroll`
- enrollment API issues a second cert (CN=updater@<host>) signed by the
same CA; SwarmHost records updater_cert_fingerprint
- tests: executor, app, CLI, tar tree, enroll-with-updater (37 new)
- wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link
New `decnet listener` command runs the master-side RFC 5425 syslog-TLS
receiver as a standalone process (mirrors `decnet api` / `decnet swarmctl`
pattern, SIGTERM/SIGINT handlers, --daemon support).
`decnet agent` now accepts --agent-dir so operators running the worker
agent under sudo/root can point at a bundle outside /root/.decnet/agent
(the HOME under sudo propagation).
Both flags were needed to stand up the full SWARM pipeline end-to-end on
a throwaway VM: mTLS control plane reachable, syslog-over-TLS wire
confirmed via tcpdump, master-crash/resume proved with zero loss and
zero duplication across 10 forwarded lines.
pyproject: bump asyncmy floor to 0.2.11 (resolver already pulled this in).
The /opt/emit_capture.py, /opt/syslog_bridge.py, and
/usr/libexec/udev/journal-relay files were plaintext and world-readable
to any attacker root-shelled into the SSH honeypot — revealing the full
capture logic on a single cat.
Pack all three into /entrypoint.sh as XOR+gzip+base64 blobs at build
time (_build_stealth.py), then decode in-memory at container start and
exec the capture loop from a bash -c string. No .py files under /opt,
no journal-relay file under /usr/libexec/udev, no argv_zap name
anywhere. The LD_PRELOAD shim is installed as
/usr/lib/x86_64-linux-gnu/libudev-shared.so.1 — sits next to the real
libudev.so.1 and blends into the multiarch layout.
A 1-byte random XOR key is chosen at image build so a bare
'base64 -d | gunzip' probe on the visible entrypoint returns binary
noise instead of readable Python.
Docker-dependent tests live under tests/docker/ behind a new 'docker'
pytest marker (excluded from the default run, same pattern as fuzz /
live / bench).
Rename the container-side logging module decnet_logging → syslog_bridge
(canonical at templates/syslog_bridge.py, synced into each template by
the deployer). Drop the stale per-template copies; setuptools find was
picking them up anyway. Swap useradd/USER/chown "decnet" for "logrelay"
so no obvious token appears in the rendered container image.
Apply the same cloaking pattern to the telnet template that SSH got:
syslog pipe moves to /run/systemd/journal/syslog-relay and the relay
is cat'd via exec -a "systemd-journal-fwd". rsyslog.d conf rename
99-decnet.conf → 50-journal-forward.conf. SSH capture script:
/var/decnet/captured → /var/lib/systemd/coredump (real systemd path),
logger tag decnet-capture → systemd-journal. Compose volume updated
to match the new in-container quarantine path.
SD element ID shifts decnet@55555 → relay@55555; synced across
collector, parser, sniffer, prober, formatter, tests, and docs so the
host-side pipeline still matches what containers emit.
asyncmy needs cryptography for caching_sha2_password (the MySQL 8
default auth plugin). Without it, connection handshake fails the
moment the server negotiates the modern plugin.
stdlib json was FastAPI's default. Every response body, every SSE frame,
and every add_log/state/payload write paid the stdlib encode cost.
- pyproject.toml: add orjson>=3.10 as a core dep.
- decnet/web/api.py: default_response_class=ORJSONResponse on the
FastAPI app, so every endpoint return goes through orjson without
touching call sites. Explicit JSONResponse sites in the validation
exception handlers migrated to ORJSONResponse for consistency.
- health endpoint's explicit JSONResponse → ORJSONResponse.
- SSE stream (api_stream_events.py): 6 json.dumps call sites →
orjson.dumps(...).decode() — the per-event frames that fire on every
sse tick.
- sqlmodel_repo.py: encode sites on the log-insert path switched to
orjson (fields, payload, state value). Parser sites (json.loads)
left as-is for now — not on the measured hot path.
New `profile` optional-deps group, opt-in Pyinstrument ASGI middleware
gated by DECNET_PROFILE_REQUESTS, bench marker + tests/perf/ micro-benchmarks
for repository hot paths, and scripts/profile/ helpers for py-spy/cProfile/memray.
Root cause: test_schemathesis.py mutates decnet.web.auth.SECRET_KEY at
module-level import time, poisoning JWT verification for all other tests
in the same process — even when fuzz tests are deselected.
- Add pytest_ignore_collect hook in tests/api/conftest.py to skip
collecting test_schemathesis.py unless -m fuzz is selected
- Add --dist loadscope to addopts so xdist groups by module (protects
module-scoped fixtures in live tests)
- Remove now-unnecessary xdist_group markers from live test classes
Gated by DECNET_DEVELOPER_TRACING env var (default off, zero overhead).
When enabled, traces flow through FastAPI routes, background workers
(collector, ingester, profiler, sniffer, prober), engine/mutator
operations, and all DB calls via TracedRepository proxy.
Includes Jaeger docker-compose for local dev and 18 unit tests.
- templates/sniffer/decnet_logging.py: add logging configuration for sniffer integration
- templates/ssh/decnet_logging.py: add SSH service logging template
- development/DEVELOPMENT.md: document new MySQL backend, p0f, profiler, config API features
- pyproject.toml: update dependencies for MySQL, p0f, profiler functionality
Remove unused imports (ruff F401), suppress B324 false positives on
spec-mandated MD5 in HASSH/JA3/JA3S fingerprinting, drop unused
record_version assignment in JARM parser, and pin pip>=26.0 in dev
deps to address CVE-2025-8869 and CVE-2026-1703.
Spins up each service's server.py in a real subprocess via a free ephemeral
port (PORT env var), connects with real protocol clients, and asserts both
correct protocol behavior and RFC 5424 log output.
- 44 live tests across 10 services: http, ftp, smtp, redis, mqtt,
mysql, postgres, mongodb, pop3, imap
- Shared conftest.py: _ServiceProcess (bg reader thread + queue),
free_port, live_service fixture, assert_rfc5424 helper
- PORT env var added to all 10 targeted server.py templates
- New pytest marker `live`; excluded from default addopts run
- requirements-live-tests.txt: flask, twisted + protocol clients
- Add # nosec B104 to all intentional 0.0.0.0 binds in honeypot servers
(hardcoded_bind_all_interfaces is by design — deckies must accept attacker connections)
- Add # nosec B101 to assert statements used for protocol validation in ldap/snmp
- Add # nosec B105 to fake SASL placeholder in ldap
- Add # nosec B108 to /tmp usage in smb template
- Exclude root-owned auto-generated decnet_logging.py copies from bandit scan
via pyproject.toml [tool.bandit] config (synced by _sync_logging_helper at deploy)
- Rebuild repo.engine and repo.session_factory per-test using unique
in-memory SQLite URIs — fixes KeyError: 'access_token' caused by
stale session_factory pointing at production DB
- Add @pytest.mark.fuzz to all Hypothesis and Schemathesis tests;
default run excludes them (addopts = -m 'not fuzz')
- Add missing fuzz tests to bounty, fleet, histogram, and repository
- Use tmp_path for state file in patch_state_file/mock_state_file to
eliminate file-path race conditions under xdist parallelism
- Set default addopts: -v -q -x -n logical (26 tests in ~7s)