407 Commits

Author SHA1 Message Date
201d246c07 fix(ci): fix indentation on ci.yaml
Some checks failed
CI / Lint (ruff) (push) Successful in 14s
CI / SAST (bandit) (push) Successful in 15s
CI / Test (Standard) (3.11) (push) Has been cancelled
CI / Test (Live) (3.11) (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Dependency audit (pip-audit) (push) Has been cancelled
2026-04-20 16:46:30 -04:00
47cd200e1d feat(mazenet): repo methods for topology/LAN/decky/edge/status events
Adds topology CRUD to BaseRepository (NotImplementedError defaults) and
implements them in SQLModelRepository: create/get/list/delete topologies,
add/update/list LANs and TopologyDeckies, add/list edges, plus an atomic
update_topology_status that appends a TopologyStatusEvent in the same
transaction.  Cascade delete sweeps children before the topology row.

Covered by tests/topology/test_repo.py (roundtrip, per-topology name
uniqueness, status event log, cascade delete, status filter) and an
extension to tests/test_base_repo.py for the NotImplementedError surface.
2026-04-20 16:43:49 -04:00
096a35b24a feat(mazenet): add topology schema to models.py
Introduces five new SQLModel tables for MazeNET (nested deception
topologies): Topology, LAN, TopologyDecky, TopologyEdge, and
TopologyStatusEvent.  DeckyShard is intentionally not touched —
TopologyDecky is a purpose-built sibling for MazeNET's lifecycle
(topology-scoped UUIDs, per-topology name uniqueness).

Part of MazeNET v1 (nested self-container network-of-networks).
2026-04-20 16:40:10 -04:00
8a2876fe86 fix(api): document missing HTTP status codes on router endpoints
All checks were successful
CI / Lint (ruff) (push) Successful in 16s
CI / SAST (bandit) (push) Successful in 18s
CI / Dependency audit (pip-audit) (push) Successful in 26s
CI / Test (Standard) (3.11) (push) Successful in 2m41s
CI / Test (Live) (3.11) (push) Successful in 1m6s
CI / Test (Fuzz) (3.11) (push) Successful in 1h9m14s
CI / Finalize Merge to Main (push) Has been skipped
CI / Merge dev → testing (push) Successful in 12s
CI / Prepare Merge to Main (push) Has been skipped
Schemathesis was failing CI on routes that returned status codes not
declared in their OpenAPI responses= dicts. Adds the missing codes
across swarm_updates, swarm_mgmt, swarm, fleet and attackers routers.

Also adds 400 to every POST/PUT/PATCH that accepts a JSON body —
Starlette returns 400 on malformed/non-UTF8 bodies before FastAPI's
422 validation runs, which schemathesis fuzzing trips every time.

No handler logic changed.
2026-04-20 15:25:02 -04:00
3e8e4c9e1c fix(ci): run less harsh tests on CI, let local runners run harder ones 2026-04-20 14:07:34 -04:00
64bc6fcb1d chores(pyproj): modified some values 2026-04-20 13:22:49 -04:00
af9d59d3ee fixed(api): documentation 2026-04-20 13:20:42 -04:00
4197441c01 fix(ci): skip live service isolation
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 15s
CI / Dependency audit (pip-audit) (push) Successful in 22s
CI / Test (Standard) (3.11) (push) Successful in 2m47s
CI / Test (Live) (3.11) (push) Successful in 1m7s
CI / Test (Fuzz) (3.11) (push) Failing after 45m40s
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-20 13:14:48 -04:00
1b70d6db87 fix(ci): added skipif on mysql absence
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 15s
CI / Dependency audit (pip-audit) (push) Successful in 24s
CI / Test (Standard) (3.11) (push) Successful in 2m51s
CI / Test (Live) (3.11) (push) Failing after 1m2s
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-20 13:07:31 -04:00
038596776a feat(ci): added live mysql service on test-live
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 15s
CI / Dependency audit (pip-audit) (push) Successful in 23s
CI / Test (Standard) (3.11) (push) Successful in 2m49s
CI / Test (Live) (3.11) (push) Failing after 1m1s
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-20 12:54:03 -04:00
692ac35ee4 modification(versions): drop 3.12 tests and support only 3.11
Some checks failed
CI / Lint (ruff) (push) Successful in 17s
CI / SAST (bandit) (push) Successful in 19s
CI / Dependency audit (pip-audit) (push) Successful in 26s
CI / Test (Standard) (3.11) (push) Successful in 2m58s
CI / Test (Live) (3.11) (push) Failing after 58s
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-20 12:43:03 -04:00
f064690452 fixed(tests): jwt_lazy
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 15s
CI / Dependency audit (pip-audit) (push) Successful in 23s
CI / Test (Standard) (3.11) (push) Successful in 5m6s
CI / Test (Standard) (3.12) (push) Failing after 3h14m38s
CI / Test (Live) (3.11) (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
2026-04-20 02:26:54 -04:00
dd82cd3f39 fixed(tests): mode_gating
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 15s
CI / Dependency audit (pip-audit) (push) Successful in 23s
CI / Test (Standard) (3.11) (push) Failing after 5m0s
CI / Test (Live) (3.11) (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Test (Standard) (3.12) (push) Has been cancelled
2026-04-20 02:18:11 -04:00
ff3e376726 modified(actions): modified actions to bypass bandit on decnet/templates
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 16s
CI / Dependency audit (pip-audit) (push) Successful in 31s
CI / Test (Standard) (3.11) (push) Failing after 4m52s
CI / Test (Live) (3.11) (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Test (Standard) (3.12) (push) Has been cancelled
2026-04-20 02:05:36 -04:00
47f2ca8d5f added(tests): schemathesis contract fuzzing at the agent and swarmctl level
Some checks failed
CI / Lint (ruff) (push) Successful in 17s
CI / SAST (bandit) (push) Failing after 19s
CI / Dependency audit (pip-audit) (push) Failing after 38s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Standard) (3.12) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-20 01:27:39 -04:00
da3e675f86 fix(tests): fixed locust fixtures and rampups, since >100 generally isn't very well managed 2026-04-20 01:26:56 -04:00
2febd921bc fix(models): added lenght validation to the common name, which per RFC 5280 must be max =< 64 2026-04-20 01:26:07 -04:00
12b5c25cd7 fix(agent-routes): added undocumented responses 2026-04-20 01:24:05 -04:00
5b70a34c94 fix(routes): added undocumented responses 2026-04-20 01:23:07 -04:00
4abfac1a98 fix: monkeypatch test db URL 2026-04-20 00:03:12 -04:00
9eca33938d chore: deleted swp file 2026-04-19 23:51:59 -04:00
195580c74d test: fix templates paths, CLI gating, and stress-suite harness
- tests/**: update templates/ → decnet/templates/ paths after module move
- tests/mysql_spinup.sh: use root:root and asyncmy driver
- tests/test_auto_spawn.py: patch decnet.cli.utils._pid_dir (package split)
- tests/test_cli.py: set DECNET_MODE=master in api-command tests
- tests/stress/conftest.py: run locust out-of-process via its CLI + CSV
  stats shim to avoid urllib3 RecursionError from late gevent monkey-patch;
  raise uvicorn startup timeout to 60s, accept 401 from auth-gated health,
  strip inherited DECNET_* env, surface stderr on 0-request runs
- tests/stress/test_stress.py: loosen baseline thresholds to match hw
2026-04-19 23:50:53 -04:00
262a84ca53 refactor(cli): split decnet/cli.py monolith into decnet/cli/ package
The 1,878-line cli.py held every Typer command plus process/HTTP helpers
and mode-gating logic. Split into one module per command using a
register(app) pattern so submodules never import app at module scope,
eliminating circular-import risk.

- utils.py: process helpers, _http_request, _kill_all_services, console, log
- gating.py: MASTER_ONLY_* sets, _require_master_mode, _gate_commands_by_mode
- deploy.py: deploy + _deploy_swarm (tightly coupled)
- lifecycle.py: status, teardown, redeploy
- workers.py: probe, collect, mutate, correlate
- inventory.py, swarm.py, db.py, and one file per remaining command

__init__.py calls register(app) on each module then runs the mode gate
last, and re-exports the private symbols tests patch against
(_db_reset_mysql_async, _kill_all_services, _require_master_mode, etc.).

Test patches retargeted to the submodule where each name now resolves.
Enroll-bundle tarball test updated to assert decnet/cli/__init__.py.

No behavioral change.
2026-04-19 22:42:52 -04:00
d1b7e94325 fix(swarm): inject peer cert into ASGI scope for uvicorn <= 0.44
Uvicorn's h11/httptools HTTP protocols don't populate scope['extensions']['tls'], so /swarm/heartbeat's per-request cert pinning was 403ing every call despite CERT_REQUIRED validating the cert at handshake. Patch RequestResponseCycle.__init__ on both protocol modules to read the peer cert off the asyncio transport and write DER bytes into scope['extensions']['tls']['client_cert_chain']. Importing the module from swarm_api.py auto-installs the patch in the swarmctl uvicorn worker before any request is served.
2026-04-19 22:09:11 -04:00
33d954a61c feat(web-ui): unify SwarmDeckies into DeckyFleet with swarm card mode
DeckyFleet now branches on /system/deployment-mode: in swarm mode it
pulls /swarm/deckies and normalises DeckyShardView into the shared
Decky shape so the same card grid renders either way. Swarm cards gain
a host badge (host_name @ address), a state pill (running/degraded/
tearing_down/failed/teardown_failed with matching colors), an inline
last_error snippet, and a two-click arm/commit Teardown button lifted
from the old SwarmDeckies component. Mutate + interval controls are
hidden in swarm mode since the worker /mutate endpoint still 501s —
swarm-side rotation is a separate ticket.

Drops the standalone /swarm/deckies route + nav entry; SwarmDeckies.tsx
is deleted. The SWARM nav group keeps SwarmHosts, Remote Updates, and
Agent Enrollment.
2026-04-19 21:53:26 -04:00
bf01804736 feat(agent): periodic heartbeat loop posting status to swarmctl
New decnet.agent.heartbeat asyncio loop wired into the agent FastAPI
lifespan. Every 30 s the worker POSTs executor.status() to the master's
/swarm/heartbeat with its DECNET_HOST_UUID for self-identity; the
existing agent mTLS bundle provides the client cert the master pins
against SwarmHost.client_cert_fingerprint.

start() is a silent no-op when identity env (HOST_UUID, MASTER_HOST) is
unset or the worker bundle is missing, so dev runs and un-enrolled hosts
don't crash the agent app. On non-204 responses the loop logs loudly but
keeps ticking — an operator may re-enrol mid-session, and fail-closed
pinning shouldn't be self-silencing.
2026-04-19 21:49:34 -04:00
62f7c88b90 feat(swarmctl): --tls with auto-issued or BYOC server cert
swarmctl CLI gains --tls/--cert/--key/--client-ca flags. With --tls the
controller runs uvicorn under HTTPS + mTLS (CERT_REQUIRED) so worker
heartbeats can reach it cross-host. Default is still 127.0.0.1 plaintext
for backwards compat with the master-CLI enrollment flow.

Auto-issue path (no --cert/--key given): a server cert signed by the
existing DECNET CA is issued once and parked under ~/.decnet/swarmctl/.
Workers already ship that CA's ca.crt from the enroll bundle, so they
verify the endpoint with no extra trust config. BYOC via --cert/--key
when the operator wants a publicly-trusted or externally-managed cert.
The auto-cert path is idempotent across restarts to keep a stable
fingerprint for any long-lived mTLS sessions.
2026-04-19 21:46:32 -04:00
e411063075 feat(swarm): ship host_uuid + swarmctl-port in agent enroll bundle
The rendered /etc/decnet/decnet.ini now carries host-uuid and
swarmctl-port in [agent], which config_ini seeds into DECNET_HOST_UUID
and DECNET_SWARMCTL_PORT. Gives the worker a stable self-identity for
the heartbeat loop — the INI never has to be rewritten because cert
pinning is the real gate (a rotated UUID with a matching CA-signed
cert would still be blocked by SHA-256 fingerprint mismatch against
the stored SwarmHost row).

Also adds DECNET_MASTER_HOST so the agent can find the swarmctl URL
via the INI's existing master-host key.
2026-04-19 21:44:23 -04:00
148e51011c feat(swarm): agent→master heartbeat with per-host cert pinning
New POST /swarm/heartbeat on the swarm controller. Workers post every
~30s with the output of executor.status(); the master bumps
SwarmHost.last_heartbeat and re-upserts each DeckyShard with a fresh
DeckyConfig snapshot and runtime-derived state (running/degraded).

Security: CA-signed mTLS alone is not sufficient — a decommissioned
worker's still-valid cert could resurrect ghost shards. The endpoint
extracts the presented peer cert (primary: scope["extensions"]["tls"],
fallback: transport.get_extra_info("ssl_object")) and SHA-256-pins it
to the SwarmHost.client_cert_fingerprint stored for the claimed
host_uuid. Extraction is factored into _extract_peer_fingerprint so
tests can exercise both uvicorn scope shapes and the both-unavailable
fail-closed path without mocking uvicorn's TLS pipeline.

Adds get_swarm_host_by_fingerprint to the repo interface (SQLModel
impl reuses the indexed client_cert_fingerprint column).
2026-04-19 21:37:15 -04:00
3ebd206bca feat(swarm): persist DeckyConfig snapshot per shard + enrich list API
Dispatch now writes the full serialised DeckyConfig into
DeckyShard.decky_config (plus decky_ip as a cheap extract), so the
master can render the same rich per-decky card the local-fleet view
uses — hostname, distro, archetype, service_config, mutate_interval,
last_mutated — without round-tripping to the worker on every page
render. DeckyShardView gains the corresponding fields; the repository
flattens the snapshot at read time. Pre-migration rows keep working
(fields fall through as None/defaults).

Columns are additive + nullable so SQLModel.metadata.create_all handles
the change on both SQLite and MySQL. Backfill happens organically on
the next dispatch or (in a follow-up) agent heartbeat.
2026-04-19 21:29:45 -04:00
f576564f02 fix(agent): also wipe /etc/decnet during self-destruct 2026-04-19 21:04:31 -04:00
00d5799a79 fix(agent): escape systemd cgroup when spawning self-destruct reaper
The reaper was being SIGTERM'd mid-rm because `start_new_session=True`
only forks a new POSIX session — it does not escape decnet-agent.service's
cgroup. When the reaper ran `systemctl stop decnet-agent`, systemd
tore down the whole cgroup (reaper included) before `rm -rf /opt/decnet*`
finished, leaving the install on disk.

Spawn the reaper via `systemd-run --collect --unit decnet-reaper-<pid>`
so it runs in a fresh transient scope, outside the agent unit. Falls
back to bare Popen for non-systemd hosts.
2026-04-19 21:00:43 -04:00
14250cacad feat(swarm): self-destruct agent on decommission
Decommissioning a worker from the dashboard (or swarm controller) now
asks the agent to wipe its own install before the master forgets it.
The agent stops decky containers + every decnet-* systemd unit, then
deletes /opt/decnet*, /etc/systemd/system/decnet-*, /var/lib/decnet/*,
and /usr/local/bin/decnet*. Logs under /var/log are preserved.

The reaper runs as a detached /tmp script (start_new_session=True) so
it survives the agent process being killed. Self-destruct dispatch is
best-effort — a dead worker doesn't block master-side cleanup.
2026-04-19 20:47:09 -04:00
9d68bb45c7 feat(web): async teardowns — 202 + background task, UI allows parallel queue
Teardowns were synchronous all the way through: POST blocked on the
worker's docker-compose-down cycle (seconds to minutes), the frontend
locked tearingDown to a single string so only one button could be armed
at a time, and operators couldn't queue a second teardown until the
first returned. On a flaky worker that meant staring at a spinner for
the whole RTT.

Backend: POST /swarm/hosts/{uuid}/teardown returns 202 the instant the
request is validated. Affected shards flip to state='tearing_down'
synchronously before the response so the UI reflects progress
immediately, then the actual AgentClient call + DB cleanup run in an
asyncio.create_task (tracked in a module-level set to survive GC and
to be drainable by tests). On failure the shard flips to
'teardown_failed' with the error recorded — nothing is re-raised,
since there's no caller to catch it.

Frontend: swap tearingDown / decommissioning from 'string | null' to
'Set<string>'. Each button tracks its own in-flight state; the poll
loop picks up the final shard state from the backend. Multiple
teardowns can now be queued without blocking each other.
2026-04-19 20:30:56 -04:00
07ec4bc269 fix(fleet): INI fully replaces prior decky state on redeploy
Submitting an INI with a single [decky1] was silently redeploying the
deckies from the *previous* deploy too. POST /deckies/deploy merged the
new INI into the stored DecnetConfig by name, so a 1-decky INI on top of
a prior 3-decky run still pushed 3 deckies to the worker. Those stale
decky2/decky3 kept their old IPs, collided on the parent NIC, and the
agent failed with 'Address already in use' — the deploy the operator
never asked for.

The INI is the source of truth for which deckies exist this deploy.
Full replace: config.deckies = list(new_decky_configs). Operators who
want to add more deckies should list them all in the INI.

Update the deploy-limit test to reflect the new replace semantics, and
add a regression test asserting prior state is dropped.
2026-04-19 20:24:29 -04:00
a63301c7a3 fix(web): replace window.confirm with two-click arm/commit on swarm actions
Teardown and Decommission buttons were silently dead in the browser.
Root cause: every handler started with 'if (!window.confirm(...)) return;'
and browsers permanently disable confirm() for a tab once the user ticks
'Prevent this page from creating additional dialogs'. That returns false
with no UI, the handler early-exits, and no request is ever fired — no
network traffic, no console error, no backend activity.

Swap to an inline two-click pattern: first click arms the button (label
flips to 'Click again to confirm', resets after 4s); second click within
the window commits. Same safety against misclicks, zero dependency on
browser-native dialog primitives.
2026-04-19 20:16:51 -04:00
df18cb44cc fix(swarm): don't paint healthy deckies as failed when a shard-sibling fails
docker compose up is partial-success-friendly — a build failure on one
service doesn't roll back the others. But the master was catching the
agent's 500 and tagging every decky in the shard as 'failed' with the
same error message. From the UI that looked like all three deckies died
even though two were live on the worker.

On dispatch exception, probe the agent's /status to learn which deckies
actually have running containers, and upsert per-decky state accordingly.
Only fall back to marking the whole shard failed if the status probe
itself is unreachable.

Enhance agent.executor.status() to include a 'runtime' map keyed by
decky name with per-service container state, so the master has something
concrete to consult.
2026-04-19 20:11:08 -04:00
91549e6936 fix(deploy): prevent 'Address already in use' from stale IPAM and half-torn-down containers
Two compounding root causes produced the recurring 'Address already in use'
error on redeploy:

1. _ensure_network only compared driver+name; if a prior deploy's IPAM
   pool drifted (different subnet/gateway/range), Docker kept handing out
   addresses from the old pool and raced the real LAN. Now also compares
   Subnet/Gateway/IPRange and rebuilds on drift.

2. A prior half-failed 'up' could leave containers still holding the IPs
   and ports the new run wants. Run 'compose down --remove-orphans' as a
   best-effort pre-up cleanup so IPAM starts from a clean state.

Also surface docker compose stderr to the structured log on failure so
the agent's journal captures Docker's actual message (which IP, which
port) instead of just the exit code.
2026-04-19 19:59:06 -04:00
e8e11b2896 feat(web-ui): show decky IP on SwarmDeckies, drop compose-hash column
Operators want to know what address to poke when triaging a swarm decky;
the compose-hash column was debug scaffolding that never paid off.

DeckyShard has no IP column (the deploy-time IP lives on DecnetConfig),
so the list endpoint resolves it at read time by joining shards against
the stored deployment state by decky_name. Missing lookups render as "—"
rather than erroring — the list stays useful even after a master restart
that hasn't persisted a config yet.
2026-04-19 19:48:27 -04:00
585541016f fix(engine): teardown(decky_id=...) built malformed service names
The nested list-comp `[f"{id}-{svc}" for svc in [d.services for d ...]]`
iterated over a list of lists, so `svc` was the whole services list and
f-string stringified it -> `decky3-['sip']`. docker compose saw "no such
service" and the per-decky teardown failed 500.

Flatten: find the matching decky once, then iterate its services. Noop
early on unknown decky_id and on empty service lists. Regression test
asserts the emitted compose args have no '[' or quote characters.
2026-04-19 19:42:42 -04:00
5dad1bb315 feat(swarm): remote teardown API + UI (per-decky and per-host)
Agents already exposed POST /teardown; the master was missing the plumbing
to reach it. Add:

- POST /api/v1/swarm/hosts/{uuid}/teardown — admin-gated. Body
  {decky_id: str|null}: null tears the whole host, a value tears one decky.
  On worker failure the master returns 502 and leaves DB shards intact so
  master and agent stay aligned.
- BaseRepository.delete_decky_shard(name) + sqlmodel impl for per-decky
  cleanup after a single-decky teardown.
- SwarmHosts page: "Teardown all" button (keeps host enrolled).
- SwarmDeckies page: per-row "Teardown" button.

Also exclude setuptools' build/ staging dir from the enrollment tarball —
`pip install -e` on the master generates build/lib/decnet_web/node_modules
and the bundle walker was leaking it to agents. Align pyproject's bandit
exclude with the git-hook invocation so both skip decnet/templates/.
2026-04-19 19:39:28 -04:00
6708f26e6b fix(packaging): move templates/ into decnet/ package so they ship with pip install
The docker build contexts and syslog_bridge.py lived at repo root, which
meant setuptools (include = ["decnet*"]) never shipped them. Agents
installed via `pip install $RELEASE_DIR` got site-packages/decnet/** but no
templates/, so every deploy blew up in deployer._sync_logging_helper with
FileNotFoundError on templates/syslog_bridge.py.

Move templates/ -> decnet/templates/ and declare it as setuptools
package-data. Path resolutions in services/*.py and engine/deployer.py drop
one .parent since templates now lives beside the code. Test fixtures,
bandit exclude path, and coverage omit glob updated to match.
2026-04-19 19:30:04 -04:00
2bef3edb72 feat(swarm): unbundle master-only code from agent tarball + sync systemd units on update
Agents now ship with collector/prober/sniffer as systemd services; mutator,
profiler, web, and API stay master-only (profiler rebuilds attacker profiles
against the master DB — no per-host DB exists). Expand _EXCLUDES to drop the
full decnet/web, decnet/mutator, decnet/profiler, and decnet_web trees from
the enrollment bundle.

Updater now calls _heal_path_symlink + _sync_systemd_units after rotation so
fleets pick up new unit files and /usr/local/bin/decnet tracks the shared venv
without a manual reinstall. daemon-reload runs once per update when any unit
changed.

Fix _service_registry matchers to accept systemd-style /usr/local/bin/decnet
cmdlines (psutil returns a list — join to string before substring-checking)
so agent-mode `decnet status` reports collector/prober/sniffer correctly.
2026-04-19 19:19:17 -04:00
d2cf1e8b3a feat(updater): sync systemd unit files and daemon-reload on update
The bootstrap installer copies etc/systemd/system/*.service into
/etc/systemd/system at enrollment time, but the updater was skipping
that step — a code push could not ship a new unit (e.g. the four
per-host microservices added this session) or change ExecStart on an
existing one. systemctl alone doesn't re-read unit files; daemon-reload
is required.

run_update / run_update_self now call _sync_systemd_units after
rotation: diff each .service file against the live copy, atomically
replace changed ones, then issue a single `systemctl daemon-reload`.
No-op on legacy tarballs that don't ship etc/systemd/system/.
2026-04-19 19:07:24 -04:00
6d7877c679 feat(swarm): per-host microservices as systemd units, mutator off agents
Previously `decnet status` on an agent showed every microservice as DOWN
because deploy's auto-spawn was unihost-scoped and the agent CLI gate
hid the per-host commands. Now:

  - collect, probe, profiler, sniffer drop out of MASTER_ONLY_COMMANDS
    (they run per-host; master-side work stays master-gated).
  - mutate stays master-only (it orchestrates swarm-wide respawns).
  - decnet/mutator/ excluded from agent tarballs — never invoked there.
  - decnet/web exclusion tightened: ship db/ + auth.py + dependencies.py
    (profiler needs the repo singleton), drop api.py, swarm_api.py,
    ingester.py, router/, templates/.
  - Four new systemd unit templates (decnet-collector/prober/profiler/
    sniffer) shipped in every enrollment tarball.
  - enroll_bootstrap.sh enables + starts all four alongside agent and
    forwarder at install time.
  - updater restarts the aux units on code push so they pick up the new
    release (best-effort — legacy enrollments without the units won't
    fail the update).
  - status table hides Mutator + API rows in agent mode.
2026-04-19 18:58:48 -04:00
ee9ade4cd5 feat(enroll): strip master API and frontend from agent tarball
Agents never run the FastAPI master app (decnet/web/) or serve the React
frontend (decnet_web/) — they run decnet.agent, decnet.updater, and
decnet.forwarder, none of which import decnet.web. Shipping the master
tree bloats every enrollment payload and needlessly widens the worker's
attack surface.

Excluded paths are unreachable on the worker (all cli.py imports of
decnet.web are inside master-only command bodies that the agent-mode
gate strips). Tests assert neither tree leaks into the tarball.
2026-04-19 18:47:03 -04:00
dad29249de fix(updater): align bootstrap layout with updater; log update phases
The bootstrap was installing into /opt/decnet/.venv with an editable
`pip install -e .`, and /usr/local/bin/decnet pointed there. The updater
writes releases to /opt/decnet/releases/active/ with a shared venv at
/opt/decnet/venv — a parallel tree nothing on the box actually runs.
Result: updates appeared to succeed (release dir rotated, SHA changed)
but systemd kept executing the untouched bootstrap code.

Changes:
  - Bootstrap now installs directly into /opt/decnet/releases/active
    with the shared venv at /opt/decnet/venv and /opt/decnet/current
    symlinked. Same layout the updater rotates in and out of.
  - /usr/local/bin/decnet -> /opt/decnet/venv/bin/decnet.
  - run_update / run_update_self heal /usr/local/bin/decnet on every
    push so already-enrolled hosts recover on the next update instead
    of needing a re-enroll.
  - run_update / run_update_self now log each phase (receive, extract,
    pip install, rotate, restart, probe) so the updater log actually
    shows what happened.
2026-04-19 18:39:11 -04:00
f91ba9a16e feat(cli): allow decnet status in agent mode
Agents run deckies locally and need to inspect their own state. Removed
`status` from MASTER_ONLY_COMMANDS so it survives the agent-mode gate.
Useful for validating remote updater pushes from the master.
2026-04-19 18:29:41 -04:00
43b92c7bd6 fix(updater): restart agent+forwarder+self via systemd on push
Three holes in the systemd integration:
1. _spawn_agent_via_systemd only restarted decnet-agent.service, leaving
   decnet-forwarder.service running the pre-update code (same /opt/decnet
   tree, stale import cache).
2. run_update_self used os.execv regardless of environment — the re-execed
   process kept the updater's existing cgroup/capability inheritance but
   systemd would notice MainPID change and mark the unit degraded.
3. No path to surface a failed forwarder restart (legacy enrollments have
   no forwarder unit).

Now: agent restart first, forwarder restart as best-effort (logged but
non-fatal so legacy workers still update), MainPID still read from the
agent unit. For update-self under systemd, spawn a detached sleep+
systemctl restart so the HTTP response flushes before the unit cycles.
2026-04-19 18:23:10 -04:00
a0a241f65d feat(enroll): decnet-updater now runs under systemd, not a --daemon fork
Bootstrap used to end with `decnet updater --daemon` which forks and
detaches — invisible to systemctl, no auto-restart, dies on reboot.
Ships a decnet-updater.service template matching the pattern of the
other units (Restart=on-failure, log to /var/log/decnet/decnet.updater.log,
certs from /etc/decnet/updater, install tree at /opt/decnet), bundles
it alongside agent/forwarder/engine units, and the installer now
`systemctl enable --now`s it when --with-updater is set.
2026-04-19 18:19:24 -04:00
42b5e4cd06 fix(network): replace decnet_lan when driver differs (macvlan<->ipvlan)
The create helpers short-circuited on name alone, so a prior macvlan
deploy left Docker's decnet_lan network in place. A subsequent ipvlan
deploy would no-op the network create, then container attach would try
to add a macvlan port on enp0s3 that already had an ipvlan slave —
EBUSY, agent 500, docker ps empty.

Now: when the existing network's driver disagrees with the requested
one, disconnect any live containers and DROP it before recreating.
Parent-NIC can host one driver at a time.

Also: setup_host_{macvlan,ipvlan} opportunistically delete the opposite
host-side helper so we don't leave cruft across driver swaps.
2026-04-19 18:12:28 -04:00
6245786289 fix(cli): db-reset now drops swarm_hosts + decky_shards
_DB_RESET_TABLES was missing the swarm tables, so drop-tables mode left
them intact. create_all doesn't alter columns on existing tables, so any
schema change to SwarmHost (like use_ipvlan) never took effect after a
reset. Child FK first (decky_shards -> swarm_hosts).
2026-04-19 18:04:35 -04:00
5df995fda1 feat(enroll): opt-in IPvlan per-agent for Wi-Fi-bridged VMs
Wi-Fi APs bind one MAC per associated station, so VirtualBox/VMware
guests bridged over Wi-Fi rotate the VM's DHCP lease when Docker's
macvlan starts emitting container-MAC frames through the vNIC. Adds a
`use_ipvlan` toggle on the Agent Enrollment tab (mirrors the updater
daemon checkbox): flips the flag on SwarmHost, bakes `ipvlan=true` into
the agent's decnet.ini, and `_worker_config` forces ipvlan=True on the
per-host shard at dispatch. Safe no-op on wired/bare-metal agents.
2026-04-19 17:57:45 -04:00
6d7567b6bb fix(fleet): reset stale host_uuid on carried-over deckies before dispatch
Deckies merged in from a prior deployment's saved state kept their
original host_uuid — which dispatch_decnet_config then 404'd on if that
host had since been decommissioned or re-enrolled at a different uuid.
Before round-robin assignment, drop any host_uuid that isn't in the live
swarm_hosts set so orphaned entries get reassigned instead of exploding
with 'unknown host_uuid'.
2026-04-19 06:27:34 -04:00
dbaccde143 fix(swarm-updates): offload tarball build to worker thread
tar_working_tree (walks repo + gzips several MB) and detect_git_sha
(shells out) were called directly on the event loop, so /swarm-updates/push
and /swarm-updates/push-self froze every other request until the tarball
was ready. Wrap both in asyncio.to_thread.
2026-04-19 06:21:27 -04:00
b883f24ba2 fix(engine): pin docker compose project name to avoid empty-basename failure
systemd daemons run with WorkingDirectory=/ by default; docker compose
derives the project name from basename(cwd), which is empty at '/', and
aborts with 'project name must not be empty'. Pass -p decnet explicitly
so the project name is independent of cwd, and set WorkingDirectory=/opt/decnet
on the three DECNET units so compose artifacts (decnet-compose.yml,
build contexts) also land in the install dir.
2026-04-19 06:17:30 -04:00
79db999030 feat(fleet): auto-swarm deploy — shard across enrolled workers when master
POST /deckies/deploy now branches on DECNET_MODE + enrolled host presence:
when the caller is a master with at least one reachable swarm host, round-
robin host_uuids are assigned over new deckies and the config is dispatched
via AgentClient. Falls back to local docker-compose otherwise.

Extracts the dispatch loop from api_deploy_swarm into dispatch_decnet_config
so both endpoints share the same shard/dispatch/persist path. Adds
GET /system/deployment-mode for the UI to show 'will shard across N hosts'
vs 'will deploy locally' before the operator clicks deploy.
2026-04-19 06:09:08 -04:00
cb1a1d1270 fix(fleet): defer DecnetConfig build until deckies are expanded
Stateless /api/v1/deckies/deploy previously instantiated DecnetConfig with
deckies=[] so it could merge entries later — but DecnetConfig.deckies is
min_length=1, so Pydantic raised and the global handler mapped it to 422
'Internal data consistency error'. Construct the config after
build_deckies_from_ini returns at least one DeckyConfig.
2026-04-19 06:02:26 -04:00
899ea559d9 feat(enroll): systemd units for agent/forwarder/engine + log-directory INI key
Rename log-file-path -> log-directory (maps to DECNET_LOG_DIRECTORY). Bundle
now ships three systemd units rendered with agent_name/master_host and installs
them into /etc/systemd/system/. Bootstrap replaces direct 'decnet X --daemon'
calls with systemctl enable --now. Each unit pins DECNET_SYSTEM_LOGS so agent,
forwarder, and deckies logs land at decnet.{agent,forwarder}.log and decnet.log
under /var/log/decnet.
2026-04-19 05:46:08 -04:00
e67b6d7f73 refactor(swarm-mgmt): move agent/updater certs to /etc/decnet (root-owned) 2026-04-19 05:32:39 -04:00
bc5f43c3f7 feat(swarm-mgmt): probe-on-read for GET /swarm/hosts heartbeat + status 2026-04-19 05:26:35 -04:00
ff4c993617 refactor(swarm-mgmt): backfill host address from agent's .tgz source IP 2026-04-19 05:20:29 -04:00
e32fdf9cbf feat(swarm-mgmt): agent_host + updater opt-in; prevent duplicate forwarder spawn 2026-04-19 05:12:55 -04:00
95ae175e1b fix(swarm-mgmt): exclude .env from bundle, chmod +x decnet, mkdir log 2026-04-19 04:58:55 -04:00
b4df9ea0a1 fix(swarm-mgmt): bundle URLs target master_host, not dashboard base_url 2026-04-19 04:52:20 -04:00
02f07c7962 feat(web-ui): SWARM nav group + Hosts/Deckies/AgentEnrollment pages 2026-04-19 04:29:07 -04:00
c6f7de30d2 feat(swarm-mgmt): agent enrollment bundle flow + admin swarm endpoints 2026-04-19 04:25:57 -04:00
37b22b76a5 feat(cli): auto-spawn listener as detached sibling from decnet swarmctl
Mirrors the agent→forwarder pattern: `decnet swarmctl` now fires the
syslog-TLS listener as a detached Popen sibling so a single master
invocation brings the full receive pipeline online. --no-listener opts
out for operators who want to run the listener on a different host (or
under their own systemd unit).

Listener bind host / port come from DECNET_LISTENER_HOST and
DECNET_SWARM_SYSLOG_PORT — both seedable from /etc/decnet/decnet.ini.
PID at $(pid_dir)/listener.pid so operators can kill/restart manually.

decnet.ini.example ships alongside env.config.example as the
documented surface for the new role-scoped config. Mode, forwarder
targets, listener bind, and master ports all live there — no more
memorizing flag trees.

Extends tests/test_auto_spawn.py with two swarmctl cases: listener is
spawned with the expected argv + PID file, and --no-listener suppresses.
2026-04-19 03:25:40 -04:00
43f140a87a feat(cli): auto-spawn forwarder as detached sibling from decnet agent
New _spawn_detached(argv, pid_file) helper uses Popen with
start_new_session=True + close_fds=True + DEVNULL stdio to launch a
DECNET subcommand as a fully independent process. The parent does NOT
wait(); if it dies the child survives under init. This is deliberately
not a supervisor — if the child dies the operator restarts it manually.

_pid_dir() picks /opt/decnet when writable else ~/.decnet, so both
root-run production and non-root dev work without ceremony.

`decnet agent` now auto-spawns `decnet forwarder --daemon ...` as
that detached sibling, pulling master host + syslog port from
DECNET_SWARM_MASTER_HOST / DECNET_SWARM_SYSLOG_PORT. --no-forwarder
opts out. If DECNET_SWARM_MASTER_HOST is unset the auto-spawn is
silently skipped (single-host dev or operator wants to start the
forwarder separately).

tests/test_auto_spawn.py monkeypatches subprocess.Popen and verifies:
the detach kwargs are passed, the PID file exists and contains a
valid positive integer (PID-file corruption is a real operational
headache — catching bad writes at the test level is free), the
--no-forwarder flag suppresses the spawn, and the unset-master-host
path silently skips.
2026-04-19 03:23:42 -04:00
3223bec615 feat(cli): gate master-only commands when DECNET_MODE=agent
- MASTER_ONLY_COMMANDS / MASTER_ONLY_GROUPS frozensets enumerate every
  command a worker host must not see. Comment block at the declaration
  puts the maintenance obligation in front of anyone touching command
  registration.
- _gate_commands_by_mode() filters both app.registered_commands (for
  @app.command() registrations) and app.registered_groups (for
  add_typer sub-apps) so the 'swarm' group disappears along with
  'api', 'swarmctl', 'deploy', etc. on agent hosts.
- _require_master_mode() is the belt-and-braces in-function guard,
  added to the four highest-risk commands (api, swarmctl, deploy,
  teardown). Protects against direct function imports that would
  bypass Typer.
- DECNET_DISALLOW_MASTER=false is the escape hatch for hybrid dev
  hosts that legitimately play both roles.

tests/test_mode_gating.py exercises help-text listings via subprocess
and the defence-in-depth guard via direct import.
2026-04-19 03:20:48 -04:00
2b1b962849 feat(env): run decnet.ini loader at package import; expose DECNET_MODE
- decnet/__init__.py now calls load_ini_config() on first import of any
  decnet.* module, seeding os.environ via setdefault() so env.py's
  module-level reads pick up INI values before the shell had to export
  them. Real env vars still win.
- env.py exposes DECNET_MODE (default 'master') and
  DECNET_DISALLOW_MASTER (default true), consumed by the upcoming
  master-command gating in cli.py.

Back-compat: missing /etc/decnet/decnet.ini is a no-op. Existing
.env.local + flag-based launches behave identically.
2026-04-19 03:17:25 -04:00
65fc9ac2b9 fix(tests): clean up two pre-existing failures before config work
- decnet/agent/app.py /health: drop leftover 'push-test-2' canary
  planted during live VM push verification and never cleaned up;
  test_health_endpoint asserts the exact dict shape.
- tests/test_factory.py: switch the lazy-engine check from
  mysql+aiomysql (not in pyproject) to mysql+asyncmy (the driver the
  project actually ships). The test does not hit the wire so the
  dialect swap is safe.

Both were red on `pytest tests/` before any config/auto-spawn work
began; fixing them here so the upcoming commits land on a green
full-suite baseline.
2026-04-19 03:17:17 -04:00
1e8b73c361 feat(config): add /etc/decnet/decnet.ini loader
New decnet/config_ini.py parses a role-scoped INI file via stdlib
configparser and seeds os.environ via setdefault — real env vars still
win, keeping full back-compat with .env.local flows.

[decnet] holds role-agnostic keys (mode, disallow-master, log-file-path);
the role section matching `mode` is loaded, the other is ignored
silently so a worker never reads master-only keys (and vice versa).

Loader is standalone in this commit — not wired into cli.py yet.
2026-04-19 03:10:51 -04:00
9b1299458d fix(env): resolve DECNET_JWT_SECRET lazily so agent/updater subcommands don't need it
The module-level _require_env('DECNET_JWT_SECRET') call blocked
`decnet agent` and `decnet updater` from starting on workers that
legitimately have no business knowing the master's JWT signing key.

Move the resolution into a module `__getattr__`: only consumers that
actually read `decnet.env.DECNET_JWT_SECRET` trigger the validation,
which in practice means only decnet.web.auth (master-side).

Adds tests/test_env_lazy_jwt.py covering both the in-process lazy path
and an out-of-process `decnet agent --help` subprocess check with a
fully sanitized environment.
2026-04-19 02:43:25 -04:00
7894b9e073 feat(web-ui): Remote Updates dashboard page — push code to workers from the UI
React component for /swarm-updates: per-host table polled every 10s,
row actions for Push Update / Update Updater / Rollback, a fleet-wide
'Push to All' modal with the include_self toggle, and toast feedback
per result.

Admin-only (both server-gated and UI-gated). Unreachable hosts surface
as an explicit state; actions are disabled on them. Rollback is
disabled when the worker has no previous release slot (previous_sha
null from /hosts).
2026-04-19 01:03:04 -04:00
a266d6b17e feat(web): Remote Updates API — dashboard endpoints for pushing code to workers
Adds /api/v1/swarm-updates/{hosts,push,push-self,rollback} behind
require_admin. Reuses the existing UpdaterClient + tar_working_tree + the
per-host asyncio.gather pattern from api_deploy_swarm.py; tarball is
built exactly once per /push request and fanned out to every selected
worker. /hosts filters out decommissioned hosts and agent-only
enrollments (no updater bundle = not a target).

Connection drops during /update-self are treated as success — the
updater re-execs itself mid-response, so httpx always raises.

Pydantic models live in decnet/web/db/models.py (single source of
truth). 24 tests cover happy paths, rollback, transport failures,
include_self ordering (skip on rolled-back agents), validation, and
RBAC gating.
2026-04-19 01:01:09 -04:00
f5a5fec607 feat(deploy): systemd units w/ capability-based hardening; updater restarts agent via systemctl
Add deploy/ unit files for every DECNET daemon (agent, updater, api, web,
swarmctl, listener, forwarder). All run as User=decnet with NoNewPrivileges,
ProtectSystem, PrivateTmp, LockPersonality; AmbientCapabilities=CAP_NET_ADMIN
CAP_NET_RAW only on the agent (MACVLAN/scapy). Existing api/web units migrated
to /opt/decnet layout and the same hardening stanza.

Make the updater's _spawn_agent systemd-aware: under systemd (detected via
INVOCATION_ID + systemctl on PATH), `systemctl restart decnet-agent.service`
replaces the Popen path so the new agent inherits the unit's ambient caps
instead of the updater's empty set. _stop_agent becomes a no-op in that mode
to avoid racing systemctl's own stop phase.

Tests cover the dispatcher branch selection, MainPID parsing, and the
systemd no-op stop.
2026-04-19 00:44:06 -04:00
40d3e86e55 fix(updater): bootstrap fresh venv with deps; rebuild self-update argv from env
- _run_pip: on first venv use, install decnet with its full dep tree so the
  bootstrapped environment actually has typer/fastapi/uvicorn. Subsequent
  updates keep --no-deps for a near-no-op refresh.
- run_update_self: do not reuse sys.argv to re-exec the updater. Inside the
  live process, sys.argv is the uvicorn subprocess invocation (--ssl-keyfile
  etc.), which 'decnet updater' CLI rejects. Reconstruct the operator-visible
  command from env vars set by updater.server.run.
2026-04-18 23:51:41 -04:00
ebeaf08a49 fix(updater): fall back to /proc scan when agent.pid is missing
If the agent was started outside the updater (manually, during dev,
or from a prior systemd unit), there is no agent.pid for _stop_agent
to target, so a successful code install leaves the old in-memory
agent process still serving requests. Scan /proc for any decnet agent
command and SIGTERM all matches so restart is reliable regardless of
how the agent was originally launched.
2026-04-18 23:42:26 -04:00
7765b36c50 feat(updater): remote self-update daemon with auto-rollback
Adds a separate `decnet updater` daemon on each worker that owns the
agent's release directory and installs tarball pushes from the master
over mTLS. A normal `/update` never touches the updater itself, so the
updater is always a known-good rescuer if a bad agent push breaks
/health — the rotation is reversed and the agent restarted against the
previous release. `POST /update-self` handles updater upgrades
explicitly (no auto-rollback).

- decnet/updater/: executor, FastAPI app, uvicorn launcher
- decnet/swarm/updater_client.py, tar_tree.py: master-side push
- cli: `decnet updater`, `decnet swarm update [--host|--all]
  [--include-self] [--dry-run]`, `--updater` on `swarm enroll`
- enrollment API issues a second cert (CN=updater@<host>) signed by the
  same CA; SwarmHost records updater_cert_fingerprint
- tests: executor, app, CLI, tar tree, enroll-with-updater (37 new)
- wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link
2026-04-18 21:40:21 -04:00
8914c27220 feat(swarm): add decnet swarm deckies to list deployed shards by host
`swarm list` only shows enrolled workers — there was no way to see which
deckies are running and where. Adds GET /swarm/deckies on the controller
(joins DeckyShard with SwarmHost for name/address/status) plus the CLI
wrapper with --host / --state filters and --json.
2026-04-18 21:10:07 -04:00
4db9c7464c fix(swarm): relocalize master-built config on worker before deploy
deploy --mode swarm was failing on every heterogeneous fleet: the master
populates config.interface from its own box (detect_interface() → its
default NIC), then ships that verbatim. The worker's deployer then calls
get_host_ip(config.interface), hits 'ip addr show wlp6s0' on a VM whose
NIC is enp0s3, and 500s.

Fix: agent.executor._relocalize() runs on every swarm-mode deploy.
Re-detects the worker's interface/subnet/gateway/host_ip locally and
swaps them into the config before calling deployer.deploy(). When the
worker's subnet doesn't match the master's, decky IPs are re-allocated
from the worker's subnet via allocate_ips() so they're reachable.

Unihost-mode configs are left untouched — they're already built against
the local box and second-guessing them would be wrong.

Validated against anti@192.168.1.13: master dispatched interface=wlp6s0,
agent logged 'relocalized interface=enp0s3', deployer ran successfully,
dry-run returned ok=deployed.

4 new tests cover both branches (matching-subnet preserves decky IPs;
mismatch re-allocates), the end-to-end executor.deploy() path, and the
unihost short-circuit.
2026-04-18 20:41:21 -04:00
411a797120 feat(cli): add decnet swarm check wrapper for POST /swarm/check
The swarmctl API already exposes POST /swarm/check — an active mTLS
probe that refreshes SwarmHost.status + last_heartbeat for every
enrolled worker. The CLI was missing a wrapper, so operators had to
curl the endpoint directly (which is how the VM validation run did it,
and how the wiki Deployment-Modes / SWARM-Mode pages ended up doc'ing
a command that didn't exist yet).

Matches the existing list/enroll/decommission pattern: typer subcommand
under swarm_app, --url override, Rich table output plus --json for
scripting. Three tests: populated table, empty-swarm path, and --json
emission.
2026-04-18 20:28:34 -04:00
3da5a2c4ee feat(cli): add decnet listener + --agent-dir on agent
New `decnet listener` command runs the master-side RFC 5425 syslog-TLS
receiver as a standalone process (mirrors `decnet api` / `decnet swarmctl`
pattern, SIGTERM/SIGINT handlers, --daemon support).

`decnet agent` now accepts --agent-dir so operators running the worker
agent under sudo/root can point at a bundle outside /root/.decnet/agent
(the HOME under sudo propagation).

Both flags were needed to stand up the full SWARM pipeline end-to-end on
a throwaway VM: mTLS control plane reachable, syslog-over-TLS wire
confirmed via tcpdump, master-crash/resume proved with zero loss and
zero duplication across 10 forwarded lines.

pyproject: bump asyncmy floor to 0.2.11 (resolver already pulled this in).
2026-04-18 20:15:25 -04:00
bfc7af000a test(swarm): add forwarder/listener resilience scenarios
Covers failure modes the happy-path tests miss:

- log rotation (copytruncate): st_size shrinks under the forwarder, it
  resets offset=0 and reships the new contents instead of getting wedged
  past EOF;
- listener restart: forwarder retries, resumes from the persisted offset,
  and the previously-acked lines are NOT duplicated on the master;
- listener tolerates a well-authenticated client that sends a partial
  octet-count frame and drops — the server must stay up and accept
  follow-on connections;
- peer_cn / fingerprint_from_ssl degrade to 'unknown' / None when no
  peer cert is available (defensive path that otherwise rarely fires).
2026-04-18 19:56:51 -04:00
1e8ca4cc05 feat(swarm-cli): add decnet swarm {enroll,list,decommission} + deploy --mode swarm
New sub-app talks HTTP to the local swarm controller (127.0.0.1:8770 by
default; override with --url or $DECNET_SWARMCTL_URL).

- enroll: POSTs /swarm/enroll, prints fingerprint, optionally writes
  ca.crt/worker.crt/worker.key to --out-dir for scp to the worker.
- list: renders enrolled workers as a rich table (with --status filter).
- decommission: looks up uuid by --name, confirms, DELETEs.

deploy --mode swarm now:
  1. fetches enrolled+active workers from the controller,
  2. round-robin-assigns host_uuid to each decky,
  3. POSTs the sharded DecnetConfig to /swarm/deploy,
  4. renders per-worker pass/fail in a results table.

Exits non-zero if no workers exist or any worker's dispatch failed.
2026-04-18 19:52:37 -04:00
a6430cac4c feat(swarm): add decnet forwarder CLI to run syslog-over-TLS forwarder
The forwarder module existed but had no runner — closes that gap so the
worker-side process can actually be launched and runs isolated from the
agent (asyncio.run + SIGTERM/SIGINT → stop_event).

Guards: refuses to start without a worker cert bundle or a resolvable
master host ($DECNET_SWARM_MASTER_HOST or --master-host).
2026-04-18 19:41:37 -04:00
39d2077a3a feat(swarm): syslog-over-TLS log pipeline (RFC 5425, TCP 6514)
Worker-side log_forwarder tails the local RFC 5424 log file and ships
each line as an octet-counted frame to the master over mTLS. Offset is
persisted in a tiny local SQLite so master outages never cause loss or
duplication — reconnect resumes from the exact byte where the previous
session left off. Impostor workers (cert not signed by DECNET CA) are
rejected at TLS handshake.

Master-side log_listener terminates mTLS on 0.0.0.0:6514, validates the
client cert, extracts the peer CN as authoritative worker provenance,
and appends each frame to the master's ingest log files. Attacker-
controlled syslog HOSTNAME field is ignored — the CA-controlled CN is
the only source of provenance.

7 tests added covering framing codec, offset persistence across
reopens, end-to-end mTLS delivery, crash-resilience (offset survives
restart, no duplicate shipping), and impostor-CA rejection.

DECNET_SWARM_SYSLOG_PORT / DECNET_SWARM_MASTER_HOST env bindings
added.
2026-04-18 19:33:58 -04:00
e2d6f857b5 refactor(swarm): move router DTOs into decnet/web/db/models.py
_schemas.py was a local exception to the codebase convention. The rest
of the app keeps all API request/response DTOs in decnet/web/db/models.py
alongside UserResponse, DeployIniRequest, etc. — the swarm endpoints now
follow the same convention (SwarmEnrollRequest, SwarmHostView, etc).
Deletes decnet/web/router/swarm/_schemas.py.
2026-04-18 19:28:15 -04:00
811136e600 refactor(swarm): one file per endpoint, matching existing router layout
Splits the three grouped router files into eight api_<verb>_<resource>.py
modules under decnet/web/router/swarm/ to match the convention used by
router/fleet/ and router/config/. Shared request/response models live in
_schemas.py. Keeps each endpoint easy to locate and modify without
stepping on siblings.
2026-04-18 19:23:06 -04:00
63b0a58527 feat(swarm): master-side SWARM controller (swarmctl) + agent CLI
Adds decnet/web/swarm_api.py as an independent FastAPI app with routers
for host enrollment, deployment dispatch (sharding DecnetConfig across
enrolled workers via AgentClient), and active health probing. Runs as
its own uvicorn subprocess via 'decnet swarmctl', mirroring the isolation
pattern used by 'decnet api'. Also wires up 'decnet agent' CLI entry for
the worker side.

29 tests added under tests/swarm/test_swarm_api.py cover enrollment
(including bundle generation + duplicate rejection), host CRUD, sharding
correctness, non-swarm-mode rejection, teardown, and health probes with
a stubbed AgentClient.
2026-04-18 19:18:33 -04:00
cd0057c129 feat(swarm): DeckyConfig.host_uuid + fix agent log/status field refs
- decnet.models.DeckyConfig grows an optional 'host_uuid' (the SwarmHost
  that runs this decky). Defaults to None so legacy unihost state files
  deserialize unchanged.
- decnet.agent.executor: replace non-existent config.name references
  with config.mode / config.interface in logs and status payload.
- tests/swarm/test_state_schema.py covers legacy-dict roundtrip, field
  default, and swarm-mode assignments.
2026-04-18 19:10:25 -04:00
0c77cdab32 feat(swarm): master AgentClient — mTLS httpx wrapper around worker API
decnet.swarm.client exposes:
- MasterIdentity / ensure_master_identity(): the master's own CA-signed
  client bundle, issued once into ~/.decnet/ca/master/.
- AgentClient: async-context httpx wrapper that talks to a worker agent
  over mTLS. health/status/deploy/teardown methods mirror the agent API.

SSL context is built from a bare ssl.SSLContext(PROTOCOL_TLS_CLIENT)
instead of httpx.create_ssl_context — the latter layers on default-CA
and purpose logic that broke private-CA mTLS. Server cert is pinned by
CA + chain, not DNS (workers enroll with arbitrary SANs).

tests/swarm/test_client_agent_roundtrip.py spins uvicorn in-process
with real certs on disk and verifies:
- A CA-signed master client passes health + status calls.
- An impostor whose cert comes from a different CA cannot connect.
2026-04-18 19:08:36 -04:00
8257bcc031 feat(swarm): worker agent + fix pre-existing base_repo coverage test
Worker agent (decnet.agent):
- mTLS FastAPI service exposing /deploy, /teardown, /status, /health,
  /mutate. uvicorn enforces CERT_REQUIRED with the DECNET CA pinned.
- executor.py offloads the blocking deployer onto asyncio.to_thread so
  the event loop stays responsive.
- server.py refuses to start without an enrolled bundle in
  ~/.decnet/agent/ — unauthenticated agents are not a supported mode.
- docs/openapi disabled on the agent — narrow attack surface.

tests/test_base_repo.py: DummyRepo was missing get_attacker_artifacts
(pre-existing abstractmethod) and so could not be instantiated. Added
the stub + coverage for the new swarm CRUD surface on BaseRepository.
2026-04-18 07:15:53 -04:00
d3b90679c5 feat(swarm): PKI module — self-managed CA for master/worker mTLS
decnet.swarm.pki provides:
- generate_ca() / ensure_ca() — self-signed root, PKCS8 PEM, 4096-bit.
- issue_worker_cert() — per-worker keypair + cert signed by the CA with
  serverAuth + clientAuth EKU so the same identity backs the agent's
  HTTPS endpoint AND the syslog-over-TLS upstream.
- write_worker_bundle() / load_worker_bundle() — persist with 0600 on
  private keys.
- fingerprint() — SHA-256 DER hex for master-side pinning.

tests/swarm/test_pki.py covers:
- CA idempotency on disk.
- Signed chain validates against CA subject.
- SAN population (DNS + IP).
- Bundle roundtrip with 0600 key perms.
- End-to-end mTLS handshake between two CA-issued peers.
- Cross-CA client rejection (handshake fails).
2026-04-18 07:09:58 -04:00
6657d3e097 feat(swarm): add SwarmHost and DeckyShard tables + repo CRUD
Introduces the master-side persistence layer for swarm mode:
- SwarmHost: enrolled worker metadata, cert fingerprint, heartbeat.
- DeckyShard: per-decky host assignment, state, last error.
Repo methods are added as default-raising on BaseRepository so unihost
deployments are untouched; SQLModelRepository implements them (shared
between the sqlite and mysql subclasses per the existing pattern).
2026-04-18 07:09:29 -04:00
293da364a6 chores: fix linting 2026-04-18 06:46:10 -04:00
d5e6ca1949 chore(gitignore): ignore sqlite WAL files and per-subsystem runtime logs
decnet.collector.log / decnet.system.log and the *.db-shm / *.db-wal
sidecars produced by the sqlite WAL journal were slipping through the
existing rules. Extend the patterns so runtime state doesn't show up
in git status.
2026-04-18 05:38:09 -04:00
a97696fa23 chore: add env.config.example documenting DECNET env vars
Reference template for .env / .env.local showing every variable that
decnet/env.py consumes, with short rationale per section (system
logging, embedded workers, profiling, API server, …). Copy to .env
and fill in secrets; .env itself stays gitignored.
2026-04-18 05:37:50 -04:00
7864c72948 test(ssh): add unit coverage for emit_capture RFC 5424 output
Exercises the JSON → syslog formatter end to end: flat fields ride as
SD params, bulky nested metadata collapses into the meta_json_b64 blob,
and the event_type / hostname / service mapping lands in the right
RFC 5424 header slots.
2026-04-18 05:37:42 -04:00
47a0480994 feat(web-ui): event-body parser and dashboard/live-logs polish
Frontend now handles syslog lines from producers that don't use
structured-data (notably the SSH PROMPT_COMMAND hook, which emits
'CMD uid=0 user=root src=IP pwd=… cmd=…' as a plain logger message).
A new parseEventBody utility splits the body into head + key/value
pairs and preserves the final value verbatim so commands stay intact.

Dashboard and LiveLogs use this parser to render consistent pills
whether the structure came from SD params or from the MSG body.
2026-04-18 05:37:31 -04:00
2bf886e18e feat(sniffer): probe ipvlan host iface when macvlan is absent
The host-side sniffer interface depends on the deploy's driver choice
(--ipvlan flag). Instead of hardcoding HOST_MACVLAN_IFACE, probe both
names and pick whichever exists; warn and disable cleanly if neither
is present. Explicit DECNET_SNIFFER_IFACE still wins.
2026-04-18 05:37:20 -04:00
8bdc5b98c9 feat(collector): parse real PROCID and extract IPs from logger kv pairs
- Relaxed RFC 5424 regex to accept either NILVALUE or a numeric PROCID;
  sshd / sudo go through rsyslog with their real PID, while
  syslog_bridge emitters keep using '-'.
- Added a fallback pass that scans the MSG body for IP-shaped
  key=value tokens. This rescues attacker attribution for plain logger
  callers like the SSH PROMPT_COMMAND shim, which emits
  'CMD … src=IP …' without SD-element params.
2026-04-18 05:37:08 -04:00
aa39be909a feat(templates): ship syslog_bridge.py to every service template
Each honeypot container now carries its own copy of the shared RFC 5424
formatter. Services that previously rolled their own ad-hoc syslog
lines can now import syslog_line / write_syslog_file for a consistent
SD-element format that the collector already knows how to parse.
2026-04-18 05:36:56 -04:00
41fd496128 feat(web): attacker artifacts endpoint + UI drawer
Adds the server-side wiring and frontend UI to surface files captured
by the SSH honeypot for a given attacker.

- New repository method get_attacker_artifacts (abstract + SQLModel
  impl) that joins the attacker's IP to `file_captured` log rows.
- New route GET /attackers/{uuid}/artifacts.
- New router /artifacts/{decky}/{service}/{stored_as} that streams a
  quarantined file back to an authenticated viewer.
- AttackerDetail grows an ArtifactDrawer panel with per-file metadata
  (sha256, size, orig_path) and a download action.
- ssh service fragment now sets NODE_NAME=decky_name so logs and the
  host-side artifacts bind-mount share the same decky identifier.
2026-04-18 05:36:48 -04:00
39dafaf384 feat(ssh-stealth): hide capture artifacts via XOR+gzip entrypoint blob
The /opt/emit_capture.py, /opt/syslog_bridge.py, and
/usr/libexec/udev/journal-relay files were plaintext and world-readable
to any attacker root-shelled into the SSH honeypot — revealing the full
capture logic on a single cat.

Pack all three into /entrypoint.sh as XOR+gzip+base64 blobs at build
time (_build_stealth.py), then decode in-memory at container start and
exec the capture loop from a bash -c string. No .py files under /opt,
no journal-relay file under /usr/libexec/udev, no argv_zap name
anywhere. The LD_PRELOAD shim is installed as
/usr/lib/x86_64-linux-gnu/libudev-shared.so.1 — sits next to the real
libudev.so.1 and blends into the multiarch layout.

A 1-byte random XOR key is chosen at image build so a bare
'base64 -d | gunzip' probe on the visible entrypoint returns binary
noise instead of readable Python.

Docker-dependent tests live under tests/docker/ behind a new 'docker'
pytest marker (excluded from the default run, same pattern as fuzz /
live / bench).
2026-04-18 05:34:50 -04:00
b0e00a6cc4 fix(ssh-capture): drop relay FIFO, rsyslog→/proc/1/fd/1 direct
The named pipe at /run/systemd/journal/syslog-relay had two problems
beyond its argv leak: any root-in-container process could (a) `cat`
the pipe and watch the live SIEM feed, and (b) write to it and inject
forged log lines. Since an attacker with a shell is already root
inside the honeypot, file permissions can't fix it.

Point rsyslog's auth/user actions directly at /proc/1/fd/1 — the
container-stdout fd Docker attached to PID 1 — and delete the
mkfifo + cat relay from the entrypoint. No pipe on disk, nothing to
read, nothing to inject, and one fewer cloaked process in `ps`.
2026-04-18 02:12:32 -04:00
2843aafa1a fix(ssh-capture): hide watcher bash argv and sanitize script header
Two leaks remained after the inotifywait argv fix:

1. The bash running journal-relay showed its argv[1] (the script path)
   in /proc/PID/cmdline, producing a line like
     'journal-relay /usr/libexec/udev/journal-relay'
   Apply argv_zap.so to that bash too.

2. argv_zap previously hardcoded PR_SET_NAME to 'kmsg-watch', which was
   wrong for any caller other than inotifywait. The comm name now comes
   from ARGV_ZAP_COMM so each caller can pick its own (kmsg-watch for
   inotifywait, journal-relay for the watcher bash).

3. The capture.sh header started with 'SSH honeypot file-catcher' —
   fatal if an attacker runs 'cat' on it. Rewritten as a plausible
   systemd-journal relay helper; stray 'attacker' / 'honeypot' words
   in mid-script comments stripped too.
2026-04-18 02:06:36 -04:00
766eeb3d83 feat(ssh): add ping/nmap/ca-certificates to base image
A lived-in Linux box ships with iputils-ping, ca-certificates, and nmap
available. Their absence is a cheap tell, and they're handy for letting
the attacker move laterally in ways we want to observe. iproute2 (ip a)
was already installed for attribution — noted here for completeness.
2026-04-18 01:53:33 -04:00
f462835373 feat(ssh-capture): LD_PRELOAD shim to zero inotifywait argv
The kmsg-watch (inotifywait) process was the last honest giveaway in
`ps aux` — its watch paths and event flags betrayed the honeypot. The
argv_zap.so shim hooks __libc_start_main, heap-copies argv for the real
main, then memsets the contiguous argv[1..] region to NUL so the kernel's
cmdline reader returns just argv[0].

gcc is installed and purged in the same Docker layer to keep the image
slim. The shim also calls prctl(PR_SET_NAME) so /proc/self/comm mirrors
the argv[0] disguise.
2026-04-18 01:52:30 -04:00
e356829234 fix(ssh-capture): drop bash token from journal-relay ps line
exec -a replaces argv[0] so ps shows 'journal-relay /usr/libexec/udev/journal-relay'
instead of '/bin/bash /usr/libexec/udev/journal-relay' — no interpreter
hint on the watcher process.
2026-04-18 01:45:38 -04:00
a5d6860124 fix(ssh-capture): collapse duplicate journal-relay bash in ps
inotify | while spawns a subshell for the tail of the pipeline, so
two bash processes (the script itself and the while-loop subshell)
showed up under /usr/libexec/udev/journal-relay in ps aux. Enable
lastpipe so the while loop runs in the main shell — ps now shows
one bash plus the inotify child, matching a simple udev helper.
2026-04-17 23:04:33 -04:00
8dd4c78b33 refactor: strip DECNET tokens from container-visible surface
Rename the container-side logging module decnet_logging → syslog_bridge
(canonical at templates/syslog_bridge.py, synced into each template by
the deployer). Drop the stale per-template copies; setuptools find was
picking them up anyway. Swap useradd/USER/chown "decnet" for "logrelay"
so no obvious token appears in the rendered container image.

Apply the same cloaking pattern to the telnet template that SSH got:
syslog pipe moves to /run/systemd/journal/syslog-relay and the relay
is cat'd via exec -a "systemd-journal-fwd". rsyslog.d conf rename
99-decnet.conf → 50-journal-forward.conf. SSH capture script:
/var/decnet/captured → /var/lib/systemd/coredump (real systemd path),
logger tag decnet-capture → systemd-journal. Compose volume updated
to match the new in-container quarantine path.

SD element ID shifts decnet@55555 → relay@55555; synced across
collector, parser, sniffer, prober, formatter, tests, and docs so the
host-side pipeline still matches what containers emit.
2026-04-17 22:57:53 -04:00
69510fb880 fix(ssh-capture): cloak syslog relay pipe and cat process
Rename the rsyslog→stdout pipe from /var/run/decnet-logs (dead giveaway)
to /run/systemd/journal/syslog-relay, and launch the relay via
exec -a "systemd-journal-fwd" so ps shows a plausible systemd forwarder
instead of a bare cat. Casual ps/ls inspection now shows nothing
with "decnet" in the name.
2026-04-17 22:51:34 -04:00
09d9f8595e fix(ssh-capture): disguise watcher as udev helper in ps output
Old ps output was a dead giveaway: two "decnet-capture" bash procs
and a raw "inotifywait". Install script at /usr/libexec/udev/journal-relay
and invoke inotifywait through a /usr/libexec/udev/kmsg-watch symlink so
both now render as plausible udev/journal helpers under casual inspection.
2026-04-17 22:44:47 -04:00
bfb3edbd4a fix(ssh-capture): add ss-only attribution fallback
fuser and /proc fd walks race scp/wget/sftp — by close_write the writer
has already closed the fd, so pid-chain attribution always resolved to
unknown for non-interactive drops. Fall back to the ss snapshot: one
established session → ss-only, multiple → ss-ambiguous (still record
src_ip from the first, analysts cross-check concurrent_sessions).
2026-04-17 22:36:06 -04:00
a773dddd5c feat(ssh): capture attacker-dropped files with session attribution
inotifywait watches writable paths in the SSH decky and mirrors any
file close_write/moved_to into a per-decky host-mounted quarantine dir.
Each artifact carries a .meta.json with attacker attribution resolved
by walking the writer PID's PPid chain to the sshd session leader,
then cross-referencing ss and utmp for source IP/user/login time.
Also emits an RFC 5424 syslog line per capture for SIEM correlation.
2026-04-17 22:20:05 -04:00
edc5c59f93 docs(profiles): archive locust run artifacts under development/profiles
Commit-by-commit evidence of the perf work: each CSV is the raw
Locust output for the commit hash in its filename, plus the four
fb69a06 variants (single worker, tracing on/off, single-core pinned,
12 workers) referenced in the README baseline table.
2026-04-17 22:05:35 -04:00
1f758a3669 chore(profile): tolerate null/empty frames in walk_self_time
Some pyinstrument frame trees contain branches where an identifier is
missing (typically at the very top or with certain async boundaries),
which crashed the aggregator with a KeyError mid-run. Short-circuit
on None frames and missing identifiers so a single ugly HTML no
longer kills the summary of the other few hundred.
2026-04-17 22:04:29 -04:00
6c22f9ba59 fix(deps): add cryptography for asyncmy MySQL auth
asyncmy needs cryptography for caching_sha2_password (the MySQL 8
default auth plugin). Without it, connection handshake fails the
moment the server negotiates the modern plugin.
2026-04-17 22:04:24 -04:00
20fa1f9a63 docs: record single-worker / multi-worker perf baseline
Capture Locust numbers from the fb69a06 branch across five
configurations so future regressions have something to measure against.

- 500u tracing-on single-worker: ~960 RPS / p99 2.9 s
- 1500u tracing-on single-worker: ~880 RPS / p99 9.5 s
- 1500u tracing-off single-worker: ~990 RPS / p99 8.4 s
- 1500u tracing-off pinned to one core: ~46 RPS / p99 122 s
- 1500u tracing-off 12 workers: ~1585 RPS / p99 4.2 s

Also note MySQL max_connections math (pool_size * max_overflow *
workers = 720) to explain why the default 151 needs bumping, and the
Python 3.14 GC segfault so nobody repeats that mistake.
2026-04-17 22:03:50 -04:00
fb69a06ab3 fix(db): detach session cleanup onto fresh task on cancellation
Previous attempt (shield + sync invalidate fallback) didn't work
because shield only protects against cancellation from *other* tasks.
When the caller task itself is cancelled mid-query, its next await
re-raises CancelledError as soon as the shielded coroutine yields —
rollback inside session.close() never completes, the aiomysql
connection is orphaned, and the pool logs 'non-checked-in connection'
when GC finally reaches it.

Hand exception-path cleanup to loop.create_task() so the new task
isn't subject to the caller's pending cancellation. close() (and the
invalidate() fallback for a dead connection) runs to completion.
Success path is unchanged — still awaits close() inline so callers
see commit visibility and pool release before proceeding.
2026-04-17 21:13:43 -04:00
1446f6da94 fix(db): invalidate pool connection when cancelled close fails
Under high-concurrency MySQL load, uvicorn cancels request tasks when
clients disconnect.  If cancellation lands mid-query, session.close()
tries to ROLLBACK on a connection that aiomysql has already marked as
closed — raising InterfaceError("Cancelled during execution") and
leaving the connection checked-out until GC, which the pool then
warns about as a 'non-checked-in connection'.

The old fallback tried sync.rollback() + sync.close(), but those still
go through the async driver and fail the same way on a dead connection.
Replace them with session.sync_session.invalidate(), which just flips
the pool's internal record — no I/O, so it can't be cancelled — and
tells the pool to drop the connection immediately instead of waiting
for garbage collection.
2026-04-17 21:04:04 -04:00
e967aaabfb perf: cache get_user_by_username on the login hot path
Locust @task(2) hammers /auth/login in steady state on top of the
on_start burst. After caching the uuid-keyed user lookup and every
other read endpoint, login alone accounted for 47% of total
_execute at 500c/u — pure DB queueing on SELECT users WHERE
username=?.

5s TTL, positive hits only (misses bypass so a freshly-created
user can log in immediately). Password verify still runs against
the cached hash, so security is unchanged — the only staleness
window is: a changed password accepts the old password for up to
5s until invalidate_user_cache fires (it's called on every write).
2026-04-17 20:36:39 -04:00
255c2e5eb7 perf: cache auth user-lookup and admin list_users
The per-request SELECT users WHERE uuid=? in require_role was the
hidden tax behind every authed endpoint — it kept _execute at ~60%
across the profile even after the page caches landed. Even /health
(with its DB and Docker probes cached) was still 52% _execute from
this one query.

- dependencies.py: 10s TTL cache on get_user_by_uuid, well below JWT
  expiry. invalidate_user_cache(uuid) is called on password change,
  role change, and user delete.
- api_get_config.py: 5s TTL cache on the admin branch's list_users()
  (previously fetched every /config call). Invalidated on user
  create/update/delete.
- api_change_pass.py + api_manage_users.py: invalidation hooks on
  all user-mutating endpoints.
2026-04-17 19:56:39 -04:00
2dd86fb3bb perf: cache /bounty, /logs/histogram, /deckies; bump /config TTL to 5s
Round-2 follow-up: profile at 500c/u showed _execute still dominating
the uncached read endpoints (/bounty 76%, /logs/histogram 73%,
/deckies 56%). Same router-level TTL pattern as /stats — 5s window,
asyncio.Lock to collapse concurrent calls into one DB hit.

- /bounty: cache default unfiltered page (limit=50, offset=0,
  bounty_type=None, search=None). Filtered requests bypass.
- /logs/histogram: cache default (interval_minutes=15, no filters).
  Filtered / non-default interval requests bypass.
- /deckies: cache full response (endpoint takes no params).
- /config: bump _STATE_TTL from 1.0 to 5.0 — admin writes are rare,
  1s was too short for bursts to coalesce at high concurrency.
2026-04-17 19:30:11 -04:00
3106d03135 perf(db): default pool_pre_ping=false for SQLite
SQLite is a local file — a SELECT 1 per session checkout is pure
overhead. Env var DECNET_DB_POOL_PRE_PING stays for anyone running
on a network-mounted volume. MySQL backend keeps its current default.
2026-04-17 19:11:07 -04:00
3cc5ba36e8 fix(cli): keep FileNotFoundError handling on decnet api
Popen moved inside the try so a missing uvicorn falls through to the
existing error message instead of crashing the CLI. test_cli was still
patching the old subprocess.run entrypoint; switched both api command
tests to patch subprocess.Popen / os.killpg to match the current path.
2026-04-17 19:09:15 -04:00
6301504c0e perf(api): TTL-cache /stats + unfiltered pagination counts
Every /stats call ran SELECT count(*) FROM logs + SELECT count(DISTINCT
attacker_ip) FROM logs; every /logs and /attackers call ran an
unfiltered count for the paginator. At 500 concurrent users these
serialize through aiosqlite's worker threads and dominate wall time.

Cache at the router layer (repo stays dialect-agnostic):
  - /stats response: 5s TTL
  - /logs total (only when no filters): 2s TTL
  - /attackers total (only when no filters): 2s TTL

Filtered paths bypass the cache. Pattern reused from api_get_config
and api_get_health (asyncio.Lock + time.monotonic window + lazy lock).
2026-04-17 19:09:15 -04:00
de4b64d857 perf(auth): avoid duplicate user lookup in require_role
require_role._check previously chained from get_current_user, which
already loaded the user — then looked it up again. Inline the decode +
single user fetch + must_change_password + role check so every
authenticated request costs one SELECT users WHERE uuid=? instead of
two.
2026-04-17 17:48:42 -04:00
b5d7bf818f feat(health): 3-tier status (healthy / degraded / unhealthy)
Only database, docker, and ingestion_worker now count as critical
(→ 503 unhealthy). attacker/sniffer/collector failures drop overall
status to degraded (still 200) so the dashboard doesn't panic when a
non-essential worker isn't running.
2026-04-17 17:48:42 -04:00
257f780d0f docs(bugs): document SSE /api/v1/stream BrokenPipe storm (BUG-003) 2026-04-17 17:48:42 -04:00
a10aee282f perf(ingester): batch log writes into bulk commits
The ingester now accumulates up to DECNET_BATCH_SIZE rows (default 100)
or DECNET_BATCH_MAX_WAIT_MS (default 250ms) before flushing through
repo.add_logs — one transaction, one COMMIT per batch instead of per
row. Under attacker traffic this collapses N commits into ⌈N/100⌉ and
takes most of the SQLite writer-lock contention off the hot path.

Flush semantics are cancel-safe: _position only advances after a batch
commits successfully, and the flush helper bails without touching the
DB if the enclosing task is being cancelled (lifespan teardown).
Un-flushed lines stay in the file and are re-read on next startup.

Tests updated to assert on add_logs (bulk) instead of the per-row
add_log that the ingester no longer uses, plus a new test that 250
lines flush in ≤5 calls.
2026-04-17 16:37:34 -04:00
11b9e85874 feat(db): bulk add_logs for one-commit ingestion batches
Adds BaseRepository.add_logs (default: loops add_log for backwards
compatibility) and a real single-session/single-commit implementation
on SQLModelRepository. Introduces DECNET_BATCH_SIZE (default 100) and
DECNET_BATCH_MAX_WAIT_MS (default 250) so the ingester can flush on
either a size or a time bound when it adopts the new method.

Ingester wiring is deferred to a later pass — the single-log path was
deadlocking tests when flushed during lifespan teardown, so this change
ships the DB primitive alone.
2026-04-17 16:23:09 -04:00
45039bd621 fix(cache): lazy-init TTL cache locks to survive event-loop turnover
A module-level asyncio.Lock binds to the loop it was first awaited on.
Under pytest-anyio (and xdist) each test spins up a new loop; any later
test that hit /health or /config would wait on a lock owned by a dead
loop and the whole worker would hang.

Create the lock on first use and drop it in the test-reset helpers so a
fresh loop always gets a fresh lock.
2026-04-17 16:23:00 -04:00
4ea1c2ff4f fix(health): move Docker client+ping off the event loop
Under CPU saturation the sync docker.from_env()/ping() calls could miss
their socket timeout, cache _docker_healthy=False, and return 503 for
the full 5s TTL window. Both calls now run on a thread so the event
loop keeps serving other requests while Docker is being probed.
2026-04-17 15:43:51 -04:00
bb8d782e42 fix(cli): kill uvicorn worker tree on Ctrl+C
With --workers > 1, SIGINT from the terminal raced uvicorn's supervisor:
some workers got signaled directly, the supervisor respawned them, and
the result behaved like a forkbomb. Start uvicorn in its own session and
signal the whole process group (SIGTERM → 10s grace → SIGKILL) when we
catch KeyboardInterrupt.
2026-04-17 15:32:08 -04:00
342916ca63 feat(cli): expose --workers on decnet api
Forwards straight to uvicorn's --workers. Default stays at 1 so the
single-worker efficiency direction is preserved; raising it is available
for threat-actor load scenarios where the honeypot needs to soak real
attack traffic without queueing on one event loop.
2026-04-17 15:22:45 -04:00
d3f4bbb62b perf(locust): skip change-password in on_start when not required
Previously every user did login → change-pass → re-login in on_start
regardless of whether the server actually required a password change.
With bcrypt at ~250ms/call that's 3 bcrypt-bound requests per user.
At 2500 users the on_start queue was ~10k bcrypt ops — users never
escaped warmup, so @task endpoints never fired.

Login already returns must_change_password; only run the change-pass
+ re-login dance when the server says we have to. Cuts on_start from
3 requests to 1 for every user after the first DB initialization.
2026-04-17 15:15:59 -04:00
32340bea0d perf: migrate hot-path JSON serialization to orjson
stdlib json was FastAPI's default. Every response body, every SSE frame,
and every add_log/state/payload write paid the stdlib encode cost.

- pyproject.toml: add orjson>=3.10 as a core dep.
- decnet/web/api.py: default_response_class=ORJSONResponse on the
  FastAPI app, so every endpoint return goes through orjson without
  touching call sites. Explicit JSONResponse sites in the validation
  exception handlers migrated to ORJSONResponse for consistency.
- health endpoint's explicit JSONResponse → ORJSONResponse.
- SSE stream (api_stream_events.py): 6 json.dumps call sites →
  orjson.dumps(...).decode() — the per-event frames that fire on every
  sse tick.
- sqlmodel_repo.py: encode sites on the log-insert path switched to
  orjson (fields, payload, state value). Parser sites (json.loads)
  left as-is for now — not on the measured hot path.
2026-04-17 15:07:28 -04:00
f1e14280c0 perf: 1s TTL cache for /health DB probe and /config state reads
Locust hit /health and /config on every @task(3), so each request was
firing repo.get_total_logs() and two repo.get_state() calls against
aiosqlite — filling the driver queue for data that changes on the order
of seconds, not milliseconds.

Both caches follow the shape already used by the existing Docker cache:
- asyncio.Lock with double-checked TTL so concurrent callers collapse
  into one DB hit per 1s window.
- _reset_* helpers called from tests/api/conftest.py::setup_db so the
  module-level cache can't leak across tests.

tests/test_health_config_cache.py asserts 50 concurrent callers
produce exactly 1 repo call, and the cache expires after TTL.
2026-04-17 15:05:18 -04:00
931f33fb06 perf: cache Docker daemon ping in /health (5s TTL)
Creating a new docker.from_env() client per /health request opened a
fresh unix-socket connection each time. Under load that's wasteful and
hammers dockerd.

Keep a module-level client + last-check timestamp; actually ping every
5 seconds, return cached state in between. Reset helper provided for
tests.
2026-04-17 15:01:53 -04:00
467511e997 db: switch MySQL driver to asyncmy, env-tune pool, serialize DDL
- aiomysql → asyncmy on both sides of the URL/import (faster, maintained).
- Pool sizing now reads DECNET_DB_POOL_SIZE / MAX_OVERFLOW / RECYCLE /
  PRE_PING for both SQLite and MySQL engines so stress runs can bump
  without code edits.
- MySQL initialize() now wraps schema DDL in a GET_LOCK advisory lock so
  concurrent uvicorn workers racing create_all() don't hit 'Table was
  skipped since its definition is being modified by concurrent DDL'.
- sqlite & mysql repo get_log_histogram use the shared _session() helper
  instead of session_factory() for consistency with the rest of the repo.
- SSE stream_events docstring updated to asyncmy.
2026-04-17 15:01:49 -04:00
3945e72e11 perf: run bcrypt on a thread so it doesn't block the event loop
verify_password / get_password_hash are CPU-bound and take ~250ms each
at rounds=12. Called directly from async endpoints, they stall every
other coroutine for that window — the single biggest single-worker
bottleneck on the login path.

Adds averify_password / ahash_password that wrap the sync versions in
asyncio.to_thread. Sync versions stay put because _ensure_admin_user and
tests still use them.

5 call sites updated: login, change-password, create-user, reset-password.
tests/test_auth_async.py asserts parallel averify runs concurrently (~1x
of a single verify, not 2x).
2026-04-17 14:52:22 -04:00
bd406090a7 fix: re-seed admin password when still unfinalized (must_change_password=True)
_ensure_admin_user was strict insert-if-missing: once a stale hash landed
in decnet.db (e.g. from a deploy that used a different DECNET_ADMIN_PASSWORD),
login silently 401'd because changing the env var later had no effect.

Now on startup: if the admin still has must_change_password=True (they
never finalized their own password), re-sync the hash from the current
env var. Once the admin sets a real password, we leave it alone.

Found via locustfile.py login storm — see tests/test_admin_seed.py.

Note: this commit also bundles uncommitted pool-management work already
present in sqlmodel_repo.py from prior sessions.
2026-04-17 14:49:13 -04:00
e22d057e68 added: scripts/profile/aggregate_requests.py — roll up pyinstrument request profiles
Parses every HTML in profiles/, reattributes [self]/[await] synthetic
leaves to their parent function, and reports per-endpoint wall-time
(mean/p50/p95/max) plus top hot functions by cumulative self-time.

Makes post-locust profile dirs actually readable — otherwise they're
just a pile of hundred-plus HTML files.
2026-04-17 14:48:59 -04:00
cb12e7c475 fix: logging handler must not crash its caller on reopen failure
When decnet.system.log is root-owned (e.g. created by a pre-fix 'sudo
decnet deploy') and a subsequent non-root process tries to log, the
InodeAwareRotatingFileHandler raised PermissionError out of emit(),
which propagated up through logger.debug/info and killed the collector's
log stream loop ('log stream ended ... reason=[Errno 13]').

Now matches stdlib behaviour: wrap _open() in try/except OSError and
defer to handleError() on failure. Adds a regression test.

Also: scripts/profile/view.sh 'pyinstrument' keyword was matching
memray-flamegraph-*.html files. Exclude the memray-* prefix.
2026-04-17 14:01:36 -04:00
c29ca977fd added: scripts/profile/classify_usage.py — classify memray usage_over_time.csv
Reads the memray usage CSV and emits a verdict based on tail-drop-from-
peak: CLIMB-AND-DROP, MOSTLY-RELEASED, or SUSTAINED-AT-PEAK. Deliberately
ignores net-growth-vs-baseline since any active workload grows vs. a cold
interpreter — that metric is misleading as a leak signal.
2026-04-17 13:54:37 -04:00
bf4afac70f fix: RotatingFileHandler reopens on external deletion/rotation
Mirrors the inode-check fix from 935a9a5 (collector worker) for the
stdlib-handler-based log paths. Both decnet.system.log (config.py) and
decnet.log (logging/file_handler.py) now use a subclass that stats the
target path before each emit and reopens on inode/device mismatch —
matching the behavior of stdlib WatchedFileHandler while preserving
size-based rotation.

Previously: rm decnet.system.log → handler kept writing to the orphaned
inode until maxBytes triggered; all lines between were lost.
2026-04-17 13:42:15 -04:00
4b15b7eb35 fix: chown log files to sudo-invoking user so non-root API can append
'sudo decnet deploy' needs root for MACVLAN, but the log files it creates
(decnet.log and decnet.system.log) end up owned by root. A subsequent
non-root 'decnet api' then crashes on PermissionError appending to them.

New decnet.privdrop helper reads SUDO_UID/SUDO_GID and chowns files/dirs
back to the invoking user. Best-effort: no-op when not root, not under
sudo, path missing, or chown fails. Applied at both log-file creation
sites (config.py system log, logging/file_handler.py syslog file).
2026-04-17 13:39:09 -04:00
140d2fbaad fix: gate embedded sniffer behind DECNET_EMBED_SNIFFER (default off)
The API's lifespan unconditionally spawned a MACVLAN sniffer task, which
duplicated the standalone 'decnet sniffer --daemon' process that
'decnet deploy' always starts — causing two workers to sniff the same
interface, double events, and wasted CPU.

Mirror the existing DECNET_EMBED_PROFILER pattern: sniffer is OFF by
default, opt in explicitly. Static regression tests guard against
accidental removal of the gate.
2026-04-17 13:35:43 -04:00
064c8760b6 fix: memray run needs --trace-python-allocators for frame attribution
Without it, 'Total number of frames seen: 0' in memray stats and flamegraphs
render empty / C-only. Also added --follow-fork so uvicorn workers spawned
as child processes are tracked.
2026-04-17 13:24:55 -04:00
6572c5cbaf added: scripts/profile/view.sh — auto-pick newest artifact and open viewer
Dispatches by extension: .prof -> snakeviz, memray .bin -> memray flamegraph
(overridable via VIEW=table|tree|stats|summary|leaks), .svg/.html -> xdg-open.
Positional arg can be a file path or a type keyword (cprofile, memray, pyspy,
pyinstrument).
2026-04-17 13:20:05 -04:00
ba448bae13 docs: py-spy 0.4.1 lacks Python 3.14 support; wrapper aborts early
Root cause of 'No python processes found in process <pid>': py-spy needs
per-release ABI knowledge and 0.4.1 (latest PyPI) predates 3.14. Wrapper
now detects the interpreter and points users at pyinstrument/memray/cProfile.
2026-04-17 13:17:23 -04:00
1a18377b0a fix: mysql url builder tests expect asyncmy, not aiomysql
The builder in decnet/web/db/mysql/database.py emits 'mysql+asyncmy://' URLs
(asyncmy is the declared dep in pyproject.toml). Tests were stale from a
prior aiomysql era.
2026-04-17 13:13:36 -04:00
319c1dbb61 added: profiling toolchain (py-spy, pyinstrument, pytest-benchmark, memray, snakeviz)
New `profile` optional-deps group, opt-in Pyinstrument ASGI middleware
gated by DECNET_PROFILE_REQUESTS, bench marker + tests/perf/ micro-benchmarks
for repository hot paths, and scripts/profile/ helpers for py-spy/cProfile/memray.
2026-04-17 13:13:00 -04:00
c1d8102253 modified: DEVELOPMENT roadmap. one step closer to v1 2026-04-16 11:39:07 -04:00
49f3002c94 added: docs; modified: .gitignore
Some checks failed
CI / Lint (ruff) (push) Successful in 18s
CI / SAST (bandit) (push) Successful in 19s
CI / Dependency audit (pip-audit) (push) Successful in 40s
CI / Test (Standard) (3.11) (push) Successful in 2m38s
CI / Test (Standard) (3.12) (push) Successful in 2m56s
CI / Test (Live) (3.11) (push) Failing after 1m3s
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-16 02:10:38 -04:00
9b59f8672e chores: cleanup; added: viteconfig 2026-04-16 02:09:30 -04:00
296979003d fix: pytest -m live works without extra flags
Root cause: test_schemathesis.py mutates decnet.web.auth.SECRET_KEY at
module-level import time, poisoning JWT verification for all other tests
in the same process — even when fuzz tests are deselected.

- Add pytest_ignore_collect hook in tests/api/conftest.py to skip
  collecting test_schemathesis.py unless -m fuzz is selected
- Add --dist loadscope to addopts so xdist groups by module (protects
  module-scoped fixtures in live tests)
- Remove now-unnecessary xdist_group markers from live test classes
2026-04-16 01:55:38 -04:00
89099b903d fix: resolve schemathesis and live test failures
- Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode)
- Add 400 response to all endpoints accepting JSON bodies (malformed input)
- Add required 'title' field to schemathesis.toml for schemathesis 4.15+
- Add xdist_group markers to live tests with module-scoped fixtures to
  prevent xdist from distributing them across workers (fixture isolation)
2026-04-16 01:39:04 -04:00
29578d9d99 fix: resolve all ruff and bandit lint/security issues
- Remove unused Optional import (F401) in telemetry.py
- Move imports above module-level code (E402) in web/db/models.py
- Default API/web hosts to 127.0.0.1 instead of 0.0.0.0 (B104)
- Add usedforsecurity=False to MD5 calls in JA3/HASSH fingerprinting (B324)
- Annotate intentional try/except/pass blocks with nosec (B110)
- Remove stale nosec comments that no longer suppress anything
2026-04-16 01:04:57 -04:00
70d8ffc607 feat: complete OTEL tracing across all services with pipeline bridge and docs
Extends tracing to every remaining module: all 23 API route handlers,
correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp),
profiler behavioral analysis, logging subsystem, engine, and mutator.

Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on
the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace
correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords.

Includes development/docs/TRACING.md with full span reference (76 spans),
pipeline propagation architecture, quick start guide, and troubleshooting.
2026-04-16 00:58:08 -04:00
04db13afae feat: cross-stage trace propagation and granular per-event spans
Collector now creates a span per event and injects W3C trace context
into JSON records. Ingester extracts that context and creates child
spans, connecting the full event journey: collector -> ingester ->
db.add_log + extract_bounty -> db.add_bounty.

Profiler now creates per-IP spans inside update_profiles with rich
attributes (event_count, is_traversal, bounty_count, command_count).

Traces in Jaeger now show the complete execution map from capture
through ingestion and profiling.
2026-04-15 23:52:13 -04:00
d1a88e75bd fix: dynamic TracedRepository proxy + disable tracing in test suite
Replace brittle explicit method-by-method proxy with __getattr__-based
dynamic proxy that forwards all args/kwargs to the inner repo. Fixes
TypeError on get_logs_after_id() where concrete repo accepts extra
kwargs beyond the ABC signature.

Pin DECNET_DEVELOPER_TRACING=false in conftest.py so .env.local
settings don't leak into the test suite.
2026-04-15 23:46:46 -04:00
65ddb0b359 feat: add OpenTelemetry distributed tracing across all DECNET services
Gated by DECNET_DEVELOPER_TRACING env var (default off, zero overhead).
When enabled, traces flow through FastAPI routes, background workers
(collector, ingester, profiler, sniffer, prober), engine/mutator
operations, and all DB calls via TracedRepository proxy.

Includes Jaeger docker-compose for local dev and 18 unit tests.
2026-04-15 23:23:13 -04:00
b437bc8eec fix: use unbuffered reads in proxy for SSE streaming
resp.read(4096) blocks until 4096 bytes accumulate, which stalls SSE
events (~100-500 bytes each) in the proxy buffer indefinitely. Switch
to read1() which returns bytes immediately available without waiting
for more. Also disable the 120s socket timeout for SSE connections.
2026-04-15 23:03:03 -04:00
a1ca5d699b fix: use dedicated thread pools for collector and sniffer workers
The collector spawned one permanent thread per Docker container via
asyncio.to_thread(), saturating the default asyncio executor. This
starved short-lived to_thread(load_state) calls in get_deckies() and
get_stats_summary(), causing the SSE stream and deckies endpoints to
hang indefinitely while other DB-only endpoints worked fine.

Give the collector and sniffer their own ThreadPoolExecutor so they
never compete with the default pool.
2026-04-15 22:57:03 -04:00
e9d151734d feat: deduplicate bounties on (bounty_type, attacker_ip, payload)
Before inserting a bounty, check whether an identical row already exists.
Drops silent duplicates to prevent DB saturation from aggressive scanners.
2026-04-15 18:02:52 -04:00
0ab97d0ade docs: document decnet domain models and fleet transformation 2026-04-15 18:01:27 -04:00
60de16be84 docs: document decnet collector worker 2026-04-15 17:56:24 -04:00
82ec7f3117 fix: gate embedded profiler behind DECNET_EMBED_PROFILER to prevent dual-instance cursor conflict
decnet deploy spawns a standalone profiler daemon AND api.py was also starting
attacker_profile_worker as an asyncio task inside the web server. Both instances
shared the same attacker_worker_cursor key in the state table, causing a race
where one instance could skip events already claimed by the other or overwrite
the cursor mid-batch.

Default is now OFF (embedded profiler disabled). The standalone daemon started
by decnet deploy is the single authoritative instance. Set DECNET_EMBED_PROFILER=true
only when running decnet api in isolation without a full deploy.
2026-04-15 17:49:18 -04:00
11d749f13d fix: wire prober tcpfp_fingerprint events into sniffer_rollup for OS/hop detection
The active prober emits tcpfp_fingerprint events with TTL, window, MSS etc.
from the attacker's SYN-ACK. These were invisible to the behavioral profiler
for two reasons:

1. target_ip (prober's field name for attacker IP) was not in _IP_FIELDS in
   collector/worker.py or correlation/parser.py, so the profiler re-parsed
   raw_lines and got attacker_ip=None, never attributing prober events to
   the attacker profile.

2. sniffer_rollup only handled tcp_syn_fingerprint (passive sniffer) and
   ignored tcpfp_fingerprint (active prober). Prober events use different
   field names: window_size/window_scale/sack_ok vs window/wscale/has_sack.

Changes:
- Add target_ip to _IP_FIELDS in collector and parser
- Add _PROBER_TCPFP_EVENT and _INITIAL_TTL table to behavioral.py
- sniffer_rollup now processes tcpfp_fingerprint: maps field names, derives
  OS from TTL via _os_from_ttl, computes hop_distance = initial_ttl - observed
- Expand prober DEFAULT_TCPFP_PORTS to [22,80,443,8080,8443,445,3389] for
  better SYN-ACK coverage on attacker machines
- Add 4 tests covering prober OS detection, hop distance, and field mapping
2026-04-15 17:36:40 -04:00
a4798946c1 fix: add remote_addr to IP field lookup so http/https/k8s events are attributed correctly
Templates for http, https, k8s, and docker_api log the client IP as
remote_addr (Flask's request.remote_addr) instead of src_ip. The collector
and correlation parser only checked src_ip/src/client_ip/remote_ip/ip, so
every request event from those services was stored with attacker_ip="Unknown"
and never associated with any attacker profile.

Adding remote_addr to _IP_FIELDS in both collector/worker.py and
correlation/parser.py fixes attribution. The profiler cursor was also reset
to 0 so the worker performs a cold rebuild and re-ingests existing events with
the corrected field mapping.
2026-04-15 17:23:33 -04:00
d869eb3d23 docs: document decnet engine orchestrator 2026-04-15 17:13:13 -04:00
89887ec6fd fix: serialize HTTP headers as JSON so tool detection and bounty extraction work
templates/decnet_logging.py calls str(v) on all SD-PARAM values, turning a
headers dict into Python repr ('{'User-Agent': ...}') rather than JSON.
detect_tools_from_headers() called json.loads() on that string and silently
swallowed the error, returning [] for every HTTP event. Same bug prevented
the ingester from extracting User-Agent bounty fingerprints.

- templates/http/server.py: wrap headers dict in json.dumps() before passing
  to syslog_line so the value is a valid JSON string in the syslog record
- behavioral.py: add ast.literal_eval fallback for existing DB rows that were
  stored with the old Python repr format
- ingester.py: parse headers as JSON string in _extract_bounty so User-Agent
  fingerprints are stored correctly going forward
- tests: add test_json_string_headers and test_python_repr_headers_fallback
  to exercise both formats in detect_tools_from_headers
2026-04-15 17:03:52 -04:00
02e73a19d5 fix: promote TCP-fingerprinted nmap to tool_guesses (detects -sC sans HTTP) 2026-04-15 16:44:45 -04:00
b3efd646f6 feat: replace tool attribution stat with dedicated DETECTED TOOLS block 2026-04-15 16:37:54 -04:00
2ec64ef2ef fix: rename BEHAVIOR label to ATTACK PATTERN for clarity 2026-04-15 16:36:19 -04:00
e67624452e feat: centralize microservice logging to DECNET_SYSTEM_LOGS (default: decnet.system.log) 2026-04-15 16:23:28 -04:00
e05b632e56 feat: update AttackerDetail UI for new behavior classes and multi-tool badges 2026-04-15 15:49:03 -04:00
c8f05df4d9 feat: overhaul behavioral profiler — multi-tool detection, improved classification, TTL OS fallback 2026-04-15 15:47:02 -04:00
935a9a58d2 fix: reopen collector log handles after deletion or log rotation
Replaces the single persistent open() with inode-based reopen logic.
If decnet.log or decnet.json is deleted or renamed by logrotate, the
next write detects the stale inode, closes the old handle, and creates
a fresh file — preventing silent data loss to orphaned inodes.
2026-04-15 14:04:54 -04:00
63efe6c7ba fix: persist ingester position and profiler cursor across restarts
- Ingester now loads byte-offset from DB on startup (key: ingest_worker_position)
  and saves it after each batch — prevents full re-read on every API restart
- On file truncation/rotation the saved offset is reset to 0
- Profiler worker now loads last_log_id from DB on startup — every restart
  becomes an incremental update instead of a full cold rebuild
- Updated all affected tests to mock get_state/set_state; added new tests
  covering position restore, set_state call, truncation reset, and cursor
  restore/cold-start paths
2026-04-15 13:58:12 -04:00
314e6c6388 fix: remove event-loop-blocking cold start; unify profiler to cursor-based incremental
Cold start fetched all logs in one bulk query then processed them in a tight
synchronous loop with no yields, blocking the asyncio event loop for seconds
on datasets of 30K+ rows. This stalled every concurrent await — including the
SSE stream generator's initial DB calls — causing the dashboard to show
INITIALIZING SENSORS indefinitely.

Changes:
- Drop _cold_start() and get_all_logs_raw(); uninitialized state now runs the
  same cursor loop as incremental, starting from last_log_id=0
- Yield to the event loop after every _BATCH_SIZE rows (asyncio.sleep(0))
- Add SSE keepalive comment as first yield so the connection flushes before
  any DB work begins
- Add Cache-Control/X-Accel-Buffering headers to StreamingResponse
2026-04-15 13:46:42 -04:00
12aa98a83c fix: migrate TEXT→MEDIUMTEXT for attacker/state columns on MySQL
Existing MySQL databases hit a DataError when the commands/fingerprints
JSON blobs exceed 64 KiB (TEXT limit). _BIG_TEXT emits MEDIUMTEXT only
at CREATE TABLE time; create_all() is a no-op on existing columns.

Add MySQLRepository._migrate_column_types() that queries
information_schema and issues ALTER TABLE … MODIFY COLUMN … MEDIUMTEXT
for the five affected columns (commands, fingerprints, services, deckies,
state.value) whenever they are still TEXT. Called from an overridden
initialize() after _migrate_attackers_table() and before create_all().

Add tests/test_mysql_migration.py covering: ALTER issued for TEXT columns,
no-op for already-MEDIUMTEXT, idempotency, DEFAULT clause correctness,
and initialize() call order.
2026-04-15 12:59:54 -04:00
7dbc71d664 test: add profiler behavioral analysis and RBAC endpoint tests
- test_profiler_behavioral.py: attacker behavior pattern matching tests
- api/test_rbac.py: comprehensive RBAC role separation tests
- api/config/: configuration API endpoint tests (CRUD, reinit, user management)
2026-04-15 12:51:38 -04:00
dae3687089 test: add fingerprinting and TCP analysis tests
- test_sniffer_p0f.py: p0f passive OS fingerprinting tests
- test_sniffer_tcp_fingerprint.py: TCP fingerprinting accuracy tests
- test_sniffer_retransmit.py: retransmission detection and analysis
2026-04-15 12:51:35 -04:00
187194786f test: add MySQL backend integration tests
- test_mysql_backend_live.py: live integration tests for MySQL connections
- test_mysql_histogram_sql.py: dialect-specific histogram query tests
- test_mysql_url_builder.py: MySQL connection string construction
- mysql_spinup.sh: Docker spinup script for local MySQL testing
2026-04-15 12:51:33 -04:00
9de320421e test: add repository factory and CLI db-reset tests
- test_factory.py: verify database factory selects correct backend
- test_cli_db_reset.py: test CLI database reset functionality
2026-04-15 12:51:29 -04:00
dd4e2aad91 test: update existing test suites for refactored codebase
- test_api_attackers.py: update for BaseRepository interface
- test_attacker_worker.py: full test suite for worker logic (formerly in module)
- test_base_repo.py: repository interface conformance tests
- test_cli.py: CLI enhancements (randomize-services, selective deployment)
- test_service_isolation.py: isolation validation tests
- api/conftest.py: fixture updates for RBAC-gated endpoints
- live/test_service_isolation_live.py: live integration tests
2026-04-15 12:51:26 -04:00
7d10b78d50 chore: update templates and development documentation
- templates/sniffer/decnet_logging.py: add logging configuration for sniffer integration
- templates/ssh/decnet_logging.py: add SSH service logging template
- development/DEVELOPMENT.md: document new MySQL backend, p0f, profiler, config API features
- pyproject.toml: update dependencies for MySQL, p0f, profiler functionality
2026-04-15 12:51:22 -04:00
ddfb232590 feat: add behavioral profiler for attacker pattern analysis
- decnet/profiler/: analyze attacker behavior timings, command sequences, service probing patterns
- Enables detection of coordinated attacks vs random scanning
- Feeds into attacker scoring and risk assessment
2026-04-15 12:51:19 -04:00
d7da3a7fc7 feat: add advanced OS fingerprinting via p0f integration
- decnet/sniffer/fingerprint.py: enhance TCP/IP fingerprinting pipeline
- decnet/sniffer/p0f.py: integrate p0f for passive OS classification
- Improves attacker profiling accuracy in honeypot interaction analysis
2026-04-15 12:51:17 -04:00
947efe7bd1 feat: add configuration management API endpoints
- api_get_config.py: retrieve current DECNET config (admin only)
- api_update_config.py: modify deployment settings (admin only)
- api_manage_users.py: user/role management (admin only)
- api_reinit.py: reinitialize database schema (admin only)
- Integrated with centralized RBAC per memory: server-side UI gating
2026-04-15 12:51:14 -04:00
c603531fd2 feat: add MySQL backend support for DECNET database
- Implement MySQLRepository extending BaseRepository
- Add SQLAlchemy/SQLModel ORM abstraction layer (sqlmodel_repo.py)
- Support connection pooling and tuning via DECNET_DB_URL env var
- Cross-compatible with SQLite backend via factory pattern
- Prepared for production deployment with MySQL SIEM/ELK integration
2026-04-15 12:51:11 -04:00
a78126b1ba feat: enhance UI components with config management and RBAC gating
- Add Config.tsx component for admin configuration management
- Update AttackerDetail, DeckyFleet components to use server-side RBAC gating
- Remove client-side role checks per memory: server-side UI gating is mandatory
- Add Config.css for configuration UI styling
2026-04-15 12:51:08 -04:00
0ee23b8700 refactor: enforce RBAC decorators on all API endpoints
- Add @require_role() decorators to all GET/POST/PUT endpoints
- Centralize role-based access control per memory: RBAC null-role bug required server-side gating
- Admin (manage_admins), Editor (write ops), Viewer (read ops), Public endpoints
- Removes client-side role checks as per memory: server-side UI gating is mandatory
2026-04-15 12:51:05 -04:00
0952a0b71e refactor: enhance CLI with improved service registration and deployment
- Refactor deploy command to support service randomization and selective service deployment
- Add --services flag to filter deployed services by name
- Improve status and teardown command output formatting
- Update help text for clarity
2026-04-15 12:50:53 -04:00
4683274021 refactor: remove attacker_worker.py, move logic to test_attacker_worker.py
- Worker logic refactored and tested via test_attacker_worker.py
- No longer needed as standalone module
2026-04-15 12:50:51 -04:00
ab187f70a1 refactor: migrate SQLiteRepository to BaseRepository interface
- Extract dialect-agnostic methods to BaseRepository
- Keep only SQLite-specific SQL and initialization in SQLiteRepository
- Reduces duplication for upcoming MySQL backend
- Maintains 100% backward compatibility
2026-04-15 12:50:44 -04:00
172a002d41 refactor: implement database backend factory for SQLite and MySQL
- Add `get_repository()` factory function to select DB implementation at runtime via DECNET_DB_TYPE env var
- Extract BaseRepository abstract interface from SQLiteRepository
- Update dependencies to use factory-based repository injection
- Add DECNET_DB_TYPE env var support (defaults to sqlite)
- Refactor models and repository base class for cross-dialect compatibility
2026-04-15 12:50:41 -04:00
f6cb90ee66 perf: rate-limit connect/disconnect events in collector to spare ingester
Connection-lifecycle events (connect, disconnect, accept, close) fire once
per TCP connection. During a portscan or credential-stuffing run this
firehoses the SQLite ingester with tiny WAL writes and starves all reads
until the queue drains.

The collector now deduplicates these events by
(attacker_ip, decky, service, event_type) over a 1-second window before
writing to the .json ingestion stream. The raw .log file is untouched, so
rsyslog/SIEM still see every event for forensic fidelity.

Tunable via DECNET_COLLECTOR_RL_WINDOW_SEC and DECNET_COLLECTOR_RL_EVENT_TYPES.
2026-04-15 12:04:04 -04:00
2d65d74069 chore: fix ruff lint errors, bandit suppressions, and pin pip>=26.0
Remove unused imports (ruff F401), suppress B324 false positives on
spec-mandated MD5 in HASSH/JA3/JA3S fingerprinting, drop unused
record_version assignment in JARM parser, and pin pip>=26.0 in dev
deps to address CVE-2025-8869 and CVE-2026-1703.
2026-04-14 17:32:18 -04:00
d5eb60cb41 fix: env leak from live tests caused test_failed_mutation_returns_404 to fail
The live test modules set DECNET_CONTRACT_TEST=true at module level,
which persisted across xdist workers and caused the mutate endpoint
to short-circuit before the mock was reached. Clear the env var in
affected tests with monkeypatch.delenv.
2026-04-14 17:29:02 -04:00
47f2da1d50 test: add live service isolation tests
21 live tests covering all background workers against real resources:
collector (real Docker daemon), ingester (real filesystem + DB),
attacker worker (real DB profiles), sniffer (real network interfaces),
API lifespan (real health endpoint), and cross-service cascade isolation.
2026-04-14 17:24:21 -04:00
53fdeee208 test: add live integration tests for /health endpoint
9 tests covering auth enforcement, component reporting, status
transitions, degraded mode, and real DB/Docker state validation.
Runs with -m live alongside other live service tests.
2026-04-14 17:03:43 -04:00
a2ba7a7f3c feat: add /health endpoint for microservice monitoring
Checks database, background workers (ingestion, collector, attacker,
sniffer), and Docker daemon. Reports healthy/degraded/unhealthy status
with per-component details. Returns 503 when required services fail,
200 for healthy or degraded (only optional services down).
2026-04-14 16:56:20 -04:00
3eab6e8773 test: add service isolation and cascade failure tests
23 tests verifying that each background worker degrades gracefully
when its dependencies are unavailable, and that failures don't cascade:

- Collector: Docker unavailable, no state file, empty fleet
- Ingester: missing log file, unset env var, malformed JSON, fatal DB
- Attacker: DB errors, empty database
- Sniffer: missing interface, no state, scapy crash, non-decky traffic
- API lifespan: all workers failing, DB init failure, sniffer import fail
- Cascade: collector→ingester, ingester→attacker, sniffer→collector, DB→sniffer
2026-04-14 15:07:50 -04:00
5a7ff285cd feat: fleet-wide MACVLAN sniffer microservice
Replace per-decky sniffer containers with a single host-side sniffer
that monitors all traffic on the MACVLAN interface. Runs as a background
task in the FastAPI lifespan alongside the collector, fully fault-isolated
so failures never crash the API.

- Add fleet_singleton flag to BaseService; sniffer marked as singleton
- Composer skips fleet_singleton services in compose generation
- Fleet builder excludes singletons from random service assignment
- Extract TLS fingerprinting engine from templates/sniffer/server.py
  into decnet/sniffer/ package (parameterized for fleet-wide use)
- Sniffer worker maps packets to deckies via IP→name state mapping
- Original templates/sniffer/server.py preserved for future use
2026-04-14 15:02:34 -04:00
1d73957832 feat: collapsible sections in attacker detail view
All info sections (Timeline, Services, Deckies, Commands, Fingerprints)
now have clickable headers with a chevron toggle to expand/collapse
content. Pagination controls in Commands stay clickable without
triggering the collapse. All sections default to open.
2026-04-14 13:42:52 -04:00
c2eceb147d refactor: group fingerprints by type in attacker detail view
Replace flat fingerprint card list with a structured section that
groups fingerprints by type under two categories: Active Probes
(JARM, HASSH, TCP/IP) and Passive Fingerprints (TLS, certificates,
latency, etc.). Each group shows its icon, label, and count.
2026-04-14 13:05:07 -04:00
09d9c0ec74 feat: add JARM, HASSH, and TCP/IP fingerprint rendering to frontend
AttackerDetail: dedicated render components for JARM (hash + target),
HASSHServer (hash, banner, expandable KEX/encryption algorithms), and
TCP/IP stack (TTL, window, MSS as bold stats, DF/SACK/TS as tags,
options order string).

Bounty: add fingerprint field labels and priority keys so prober
bounties display structured rows instead of raw JSON. Add FINGERPRINTS
filter option to the type dropdown.
2026-04-14 13:01:29 -04:00
2dcf47985e feat: add HASSHServer and TCP/IP stack fingerprinting to DECNET-PROBER
Extends the prober with two new active probe types alongside JARM:
- HASSHServer: SSH server fingerprinting via KEX_INIT algorithm ordering
  (MD5 hash of kex;enc_s2c;mac_s2c;comp_s2c, pure stdlib)
- TCP/IP stack: OS/tool fingerprinting via SYN-ACK analysis using scapy
  (TTL, window size, DF bit, MSS, TCP options ordering, SHA256 hash)

Worker probe cycle now runs three phases per IP with independent
per-type port tracking. Ingester extracts bounties for all three
fingerprint types.
2026-04-14 12:53:55 -04:00
5585e4ec58 refactor: prober auto-discovers attackers from log stream
Remove --probe-targets from deploy. The prober now tails the JSON log
file and automatically discovers attacker IPs, JARM-probing each on
common C2 ports (443, 8443, 8080, 4443, 50050, etc.).

- Deploy spawns prober automatically (like collector), no manual targets
- `decnet probe` runs in foreground, --daemon detaches to background
- Worker tracks probed (ip, port) pairs to avoid redundant scans
- Empty JARM hashes (no TLS server) are silently skipped
- 80 prober tests (jarm + worker discovery + bounty extraction)
2026-04-14 12:22:20 -04:00
ce2699455b feat: DECNET-PROBER standalone JARM fingerprinting service
Add active TLS probing via JARM to identify C2 frameworks (Cobalt Strike,
Sliver, Metasploit) by their TLS server implementation quirks. Runs as a
detached host-level process — no container dependency.

- decnet/prober/jarm.py: pure-stdlib JARM implementation (10 crafted probes)
- decnet/prober/worker.py: standalone async worker with RFC 5424 + JSON output
- CLI: `decnet probe --targets ip:port` and `--probe-targets` on deploy
- Ingester: JARM bounty extraction (fingerprint type)
- 68 new tests covering JARM logic and bounty extraction
2026-04-14 12:14:32 -04:00
df3f04c10e revert: undo service badge filter, parser normalization, and SSH relay
Reverts commits 8c249f6, a6c7cfd, 7ff5703. The SSH log relay approach
requires container redeployment and doesn't retroactively fix existing
attacker profiles. Rolling back to reassess the approach.
2026-04-14 02:14:46 -04:00
7ff5703250 feat: SSH log relay emits proper DECNET syslog for sshd events
New log_relay.py replaces raw 'cat' on the rsyslog pipe. Intercepts
sshd and bash lines and re-emits them as structured RFC 5424 events:
login_success, session_opened, disconnect, connection_closed, command.
Parsers updated to accept non-nil PROCID (sshd uses PID).
2026-04-14 02:07:35 -04:00
a6c7cfdf66 fix: normalize SSH bash CMD lines to service=ssh, event_type=command
The SSH honeypot logs commands via PROMPT_COMMAND logger as:
  <14>1 ... bash - - -  CMD uid=0 pwd=/root cmd=ls
These lines had service=bash and event_type=-, so the attacker worker
never recognized them as commands. Both the collector and correlation
parsers now detect the CMD pattern and normalize to service=ssh,
event_type=command, with uid/pwd/command in fields.
2026-04-14 01:54:36 -04:00
7ecb126c8e fix: cap commands endpoint limit to 200
Requests with limit > 200 get a 422, and the frontend responds
accordingly.
2026-04-14 01:46:37 -04:00
f3bb0b31ae feat: paginated commands endpoint for attacker profiles
New GET /attackers/{uuid}/commands?limit=&offset=&service= endpoint
serves commands with server-side pagination and optional service filter.
AttackerDetail frontend fetches commands from this endpoint with
page controls. Service badge filter now drives both the API query
and the local fingerprint filter.
2026-04-14 01:45:19 -04:00
8c249f6987 fix: service badges filter commands/fingerprints locally
Clicking a service badge in the attacker detail view now filters the
commands and fingerprints sections on that page instead of navigating
away. Click again to clear. Header shows filtered/total counts.
2026-04-14 01:38:24 -04:00
24e0d98425 feat: add service filter to attacker profiles
API now accepts ?service=https to filter attackers by targeted service.
Service badges are clickable in both the attacker list and detail views,
navigating to a filtered view. Active filter shows as a dismissable tag.
2026-04-14 01:35:12 -04:00
7756747787 fix: deduplicate sniffer fingerprint events
Same (src_ip, event_type, fingerprint) tuple is now suppressed within a
5-minute window (configurable via DEDUP_TTL env var). Prevents the bounty
vault from filling up with identical JA3/JA4 rows from repeated connections.
2026-04-14 01:24:44 -04:00
e312e072e4 feat: add HTTPS honeypot service template
TLS-wrapped variant of the HTTP honeypot. Auto-generates a self-signed
certificate on startup if none is provided. Supports all the same persona
options (fake_app, server_header, custom_body, etc.) plus TLS_CERT,
TLS_KEY, and TLS_CN configuration.
2026-04-14 00:57:38 -04:00
5631d09aa8 fix: reject empty HELO/EHLO with 501 per RFC 5321
EHLO/HELO require a domain or address-literal argument. Previously
the server accepted bare EHLO with no argument and responded 250,
which deviates from the spec and makes the honeypot easier to
fingerprint.
2026-04-14 00:30:46 -04:00
c2f7622fbb fix: teardown --all now kills collector processes
The collector kept streaming stale container IDs after a redeploy,
causing new service logs to never reach decnet.log. Now _kill_api()
also matches and SIGTERMs any running decnet.cli collect process.
2026-04-14 00:17:57 -04:00
8335c5dc4c fix: remove duplicate print() in _log() across all service templates
Every service's _log() called print() then write_syslog_file() which also
calls print(), causing every log line to appear twice in Docker logs. The
collector streamed both copies, doubling ingested events. Removed the
redundant print() from all 22 service server.py files.
2026-04-14 00:16:18 -04:00
b71db65149 fix: SMTP server handles bare LF line endings and AUTH PLAIN continuation
Two bugs fixed:
- data_received only split on CRLF, so clients sending bare LF (telnet, nc,
  some libraries) got no responses at all. Now splits on LF and strips
  trailing CR, matching real Postfix behavior.
- AUTH PLAIN without inline credentials set state to "await_plain" but no
  handler existed for that state, causing the next line to be dispatched as
  a normal command. Added the missing state handler.
2026-04-13 23:46:50 -04:00
fd62413935 feat: rich fingerprint rendering in attacker detail view
Replace raw JSON dump with typed fingerprint cards:
- JA3/JA4/JA3S/JA4S shown as labeled hash rows with TLS version, SNI, ALPN tags
- JA4L displayed as prominent RTT/TTL metrics
- TLS session resumption mechanisms rendered as colored tags
- Certificate details with subject CN, issuer, validity, SANs, self-signed badge
- HTTP User-Agent and VNC client shown with monospace value display
- Generic fallback for unknown fingerprint types
2026-04-13 23:24:37 -04:00
ea340065c6 feat: JA4/JA4S/JA4L fingerprints, TLS session resumption, certificate extraction
Extend the passive TLS sniffer with next-gen attacker fingerprinting:

- JA4 (ClientHello) and JA4S (ServerHello) computation with
  supported_versions, signature_algorithms, and ALPN parsing
- JA4L latency measurement via TCP SYN→SYN-ACK RTT tracking
- TLS session resumption detection (session tickets, PSK, 0-RTT early data)
- Certificate extraction for TLS ≤1.2 with minimal DER/ASN.1 parser
  (subject CN, issuer, SANs, validity period, self-signed flag)
- Ingester bounty extraction for all new fingerprint types
- 116 tests covering all new functionality (1255 total passing)
2026-04-13 23:20:37 -04:00
a022b4fed6 feat: attacker profiles — UUID model, API routes, list/detail frontend
Migrate Attacker model from IP-based to UUID-based primary key with
auto-migration for old schema. Add GET /attackers (paginated, search,
sort) and GET /attackers/{uuid} API routes. Rewrite Attackers.tsx as
a card grid with full threat info and create AttackerDetail.tsx as a
dedicated detail page with back navigation, stats, commands table,
and fingerprints.
2026-04-13 22:35:13 -04:00
3dc5b509f6 feat: Phase 1 — JA3/JA3S sniffer, Attacker model, profile worker
Add passive TLS fingerprinting via a sniffer container on the MACVLAN
interface, plus the Attacker table and periodic rebuild worker that
correlates per-IP profiles from Log + Bounty + CorrelationEngine.

- templates/sniffer/: Scapy sniffer with pure-Python TLS parser;
  emits tls_client_hello / tls_session RFC 5424 lines with ja3, ja3s,
  sni, alpn, raw_ciphers, raw_extensions; GREASE filtered per RFC 8701
- decnet/services/sniffer.py: service plugin (no ports, NET_RAW/NET_ADMIN)
- decnet/web/db/models.py: Attacker SQLModel table + AttackersResponse
- decnet/web/db/repository.py: 5 new abstract methods
- decnet/web/db/sqlite/repository.py: implement all 5 (upsert, pagination,
  sort by recent/active/traversals, bounty grouping)
- decnet/web/attacker_worker.py: 30s periodic rebuild via CorrelationEngine;
  extracts commands from log fields, merges fingerprint bounties
- decnet/web/api.py: wire attacker_profile_worker into lifespan
- decnet/web/ingester.py: extract JA3 bounty (fingerprint_type=ja3)
- development/DEVELOPMENT.md: full attacker intelligence collection roadmap
- pyproject.toml: scapy>=2.6.1 added to dev deps
- tests: test_sniffer_ja3.py (40+ vectors), test_attacker_worker.py,
  test_base_repo.py / test_web_api.py updated for new surface
2026-04-13 20:22:08 -04:00
c9be447a38 fix: set busy_timeout and WAL pragmas on every async SQLite connection 2026-04-13 19:17:53 -04:00
62db686b42 chore: bump all dev deps to latest versions, suppress schemathesis filter_too_much health check 2026-04-13 19:08:28 -04:00
57d395d6d7 fix: auth redirect, SSE reconnect, stats polling removal, active decky count, schemathesis health check
Some checks failed
CI / Lint (ruff) (push) Successful in 18s
CI / SAST (bandit) (push) Successful in 19s
CI / Dependency audit (pip-audit) (push) Failing after 27s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Standard) (3.12) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-13 18:33:32 -04:00
ac094965b5 fix: redirect to login on expired/missing JWT and 401 responses 2026-04-13 08:17:57 -04:00
435c004760 feat: extract HTTP User-Agent and VNC client version as fingerprint bounties
Some checks failed
CI / Lint (ruff) (push) Successful in 11s
CI / SAST (bandit) (push) Successful in 14s
CI / Dependency audit (pip-audit) (push) Successful in 24s
CI / Test (Standard) (3.11) (push) Successful in 2m2s
CI / Test (Standard) (3.12) (push) Successful in 2m5s
CI / Test (Live) (3.11) (push) Successful in 56s
CI / Test (Fuzz) (3.11) (push) Failing after 6m25s
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-13 08:14:38 -04:00
89a2132c61 fix: use semver 0.x.0 schema for auto-tagging
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 14s
CI / Dependency audit (pip-audit) (push) Successful in 22s
CI / Test (Standard) (3.11) (push) Successful in 2m4s
CI / Test (Standard) (3.12) (push) Successful in 2m6s
CI / Test (Live) (3.11) (push) Successful in 57s
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
2026-04-13 08:05:32 -04:00
3d01ca2c2a fix: resolve ruff lint errors (unused import, E402 import order)
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 14s
CI / Dependency audit (pip-audit) (push) Successful in 27s
CI / Test (Standard) (3.11) (push) Successful in 2m7s
CI / Test (Standard) (3.12) (push) Successful in 2m8s
CI / Test (Live) (3.11) (push) Successful in 58s
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
2026-04-13 07:58:13 -04:00
8124424e96 fix: replace trivy-action with direct install to avoid GitHub credential dependency
Some checks failed
CI / Lint (ruff) (push) Failing after 18s
CI / SAST (bandit) (push) Successful in 18s
CI / Dependency audit (pip-audit) (push) Successful in 27s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Standard) (3.12) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-13 07:56:44 -04:00
a4da9b8f32 feat: embed changelog in release tag message
Some checks failed
CI / Dependency audit (pip-audit) (push) Has been cancelled
CI / Test (Standard) (3.11) (push) Has been cancelled
CI / Test (Standard) (3.12) (push) Has been cancelled
CI / Test (Live) (3.11) (push) Has been cancelled
CI / Lint (ruff) (push) Has been cancelled
CI / SAST (bandit) (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
2026-04-13 07:54:37 -04:00
448cb9cee0 chore: untrack .claude/settings.local.json (already covered by .gitignore)
Some checks failed
CI / Lint (ruff) (push) Has been cancelled
CI / SAST (bandit) (push) Has been cancelled
CI / Dependency audit (pip-audit) (push) Has been cancelled
CI / Test (Standard) (3.11) (push) Has been cancelled
CI / Test (Standard) (3.12) (push) Has been cancelled
CI / Test (Live) (3.11) (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
2026-04-13 07:45:12 -04:00
035499f255 feat: add component-aware RFC 5424 application logging system
- Modify Rfc5424Formatter to read decnet_component from LogRecord
  and use it as RFC 5424 APP-NAME field (falls back to 'decnet')
- Add get_logger(component) factory in decnet/logging/__init__.py
  with _ComponentFilter that injects decnet_component on each record
- Wire all five layers to their component tag:
    cli -> 'cli', engine -> 'engine', api -> 'api' (api.py, ingester,
    routers), mutator -> 'mutator', collector -> 'collector'
- Add structured INFO/DEBUG/WARNING/ERROR log calls throughout each
  layer per the defined vocabulary; DEBUG calls are suppressed unless
  DECNET_DEVELOPER=true
- Add tests/test_logging.py covering factory, filter, formatter
  component-awareness, fallback behaviour, and level gating
2026-04-13 07:39:01 -04:00
0706919469 modified: gitignore to ignore temporary log files
All checks were successful
CI / Lint (ruff) (push) Successful in 17s
CI / SAST (bandit) (push) Successful in 16s
CI / Dependency audit (pip-audit) (push) Successful in 26s
CI / Test (Standard) (3.11) (push) Successful in 2m8s
CI / Test (Standard) (3.12) (push) Successful in 2m12s
CI / Test (Live) (3.11) (push) Successful in 58s
CI / Test (Fuzz) (3.11) (push) Successful in 6m45s
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
CI / Merge dev → testing (push) Successful in 11s
2026-04-13 01:44:52 -04:00
f2cc585d72 fix: align tests with model validation and API error reporting 2026-04-13 01:43:52 -04:00
89abb6ecc6 Merge branch 'dev' of https://git.resacachile.cl/anti/DECNET into dev
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 14s
CI / Dependency audit (pip-audit) (push) Successful in 23s
CI / Test (Standard) (3.11) (push) Successful in 1m33s
CI / Test (Standard) (3.12) (push) Successful in 1m35s
CI / Test (Live) (3.11) (push) Successful in 56s
CI / Test (Fuzz) (3.11) (push) Failing after 4m8s
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-12 08:02:06 -04:00
03f5a7826f Fix: resolved sqlite concurrency errors (table users already exists) by moving DDL to explicit async initialize() and implementing lazy singleton dependency. 2026-04-12 08:01:21 -04:00
a5eaa3291e Fix: resolved sqlite concurrency errors (table users already exists) by moving DDL to explicit async initialize() and implementing lazy singleton dependency.
Some checks failed
CI / SAST (bandit) (push) Successful in 15s
CI / Lint (ruff) (push) Failing after 18s
CI / Dependency audit (pip-audit) (push) Successful in 26s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Standard) (3.12) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-12 07:59:45 -04:00
b2e4706a14 Refactor: implemented Repository Factory and Async Mutator Engine. Decoupled storage logic and enforced Dependency Injection across CLI and Web API. Updated documentation.
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 13s
CI / Dependency audit (pip-audit) (push) Successful in 22s
CI / Test (Standard) (3.11) (push) Failing after 54s
CI / Test (Standard) (3.12) (push) Successful in 1m35s
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-12 07:48:17 -04:00
6095d0d2ed ci: solidify promotion dependencies with explicit test list
Some checks failed
CI / Lint (ruff) (push) Successful in 11s
CI / SAST (bandit) (push) Successful in 12s
CI / Dependency audit (pip-audit) (push) Successful in 21s
CI / Test (Standard) (3.11) (push) Successful in 1m9s
CI / Test (Standard) (3.12) (push) Successful in 1m11s
CI / Test (Live) (3.11) (push) Successful in 54s
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
2026-04-12 04:24:29 -04:00
04685ba1c4 ci: reorder heavy tests (Live before Fuzz) 2026-04-12 04:22:33 -04:00
2ce3f7ee90 ci: delegate release tagging and versioning to release.yml 2026-04-12 04:21:28 -04:00
cb4bac4b42 ci: segment pytest into standard, fuzz, and live categories
Some checks failed
CI / Lint (ruff) (push) Successful in 11s
CI / SAST (bandit) (push) Successful in 12s
CI / Dependency audit (pip-audit) (push) Successful in 22s
CI / Test (Standard) (3.11) (push) Successful in 1m10s
CI / Test (Standard) (3.12) (push) Successful in 1m13s
CI / Test (Live) (3.11) (push) Has been cancelled
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
2026-04-12 04:17:05 -04:00
8d5944f775 ci: implement automated RC flow and finalize optimizations on dev 2026-04-12 04:15:42 -04:00
ea9f7e734b ci: sequential checks, heavy pytest, and skip ci on auto-merge 2026-04-12 03:55:12 -04:00
fe18575a9c modified: pyproject, moved [live] deps to [dev] deps.
All checks were successful
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Successful in 1m19s
CI / Test (pytest) (3.12) (push) Successful in 1m22s
CI / SAST (bandit) (push) Successful in 12s
CI / Dependency audit (pip-audit) (push) Successful in 21s
CI / Merge dev → testing (push) Successful in 10s
CI / Open PR to main (push) Has been skipped
2026-04-12 03:49:20 -04:00
0f63820ee6 chore: fix unused imports in tests and update development roadmap
Some checks failed
CI / Lint (ruff) (push) Successful in 16s
CI / Test (pytest) (3.11) (push) Failing after 34s
CI / Test (pytest) (3.12) (push) Failing after 36s
CI / SAST (bandit) (push) Successful in 12s
CI / Merge dev → testing (push) Has been cancelled
CI / Open PR to main (push) Has been cancelled
CI / Dependency audit (pip-audit) (push) Has been cancelled
2026-04-12 03:46:23 -04:00
fdc404760f moved: mermaid graph to development folder 2026-04-12 03:42:43 -04:00
95190946e0 moved: AST graphs into develpment/ folder 2026-04-12 03:42:08 -04:00
1692df7360 deleted: trash vscode stuff 2026-04-12 03:41:15 -04:00
aac39e818e Docs: Generated full coverage report in development/COVERAGE.md 2026-04-12 03:36:13 -04:00
ff38d58508 Testing: Stabilized test suite and achieved 93% total coverage.
- Fixed CLI tests by patching local imports at source (psutil, os, Path).
- Fixed Collector tests by globalizing docker.from_env mock.
- Stabilized SSE stream tests via AsyncMock and immediate generator termination to prevent hangs.
- Achieved >80% coverage on CLI (84%), Collector (97%), and DB Repository (100%).
- Implemented SMTP Relay service tests (100%).
2026-04-12 03:30:06 -04:00
f78104e1c8 fix: resolve all ruff lint errors and SQLite UNIQUE constraint issue
Ruff fixes (20 errors → 0):
- F401: Remove unused imports (DeckyConfig, random_hostname, IniConfig,
  COMPOSE_FILE, sys, patch) across cli.py, mutator/engine.py,
  templates/ftp, templates/rdp, test_mysql.py, test_postgres.py
- F541: Remove extraneous f-prefixes on strings with no placeholders
  in templates/imap, test_ftp_live, test_http_live
- E741: Rename ambiguous variable 'l' to descriptive names (line, entry,
  part) across conftest.py, test_ftp_live, test_http_live,
  test_mongodb_live, test_pop3, test_ssh

SQLite fix:
- Change _initialize_sync() admin seeding from SELECT-then-INSERT to
  INSERT OR IGNORE, preventing IntegrityError when admin user already
  exists from a previous run
2026-04-12 02:17:50 -04:00
99be4e64ad ci: rework pipeline to dev → testing → main promotion
- Add merge-to-testing job: after all CI checks pass on dev, auto-merge
  into testing with --no-ff for clear merge history
- Move open-pr job to trigger on testing branch instead of dev
- PR now opens testing → main instead of dev → main
- Add bandit and pip-audit jobs to pr.yml PR gate for full suite coverage
- PR gate test job now installs dev dependencies consistently
2026-04-12 02:11:24 -04:00
c3c1cd2fa6 modified: .gitignore
Some checks failed
CI / Lint (ruff) (push) Failing after 16s
CI / Test (pytest) (3.11) (push) Failing after 47s
CI / Test (pytest) (3.12) (push) Failing after 49s
CI / SAST (bandit) (push) Successful in 12s
CI / Dependency audit (pip-audit) (push) Successful in 23s
CI / Open PR to main (push) Has been skipped
2026-04-12 02:03:49 -04:00
68b13b8a59 added: decnet_logging.py stub for telnet monitoring 2026-04-12 02:03:06 -04:00
f8bb134d70 added: fixed mssql service 2026-04-12 02:01:45 -04:00
20fba18711 fix(telnet): disable imklog in rsyslog — containers cannot access /proc/kmsg 2026-04-12 01:45:46 -04:00
b325fc8c5f fix(logging): silence Twisted internal logs and Werkzeug startup banner from stdout 2026-04-12 01:43:42 -04:00
1484d2f625 fix(telnet): use busybox-static for telnetd applet, rm stale fifo on restart 2026-04-12 01:39:31 -04:00
f8ae9ce2a6 refactor(deps): move live test deps to pyproject.toml optional-dependencies[live] 2026-04-12 01:35:16 -04:00
662a5e43e8 feat(tests): add live subprocess integration test suite for services
Spins up each service's server.py in a real subprocess via a free ephemeral
port (PORT env var), connects with real protocol clients, and asserts both
correct protocol behavior and RFC 5424 log output.

- 44 live tests across 10 services: http, ftp, smtp, redis, mqtt,
  mysql, postgres, mongodb, pop3, imap
- Shared conftest.py: _ServiceProcess (bg reader thread + queue),
  free_port, live_service fixture, assert_rfc5424 helper
- PORT env var added to all 10 targeted server.py templates
- New pytest marker `live`; excluded from default addopts run
- requirements-live-tests.txt: flask, twisted + protocol clients
2026-04-12 01:34:16 -04:00
d63e396410 fix(protocols): guard against zero/malformed length fields in binary protocol parsers
MongoDB had the same infinite-loop bug as MSSQL (msg_len=0 → buffer never
shrinks in while loop). Postgres, MySQL, and MQTT had related length-field
issues (stuck state, resource exhaustion, overlong remaining-length).

Also fixes an existing MongoDB _op_reply struct.pack format bug (extra 'q'
specifier caused struct.error on any OP_QUERY response).

Adds 53 regression + protocol boundary tests across MSSQL, MongoDB,
Postgres, MySQL, and MQTT, including a _run_with_timeout threading harness
to catch infinite loops and @pytest.mark.fuzz hypothesis tests for each.
2026-04-12 01:01:13 -04:00
65d585569b fix(telnet): replace Cowrie with real busybox telnetd + rsyslog logging
Cowrie was exposing an SSH daemon on port 22 alongside the telnet service
even when COWRIE_SSH_ENABLED=false, contaminating deployments that did not
request an SSH service.

New implementation mirrors the SSH service pattern:
- busybox telnetd in foreground mode on port 23
- /bin/login for real PAM authentication (brute-force attempts logged)
- rsyslog RFC 5424 bridge piped to stdout for Docker log capture
- Configurable root password and hostname via env vars
- No Cowrie dependency
2026-04-12 00:34:45 -04:00
c384a3103a refactor: separate engine, collector, mutator, and fleet into independent subpackages
- decnet/engine/ — container lifecycle (deploy, teardown, status); _kill_api removed
- decnet/collector/ — Docker log streaming (moved from web/collector.py)
- decnet/mutator/ — mutation engine (no longer imports from cli or duplicates deployer code)
- decnet/fleet.py — shared decky-building logic extracted from cli.py

Cross-contamination eliminated:
- web router no longer imports from decnet.cli
- mutator no longer imports from decnet.cli
- cli no longer imports from decnet.web
- _kill_api() moved to cli (process management, not engine concern)
- _compose_with_retry duplicate removed from mutator
2026-04-12 00:26:22 -04:00
c79f96f321 refactor(ssh): consolidate real_ssh into ssh, remove duplication
real_ssh was a separate service name pointing to the same template and
behaviour as ssh. Merged them: ssh is now the single real-OpenSSH service.

- Rename templates/real_ssh/ → templates/ssh/
- Remove decnet/services/real_ssh.py
- Deaddeck archetype updated: services=["ssh"]
- Merge test_real_ssh.py into test_ssh.py (includes deaddeck + logging tests)
- Drop decnet.services.real_ssh from test_build module list
2026-04-11 19:51:41 -04:00
d77def64c4 fix(cli): import Path locally in deploy to fix NameError 2026-04-11 19:46:58 -04:00
ce182652ad fix(cli): add __main__ guard so python -m decnet.cli actually runs the app
The collector subprocess was spawned via 'python3 -m decnet.cli collect'
but cli.py had no 'if __name__ == __main__: app()' guard. Python executed
the module, defined all functions, then exited cleanly with code 0 without
ever calling the collect command. No output, no log file, exit 0 — silent
non-start every time.

Also route collector stderr to <log_file>.collector.log so future crashes
are visible instead of disappearing into DEVNULL.
2026-04-11 19:42:10 -04:00
a6063efbb9 fix(collector): daemonize background subprocesses with start_new_session
Collector and mutator watcher subprocesses were spawned without
start_new_session=True, leaving them in the parent's process group.
SIGHUP (sent when the controlling terminal closes) killed both
processes silently — stdout/stderr were DEVNULL so the crash was
invisible.

Also update test_services and test_composer to reflect the ssh plugin
no longer using Cowrie env vars (replaced with SSH_ROOT_PASSWORD /
SSH_HOSTNAME matching the real_ssh plugin).
2026-04-11 19:36:46 -04:00
d4ac53c0c9 feat(ssh): replace Cowrie with real OpenSSH + rsyslog logging pipeline
Scraps the Cowrie emulation layer. The real_ssh template now runs a
genuine sshd backed by a three-layer logging stack forwarded to stdout
as RFC 5424 for the DECNET collector:

  auth,authpriv.*  → rsyslogd → named pipe → stdout  (logins/failures)
  user.*           → rsyslogd → named pipe → stdout  (PROMPT_COMMAND cmds)
  sudo syslog=auth → rsyslogd → named pipe → stdout  (privilege escalation)
  sudo logfile     → /var/log/sudo.log               (local backup with I/O)

The ssh.py service plugin now points to templates/real_ssh and drops all
COWRIE_* / NODE_NAME env vars, sharing the same compose fragment shape as
real_ssh.py.
2026-04-11 19:12:54 -04:00
9ca3b4691d docs(roadmap): tick completed service implementations 2026-04-11 04:02:50 -04:00
babad5ce65 refactor(collector): use state file for container detection, drop label heuristics
_load_service_container_names() reads decnet-state.json and builds the
exact set of expected container names ({decky}-{service}). is_service_container()
and is_service_event() do a direct set lookup — no regex, no label
inspection, no heuristics.
2026-04-11 03:58:52 -04:00
7abae5571a fix(collector): fix container detection and auto-start on deploy
Two bugs caused the log file to never be written:

1. is_service_container() used regex '^decky-\d+-\w' which only matched
   the old decky-01-smtp naming style. Actual containers are named
   omega-decky-smtp, relay-decky-smtp, etc. Fixed by using Docker Compose
   labels instead: com.docker.compose.project=decnet + non-empty
   depends_on discriminates service containers from base (sleep infinity)
   containers reliably regardless of decky naming convention.
   Added is_service_event() for the Docker events path.

2. The collector was only started when --api was used. Added a 'collect'
   CLI subcommand (decnet collect --log-file <path>) and wired it into
   deploy as an auto-started background process when --api is not in use.
   Default log path: /var/log/decnet/decnet.log
2026-04-11 03:56:53 -04:00
377ba0410c feat(deploy): add --parallel flag for concurrent image builds
When --parallel is set:
- DOCKER_BUILDKIT=1 is injected into the subprocess environment to
  ensure BuildKit is active regardless of host daemon config
- docker compose build runs first (all images built concurrently)
- docker compose up -d follows without --build (no redundant checks)

Without --parallel the original up --build path is preserved.
--parallel and --no-cache compose correctly (build --no-cache).
2026-04-11 03:46:52 -04:00
5ef48d60be fix(conpot): add syslog bridge entrypoint for logging pipeline
Conpot is a third-party app with its own Python logger — it never calls
decnet_logging. Added entrypoint.py as a subprocess wrapper that:
- Launches conpot and captures its stdout/stderr
- Classifies each line (startup/request/warning/error/log)
- Extracts source IPs via regex
- Emits RFC 5424 syslog lines to stdout for Docker/collector pickup

Entrypoint is self-contained (no import of shared decnet_logging.py)
because the conpot base image runs Python 3.6, which cannot parse the
dict[str, Any] / str | None type syntax used in the canonical file.
2026-04-11 03:44:41 -04:00
fe46b8fc0b fix(conpot): use honeynet/conpot:latest base, run as conpot user
The BASE_IMAGE build arg was being unconditionally overwritten by
composer.py with the decky's distro build_base (debian:bookworm-slim),
turning the conpot container into a bare Debian image with no conpot
installation — hence the silent restart loop.

Two fixes:
1. composer.py: use args.setdefault() so services that pre-declare
   BASE_IMAGE in their compose_fragment() win over the distro default.
2. conpot.py: pre-declare BASE_IMAGE=honeynet/conpot:latest in build
   args so it always uses the upstream image regardless of decky distro.

Also removed the USER decnet switch from the conpot Dockerfile. The
upstream image already runs as the non-root 'conpot' user; switching to
'decnet' broke pkg_resources because conpot's eggs live under
/home/conpot/.local and are only on sys.path for that user.
2026-04-11 03:32:11 -04:00
c7713c6228 feat(imap,pop3): full IMAP4rev1 + POP3 bait mailbox implementation
IMAP: extended to full IMAP4rev1 — 10 bait emails (AWS keys, DB creds,
tokens, VPN config, root pw etc.), LIST/LSUB/STATUS/FETCH/UID FETCH/
SEARCH/CLOSE/NOOP, proper SELECT untagged responses (EXISTS, UIDNEXT,
FLAGS, PERMANENTFLAGS), CAPABILITY with IDLE/LITERAL+/AUTH=PLAIN.
FETCH correctly handles sequence sets (1:*, 1:3, *), item dispatch
(FLAGS, ENVELOPE, BODY[], RFC822, RFC822.SIZE), and places body literals
last per RFC 3501.

POP3: extended with same 10 bait emails, fixed banner env var key
(POP3_BANNER not IMAP_BANNER), CAPA fully populated (TOP/UIDL/USER/
RESP-CODES/SASL), TOP (headers + N body lines), UIDL (msg-N format),
DELE/RSET with _deleted set tracking, NOOP. _active_messages() helper
excludes DELE'd messages from STAT/LIST/UIDL.

Both: DEBT-026 stub added (_EMAIL_SEED_PATH env var, documented in
DEBT.md for next-session JSON seed file wiring).

Tests: test_imap.py expanded to 27 cases, test_pop3.py to 22 cases —
860 total tests passing.
2026-04-11 03:12:32 -04:00
1196363d0b feat(os_fingerprint): Phase 2 — add icmp_ratelimit + icmp_ratemask sysctls
Windows: both 0 (no ICMP rate limiting — matches real Windows behavior)
Linux: 1000ms / mask 6168 (kernel defaults)
BSD: 250ms / mask 6168 (FreeBSD default is faster than Linux)
Embedded/Cisco: both 0 (most firmware doesn't rate-limit ICMP)

These affect nmap's IE and U1 probe groups which measure ICMP error
response timing to closed UDP ports. Windows responds to all probes
instantly while Linux throttles to ~1/sec.

Tests: 10 new cases (5 per sysctl). Suite: 822 passed.
2026-04-10 16:41:23 -04:00
62a67f3d1d docs(HARDENING): rewrite roadmap based on live scan findings
Phase 1 is complete. Live testing revealed:
- Window size (64240) is already correct — Phase 2 window mangling unnecessary
- TI=Z (IP ID = 0) is the single remaining blocker for Windows spoofing
- ip_no_pmtu_disc does NOT fix TI=Z (tested and confirmed)

Revised phase plan:
- Phase 2: ICMP tuning (icmp_ratelimit + icmp_ratemask sysctls)
- Phase 3: NFQUEUE daemon for IP ID rewriting (fixes TI=Z)
- Phase 4: diminishing returns, not recommended

Added detailed NFQUEUE architecture, TCPOPTSTRIP notes, and
note clarifying P= field in nmap output.
2026-04-10 16:38:27 -04:00
6df2c9ccbf revert(os_fingerprint): undo ip_no_pmtu_disc=1 for windows — was incorrect
ip_no_pmtu_disc controls PMTU discovery for UDP/ICMP paths only.
TI=Z originates from ip_select_ident() in the kernel TCP stack setting
IP ID=0 for DF=1 TCP packets — a namespace-scoped sysctl cannot change this.
The previous commit was based on incorrect root-cause analysis.
2026-04-10 16:29:44 -04:00
b1f6c3b84a fix(os_fingerprint): set ip_no_pmtu_disc=1 for windows to eliminate TI=Z
When ip_no_pmtu_disc=0 the Linux kernel sets DF=1 on TCP packets and uses
IP ID=0 (RFC 6864). nmap's TI=Z fingerprint has no Windows match in its DB,
causing 91% confidence guesses of 'Linux 2.4/2.6 embedded' regardless of
TTL being 128. Setting ip_no_pmtu_disc=1 allows non-zero IP ID generation.

Trade-off: DF bit is not set on outgoing packets (slightly wrong for Windows)
but TI=Z is far more damaging to the spoof than losing DF accuracy.
2026-04-10 16:19:32 -04:00
5fdfe67f2f fix(cowrie): add missing COPY+chmod for entrypoint.sh in Dockerfile
The entrypoint.sh was present in the build context but never COPYed into
the image, causing 'stat /entrypoint.sh: no such file or directory' at
container start. Added COPY+chmod before the USER decnet instruction so
the script is installed as root and is executable by all users.
2026-04-10 16:15:05 -04:00
4fac9570ec chore: add arche-test.ini OS fingerprint smoke-test fleet 2026-04-10 16:11:18 -04:00
5e83c9e48d feat(os_fingerprint): Phase 1 — extend OS sysctls with 6 new fingerprint knobs
Add tcp_timestamps, tcp_window_scaling, tcp_sack, tcp_ecn, ip_no_pmtu_disc,
and tcp_fin_timeout to every OS profile in OS_SYSCTLS.

All 6 are network-namespace-scoped and safe to set per-container without
--privileged. They directly influence nmap's OPS, WIN, ECN, and T2-T6
probe groups, making OS family detection significantly more convincing.

Key changes:
- tcp_timestamps=0 for windows/embedded/cisco (strongest Windows discriminator)
- tcp_ecn=2 for linux (ECN offer), 0 for all others
- tcp_sack=0 / tcp_window_scaling=0 for embedded/cisco
- ip_no_pmtu_disc=1 for embedded/cisco (DF bit ICMP behaviour)
- Expose _REQUIRED_SYSCTLS frozenset for completeness assertions

Tests: 88 new test cases across all OS families and composer integration.
Total suite: 812 passed.
2026-04-10 16:06:36 -04:00
d8457c57f3 docs: add OS fingerprint spoofing hardening roadmap 2026-04-10 16:02:00 -04:00
38d37f862b docs: Detail attachable Swarm overlay backend in FUTURE.md 2026-04-10 03:00:03 -04:00
fa8b0f3cb5 docs: Add latency simulation to FUTURE.md 2026-04-10 02:53:00 -04:00
db425df6f2 docs: Add FUTURE.md to capture long-term architectural visions 2026-04-10 02:48:28 -04:00
73e68388c0 fix(conpot): Refactor permissions to use dedicated decnet user via chown 2026-04-10 02:27:02 -04:00
682322d564 fix(conpot): Resolve silent crash by running as nobody and ensuring permissions 2026-04-10 02:25:45 -04:00
33885a2eec fix(conpot): Keep container as root to allow port 502 binding and fix user not found error 2026-04-10 02:20:46 -04:00
f583b3d699 fix(services): Resolve protocol realism gaps and update technical debt register
- Add dynamic challenge nonces to Postgres, VNC, and SIP.
- Add basic keyspace lookup and mock data to Redis.
- Correct MSSQL TDS pre-login offset bounds.
- Support MongoDB OP_MSG handshake version checking.
- Suppress Werkzeug HTTP server headers and normalize FTPAnonymousShell response.
- Add tracking for Dynamic Bait Store (DEBT-027) via DEBT.md.
2026-04-10 02:16:42 -04:00
5cb6666d7b docs: Append bug ledger implementation plan to REALISM_AUDIT.md 2026-04-10 01:58:23 -04:00
25b6425496 Update REALISM_AUDIT.md with completed tasks 2026-04-10 01:55:14 -04:00
08242a4d84 Implement ICS/SCADA and IMAP Bait features 2026-04-10 01:50:08 -04:00
63fb477e1f feat: add smtp_relay service; add service_testing/ init
- decnet/services/smtp_relay.py: open relay variant of smtp, same template
  with SMTP_OPEN_RELAY=1 baked into the environment
- tests/service_testing/__init__.py: init so pytest discovers the subdirectory
2026-04-10 01:09:15 -04:00
94f82c9089 feat(smtp): fix DATA state machine; add SMTP_OPEN_RELAY mode
- Buffer DATA body until CRLF.CRLF terminator — fixes 502-on-every-body-line bug
- SMTP_OPEN_RELAY=1: AUTH accepted (235), RCPT TO accepted for any domain,
  full DATA pipeline with queued-as message ID
- Default (SMTP_OPEN_RELAY=0): credential harvester — AUTH rejected (535)
  but connection stays open, RCPT TO returns 554 relay denied
- SASL PLAIN and LOGIN multi-step AUTH both decoded and logged
- RSET clears all per-transaction state
- Add development/SMTP_RELAY.md, IMAP_BAIT.md, ICS_SCADA.md, BUG_FIXES.md
  (live-tested service realism plans)
2026-04-10 01:03:47 -04:00
40cd582253 fix: restore forward_syslog as no-op stub; all service server.py files import it 2026-04-10 00:43:50 -04:00
24f02c3466 fix: resolve all bandit SAST findings in templates/
- Add # nosec B104 to all intentional 0.0.0.0 binds in honeypot servers
  (hardcoded_bind_all_interfaces is by design — deckies must accept attacker connections)
- Add # nosec B101 to assert statements used for protocol validation in ldap/snmp
- Add # nosec B105 to fake SASL placeholder in ldap
- Add # nosec B108 to /tmp usage in smb template
- Exclude root-owned auto-generated decnet_logging.py copies from bandit scan
  via pyproject.toml [tool.bandit] config (synced by _sync_logging_helper at deploy)
2026-04-10 00:24:40 -04:00
25ba3fb56a feat: replace bind-mount log pipeline with Docker log streaming
Services now print RFC 5424 to stdout; Docker captures via json-file driver.
A new host-side collector (decnet.web.collector) streams docker logs from all
running decky service containers and writes RFC 5424 + parsed JSON to the host
log file. The existing ingester continues to tail the .json file unchanged.
rsyslog can consume the .log file independently — no DECNET involvement needed.

Removes: bind-mount volume injection, _LOG_NETWORK bridge, log_target config
field and --log-target CLI flag, TCP syslog forwarding from service templates.
2026-04-10 00:14:14 -04:00
8d023147cc fix: chmod 777 log dir on compose generation so container decnet user can write logs 2026-04-09 19:36:53 -04:00
14f7a535db fix: use model_dump(mode='json') to serialize datetime fields; fixes SSE stream silently dying post-ORM migration 2026-04-09 19:29:27 -04:00
cea6279a08 fix: add Last-Event-ID to CORS allow_headers to unblock SSE reconnects 2026-04-09 19:26:24 -04:00
6b8392102e fix: emit stats/histogram snapshot on SSE connect; remove polling api.get('/stats') from Dashboard 2026-04-09 19:23:24 -04:00
d2a569496d fix: add get_stream_user dependency for SSE endpoint; allow query-string token for EventSource 2026-04-09 19:20:38 -04:00
f20e86826d fix: derive default CORS origin from DECNET_WEB_HOST/PORT instead of hardcoded ports 2026-04-09 19:15:45 -04:00
29da2a75b3 fix: add localhost:9090 to CORS defaults; revert broken relative-URL and proxy changes 2026-04-09 19:14:40 -04:00
3362325479 fix: resolve CORS blocking Vite dev server (add 5173 to defaults, add proxy) 2026-04-09 19:10:10 -04:00
34a57d6f09 fix: make setcap resilient — no-op when Python absent or symlink-only 2026-04-09 19:04:52 -04:00
016115a523 fix: clear all addressable technical debt (DEBT-005 through DEBT-025)
Security:
- DEBT-008: remove query-string token auth; header-only Bearer now enforced
- DEBT-013: add regex constraint ^[a-z0-9\-]{1,64}$ on decky_name path param
- DEBT-015: stop leaking raw exception detail to API clients; log server-side
- DEBT-016: validate search (max_length=512) and datetime params with regex

Reliability:
- DEBT-014: wrap SSE event_generator in try/except; yield error frame on failure
- DEBT-017: emit log.warning/error on DB init retry; silent failures now visible

Observability / Docs:
- DEBT-020: add 401/422 response declarations to all route decorators

Infrastructure:
- DEBT-018: add HEALTHCHECK to all 24 template Dockerfiles
- DEBT-019: add USER decnet + setcap cap_net_bind_service to all 24 Dockerfiles
- DEBT-024: bump Redis template version 7.0.12 → 7.2.7

Config:
- DEBT-012: validate DECNET_API_PORT and DECNET_WEB_PORT range (1-65535)

Code quality:
- DEBT-010: delete 22 duplicate decnet_logging.py copies; deployer injects canonical
- DEBT-022: closed as false positive (print only in module docstring)
- DEBT-009: closed as false positive (templates already use structured syslog_line)

Build:
- DEBT-025: generate requirements.lock via pip freeze

Testing:
- DEBT-005/006/007: comprehensive test suite added across tests/api/
- conftest: in-memory SQLite + StaticPool + monkeypatched session_factory
- fuzz mark added; default run excludes fuzz; -n logical parallelism

DEBT.md updated: 23/25 items closed; DEBT-011 (Alembic) and DEBT-023 (digest pinning) remain
2026-04-09 19:02:51 -04:00
0166d0d559 fix: clean up db layer — model_dump, timezone-aware timestamps, unified histogram, async load_state 2026-04-09 18:46:35 -04:00
dbf6d13b95 fix: use :memory: + StaticPool for test DBs, eliminates file:testdb_* garbage 2026-04-09 18:39:36 -04:00
d15c106b44 test: fix async fixture isolation, add fuzz marks, parallelize with xdist
- Rebuild repo.engine and repo.session_factory per-test using unique
  in-memory SQLite URIs — fixes KeyError: 'access_token' caused by
  stale session_factory pointing at production DB
- Add @pytest.mark.fuzz to all Hypothesis and Schemathesis tests;
  default run excludes them (addopts = -m 'not fuzz')
- Add missing fuzz tests to bounty, fleet, histogram, and repository
- Use tmp_path for state file in patch_state_file/mock_state_file to
  eliminate file-path race conditions under xdist parallelism
- Set default addopts: -v -q -x -n logical (26 tests in ~7s)
2026-04-09 18:32:46 -04:00
6fc1a2a3ea test: refactor suite to use AsyncClient, in-memory DBs, and parallel coverage 2026-04-09 16:43:49 -04:00
de84cc664f refactor: migrate database to SQLModel and implement modular DB structure 2026-04-09 16:43:30 -04:00
1541b4b7e0 docs: close DEBT-002 as by-design 2026-04-09 13:25:40 -04:00
2b7d872ab7 fix: revert DECNET_ADMIN_PASSWORD to default 'admin'; first-login change enforces security 2026-04-09 13:25:29 -04:00
4ae6f4f23d test: expand coverage 64%→76%; add BUGS.md for Gemini migration issues 2026-04-09 12:55:52 -04:00
310c2a1fbe feat: add pytest-asyncio, freezegun, schemathesis, pytest-cov to test toolchain 2026-04-09 12:40:59 -04:00
44de453bb2 refactor: modularize API tests to match router structure 2026-04-09 12:32:31 -04:00
ec66e01f55 fix: add missing __init__.py to tests/api subpackages to fix relative imports 2026-04-09 12:24:09 -04:00
a22f996027 docs: mark DEBT-001–004 as resolved in DEBT.md 2026-04-09 12:14:16 -04:00
b6b046c90b fix: harden startup security — require strong secrets, restrict CORS
- decnet/env.py: DECNET_JWT_SECRET and DECNET_ADMIN_PASSWORD are now
  required env vars; startup raises ValueError if unset or set to a
  known-bad default ("admin", "password", etc.)
- decnet/env.py: add DECNET_CORS_ORIGINS (comma-separated, defaults to
  http://localhost:8080) replacing the previous allow_origins=["*"]
- decnet/web/api.py: use DECNET_CORS_ORIGINS and tighten allow_methods
  and allow_headers to explicit lists
- tests/conftest.py: set required env vars at module level so test
  collection works without real credentials
- tests/test_web_api.py, test_web_api_fuzz.py: use DECNET_ADMIN_PASSWORD
  from env instead of hardcoded "admin"

Closes DEBT-001, DEBT-002, DEBT-004
2026-04-09 12:13:22 -04:00
29a2cf2738 refactor: modularize API routes into separate files and clean up dependencies 2026-04-09 11:58:57 -04:00
551664bc43 fix: stabilize test suite by ensuring proper test DB isolation and initialization 2026-04-09 02:31:14 -04:00
a2d07bd67c fix: refactor Bounty UI to match dashboard style and fix layout 2026-04-09 02:00:49 -04:00
a3b92d4dd6 docs: tag API endpoints for better organization 2026-04-09 01:58:54 -04:00
30edf9a55d feat: add DECNET_DEVELOPER toggle for API documentation 2026-04-09 01:55:31 -04:00
69626d705d feat: implement Bounty Vault for captured credentials and artifacts 2026-04-09 01:52:50 -04:00
0f86f883fe fix: resolve remaining bandit warnings and stabilize lifespan 2026-04-09 01:35:08 -04:00
13f3d15a36 fix: stabilize tests with synchronous DB init and handle Bandit security findings 2026-04-09 01:33:15 -04:00
8c7ec2953e fix: handle bcrypt 72-byte limit and increase JWT secret length 2026-04-09 01:11:32 -04:00
0123e1c69e fix: suppress noisy cleanup warnings in pytest and fix fleet test auth 2026-04-09 01:05:34 -04:00
9dc6ff3887 ui: ensure inputs and buttons inherit Ubuntu Mono font 2026-04-08 21:31:44 -04:00
fe25798425 ui: change main dashboard font to Ubuntu Mono 2026-04-08 21:30:30 -04:00
6c2478ede3 fix: restore missing API endpoints, fix chart rendering, and update date filter formatting 2026-04-08 21:25:59 -04:00
532a4e2dc5 fix: resolve SSE CORS issues and fix date filter format mismatch 2026-04-08 21:15:26 -04:00
ec503b9ec6 feat: implement advanced live logs with KQL search, histogram, and live/historical modes 2026-04-08 21:01:05 -04:00
fe6b349e5e modified: ci.yml, fucked up last time lol
Some checks failed
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Successful in 1m42s
CI / Test (pytest) (3.12) (push) Successful in 1m45s
CI / SAST (bandit) (push) Failing after 12s
CI / Dependency audit (pip-audit) (push) Successful in 20s
CI / Open PR to main (push) Has been skipped
2026-04-08 15:53:49 -04:00
65b220fdbe modified: ci.yml, pyproject: added missing installs and modified pip install command
Some checks failed
CI / Lint (ruff) (push) Successful in 16s
CI / Test (pytest) (3.11) (push) Failing after 20s
CI / Test (pytest) (3.12) (push) Failing after 20s
CI / SAST (bandit) (push) Failing after 11s
CI / Dependency audit (pip-audit) (push) Successful in 19s
CI / Open PR to main (push) Has been skipped
2026-04-08 15:50:17 -04:00
6f10e7556f chore: deleted trash
Some checks failed
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Failing after 18s
CI / Test (pytest) (3.12) (push) Failing after 18s
CI / SAST (bandit) (push) Failing after 11s
CI / Dependency audit (pip-audit) (push) Successful in 18s
CI / Open PR to main (push) Has been skipped
2026-04-08 02:07:11 -04:00
fc99375c62 feat: add systemd service templates for API and Web Dashboard
Some checks failed
CI / Lint (ruff) (push) Successful in 15s
CI / Test (pytest) (3.11) (push) Failing after 21s
CI / Test (pytest) (3.12) (push) Failing after 22s
CI / SAST (bandit) (push) Failing after 13s
CI / Dependency audit (pip-audit) (push) Successful in 19s
CI / Open PR to main (push) Has been skipped
2026-04-08 01:48:05 -04:00
6bdb5922fa fix: ensure shared log volume mount by default and disable container-side rotation 2026-04-08 01:42:05 -04:00
32b06afef6 feat: add .env based configuration for API, Web, and Auth options 2026-04-08 01:27:11 -04:00
31e0c5151b fix: ensure API-deployed deckies inherit the correct log ingestion path 2026-04-08 01:09:48 -04:00
cc3d434c02 feat: add server-side validation for web-based INI deployments 2026-04-08 01:04:59 -04:00
1b5d366b38 ui: add file upload support to web-based INI deployment 2026-04-08 00:59:53 -04:00
168ecf14ab feat: add API-only mode and web-based INI deployment 2026-04-08 00:56:25 -04:00
db9a2699b9 ui: fix dashboard overflow and overlap with sidebar 2026-04-08 00:44:33 -04:00
d139729fa2 docs: revert incorrect roadmap ticks 2026-04-08 00:38:03 -04:00
dd363629ab docs: update roadmap items in DEVELOPMENT.md 2026-04-08 00:35:43 -04:00
c544964f57 feat: migrate dashboard live logs to Server-Sent Events (SSE) 2026-04-08 00:30:31 -04:00
6e19848723 ui: improve mutation feedback and increase timeout for long-running docker ops 2026-04-08 00:22:23 -04:00
e24da92e0f fix: increase timeout for mutate API call to handle slow docker ops 2026-04-08 00:21:16 -04:00
47f0e6da8f fix: correctly iterate over all deckies in _build_deckies_from_ini 2026-04-08 00:19:42 -04:00
18de381a43 feat: implement dynamic decky mutation and fix dot-separated INI sections 2026-04-08 00:16:57 -04:00
1f5c6604d6 feat: integrate API lifecycle with teardown and update dependencies 2026-04-07 23:30:08 -04:00
a9c7ddec2b fix: enforce absolute paths for state and database files 2026-04-07 23:21:16 -04:00
eb4be44c9a feat: add dedicated Decoy Fleet inventory page and API 2026-04-07 23:15:20 -04:00
1a2ad27eca test: add comprehensive property-based fuzzing for all API endpoints 2026-04-07 20:14:53 -04:00
b1f09b9c6a chore: move development docs to development/ and clean up project root 2026-04-07 20:07:56 -04:00
3656a89d60 docs: add comprehensive EVENTS.md detailing all service log events 2026-04-07 20:02:54 -04:00
ba2faba5d5 chore: enforce strict typing and internal naming conventions across web components 2026-04-07 19:56:15 -04:00
950280a97b feat: render structured syslog tags and msg in Dashboard 2026-04-07 15:56:45 -04:00
7bc8d75242 feat: parse RFC 5424 fields and msg directly in backend 2026-04-07 15:56:01 -04:00
5f637b5272 feat: switch to JSON-based log ingestion for higher reliability 2026-04-07 15:47:29 -04:00
6ed92d080f fix: invoke uvicorn via sys.executable to handle sudo PATH restrictions 2026-04-07 15:39:32 -04:00
1b593920cd feat: add --api flag to deploy and new web command for dashboard 2026-04-07 15:32:04 -04:00
bad90dfb75 feat: implement background log ingestion from local file 2026-04-07 15:30:44 -04:00
05e71f6d2e feat: frontend support for mandatory password change and react-router integration 2026-04-07 15:16:11 -04:00
52c26a2891 feat: backend support for mandatory password change on first login 2026-04-07 15:15:03 -04:00
81135cb861 fix: switch to direct bcrypt usage for Python 3.14 compatibility 2026-04-07 15:07:46 -04:00
50e53120df feat: initialize React frontend with minimalistic Matrix theme 2026-04-07 15:05:06 -04:00
697929a127 feat: implement Stats endpoints for web dashboard 2026-04-07 14:58:09 -04:00
b46934db46 feat: implement Logs endpoints for web dashboard 2026-04-07 14:56:25 -04:00
5b990743db feat: implement Auth endpoints for web dashboard 2026-04-07 14:54:36 -04:00
fbb16a960c feat: add web dashboard dependencies to support real-time monitoring 2026-04-07 14:51:37 -04:00
c32ad82d0a Modified README: added more examples to the config.ini section and modified instructions for quick setup. 2026-04-06 11:28:29 -04:00
850a6f2ad7 Finished: CI/CD pipeline. 2026-04-06 11:18:10 -04:00
d344e4c8bb revert f8a9f8fc64
revert Added: modified notes. Finished CI/CD pipeline.
2026-04-06 17:17:31 +02:00
f8a9f8fc64 Added: modified notes. Finished CI/CD pipeline.
All checks were successful
PR Gate / Lint (ruff) (pull_request) Successful in 18s
PR Gate / Test (pytest) (3.11) (pull_request) Successful in 20s
PR Gate / Test (pytest) (3.12) (pull_request) Successful in 22s
2026-04-06 11:10:56 -04:00
a428410c8e Modified README.md: added AI disclosure 2026-04-06 11:09:44 -04:00
e5a6c2d9a7 Skip CI on markdown-only changes
All checks were successful
CI / Lint (ruff) (push) Successful in 16s
CI / Test (pytest) (3.11) (push) Successful in 19s
CI / Test (pytest) (3.12) (push) Successful in 20s
CI / SAST (bandit) (push) Successful in 12s
CI / Dependency audit (pip-audit) (push) Successful in 19s
CI / Open PR to main (push) Successful in 6s
PR Gate / Lint (ruff) (pull_request) Successful in 11s
PR Gate / Test (pytest) (3.11) (pull_request) Successful in 18s
PR Gate / Test (pytest) (3.12) (pull_request) Successful in 20s
2026-04-04 23:07:40 -04:00
ea409650fa Trigger CI: token now has repo:write permission
All checks were successful
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Successful in 18s
CI / Test (pytest) (3.12) (push) Successful in 18s
CI / SAST (bandit) (push) Successful in 11s
CI / Dependency audit (pip-audit) (push) Successful in 19s
CI / Open PR to main (push) Successful in 5s
PR Gate / Lint (ruff) (pull_request) Successful in 10s
PR Gate / Test (pytest) (3.11) (pull_request) Successful in 17s
PR Gate / Test (pytest) (3.12) (pull_request) Successful in 18s
2026-04-04 17:54:37 -03:00
d92aa99b81 Add DEVELOPMENT.md for CI/CD pipeline test
All checks were successful
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Successful in 17s
CI / Test (pytest) (3.12) (push) Successful in 19s
CI / SAST (bandit) (push) Successful in 11s
CI / Dependency audit (pip-audit) (push) Successful in 19s
CI / Open PR to main (push) Successful in 4s
2026-04-04 17:51:51 -03:00
fc7fca998f Add API response logging to open-pr step for debugging
All checks were successful
CI / Lint (ruff) (push) Successful in 10s
CI / Test (pytest) (3.11) (push) Successful in 19s
CI / Test (pytest) (3.12) (push) Successful in 18s
CI / SAST (bandit) (push) Successful in 11s
CI / Dependency audit (pip-audit) (push) Successful in 18s
CI / Open PR to main (push) Successful in 3s
2026-04-04 17:47:43 -03:00
ed749a8c31 Merge security jobs into CI workflow so open-pr needs all checks
All checks were successful
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Successful in 18s
CI / Test (pytest) (3.12) (push) Successful in 19s
CI / SAST (bandit) (push) Successful in 12s
CI / Dependency audit (pip-audit) (push) Successful in 18s
CI / Open PR to main (push) Successful in 4s
2026-04-04 17:43:55 -03:00
cf36ebcd84 Auto-open PR to main when CI passes on dev
Some checks failed
CI / Lint (ruff) (push) Successful in 13s
CI / Test (pytest) (3.11) (push) Successful in 19s
CI / Test (pytest) (3.12) (push) Successful in 20s
Security / SAST (bandit) (push) Successful in 12s
CI / Open PR to main (push) Has been cancelled
Security / Dependency audit (pip-audit) (push) Successful in 18s
2026-04-04 17:42:42 -03:00
6a5c6f098e Remove accidentally committed artifacts and update .gitignore
All checks were successful
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Successful in 18s
CI / Test (pytest) (3.12) (push) Successful in 18s
Security / SAST (bandit) (push) Successful in 12s
Security / Dependency audit (pip-audit) (push) Successful in 19s
2026-04-04 17:36:35 -03:00
988732f4f9 Fix all ruff lint errors across decnet/, templates/, and tests/
Some checks failed
CI / Test (pytest) (3.11) (push) Has been cancelled
CI / Test (pytest) (3.12) (push) Has been cancelled
Security / SAST (bandit) (push) Has been cancelled
Security / Dependency audit (pip-audit) (push) Has been cancelled
CI / Lint (ruff) (push) Has been cancelled
2026-04-04 17:36:16 -03:00
4acfa3f779 Fix pip-audit skipping local editable package
Some checks failed
CI / Lint (ruff) (push) Failing after 10s
CI / Test (pytest) (3.11) (push) Successful in 18s
CI / Test (pytest) (3.12) (push) Successful in 19s
Security / SAST (bandit) (push) Successful in 11s
Security / Dependency audit (pip-audit) (push) Successful in 18s
2026-04-04 17:31:16 -03:00
35c67ec34d Fix registry auto-discovery skipping non-service subclasses (CustomService)
Some checks failed
CI / Lint (ruff) (push) Failing after 11s
CI / Test (pytest) (3.11) (push) Successful in 19s
CI / Test (pytest) (3.12) (push) Successful in 18s
Security / SAST (bandit) (push) Successful in 11s
Security / Dependency audit (pip-audit) (push) Successful in 18s
2026-04-04 17:29:30 -03:00
fe7354554f Add bandit, pip-audit and trivy to CI/CD security pipeline
Some checks failed
CI / Lint (ruff) (push) Failing after 10s
CI / Test (pytest) (3.11) (push) Failing after 39s
CI / Test (pytest) (3.12) (push) Failing after 1m4s
Security / SAST (bandit) (push) Successful in 11s
Security / Dependency audit (pip-audit) (push) Successful in 18s
2026-04-04 17:24:43 -03:00
b3b3597011 Add smoke test: verify all modules import cleanly
Some checks failed
CI / Lint (ruff) (push) Failing after 4s
CI / Test (pytest) (3.11) (push) Failing after 3s
CI / Test (pytest) (3.12) (push) Failing after 4s
2026-04-04 17:18:21 -03:00
38b1efa8c0 Add Gitea Actions CI/CD workflows and ruff dependency
Some checks failed
CI / Test (pytest) (3.11) (push) Failing after 3s
CI / Test (pytest) (3.12) (push) Failing after 3s
CI / Lint (ruff) (push) Failing after 1m49s
2026-04-04 17:16:45 -03:00
d7a6aeff86 dev: add pytest as core dependency 2026-04-04 16:29:17 -03:00
634 changed files with 70475 additions and 5569 deletions

View File

@@ -1,7 +0,0 @@
{
"permissions": {
"allow": [
"mcp__plugin_context-mode_context-mode__ctx_batch_execute"
]
}
}

12
.env.example Normal file
View File

@@ -0,0 +1,12 @@
# API Options
DECNET_API_HOST=0.0.0.0
DECNET_API_PORT=8000
DECNET_JWT_SECRET=supersecretkey12345678901234567
DECNET_INGEST_LOG_FILE=/var/log/decnet/decnet.log
# Web Dashboard Options
DECNET_WEB_HOST=0.0.0.0
DECNET_WEB_PORT=8080
DECNET_ADMIN_USER=admin
DECNET_ADMIN_PASSWORD=admin
DECNET_DEVELOPER=False

175
.gitea/workflows/ci.yml Normal file
View File

@@ -0,0 +1,175 @@
name: CI
on:
push:
branches: [dev, testing, "temp/merge-*"]
paths-ignore:
- "**/*.md"
- "docs/**"
jobs:
lint:
name: Lint (ruff)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install ruff
- run: ruff check .
bandit:
name: SAST (bandit)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install bandit
- run: bandit -r decnet/ -ll -x decnet/services/registry.py -x decnet/templates/
pip-audit:
name: Dependency audit (pip-audit)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install pip-audit
- run: pip install -e .[dev]
- run: pip-audit --skip-editable --ignore-vuln CVE-2025-65896
test-standard:
name: Test (Standard)
runs-on: ubuntu-latest
needs: [lint, bandit, pip-audit]
strategy:
matrix:
python-version: ["3.11"]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- run: pip install -e .[dev]
- run: pytest
test-live:
name: Test (Live)
runs-on: ubuntu-latest
needs: [test-standard]
services:
mysql:
image: mysql:8.0
env:
MYSQL_ROOT_PASSWORD: root
MYSQL_DATABASE: decnet_test
ports:
- 3307:3306
options: >-
--health-cmd="mysqladmin ping -h 127.0.0.1"
--health-interval=10s
--health-timeout=5s
--health-retries=5
strategy:
matrix:
python-version: ["3.11"]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- run: pip install -e .[dev]
- run: pytest -m live
env:
DECNET_MYSQL_HOST: 127.0.0.1
DECNET_MYSQL_PORT: 3307
DECNET_MYSQL_USER: root
DECNET_MYSQL_PASSWORD: root
DECNET_MYSQL_DATABASE: decnet_test
test-fuzz:
name: Test (Fuzz)
runs-on: ubuntu-latest
needs: [test-live]
strategy:
matrix:
python-version: ["3.11"]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- run: pip install -e .[dev]
- run: pytest -m fuzz
env:
SCHEMATHESIS_CONFIG: schemathesis.ci.toml
merge-to-testing:
name: Merge dev → testing
runs-on: ubuntu-latest
needs: [test-standard, test-live, test-fuzz]
if: github.ref == 'refs/heads/dev'
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.DECNET_PR_TOKEN }}
- name: Configure git
run: |
git config user.name "DECNET CI"
git config user.email "ci@decnet.local"
- name: Merge dev into testing
run: |
git fetch origin testing
git checkout testing
git merge origin/dev --no-ff -m "ci: auto-merge dev → testing [skip ci]"
git push origin testing
prepare-merge-to-main:
name: Prepare Merge to Main
runs-on: ubuntu-latest
needs: [test-standard, test-live, test-fuzz]
if: github.ref == 'refs/heads/testing'
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.DECNET_PR_TOKEN }}
- name: Configure git
run: |
git config user.name "DECNET CI"
git config user.email "ci@decnet.local"
- name: Create temp branch and sync with main
run: |
git fetch origin main
git checkout -b temp/merge-testing-to-main
echo "--- Switched to temp branch, merging main into it ---"
git merge origin/main --no-edit || { echo "CONFLICT: Manual resolution required"; exit 1; }
git push origin temp/merge-testing-to-main --force
finalize-merge-to-main:
name: Finalize Merge to Main
runs-on: ubuntu-latest
needs: [test-standard, test-live, test-fuzz]
if: startsWith(github.ref, 'refs/heads/temp/merge-')
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.DECNET_PR_TOKEN }}
- name: Configure git
run: |
git config user.name "DECNET CI"
git config user.email "ci@decnet.local"
- name: Merge RC into main
run: |
git fetch origin main
git checkout main
git merge ${{ github.ref }} --no-ff -m "ci: auto-merge testing → main"
git push origin main
echo "--- Cleaning up temp branch ---"
git push origin --delete ${{ github.ref_name }}

57
.gitea/workflows/pr.yml Normal file
View File

@@ -0,0 +1,57 @@
name: PR Gate
on:
pull_request:
branches: [main]
paths-ignore:
- "**/*.md"
- "docs/**"
jobs:
lint:
name: Lint (ruff)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install ruff
- run: ruff check .
test:
name: Test (pytest)
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.11", "3.12"]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- run: pip install -e .[dev]
- run: pytest tests/ -v --tb=short
bandit:
name: SAST (bandit)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install bandit
- run: bandit -r decnet/ -ll -x decnet/services/registry.py
pip-audit:
name: Dependency audit (pip-audit)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install pip-audit
- run: pip install -e .[dev]
- run: pip-audit --skip-editable

View File

@@ -0,0 +1,135 @@
name: Release
on:
push:
branches: [main]
paths-ignore:
- "**/*.md"
- "docs/**"
env:
REGISTRY: git.resacachile.cl
OWNER: anti
jobs:
tag:
name: Auto-tag release
runs-on: ubuntu-latest
outputs:
version: ${{ steps.version.outputs.version }}
tag_created: ${{ steps.tag.outputs.created }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.DECNET_PR_TOKEN }}
- name: Configure git
run: |
git config user.name "DECNET CI"
git config user.email "ci@decnet.local"
- name: Bump version and Tag
id: version
run: |
# Calculate next version (v0.x)
LATEST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
NEXT_VER=$(python3 -c "
tag = '$LATEST_TAG'.lstrip('v')
parts = tag.split('.')
major = int(parts[0]) if parts[0] else 0
minor = int(parts[1]) if len(parts) > 1 else 0
print(f'{major}.{minor + 1}.0')
")
echo "Next version: $NEXT_VER (calculated from $LATEST_TAG)"
# Update pyproject.toml
sed -i "s/^version = \".*\"/version = \"$NEXT_VER\"/" pyproject.toml
git add pyproject.toml
git commit -m "chore: auto-release v$NEXT_VER [skip ci]" || echo "No changes to commit"
CHANGELOG=$(git log ${LATEST_TAG}..HEAD --oneline --no-decorate --no-merges)
git tag -a "v$NEXT_VER" -m "Auto-release v$NEXT_VER
Changes since $LATEST_TAG:
$CHANGELOG"
git push origin main --follow-tags
echo "version=$NEXT_VER" >> $GITHUB_OUTPUT
echo "created=true" >> $GITHUB_OUTPUT
docker:
name: Build, scan & push ${{ matrix.service }}
runs-on: ubuntu-latest
needs: tag
strategy:
fail-fast: false
matrix:
service:
- conpot
- docker_api
- elasticsearch
- ftp
- http
- imap
- k8s
- ldap
- llmnr
- mongodb
- mqtt
- mssql
- mysql
- pop3
- postgres
- rdp
- redis
- sip
- smb
- smtp
- snmp
- ssh
- telnet
- tftp
- vnc
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Gitea container registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ secrets.REGISTRY_USER }}
password: ${{ secrets.REGISTRY_TOKEN }}
- name: Build image locally
uses: docker/build-push-action@v5
with:
context: templates/${{ matrix.service }}
load: true
push: false
tags: decnet-${{ matrix.service }}:scan
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Install Trivy
run: |
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
- name: Scan with Trivy
run: |
trivy image --exit-code 1 --severity CRITICAL --ignore-unfixed decnet-${{ matrix.service }}:scan
- name: Push image
if: success()
uses: docker/build-push-action@v5
with:
context: templates/${{ matrix.service }}
push: true
tags: |
${{ env.REGISTRY }}/${{ env.OWNER }}/decnet-${{ matrix.service }}:latest
${{ env.REGISTRY }}/${{ env.OWNER }}/decnet-${{ matrix.service }}:v${{ needs.tag.outputs.version }}
cache-from: type=gha

21
.gitignore vendored
View File

@@ -1,4 +1,7 @@
.venv/
logs/
.claude/*
CLAUDE.md
__pycache__/
*.pyc
*.pyo
@@ -8,4 +11,20 @@ build/
decnet-compose.yml
decnet-state.json
*.ini
.env
decnet.log*
*.loggy
*.nmap
linterfails.log
webmail
windows1
*.db
*.db-shm
*.db-wal
decnet.*.log
decnet.json
.env*
.env.local
.coverage
.hypothesis/
profiles/*
tests/test_decnet.db*

View File

@@ -1,56 +0,0 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Commands
```bash
# Install (dev)
pip install -e .
# List registered service plugins
decnet services
# Dry-run (generates compose, no containers)
decnet deploy --mode unihost --deckies 3 --randomize-services --dry-run
# Full deploy (requires root for MACVLAN)
sudo decnet deploy --mode unihost --deckies 5 --interface eth0 --randomize-services
sudo decnet deploy --mode unihost --deckies 3 --services ssh,smb --log-target 192.168.1.5:5140
# Status / teardown
decnet status
sudo decnet teardown --all
sudo decnet teardown --id decky-01
```
## Project Overview
DECNET is a honeypot/deception network framework. It deploys fake machines (called **deckies**) with realistic services (RDP, SMB, SSH, FTP, etc.) to lure and profile attackers. All attacker interactions are aggregated to an isolated logging network (ELK stack / SIEM).
## Deployment Models
**UNIHOST** — one real host spins up _n_ deckies via a container orchestrator. Simpler, single-machine deployment.
**SWARM (MULTIHOST)**_n_ real hosts each running deckies. Orchestrated via Ansible/sshpass or similar tooling.
## Core Technology Choices
- **Containers**: Docker Compose is the starting point but other orchestration frameworks should be evaluated if they serve the project better. `debian:bookworm-slim` is the default base image; mixing in Ubuntu, CentOS, or other distros is encouraged to make the decoy network look heterogeneous.
- **Networking**: Deckies need to appear as real machines on the LAN (own MACs/IPs). MACVLAN and IPVLAN are candidates; the right driver depends on the host environment. WSL has known limitations — bare metal or a VM is preferred for testing.
- **Log pipeline**: Logstash → ELK stack → SIEM (isolated network, not reachable from decoy network)
## Architecture Constraints
- The decoy network must be reachable from the outside (attacker-facing).
- The logging/aggregation network must be isolated from the decoy network.
- A publicly accessible real server acts as the bridge between the two networks.
- Deckies should differ in exposed services and OS fingerprints to appear as a heterogeneous network.
## Development and testing
- For every new feature, pytests must me made.
- Pytest is the main testing framework in use.
- NEVER pass broken code to the user.
- Broken means: not running, not passing 100% tests, etc.
- After tests pass with 100%, always git commit your changes.

164
README.md
View File

@@ -69,7 +69,7 @@ From the outside a decky looks identical to a real machine: it has its own MAC a
## Installation
```bash
git clone <repo-url> DECNET
git clone https://git.resacachile.cl/anti/DECNET
cd DECNET
pip install -e .
```
@@ -207,6 +207,26 @@ sudo decnet deploy --deckies 4 --archetype windows-workstation
[corp-workstations]
archetype = windows-workstation
amount = 4
[win-fileserver]
services = ftp
nmap_os = windows
os_version = Windows Server 2019
[dbsrv01]
ip = 192.168.1.112
services = mysql, http
nmap_os = linux
[dbsrv01.http]
server_header = Apache/2.4.54 (Debian)
response_code = 200
fake_app = wordpress
[dbsrv01.mysql]
mysql_version = 5.7.38-log
mysql_banner = MySQL Community Server
```
---
@@ -454,7 +474,7 @@ Key/value pairs are passed directly to the service plugin as persona config. Com
| `mongodb` | `mongo_version` |
| `elasticsearch` | `es_version`, `cluster_name` |
| `ldap` | `base_dn`, `domain` |
| `snmp` | `snmp_community`, `sys_descr` |
| `snmp` | `snmp_community`, `sys_descr`, `snmp_archetype` (picks predefined sysDescr for `water_plant`, `hospital`, etc.) |
| `mqtt` | `mqtt_version` |
| `sip` | `sip_server`, `sip_domain` |
| `k8s` | `k8s_version` |
@@ -470,6 +490,34 @@ See [`test-full.ini`](test-full.ini) — covers all 25 services across 10 role-t
---
## Environment Configuration (.env)
DECNET supports loading configuration from `.env.local` and `.env` files located in the project root. This is useful for securing secrets like the JWT key and configuring default ports without passing flags every time.
An example `.env.example` is provided:
```ini
# API Options
DECNET_API_HOST=0.0.0.0
DECNET_API_PORT=8000
DECNET_JWT_SECRET=supersecretkey12345
DECNET_INGEST_LOG_FILE=/var/log/decnet/decnet.log
# Web Dashboard Options
DECNET_WEB_HOST=0.0.0.0
DECNET_WEB_PORT=8080
DECNET_ADMIN_USER=admin
DECNET_ADMIN_PASSWORD=admin
# Database pool tuning (applies to both SQLite and MySQL)
DECNET_DB_POOL_SIZE=20 # base pool connections (default: 20)
DECNET_DB_MAX_OVERFLOW=40 # extra connections under burst (default: 40)
```
Copy `.env.example` to `.env.local` and modify it to suit your environment.
---
## Logging
All attacker interactions are forwarded off the decoy network to an isolated logging sink. The log pipeline lives on a separate internal Docker bridge (`decnet_logs`) that is not reachable from the fake LAN.
@@ -631,3 +679,115 @@ The test suite covers:
| `test_cli_service_pool.py` | CLI service resolution |
Every new feature requires passing tests before merging.
### Stress Testing
A [Locust](https://locust.io)-based stress test suite lives in `tests/stress/`. It hammers every API endpoint with realistic traffic patterns to find throughput ceilings and latency degradation.
```bash
# Run via pytest (starts its own server)
pytest -m stress tests/stress/ -v -x -n0 -s
# Crank it up
STRESS_USERS=2000 STRESS_SPAWN_RATE=200 STRESS_DURATION=120 pytest -m stress tests/stress/ -v -x -n0 -s
# Standalone Locust web UI against a running server
locust -f tests/stress/locustfile.py --host http://localhost:8000
```
| Env var | Default | Description |
|---|---|---|
| `STRESS_USERS` | `500` | Total simulated users |
| `STRESS_SPAWN_RATE` | `50` | Users spawned per second |
| `STRESS_DURATION` | `60` | Test duration in seconds |
| `STRESS_WORKERS` | CPU count (max 4) | Uvicorn workers for the test server |
| `STRESS_MIN_RPS` | `500` | Minimum RPS to pass baseline test |
| `STRESS_MAX_P99_MS` | `200` | Maximum p99 latency (ms) to pass |
| `STRESS_SPIKE_USERS` | `1000` | Users for thundering herd test |
| `STRESS_SUSTAINED_USERS` | `200` | Users for sustained load test |
#### Measured baseline
Reference numbers from recent Locust runs against a MySQL backend
(asyncmy driver). All runs hold zero failures throughout.
**Single worker** (unless noted):
| Metric | 500u, tracing on | 1500u, tracing on | 1500u, tracing **off** | 1500u, tracing off, **pinned to 1 core** | 1500u, tracing off, **12 workers** |
|---|---|---|---|---|---|
| Requests served | 396,672 | 232,648 | 277,214 | 3,532 | 308,024 |
| Failures | 0 | 0 | 0 | 0 | 0 |
| Throughput (current RPS) | ~960 | ~880 | ~990 | ~46 | ~1,585 |
| Average latency | 465 ms | 1,774 ms | 1,489 ms | 21.7 s | 930 ms |
| Median (p50) | 100 ms | 690 ms | 340 ms | 270 ms | 700 ms |
| p95 | 1.9 s | 6.5 s | 5.7 s | 115 s | 2.7 s |
| p99 | 2.9 s | 9.5 s | 8.4 s | 122 s | 4.2 s |
| Max observed | 8.3 s | 24.4 s | 20.9 s | 124.5 s | 16.5 s |
Ramp is 15 users/s for the 500u column, 40 users/s otherwise.
Takeaways:
- **Tracing off**: at 1500 users, flipping `DECNET_TRACING=false`
halves p50 (690 → 340 ms) and pushes RPS from ~880 past the
500-user figure on a single worker.
- **12 workers**: RPS scales ~1.6× over a single worker (~990 →
~1585). Sublinear because the workload is DB-bound — MySQL and the
connection pool become the new ceiling, not Python. p99 drops from
8.4 s to 4.2 s.
- **Connection math**: `DECNET_DB_POOL_SIZE=20` × `DECNET_DB_MAX_OVERFLOW=40`
× 12 workers = 720 connections at peak. MySQL's default
`max_connections=151` needs bumping (we used 2000) before running
multi-worker load.
- **Single-core pinning**: ~46 RPS with p95 near two minutes. Interesting
as a "physics floor" datapoint — not a production config.
Top endpoints by volume: `/api/v1/attackers`, `/api/v1/deckies`,
`/api/v1/bounty`, `/api/v1/logs/histogram`, `/api/v1/config`,
`/api/v1/health`, `/api/v1/auth/login`, `/api/v1/logs`.
Notes on tuning:
- **Python 3.14 is currently a no-go for the API server.** Under heavy
concurrent async load the reworked 3.14 GC segfaults inside
`mark_all_reachable` (observed in `_PyGC_Collect` during pending-GC
on 3.14.3). Stick to Python 3.113.13 until upstream stabilises.
- Router-level TTL caches on hot count/stats endpoints (`/stats`,
`/logs` count, `/attackers` count, `/bounty`, `/logs/histogram`,
`/deckies`, `/config`) collapse concurrent duplicate work onto a
single DB hit per window — essential to reach this RPS on one worker.
- Turning off request tracing (`DECNET_TRACING=false`) is the next
free headroom: tracing was still on during the run above.
- On SQLite, `DECNET_DB_POOL_PRE_PING=false` skips the per-checkout
`SELECT 1`. On MySQL, keep it `true` — network disconnects are real.
#### System tuning: open file limit
Under heavy load (500+ concurrent users), the server will exhaust the default Linux open file limit (`ulimit -n`), causing `OSError: [Errno 24] Too many open files`. Most distros default to **1024**, which is far too low for stress testing or production use.
**Before running stress tests:**
```bash
# Check current limit
ulimit -n
# Bump for this shell session
ulimit -n 65536
```
**Permanent fix** — add to `/etc/security/limits.conf`:
```
* soft nofile 65536
* hard nofile 65536
```
Or for systemd-managed services, add `LimitNOFILE=65536` to the unit file.
> This applies to production deployments too — any server handling hundreds of concurrent connections needs a raised file descriptor limit.
# AI Disclosure
This project has been made with lots, and I mean lots of help from AIs. While most of the design was made by me, most of the coding was done by AI models.
Nevertheless, this project will be kept under high scrutiny by humans.

64
decnet.ini.example Normal file
View File

@@ -0,0 +1,64 @@
; /etc/decnet/decnet.ini — DECNET host configuration
;
; Copy to /etc/decnet/decnet.ini and edit. Values here seed os.environ at
; CLI startup via setdefault() — real env vars still win, so you can
; override any value on the shell without editing this file.
;
; A missing file is fine; every daemon has sensible defaults. The main
; reason to use this file is to skip typing the same flags on every
; `decnet` invocation and to pin a host's role via `mode`.
[decnet]
; mode = agent | master
; agent — worker host (runs `decnet agent`, `decnet forwarder`, `decnet updater`).
; Master-only commands (api, swarmctl, swarm, deploy, teardown, ...)
; are hidden from `decnet --help` and refuse to run.
; master — central server (runs `decnet api`, `decnet web`, `decnet swarmctl`,
; `decnet listener`). All commands visible.
mode = agent
; disallow-master = true (default when mode=agent)
; Set to false for hybrid dev hosts that legitimately run both roles.
disallow-master = true
; log-directory — root for DECNET's per-component logs. Systemd units set
; DECNET_SYSTEM_LOGS=<log-directory>/decnet.<component>.log so agent, forwarder,
; and engine each get their own file. The forwarder tails decnet.log.
log-directory = /var/log/decnet
; ─── Agent-only settings (read when mode=agent) ───────────────────────────
[agent]
; Where the master's syslog-TLS listener lives. DECNET_SWARM_MASTER_HOST.
master-host = 192.168.1.50
; Master listener port (RFC 5425 default 6514). DECNET_SWARM_SYSLOG_PORT.
swarm-syslog-port = 6514
; Bind address/port for this worker's agent API (mTLS).
agent-port = 8765
; Cert bundle dir — must contain ca.crt, worker.crt, worker.key from enroll.
; DECNET_AGENT_DIR — honored by the forwarder child as well.
agent-dir = /home/anti/.decnet/agent
; Updater cert bundle (required for `decnet updater`).
updater-dir = /home/anti/.decnet/updater
; ─── Master-only settings (read when mode=master) ─────────────────────────
[master]
; Main API (REST for the React dashboard). DECNET_API_HOST / _PORT.
api-host = 0.0.0.0
api-port = 8000
; React dev-server dashboard (`decnet web`). DECNET_WEB_HOST / _PORT.
web-host = 0.0.0.0
web-port = 8080
; Swarm controller (master-internal). DECNET_SWARMCTL_HOST isn't exposed
; under that name today — this block is the forward-compatible spelling.
; swarmctl-host = 127.0.0.1
; swarmctl-port = 8770
; Syslog-over-TLS listener bind address and port. DECNET_LISTENER_HOST and
; DECNET_SWARM_SYSLOG_PORT. The listener is auto-spawned by `decnet swarmctl`.
listener-host = 0.0.0.0
swarm-syslog-port = 6514
; Master CA dir (for enroll / swarm cert issuance).
; ca-dir = /home/anti/.decnet/ca
; JWT secret for the web API. MUST be set; 32+ bytes. Keep out of git.
; jwt-secret = REPLACE_ME_WITH_A_32_BYTE_SECRET

View File

@@ -1,159 +0,0 @@
<134>1 2026-04-04T07:40:53.045660+00:00 decky-devops k8s - startup - Kubernetes API server starting as decky-devops
<134>1 2026-04-04T07:40:53.058000+00:00 decky-devops docker_api - startup - Docker API server starting as decky-devops
<134>1 2026-04-04T07:40:53.147349+00:00 decky-legacy vnc - startup - VNC server starting as decky-legacy
<134>1 2026-04-04T07:40:53.224094+00:00 decky-fileserv tftp - startup - TFTP server starting as decky-fileserv
<134>1 2026-04-04T07:40:53.231313+00:00 decky-fileserv ftp - startup - FTP server starting as decky-fileserv on port 21
<134>1 2026-04-04T07:40:53.237175+00:00 decky-fileserv smb - startup - SMB server starting as decky-fileserv
<134>1 2026-04-04T07:40:53.331998+00:00 decky-webmail imap - startup - IMAP server starting as decky-webmail
<134>1 2026-04-04T07:40:53.441710+00:00 decky-webmail http - startup - HTTP server starting as decky-webmail
<134>1 2026-04-04T07:40:53.482287+00:00 decky-webmail smtp - startup - SMTP server starting as decky-webmail
<134>1 2026-04-04T07:40:53.487752+00:00 decky-webmail pop3 - startup - POP3 server starting as decky-webmail
<134>1 2026-04-04T07:40:53.493478+00:00 decky-iot mqtt - startup - MQTT server starting as decky-iot
<134>1 2026-04-04T07:40:53.519136+00:00 decky-iot snmp - startup - SNMP server starting as decky-iot
<134>1 2026-04-04T07:40:53.586186+00:00 decky-voip sip - startup - SIP server starting as decky-voip
<134>1 2026-04-04T07:40:53.734237+00:00 decky-dbsrv02 postgres - startup - PostgreSQL server starting as decky-dbsrv02
<134>1 2026-04-04T07:40:53.746573+00:00 decky-voip llmnr - startup - LLMNR/mDNS server starting as decky-voip
<134>1 2026-04-04T07:40:53.792767+00:00 decky-dbsrv02 elasticsearch - startup - Elasticsearch server starting as decky-dbsrv02
<134>1 2026-04-04T07:40:53.817558+00:00 decky-dbsrv02 mongodb - startup - MongoDB server starting as decky-dbsrv02
<134>1 2026-04-04T07:40:53.848912+00:00 decky-ldapdc ldap - startup - LDAP server starting as decky-ldapdc
<134>1 2026-04-04T07:40:53.860378+00:00 decky-winbox rdp - startup - RDP server starting as decky-winbox on port 3389
<134>1 2026-04-04T07:40:53.911084+00:00 decky-winbox mssql - startup - MSSQL server starting as decky-winbox
<134>1 2026-04-04T07:40:53.978994+00:00 decky-winbox smb - startup - SMB server starting as decky-winbox
<134>1 2026-04-04T07:41:07.439918+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="46462"]
<134>1 2026-04-04T07:41:07.439922+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="54734"]
<134>1 2026-04-04T07:41:07.439868+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="54606"]
<134>1 2026-04-04T07:41:07.440333+00:00 decky-fileserv ftp - connection [decnet@55555 src_ip="192.168.1.5" src_port="39736"]
<134>1 2026-04-04T07:41:07.442465+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:13.446744+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="GET / HTTP/1.0"]
<134>1 2026-04-04T07:41:13.446743+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd=""]
<134>1 2026-04-04T07:41:13.447251+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd=""]
<134>1 2026-04-04T07:41:13.446995+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/" remote_addr="192.168.1.5" headers="{}" body=""]
<134>1 2026-04-04T07:41:13.447556+00:00 decky-fileserv ftp - disconnect [decnet@55555 src_ip="192.168.1.5" src_port="39736"]
<134>1 2026-04-04T07:41:18.451412+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:18.451529+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:18.451729+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="55996"]
<134>1 2026-04-04T07:41:18.451746+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="36592"]
<134>1 2026-04-04T07:41:18.451844+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="OPTIONS / HTTP/1.0"]
<134>1 2026-04-04T07:41:18.451928+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd=""]
<134>1 2026-04-04T07:41:23.456442+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:23.456408+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:24.734697+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="36604"]
<134>1 2026-04-04T07:41:24.736542+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="36606"]
<134>1 2026-04-04T07:41:24.737069+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="56204"]
<134>1 2026-04-04T07:41:24.737449+00:00 decky-fileserv ftp - connection [decnet@55555 src_ip="192.168.1.5" src_port="48992"]
<134>1 2026-04-04T07:41:24.737834+00:00 decky-fileserv ftp - connection [decnet@55555 src_ip="192.168.1.5" src_port="48994"]
<134>1 2026-04-04T07:41:24.738282+00:00 decky-fileserv ftp - connection [decnet@55555 src_ip="192.168.1.5" src_port="49002"]
<134>1 2026-04-04T07:41:24.738760+00:00 decky-fileserv ftp - connection [decnet@55555 src_ip="192.168.1.5" src_port="49004"]
<134>1 2026-04-04T07:41:24.739240+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="36622"]
<134>1 2026-04-04T07:41:24.741300+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="STLS"]
<134>1 2026-04-04T07:41:24.741346+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="STLS"]
<134>1 2026-04-04T07:41:24.741319+00:00 decky-webmail smtp - ehlo [decnet@55555 src="192.168.1.5" domain="nmap.scanme.org"]
<134>1 2026-04-04T07:41:24.741391+00:00 decky-fileserv ftp - user [decnet@55555 username="anonymous"]
<134>1 2026-04-04T07:41:24.741474+00:00 decky-fileserv ftp - user [decnet@55555 username="anonymous"]
<134>1 2026-04-04T07:41:24.741374+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/nmaplowercheck1775288484" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.741566+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/.git/HEAD" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.741988+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.742327+00:00 decky-webmail http - request [decnet@55555 method="PROPFIND" path="/" remote_addr="192.168.1.5" headers="{'Depth': '0', 'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.742608+00:00 decky-webmail http - request [decnet@55555 method="POST" path="/" remote_addr="192.168.1.5" headers="{'Content-Length': '88', 'Connection': 'close', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': '192.168.1.110'}" body="<methodCall> <methodName>system.listMethods</methodName> <params></params> </methodCall>"]
<134>1 2026-04-04T07:41:24.742807+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.741701+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/" remote_addr="192.168.1.5" headers="{}" body=""]
<134>1 2026-04-04T07:41:24.742699+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.742135+00:00 decky-webmail http - request [decnet@55555 method="POST" path="/sdk" remote_addr="192.168.1.5" headers="{'Content-Length': '441', 'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body="<soap:Envelope xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Header><operationID>00000001-00000001</operationID></soap:Header><soap:Body><RetrieveServiceContent xmlns=\"urn:internalvim25\"><_this xsi:type=\"ManagedObjectReference\" type=\"ServiceInstance\">ServiceInstance</_this></RetrieveServiceContent></soap:Body></soap:Envelope>"]
<134>1 2026-04-04T07:41:24.742460+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'HEAD', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:24.745408+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:24.745793+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:24.745837+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="AUTH NTLM"]
<134>1 2026-04-04T07:41:24.745797+00:00 decky-fileserv ftp - user [decnet@55555 username="anonymous"]
<134>1 2026-04-04T07:41:24.745960+00:00 decky-fileserv ftp - auth_attempt [decnet@55555 username="anonymous" password="IEUser@"]
<134>1 2026-04-04T07:41:24.745842+00:00 decky-webmail http - request [decnet@55555 method="FGDH" path="/" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.746083+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="56216"]
<134>1 2026-04-04T07:41:24.746041+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="56008"]
<134>1 2026-04-04T07:41:24.745961+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'GET', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:24.746514+00:00 decky-fileserv ftp - auth_attempt [decnet@55555 username="anonymous" password="IEUser@"]
<134>1 2026-04-04T07:41:24.746245+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/NmapUpperCheck1775288484" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.746723+00:00 decky-fileserv ftp - disconnect [decnet@55555 src_ip="192.168.1.5" src_port="48994"]
<134>1 2026-04-04T07:41:24.746073+00:00 decky-webmail http - request [decnet@55555 method="PROPFIND" path="/" remote_addr="192.168.1.5" headers="{'Content-Length': '0', 'Connection': 'close', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Host': '192.168.1.110', 'Depth': '1'}" body=""]
<134>1 2026-04-04T07:41:24.795603+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="TlRMTVNTUAABAAAAB4IIoAAAAAAAAAAAAAAAAAAAAAA="]
<134>1 2026-04-04T07:41:24.795629+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:24.795621+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="56016"]
<134>1 2026-04-04T07:41:24.795604+00:00 decky-fileserv ftp - auth_attempt [decnet@55555 username="anonymous" password="IEUser@"]
<134>1 2026-04-04T07:41:24.795738+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.795928+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/robots.txt" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.796118+00:00 decky-webmail http - request [decnet@55555 method="PROPFIND" path="/" remote_addr="192.168.1.5" headers="{'Depth': '0', 'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.845180+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="56226"]
<134>1 2026-04-04T07:41:24.845355+00:00 decky-webmail smtp - ehlo [decnet@55555 src="192.168.1.5" domain="nmap.scanme.org"]
<134>1 2026-04-04T07:41:24.845379+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'POST', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:24.894554+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:24.894871+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/Nmap/folder/check1775288484" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.895133+00:00 decky-webmail http - request [decnet@55555 method="POST" path="/" remote_addr="192.168.1.5" headers="{'Content-Length': '0', 'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:24.944224+00:00 decky-webmail smtp - ehlo [decnet@55555 src="192.168.1.5" domain="nmap.scanme.org"]
<134>1 2026-04-04T07:41:24.944215+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="56032"]
<134>1 2026-04-04T07:41:24.944346+00:00 decky-webmail smtp - unknown_command [decnet@55555 src="192.168.1.5" command="HELP"]
<134>1 2026-04-04T07:41:24.994175+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:24.994238+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="56234"]
<134>1 2026-04-04T07:41:24.994534+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'PUT', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:25.044450+00:00 decky-webmail smtp - auth_attempt [decnet@55555 src="192.168.1.5" command="AUTH NTLM"]
<134>1 2026-04-04T07:41:25.044450+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="000b AUTHENTICATE NTLM"]
<134>1 2026-04-04T07:41:25.044580+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:25.044674+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:25.093812+00:00 decky-webmail smtp - ehlo [decnet@55555 src="192.168.1.5" domain="nmap.scanme.org"]
<134>1 2026-04-04T07:41:25.094022+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/favicon.ico" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Connection': 'close'}" body=""]
<134>1 2026-04-04T07:41:25.142989+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="TlRMTVNTUAABAAAAB4IIoAAAAAAAAAAAAAAAAAAAAAA="]
<134>1 2026-04-04T07:41:25.143126+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'DELETE', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:25.241565+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:25.241690+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:25.290930+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:25.291070+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'TRACE', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:25.438930+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'OPTIONS', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:25.586609+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'CONNECT', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:25.734144+00:00 decky-webmail http - request [decnet@55555 method="OPTIONS" path="/" remote_addr="192.168.1.5" headers="{'Connection': 'close', 'Origin': 'example.com', 'User-Agent': 'Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)', 'Access-Control-Request-Method': 'PATCH', 'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:29.778527+00:00 decky-fileserv ftp - disconnect [decnet@55555 src_ip="192.168.1.5" src_port="49004"]
<134>1 2026-04-04T07:41:31.976898+00:00 decky-fileserv ftp - disconnect [decnet@55555 src_ip="192.168.1.5" src_port="48992"]
<134>1 2026-04-04T07:41:33.746244+00:00 decky-fileserv ftp - disconnect [decnet@55555 src_ip="192.168.1.5" src_port="49002"]
<134>1 2026-04-04T07:41:33.747544+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="39972"]
<134>1 2026-04-04T07:41:33.748339+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/" remote_addr="192.168.1.5" headers="{}" body=""]
<134>1 2026-04-04T07:41:33.748742+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="39984"]
<134>1 2026-04-04T07:41:33.748916+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="($<03>i<EFBFBD><69>jÁ{Bк<>F<EFBFBD><46><02><>(ri[;z <09>s~_?<3F> <20>+Ō,7n/.<0F><><14>P<EFBFBD>PO<50><4F>3=<3D>\\<5C>0RS<19>r395/<2F>,<2C>0<00>̨̩̪<CCA8><CCAA><EFBFBD><EFBFBD><EFBFBD>\]<5D>a<EFBFBD>S<EFBFBD>+<2B>/<00><><EFBFBD><EFBFBD><EFBFBD><EFBFBD>\\<5C>`<60>R<EFBFBD>$"]
<134>1 2026-04-04T07:41:33.748959+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="<22><00><> <09>E<00><><EFBFBD><EFBFBD>Q<00><><EFBFBD><EFBFBD>P=<00><<00><00>Ai<> "]
<134>1 2026-04-04T07:41:33.748983+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="<11><11><11>#
(&    "]
<134>1 2026-04-04T07:41:33.749009+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd=" +-3<04><04><11><04>aq<61>څv<DA85>+DS[\\<5C><><EFBFBD>c-'4R<34>(<28><>a<EFBFBD>J<08><>L<>2^7<><37>luѡ<75><D1A1>v<EFBFBD>^<05>g%Y<><59><EFBFBD><EFBFBD>Sx<53>r<EFBFBD>-jR<12><>C#b<><62><EFBFBD>r<EFBFBD><72>"]
<134>1 2026-04-04T07:41:33.749035+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="<22>i<EFBFBD><69><EFBFBD>TLػ<4C><13>A<EFBFBD>1<EFBFBD>s<EFBFBD><73>'"]
<134>1 2026-04-04T07:41:33.749060+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="<22>4,<2C><> <0C>
<EFBFBD>G<1B><>q<EFBFBD>–B仠<42><01> K7O<37>Y<EFBFBD>rq<><71><EFBFBD>3VtzD<7A><44>̨"]
<134>1 2026-04-04T07:41:33.749041+00:00 decky-webmail http - request [decnet@55555 method="GET" path="/" remote_addr="192.168.1.5" headers="{'Host': '192.168.1.110'}" body=""]
<134>1 2026-04-04T07:41:33.749083+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="Ѓu<0F>Y<EFBFBD><59><EFBFBD><EFBFBD>-<2D>\"<22><>eSp*Zֹ L<><4C>{ <09>#<23>:<3A><><EFBFBD><EFBFBD>9!ɂCm<43>I<EFBFBD>$ݦ1ϻo-H<><48><EFBFBD>*<2A><17>X<EFBFBD><58>{<7B><><EFBFBD><EFBFBD>p<EFBFBD>ޚ|W<><57>ƫf <16><>T<EFBFBD>%<25>F5<46>8<EFBFBD><38><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>WU<57>a<EFBFBD><61>c ><3E><> u\]<5D><>i~<7E>V<EFBFBD><56><EFBFBD>&<26>z"]
<134>1 2026-04-04T07:41:33.749104+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="<22>1<16>\\<5C><>Wc<57>C<EFBFBD><1B><><76>6z<36> <20><>0<EFBFBD>$iS<69> 3'<27>8<<3C>"]
<134>1 2026-04-04T07:41:33.749122+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="<10><>2<EFBFBD><32>"]
<134>1 2026-04-04T07:41:33.749138+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="<22><>\"/<2F> <0B>E<08><><EFBFBD>tv!"]
<134>1 2026-04-04T07:41:33.749160+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="񋞸<>)<29><>[j}<7D>`<60><>\\V|k<><6B>ԣy<D4A3>Y<EFBFBD><59>?<05>2<EFBFBD>`<17>w¬ܶ#<23>X}<7D><>[cg3<67>W8E<38>tl<74>y<<3C>Z<1B>ʇ<EFBFBD><CA87><EFBFBD>% dQBk9=+<2B><07>ȳ<EFBFBD><16><>(<28>y<EFBFBD><79><EFBFBD><EFBFBD>*[8<><38><EFBFBD>qyN`<60><><EFBFBD>5>j<><6A> 825<13>f<EFBFBD><66>2. s\\dLar"]
<134>1 2026-04-04T07:41:33.749238+00:00 decky-webmail imap - connect [decnet@55555 src="192.168.1.5" src_port="39996"]
<134>1 2026-04-04T07:41:33.749290+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="WSi<><69><EFBFBD>,g<>O<EFBFBD>(T<>YC<59><01>ѢO<D1A2>Ę<EFBFBD><16><><EFBFBD><EFBFBD>"]
<134>1 2026-04-04T07:41:33.749328+00:00 decky-webmail imap - command [decnet@55555 src="192.168.1.5" cmd="/"]
<134>1 2026-04-04T07:41:33.749369+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.749411+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.749441+00:00 decky-webmail imap - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.749484+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="47822"]
<134>1 2026-04-04T07:41:33.749708+00:00 decky-webmail smtp - ehlo [decnet@55555 src="192.168.1.5" domain="nmap.scanme.org"]
<134>1 2026-04-04T07:41:33.749852+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.749936+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="47834"]
<134>1 2026-04-04T07:41:33.750118+00:00 decky-webmail smtp - connect [decnet@55555 src="192.168.1.5" src_port="47846"]
<134>1 2026-04-04T07:41:33.750202+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.750261+00:00 decky-webmail smtp - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.750423+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="48678"]
<134>1 2026-04-04T07:41:33.750684+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="STLS"]
<134>1 2026-04-04T07:41:33.750772+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.750852+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="48684"]
<134>1 2026-04-04T07:41:33.750920+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="($h_\\n<>W<EFBFBD>f 6~<10><><EFBFBD>'U<><55>ԥ\"{<7B><><EFBFBD>jg<04> <20>*M<>$<24><><EFBFBD>at}5gq<67><71>)<29>X<7F>w<EFBFBD>7<EFBFBD><37>_<>r395/<2F>,<2C>0<00>̨̩̪<CCA8><CCAA><EFBFBD><EFBFBD><EFBFBD>\]<5D>a<EFBFBD>S<EFBFBD>+<2B>/<00><><EFBFBD><EFBFBD><EFBFBD><EFBFBD>\\<5C>`<60>R<EFBFBD>"]
<134>1 2026-04-04T07:41:33.750964+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="<22><00><> <09>E<00><><EFBFBD><EFBFBD>Q<00><><EFBFBD><EFBFBD>P=<00><<00><00>Ai<> "]
<134>1 2026-04-04T07:41:33.750997+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="<11><11><11>#
(&    "]
<134>1 2026-04-04T07:41:33.751027+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd=" +-3<04><04><11><04><>pEt<45>\"g3<67>Ff` c<>FY4<59>2<EFBFBD>$3<>t<EFBFBD><74>Q<EFBFBD>QKR/ <20>+5<><35><EFBFBD><EFBFBD> q
<EFBFBD>&<26>@<40><><EFBFBD><EFBFBD><07><><1F>B<EFBFBD><42>(?<3F>3<EFBFBD>R/
<EFBFBD>3<EFBFBD>qr<EFBFBD>! <20>"]
<134>1 2026-04-04T07:41:33.751096+00:00 decky-webmail pop3 - connect [decnet@55555 src="192.168.1.5" src_port="48698"]
<134>1 2026-04-04T07:41:33.751153+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="WSi<><69><EFBFBD>{<7B><><EFBFBD>5<EFBFBD><35><01>R<EFBFBD><52>!<21>;jj
<EFBFBD><EFBFBD><EFBFBD> "]
<134>1 2026-04-04T07:41:33.751197+00:00 decky-webmail pop3 - command [decnet@55555 src="192.168.1.5" cmd="/"]
<134>1 2026-04-04T07:41:33.751245+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]
<134>1 2026-04-04T07:41:33.751285+00:00 decky-webmail pop3 - disconnect [decnet@55555 src="192.168.1.5"]

View File

@@ -0,0 +1,12 @@
"""DECNET — honeypot deception-network framework.
This __init__ runs once, on the first `import decnet.*`. It seeds
os.environ from /etc/decnet/decnet.ini (if present) so that later
module-level reads in decnet.env pick up the INI values as if they had
been exported by the shell. Real env vars always win via setdefault().
Kept minimal on purpose — any heavier work belongs in a submodule.
"""
from decnet.config_ini import load_ini_config as _load_ini_config
_load_ini_config()

7
decnet/agent/__init__.py Normal file
View File

@@ -0,0 +1,7 @@
"""DECNET worker agent — runs on every SWARM worker host.
Exposes an mTLS-protected FastAPI service the master's SWARM controller
calls to deploy, mutate, and tear down deckies locally. The agent reuses
the existing `decnet.engine.deployer` code path unchanged, so a worker runs
deckies the same way `decnet deploy --mode unihost` does today.
"""

144
decnet/agent/app.py Normal file
View File

@@ -0,0 +1,144 @@
"""Worker-side FastAPI app.
Protected by mTLS at the ASGI/uvicorn transport layer: uvicorn is started
with ``--ssl-ca-certs`` + ``--ssl-cert-reqs 2`` (CERT_REQUIRED), so any
client that cannot prove a cert signed by the DECNET CA is rejected before
reaching a handler. Once past the TLS handshake, all peers are trusted
equally (the only entity holding a CA-signed cert is the master
controller).
Endpoints mirror the existing unihost CLI verbs:
* ``POST /deploy`` — body: serialized ``DecnetConfig``
* ``POST /teardown`` — body: optional ``{"decky_id": "..."}``
* ``POST /mutate`` — body: ``{"decky_id": "...", "services": [...]}``
* ``GET /status`` — deployment snapshot
* ``GET /health`` — liveness probe, does NOT require mTLS? No — mTLS
still required; master pings it with its cert.
"""
from __future__ import annotations
from contextlib import asynccontextmanager
from typing import Optional
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from decnet.agent import executor as _exec
from decnet.agent import heartbeat as _heartbeat
from decnet.config import DecnetConfig
from decnet.logging import get_logger
log = get_logger("agent.app")
@asynccontextmanager
async def _lifespan(app: FastAPI):
# Best-effort: if identity/bundle plumbing isn't configured (e.g. dev
# runs or non-enrolled hosts), heartbeat.start() is a silent no-op.
_heartbeat.start()
try:
yield
finally:
await _heartbeat.stop()
app = FastAPI(
title="DECNET SWARM Agent",
version="0.1.0",
docs_url=None, # no interactive docs on worker — narrow attack surface
redoc_url=None,
openapi_url=None,
lifespan=_lifespan,
responses={
400: {"description": "Malformed request body"},
500: {"description": "Executor error"},
},
)
# ------------------------------------------------------------------ schemas
class DeployRequest(BaseModel):
config: DecnetConfig = Field(..., description="Full DecnetConfig to materialise on this worker")
dry_run: bool = False
no_cache: bool = False
class TeardownRequest(BaseModel):
decky_id: Optional[str] = None
class MutateRequest(BaseModel):
decky_id: str
services: list[str]
# ------------------------------------------------------------------ routes
@app.get("/health")
async def health() -> dict[str, str]:
return {"status": "ok"}
@app.get("/status")
async def status() -> dict:
return await _exec.status()
@app.post(
"/deploy",
responses={500: {"description": "Deployer raised an exception materialising the config"}},
)
async def deploy(req: DeployRequest) -> dict:
try:
await _exec.deploy(req.config, dry_run=req.dry_run, no_cache=req.no_cache)
except Exception as exc:
log.exception("agent.deploy failed")
raise HTTPException(status_code=500, detail=str(exc)) from exc
return {"status": "deployed", "deckies": len(req.config.deckies)}
@app.post(
"/teardown",
responses={500: {"description": "Teardown raised an exception"}},
)
async def teardown(req: TeardownRequest) -> dict:
try:
await _exec.teardown(req.decky_id)
except Exception as exc:
log.exception("agent.teardown failed")
raise HTTPException(status_code=500, detail=str(exc)) from exc
return {"status": "torn_down", "decky_id": req.decky_id}
@app.post(
"/self-destruct",
responses={500: {"description": "Reaper could not be scheduled"}},
)
async def self_destruct() -> dict:
"""Stop all DECNET services on this worker and delete the install
footprint. Called by the master during decommission. Logs under
/var/log/decnet* are preserved. Fire-and-forget — returns 202 before
the reaper starts deleting files."""
try:
await _exec.self_destruct()
except Exception as exc:
log.exception("agent.self_destruct failed")
raise HTTPException(status_code=500, detail=str(exc)) from exc
return {"status": "self_destruct_scheduled"}
@app.post(
"/mutate",
responses={501: {"description": "Worker-side mutate not yet implemented"}},
)
async def mutate(req: MutateRequest) -> dict:
# TODO: implement worker-side mutate. Currently the master performs
# mutation by re-sending a full /deploy with the updated DecnetConfig;
# this avoids duplicating mutation logic on the worker for v1. When
# ready, replace the 501 with a real redeploy-of-a-single-decky path.
raise HTTPException(
status_code=501,
detail="Per-decky mutate is performed via /deploy with updated services",
)

223
decnet/agent/executor.py Normal file
View File

@@ -0,0 +1,223 @@
"""Thin adapter between the agent's HTTP endpoints and the existing
``decnet.engine.deployer`` code path.
Kept deliberately small: the agent does not re-implement deployment logic,
it only translates a master RPC into the same function calls the unihost
CLI already uses. Everything runs in a worker thread (the deployer is
blocking) so the FastAPI event loop stays responsive.
"""
from __future__ import annotations
import asyncio
from ipaddress import IPv4Network
from typing import Any
from decnet.engine import deployer as _deployer
from decnet.config import DecnetConfig, load_state, clear_state
from decnet.logging import get_logger
from decnet.network import (
allocate_ips,
detect_interface,
detect_subnet,
get_host_ip,
)
log = get_logger("agent.executor")
def _relocalize(config: DecnetConfig) -> DecnetConfig:
"""Rewrite a master-built config to the worker's local network reality.
The master populates ``interface``/``subnet``/``gateway`` from its own
box before dispatching, which blows up the deployer on any worker whose
NIC name differs (common in heterogeneous fleets — master on ``wlp6s0``,
worker on ``enp0s3``). We always re-detect locally; if the worker sits
on a different subnet than the master, decky IPs are re-allocated from
the worker's subnet so they're actually reachable.
"""
local_iface = detect_interface()
local_subnet, local_gateway = detect_subnet(local_iface)
local_host_ip = get_host_ip(local_iface)
updates: dict[str, Any] = {
"interface": local_iface,
"subnet": local_subnet,
"gateway": local_gateway,
}
master_net = IPv4Network(config.subnet, strict=False) if config.subnet else None
local_net = IPv4Network(local_subnet, strict=False)
if master_net is None or master_net != local_net:
log.info(
"agent.deploy subnet mismatch master=%s local=%s — re-allocating decky IPs",
config.subnet, local_subnet,
)
fresh_ips = allocate_ips(
subnet=local_subnet,
gateway=local_gateway,
host_ip=local_host_ip,
count=len(config.deckies),
)
new_deckies = [d.model_copy(update={"ip": ip}) for d, ip in zip(config.deckies, fresh_ips)]
updates["deckies"] = new_deckies
return config.model_copy(update=updates)
async def deploy(config: DecnetConfig, dry_run: bool = False, no_cache: bool = False) -> None:
"""Run the blocking deployer off-loop. The deployer itself calls
save_state() internally once the compose file is materialised."""
log.info(
"agent.deploy mode=%s deckies=%d interface=%s (incoming)",
config.mode, len(config.deckies), config.interface,
)
if config.mode == "swarm":
config = _relocalize(config)
log.info(
"agent.deploy relocalized interface=%s subnet=%s gateway=%s",
config.interface, config.subnet, config.gateway,
)
await asyncio.to_thread(_deployer.deploy, config, dry_run, no_cache, False)
async def teardown(decky_id: str | None = None) -> None:
log.info("agent.teardown decky_id=%s", decky_id)
await asyncio.to_thread(_deployer.teardown, decky_id)
if decky_id is None:
await asyncio.to_thread(clear_state)
def _decky_runtime_states(config: DecnetConfig) -> dict[str, dict[str, Any]]:
"""Map decky_name → {"running": bool, "services": {svc: container_state}}.
Queried so the master can tell, after a partial-failure deploy, which
deckies actually came up instead of tainting the whole shard as failed.
Best-effort: a docker error returns an empty map, not an exception.
"""
try:
import docker # local import — agent-only path
client = docker.from_env()
live = {c.name: c.status for c in client.containers.list(all=True, ignore_removed=True)}
except Exception: # pragma: no cover — defensive
log.exception("_decky_runtime_states: docker query failed")
return {}
out: dict[str, dict[str, Any]] = {}
for d in config.deckies:
svc_states = {
svc: live.get(f"{d.name}-{svc.replace('_', '-')}", "absent")
for svc in d.services
}
out[d.name] = {
"running": bool(svc_states) and all(s == "running" for s in svc_states.values()),
"services": svc_states,
}
return out
_REAPER_SCRIPT = r"""#!/bin/bash
# DECNET agent self-destruct reaper.
# Runs detached from the agent process so it survives the agent's death.
# Waits briefly for the HTTP response to drain, then stops services,
# wipes install paths, and preserves logs.
set +e
sleep 3
# Stop decky containers started by the local deployer (best-effort).
if command -v docker >/dev/null 2>&1; then
docker ps -q --filter "label=com.docker.compose.project=decnet" | xargs -r docker stop
docker ps -aq --filter "label=com.docker.compose.project=decnet" | xargs -r docker rm -f
docker network rm decnet_lan 2>/dev/null
fi
# Stop+disable every systemd unit the installer may have dropped.
for unit in decnet-agent decnet-engine decnet-collector decnet-forwarder decnet-prober decnet-sniffer decnet-updater; do
systemctl stop "$unit" 2>/dev/null
systemctl disable "$unit" 2>/dev/null
done
# Nuke install paths. Logs under /var/log/decnet* are intentionally
# preserved — the operator typically wants them for forensic review.
rm -rf /opt/decnet* /var/lib/decnet/* /usr/local/bin/decnet* /etc/decnet
rm -f /etc/systemd/system/decnet-*.service /etc/systemd/system/decnet-*.timer
systemctl daemon-reload 2>/dev/null
rm -f "$0"
"""
async def self_destruct() -> None:
"""Tear down deckies, then spawn a detached reaper that wipes the
install footprint. Returns immediately so the HTTP response can drain
before the reaper starts deleting files out from under the agent."""
import os
import shutil
import subprocess # nosec B404
import tempfile
# Best-effort teardown first — the reaper also runs docker stop, but
# going through the deployer gives the host-macvlan/ipvlan helper a
# chance to clean up routes cleanly.
try:
await asyncio.to_thread(_deployer.teardown, None)
await asyncio.to_thread(clear_state)
except Exception:
log.exception("self_destruct: pre-reap teardown failed — reaper will force-stop containers")
# Reaper lives under /tmp so it survives rm -rf /opt/decnet*.
fd, path = tempfile.mkstemp(prefix="decnet-reaper-", suffix=".sh", dir="/tmp") # nosec B108 — reaper must outlive /opt/decnet removal
try:
os.write(fd, _REAPER_SCRIPT.encode())
finally:
os.close(fd)
os.chmod(path, 0o700) # nosec B103 — root-owned reaper, needs exec
# The reaper MUST run outside decnet-agent.service's cgroup — otherwise
# `systemctl stop decnet-agent` SIGTERMs the whole cgroup (reaper included)
# before rm -rf completes. `start_new_session=True` gets us a fresh POSIX
# session but does NOT escape the systemd cgroup. So we prefer
# `systemd-run --scope` (launches the command in a transient scope
# detached from the caller's service), falling back to a bare Popen if
# systemd-run is unavailable (non-systemd host / container).
systemd_run = shutil.which("systemd-run")
if systemd_run:
argv = [
systemd_run,
"--collect",
"--unit", f"decnet-reaper-{os.getpid()}",
"--description", "DECNET agent self-destruct reaper",
"/bin/bash", path,
]
spawn_kwargs = {"start_new_session": True}
else:
argv = ["/bin/bash", path]
spawn_kwargs = {"start_new_session": True}
subprocess.Popen( # nosec B603
argv,
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
close_fds=True,
**spawn_kwargs,
)
log.warning(
"self_destruct: reaper spawned path=%s via=%s — agent will die in ~3s",
path, "systemd-run" if systemd_run else "popen",
)
async def status() -> dict[str, Any]:
state = await asyncio.to_thread(load_state)
if state is None:
return {"deployed": False, "deckies": []}
config, _compose_path = state
runtime = await asyncio.to_thread(_decky_runtime_states, config)
return {
"deployed": True,
"mode": config.mode,
"compose_path": str(_compose_path),
"deckies": [d.model_dump() for d in config.deckies],
"runtime": runtime,
}

134
decnet/agent/heartbeat.py Normal file
View File

@@ -0,0 +1,134 @@
"""Agent → master liveness heartbeat loop.
Every ``INTERVAL_S`` seconds the worker posts ``executor.status()`` to
``POST <master>/swarm/heartbeat`` over mTLS. The master pins the
presented client cert's SHA-256 against the ``SwarmHost`` row for the
claimed ``host_uuid``; a match refreshes ``last_heartbeat`` + each
``DeckyShard``'s snapshot + runtime state.
Identity comes from ``/etc/decnet/decnet.ini`` (seeded by the enroll
bundle) — specifically ``DECNET_HOST_UUID`` and ``DECNET_MASTER_HOST``.
The worker's existing ``~/.decnet/agent/`` bundle (or
``/etc/decnet/agent/``) provides the mTLS client cert.
Started/stopped via the agent FastAPI app's lifespan. If identity
plumbing is missing (pre-enrollment dev runs) the loop logs at DEBUG and
declines to start — callers don't have to guard it.
"""
from __future__ import annotations
import asyncio
import pathlib
from typing import Optional
import httpx
from decnet.agent import executor as _exec
from decnet.logging import get_logger
from decnet.swarm import pki
from decnet.swarm.log_forwarder import build_worker_ssl_context
log = get_logger("agent.heartbeat")
INTERVAL_S = 30.0
_TIMEOUT = httpx.Timeout(connect=5.0, read=10.0, write=5.0, pool=5.0)
_task: Optional[asyncio.Task] = None
def _resolve_agent_dir() -> pathlib.Path:
"""Match the agent-dir resolution order used by the agent server:
DECNET_AGENT_DIR env, else /etc/decnet/agent (production install),
else ~/.decnet/agent (dev)."""
import os
env = os.environ.get("DECNET_AGENT_DIR")
if env:
return pathlib.Path(env)
system = pathlib.Path("/etc/decnet/agent")
if system.exists():
return system
return pki.DEFAULT_AGENT_DIR
async def _tick(client: httpx.AsyncClient, url: str, host_uuid: str, agent_version: str) -> None:
snap = await _exec.status()
resp = await client.post(
url,
json={
"host_uuid": host_uuid,
"agent_version": agent_version,
"status": snap,
},
)
# 403 / 404 are terminal-ish — we still keep looping because an
# operator may re-enrol the host mid-session, but we log loudly so
# prod ops can spot cert-pinning drift.
if resp.status_code == 204:
return
log.warning(
"heartbeat rejected status=%d body=%s",
resp.status_code, resp.text[:200],
)
async def _loop(url: str, host_uuid: str, agent_version: str, ssl_ctx) -> None:
log.info("heartbeat loop starting url=%s host_uuid=%s interval=%ss",
url, host_uuid, INTERVAL_S)
async with httpx.AsyncClient(verify=ssl_ctx, timeout=_TIMEOUT) as client:
while True:
try:
await _tick(client, url, host_uuid, agent_version)
except asyncio.CancelledError:
raise
except Exception:
log.exception("heartbeat tick failed — will retry in %ss", INTERVAL_S)
await asyncio.sleep(INTERVAL_S)
def start() -> Optional[asyncio.Task]:
"""Kick off the background heartbeat task. No-op if identity is
unconfigured (dev mode) — the caller doesn't need to check."""
global _task
from decnet.env import (
DECNET_HOST_UUID,
DECNET_MASTER_HOST,
DECNET_SWARMCTL_PORT,
)
if _task is not None and not _task.done():
return _task
if not DECNET_HOST_UUID or not DECNET_MASTER_HOST:
log.debug("heartbeat not starting — DECNET_HOST_UUID or DECNET_MASTER_HOST unset")
return None
agent_dir = _resolve_agent_dir()
try:
ssl_ctx = build_worker_ssl_context(agent_dir)
except Exception:
log.exception("heartbeat not starting — worker SSL context unavailable at %s", agent_dir)
return None
try:
from decnet import __version__ as _v
agent_version = _v
except Exception:
agent_version = "unknown"
url = f"https://{DECNET_MASTER_HOST}:{DECNET_SWARMCTL_PORT}/swarm/heartbeat"
_task = asyncio.create_task(
_loop(url, DECNET_HOST_UUID, agent_version, ssl_ctx),
name="agent-heartbeat",
)
return _task
async def stop() -> None:
global _task
if _task is None:
return
_task.cancel()
try:
await _task
except (asyncio.CancelledError, Exception):
pass
_task = None

70
decnet/agent/server.py Normal file
View File

@@ -0,0 +1,70 @@
"""Worker-agent uvicorn launcher.
Starts ``decnet.agent.app:app`` over HTTPS with mTLS enforcement. The
worker must already have a bundle in ``~/.decnet/agent/`` (delivered by
``decnet swarm enroll`` from the master); if it does not, we refuse to
start — unauthenticated agents are not a supported mode.
"""
from __future__ import annotations
import os
import pathlib
import signal
import subprocess # nosec B404
import sys
from decnet.logging import get_logger
from decnet.swarm import pki
log = get_logger("agent.server")
def run(host: str, port: int, agent_dir: pathlib.Path = pki.DEFAULT_AGENT_DIR) -> int:
bundle = pki.load_worker_bundle(agent_dir)
if bundle is None:
print(
f"[agent] No cert bundle at {agent_dir}. "
f"Run `decnet swarm enroll` from the master first.",
file=sys.stderr,
)
return 2
keyfile = agent_dir / "worker.key"
certfile = agent_dir / "worker.crt"
cafile = agent_dir / "ca.crt"
cmd = [
sys.executable,
"-m",
"uvicorn",
"decnet.agent.app:app",
"--host",
host,
"--port",
str(port),
"--ssl-keyfile",
str(keyfile),
"--ssl-certfile",
str(certfile),
"--ssl-ca-certs",
str(cafile),
# 2 == ssl.CERT_REQUIRED — clients MUST present a CA-signed cert.
"--ssl-cert-reqs",
"2",
]
log.info("agent starting host=%s port=%d bundle=%s", host, port, agent_dir)
# Own process group for clean Ctrl+C / SIGTERM propagation to uvicorn
# workers (same pattern as `decnet api`).
proc = subprocess.Popen(cmd, start_new_session=True) # nosec B603
try:
return proc.wait()
except KeyboardInterrupt:
try:
os.killpg(proc.pid, signal.SIGTERM)
try:
return proc.wait(timeout=10)
except subprocess.TimeoutExpired:
os.killpg(proc.pid, signal.SIGKILL)
return proc.wait()
except ProcessLookupError:
return 0

View File

@@ -148,7 +148,7 @@ ARCHETYPES: dict[str, Archetype] = {
slug="deaddeck",
display_name="Deaddeck (Entry Point)",
description="Internet-facing entry point with real interactive SSH — no honeypot emulation",
services=["real_ssh"],
services=["ssh"],
preferred_distros=["debian", "ubuntu22"],
nmap_os="linux",
),
@@ -167,4 +167,4 @@ def all_archetypes() -> dict[str, Archetype]:
def random_archetype() -> Archetype:
return random.choice(list(ARCHETYPES.values()))
return random.choice(list(ARCHETYPES.values())) # nosec B311

View File

@@ -1,461 +0,0 @@
"""
DECNET CLI — entry point for all commands.
Usage:
decnet deploy --mode unihost --deckies 5 --randomize-services
decnet status
decnet teardown [--all | --id decky-01]
decnet services
"""
import random
from typing import Optional
import typer
from rich.console import Console
from rich.table import Table
from decnet.archetypes import Archetype, all_archetypes, get_archetype
from decnet.config import (
DeckyConfig,
DecnetConfig,
random_hostname,
)
from decnet.distros import all_distros, get_distro, random_distro
from decnet.ini_loader import IniConfig, load_ini
from decnet.network import detect_interface, detect_subnet, allocate_ips, get_host_ip
from decnet.services.registry import all_services
app = typer.Typer(
name="decnet",
help="Deploy a deception network of honeypot deckies on your LAN.",
no_args_is_help=True,
)
console = Console()
def _all_service_names() -> list[str]:
"""Return all registered service names from the live plugin registry."""
return sorted(all_services().keys())
def _resolve_distros(
distros_explicit: list[str] | None,
randomize_distros: bool,
n: int,
archetype: Archetype | None = None,
) -> list[str]:
"""Return a list of n distro slugs based on CLI flags or archetype preference."""
if distros_explicit:
return [distros_explicit[i % len(distros_explicit)] for i in range(n)]
if randomize_distros:
return [random_distro().slug for _ in range(n)]
if archetype:
pool = archetype.preferred_distros
return [pool[i % len(pool)] for i in range(n)]
# Default: cycle through all distros to maximize heterogeneity
slugs = list(all_distros().keys())
return [slugs[i % len(slugs)] for i in range(n)]
def _build_deckies(
n: int,
ips: list[str],
services_explicit: list[str] | None,
randomize_services: bool,
distros_explicit: list[str] | None = None,
randomize_distros: bool = False,
archetype: Archetype | None = None,
) -> list[DeckyConfig]:
deckies = []
used_combos: set[frozenset] = set()
distro_slugs = _resolve_distros(distros_explicit, randomize_distros, n, archetype)
for i, ip in enumerate(ips):
name = f"decky-{i + 1:02d}"
distro = get_distro(distro_slugs[i])
hostname = random_hostname(distro.slug)
if services_explicit:
svc_list = services_explicit
elif archetype:
svc_list = list(archetype.services)
elif randomize_services:
svc_pool = _all_service_names()
attempts = 0
while True:
count = random.randint(1, min(3, len(svc_pool)))
chosen = frozenset(random.sample(svc_pool, count))
attempts += 1
if chosen not in used_combos or attempts > 20:
break
svc_list = list(chosen)
used_combos.add(chosen)
else:
typer.echo("Error: provide --services, --archetype, or --randomize-services.", err=True)
raise typer.Exit(1)
deckies.append(
DeckyConfig(
name=name,
ip=ip,
services=svc_list,
distro=distro.slug,
base_image=distro.image,
build_base=distro.build_base,
hostname=hostname,
archetype=archetype.slug if archetype else None,
nmap_os=archetype.nmap_os if archetype else "linux",
)
)
return deckies
def _build_deckies_from_ini(
ini: IniConfig,
subnet_cidr: str,
gateway: str,
host_ip: str,
randomize: bool,
) -> list[DeckyConfig]:
"""Build DeckyConfig list from an IniConfig, auto-allocating missing IPs."""
from ipaddress import IPv4Address, IPv4Network
explicit_ips: set[IPv4Address] = {
IPv4Address(s.ip) for s in ini.deckies if s.ip
}
net = IPv4Network(subnet_cidr, strict=False)
reserved = {
net.network_address,
net.broadcast_address,
IPv4Address(gateway),
IPv4Address(host_ip),
} | explicit_ips
auto_pool = (str(addr) for addr in net.hosts() if addr not in reserved)
deckies: list[DeckyConfig] = []
for spec in ini.deckies:
# Resolve archetype (if any) — explicit services/distro override it
arch: Archetype | None = None
if spec.archetype:
try:
arch = get_archetype(spec.archetype)
except ValueError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
# Distro: archetype preferred list → random → global cycle
distro_pool = arch.preferred_distros if arch else list(all_distros().keys())
distro = get_distro(distro_pool[len(deckies) % len(distro_pool)])
hostname = random_hostname(distro.slug)
ip = spec.ip or next(auto_pool, None)
if ip is None:
raise RuntimeError(
f"Not enough free IPs in {subnet_cidr} while assigning IP for '{spec.name}'."
)
if spec.services:
known = set(_all_service_names())
unknown = [s for s in spec.services if s not in known]
if unknown:
console.print(
f"[red]Unknown service(s) in [{spec.name}]: {unknown}. "
f"Available: {_all_service_names()}[/]"
)
raise typer.Exit(1)
svc_list = spec.services
elif arch:
svc_list = list(arch.services)
elif randomize:
svc_pool = _all_service_names()
count = random.randint(1, min(3, len(svc_pool)))
svc_list = random.sample(svc_pool, count)
else:
console.print(
f"[red]Decky '[{spec.name}]' has no services= in config. "
"Add services=, archetype=, or use --randomize-services.[/]"
)
raise typer.Exit(1)
# nmap_os priority: explicit INI key > archetype default > "linux"
resolved_nmap_os = spec.nmap_os or (arch.nmap_os if arch else "linux")
deckies.append(DeckyConfig(
name=spec.name,
ip=ip,
services=svc_list,
distro=distro.slug,
base_image=distro.image,
build_base=distro.build_base,
hostname=hostname,
archetype=arch.slug if arch else None,
service_config=spec.service_config,
nmap_os=resolved_nmap_os,
))
return deckies
@app.command()
def deploy(
mode: str = typer.Option("unihost", "--mode", "-m", help="Deployment mode: unihost | swarm"),
deckies: Optional[int] = typer.Option(None, "--deckies", "-n", help="Number of deckies to deploy (required without --config)", min=1),
interface: Optional[str] = typer.Option(None, "--interface", "-i", help="Host NIC (auto-detected if omitted)"),
subnet: Optional[str] = typer.Option(None, "--subnet", help="LAN subnet CIDR (auto-detected if omitted)"),
ip_start: Optional[str] = typer.Option(None, "--ip-start", help="First decky IP (auto if omitted)"),
services: Optional[str] = typer.Option(None, "--services", help="Comma-separated services, e.g. ssh,smb,rdp"),
randomize_services: bool = typer.Option(False, "--randomize-services", help="Assign random services to each decky"),
distro: Optional[str] = typer.Option(None, "--distro", help="Comma-separated distro slugs, e.g. debian,ubuntu22,rocky9"),
randomize_distros: bool = typer.Option(False, "--randomize-distros", help="Assign a random distro to each decky"),
log_target: Optional[str] = typer.Option(None, "--log-target", help="Forward logs to ip:port (e.g. 192.168.1.5:5140)"),
log_file: Optional[str] = typer.Option(None, "--log-file", help="Write RFC 5424 syslog to this path inside containers (e.g. /var/log/decnet/decnet.log)"),
archetype_name: Optional[str] = typer.Option(None, "--archetype", "-a", help="Machine archetype slug (e.g. linux-server, windows-workstation)"),
dry_run: bool = typer.Option(False, "--dry-run", help="Generate compose file without starting containers"),
no_cache: bool = typer.Option(False, "--no-cache", help="Force rebuild all images, ignoring Docker layer cache"),
ipvlan: bool = typer.Option(False, "--ipvlan", help="Use IPvlan L2 instead of MACVLAN (required on WiFi interfaces)"),
config_file: Optional[str] = typer.Option(None, "--config", "-c", help="Path to INI config file"),
) -> None:
"""Deploy deckies to the LAN."""
if mode not in ("unihost", "swarm"):
console.print("[red]--mode must be 'unihost' or 'swarm'[/]")
raise typer.Exit(1)
# ------------------------------------------------------------------ #
# Config-file path #
# ------------------------------------------------------------------ #
if config_file:
try:
ini = load_ini(config_file)
except FileNotFoundError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
# CLI flags override INI values when explicitly provided
iface = interface or ini.interface or detect_interface()
subnet_cidr = subnet or ini.subnet
effective_gateway = ini.gateway
if subnet_cidr is None:
subnet_cidr, effective_gateway = detect_subnet(iface)
elif effective_gateway is None:
_, effective_gateway = detect_subnet(iface)
host_ip = get_host_ip(iface)
console.print(f"[dim]Config:[/] {config_file} [dim]Interface:[/] {iface} "
f"[dim]Subnet:[/] {subnet_cidr} [dim]Gateway:[/] {effective_gateway} "
f"[dim]Host IP:[/] {host_ip}")
# Register bring-your-own services from INI before validation
if ini.custom_services:
from decnet.custom_service import CustomService
from decnet.services.registry import register_custom_service
for cs in ini.custom_services:
register_custom_service(
CustomService(
name=cs.name,
image=cs.image,
exec_cmd=cs.exec_cmd,
ports=cs.ports,
)
)
effective_log_target = log_target or ini.log_target
effective_log_file = log_file
decky_configs = _build_deckies_from_ini(
ini, subnet_cidr, effective_gateway, host_ip, randomize_services
)
# ------------------------------------------------------------------ #
# Classic CLI path #
# ------------------------------------------------------------------ #
else:
if deckies is None:
console.print("[red]--deckies is required when --config is not used.[/]")
raise typer.Exit(1)
services_list = [s.strip() for s in services.split(",")] if services else None
if services_list:
known = set(_all_service_names())
unknown = [s for s in services_list if s not in known]
if unknown:
console.print(f"[red]Unknown service(s): {unknown}. Available: {_all_service_names()}[/]")
raise typer.Exit(1)
# Resolve archetype if provided
arch: Archetype | None = None
if archetype_name:
try:
arch = get_archetype(archetype_name)
except ValueError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
if not services_list and not randomize_services and not arch:
console.print("[red]Specify --services, --archetype, or --randomize-services.[/]")
raise typer.Exit(1)
iface = interface or detect_interface()
if subnet is None:
subnet_cidr, effective_gateway = detect_subnet(iface)
else:
subnet_cidr = subnet
_, effective_gateway = detect_subnet(iface)
host_ip = get_host_ip(iface)
console.print(f"[dim]Interface:[/] {iface} [dim]Subnet:[/] {subnet_cidr} "
f"[dim]Gateway:[/] {effective_gateway} [dim]Host IP:[/] {host_ip}")
distros_list = [d.strip() for d in distro.split(",")] if distro else None
if distros_list:
try:
for slug in distros_list:
get_distro(slug)
except ValueError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
ips = allocate_ips(subnet_cidr, effective_gateway, host_ip, deckies, ip_start)
decky_configs = _build_deckies(
deckies, ips, services_list, randomize_services,
distros_explicit=distros_list, randomize_distros=randomize_distros,
archetype=arch,
)
effective_log_target = log_target
effective_log_file = log_file
config = DecnetConfig(
mode=mode,
interface=iface,
subnet=subnet_cidr,
gateway=effective_gateway,
deckies=decky_configs,
log_target=effective_log_target,
log_file=effective_log_file,
ipvlan=ipvlan,
)
if effective_log_target and not dry_run:
from decnet.logging.forwarder import probe_log_target
if not probe_log_target(effective_log_target):
console.print(f"[yellow]Warning: log target {effective_log_target} is unreachable. "
"Logs will be lost if it stays down.[/]")
from decnet.deployer import deploy as _deploy
_deploy(config, dry_run=dry_run, no_cache=no_cache)
@app.command()
def status() -> None:
"""Show running deckies and their status."""
from decnet.deployer import status as _status
_status()
@app.command()
def teardown(
all_: bool = typer.Option(False, "--all", help="Tear down all deckies and remove network"),
id_: Optional[str] = typer.Option(None, "--id", help="Tear down a specific decky by name"),
) -> None:
"""Stop and remove deckies."""
if not all_ and not id_:
console.print("[red]Specify --all or --id <name>.[/]")
raise typer.Exit(1)
from decnet.deployer import teardown as _teardown
_teardown(decky_id=id_)
@app.command(name="services")
def list_services() -> None:
"""List all registered honeypot service plugins."""
svcs = all_services()
table = Table(title="Available Services", show_lines=True)
table.add_column("Name", style="bold cyan")
table.add_column("Ports")
table.add_column("Image")
for name, svc in sorted(svcs.items()):
table.add_row(name, ", ".join(str(p) for p in svc.ports), svc.default_image)
console.print(table)
@app.command(name="distros")
def list_distros() -> None:
"""List all available OS distro profiles for deckies."""
table = Table(title="Available Distro Profiles", show_lines=True)
table.add_column("Slug", style="bold cyan")
table.add_column("Display Name")
table.add_column("Docker Image", style="dim")
for slug, profile in sorted(all_distros().items()):
table.add_row(slug, profile.display_name, profile.image)
console.print(table)
@app.command(name="correlate")
def correlate(
log_file: Optional[str] = typer.Option(None, "--log-file", "-f", help="Path to DECNET syslog file to analyse"),
min_deckies: int = typer.Option(2, "--min-deckies", "-m", help="Minimum number of distinct deckies an IP must touch to be reported"),
output: str = typer.Option("table", "--output", "-o", help="Output format: table | json | syslog"),
emit_syslog: bool = typer.Option(False, "--emit-syslog", help="Also print traversal events as RFC 5424 lines (for SIEM piping)"),
) -> None:
"""Analyse logs for cross-decky traversals and print the attacker movement graph."""
import sys
import json as _json
from pathlib import Path
from decnet.correlation.engine import CorrelationEngine
engine = CorrelationEngine()
if log_file:
path = Path(log_file)
if not path.exists():
console.print(f"[red]Log file not found: {log_file}[/]")
raise typer.Exit(1)
engine.ingest_file(path)
elif not sys.stdin.isatty():
for line in sys.stdin:
engine.ingest(line)
else:
console.print("[red]Provide --log-file or pipe log data via stdin.[/]")
raise typer.Exit(1)
traversals = engine.traversals(min_deckies)
if output == "json":
console.print_json(_json.dumps(engine.report_json(min_deckies), indent=2))
elif output == "syslog":
for line in engine.traversal_syslog_lines(min_deckies):
typer.echo(line)
else:
if not traversals:
console.print(
f"[yellow]No traversals detected "
f"(min_deckies={min_deckies}, events_indexed={engine.events_indexed}).[/]"
)
else:
console.print(engine.report_table(min_deckies))
console.print(
f"[dim]Parsed {engine.lines_parsed} lines · "
f"indexed {engine.events_indexed} events · "
f"{len(engine.all_attackers())} unique IPs · "
f"[bold]{len(traversals)}[/] traversal(s)[/]"
)
if emit_syslog:
for line in engine.traversal_syslog_lines(min_deckies):
typer.echo(line)
@app.command(name="archetypes")
def list_archetypes() -> None:
"""List all machine archetype profiles."""
table = Table(title="Machine Archetypes", show_lines=True)
table.add_column("Slug", style="bold cyan")
table.add_column("Display Name")
table.add_column("Default Services", style="green")
table.add_column("Description", style="dim")
for slug, arch in sorted(all_archetypes().items()):
table.add_row(
slug,
arch.display_name,
", ".join(arch.services),
arch.description,
)
console.print(table)

80
decnet/cli/__init__.py Normal file
View File

@@ -0,0 +1,80 @@
"""
DECNET CLI — entry point for all commands.
Usage:
decnet deploy --mode unihost --deckies 5 --randomize-services
decnet status
decnet teardown [--all | --id decky-01]
decnet services
Layout: each command module exports ``register(app)`` which attaches its
commands to the passed Typer app. ``__init__.py`` builds the root app,
calls every module's ``register`` in order, then runs the master-only
gate. The gate must fire LAST so it sees the fully-populated dispatch
table before filtering.
"""
from __future__ import annotations
import typer
from . import (
agent,
api,
db,
deploy,
forwarder,
inventory,
lifecycle,
listener,
profiler,
sniffer,
swarm,
swarmctl,
updater,
web,
workers,
)
from .gating import _gate_commands_by_mode
from .utils import console as console, log as log
app = typer.Typer(
name="decnet",
help="Deploy a deception network of honeypot deckies on your LAN.",
no_args_is_help=True,
)
# Order matches the old flat layout so `decnet --help` reads the same.
for _mod in (
api, swarmctl, agent, updater, listener, forwarder,
swarm,
deploy, lifecycle, workers, inventory,
web, profiler, sniffer, db,
):
_mod.register(app)
_gate_commands_by_mode(app)
# Backwards-compat re-exports. Tests and third-party tooling import these
# directly from ``decnet.cli``; the refactor must keep them resolvable.
from .db import _db_reset_mysql_async # noqa: E402,F401
from .gating import ( # noqa: E402,F401
MASTER_ONLY_COMMANDS,
MASTER_ONLY_GROUPS,
_agent_mode_active,
_require_master_mode,
)
from .utils import ( # noqa: E402,F401
_daemonize,
_http_request,
_is_running,
_kill_all_services,
_pid_dir,
_service_registry,
_spawn_detached,
_swarmctl_base_url,
)
if __name__ == "__main__": # pragma: no cover
app()

64
decnet/cli/agent.py Normal file
View File

@@ -0,0 +1,64 @@
from __future__ import annotations
import os
import pathlib as _pathlib
import sys as _sys
from typing import Optional
import typer
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def agent(
port: int = typer.Option(8765, "--port", help="Port for the worker agent"),
host: str = typer.Option("0.0.0.0", "--host", help="Bind address for the worker agent"), # nosec B104
agent_dir: Optional[str] = typer.Option(None, "--agent-dir", help="Worker cert bundle dir (default: ~/.decnet/agent, expanded under the running user's HOME — set this when running as sudo/root)"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
no_forwarder: bool = typer.Option(False, "--no-forwarder", help="Do not auto-spawn the log forwarder alongside the agent"),
) -> None:
"""Run the DECNET SWARM worker agent (requires a cert bundle in ~/.decnet/agent/).
By default, `decnet agent` auto-spawns `decnet forwarder` as a fully-
detached sibling process so worker logs start flowing to the master
without a second manual invocation. The forwarder survives agent
restarts and crashes — if it dies on its own, restart it manually
with `decnet forwarder --daemon …`. Pass --no-forwarder to skip.
"""
from decnet.agent import server as _agent_server
from decnet.env import DECNET_SWARM_MASTER_HOST, DECNET_INGEST_LOG_FILE
from decnet.swarm import pki as _pki
resolved_dir = _pathlib.Path(agent_dir) if agent_dir else _pki.DEFAULT_AGENT_DIR
if daemon:
log.info("agent daemonizing host=%s port=%d", host, port)
_utils._daemonize()
if not no_forwarder and DECNET_SWARM_MASTER_HOST:
fw_argv = [
_sys.executable, "-m", "decnet", "forwarder",
"--master-host", DECNET_SWARM_MASTER_HOST,
"--master-port", str(int(os.environ.get("DECNET_SWARM_SYSLOG_PORT", "6514"))),
"--agent-dir", str(resolved_dir),
"--log-file", str(DECNET_INGEST_LOG_FILE),
"--daemon",
]
try:
pid = _utils._spawn_detached(fw_argv, _utils._pid_dir() / "forwarder.pid")
log.info("agent auto-spawned forwarder pid=%d master=%s", pid, DECNET_SWARM_MASTER_HOST)
console.print(f"[dim]Auto-spawned forwarder (pid {pid}) → {DECNET_SWARM_MASTER_HOST}.[/]")
except Exception as e: # noqa: BLE001
log.warning("agent could not auto-spawn forwarder: %s", e)
console.print(f"[yellow]forwarder auto-spawn skipped: {e}[/]")
elif not no_forwarder:
log.info("agent skipping forwarder auto-spawn (DECNET_SWARM_MASTER_HOST unset)")
log.info("agent command invoked host=%s port=%d dir=%s", host, port, resolved_dir)
console.print(f"[green]Starting DECNET worker agent on {host}:{port} (mTLS)...[/]")
rc = _agent_server.run(host, port, agent_dir=resolved_dir)
if rc != 0:
raise typer.Exit(rc)

53
decnet/cli/api.py Normal file
View File

@@ -0,0 +1,53 @@
from __future__ import annotations
import os
import signal
import subprocess # nosec B404
import sys
import typer
from decnet.env import DECNET_API_HOST, DECNET_API_PORT, DECNET_INGEST_LOG_FILE
from . import utils as _utils
from .gating import _require_master_mode
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def api(
port: int = typer.Option(DECNET_API_PORT, "--port", help="Port for the backend API"),
host: str = typer.Option(DECNET_API_HOST, "--host", help="Host IP for the backend API"),
log_file: str = typer.Option(DECNET_INGEST_LOG_FILE, "--log-file", help="Path to the DECNET log file to monitor"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
workers: int = typer.Option(1, "--workers", "-w", min=1, help="Number of uvicorn worker processes"),
) -> None:
"""Run the DECNET API and Web Dashboard in standalone mode."""
_require_master_mode("api")
if daemon:
log.info("API daemonizing host=%s port=%d workers=%d", host, port, workers)
_utils._daemonize()
log.info("API command invoked host=%s port=%d workers=%d", host, port, workers)
console.print(f"[green]Starting DECNET API on {host}:{port} (workers={workers})...[/]")
_env: dict[str, str] = os.environ.copy()
_env["DECNET_INGEST_LOG_FILE"] = str(log_file)
_cmd = [sys.executable, "-m", "uvicorn", "decnet.web.api:app",
"--host", host, "--port", str(port), "--workers", str(workers)]
try:
proc = subprocess.Popen(_cmd, env=_env, start_new_session=True) # nosec B603 B404
try:
proc.wait()
except KeyboardInterrupt:
try:
os.killpg(proc.pid, signal.SIGTERM)
try:
proc.wait(timeout=10)
except subprocess.TimeoutExpired:
os.killpg(proc.pid, signal.SIGKILL)
proc.wait()
except ProcessLookupError:
pass
except (FileNotFoundError, subprocess.SubprocessError):
console.print("[red]Failed to start API. Ensure 'uvicorn' is installed in the current environment.[/]")

130
decnet/cli/db.py Normal file
View File

@@ -0,0 +1,130 @@
from __future__ import annotations
from typing import Optional
import typer
from rich.table import Table
from .utils import console, log
_DB_RESET_TABLES: tuple[str, ...] = (
# Order matters for DROP TABLE: child FKs first.
# - attacker_behavior FK-references attackers.
# - decky_shards FK-references swarm_hosts.
"attacker_behavior",
"attackers",
"logs",
"bounty",
"state",
"users",
"decky_shards",
"swarm_hosts",
)
async def _db_reset_mysql_async(dsn: str, mode: str, confirm: bool) -> None:
"""Inspect + (optionally) wipe a MySQL database. Pulled out of the CLI
wrapper so tests can drive it without spawning a Typer runner."""
from urllib.parse import urlparse
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine
db_name = urlparse(dsn).path.lstrip("/") or "(default)"
engine = create_async_engine(dsn)
try:
rows: dict[str, int] = {}
async with engine.connect() as conn:
for tbl in _DB_RESET_TABLES:
try:
result = await conn.execute(text(f"SELECT COUNT(*) FROM `{tbl}`")) # nosec B608
rows[tbl] = result.scalar() or 0
except Exception: # noqa: BLE001 — ProgrammingError for missing table varies by driver
rows[tbl] = -1
summary = Table(title=f"DECNET MySQL reset — database `{db_name}` (mode={mode})")
summary.add_column("Table", style="cyan")
summary.add_column("Rows", justify="right")
for tbl, count in rows.items():
summary.add_row(tbl, "[dim]missing[/]" if count < 0 else f"{count:,}")
console.print(summary)
if not confirm:
console.print(
"[yellow]Dry-run only. Re-run with [bold]--i-know-what-im-doing[/] "
"to actually execute.[/]"
)
return
async with engine.begin() as conn:
await conn.execute(text("SET FOREIGN_KEY_CHECKS = 0"))
for tbl in _DB_RESET_TABLES:
if rows.get(tbl, -1) < 0:
continue
if mode == "truncate":
await conn.execute(text(f"TRUNCATE TABLE `{tbl}`"))
console.print(f"[green]✓ TRUNCATE {tbl}[/]")
else:
await conn.execute(text(f"DROP TABLE `{tbl}`"))
console.print(f"[green]✓ DROP TABLE {tbl}[/]")
await conn.execute(text("SET FOREIGN_KEY_CHECKS = 1"))
console.print(f"[bold green]Done. Database `{db_name}` reset ({mode}).[/]")
finally:
await engine.dispose()
def register(app: typer.Typer) -> None:
@app.command(name="db-reset")
def db_reset(
i_know: bool = typer.Option(
False,
"--i-know-what-im-doing",
help="Required to actually execute. Without it, the command runs in dry-run mode.",
),
mode: str = typer.Option(
"truncate",
"--mode",
help="truncate (wipe rows, keep schema) | drop-tables (DROP TABLE for each DECNET table)",
),
url: Optional[str] = typer.Option(
None,
"--url",
help="Override DECNET_DB_URL for this invocation (e.g. when cleanup needs admin creds).",
),
) -> None:
"""Wipe the MySQL database used by the DECNET dashboard.
Destructive. Runs dry by default — pass --i-know-what-im-doing to commit.
Only supported against MySQL; refuses to operate on SQLite.
"""
import asyncio
import os
if mode not in ("truncate", "drop-tables"):
console.print(f"[red]Invalid --mode '{mode}'. Expected: truncate | drop-tables.[/]")
raise typer.Exit(2)
db_type = os.environ.get("DECNET_DB_TYPE", "sqlite").lower()
if db_type != "mysql":
console.print(
f"[red]db-reset is MySQL-only (DECNET_DB_TYPE='{db_type}'). "
f"For SQLite, just delete the decnet.db file.[/]"
)
raise typer.Exit(2)
dsn = url or os.environ.get("DECNET_DB_URL")
if not dsn:
from decnet.web.db.mysql.database import build_mysql_url
try:
dsn = build_mysql_url()
except ValueError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(2) from e
log.info("db-reset invoked mode=%s confirm=%s", mode, i_know)
try:
asyncio.run(_db_reset_mysql_async(dsn, mode=mode, confirm=i_know))
except Exception as e: # noqa: BLE001
console.print(f"[red]db-reset failed: {e}[/]")
raise typer.Exit(1) from e

307
decnet/cli/deploy.py Normal file
View File

@@ -0,0 +1,307 @@
from __future__ import annotations
from typing import Optional
import typer
from rich.table import Table
from decnet.archetypes import Archetype, get_archetype
from decnet.config import DecnetConfig
from decnet.distros import get_distro
from decnet.env import DECNET_API_HOST, DECNET_INGEST_LOG_FILE
from decnet.fleet import all_service_names, build_deckies, build_deckies_from_ini
from decnet.ini_loader import load_ini
from decnet.network import detect_interface, detect_subnet, allocate_ips, get_host_ip
from . import utils as _utils
from .gating import _require_master_mode
from .utils import console, log
def _deploy_swarm(config: "DecnetConfig", *, dry_run: bool, no_cache: bool) -> None:
"""Shard deckies round-robin across enrolled workers and POST to swarmctl."""
base = _utils._swarmctl_base_url(None)
resp = _utils._http_request("GET", base + "/swarm/hosts?host_status=enrolled")
enrolled = resp.json()
resp2 = _utils._http_request("GET", base + "/swarm/hosts?host_status=active")
active = resp2.json()
workers = [*enrolled, *active]
if not workers:
console.print("[red]No enrolled workers — run `decnet swarm enroll ...` first.[/]")
raise typer.Exit(1)
assigned: list = []
for idx, d in enumerate(config.deckies):
target = workers[idx % len(workers)]
assigned.append(d.model_copy(update={"host_uuid": target["uuid"]}))
config = config.model_copy(update={"deckies": assigned})
body = {"config": config.model_dump(mode="json"), "dry_run": dry_run, "no_cache": no_cache}
console.print(f"[cyan]Dispatching {len(config.deckies)} deckies across {len(workers)} worker(s)...[/]")
resp3 = _utils._http_request("POST", base + "/swarm/deploy", json_body=body, timeout=900.0)
results = resp3.json().get("results", [])
table = Table(title="SWARM deploy results")
for col in ("worker", "host_uuid", "ok", "detail"):
table.add_column(col)
any_failed = False
for r in results:
ok = bool(r.get("ok"))
if not ok:
any_failed = True
detail = r.get("detail")
if isinstance(detail, dict):
detail = detail.get("status") or "ok"
table.add_row(
str(r.get("host_name") or ""),
str(r.get("host_uuid") or ""),
"[green]yes[/]" if ok else "[red]no[/]",
str(detail)[:80],
)
console.print(table)
if any_failed:
raise typer.Exit(1)
def register(app: typer.Typer) -> None:
@app.command()
def deploy(
mode: str = typer.Option("unihost", "--mode", "-m", help="Deployment mode: unihost | swarm"),
deckies: Optional[int] = typer.Option(None, "--deckies", "-n", help="Number of deckies to deploy (required without --config)", min=1),
interface: Optional[str] = typer.Option(None, "--interface", "-i", help="Host NIC (auto-detected if omitted)"),
subnet: Optional[str] = typer.Option(None, "--subnet", help="LAN subnet CIDR (auto-detected if omitted)"),
ip_start: Optional[str] = typer.Option(None, "--ip-start", help="First decky IP (auto if omitted)"),
services: Optional[str] = typer.Option(None, "--services", help="Comma-separated services, e.g. ssh,smb,rdp"),
randomize_services: bool = typer.Option(False, "--randomize-services", help="Assign random services to each decky"),
distro: Optional[str] = typer.Option(None, "--distro", help="Comma-separated distro slugs, e.g. debian,ubuntu22,rocky9"),
randomize_distros: bool = typer.Option(False, "--randomize-distros", help="Assign a random distro to each decky"),
log_file: Optional[str] = typer.Option(DECNET_INGEST_LOG_FILE, "--log-file", help="Host path for the collector to write RFC 5424 logs (e.g. /var/log/decnet/decnet.log)"),
archetype_name: Optional[str] = typer.Option(None, "--archetype", "-a", help="Machine archetype slug (e.g. linux-server, windows-workstation)"),
mutate_interval: Optional[int] = typer.Option(30, "--mutate-interval", help="Automatically rotate services every N minutes"),
dry_run: bool = typer.Option(False, "--dry-run", help="Generate compose file without starting containers"),
no_cache: bool = typer.Option(False, "--no-cache", help="Force rebuild all images, ignoring Docker layer cache"),
parallel: bool = typer.Option(False, "--parallel", help="Build all images concurrently (enables BuildKit, separates build from up)"),
ipvlan: bool = typer.Option(False, "--ipvlan", help="Use IPvlan L2 instead of MACVLAN (required on WiFi interfaces)"),
config_file: Optional[str] = typer.Option(None, "--config", "-c", help="Path to INI config file"),
api: bool = typer.Option(False, "--api", help="Start the FastAPI backend to ingest and serve logs"),
api_port: int = typer.Option(8000, "--api-port", help="Port for the backend API"),
daemon: bool = typer.Option(False, "--daemon", help="Detach to background as a daemon process"),
) -> None:
"""Deploy deckies to the LAN."""
import os
import subprocess # nosec B404
import sys
from pathlib import Path as _Path
_require_master_mode("deploy")
if daemon:
log.info("deploy daemonizing mode=%s deckies=%s", mode, deckies)
_utils._daemonize()
log.info("deploy command invoked mode=%s deckies=%s dry_run=%s", mode, deckies, dry_run)
if mode not in ("unihost", "swarm"):
console.print("[red]--mode must be 'unihost' or 'swarm'[/]")
raise typer.Exit(1)
if config_file:
try:
ini = load_ini(config_file)
except FileNotFoundError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
iface = interface or ini.interface or detect_interface()
subnet_cidr = subnet or ini.subnet
effective_gateway = ini.gateway
if subnet_cidr is None:
subnet_cidr, effective_gateway = detect_subnet(iface)
elif effective_gateway is None:
_, effective_gateway = detect_subnet(iface)
host_ip = get_host_ip(iface)
console.print(f"[dim]Config:[/] {config_file} [dim]Interface:[/] {iface} "
f"[dim]Subnet:[/] {subnet_cidr} [dim]Gateway:[/] {effective_gateway} "
f"[dim]Host IP:[/] {host_ip}")
if ini.custom_services:
from decnet.custom_service import CustomService
from decnet.services.registry import register_custom_service
for cs in ini.custom_services:
register_custom_service(
CustomService(
name=cs.name,
image=cs.image,
exec_cmd=cs.exec_cmd,
ports=cs.ports,
)
)
effective_log_file = log_file
try:
decky_configs = build_deckies_from_ini(
ini, subnet_cidr, effective_gateway, host_ip, randomize_services, cli_mutate_interval=mutate_interval
)
except ValueError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
else:
if deckies is None:
console.print("[red]--deckies is required when --config is not used.[/]")
raise typer.Exit(1)
services_list = [s.strip() for s in services.split(",")] if services else None
if services_list:
known = set(all_service_names())
unknown = [s for s in services_list if s not in known]
if unknown:
console.print(f"[red]Unknown service(s): {unknown}. Available: {all_service_names()}[/]")
raise typer.Exit(1)
arch: Archetype | None = None
if archetype_name:
try:
arch = get_archetype(archetype_name)
except ValueError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
if not services_list and not randomize_services and not arch:
console.print("[red]Specify --services, --archetype, or --randomize-services.[/]")
raise typer.Exit(1)
iface = interface or detect_interface()
if subnet is None:
subnet_cidr, effective_gateway = detect_subnet(iface)
else:
subnet_cidr = subnet
_, effective_gateway = detect_subnet(iface)
host_ip = get_host_ip(iface)
console.print(f"[dim]Interface:[/] {iface} [dim]Subnet:[/] {subnet_cidr} "
f"[dim]Gateway:[/] {effective_gateway} [dim]Host IP:[/] {host_ip}")
distros_list = [d.strip() for d in distro.split(",")] if distro else None
if distros_list:
try:
for slug in distros_list:
get_distro(slug)
except ValueError as e:
console.print(f"[red]{e}[/]")
raise typer.Exit(1)
ips = allocate_ips(subnet_cidr, effective_gateway, host_ip, deckies, ip_start)
decky_configs = build_deckies(
deckies, ips, services_list, randomize_services,
distros_explicit=distros_list, randomize_distros=randomize_distros,
archetype=arch, mutate_interval=mutate_interval,
)
effective_log_file = log_file
if api and not effective_log_file:
effective_log_file = os.path.join(os.getcwd(), "decnet.log")
console.print(f"[cyan]API mode enabled: defaulting log-file to {effective_log_file}[/]")
config = DecnetConfig(
mode=mode,
interface=iface,
subnet=subnet_cidr,
gateway=effective_gateway,
deckies=decky_configs,
log_file=effective_log_file,
ipvlan=ipvlan,
mutate_interval=mutate_interval,
)
log.debug("deploy: config built deckies=%d interface=%s subnet=%s", len(config.deckies), config.interface, config.subnet)
if mode == "swarm":
_deploy_swarm(config, dry_run=dry_run, no_cache=no_cache)
if dry_run:
log.info("deploy: swarm dry-run complete, no workers dispatched")
else:
log.info("deploy: swarm deployment complete deckies=%d", len(config.deckies))
return
from decnet.engine import deploy as _deploy
_deploy(config, dry_run=dry_run, no_cache=no_cache, parallel=parallel)
if dry_run:
log.info("deploy: dry-run complete, no containers started")
else:
log.info("deploy: deployment complete deckies=%d", len(config.deckies))
if mutate_interval is not None and not dry_run:
console.print(f"[green]Starting DECNET Mutator watcher in the background (interval: {mutate_interval}m)...[/]")
try:
subprocess.Popen( # nosec B603
[sys.executable, "-m", "decnet.cli", "mutate", "--watch"],
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
start_new_session=True,
)
except (FileNotFoundError, subprocess.SubprocessError):
console.print("[red]Failed to start mutator watcher.[/]")
if effective_log_file and not dry_run and not api:
_collector_err = _Path(effective_log_file).with_suffix(".collector.log")
console.print(f"[bold cyan]Starting log collector[/] → {effective_log_file}")
subprocess.Popen( # nosec B603
[sys.executable, "-m", "decnet.cli", "collect", "--log-file", str(effective_log_file)],
stdin=subprocess.DEVNULL,
stdout=open(_collector_err, "a"),
stderr=subprocess.STDOUT,
start_new_session=True,
)
if api and not dry_run:
console.print(f"[green]Starting DECNET API on port {api_port}...[/]")
_env: dict[str, str] = os.environ.copy()
_env["DECNET_INGEST_LOG_FILE"] = str(effective_log_file or "")
try:
subprocess.Popen( # nosec B603
[sys.executable, "-m", "uvicorn", "decnet.web.api:app", "--host", DECNET_API_HOST, "--port", str(api_port)],
env=_env,
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT
)
console.print(f"[dim]API running at http://{DECNET_API_HOST}:{api_port}[/]")
except (FileNotFoundError, subprocess.SubprocessError):
console.print("[red]Failed to start API. Ensure 'uvicorn' is installed in the current environment.[/]")
if effective_log_file and not dry_run:
console.print("[bold cyan]Starting DECNET-PROBER[/] (auto-discovers attackers from log stream)")
try:
subprocess.Popen( # nosec B603
[sys.executable, "-m", "decnet.cli", "probe", "--daemon", "--log-file", str(effective_log_file)],
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
start_new_session=True,
)
except (FileNotFoundError, subprocess.SubprocessError):
console.print("[red]Failed to start DECNET-PROBER.[/]")
if effective_log_file and not dry_run:
console.print("[bold cyan]Starting DECNET-PROFILER[/] (builds attacker profiles from log stream)")
try:
subprocess.Popen( # nosec B603
[sys.executable, "-m", "decnet.cli", "profiler", "--daemon"],
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
start_new_session=True,
)
except (FileNotFoundError, subprocess.SubprocessError):
console.print("[red]Failed to start DECNET-PROFILER.[/]")
if effective_log_file and not dry_run:
console.print("[bold cyan]Starting DECNET-SNIFFER[/] (passive network capture)")
try:
subprocess.Popen( # nosec B603
[sys.executable, "-m", "decnet.cli", "sniffer", "--daemon", "--log-file", str(effective_log_file)],
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
start_new_session=True,
)
except (FileNotFoundError, subprocess.SubprocessError):
console.print("[red]Failed to start DECNET-SNIFFER.[/]")

74
decnet/cli/forwarder.py Normal file
View File

@@ -0,0 +1,74 @@
from __future__ import annotations
import asyncio
import pathlib
import signal
from typing import Optional
import typer
from decnet.env import DECNET_INGEST_LOG_FILE
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def forwarder(
master_host: Optional[str] = typer.Option(None, "--master-host", help="Master listener hostname/IP (default: $DECNET_SWARM_MASTER_HOST)"),
master_port: int = typer.Option(6514, "--master-port", help="Master listener TCP port (RFC 5425 default 6514)"),
log_file: Optional[str] = typer.Option(DECNET_INGEST_LOG_FILE, "--log-file", help="Local RFC 5424 file to tail and forward"),
agent_dir: Optional[str] = typer.Option(None, "--agent-dir", help="Worker cert bundle dir (default: ~/.decnet/agent)"),
state_db: Optional[str] = typer.Option(None, "--state-db", help="Forwarder offset SQLite path (default: <agent_dir>/forwarder.db)"),
poll_interval: float = typer.Option(0.5, "--poll-interval", help="Seconds between log file stat checks"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Run the worker-side syslog-over-TLS forwarder (RFC 5425, mTLS to master:6514)."""
from decnet.env import DECNET_SWARM_MASTER_HOST
from decnet.swarm import pki
from decnet.swarm.log_forwarder import ForwarderConfig, run_forwarder
resolved_host = master_host or DECNET_SWARM_MASTER_HOST
if not resolved_host:
console.print("[red]--master-host is required (or set DECNET_SWARM_MASTER_HOST).[/]")
raise typer.Exit(2)
resolved_agent_dir = pathlib.Path(agent_dir) if agent_dir else pki.DEFAULT_AGENT_DIR
if not (resolved_agent_dir / "worker.crt").exists():
console.print(f"[red]No worker cert bundle at {resolved_agent_dir} — enroll from the master first.[/]")
raise typer.Exit(2)
if not log_file:
console.print("[red]--log-file is required.[/]")
raise typer.Exit(2)
cfg = ForwarderConfig(
log_path=pathlib.Path(log_file),
master_host=resolved_host,
master_port=master_port,
agent_dir=resolved_agent_dir,
state_db=pathlib.Path(state_db) if state_db else None,
)
if daemon:
log.info("forwarder daemonizing master=%s:%d log=%s", resolved_host, master_port, log_file)
_utils._daemonize()
log.info("forwarder command invoked master=%s:%d log=%s", resolved_host, master_port, log_file)
console.print(f"[green]Starting DECNET forwarder → {resolved_host}:{master_port} (mTLS)...[/]")
async def _main() -> None:
stop = asyncio.Event()
loop = asyncio.get_running_loop()
for sig in (signal.SIGTERM, signal.SIGINT):
try:
loop.add_signal_handler(sig, stop.set)
except (NotImplementedError, RuntimeError): # pragma: no cover
pass
await run_forwarder(cfg, poll_interval=poll_interval, stop_event=stop)
try:
asyncio.run(_main())
except KeyboardInterrupt:
pass

71
decnet/cli/gating.py Normal file
View File

@@ -0,0 +1,71 @@
"""Role-based CLI gating.
MAINTAINERS: when you add a new Typer command (or add_typer group) that is
master-only, register its name in MASTER_ONLY_COMMANDS / MASTER_ONLY_GROUPS
below. The gate is the only thing that:
(a) hides the command from `decnet --help` on worker hosts, and
(b) prevents a misconfigured worker from invoking master-side logic.
Forgetting to register a new command is a role-boundary bug. Grep for
MASTER_ONLY when touching command registration.
Worker-legitimate commands (NOT in these sets): agent, updater, forwarder,
status, collect, probe, sniffer. Agents run deckies locally and should be
able to inspect them + run the per-host microservices (collector streams
container logs, prober characterizes attackers hitting this host, sniffer
captures traffic). Mutator and Profiler stay master-only: the mutator
orchestrates respawns across the swarm; the profiler rebuilds attacker
profiles against the master DB (no per-host DB exists).
"""
from __future__ import annotations
import os
import typer
from .utils import console
MASTER_ONLY_COMMANDS: frozenset[str] = frozenset({
"api", "swarmctl", "deploy", "redeploy", "teardown",
"mutate", "listener", "profiler",
"services", "distros", "correlate", "archetypes", "web",
"db-reset",
})
MASTER_ONLY_GROUPS: frozenset[str] = frozenset({"swarm"})
def _agent_mode_active() -> bool:
"""True when the host is configured as an agent AND master commands are
disallowed (the default for agents). Workers overriding this explicitly
set DECNET_DISALLOW_MASTER=false to opt into hybrid use."""
mode = os.environ.get("DECNET_MODE", "master").lower()
disallow = os.environ.get("DECNET_DISALLOW_MASTER", "true").lower() == "true"
return mode == "agent" and disallow
def _require_master_mode(command_name: str) -> None:
"""Defence-in-depth: called at the top of every master-only command body.
The registration-time gate in _gate_commands_by_mode() already hides
these commands from Typer's dispatch table, but this check protects
against direct function imports (e.g. from tests or third-party tools)
that would bypass Typer entirely."""
if _agent_mode_active():
console.print(
f"[red]`decnet {command_name}` is a master-only command; this host "
f"is configured as an agent (DECNET_MODE=agent).[/]"
)
raise typer.Exit(1)
def _gate_commands_by_mode(_app: typer.Typer) -> None:
if not _agent_mode_active():
return
_app.registered_commands = [
c for c in _app.registered_commands
if (c.name or c.callback.__name__) not in MASTER_ONLY_COMMANDS
]
_app.registered_groups = [
g for g in _app.registered_groups
if g.name not in MASTER_ONLY_GROUPS
]

52
decnet/cli/inventory.py Normal file
View File

@@ -0,0 +1,52 @@
from __future__ import annotations
import typer
from rich.table import Table
from decnet.archetypes import all_archetypes
from decnet.distros import all_distros
from decnet.services.registry import all_services
from .utils import console
def register(app: typer.Typer) -> None:
@app.command(name="services")
def list_services() -> None:
"""List all registered honeypot service plugins."""
svcs = all_services()
table = Table(title="Available Services", show_lines=True)
table.add_column("Name", style="bold cyan")
table.add_column("Ports")
table.add_column("Image")
for name, svc in sorted(svcs.items()):
table.add_row(name, ", ".join(str(p) for p in svc.ports), svc.default_image)
console.print(table)
@app.command(name="distros")
def list_distros() -> None:
"""List all available OS distro profiles for deckies."""
table = Table(title="Available Distro Profiles", show_lines=True)
table.add_column("Slug", style="bold cyan")
table.add_column("Display Name")
table.add_column("Docker Image", style="dim")
for slug, profile in sorted(all_distros().items()):
table.add_row(slug, profile.display_name, profile.image)
console.print(table)
@app.command(name="archetypes")
def list_archetypes() -> None:
"""List all machine archetype profiles."""
table = Table(title="Machine Archetypes", show_lines=True)
table.add_column("Slug", style="bold cyan")
table.add_column("Display Name")
table.add_column("Default Services", style="green")
table.add_column("Description", style="dim")
for slug, arch in sorted(all_archetypes().items()):
table.add_row(
slug,
arch.display_name,
", ".join(arch.services),
arch.description,
)
console.print(table)

97
decnet/cli/lifecycle.py Normal file
View File

@@ -0,0 +1,97 @@
from __future__ import annotations
import subprocess # nosec B404
from typing import Optional
import typer
from rich.table import Table
from decnet.env import DECNET_INGEST_LOG_FILE
from . import utils as _utils
from .gating import _agent_mode_active, _require_master_mode
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def redeploy(
log_file: str = typer.Option(DECNET_INGEST_LOG_FILE, "--log-file", "-f", help="Path to the DECNET log file"),
) -> None:
"""Check running DECNET services and relaunch any that are down."""
log.info("redeploy: checking services")
registry = _utils._service_registry(str(log_file))
table = Table(title="DECNET Services", show_lines=True)
table.add_column("Service", style="bold cyan")
table.add_column("Status")
table.add_column("PID", style="dim")
table.add_column("Action")
relaunched = 0
for name, match_fn, launch_args in registry:
pid = _utils._is_running(match_fn)
if pid is not None:
table.add_row(name, "[green]UP[/]", str(pid), "")
else:
try:
subprocess.Popen( # nosec B603
launch_args,
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
start_new_session=True,
)
table.add_row(name, "[red]DOWN[/]", "", "[green]relaunched[/]")
relaunched += 1
except (FileNotFoundError, subprocess.SubprocessError) as exc:
table.add_row(name, "[red]DOWN[/]", "", f"[red]failed: {exc}[/]")
console.print(table)
if relaunched:
console.print(f"[green]{relaunched} service(s) relaunched.[/]")
else:
console.print("[green]All services running.[/]")
@app.command()
def status() -> None:
"""Show running deckies and their status."""
log.info("status command invoked")
from decnet.engine import status as _status
_status()
registry = _utils._service_registry(str(DECNET_INGEST_LOG_FILE))
if _agent_mode_active():
registry = [r for r in registry if r[0] not in {"Mutator", "Profiler", "API"}]
svc_table = Table(title="DECNET Services", show_lines=True)
svc_table.add_column("Service", style="bold cyan")
svc_table.add_column("Status")
svc_table.add_column("PID", style="dim")
for name, match_fn, _launch_args in registry:
pid = _utils._is_running(match_fn)
if pid is not None:
svc_table.add_row(name, "[green]UP[/]", str(pid))
else:
svc_table.add_row(name, "[red]DOWN[/]", "")
console.print(svc_table)
@app.command()
def teardown(
all_: bool = typer.Option(False, "--all", help="Tear down all deckies and remove network"),
id_: Optional[str] = typer.Option(None, "--id", help="Tear down a specific decky by name"),
) -> None:
"""Stop and remove deckies."""
_require_master_mode("teardown")
if not all_ and not id_:
console.print("[red]Specify --all or --id <name>.[/]")
raise typer.Exit(1)
log.info("teardown command invoked all=%s id=%s", all_, id_)
from decnet.engine import teardown as _teardown
_teardown(decky_id=id_)
log.info("teardown complete all=%s id=%s", all_, id_)
if all_:
_utils._kill_all_services()

57
decnet/cli/listener.py Normal file
View File

@@ -0,0 +1,57 @@
from __future__ import annotations
import asyncio
import pathlib
import signal
from typing import Optional
import typer
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def listener(
bind_host: str = typer.Option("0.0.0.0", "--host", help="Bind address for the master syslog-TLS listener"), # nosec B104
bind_port: int = typer.Option(6514, "--port", help="Listener TCP port (RFC 5425 default 6514)"),
log_path: Optional[str] = typer.Option(None, "--log-path", help="RFC 5424 forensic sink (default: ./master.log)"),
json_path: Optional[str] = typer.Option(None, "--json-path", help="Parsed-JSON ingest sink (default: ./master.json)"),
ca_dir: Optional[str] = typer.Option(None, "--ca-dir", help="DECNET CA dir (default: ~/.decnet/ca)"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Run the master-side syslog-over-TLS listener (RFC 5425, mTLS)."""
from decnet.swarm import pki
from decnet.swarm.log_listener import ListenerConfig, run_listener
resolved_ca_dir = pathlib.Path(ca_dir) if ca_dir else pki.DEFAULT_CA_DIR
resolved_log = pathlib.Path(log_path) if log_path else pathlib.Path("master.log")
resolved_json = pathlib.Path(json_path) if json_path else pathlib.Path("master.json")
cfg = ListenerConfig(
log_path=resolved_log, json_path=resolved_json,
bind_host=bind_host, bind_port=bind_port, ca_dir=resolved_ca_dir,
)
if daemon:
log.info("listener daemonizing host=%s port=%d", bind_host, bind_port)
_utils._daemonize()
log.info("listener command invoked host=%s port=%d", bind_host, bind_port)
console.print(f"[green]Starting DECNET log listener on {bind_host}:{bind_port} (mTLS)...[/]")
async def _main() -> None:
stop = asyncio.Event()
loop = asyncio.get_running_loop()
for sig in (signal.SIGTERM, signal.SIGINT):
try:
loop.add_signal_handler(sig, stop.set)
except (NotImplementedError, RuntimeError): # pragma: no cover
pass
await run_listener(cfg, stop_event=stop)
try:
asyncio.run(_main())
except KeyboardInterrupt:
pass

34
decnet/cli/profiler.py Normal file
View File

@@ -0,0 +1,34 @@
from __future__ import annotations
import typer
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command(name="profiler")
def profiler_cmd(
interval: int = typer.Option(30, "--interval", "-i", help="Seconds between profile rebuild cycles"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Run the attacker profiler as a standalone microservice."""
import asyncio
from decnet.profiler import attacker_profile_worker
from decnet.web.dependencies import repo
if daemon:
log.info("profiler daemonizing interval=%d", interval)
_utils._daemonize()
log.info("profiler starting interval=%d", interval)
console.print(f"[bold cyan]Profiler starting[/] (interval: {interval}s)")
async def _run() -> None:
await repo.initialize()
await attacker_profile_worker(repo, interval=interval)
try:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]Profiler stopped.[/]")

31
decnet/cli/sniffer.py Normal file
View File

@@ -0,0 +1,31 @@
from __future__ import annotations
import typer
from decnet.env import DECNET_INGEST_LOG_FILE
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command(name="sniffer")
def sniffer_cmd(
log_file: str = typer.Option(DECNET_INGEST_LOG_FILE, "--log-file", "-f", help="Path to write captured syslog + JSON records"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Run the network sniffer as a standalone microservice."""
import asyncio
from decnet.sniffer import sniffer_worker
if daemon:
log.info("sniffer daemonizing log_file=%s", log_file)
_utils._daemonize()
log.info("sniffer starting log_file=%s", log_file)
console.print(f"[bold cyan]Sniffer starting[/] → {log_file}")
try:
asyncio.run(sniffer_worker(log_file))
except KeyboardInterrupt:
console.print("\n[yellow]Sniffer stopped.[/]")

346
decnet/cli/swarm.py Normal file
View File

@@ -0,0 +1,346 @@
"""`decnet swarm ...` — master-side operator commands (HTTP to local swarmctl)."""
from __future__ import annotations
from typing import Optional
import typer
from rich.table import Table
from . import utils as _utils
from .utils import console
def register(app: typer.Typer) -> None:
swarm_app = typer.Typer(
name="swarm",
help="Manage swarm workers (enroll, list, decommission). Requires `decnet swarmctl` running.",
no_args_is_help=True,
)
app.add_typer(swarm_app, name="swarm")
@swarm_app.command("enroll")
def swarm_enroll(
name: str = typer.Option(..., "--name", help="Short hostname for the worker (also the cert CN)"),
address: str = typer.Option(..., "--address", help="IP or DNS the master uses to reach the worker"),
agent_port: int = typer.Option(8765, "--agent-port", help="Worker agent TCP port"),
sans: Optional[str] = typer.Option(None, "--sans", help="Comma-separated extra SANs for the worker cert"),
notes: Optional[str] = typer.Option(None, "--notes", help="Free-form operator notes"),
out_dir: Optional[str] = typer.Option(None, "--out-dir", help="Write the bundle (ca.crt/worker.crt/worker.key) to this dir for scp"),
updater: bool = typer.Option(False, "--updater", help="Also issue an updater-identity cert (CN=updater@<name>) for the remote self-updater"),
url: Optional[str] = typer.Option(None, "--url", help="Override swarm controller URL (default: 127.0.0.1:8770)"),
) -> None:
"""Issue a mTLS bundle for a new worker and register it in the swarm."""
import pathlib as _pathlib
body: dict = {"name": name, "address": address, "agent_port": agent_port}
if sans:
body["sans"] = [s.strip() for s in sans.split(",") if s.strip()]
if notes:
body["notes"] = notes
if updater:
body["issue_updater_bundle"] = True
resp = _utils._http_request("POST", _utils._swarmctl_base_url(url) + "/swarm/enroll", json_body=body)
data = resp.json()
console.print(f"[green]Enrolled worker:[/] {data['name']} "
f"[dim]uuid=[/]{data['host_uuid']} "
f"[dim]fingerprint=[/]{data['fingerprint']}")
if data.get("updater"):
console.print(f"[green] + updater identity[/] "
f"[dim]fingerprint=[/]{data['updater']['fingerprint']}")
if out_dir:
target = _pathlib.Path(out_dir).expanduser()
target.mkdir(parents=True, exist_ok=True)
(target / "ca.crt").write_text(data["ca_cert_pem"])
(target / "worker.crt").write_text(data["worker_cert_pem"])
(target / "worker.key").write_text(data["worker_key_pem"])
for leaf in ("worker.key",):
try:
(target / leaf).chmod(0o600)
except OSError:
pass
console.print(f"[cyan]Agent bundle written to[/] {target}")
if data.get("updater"):
upd_target = target.parent / f"{target.name}-updater"
upd_target.mkdir(parents=True, exist_ok=True)
(upd_target / "ca.crt").write_text(data["ca_cert_pem"])
(upd_target / "updater.crt").write_text(data["updater"]["updater_cert_pem"])
(upd_target / "updater.key").write_text(data["updater"]["updater_key_pem"])
try:
(upd_target / "updater.key").chmod(0o600)
except OSError:
pass
console.print(f"[cyan]Updater bundle written to[/] {upd_target}")
console.print("[dim]Ship the agent dir to ~/.decnet/agent/ and the updater dir to ~/.decnet/updater/ on the worker.[/]")
else:
console.print("[dim]Ship this directory to the worker at ~/.decnet/agent/ (or wherever `decnet agent --agent-dir` points).[/]")
else:
console.print("[yellow]No --out-dir given — bundle PEMs are in the JSON response; persist them before leaving this shell.[/]")
@swarm_app.command("list")
def swarm_list(
host_status: Optional[str] = typer.Option(None, "--status", help="Filter by status (enrolled|active|unreachable|decommissioned)"),
url: Optional[str] = typer.Option(None, "--url", help="Override swarm controller URL"),
) -> None:
"""List enrolled workers."""
q = f"?host_status={host_status}" if host_status else ""
resp = _utils._http_request("GET", _utils._swarmctl_base_url(url) + "/swarm/hosts" + q)
rows = resp.json()
if not rows:
console.print("[dim]No workers enrolled.[/]")
return
table = Table(title="DECNET swarm workers")
for col in ("name", "address", "port", "status", "last heartbeat", "enrolled"):
table.add_column(col)
for r in rows:
table.add_row(
r.get("name") or "",
r.get("address") or "",
str(r.get("agent_port") or ""),
r.get("status") or "",
str(r.get("last_heartbeat") or ""),
str(r.get("enrolled_at") or ""),
)
console.print(table)
@swarm_app.command("check")
def swarm_check(
url: Optional[str] = typer.Option(None, "--url", help="Override swarm controller URL"),
json_out: bool = typer.Option(False, "--json", help="Emit JSON instead of a table"),
) -> None:
"""Actively probe every enrolled worker and refresh status + last_heartbeat."""
resp = _utils._http_request("POST", _utils._swarmctl_base_url(url) + "/swarm/check", timeout=60.0)
payload = resp.json()
results = payload.get("results", [])
if json_out:
console.print_json(data=payload)
return
if not results:
console.print("[dim]No workers enrolled.[/]")
return
table = Table(title="DECNET swarm check")
for col in ("name", "address", "reachable", "detail"):
table.add_column(col)
for r in results:
reachable = r.get("reachable")
mark = "[green]yes[/]" if reachable else "[red]no[/]"
detail = r.get("detail")
detail_str = ""
if isinstance(detail, dict):
detail_str = detail.get("status") or ", ".join(f"{k}={v}" for k, v in detail.items())
elif detail is not None:
detail_str = str(detail)
table.add_row(
r.get("name") or "",
r.get("address") or "",
mark,
detail_str,
)
console.print(table)
@swarm_app.command("update")
def swarm_update(
host: Optional[str] = typer.Option(None, "--host", help="Target worker (name or UUID). Omit with --all."),
all_hosts: bool = typer.Option(False, "--all", help="Push to every enrolled worker."),
include_self: bool = typer.Option(False, "--include-self", help="Also push to each updater's /update-self after a successful agent update."),
root: Optional[str] = typer.Option(None, "--root", help="Source tree to tar (default: CWD)."),
exclude: list[str] = typer.Option([], "--exclude", help="Additional exclude glob. Repeatable."),
updater_port: int = typer.Option(8766, "--updater-port", help="Port the workers' updater listens on."),
dry_run: bool = typer.Option(False, "--dry-run", help="Build the tarball and print stats; no network."),
url: Optional[str] = typer.Option(None, "--url", help="Override swarm controller URL."),
) -> None:
"""Push the current working tree to workers' self-updaters (with auto-rollback on failure)."""
import asyncio
import pathlib as _pathlib
from decnet.swarm.tar_tree import tar_working_tree, detect_git_sha
from decnet.swarm.updater_client import UpdaterClient
if not (host or all_hosts):
console.print("[red]Supply --host <name> or --all.[/]")
raise typer.Exit(2)
if host and all_hosts:
console.print("[red]--host and --all are mutually exclusive.[/]")
raise typer.Exit(2)
base = _utils._swarmctl_base_url(url)
resp = _utils._http_request("GET", base + "/swarm/hosts")
rows = resp.json()
if host:
targets = [r for r in rows if r.get("name") == host or r.get("uuid") == host]
if not targets:
console.print(f"[red]No enrolled worker matching '{host}'.[/]")
raise typer.Exit(1)
else:
targets = [r for r in rows if r.get("status") != "decommissioned"]
if not targets:
console.print("[dim]No targets.[/]")
return
tree_root = _pathlib.Path(root) if root else _pathlib.Path.cwd()
sha = detect_git_sha(tree_root)
console.print(f"[dim]Tarring[/] {tree_root} [dim]sha={sha or '(not a git repo)'}[/]")
tarball = tar_working_tree(tree_root, extra_excludes=exclude)
console.print(f"[dim]Tarball size:[/] {len(tarball):,} bytes")
if dry_run:
console.print("[yellow]--dry-run: not pushing.[/]")
for t in targets:
console.print(f" would push to [cyan]{t.get('name')}[/] at {t.get('address')}:{updater_port}")
return
async def _push_one(h: dict) -> dict:
name = h.get("name") or h.get("uuid")
out: dict = {"name": name, "address": h.get("address"), "agent": None, "self": None}
try:
async with UpdaterClient(h, updater_port=updater_port) as u:
r = await u.update(tarball, sha=sha)
out["agent"] = {"status": r.status_code, "body": r.json() if r.content else {}}
if r.status_code == 200 and include_self:
rs = await u.update_self(tarball, sha=sha)
out["self"] = {"status": rs.status_code, "body": rs.json() if rs.content else {}}
except Exception as exc: # noqa: BLE001
out["error"] = f"{type(exc).__name__}: {exc}"
return out
async def _push_all() -> list[dict]:
return await asyncio.gather(*(_push_one(t) for t in targets))
results = asyncio.run(_push_all())
table = Table(title="DECNET swarm update")
for col in ("host", "address", "agent", "self", "detail"):
table.add_column(col)
any_failure = False
for r in results:
agent = r.get("agent") or {}
selff = r.get("self") or {}
err = r.get("error")
if err:
any_failure = True
table.add_row(r["name"], r.get("address") or "", "[red]error[/]", "", err)
continue
a_status = agent.get("status")
if a_status == 200:
agent_cell = "[green]updated[/]"
elif a_status == 409:
agent_cell = "[yellow]rolled-back[/]"
any_failure = True
else:
agent_cell = f"[red]{a_status}[/]"
any_failure = True
if not include_self:
self_cell = ""
elif selff.get("status") == 200 or selff.get("status") is None:
self_cell = "[green]ok[/]" if selff else "[dim]skipped[/]"
else:
self_cell = f"[red]{selff.get('status')}[/]"
detail = ""
body = agent.get("body") or {}
if isinstance(body, dict):
detail = body.get("release", {}).get("sha") or body.get("detail", {}).get("error") or ""
table.add_row(r["name"], r.get("address") or "", agent_cell, self_cell, str(detail)[:80])
console.print(table)
if any_failure:
raise typer.Exit(1)
@swarm_app.command("deckies")
def swarm_deckies(
host: Optional[str] = typer.Option(None, "--host", help="Filter by worker name or UUID"),
state: Optional[str] = typer.Option(None, "--state", help="Filter by shard state (pending|running|failed|torn_down)"),
url: Optional[str] = typer.Option(None, "--url", help="Override swarm controller URL"),
json_out: bool = typer.Option(False, "--json", help="Emit JSON instead of a table"),
) -> None:
"""List deployed deckies across the swarm with their owning worker host."""
base = _utils._swarmctl_base_url(url)
host_uuid: Optional[str] = None
if host:
resp = _utils._http_request("GET", base + "/swarm/hosts")
rows = resp.json()
match = next((r for r in rows if r.get("uuid") == host or r.get("name") == host), None)
if match is None:
console.print(f"[red]No enrolled worker matching '{host}'.[/]")
raise typer.Exit(1)
host_uuid = match["uuid"]
query = []
if host_uuid:
query.append(f"host_uuid={host_uuid}")
if state:
query.append(f"state={state}")
path = "/swarm/deckies" + ("?" + "&".join(query) if query else "")
resp = _utils._http_request("GET", base + path)
rows = resp.json()
if json_out:
console.print_json(data=rows)
return
if not rows:
console.print("[dim]No deckies deployed.[/]")
return
table = Table(title="DECNET swarm deckies")
for col in ("decky", "host", "address", "state", "services"):
table.add_column(col)
for r in rows:
services = ",".join(r.get("services") or []) or ""
state_val = r.get("state") or "pending"
colored = {
"running": f"[green]{state_val}[/]",
"failed": f"[red]{state_val}[/]",
"pending": f"[yellow]{state_val}[/]",
"torn_down": f"[dim]{state_val}[/]",
}.get(state_val, state_val)
table.add_row(
r.get("decky_name") or "",
r.get("host_name") or "<unknown>",
r.get("host_address") or "",
colored,
services,
)
console.print(table)
@swarm_app.command("decommission")
def swarm_decommission(
name: Optional[str] = typer.Option(None, "--name", help="Worker hostname"),
uuid: Optional[str] = typer.Option(None, "--uuid", help="Worker UUID (skip lookup)"),
url: Optional[str] = typer.Option(None, "--url", help="Override swarm controller URL"),
yes: bool = typer.Option(False, "--yes", "-y", help="Skip interactive confirmation"),
) -> None:
"""Remove a worker from the swarm (cascades decky shard rows)."""
if not (name or uuid):
console.print("[red]Supply --name or --uuid.[/]")
raise typer.Exit(2)
base = _utils._swarmctl_base_url(url)
target_uuid = uuid
target_name = name
if target_uuid is None:
resp = _utils._http_request("GET", base + "/swarm/hosts")
rows = resp.json()
match = next((r for r in rows if r.get("name") == name), None)
if match is None:
console.print(f"[red]No enrolled worker named '{name}'.[/]")
raise typer.Exit(1)
target_uuid = match["uuid"]
target_name = match.get("name") or target_name
if not yes:
confirm = typer.confirm(f"Decommission worker {target_name!r} ({target_uuid})?", default=False)
if not confirm:
console.print("[dim]Aborted.[/]")
raise typer.Exit(0)
_utils._http_request("DELETE", f"{base}/swarm/hosts/{target_uuid}")
console.print(f"[green]Decommissioned {target_name or target_uuid}.[/]")

104
decnet/cli/swarmctl.py Normal file
View File

@@ -0,0 +1,104 @@
from __future__ import annotations
import os
import signal
import subprocess # nosec B404
import sys
from typing import Optional
import typer
from . import utils as _utils
from .gating import _require_master_mode
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def swarmctl(
port: int = typer.Option(8770, "--port", help="Port for the swarm controller"),
host: str = typer.Option("127.0.0.1", "--host", help="Bind address for the swarm controller"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
no_listener: bool = typer.Option(False, "--no-listener", help="Do not auto-spawn the syslog-TLS listener alongside swarmctl"),
tls: bool = typer.Option(False, "--tls", help="Serve over HTTPS with mTLS (required for cross-host worker heartbeats)"),
cert: Optional[str] = typer.Option(None, "--cert", help="BYOC: path to TLS server cert (PEM). Auto-issues from the DECNET CA if omitted."),
key: Optional[str] = typer.Option(None, "--key", help="BYOC: path to TLS server private key (PEM)."),
client_ca: Optional[str] = typer.Option(None, "--client-ca", help="CA bundle used to verify worker client certs. Defaults to the DECNET CA."),
) -> None:
"""Run the DECNET SWARM controller (master-side, separate process from `decnet api`).
By default, `decnet swarmctl` auto-spawns `decnet listener` as a fully-
detached sibling process so the master starts accepting forwarder
connections on 6514 without a second manual invocation. The listener
survives swarmctl restarts and crashes — if it dies on its own,
restart it manually with `decnet listener --daemon …`. Pass
--no-listener to skip.
Pass ``--tls`` to serve over HTTPS with mutual-TLS enforcement. By
default the server cert is auto-issued from the DECNET CA under
``~/.decnet/swarmctl/`` so enrolled workers (which already ship that
CA's ``ca.crt``) trust it out of the box. BYOC via ``--cert``/``--key``
if you need a publicly-trusted or externally-managed cert.
"""
_require_master_mode("swarmctl")
if daemon:
log.info("swarmctl daemonizing host=%s port=%d", host, port)
_utils._daemonize()
if not no_listener:
listener_host = os.environ.get("DECNET_LISTENER_HOST", "0.0.0.0") # nosec B104
listener_port = int(os.environ.get("DECNET_SWARM_SYSLOG_PORT", "6514"))
lst_argv = [
sys.executable, "-m", "decnet", "listener",
"--host", listener_host,
"--port", str(listener_port),
"--daemon",
]
try:
pid = _utils._spawn_detached(lst_argv, _utils._pid_dir() / "listener.pid")
log.info("swarmctl auto-spawned listener pid=%d bind=%s:%d",
pid, listener_host, listener_port)
console.print(f"[dim]Auto-spawned listener (pid {pid}) on {listener_host}:{listener_port}.[/]")
except Exception as e: # noqa: BLE001
log.warning("swarmctl could not auto-spawn listener: %s", e)
console.print(f"[yellow]listener auto-spawn skipped: {e}[/]")
log.info("swarmctl command invoked host=%s port=%d tls=%s", host, port, tls)
scheme = "https" if tls else "http"
console.print(f"[green]Starting DECNET SWARM controller on {scheme}://{host}:{port}...[/]")
_cmd = [sys.executable, "-m", "uvicorn", "decnet.web.swarm_api:app",
"--host", host, "--port", str(port)]
if tls:
from decnet.swarm import pki as _pki
if cert and key:
cert_path, key_path = cert, key
elif cert or key:
console.print("[red]--cert and --key must be provided together.[/]")
raise typer.Exit(code=2)
else:
auto_cert, auto_key, _auto_ca = _pki.ensure_swarmctl_cert(host)
cert_path, key_path = str(auto_cert), str(auto_key)
console.print(f"[dim]Auto-issued swarmctl server cert → {cert_path}[/]")
ca_path = client_ca or str(_pki.DEFAULT_CA_DIR / "ca.crt")
_cmd += [
"--ssl-keyfile", key_path,
"--ssl-certfile", cert_path,
"--ssl-ca-certs", ca_path,
"--ssl-cert-reqs", "2",
]
try:
proc = subprocess.Popen(_cmd, start_new_session=True) # nosec B603 B404
try:
proc.wait()
except KeyboardInterrupt:
try:
os.killpg(proc.pid, signal.SIGTERM)
try:
proc.wait(timeout=10)
except subprocess.TimeoutExpired:
os.killpg(proc.pid, signal.SIGKILL)
proc.wait()
except ProcessLookupError:
pass
except (FileNotFoundError, subprocess.SubprocessError):
console.print("[red]Failed to start swarmctl. Ensure 'uvicorn' is installed in the current environment.[/]")

46
decnet/cli/updater.py Normal file
View File

@@ -0,0 +1,46 @@
from __future__ import annotations
import pathlib as _pathlib
from typing import Optional
import typer
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def updater(
port: int = typer.Option(8766, "--port", help="Port for the self-updater daemon"),
host: str = typer.Option("0.0.0.0", "--host", help="Bind address for the updater"), # nosec B104
updater_dir: Optional[str] = typer.Option(None, "--updater-dir", help="Updater cert bundle dir (default: ~/.decnet/updater)"),
install_dir: Optional[str] = typer.Option(None, "--install-dir", help="Release install root (default: /opt/decnet)"),
agent_dir: Optional[str] = typer.Option(None, "--agent-dir", help="Worker agent cert bundle (for local /health probes; default: ~/.decnet/agent)"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Run the DECNET self-updater (requires a bundle in ~/.decnet/updater/)."""
from decnet.swarm import pki as _pki
from decnet.updater import server as _upd_server
resolved_updater = _pathlib.Path(updater_dir) if updater_dir else _upd_server.DEFAULT_UPDATER_DIR
resolved_install = _pathlib.Path(install_dir) if install_dir else _pathlib.Path("/opt/decnet")
resolved_agent = _pathlib.Path(agent_dir) if agent_dir else _pki.DEFAULT_AGENT_DIR
if daemon:
log.info("updater daemonizing host=%s port=%d", host, port)
_utils._daemonize()
log.info(
"updater command invoked host=%s port=%d updater_dir=%s install_dir=%s",
host, port, resolved_updater, resolved_install,
)
console.print(f"[green]Starting DECNET self-updater on {host}:{port} (mTLS)...[/]")
rc = _upd_server.run(
host, port,
updater_dir=resolved_updater,
install_dir=resolved_install,
agent_dir=resolved_agent,
)
if rc != 0:
raise typer.Exit(rc)

177
decnet/cli/utils.py Normal file
View File

@@ -0,0 +1,177 @@
"""Shared CLI helpers: console, logger, process management, swarm HTTP client.
Submodules reference these as ``from . import utils`` then ``utils.foo(...)``
so tests can patch ``decnet.cli.utils.<name>`` and have every caller see it.
"""
from __future__ import annotations
import os
import signal
import subprocess # nosec B404
import sys
from pathlib import Path
from typing import Optional
import typer
from rich.console import Console
from decnet.logging import get_logger
from decnet.env import DECNET_API_HOST, DECNET_API_PORT, DECNET_INGEST_LOG_FILE
log = get_logger("cli")
console = Console()
def _daemonize() -> None:
"""Fork the current process into a background daemon (Unix double-fork)."""
if os.fork() > 0:
raise SystemExit(0)
os.setsid()
if os.fork() > 0:
raise SystemExit(0)
sys.stdout = open(os.devnull, "w") # noqa: SIM115
sys.stderr = open(os.devnull, "w") # noqa: SIM115
sys.stdin = open(os.devnull, "r") # noqa: SIM115
def _pid_dir() -> Path:
"""Return the writable PID directory.
/opt/decnet when it exists and is writable (production), else
~/.decnet (dev). The directory is created if needed."""
candidates = [Path("/opt/decnet"), Path.home() / ".decnet"]
for path in candidates:
try:
path.mkdir(parents=True, exist_ok=True)
if os.access(path, os.W_OK):
return path
except (PermissionError, OSError):
continue
return Path("/tmp") # nosec B108
def _spawn_detached(argv: list[str], pid_file: Path) -> int:
"""Spawn a DECNET subcommand as a fully-independent sibling process.
The parent does NOT wait() on this child. start_new_session=True puts
the child in its own session so SIGHUP on parent exit doesn't kill it;
stdin/stdout/stderr go to /dev/null so the launching shell can close
without EIO on the child. close_fds=True prevents inherited sockets
from pinning ports we're trying to rebind.
This is deliberately NOT a supervisor — we fire-and-forget. If the
child dies, the operator restarts it manually via its own subcommand.
"""
if pid_file.exists():
try:
existing = int(pid_file.read_text().strip())
os.kill(existing, 0)
return existing
except (ValueError, ProcessLookupError, PermissionError, OSError):
pass # stale pid_file — fall through and spawn
with open(os.devnull, "rb") as dn_in, open(os.devnull, "ab") as dn_out:
proc = subprocess.Popen( # nosec B603
argv,
stdin=dn_in, stdout=dn_out, stderr=dn_out,
start_new_session=True, close_fds=True,
)
pid_file.parent.mkdir(parents=True, exist_ok=True)
pid_file.write_text(f"{proc.pid}\n")
return proc.pid
def _is_running(match_fn) -> int | None:
"""Return PID of a running DECNET process matching ``match_fn(cmdline)``, or None."""
import psutil
for proc in psutil.process_iter(["pid", "cmdline"]):
try:
cmd = proc.info["cmdline"]
if cmd and match_fn(cmd):
return proc.info["pid"]
except (psutil.NoSuchProcess, psutil.AccessDenied):
continue
return None
def _service_registry(log_file: str) -> list[tuple[str, callable, list[str]]]:
"""Return the microservice registry for health-check and relaunch.
On agents these run as systemd units invoking /usr/local/bin/decnet,
which doesn't include "decnet.cli" in its cmdline. On master dev boxes
they're launched via `python -m decnet.cli`. Match either form — cmd
is a list of argv tokens, so substring-check the joined string.
"""
_py = sys.executable
def _matches(sub: str, extras: tuple[str, ...] = ()):
def _check(cmd) -> bool:
joined = " ".join(cmd) if not isinstance(cmd, str) else cmd
if "decnet" not in joined:
return False
if sub not in joined:
return False
return all(e in joined for e in extras)
return _check
return [
("Collector", _matches("collect"),
[_py, "-m", "decnet.cli", "collect", "--daemon", "--log-file", log_file]),
("Mutator", _matches("mutate", ("--watch",)),
[_py, "-m", "decnet.cli", "mutate", "--daemon", "--watch"]),
("Prober", _matches("probe"),
[_py, "-m", "decnet.cli", "probe", "--daemon", "--log-file", log_file]),
("Profiler", _matches("profiler"),
[_py, "-m", "decnet.cli", "profiler", "--daemon"]),
("Sniffer", _matches("sniffer"),
[_py, "-m", "decnet.cli", "sniffer", "--daemon", "--log-file", log_file]),
("API",
lambda cmd: "uvicorn" in cmd and "decnet.web.api:app" in cmd,
[_py, "-m", "uvicorn", "decnet.web.api:app",
"--host", DECNET_API_HOST, "--port", str(DECNET_API_PORT)]),
]
def _kill_all_services() -> None:
"""Find and kill all running DECNET microservice processes."""
registry = _service_registry(str(DECNET_INGEST_LOG_FILE))
killed = 0
for name, match_fn, _launch_args in registry:
pid = _is_running(match_fn)
if pid is not None:
console.print(f"[yellow]Stopping {name} (PID {pid})...[/]")
os.kill(pid, signal.SIGTERM)
killed += 1
if killed:
console.print(f"[green]{killed} background process(es) stopped.[/]")
else:
console.print("[dim]No DECNET services were running.[/]")
_DEFAULT_SWARMCTL_URL = "http://127.0.0.1:8770"
def _swarmctl_base_url(url: Optional[str]) -> str:
return url or os.environ.get("DECNET_SWARMCTL_URL", _DEFAULT_SWARMCTL_URL)
def _http_request(method: str, url: str, *, json_body: Optional[dict] = None, timeout: float = 30.0):
"""Tiny sync wrapper around httpx; avoids leaking async into the CLI."""
import httpx
try:
resp = httpx.request(method, url, json=json_body, timeout=timeout)
except httpx.HTTPError as exc:
console.print(f"[red]Could not reach swarm controller at {url}: {exc}[/]")
console.print("[dim]Is `decnet swarmctl` running?[/]")
raise typer.Exit(2)
if resp.status_code >= 400:
try:
detail = resp.json().get("detail", resp.text)
except Exception: # nosec B110
detail = resp.text
console.print(f"[red]{method} {url} failed: {resp.status_code}{detail}[/]")
raise typer.Exit(1)
return resp

120
decnet/cli/web.py Normal file
View File

@@ -0,0 +1,120 @@
from __future__ import annotations
import typer
from decnet.env import DECNET_API_PORT, DECNET_WEB_HOST, DECNET_WEB_PORT
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command(name="web")
def serve_web(
web_port: int = typer.Option(DECNET_WEB_PORT, "--web-port", help="Port to serve the DECNET Web Dashboard"),
host: str = typer.Option(DECNET_WEB_HOST, "--host", help="Host IP to serve the Web Dashboard"),
api_port: int = typer.Option(DECNET_API_PORT, "--api-port", help="Port the DECNET API is listening on"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Serve the DECNET Web Dashboard frontend.
Proxies /api/* requests to the API server so the frontend can use
relative URLs (/api/v1/...) with no CORS configuration required.
"""
import http.client
import http.server
import os
import socketserver
from pathlib import Path
dist_dir = Path(__file__).resolve().parent.parent.parent / "decnet_web" / "dist"
if not dist_dir.exists():
console.print(f"[red]Frontend build not found at {dist_dir}. Make sure you run 'npm run build' inside 'decnet_web'.[/]")
raise typer.Exit(1)
if daemon:
log.info("web daemonizing host=%s port=%d api_port=%d", host, web_port, api_port)
_utils._daemonize()
_api_port = api_port
class SPAHTTPRequestHandler(http.server.SimpleHTTPRequestHandler):
def do_GET(self):
if self.path.startswith("/api/"):
self._proxy("GET")
return
path = self.translate_path(self.path)
if not Path(path).exists() or Path(path).is_dir():
self.path = "/index.html"
return super().do_GET()
def do_POST(self):
if self.path.startswith("/api/"):
self._proxy("POST")
return
self.send_error(405)
def do_PUT(self):
if self.path.startswith("/api/"):
self._proxy("PUT")
return
self.send_error(405)
def do_DELETE(self):
if self.path.startswith("/api/"):
self._proxy("DELETE")
return
self.send_error(405)
def _proxy(self, method: str) -> None:
content_length = int(self.headers.get("Content-Length", 0))
body = self.rfile.read(content_length) if content_length else None
forward = {k: v for k, v in self.headers.items()
if k.lower() not in ("host", "connection")}
try:
conn = http.client.HTTPConnection("127.0.0.1", _api_port, timeout=120)
conn.request(method, self.path, body=body, headers=forward)
resp = conn.getresponse()
self.send_response(resp.status)
for key, val in resp.getheaders():
if key.lower() not in ("connection", "transfer-encoding"):
self.send_header(key, val)
self.end_headers()
content_type = resp.getheader("Content-Type", "")
if "text/event-stream" in content_type:
conn.sock.settimeout(None)
_read = getattr(resp, "read1", resp.read)
while True:
chunk = _read(4096)
if not chunk:
break
self.wfile.write(chunk)
self.wfile.flush()
except Exception as exc:
log.warning("web proxy error %s %s: %s", method, self.path, exc)
self.send_error(502, f"API proxy error: {exc}")
finally:
try:
conn.close()
except Exception: # nosec B110 — best-effort conn cleanup
pass
def log_message(self, fmt: str, *args: object) -> None:
log.debug("web %s", fmt % args)
os.chdir(dist_dir)
socketserver.TCPServer.allow_reuse_address = True
with socketserver.ThreadingTCPServer((host, web_port), SPAHTTPRequestHandler) as httpd:
console.print(f"[green]Serving DECNET Web Dashboard on http://{host}:{web_port}[/]")
console.print(f"[dim]Proxying /api/* → http://127.0.0.1:{_api_port}[/]")
try:
httpd.serve_forever()
except KeyboardInterrupt:
console.print("\n[dim]Shutting down dashboard server.[/]")

142
decnet/cli/workers.py Normal file
View File

@@ -0,0 +1,142 @@
from __future__ import annotations
from typing import Optional
import typer
from decnet.env import DECNET_INGEST_LOG_FILE
from . import utils as _utils
from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def probe(
log_file: str = typer.Option(DECNET_INGEST_LOG_FILE, "--log-file", "-f", help="Path for RFC 5424 syslog + .json output (reads attackers from .json, writes results to both)"),
interval: int = typer.Option(300, "--interval", "-i", help="Seconds between probe cycles (default: 300)"),
timeout: float = typer.Option(5.0, "--timeout", help="Per-probe TCP timeout in seconds"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background (used by deploy, no console output)"),
) -> None:
"""Fingerprint attackers (JARM + HASSH + TCP/IP stack) discovered in the log stream."""
import asyncio
from decnet.prober import prober_worker
if daemon:
log.info("probe daemonizing log_file=%s interval=%d", log_file, interval)
_utils._daemonize()
asyncio.run(prober_worker(log_file, interval=interval, timeout=timeout))
return
log.info("probe command invoked log_file=%s interval=%d", log_file, interval)
console.print(f"[bold cyan]DECNET-PROBER[/] watching {log_file} for attackers (interval: {interval}s)")
console.print("[dim]Press Ctrl+C to stop[/]")
try:
asyncio.run(prober_worker(log_file, interval=interval, timeout=timeout))
except KeyboardInterrupt:
console.print("\n[yellow]DECNET-PROBER stopped.[/]")
@app.command()
def collect(
log_file: str = typer.Option(DECNET_INGEST_LOG_FILE, "--log-file", "-f", help="Path to write RFC 5424 syslog lines and .json records"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Stream Docker logs from all running decky service containers to a log file."""
import asyncio
from decnet.collector import log_collector_worker
if daemon:
log.info("collect daemonizing log_file=%s", log_file)
_utils._daemonize()
log.info("collect command invoked log_file=%s", log_file)
console.print(f"[bold cyan]Collector starting[/] → {log_file}")
asyncio.run(log_collector_worker(log_file))
@app.command()
def mutate(
watch: bool = typer.Option(False, "--watch", "-w", help="Run continuously and mutate deckies according to their interval"),
decky_name: Optional[str] = typer.Option(None, "--decky", help="Force mutate a specific decky immediately"),
force_all: bool = typer.Option(False, "--all", help="Force mutate all deckies immediately"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Manually trigger or continuously watch for decky mutation."""
import asyncio
from decnet.mutator import mutate_decky, mutate_all, run_watch_loop
from decnet.web.dependencies import repo
if daemon:
log.info("mutate daemonizing watch=%s", watch)
_utils._daemonize()
async def _run() -> None:
await repo.initialize()
if watch:
await run_watch_loop(repo)
elif decky_name:
await mutate_decky(decky_name, repo)
elif force_all:
await mutate_all(force=True, repo=repo)
else:
await mutate_all(force=False, repo=repo)
asyncio.run(_run())
@app.command(name="correlate")
def correlate(
log_file: Optional[str] = typer.Option(None, "--log-file", "-f", help="Path to DECNET syslog file to analyse"),
min_deckies: int = typer.Option(2, "--min-deckies", "-m", help="Minimum number of distinct deckies an IP must touch to be reported"),
output: str = typer.Option("table", "--output", "-o", help="Output format: table | json | syslog"),
emit_syslog: bool = typer.Option(False, "--emit-syslog", help="Also print traversal events as RFC 5424 lines (for SIEM piping)"),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
) -> None:
"""Analyse logs for cross-decky traversals and print the attacker movement graph."""
import sys
import json as _json
from pathlib import Path
from decnet.correlation.engine import CorrelationEngine
if daemon:
log.info("correlate daemonizing log_file=%s", log_file)
_utils._daemonize()
engine = CorrelationEngine()
if log_file:
path = Path(log_file)
if not path.exists():
console.print(f"[red]Log file not found: {log_file}[/]")
raise typer.Exit(1)
engine.ingest_file(path)
elif not sys.stdin.isatty():
for line in sys.stdin:
engine.ingest(line)
else:
console.print("[red]Provide --log-file or pipe log data via stdin.[/]")
raise typer.Exit(1)
traversals = engine.traversals(min_deckies)
if output == "json":
console.print_json(_json.dumps(engine.report_json(min_deckies), indent=2))
elif output == "syslog":
for line in engine.traversal_syslog_lines(min_deckies):
typer.echo(line)
else:
if not traversals:
console.print(
f"[yellow]No traversals detected "
f"(min_deckies={min_deckies}, events_indexed={engine.events_indexed}).[/]"
)
else:
console.print(engine.report_table(min_deckies))
console.print(
f"[dim]Parsed {engine.lines_parsed} lines · "
f"indexed {engine.events_indexed} events · "
f"{len(engine.all_attackers())} unique IPs · "
f"[bold]{len(traversals)}[/] traversal(s)[/]"
)
if emit_syslog:
for line in engine.traversal_syslog_lines(min_deckies):
typer.echo(line)

View File

@@ -0,0 +1,13 @@
from decnet.collector.worker import (
is_service_container,
is_service_event,
log_collector_worker,
parse_rfc5424,
)
__all__ = [
"is_service_container",
"is_service_event",
"log_collector_worker",
"parse_rfc5424",
]

368
decnet/collector/worker.py Normal file
View File

@@ -0,0 +1,368 @@
"""
Host-side Docker log collector.
Streams stdout from all running decky service containers via the Docker SDK,
writes RFC 5424 lines to <log_file> and parsed JSON records to <log_file>.json.
The ingester tails the .json file; rsyslog can consume the .log file independently.
"""
import asyncio
import json
import os
import re
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from pathlib import Path
from typing import Any, Optional
from decnet.logging import get_logger
from decnet.telemetry import traced as _traced, get_tracer as _get_tracer, inject_context as _inject_ctx
logger = get_logger("collector")
# ─── Ingestion rate limiter ───────────────────────────────────────────────────
#
# Rationale: connection-lifecycle events (connect/disconnect/accept/close) are
# emitted once per TCP connection. During a portscan or credential-stuffing
# run, a single attacker can generate hundreds of these per second from the
# honeypot services themselves — each becoming a tiny WAL-write transaction
# through the ingester, starving reads until the queue drains.
#
# The collector still writes every line to the raw .log file (forensic record
# for rsyslog/SIEM). Only the .json path — which feeds SQLite — is deduped.
#
# Dedup key: (attacker_ip, decky, service, event_type)
# Window: DECNET_COLLECTOR_RL_WINDOW_SEC seconds (default 1.0)
# Scope: DECNET_COLLECTOR_RL_EVENT_TYPES comma list
# (default: connect,disconnect,connection,accept,close)
# Events outside that set bypass the limiter untouched.
def _parse_float_env(name: str, default: float) -> float:
raw = os.environ.get(name)
if raw is None:
return default
try:
value = float(raw)
except ValueError:
logger.warning("collector: invalid %s=%r, using default %s", name, raw, default)
return default
return max(0.0, value)
_RL_WINDOW_SEC: float = _parse_float_env("DECNET_COLLECTOR_RL_WINDOW_SEC", 1.0)
_RL_EVENT_TYPES: frozenset[str] = frozenset(
t.strip()
for t in os.environ.get(
"DECNET_COLLECTOR_RL_EVENT_TYPES",
"connect,disconnect,connection,accept,close",
).split(",")
if t.strip()
)
_RL_MAX_ENTRIES: int = 10_000
_rl_lock: threading.Lock = threading.Lock()
_rl_last: dict[tuple[str, str, str, str], float] = {}
def _should_ingest(parsed: dict[str, Any]) -> bool:
"""
Return True if this parsed event should be written to the JSON ingestion
stream. Rate-limited connection-lifecycle events return False when another
event with the same (attacker_ip, decky, service, event_type) was emitted
inside the dedup window.
"""
event_type = parsed.get("event_type", "")
if _RL_WINDOW_SEC <= 0.0 or event_type not in _RL_EVENT_TYPES:
return True
key = (
parsed.get("attacker_ip", "Unknown"),
parsed.get("decky", ""),
parsed.get("service", ""),
event_type,
)
now = time.monotonic()
with _rl_lock:
last = _rl_last.get(key, 0.0)
if now - last < _RL_WINDOW_SEC:
return False
_rl_last[key] = now
# Opportunistic GC: when the map grows past the cap, drop entries older
# than 60 windows (well outside any realistic in-flight dedup range).
if len(_rl_last) > _RL_MAX_ENTRIES:
cutoff = now - (_RL_WINDOW_SEC * 60.0)
stale = [k for k, t in _rl_last.items() if t < cutoff]
for k in stale:
del _rl_last[k]
return True
def _reset_rate_limiter() -> None:
"""Test-only helper — clear dedup state between test cases."""
with _rl_lock:
_rl_last.clear()
# ─── RFC 5424 parser ──────────────────────────────────────────────────────────
_RFC5424_RE = re.compile(
r"^<\d+>1 "
r"(\S+) " # 1: TIMESTAMP
r"(\S+) " # 2: HOSTNAME (decky name)
r"(\S+) " # 3: APP-NAME (service)
r"\S+ " # PROCID — NILVALUE ("-") for syslog_bridge emitters,
# real PID for native syslog callers like sshd/sudo
# routed through rsyslog. Accept both; we don't consume it.
r"(\S+) " # 4: MSGID (event_type)
r"(.+)$", # 5: SD element + optional MSG
)
_SD_BLOCK_RE = re.compile(r'\[relay@55555\s+(.*?)\]', re.DOTALL)
_PARAM_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
_IP_FIELDS = ("src_ip", "src", "client_ip", "remote_ip", "remote_addr", "target_ip", "ip")
# Free-form `key=value` pairs in the MSG body. Used for lines that bypass the
# syslog_bridge SD format — e.g. the SSH container's PROMPT_COMMAND which
# calls `logger -t bash "CMD uid=0 user=root src=1.2.3.4 pwd=/root cmd=…"`.
# Values run until the next whitespace, so `cmd=…` at end-of-line is preserved
# as one unit; we only care about IP-shaped fields here anyway.
_MSG_KV_RE = re.compile(r'(\w+)=(\S+)')
def parse_rfc5424(line: str) -> Optional[dict[str, Any]]:
"""
Parse an RFC 5424 DECNET log line into a structured dict.
Returns None if the line does not match the expected format.
"""
m = _RFC5424_RE.match(line)
if not m:
return None
ts_raw, decky, service, event_type, sd_rest = m.groups()
fields: dict[str, str] = {}
msg: str = ""
if sd_rest.startswith("-"):
msg = sd_rest[1:].lstrip()
elif sd_rest.startswith("["):
block = _SD_BLOCK_RE.search(sd_rest)
if block:
for k, v in _PARAM_RE.findall(block.group(1)):
fields[k] = v.replace('\\"', '"').replace("\\\\", "\\").replace("\\]", "]")
msg_match = re.search(r'\]\s+(.+)$', sd_rest)
if msg_match:
msg = msg_match.group(1).strip()
else:
msg = sd_rest
attacker_ip = "Unknown"
for fname in _IP_FIELDS:
if fname in fields:
attacker_ip = fields[fname]
break
# Fallback for plain `logger` callers that don't use SD params (notably
# the SSH container's bash PROMPT_COMMAND: `logger -t bash "CMD … src=IP …"`).
# Scan the MSG body for IP-shaped `key=value` tokens ONLY — don't fold
# them into `fields`, because the frontend's parseEventBody already
# renders kv pairs from the msg and doubling them up produces noisy
# duplicate pills. This keeps attacker attribution working without
# changing the shape of `fields` for non-SD lines.
if attacker_ip == "Unknown" and msg:
for k, v in _MSG_KV_RE.findall(msg):
if k in _IP_FIELDS:
attacker_ip = v
break
try:
ts_formatted = datetime.fromisoformat(ts_raw).strftime("%Y-%m-%d %H:%M:%S")
except ValueError:
ts_formatted = ts_raw
return {
"timestamp": ts_formatted,
"decky": decky,
"service": service,
"event_type": event_type,
"attacker_ip": attacker_ip,
"fields": fields,
"msg": msg,
"raw_line": line,
}
# ─── Container helpers ────────────────────────────────────────────────────────
def _load_service_container_names() -> set[str]:
"""
Return the exact set of service container names from decnet-state.json.
Format: {decky_name}-{service_name}, e.g. 'omega-decky-smtp'.
Returns an empty set if no state file exists.
"""
from decnet.config import load_state
state = load_state()
if state is None:
return set()
config, _ = state
names: set[str] = set()
for decky in config.deckies:
for svc in decky.services:
names.add(f"{decky.name}-{svc.replace('_', '-')}")
return names
def is_service_container(container) -> bool:
"""Return True if this Docker container is a known DECNET service container."""
name = (container if isinstance(container, str) else container.name).lstrip("/")
return name in _load_service_container_names()
def is_service_event(attrs: dict) -> bool:
"""Return True if a Docker start event is for a known DECNET service container."""
name = attrs.get("name", "").lstrip("/")
return name in _load_service_container_names()
# ─── Blocking stream worker (runs in a thread) ────────────────────────────────
def _reopen_if_needed(path: Path, fh: Optional[Any]) -> Any:
"""Return fh if it still points to the same inode as path; otherwise close
fh and open a fresh handle. Handles the file being deleted (manual rm) or
rotated (logrotate rename + create)."""
try:
if fh is not None and os.fstat(fh.fileno()).st_ino == os.stat(path).st_ino:
return fh
except OSError:
pass
# File gone or inode changed — close stale handle and open a new one.
if fh is not None:
try:
fh.close()
except Exception: # nosec B110 — best-effort file handle cleanup
pass
path.parent.mkdir(parents=True, exist_ok=True)
return open(path, "a", encoding="utf-8")
@_traced("collector.stream_container")
def _stream_container(container_id: str, log_path: Path, json_path: Path) -> None:
"""Stream logs from one container and append to the host log files."""
import docker # type: ignore[import]
lf: Optional[Any] = None
jf: Optional[Any] = None
try:
client = docker.from_env()
container = client.containers.get(container_id)
log_stream = container.logs(stream=True, follow=True, stdout=True, stderr=False)
buf = ""
for chunk in log_stream:
buf += chunk.decode("utf-8", errors="replace")
while "\n" in buf:
line, buf = buf.split("\n", 1)
line = line.rstrip()
if not line:
continue
lf = _reopen_if_needed(log_path, lf)
lf.write(line + "\n")
lf.flush()
parsed = parse_rfc5424(line)
if parsed:
if _should_ingest(parsed):
_tracer = _get_tracer("collector")
with _tracer.start_as_current_span("collector.event") as _span:
_span.set_attribute("decky", parsed.get("decky", ""))
_span.set_attribute("service", parsed.get("service", ""))
_span.set_attribute("event_type", parsed.get("event_type", ""))
_span.set_attribute("attacker_ip", parsed.get("attacker_ip", ""))
_inject_ctx(parsed)
logger.debug("collector: event written decky=%s type=%s", parsed.get("decky"), parsed.get("event_type"))
jf = _reopen_if_needed(json_path, jf)
jf.write(json.dumps(parsed) + "\n")
jf.flush()
else:
logger.debug(
"collector: rate-limited decky=%s service=%s type=%s attacker=%s",
parsed.get("decky"), parsed.get("service"),
parsed.get("event_type"), parsed.get("attacker_ip"),
)
else:
logger.debug("collector: malformed RFC5424 line snippet=%r", line[:80])
except Exception as exc:
logger.debug("collector: log stream ended container_id=%s reason=%s", container_id, exc)
finally:
for fh in (lf, jf):
if fh is not None:
try:
fh.close()
except Exception: # nosec B110 — best-effort file handle cleanup
pass
# ─── Async collector ──────────────────────────────────────────────────────────
async def log_collector_worker(log_file: str) -> None:
"""
Background task: streams Docker logs from all running decky service
containers, writing RFC 5424 lines to log_file and parsed JSON records
to log_file.json for the ingester to consume.
Watches Docker events to pick up containers started after initial scan.
"""
import docker # type: ignore[import]
log_path = Path(log_file)
json_path = log_path.with_suffix(".json")
log_path.parent.mkdir(parents=True, exist_ok=True)
active: dict[str, asyncio.Task[None]] = {}
loop = asyncio.get_running_loop()
# Dedicated thread pool so long-running container log streams don't
# saturate the default asyncio executor and starve short-lived
# to_thread() calls elsewhere (e.g. load_state in the web API).
collector_pool = ThreadPoolExecutor(
max_workers=64, thread_name_prefix="decnet-collector",
)
def _spawn(container_id: str, container_name: str) -> None:
if container_id not in active or active[container_id].done():
active[container_id] = asyncio.ensure_future(
loop.run_in_executor(
collector_pool, _stream_container,
container_id, log_path, json_path,
),
loop=loop,
)
logger.info("collector: streaming container=%s", container_name)
try:
logger.info("collector started log_path=%s", log_path)
client = docker.from_env()
for container in client.containers.list():
if is_service_container(container):
_spawn(container.id, container.name.lstrip("/"))
def _watch_events() -> None:
for event in client.events(
decode=True,
filters={"type": "container", "event": "start"},
):
attrs = event.get("Actor", {}).get("Attributes", {})
cid = event.get("id", "")
name = attrs.get("name", "")
if cid and is_service_event(attrs):
loop.call_soon_threadsafe(_spawn, cid, name)
await loop.run_in_executor(collector_pool, _watch_events)
except asyncio.CancelledError:
logger.info("collector shutdown requested cancelling %d tasks", len(active))
for task in active.values():
task.cancel()
collector_pool.shutdown(wait=False)
raise
except Exception as exc:
logger.error("collector error: %s", exc)
finally:
collector_pool.shutdown(wait=False)

View File

@@ -6,6 +6,12 @@ Network model:
All service containers for that decky share the base's network namespace
via `network_mode: "service:<base>"`. From the outside, every service on
a given decky appears to come from the same IP — exactly like a real host.
Logging model:
Service containers write RFC 5424 lines to stdout. Docker captures them
via the json-file driver. The host-side collector (decnet.web.collector)
streams those logs and writes them to the host log file for the ingester
and rsyslog to consume. No bind mounts or shared volumes are needed.
"""
from pathlib import Path
@@ -17,35 +23,19 @@ from decnet.network import MACVLAN_NETWORK_NAME
from decnet.os_fingerprint import get_os_sysctls
from decnet.services.registry import get_service
_CONTAINER_LOG_DIR = "/var/log/decnet"
_LOG_NETWORK = "decnet_logs"
def _resolve_log_file(log_file: str) -> tuple[str, str]:
"""
Return (host_dir, container_log_path) for a user-supplied log file path.
The host path is resolved to absolute so Docker can bind-mount it.
All containers share the same host directory, mounted at _CONTAINER_LOG_DIR.
"""
host_path = Path(log_file).resolve()
host_dir = str(host_path.parent)
container_path = f"{_CONTAINER_LOG_DIR}/{host_path.name}"
return host_dir, container_path
_DOCKER_LOGGING = {
"driver": "json-file",
"options": {
"max-size": "10m",
"max-file": "5",
},
}
def generate_compose(config: DecnetConfig) -> dict:
"""Build and return the full docker-compose data structure."""
services: dict = {}
log_host_dir: str | None = None
log_container_path: str | None = None
if config.log_file:
log_host_dir, log_container_path = _resolve_log_file(config.log_file)
# Ensure the host log directory exists so Docker doesn't create it as root-owned
Path(log_host_dir).mkdir(parents=True, exist_ok=True)
for decky in config.deckies:
base_key = decky.name # e.g. "decky-01"
@@ -62,8 +52,6 @@ def generate_compose(config: DecnetConfig) -> dict:
}
},
}
if config.log_target:
base["networks"][_LOG_NETWORK] = {}
# Inject TCP/IP stack sysctls to spoof the claimed OS fingerprint.
# Only the base container needs this — service containers inherit the
@@ -76,24 +64,21 @@ def generate_compose(config: DecnetConfig) -> dict:
# --- Service containers: share base network namespace ---
for svc_name in decky.services:
svc = get_service(svc_name)
if svc.fleet_singleton:
continue
svc_cfg = decky.service_config.get(svc_name, {})
fragment = svc.compose_fragment(
decky.name, log_target=config.log_target, service_cfg=svc_cfg
)
fragment = svc.compose_fragment(decky.name, service_cfg=svc_cfg)
# Inject the per-decky base image into build services so containers
# vary by distro and don't all fingerprint as debian:bookworm-slim.
# Services that need a fixed upstream image (e.g. conpot) can pre-set
# build.args.BASE_IMAGE in their compose_fragment() to opt out.
if "build" in fragment:
fragment["build"].setdefault("args", {})["BASE_IMAGE"] = decky.build_base
args = fragment["build"].setdefault("args", {})
args.setdefault("BASE_IMAGE", decky.build_base)
fragment.setdefault("environment", {})
fragment["environment"]["HOSTNAME"] = decky.hostname
if log_host_dir and log_container_path:
fragment["environment"]["DECNET_LOG_FILE"] = log_container_path
fragment.setdefault("volumes", [])
mount = f"{log_host_dir}:{_CONTAINER_LOG_DIR}"
if mount not in fragment["volumes"]:
fragment["volumes"].append(mount)
# Share the base container's network — no own IP needed
fragment["network_mode"] = f"service:{base_key}"
@@ -103,6 +88,9 @@ def generate_compose(config: DecnetConfig) -> dict:
fragment.pop("hostname", None)
fragment.pop("networks", None)
# Rotate Docker logs so disk usage is bounded
fragment["logging"] = _DOCKER_LOGGING
services[f"{decky.name}-{svc_name}"] = fragment
# Network definitions
@@ -111,8 +99,6 @@ def generate_compose(config: DecnetConfig) -> dict:
"external": True, # created by network.py before compose up
}
}
if config.log_target:
networks[_LOG_NETWORK] = {"driver": "bridge", "internal": True}
return {
"version": "3.8",

View File

@@ -4,61 +4,117 @@ State is persisted to decnet-state.json in the working directory.
"""
import json
import logging
import os
import socket as _socket
from datetime import datetime, timezone
from pathlib import Path
from typing import Literal
from pydantic import BaseModel, field_validator
from decnet.models import DeckyConfig, DecnetConfig # noqa: F401
from decnet.distros import random_hostname as _random_hostname
STATE_FILE = Path("decnet-state.json")
# ---------------------------------------------------------------------------
# RFC 5424 syslog formatter
# ---------------------------------------------------------------------------
# Severity mapping: Python level → syslog severity (RFC 5424 §6.2.1)
_SYSLOG_SEVERITY: dict[int, int] = {
logging.CRITICAL: 2, # Critical
logging.ERROR: 3, # Error
logging.WARNING: 4, # Warning
logging.INFO: 6, # Informational
logging.DEBUG: 7, # Debug
}
_FACILITY_LOCAL0 = 16 # local0 (RFC 5424 §6.2.1 / POSIX)
class Rfc5424Formatter(logging.Formatter):
"""Formats log records as RFC 5424 syslog messages.
Output:
<PRIVAL>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
Example:
<134>1 2026-04-12T21:48:03.123456+00:00 host decnet 1234 decnet.config - Dev mode active
"""
_hostname: str = _socket.gethostname()
_app: str = "decnet"
def format(self, record: logging.LogRecord) -> str:
severity = _SYSLOG_SEVERITY.get(record.levelno, 6)
prival = (_FACILITY_LOCAL0 * 8) + severity
ts = datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(timespec="microseconds")
msg = record.getMessage()
if record.exc_info:
msg += "\n" + self.formatException(record.exc_info)
app = getattr(record, "decnet_component", self._app)
return (
f"<{prival}>1 {ts} {self._hostname} {app}"
f" {os.getpid()} {record.name} - {msg}"
)
def _configure_logging(dev: bool) -> None:
"""Install RFC 5424 handlers on the root logger (idempotent).
Always adds a StreamHandler (stderr). Also adds a RotatingFileHandler
writing to DECNET_SYSTEM_LOGS (default: decnet.system.log in $PWD) so
all microservice daemons — which redirect stderr to /dev/null — still
produce readable logs. File handler is skipped under pytest.
"""
from decnet.logging.inode_aware_handler import InodeAwareRotatingFileHandler
root = logging.getLogger()
# Guard: if our StreamHandler is already installed, all handlers are set.
if any(isinstance(h, logging.StreamHandler) and isinstance(h.formatter, Rfc5424Formatter)
for h in root.handlers):
return
fmt = Rfc5424Formatter()
root.setLevel(logging.DEBUG if dev else logging.INFO)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(fmt)
root.addHandler(stream_handler)
# Skip the file handler during pytest runs to avoid polluting the test cwd.
_in_pytest = any(k.startswith("PYTEST") for k in os.environ)
if not _in_pytest:
_log_path = os.environ.get("DECNET_SYSTEM_LOGS", "decnet.system.log")
file_handler = InodeAwareRotatingFileHandler(
_log_path,
mode="a",
maxBytes=10 * 1024 * 1024, # 10 MB
backupCount=5,
encoding="utf-8",
)
file_handler.setFormatter(fmt)
root.addHandler(file_handler)
# Drop root ownership when invoked via sudo so non-root follow-up
# commands (e.g. `decnet api` after `sudo decnet deploy`) can append.
from decnet.privdrop import chown_to_invoking_user
chown_to_invoking_user(_log_path)
_dev = os.environ.get("DECNET_DEVELOPER", "").lower() == "true"
_configure_logging(_dev)
log = logging.getLogger(__name__)
if _dev:
log.debug("Developer mode: debug logging active")
# Calculate absolute path to the project root (where the config file resides)
_ROOT: Path = Path(__file__).parent.parent.absolute()
STATE_FILE: Path = _ROOT / "decnet-state.json"
DEFAULT_MUTATE_INTERVAL: int = 30 # default rotation interval in minutes
def random_hostname(distro_slug: str = "debian") -> str:
return _random_hostname(distro_slug)
class DeckyConfig(BaseModel):
name: str
ip: str
services: list[str]
distro: str # slug from distros.DISTROS, e.g. "debian", "ubuntu22"
base_image: str # Docker image for the base/IP-holder container
build_base: str = "debian:bookworm-slim" # apt-compatible image for service Dockerfiles
hostname: str
archetype: str | None = None # archetype slug if spawned from an archetype profile
service_config: dict[str, dict] = {} # optional per-service persona config
nmap_os: str = "linux" # OS family for TCP/IP stack spoofing (see os_fingerprint.py)
@field_validator("services")
@classmethod
def services_not_empty(cls, v: list[str]) -> list[str]:
if not v:
raise ValueError("A decky must have at least one service.")
return v
class DecnetConfig(BaseModel):
mode: Literal["unihost", "swarm"]
interface: str
subnet: str
gateway: str
deckies: list[DeckyConfig]
log_target: str | None = None # "ip:port" or None
log_file: str | None = None # path for RFC 5424 syslog file output
ipvlan: bool = False # use IPvlan L2 instead of MACVLAN (WiFi-friendly)
@field_validator("log_target")
@classmethod
def validate_log_target(cls, v: str | None) -> str | None:
if v is None:
return v
parts = v.rsplit(":", 1)
if len(parts) != 2 or not parts[1].isdigit():
raise ValueError("log_target must be in ip:port format, e.g. 192.168.1.5:5140")
return v
def save_state(config: DecnetConfig, compose_path: Path) -> None:
payload = {
"config": config.model_dump(),

90
decnet/config_ini.py Normal file
View File

@@ -0,0 +1,90 @@
"""Parse /etc/decnet/decnet.ini and seed os.environ defaults.
The INI file is a convenience layer on top of the existing DECNET_* env
vars. It never overrides an explicit environment variable (uses
os.environ.setdefault). Call load_ini_config() once, very early, before
any decnet.env import, so env.py picks up the seeded values as if they
had been exported by the shell.
Shape::
[decnet]
mode = agent # or "master"
log-directory = /var/log/decnet
disallow-master = true
[agent]
master-host = 192.168.1.50
master-port = 8770
agent-port = 8765
agent-dir = /home/anti/.decnet/agent
...
[master]
api-host = 0.0.0.0
swarmctl-port = 8770
listener-port = 6514
...
Only the section matching `mode` is loaded. The other section is
ignored silently so an agent host never reads master secrets (and
vice versa). Keys are converted to SCREAMING_SNAKE_CASE and prefixed
with ``DECNET_`` — e.g. ``master-host`` → ``DECNET_MASTER_HOST``.
"""
from __future__ import annotations
import configparser
import os
from pathlib import Path
from typing import Optional
DEFAULT_CONFIG_PATH = Path("/etc/decnet/decnet.ini")
# The [decnet] section keys are role-agnostic and always exported.
_COMMON_KEYS = frozenset({"mode", "disallow-master", "log-directory"})
def _key_to_env(key: str) -> str:
return "DECNET_" + key.replace("-", "_").upper()
def load_ini_config(path: Optional[Path] = None) -> Optional[Path]:
"""Seed os.environ defaults from the DECNET INI file.
Returns the path that was actually loaded (so callers can log it), or
None if no file was read. Missing file is a no-op — callers fall back
to env vars / CLI flags / hardcoded defaults.
Precedence: real os.environ > INI > defaults. Real env vars are never
overwritten because we use setdefault().
"""
if path is None:
override = os.environ.get("DECNET_CONFIG")
path = Path(override) if override else DEFAULT_CONFIG_PATH
if not path.is_file():
return None
parser = configparser.ConfigParser()
parser.read(path)
# [decnet] first — mode/disallow-master/log-directory. These seed the
# mode decision for the section selection below.
if parser.has_section("decnet"):
for key, value in parser.items("decnet"):
os.environ.setdefault(_key_to_env(key), value)
mode = os.environ.get("DECNET_MODE", "master").lower()
if mode not in ("agent", "master"):
raise ValueError(
f"decnet.ini: [decnet] mode must be 'agent' or 'master', got '{mode}'"
)
# Role-specific section.
section = mode
if parser.has_section(section):
for key, value in parser.items(section):
os.environ.setdefault(_key_to_env(key), value)
return path

View File

@@ -5,9 +5,9 @@ from decnet.correlation.graph import AttackerTraversal, TraversalHop
from decnet.correlation.parser import LogEvent, parse_line
__all__ = [
"CorrelationEngine",
"AttackerTraversal",
"TraversalHop",
"CorrelationEngine",
"LogEvent",
"TraversalHop",
"parse_line",
]

View File

@@ -22,7 +22,6 @@ Usage
from __future__ import annotations
import json
from collections import defaultdict
from pathlib import Path
@@ -34,6 +33,7 @@ from decnet.logging.syslog_formatter import (
SEVERITY_WARNING,
format_rfc5424,
)
from decnet.telemetry import traced as _traced, get_tracer as _get_tracer
class CorrelationEngine:
@@ -65,6 +65,7 @@ class CorrelationEngine:
self.events_indexed += 1
return event
@_traced("correlation.ingest_file")
def ingest_file(self, path: Path) -> int:
"""
Parse every line of *path* and index it.
@@ -74,12 +75,18 @@ class CorrelationEngine:
with open(path) as fh:
for line in fh:
self.ingest(line)
_tracer = _get_tracer("correlation")
with _tracer.start_as_current_span("correlation.ingest_file.summary") as _span:
_span.set_attribute("lines_parsed", self.lines_parsed)
_span.set_attribute("events_indexed", self.events_indexed)
_span.set_attribute("unique_ips", len(self._events))
return self.events_indexed
# ------------------------------------------------------------------ #
# Query #
# ------------------------------------------------------------------ #
@_traced("correlation.traversals")
def traversals(self, min_deckies: int = 2) -> list[AttackerTraversal]:
"""
Return all attackers that touched at least *min_deckies* distinct
@@ -136,6 +143,7 @@ class CorrelationEngine:
)
return table
@_traced("correlation.report_json")
def report_json(self, min_deckies: int = 2) -> dict:
"""Serialisable dict representation of all traversals."""
return {
@@ -148,6 +156,7 @@ class CorrelationEngine:
"traversals": [t.to_dict() for t in self.traversals(min_deckies)],
}
@_traced("correlation.traversal_syslog_lines")
def traversal_syslog_lines(self, min_deckies: int = 2) -> list[str]:
"""
Emit one RFC 5424 syslog line per detected traversal.

View File

@@ -6,7 +6,7 @@ the fields needed for cross-decky correlation: attacker IP, decky name,
service, event type, and timestamp.
Log format (produced by decnet.logging.syslog_formatter):
<PRI>1 TIMESTAMP HOSTNAME APP-NAME - MSGID [decnet@55555 k1="v1" k2="v2"] [MSG]
<PRI>1 TIMESTAMP HOSTNAME APP-NAME - MSGID [relay@55555 k1="v1" k2="v2"] [MSG]
The attacker IP may appear under several field names depending on service:
src_ip — ftp, smtp, http, most services
@@ -17,7 +17,7 @@ The attacker IP may appear under several field names depending on service:
from __future__ import annotations
import re
from dataclasses import dataclass, field
from dataclasses import dataclass
from datetime import datetime
# RFC 5424 line structure
@@ -31,14 +31,14 @@ _RFC5424_RE = re.compile(
r"(.+)$", # 5: SD element + optional MSG
)
# Structured data block: [decnet@55555 k="v" ...]
_SD_BLOCK_RE = re.compile(r'\[decnet@55555\s+(.*?)\]', re.DOTALL)
# Structured data block: [relay@55555 k="v" ...]
_SD_BLOCK_RE = re.compile(r'\[relay@55555\s+(.*?)\]', re.DOTALL)
# Individual param: key="value" (with escaped chars inside value)
_PARAM_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
# Field names to probe for attacker IP, in priority order
_IP_FIELDS = ("src_ip", "src", "client_ip", "remote_ip", "ip")
_IP_FIELDS = ("src_ip", "src", "client_ip", "remote_ip", "remote_addr", "target_ip", "ip")
@dataclass

View File

@@ -97,8 +97,8 @@ def random_hostname(distro_slug: str = "debian") -> str:
"""Generate a plausible hostname for the given distro style."""
profile = DISTROS.get(distro_slug)
style = profile.hostname_style if profile else "generic"
word = random.choice(_NAME_WORDS)
num = random.randint(10, 99)
word = random.choice(_NAME_WORDS) # nosec B311
num = random.randint(10, 99) # nosec B311
if style == "rhel":
# RHEL/CentOS/Fedora convention: word+num.localdomain
@@ -107,7 +107,7 @@ def random_hostname(distro_slug: str = "debian") -> str:
return f"{word}-{num}"
elif style == "rolling":
# Kali/Arch: just a word, no suffix
return f"{word}-{random.choice(_NAME_WORDS)}"
return f"{word}-{random.choice(_NAME_WORDS)}" # nosec B311
else:
# Debian/Ubuntu: SRV-WORD-nn
return f"SRV-{word.upper()}-{num}"
@@ -122,7 +122,7 @@ def get_distro(slug: str) -> DistroProfile:
def random_distro() -> DistroProfile:
return random.choice(list(DISTROS.values()))
return random.choice(list(DISTROS.values())) # nosec B311
def all_distros() -> dict[str, DistroProfile]:

15
decnet/engine/__init__.py Normal file
View File

@@ -0,0 +1,15 @@
from decnet.engine.deployer import (
COMPOSE_FILE,
_compose_with_retry,
deploy,
status,
teardown,
)
__all__ = [
"COMPOSE_FILE",
"_compose_with_retry",
"deploy",
"status",
"teardown",
]

View File

@@ -2,7 +2,8 @@
Deploy, teardown, and status via Docker SDK + subprocess docker compose.
"""
import subprocess
import shutil
import subprocess # nosec B404
import time
from pathlib import Path
@@ -10,15 +11,14 @@ import docker
from rich.console import Console
from rich.table import Table
from decnet.logging import get_logger
from decnet.telemetry import traced as _traced
from decnet.config import DecnetConfig, clear_state, load_state, save_state
from decnet.composer import write_compose
from decnet.network import (
MACVLAN_NETWORK_NAME,
allocate_ips,
create_ipvlan_network,
create_macvlan_network,
detect_interface,
detect_subnet,
get_host_ip,
ips_to_range,
remove_macvlan_network,
@@ -28,13 +28,51 @@ from decnet.network import (
teardown_host_macvlan,
)
log = get_logger("engine")
console = Console()
COMPOSE_FILE = Path("decnet-compose.yml")
_CANONICAL_LOGGING = Path(__file__).parent.parent / "templates" / "syslog_bridge.py"
def _compose(*args: str, compose_file: Path = COMPOSE_FILE) -> None:
cmd = ["docker", "compose", "-f", str(compose_file), *args]
subprocess.run(cmd, check=True)
def _sync_logging_helper(config: DecnetConfig) -> None:
"""Copy the canonical syslog_bridge.py into every active template build context."""
from decnet.services.registry import get_service
seen: set[Path] = set()
for decky in config.deckies:
for svc_name in decky.services:
svc = get_service(svc_name)
if svc is None:
continue
ctx = svc.dockerfile_context()
if ctx is None or ctx in seen:
continue
seen.add(ctx)
dest = ctx / "syslog_bridge.py"
if not dest.exists() or dest.read_bytes() != _CANONICAL_LOGGING.read_bytes():
shutil.copy2(_CANONICAL_LOGGING, dest)
def _compose(*args: str, compose_file: Path = COMPOSE_FILE, env: dict | None = None) -> None:
import os
# -p decnet pins the compose project name. Without it, docker compose
# derives the project from basename($PWD); when a daemon (systemd) runs
# with WorkingDirectory=/ that basename is empty and compose aborts with
# "project name must not be empty".
cmd = ["docker", "compose", "-p", "decnet", "-f", str(compose_file), *args]
merged = {**os.environ, **(env or {})}
result = subprocess.run(cmd, capture_output=True, text=True, env=merged) # nosec B603
if result.stdout:
print(result.stdout, end="")
if result.returncode != 0:
# Docker emits the useful detail ("Address already in use", which IP,
# which port) on stderr. Surface it to the structured log so the
# agent's journal carries it — without this the upstream traceback
# just shows the exit code.
if result.stderr:
log.error("docker compose %s failed: %s", " ".join(args), result.stderr.strip())
raise subprocess.CalledProcessError(
result.returncode, cmd, result.stdout, result.stderr
)
_PERMANENT_ERRORS = (
@@ -46,17 +84,25 @@ _PERMANENT_ERRORS = (
)
@_traced("engine.compose_with_retry")
def _compose_with_retry(
*args: str,
compose_file: Path = COMPOSE_FILE,
retries: int = 3,
delay: float = 5.0,
env: dict | None = None,
) -> None:
"""Run a docker compose command, retrying on transient failures."""
import os
last_exc: subprocess.CalledProcessError | None = None
cmd = ["docker", "compose", "-f", str(compose_file), *args]
# -p decnet pins the compose project name. Without it, docker compose
# derives the project from basename($PWD); when a daemon (systemd) runs
# with WorkingDirectory=/ that basename is empty and compose aborts with
# "project name must not be empty".
cmd = ["docker", "compose", "-p", "decnet", "-f", str(compose_file), *args]
merged = {**os.environ, **(env or {})}
for attempt in range(1, retries + 1):
result = subprocess.run(cmd, capture_output=True, text=True)
result = subprocess.run(cmd, capture_output=True, text=True, env=merged) # nosec B603
if result.returncode == 0:
if result.stdout:
print(result.stdout, end="")
@@ -80,16 +126,21 @@ def _compose_with_retry(
else:
if result.stderr:
console.print(f"[red]{result.stderr.strip()}[/]")
log.error("docker compose %s failed after %d attempts: %s",
" ".join(args), retries, result.stderr.strip())
raise last_exc
def deploy(config: DecnetConfig, dry_run: bool = False, no_cache: bool = False) -> None:
@_traced("engine.deploy")
def deploy(config: DecnetConfig, dry_run: bool = False, no_cache: bool = False, parallel: bool = False) -> None:
log.info("deployment started n_deckies=%d interface=%s subnet=%s dry_run=%s", len(config.deckies), config.interface, config.subnet, dry_run)
log.debug("deploy: deckies=%s", [d.name for d in config.deckies])
client = docker.from_env()
# --- Network setup ---
ip_list = [d.ip for d in config.deckies]
decky_range = ips_to_range(ip_list)
host_ip = get_host_ip(config.interface)
log.debug("deploy: ip_range=%s host_ip=%s", decky_range, host_ip)
net_driver = "IPvlan L2" if config.ipvlan else "MACVLAN"
console.print(f"[bold cyan]Creating {net_driver} network[/] ({MACVLAN_NETWORK_NAME}) on {config.interface}")
@@ -113,30 +164,53 @@ def deploy(config: DecnetConfig, dry_run: bool = False, no_cache: bool = False)
)
setup_host_macvlan(config.interface, host_ip, decky_range)
# --- Compose generation ---
_sync_logging_helper(config)
compose_path = write_compose(config, COMPOSE_FILE)
console.print(f"[bold cyan]Compose file written[/] → {compose_path}")
if dry_run:
log.info("deployment dry-run complete compose_path=%s", compose_path)
console.print("[yellow]Dry run — no containers started.[/]")
return
# --- Save state before bring-up ---
save_state(config, compose_path)
# --- Bring up ---
# Pre-up cleanup: a prior half-failed `up` can leave containers still
# holding the IPs/ports this run wants, which surfaces as the recurring
# "Address already in use" from Docker's IPAM. Best-effort — ignore
# failure (e.g. nothing to tear down on a clean host).
try:
_compose("down", "--remove-orphans", compose_file=compose_path)
except subprocess.CalledProcessError:
log.debug("pre-up cleanup: compose down failed (likely nothing to remove)")
build_env = {"DOCKER_BUILDKIT": "1"} if parallel else {}
console.print("[bold cyan]Building images and starting deckies...[/]")
build_args = ["build"]
if no_cache:
build_args.append("--no-cache")
if parallel:
console.print("[bold cyan]Parallel build enabled — building all images concurrently...[/]")
_compose_with_retry(*build_args, compose_file=compose_path, env=build_env)
_compose_with_retry("up", "-d", compose_file=compose_path, env=build_env)
else:
if no_cache:
_compose_with_retry("build", "--no-cache", compose_file=compose_path)
_compose_with_retry("up", "--build", "-d", compose_file=compose_path)
# --- Status summary ---
log.info("deployment complete n_deckies=%d", len(config.deckies))
_print_status(config)
@_traced("engine.teardown")
def teardown(decky_id: str | None = None) -> None:
log.info("teardown requested decky_id=%s", decky_id or "all")
state = load_state()
if state is None:
log.warning("teardown: no active deployment found")
console.print("[red]No active deployment found (no decnet-state.json).[/]")
return
@@ -144,11 +218,14 @@ def teardown(decky_id: str | None = None) -> None:
client = docker.from_env()
if decky_id:
# Bring down only the services matching this decky
svc_names = [f"{decky_id}-{svc}" for svc in [d.services for d in config.deckies if d.name == decky_id]]
if not svc_names:
decky = next((d for d in config.deckies if d.name == decky_id), None)
if decky is None:
console.print(f"[red]Decky '{decky_id}' not found in current deployment.[/]")
return
svc_names = [f"{decky_id}-{svc}" for svc in decky.services]
if not svc_names:
log.warning("teardown: decky %s has no services to stop", decky_id)
return
_compose("stop", *svc_names, compose_file=compose_path)
_compose("rm", "-f", *svc_names, compose_file=compose_path)
else:
@@ -162,7 +239,9 @@ def teardown(decky_id: str | None = None) -> None:
teardown_host_macvlan(decky_range)
remove_macvlan_network(client)
clear_state()
net_driver = "IPvlan" if config.ipvlan else "MACVLAN"
log.info("teardown complete all deckies removed network_driver=%s", net_driver)
console.print(f"[green]All deckies torn down. {net_driver} network removed.[/]")
@@ -182,7 +261,7 @@ def status() -> None:
table.add_column("Hostname")
table.add_column("Status")
running = {c.name: c.status for c in client.containers.list(all=True)}
running = {c.name: c.status for c in client.containers.list(all=True, ignore_removed=True)}
for decky in config.deckies:
statuses = []

153
decnet/env.py Normal file
View File

@@ -0,0 +1,153 @@
import os
from pathlib import Path
from typing import Optional
from dotenv import load_dotenv
# Calculate absolute path to the project root
_ROOT: Path = Path(__file__).parent.parent.absolute()
# Load .env.local first, then fallback to .env.
# Also check CWD so deployments that install into site-packages (e.g. the
# self-updater's release slots) can ship a per-host .env.local at the
# process's working directory without having to edit site-packages.
load_dotenv(_ROOT / ".env.local")
load_dotenv(_ROOT / ".env")
load_dotenv(Path.cwd() / ".env.local")
load_dotenv(Path.cwd() / ".env")
def _port(name: str, default: int) -> int:
raw = os.environ.get(name, str(default))
try:
value = int(raw)
except ValueError:
raise ValueError(f"Environment variable '{name}' must be an integer, got '{raw}'.")
if not (1 <= value <= 65535):
raise ValueError(f"Environment variable '{name}' must be 165535, got {value}.")
return value
def _require_env(name: str) -> str:
"""Return the env var value or raise at startup if it is unset or a known-bad default."""
_KNOWN_BAD = {"fallback-secret-key-change-me", "admin", "secret", "password", "changeme"}
value = os.environ.get(name)
if not value:
raise ValueError(
f"Required environment variable '{name}' is not set. "
f"Set it in .env.local or export it before starting DECNET."
)
if any(k.startswith("PYTEST") for k in os.environ):
return value
if value.lower() in _KNOWN_BAD:
raise ValueError(
f"Environment variable '{name}' is set to an insecure default ('{value}'). "
f"Choose a strong, unique value before starting DECNET."
)
if name == "DECNET_JWT_SECRET" and len(value) < 32:
_developer = os.environ.get("DECNET_DEVELOPER", "False").lower() == "true"
if not _developer:
raise ValueError(
f"DECNET_JWT_SECRET is too short ({len(value)} bytes). "
f"Use at least 32 characters to satisfy HS256 requirements (RFC 7518 §3.2)."
)
return value
# System logging — all microservice daemons append here.
DECNET_SYSTEM_LOGS: str = os.environ.get("DECNET_SYSTEM_LOGS", "decnet.system.log")
# Set to "true" to embed the profiler inside the API process.
# Leave unset (default) when the standalone `decnet profiler --daemon` is
# running — embedding both produces two workers sharing the same DB cursor,
# which causes events to be skipped or processed twice.
DECNET_EMBED_PROFILER: bool = os.environ.get("DECNET_EMBED_PROFILER", "").lower() == "true"
# Set to "true" to embed the MACVLAN sniffer inside the API process.
# Leave unset (default) when the standalone `decnet sniffer --daemon` is
# running (which `decnet deploy` always does). Embedding both produces two
# workers sniffing the same interface — duplicated events and wasted CPU.
DECNET_EMBED_SNIFFER: bool = os.environ.get("DECNET_EMBED_SNIFFER", "").lower() == "true"
# Set to "true" to mount the Pyinstrument ASGI middleware on the FastAPI app.
# Produces per-request HTML flamegraphs under ./profiles/. Off by default so
# production and normal dev runs pay zero profiling overhead.
DECNET_PROFILE_REQUESTS: bool = os.environ.get("DECNET_PROFILE_REQUESTS", "").lower() == "true"
DECNET_PROFILE_DIR: str = os.environ.get("DECNET_PROFILE_DIR", "profiles")
# API Options
DECNET_API_HOST: str = os.environ.get("DECNET_API_HOST", "127.0.0.1")
DECNET_API_PORT: int = _port("DECNET_API_PORT", 8000)
# DECNET_JWT_SECRET is resolved lazily via module __getattr__ so that agent /
# updater / swarmctl subcommands (which never touch auth) can start without
# the master's JWT secret being present in the environment.
DECNET_INGEST_LOG_FILE: str | None = os.environ.get("DECNET_INGEST_LOG_FILE", "/var/log/decnet/decnet.log")
# SWARM log pipeline — RFC 5425 syslog-over-TLS between worker forwarders
# and the master listener. Plaintext syslog across hosts is forbidden.
DECNET_SWARM_SYSLOG_PORT: int = _port("DECNET_SWARM_SYSLOG_PORT", 6514)
DECNET_SWARM_MASTER_HOST: str | None = os.environ.get("DECNET_SWARM_MASTER_HOST")
# Worker-side identity + swarmctl locator, seeded by the enroll bundle's
# /etc/decnet/decnet.ini ([agent] host-uuid / master-host / swarmctl-port).
# The agent heartbeat loop uses these to self-identify to the master.
DECNET_HOST_UUID: str | None = os.environ.get("DECNET_HOST_UUID")
DECNET_MASTER_HOST: str | None = os.environ.get("DECNET_MASTER_HOST")
DECNET_SWARMCTL_PORT: int = _port("DECNET_SWARMCTL_PORT", 8770)
# Ingester batching: how many log rows to accumulate per commit, and the
# max wait (ms) before flushing a partial batch. Larger batches reduce
# SQLite write-lock contention; the timeout keeps latency bounded during
# low-traffic periods.
DECNET_BATCH_SIZE: int = int(os.environ.get("DECNET_BATCH_SIZE", "100"))
DECNET_BATCH_MAX_WAIT_MS: int = int(os.environ.get("DECNET_BATCH_MAX_WAIT_MS", "250"))
# Web Dashboard Options
DECNET_WEB_HOST: str = os.environ.get("DECNET_WEB_HOST", "127.0.0.1")
DECNET_WEB_PORT: int = _port("DECNET_WEB_PORT", 8080)
DECNET_ADMIN_USER: str = os.environ.get("DECNET_ADMIN_USER", "admin")
DECNET_ADMIN_PASSWORD: str = os.environ.get("DECNET_ADMIN_PASSWORD", "admin")
DECNET_DEVELOPER: bool = os.environ.get("DECNET_DEVELOPER", "False").lower() == "true"
# Host role — seeded by /etc/decnet/decnet.ini or exported directly.
# "master" = the central server (api, web, swarmctl, listener).
# "agent" = a worker node (agent, forwarder, updater). Workers gate their
# Typer CLI to hide master-only commands (see decnet/cli.py).
DECNET_MODE: str = os.environ.get("DECNET_MODE", "master").lower()
# When mode=agent, hide master-only Typer commands. Set to "false" for dual-
# role dev hosts where a single machine plays both sides.
DECNET_DISALLOW_MASTER: bool = (
os.environ.get("DECNET_DISALLOW_MASTER", "true").lower() == "true"
)
# Tracing — set to "true" to enable OpenTelemetry distributed tracing.
# Separate from DECNET_DEVELOPER so tracing can be toggled independently.
DECNET_DEVELOPER_TRACING: bool = os.environ.get("DECNET_DEVELOPER_TRACING", "").lower() == "true"
DECNET_OTEL_ENDPOINT: str = os.environ.get("DECNET_OTEL_ENDPOINT", "http://localhost:4317")
# Database Options
DECNET_DB_TYPE: str = os.environ.get("DECNET_DB_TYPE", "sqlite").lower()
DECNET_DB_URL: Optional[str] = os.environ.get("DECNET_DB_URL")
# MySQL component vars (used only when DECNET_DB_URL is not set)
DECNET_DB_HOST: str = os.environ.get("DECNET_DB_HOST", "localhost")
DECNET_DB_PORT: int = _port("DECNET_DB_PORT", 3306) if os.environ.get("DECNET_DB_PORT") else 3306
DECNET_DB_NAME: str = os.environ.get("DECNET_DB_NAME", "decnet")
DECNET_DB_USER: str = os.environ.get("DECNET_DB_USER", "decnet")
DECNET_DB_PASSWORD: Optional[str] = os.environ.get("DECNET_DB_PASSWORD")
# CORS — comma-separated list of allowed origins for the web dashboard API.
# Defaults to the configured web host/port. Override with DECNET_CORS_ORIGINS if needed.
# Example: DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com
_WILDCARD_ADDRS = {"0.0.0.0", "127.0.0.1", "::"} # nosec B104 — comparison only, not a bind
_web_hostname: str = "localhost" if DECNET_WEB_HOST in _WILDCARD_ADDRS else DECNET_WEB_HOST
_cors_default: str = f"http://{_web_hostname}:{DECNET_WEB_PORT}"
_cors_raw: str = os.environ.get("DECNET_CORS_ORIGINS", _cors_default)
DECNET_CORS_ORIGINS: list[str] = [o.strip() for o in _cors_raw.split(",") if o.strip()]
def __getattr__(name: str) -> str:
"""Lazy resolution for secrets only the master web/api process needs."""
if name == "DECNET_JWT_SECRET":
return _require_env("DECNET_JWT_SECRET")
raise AttributeError(f"module 'decnet.env' has no attribute {name!r}")

177
decnet/fleet.py Normal file
View File

@@ -0,0 +1,177 @@
"""
Fleet builder — shared logic for constructing DeckyConfig lists.
Used by both the CLI and the web API router to build deckies from
flags or INI config. Lives here (not in cli.py) so that the web layer
and the mutation engine can import it without depending on the CLI.
"""
import random
from typing import Optional
from decnet.archetypes import Archetype, get_archetype
from decnet.config import DeckyConfig, random_hostname
from decnet.distros import all_distros, get_distro, random_distro
from decnet.models import IniConfig
from decnet.services.registry import all_services
def all_service_names() -> list[str]:
"""Return all registered per-decky service names (excludes fleet singletons)."""
return sorted(
name for name, svc in all_services().items()
if not svc.fleet_singleton
)
def resolve_distros(
distros_explicit: list[str] | None,
randomize_distros: bool,
n: int,
archetype: Archetype | None = None,
) -> list[str]:
"""Return a list of n distro slugs based on flags or archetype preference."""
if distros_explicit:
return [distros_explicit[i % len(distros_explicit)] for i in range(n)]
if randomize_distros:
return [random_distro().slug for _ in range(n)]
if archetype:
pool = archetype.preferred_distros
return [pool[i % len(pool)] for i in range(n)]
slugs = list(all_distros().keys())
return [slugs[i % len(slugs)] for i in range(n)]
def build_deckies(
n: int,
ips: list[str],
services_explicit: list[str] | None,
randomize_services: bool,
distros_explicit: list[str] | None = None,
randomize_distros: bool = False,
archetype: Archetype | None = None,
mutate_interval: Optional[int] = None,
) -> list[DeckyConfig]:
"""Build a list of DeckyConfigs from CLI-style flags."""
deckies = []
used_combos: set[frozenset] = set()
distro_slugs = resolve_distros(distros_explicit, randomize_distros, n, archetype)
for i, ip in enumerate(ips):
name = f"decky-{i + 1:02d}"
distro = get_distro(distro_slugs[i])
hostname = random_hostname(distro.slug)
if services_explicit:
svc_list = services_explicit
elif archetype:
svc_list = list(archetype.services)
elif randomize_services:
svc_pool = all_service_names()
attempts = 0
while True:
count = random.randint(1, min(3, len(svc_pool))) # nosec B311
chosen = frozenset(random.sample(svc_pool, count)) # nosec B311
attempts += 1
if chosen not in used_combos or attempts > 20:
break
svc_list = list(chosen)
used_combos.add(chosen)
else:
raise ValueError("Provide services_explicit, archetype, or randomize_services=True.")
deckies.append(
DeckyConfig(
name=name,
ip=ip,
services=svc_list,
distro=distro.slug,
base_image=distro.image,
build_base=distro.build_base,
hostname=hostname,
archetype=archetype.slug if archetype else None,
nmap_os=archetype.nmap_os if archetype else "linux",
mutate_interval=mutate_interval,
)
)
return deckies
def build_deckies_from_ini(
ini: IniConfig,
subnet_cidr: str,
gateway: str,
host_ip: str,
randomize: bool,
cli_mutate_interval: int | None = None,
) -> list[DeckyConfig]:
"""Build DeckyConfig list from an IniConfig, auto-allocating missing IPs."""
from ipaddress import IPv4Address, IPv4Network
import time
now = time.time()
explicit_ips: set[IPv4Address] = {
IPv4Address(s.ip) for s in ini.deckies if s.ip
}
net = IPv4Network(subnet_cidr, strict=False)
reserved = {
net.network_address,
net.broadcast_address,
IPv4Address(gateway),
IPv4Address(host_ip),
} | explicit_ips
auto_pool = (str(addr) for addr in net.hosts() if addr not in reserved)
deckies: list[DeckyConfig] = []
for spec in ini.deckies:
arch: Archetype | None = None
if spec.archetype:
arch = get_archetype(spec.archetype)
distro_pool = arch.preferred_distros if arch else list(all_distros().keys())
distro = get_distro(distro_pool[len(deckies) % len(distro_pool)])
hostname = random_hostname(distro.slug)
ip = spec.ip or next(auto_pool, None)
if ip is None:
raise ValueError(f"Not enough free IPs in {subnet_cidr} while assigning IP for '{spec.name}'.")
if spec.services:
known = set(all_service_names())
unknown = [s for s in spec.services if s not in known]
if unknown:
raise ValueError(
f"Unknown service(s) in [{spec.name}]: {unknown}. "
f"Available: {all_service_names()}"
)
svc_list = spec.services
elif arch:
svc_list = list(arch.services)
elif randomize or (not spec.services and not arch):
svc_pool = all_service_names()
count = random.randint(1, min(3, len(svc_pool))) # nosec B311
svc_list = random.sample(svc_pool, count) # nosec B311
resolved_nmap_os = spec.nmap_os or (arch.nmap_os if arch else "linux")
decky_mutate_interval = cli_mutate_interval
if decky_mutate_interval is None:
decky_mutate_interval = spec.mutate_interval if spec.mutate_interval is not None else ini.mutate_interval
deckies.append(DeckyConfig(
name=spec.name,
ip=ip,
services=svc_list,
distro=distro.slug,
base_image=distro.image,
build_base=distro.build_base,
hostname=hostname,
archetype=arch.slug if arch else None,
service_config=spec.service_config,
nmap_os=resolved_nmap_os,
mutate_interval=decky_mutate_interval,
last_mutated=now,
))
return deckies

View File

@@ -6,7 +6,6 @@ Format:
net=192.168.1.0/24
gw=192.168.1.1
interface=wlp6s0
log_target=192.168.1.5:5140 # optional
[hostname-1]
ip=192.168.1.82 # optional
@@ -42,37 +41,8 @@ Format:
"""
import configparser
from dataclasses import dataclass, field
from pathlib import Path
@dataclass
class DeckySpec:
name: str
ip: str | None = None
services: list[str] | None = None
archetype: str | None = None
service_config: dict[str, dict] = field(default_factory=dict)
nmap_os: str | None = None # explicit OS family override (linux/windows/bsd/embedded/cisco)
@dataclass
class CustomServiceSpec:
"""Spec for a user-defined (bring-your-own) service."""
name: str # service slug, e.g. "myservice" (section is "custom-myservice")
image: str # Docker image to use
exec_cmd: str # command to run inside the container
ports: list[int] = field(default_factory=list)
@dataclass
class IniConfig:
subnet: str | None = None
gateway: str | None = None
interface: str | None = None
log_target: str | None = None
deckies: list[DeckySpec] = field(default_factory=list)
custom_services: list[CustomServiceSpec] = field(default_factory=list)
from decnet.models import IniConfig, DeckySpec, CustomServiceSpec, validate_ini_string # noqa: F401
def load_ini(path: str | Path) -> IniConfig:
@@ -81,7 +51,21 @@ def load_ini(path: str | Path) -> IniConfig:
read = cp.read(str(path))
if not read:
raise FileNotFoundError(f"Config file not found: {path}")
return _parse_configparser(cp)
def load_ini_from_string(content: str) -> IniConfig:
"""Parse a DECNET INI string and return an IniConfig."""
# Normalize line endings (CRLF → LF, bare CR → LF) so the validator
# and configparser both see the same line boundaries.
content = content.replace('\r\n', '\n').replace('\r', '\n')
validate_ini_string(content)
cp = configparser.ConfigParser(strict=False)
cp.read_string(content)
return _parse_configparser(cp)
def _parse_configparser(cp: configparser.ConfigParser) -> IniConfig:
cfg = IniConfig()
if cp.has_section("general"):
@@ -89,14 +73,24 @@ def load_ini(path: str | Path) -> IniConfig:
cfg.subnet = g.get("net")
cfg.gateway = g.get("gw")
cfg.interface = g.get("interface")
cfg.log_target = g.get("log_target") or g.get("log-target")
from decnet.services.registry import all_services
known_services = set(all_services().keys())
# First pass: collect decky sections and custom service definitions
for section in cp.sections():
if section == "general":
continue
# A service sub-section is identified if the section name has at least one dot
# AND the last segment is a known service name.
# e.g. "decky-01.ssh" -> sub-section
# e.g. "decky.webmail" -> decky section (if "webmail" is not a service)
if "." in section:
continue # subsections handled in second pass
_, _, last_segment = section.rpartition(".")
if last_segment in known_services:
continue # sub-section handled in second pass
if section.startswith("custom-"):
# Bring-your-own service definition
s = cp[section]
@@ -115,17 +109,30 @@ def load_ini(path: str | Path) -> IniConfig:
services = [sv.strip() for sv in svc_raw.split(",")] if svc_raw else None
archetype = s.get("archetype")
nmap_os = s.get("nmap_os") or s.get("nmap-os") or None
mi_raw = s.get("mutate_interval") or s.get("mutate-interval")
mutate_interval = None
if mi_raw:
try:
mutate_interval = int(mi_raw)
except ValueError:
raise ValueError(f"[{section}] mutate_interval= must be an integer, got '{mi_raw}'")
amount_raw = s.get("amount", "1")
try:
amount = int(amount_raw)
if amount < 1:
raise ValueError
except ValueError:
if amount > 100:
raise ValueError(f"[{section}] amount={amount} exceeds maximum allowed (100).")
except ValueError as e:
if "exceeds maximum" in str(e):
raise e
raise ValueError(f"[{section}] amount= must be a positive integer, got '{amount_raw}'")
if amount == 1:
cfg.deckies.append(DeckySpec(
name=section, ip=ip, services=services, archetype=archetype, nmap_os=nmap_os,
name=section, ip=ip, services=services, archetype=archetype, nmap_os=nmap_os, mutate_interval=mutate_interval,
))
else:
# Expand into N deckies; explicit ip is ignored (can't share one IP)
@@ -141,6 +148,7 @@ def load_ini(path: str | Path) -> IniConfig:
services=services,
archetype=archetype,
nmap_os=nmap_os,
mutate_interval=mutate_interval,
))
# Second pass: collect per-service subsections [decky-name.service]
@@ -149,7 +157,11 @@ def load_ini(path: str | Path) -> IniConfig:
for section in cp.sections():
if "." not in section:
continue
decky_name, _, svc_name = section.partition(".")
decky_name, dot, svc_name = section.rpartition(".")
if svc_name not in known_services:
continue # not a service sub-section
svc_cfg = {k: v for k, v in cp[section].items()}
if decky_name in decky_map:
# Direct match — single decky

View File

@@ -0,0 +1,92 @@
"""
DECNET application logging helpers.
Usage:
from decnet.logging import get_logger
log = get_logger("engine") # APP-NAME in RFC 5424 output becomes "engine"
The returned logger propagates to the root logger (configured in config.py with
Rfc5424Formatter), so level control via DECNET_DEVELOPER still applies globally.
When ``DECNET_DEVELOPER_TRACING`` is active, every LogRecord is enriched with
``otel_trace_id`` and ``otel_span_id`` from the current OTEL span context.
This lets you correlate log lines with Jaeger traces — click a log entry and
jump straight to the span that produced it.
"""
from __future__ import annotations
import logging
class _ComponentFilter(logging.Filter):
"""Injects *decnet_component* onto every LogRecord so Rfc5424Formatter can
use it as the RFC 5424 APP-NAME field instead of the hardcoded "decnet"."""
def __init__(self, component: str) -> None:
super().__init__()
self.component = component
def filter(self, record: logging.LogRecord) -> bool:
record.decnet_component = self.component # type: ignore[attr-defined]
return True
class _TraceContextFilter(logging.Filter):
"""Injects ``otel_trace_id`` and ``otel_span_id`` onto every LogRecord
from the active OTEL span context.
Installed once by ``enable_trace_context()`` on the root ``decnet`` logger
so all child loggers inherit the enrichment via propagation.
When no span is active, both fields are set to ``"0"`` (cheap string
comparison downstream, no None-checks needed).
"""
def filter(self, record: logging.LogRecord) -> bool:
try:
from opentelemetry import trace
span = trace.get_current_span()
ctx = span.get_span_context()
if ctx and ctx.trace_id:
record.otel_trace_id = format(ctx.trace_id, "032x") # type: ignore[attr-defined]
record.otel_span_id = format(ctx.span_id, "016x") # type: ignore[attr-defined]
else:
record.otel_trace_id = "0" # type: ignore[attr-defined]
record.otel_span_id = "0" # type: ignore[attr-defined]
except Exception:
record.otel_trace_id = "0" # type: ignore[attr-defined]
record.otel_span_id = "0" # type: ignore[attr-defined]
return True
_trace_filter_installed: bool = False
def enable_trace_context() -> None:
"""Install the OTEL trace-context filter on the root ``decnet`` logger.
Called once from ``decnet.telemetry.setup_tracing()`` after the
TracerProvider is initialised. Safe to call multiple times (idempotent).
"""
global _trace_filter_installed
if _trace_filter_installed:
return
root = logging.getLogger("decnet")
root.addFilter(_TraceContextFilter())
_trace_filter_installed = True
def get_logger(component: str) -> logging.Logger:
"""Return a named logger that self-identifies as *component* in RFC 5424.
Valid components: cli, engine, api, mutator, collector.
The logger is named ``decnet.<component>`` and propagates normally, so the
root handler (Rfc5424Formatter + level gate from DECNET_DEVELOPER) handles
output. Calling this function multiple times for the same component is safe.
"""
logger = logging.getLogger(f"decnet.{component}")
if not any(isinstance(f, _ComponentFilter) for f in logger.filters):
logger.addFilter(_ComponentFilter(component))
return logger

View File

@@ -1,4 +1,3 @@
from __future__ import annotations
"""
Rotating file handler for DECNET syslog output.
@@ -7,34 +6,44 @@ Path is controlled by the DECNET_LOG_FILE environment variable
(default: /var/log/decnet/decnet.log).
"""
from __future__ import annotations
import logging
import logging.handlers
import os
from pathlib import Path
from decnet.logging.inode_aware_handler import InodeAwareRotatingFileHandler
from decnet.privdrop import chown_to_invoking_user, chown_tree_to_invoking_user
from decnet.telemetry import traced as _traced
_LOG_FILE_ENV = "DECNET_LOG_FILE"
_DEFAULT_LOG_FILE = "/var/log/decnet/decnet.log"
_MAX_BYTES = 10 * 1024 * 1024 # 10 MB
_BACKUP_COUNT = 5
_handler: logging.handlers.RotatingFileHandler | None = None
_handler: InodeAwareRotatingFileHandler | None = None
_logger: logging.Logger | None = None
def _get_logger() -> logging.Logger:
@_traced("logging.init_file_handler")
def _init_file_handler() -> logging.Logger:
"""One-time initialisation of the rotating file handler."""
global _handler, _logger
if _logger is not None:
return _logger
log_path = Path(os.environ.get(_LOG_FILE_ENV, _DEFAULT_LOG_FILE))
log_path.parent.mkdir(parents=True, exist_ok=True)
# When running under sudo, hand the parent dir back to the invoking user
# so a subsequent non-root `decnet api` can also write to it.
chown_tree_to_invoking_user(log_path.parent)
_handler = logging.handlers.RotatingFileHandler(
_handler = InodeAwareRotatingFileHandler(
log_path,
maxBytes=_MAX_BYTES,
backupCount=_BACKUP_COUNT,
encoding="utf-8",
)
chown_to_invoking_user(log_path)
_handler.setFormatter(logging.Formatter("%(message)s"))
_logger = logging.getLogger("decnet.syslog")
@@ -45,14 +54,19 @@ def _get_logger() -> logging.Logger:
return _logger
def _get_logger() -> logging.Logger:
if _logger is not None:
return _logger
return _init_file_handler()
def write_syslog(line: str) -> None:
"""Write a single RFC 5424 syslog line to the rotating log file."""
try:
_get_logger().info(line)
except Exception:
except Exception: # nosec B110
pass
def get_log_path() -> Path:
"""Return the configured log file path (for tests/inspection)."""
return Path(os.environ.get(_LOG_FILE_ENV, _DEFAULT_LOG_FILE))

View File

@@ -11,6 +11,8 @@ shared utilities for validating and parsing the log_target string.
import socket
from decnet.telemetry import traced as _traced
def parse_log_target(log_target: str) -> tuple[str, int]:
"""
@@ -23,6 +25,7 @@ def parse_log_target(log_target: str) -> tuple[str, int]:
return parts[0], int(parts[1])
@_traced("logging.probe_log_target")
def probe_log_target(log_target: str, timeout: float = 2.0) -> bool:
"""
Return True if the log target is reachable (TCP connect succeeds).

View File

@@ -0,0 +1,60 @@
"""
RotatingFileHandler that detects external deletion or rotation.
Stdlib ``RotatingFileHandler`` holds an open file descriptor for the
lifetime of the handler. If the target file is deleted (``rm``) or
rotated out (``logrotate`` without ``copytruncate``), the handler keeps
writing to the now-orphaned inode until its own size-based rotation
finally triggers — silently losing every line in between.
Stdlib ``WatchedFileHandler`` solves exactly this problem but doesn't
rotate by size. This subclass combines both: before each emit we stat
the configured path and compare its inode/device to the currently open
file; on mismatch we close and reopen.
Cheap: one ``os.stat`` per log record. Matches the pattern used by
``decnet/collector/worker.py:_reopen_if_needed``.
"""
from __future__ import annotations
import logging
import logging.handlers
import os
class InodeAwareRotatingFileHandler(logging.handlers.RotatingFileHandler):
"""RotatingFileHandler that reopens the target on external rotation/deletion."""
def _should_reopen(self) -> bool:
if self.stream is None:
return True
try:
disk_stat = os.stat(self.baseFilename)
except FileNotFoundError:
return True
except OSError:
return False
try:
open_stat = os.fstat(self.stream.fileno())
except OSError:
return True
return (disk_stat.st_ino != open_stat.st_ino
or disk_stat.st_dev != open_stat.st_dev)
def emit(self, record: logging.LogRecord) -> None:
if self._should_reopen():
try:
if self.stream is not None:
self.close()
except Exception: # nosec B110
pass
try:
self.stream = self._open()
except OSError:
# A logging handler MUST NOT crash its caller. If we can't
# reopen (e.g. file is root-owned after `sudo decnet deploy`
# and the current process is non-root), defer to the stdlib
# error path, which just prints a traceback to stderr.
self.handleError(record)
return
super().emit(record)

View File

@@ -1,4 +1,3 @@
from __future__ import annotations
"""
RFC 5424 syslog formatter for DECNET.
@@ -6,16 +5,18 @@ Produces fully-compliant syslog messages:
<PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID [SD-ELEMENT] MSG
Facility: local0 (16)
PEN for structured data: decnet@55555
PEN for structured data: relay@55555
"""
from __future__ import annotations
from datetime import datetime, timezone
from typing import Any
FACILITY_LOCAL0 = 16
NILVALUE = "-"
_SD_ID = "decnet@55555"
_SD_ID = "relay@55555"
SEVERITY_INFO = 6
SEVERITY_WARNING = 4

123
decnet/models.py Normal file
View File

@@ -0,0 +1,123 @@
"""
DECNET Domain Models.
Centralized repository for all Pydantic specifications used throughout the project.
This file ensures that core domain logic has no dependencies on the web or database layers.
"""
from typing import Optional, List, Dict, Literal, Annotated, Any
from pydantic import BaseModel, ConfigDict, Field as PydanticField, field_validator, BeforeValidator
import configparser
# --- INI Specification Models ---
def validate_ini_string(v: Any) -> str:
"""Structural validator for DECNET INI strings using configparser."""
if not isinstance(v, str):
# This remains an internal type mismatch (caught by Pydantic usually)
raise ValueError("INI content must be a string")
# 512KB limit to prevent DoS/OOM
if len(v) > 512 * 1024:
raise ValueError("INI content is too large (max 512KB)")
if not v.strip():
# Using exact phrasing expected by tests
raise ValueError("INI content is empty")
parser = configparser.ConfigParser(interpolation=None, allow_no_value=True, strict=False)
try:
parser.read_string(v)
if not parser.sections():
raise ValueError("The provided INI content must contain at least one section (no sections found)")
except configparser.Error as e:
# If it's a generic parsing error, we check if it's effectively a "missing sections" error
if "no section headers" in str(e).lower():
raise ValueError("Invalid INI format: no sections found")
raise ValueError(f"Invalid INI format: {str(e)}")
return v
# Reusable type that enforces INI structure during initialization.
# Removed min_length=1 to make empty strings schema-compliant yet semantically invalid (mapped to 409).
IniContent = Annotated[str, BeforeValidator(validate_ini_string)]
class DeckySpec(BaseModel):
"""Configuration spec for a single decky as defined in the INI file."""
model_config = ConfigDict(strict=True, extra="forbid")
name: str = PydanticField(..., max_length=128, pattern=r"^[A-Za-z0-9\-_.]+$")
ip: Optional[str] = None
services: Optional[List[str]] = None
archetype: Optional[str] = None
service_config: Dict[str, Dict] = PydanticField(default_factory=dict)
nmap_os: Optional[str] = None
mutate_interval: Optional[int] = PydanticField(None, ge=1)
class CustomServiceSpec(BaseModel):
"""Spec for a user-defined (bring-your-own) service."""
model_config = ConfigDict(strict=True, extra="forbid")
name: str
image: str
exec_cmd: str
ports: List[int] = PydanticField(default_factory=list)
class IniConfig(BaseModel):
"""The complete structured representation of a DECNET INI file."""
model_config = ConfigDict(strict=True, extra="forbid")
subnet: Optional[str] = None
gateway: Optional[str] = None
interface: Optional[str] = None
mutate_interval: Optional[int] = PydanticField(None, ge=1)
deckies: List[DeckySpec] = PydanticField(default_factory=list, min_length=1)
custom_services: List[CustomServiceSpec] = PydanticField(default_factory=list)
@field_validator("deckies")
@classmethod
def at_least_one_decky(cls, v: List[DeckySpec]) -> List[DeckySpec]:
"""Ensure that an INI deployment always contains at least one machine."""
if not v:
raise ValueError("INI must contain at least one decky section")
return v
# --- Runtime Configuration Models ---
class DeckyConfig(BaseModel):
"""Full operational configuration for a deployed decky container."""
model_config = ConfigDict(strict=True, extra="forbid")
name: str
ip: str
services: list[str] = PydanticField(..., min_length=1)
distro: str # slug from distros.DISTROS, e.g. "debian", "ubuntu22"
base_image: str # Docker image for the base/IP-holder container
build_base: str = "debian:bookworm-slim" # apt-compatible image for service Dockerfiles
hostname: str
archetype: str | None = None # archetype slug if spawned from an archetype profile
service_config: dict[str, dict] = PydanticField(default_factory=dict)
nmap_os: str = "linux" # OS family for TCP/IP stack spoofing (see os_fingerprint.py)
mutate_interval: int | None = None # automatic rotation interval in minutes
last_mutated: float = 0.0 # timestamp of last mutation
last_login_attempt: float = 0.0 # timestamp of most recent interaction
# SWARM: the SwarmHost.uuid that runs this decky. None in unihost mode
# so existing state files deserialize unchanged.
host_uuid: str | None = None
@field_validator("services")
@classmethod
def services_not_empty(cls, v: list[str]) -> list[str]:
if not v:
raise ValueError("A decky must have at least one service.")
return v
class DecnetConfig(BaseModel):
"""Root configuration for the entire DECNET fleet deployment."""
mode: Literal["unihost", "swarm"]
interface: str
subnet: str
gateway: str
deckies: list[DeckyConfig] = PydanticField(..., min_length=1)
log_file: str | None = None # host path where the collector writes the log file
ipvlan: bool = False # use IPvlan L2 instead of MACVLAN (WiFi-friendly)
mutate_interval: int | None = 30 # global automatic rotation interval in minutes

View File

@@ -0,0 +1,3 @@
from decnet.mutator.engine import mutate_all, mutate_decky, run_watch_loop
__all__ = ["mutate_all", "mutate_decky", "run_watch_loop"]

147
decnet/mutator/engine.py Normal file
View File

@@ -0,0 +1,147 @@
"""
Mutation Engine for DECNET.
Handles dynamic rotation of exposed honeypot services over time.
"""
import random
import time
from typing import Optional
from rich.console import Console
from decnet.archetypes import get_archetype
from decnet.fleet import all_service_names
from decnet.composer import write_compose
from decnet.config import DeckyConfig, DecnetConfig
from decnet.engine import _compose_with_retry
from decnet.logging import get_logger
from decnet.telemetry import traced as _traced
from pathlib import Path
import anyio
import asyncio
from decnet.web.db.repository import BaseRepository
log = get_logger("mutator")
console = Console()
@_traced("mutator.mutate_decky")
async def mutate_decky(decky_name: str, repo: BaseRepository) -> bool:
"""
Perform an Intra-Archetype Shuffle for a specific decky.
Returns True if mutation succeeded, False otherwise.
"""
log.debug("mutate_decky: start decky=%s", decky_name)
state_dict = await repo.get_state("deployment")
if state_dict is None:
log.error("mutate_decky: no active deployment found in database")
console.print("[red]No active deployment found in database.[/]")
return False
config = DecnetConfig(**state_dict["config"])
compose_path = Path(state_dict["compose_path"])
decky: Optional[DeckyConfig] = next((d for d in config.deckies if d.name == decky_name), None)
if not decky:
console.print(f"[red]Decky '{decky_name}' not found in state.[/]")
return False
if decky.archetype:
try:
arch = get_archetype(decky.archetype)
svc_pool = list(arch.services)
except ValueError:
svc_pool = all_service_names()
else:
svc_pool = all_service_names()
if not svc_pool:
console.print(f"[yellow]No services available for mutating '{decky_name}'.[/]")
return False
current_services = set(decky.services)
attempts = 0
while True:
count = random.randint(1, min(3, len(svc_pool))) # nosec B311
chosen = set(random.sample(svc_pool, count)) # nosec B311
attempts += 1
if chosen != current_services or attempts > 20:
break
decky.services = list(chosen)
decky.last_mutated = time.time()
# Save to DB
await repo.set_state("deployment", {"config": config.model_dump(), "compose_path": str(compose_path)})
# Still writes files for Docker to use
write_compose(config, compose_path)
log.info("mutation applied decky=%s services=%s", decky_name, ",".join(decky.services))
console.print(f"[cyan]Mutating '{decky_name}' to services: {', '.join(decky.services)}[/]")
try:
# Wrap blocking call in thread
await anyio.to_thread.run_sync(_compose_with_retry, "up", "-d", "--remove-orphans", compose_path)
except Exception as e:
log.error("mutation failed decky=%s error=%s", decky_name, e)
console.print(f"[red]Failed to mutate '{decky_name}': {e}[/]")
return False
return True
@_traced("mutator.mutate_all")
async def mutate_all(repo: BaseRepository, force: bool = False) -> None:
"""
Check all deckies and mutate those that are due.
If force=True, mutates all deckies regardless of schedule.
"""
log.debug("mutate_all: start force=%s", force)
state_dict = await repo.get_state("deployment")
if state_dict is None:
log.error("mutate_all: no active deployment found")
console.print("[red]No active deployment found.[/]")
return
config = DecnetConfig(**state_dict["config"])
now = time.time()
mutated_count = 0
for decky in config.deckies:
interval_mins = decky.mutate_interval or config.mutate_interval
if interval_mins is None and not force:
continue
if force:
due = True
else:
elapsed_secs = now - decky.last_mutated
due = elapsed_secs >= (interval_mins * 60)
if due:
success = await mutate_decky(decky.name, repo=repo)
if success:
mutated_count += 1
if mutated_count == 0 and not force:
log.debug("mutate_all: no deckies due for mutation")
console.print("[dim]No deckies are due for mutation.[/]")
else:
log.info("mutate_all: complete mutated_count=%d", mutated_count)
@_traced("mutator.watch_loop")
async def run_watch_loop(repo: BaseRepository, poll_interval_secs: int = 10) -> None:
"""Run an infinite loop checking for deckies that need mutation."""
log.info("mutator watch loop started poll_interval_secs=%d", poll_interval_secs)
console.print(f"[green]DECNET Mutator Watcher started (polling every {poll_interval_secs}s).[/]")
try:
while True:
await mutate_all(force=False, repo=repo)
await asyncio.sleep(poll_interval_secs)
except KeyboardInterrupt:
log.info("mutator watch loop stopped")
console.print("\n[dim]Mutator watcher stopped.[/]")

View File

@@ -8,11 +8,8 @@ Handles:
- IP allocation (sequential, skipping reserved addresses)
"""
import ipaddress
import os
import shutil
import socket
import subprocess
import subprocess # nosec B404
from ipaddress import IPv4Address, IPv4Interface, IPv4Network
import docker
@@ -27,7 +24,7 @@ HOST_IPVLAN_IFACE = "decnet_ipvlan0"
# ---------------------------------------------------------------------------
def _run(cmd: list[str], check: bool = True) -> subprocess.CompletedProcess:
return subprocess.run(cmd, capture_output=True, text=True, check=check)
return subprocess.run(cmd, capture_output=True, text=True, check=check) # nosec B603 B404
def detect_interface() -> str:
@@ -129,22 +126,57 @@ def allocate_ips(
# Docker MACVLAN network
# ---------------------------------------------------------------------------
def create_macvlan_network(
def _ensure_network(
client: docker.DockerClient,
*,
driver: str,
interface: str,
subnet: str,
gateway: str,
ip_range: str,
extra_options: dict | None = None,
) -> None:
"""Create the MACVLAN Docker network. No-op if it already exists."""
existing = [n.name for n in client.networks.list()]
if MACVLAN_NETWORK_NAME in existing:
return
"""Create the decnet docker network with ``driver``, replacing any
existing network of the same name that was built with a different driver.
Why the replace-on-driver-mismatch: macvlan and ipvlan slaves can't
coexist on the same parent interface. If an earlier run left behind a
macvlan-driver network and we're now asked for ipvlan (or vice versa),
short-circuiting on name alone leaves Docker attaching new containers
to the old driver and the host NIC ends up EBUSY on the next port
create. So: when driver disagrees, disconnect everything and DROP it.
"""
options = {"parent": interface}
if extra_options:
options.update(extra_options)
for net in client.networks.list(names=[MACVLAN_NETWORK_NAME]):
if net.attrs.get("Driver") == driver:
# Same driver — but if the IPAM pool drifted (different subnet,
# gateway, or ip-range than this deploy asks for), reusing it
# hands out addresses from the old pool and we race the real LAN.
# Compare and rebuild on mismatch.
pools = (net.attrs.get("IPAM") or {}).get("Config") or []
cur = pools[0] if pools else {}
if (
cur.get("Subnet") == subnet
and cur.get("Gateway") == gateway
and cur.get("IPRange") == ip_range
):
return # right driver AND matching pool, leave it alone
# Driver mismatch OR IPAM drift — tear it down. Disconnect any live
# containers first so `remove()` doesn't refuse with ErrNetworkInUse.
for cid in (net.attrs.get("Containers") or {}):
try:
net.disconnect(cid, force=True)
except docker.errors.APIError:
pass
net.remove()
client.networks.create(
name=MACVLAN_NETWORK_NAME,
driver="macvlan",
options={"parent": interface},
driver=driver,
options=options,
ipam=docker.types.IPAMConfig(
driver="default",
pool_configs=[
@@ -158,6 +190,21 @@ def create_macvlan_network(
)
def create_macvlan_network(
client: docker.DockerClient,
interface: str,
subnet: str,
gateway: str,
ip_range: str,
) -> None:
"""Create the MACVLAN Docker network, replacing an ipvlan-driver one of
the same name if necessary (parent-NIC can't host both drivers)."""
_ensure_network(
client, driver="macvlan", interface=interface,
subnet=subnet, gateway=gateway, ip_range=ip_range,
)
def create_ipvlan_network(
client: docker.DockerClient,
interface: str,
@@ -165,25 +212,12 @@ def create_ipvlan_network(
gateway: str,
ip_range: str,
) -> None:
"""Create an IPvlan L2 Docker network. No-op if it already exists."""
existing = [n.name for n in client.networks.list()]
if MACVLAN_NETWORK_NAME in existing:
return
client.networks.create(
name=MACVLAN_NETWORK_NAME,
driver="ipvlan",
options={"parent": interface, "ipvlan_mode": "l2"},
ipam=docker.types.IPAMConfig(
driver="default",
pool_configs=[
docker.types.IPAMPool(
subnet=subnet,
gateway=gateway,
iprange=ip_range,
)
],
),
"""Create an IPvlan L2 Docker network, replacing a macvlan-driver one of
the same name if necessary (parent-NIC can't host both drivers)."""
_ensure_network(
client, driver="ipvlan", interface=interface,
subnet=subnet, gateway=gateway, ip_range=ip_range,
extra_options={"ipvlan_mode": "l2"},
)
@@ -207,10 +241,14 @@ def _require_root() -> None:
def setup_host_macvlan(interface: str, host_macvlan_ip: str, decky_ip_range: str) -> None:
"""
Create a macvlan interface on the host so the deployer can reach deckies.
Idempotent — skips steps that are already done.
Idempotent — skips steps that are already done. Drops a stale ipvlan
host-helper first: the two drivers can share a parent NIC on paper but
leaving the opposite helper in place is just cruft after a driver swap.
"""
_require_root()
_run(["ip", "link", "del", HOST_IPVLAN_IFACE], check=False)
# Check if interface already exists
result = _run(["ip", "link", "show", HOST_MACVLAN_IFACE], check=False)
if result.returncode != 0:
@@ -230,10 +268,14 @@ def teardown_host_macvlan(decky_ip_range: str) -> None:
def setup_host_ipvlan(interface: str, host_ipvlan_ip: str, decky_ip_range: str) -> None:
"""
Create an IPvlan interface on the host so the deployer can reach deckies.
Idempotent — skips steps that are already done.
Idempotent — skips steps that are already done. Drops a stale macvlan
host-helper first so a prior macvlan deploy doesn't leave its slave
dangling on the parent NIC after the driver swap.
"""
_require_root()
_run(["ip", "link", "del", HOST_MACVLAN_IFACE], check=False)
result = _run(["ip", "link", "show", HOST_IPVLAN_IFACE], check=False)
if result.returncode != 0:
_run(["ip", "link", "add", HOST_IPVLAN_IFACE, "link", interface, "type", "ipvlan", "mode", "l2"])

View File

@@ -5,17 +5,31 @@ Maps an nmap OS family slug to a dict of Linux kernel sysctls that, when applied
to a container's network namespace, make its TCP/IP stack behaviour resemble the
claimed OS as closely as possible within the Linux kernel's constraints.
All sysctls listed here are network-namespace-scoped and safe to set per-container
without --privileged (beyond the NET_ADMIN capability already granted).
Primary discriminator leveraged by nmap: net.ipv4.ip_default_ttl (TTL)
Linux → 64
Windows → 128
BSD (FreeBSD/macOS)→ 64 (different TCP options, but same TTL as Linux)
Embedded / network → 255
Secondary tuning (TCP behaviour):
Secondary discriminators (nmap OPS / WIN / ECN / T2T6 probe groups):
net.ipv4.tcp_syn_retries SYN retransmits before giving up
net.ipv4.tcp_timestamps TCP timestamp option (OPS probes); Windows = off
net.ipv4.tcp_window_scaling Window scale option; embedded/Cisco typically off
net.ipv4.tcp_sack Selective ACK option; absent on most embedded stacks
net.ipv4.tcp_ecn ECN negotiation; Linux offers (2), Windows off (0)
net.ipv4.ip_no_pmtu_disc DF bit in ICMP replies (IE probes); embedded on
net.ipv4.tcp_fin_timeout FIN_WAIT_2 seconds (T2T6 timing); Windows shorter
ICMP tuning (nmap IE / U1 probe groups):
net.ipv4.icmp_ratelimit Min ms between ICMP error replies; Windows = 0 (none)
net.ipv4.icmp_ratemask Bitmask of ICMP types subject to rate limiting
Note: net.core.rmem_default is a global (non-namespaced) sysctl and cannot be
set per-container without --privileged; it is intentionally excluded.
set per-container without --privileged; TCP window size is already correct for
Windows (64240) from the kernel's default tcp_rmem settings.
"""
from __future__ import annotations
@@ -24,27 +38,69 @@ OS_SYSCTLS: dict[str, dict[str, str]] = {
"linux": {
"net.ipv4.ip_default_ttl": "64",
"net.ipv4.tcp_syn_retries": "6",
"net.ipv4.tcp_timestamps": "1",
"net.ipv4.tcp_window_scaling": "1",
"net.ipv4.tcp_sack": "1",
"net.ipv4.tcp_ecn": "2",
"net.ipv4.ip_no_pmtu_disc": "0",
"net.ipv4.tcp_fin_timeout": "60",
"net.ipv4.icmp_ratelimit": "1000",
"net.ipv4.icmp_ratemask": "6168",
},
"windows": {
"net.ipv4.ip_default_ttl": "128",
"net.ipv4.tcp_syn_retries": "2",
"net.ipv4.tcp_timestamps": "0",
"net.ipv4.tcp_window_scaling": "1",
"net.ipv4.tcp_sack": "1",
"net.ipv4.tcp_ecn": "0",
"net.ipv4.ip_no_pmtu_disc": "0",
"net.ipv4.tcp_fin_timeout": "30",
"net.ipv4.icmp_ratelimit": "0",
"net.ipv4.icmp_ratemask": "0",
},
"bsd": {
"net.ipv4.ip_default_ttl": "64",
"net.ipv4.tcp_syn_retries": "6",
"net.ipv4.tcp_timestamps": "1",
"net.ipv4.tcp_window_scaling": "1",
"net.ipv4.tcp_sack": "1",
"net.ipv4.tcp_ecn": "0",
"net.ipv4.ip_no_pmtu_disc": "0",
"net.ipv4.tcp_fin_timeout": "60",
"net.ipv4.icmp_ratelimit": "250",
"net.ipv4.icmp_ratemask": "6168",
},
"embedded": {
"net.ipv4.ip_default_ttl": "255",
"net.ipv4.tcp_syn_retries": "3",
"net.ipv4.tcp_timestamps": "0",
"net.ipv4.tcp_window_scaling": "0",
"net.ipv4.tcp_sack": "0",
"net.ipv4.tcp_ecn": "0",
"net.ipv4.ip_no_pmtu_disc": "1",
"net.ipv4.tcp_fin_timeout": "15",
"net.ipv4.icmp_ratelimit": "0",
"net.ipv4.icmp_ratemask": "0",
},
"cisco": {
"net.ipv4.ip_default_ttl": "255",
"net.ipv4.tcp_syn_retries": "2",
"net.ipv4.tcp_timestamps": "0",
"net.ipv4.tcp_window_scaling": "0",
"net.ipv4.tcp_sack": "0",
"net.ipv4.tcp_ecn": "0",
"net.ipv4.ip_no_pmtu_disc": "1",
"net.ipv4.tcp_fin_timeout": "15",
"net.ipv4.icmp_ratelimit": "0",
"net.ipv4.icmp_ratemask": "0",
},
}
_DEFAULT_OS = "linux"
_REQUIRED_SYSCTLS: frozenset[str] = frozenset(OS_SYSCTLS["linux"].keys())
def get_os_sysctls(nmap_os: str) -> dict[str, str]:
"""Return the sysctl dict for *nmap_os*. Falls back to Linux on unknown slugs."""
@@ -54,3 +110,4 @@ def get_os_sysctls(nmap_os: str) -> dict[str, str]:
def all_os_families() -> list[str]:
"""Return all registered nmap OS family slugs."""
return list(OS_SYSCTLS.keys())

67
decnet/privdrop.py Normal file
View File

@@ -0,0 +1,67 @@
"""
Helpers for dropping root ownership on files created during privileged
operations (e.g. `sudo decnet deploy` needs root for MACVLAN, but its log
files should be owned by the invoking user so a subsequent non-root
`decnet api` can append to them).
When sudo invokes a process, it sets SUDO_UID / SUDO_GID in the
environment to the original user's IDs. We use those to chown files
back after creation.
"""
from __future__ import annotations
import os
from pathlib import Path
from typing import Optional
def _sudo_ids() -> Optional[tuple[int, int]]:
"""Return (uid, gid) of the sudo-invoking user, or None when the
process was not launched via sudo / the env vars are missing."""
raw_uid = os.environ.get("SUDO_UID")
raw_gid = os.environ.get("SUDO_GID")
if not raw_uid or not raw_gid:
return None
try:
return int(raw_uid), int(raw_gid)
except ValueError:
return None
def chown_to_invoking_user(path: str | os.PathLike[str]) -> None:
"""Best-effort chown of *path* to the sudo-invoking user.
No-op when:
* not running as root (nothing to drop),
* not launched via sudo (no SUDO_UID/SUDO_GID),
* the path does not exist,
* chown fails (logged-only — never raises).
"""
if os.geteuid() != 0:
return
ids = _sudo_ids()
if ids is None:
return
uid, gid = ids
p = Path(path)
if not p.exists():
return
try:
os.chown(p, uid, gid)
except OSError:
# Best-effort; a failed chown is not fatal to logging.
pass
def chown_tree_to_invoking_user(root: str | os.PathLike[str]) -> None:
"""Apply :func:`chown_to_invoking_user` to *root* and every file/dir
beneath it. Used for parent directories that we just created with
``mkdir(parents=True)`` as root."""
if os.geteuid() != 0 or _sudo_ids() is None:
return
root_path = Path(root)
if not root_path.exists():
return
chown_to_invoking_user(root_path)
for entry in root_path.rglob("*"):
chown_to_invoking_user(entry)

13
decnet/prober/__init__.py Normal file
View File

@@ -0,0 +1,13 @@
"""
DECNET-PROBER — standalone active network probing service.
Runs as a detached host-level process (no container). Sends crafted TLS
probes to discover C2 frameworks and other attacker infrastructure via
JARM fingerprinting. Results are written as RFC 5424 syslog + JSON to the
same log file the collector uses, so the existing ingestion pipeline picks
them up automatically.
"""
from decnet.prober.worker import prober_worker
__all__ = ["prober_worker"]

252
decnet/prober/hassh.py Normal file
View File

@@ -0,0 +1,252 @@
"""
HASSHServer — SSH server fingerprinting via KEX_INIT algorithm ordering.
Connects to an SSH server, completes the version exchange, captures the
server's SSH_MSG_KEXINIT message, and hashes the server-to-client algorithm
fields (kex, encryption, MAC, compression) into a 32-character MD5 digest.
This is the *server* variant of HASSH (HASSHServer). It fingerprints what
the server *offers*, which identifies the SSH implementation (OpenSSH,
Paramiko, libssh, Cobalt Strike SSH, etc.).
Stdlib only (socket, struct, hashlib) plus decnet.telemetry for tracing (zero-cost when disabled).
"""
from __future__ import annotations
import hashlib
import socket
import struct
from typing import Any
from decnet.telemetry import traced as _traced
# SSH protocol constants
_SSH_MSG_KEXINIT = 20
_KEX_INIT_COOKIE_LEN = 16
_KEX_INIT_NAME_LISTS = 10 # 10 name-list fields in KEX_INIT
# Blend in as a normal OpenSSH client
_CLIENT_BANNER = b"SSH-2.0-OpenSSH_9.6\r\n"
# Max bytes to read for server banner
_MAX_BANNER_LEN = 256
# Max bytes for a single SSH packet (KEX_INIT is typically < 2KB)
_MAX_PACKET_LEN = 35000
# ─── SSH connection + KEX_INIT capture ──────────────────────────────────────
@_traced("prober.hassh_ssh_connect")
def _ssh_connect(
host: str,
port: int,
timeout: float,
) -> tuple[str, bytes] | None:
"""
TCP connect, exchange version strings, read server's KEX_INIT.
Returns (server_banner, kex_init_payload) or None on failure.
The kex_init_payload starts at the SSH_MSG_KEXINIT type byte.
"""
sock = None
try:
sock = socket.create_connection((host, port), timeout=timeout)
sock.settimeout(timeout)
# 1. Read server banner (line ending \r\n or \n)
banner = _read_banner(sock)
if banner is None or not banner.startswith("SSH-"):
return None
# 2. Send our client version string
sock.sendall(_CLIENT_BANNER)
# 3. Read the server's first binary packet (should be KEX_INIT)
payload = _read_ssh_packet(sock)
if payload is None or len(payload) < 1:
return None
if payload[0] != _SSH_MSG_KEXINIT:
return None
return (banner, payload)
except (OSError, socket.timeout, TimeoutError, ConnectionError):
return None
finally:
if sock is not None:
try:
sock.close()
except OSError:
pass
def _read_banner(sock: socket.socket) -> str | None:
"""Read the SSH version banner line from the socket."""
buf = b""
while len(buf) < _MAX_BANNER_LEN:
try:
byte = sock.recv(1)
except (OSError, socket.timeout, TimeoutError):
return None
if not byte:
return None
buf += byte
if buf.endswith(b"\n"):
break
try:
return buf.decode("utf-8", errors="replace").rstrip("\r\n")
except Exception:
return None
def _read_ssh_packet(sock: socket.socket) -> bytes | None:
"""
Read a single SSH binary packet and return its payload.
SSH binary packet format:
uint32 packet_length (not including itself or MAC)
byte padding_length
byte[] payload (packet_length - padding_length - 1)
byte[] padding
"""
header = _recv_exact(sock, 4)
if header is None:
return None
packet_length = struct.unpack("!I", header)[0]
if packet_length < 2 or packet_length > _MAX_PACKET_LEN:
return None
rest = _recv_exact(sock, packet_length)
if rest is None:
return None
padding_length = rest[0]
payload_length = packet_length - padding_length - 1
if payload_length < 1 or payload_length > len(rest) - 1:
return None
return rest[1 : 1 + payload_length]
def _recv_exact(sock: socket.socket, n: int) -> bytes | None:
"""Read exactly n bytes from socket, or None on failure."""
buf = b""
while len(buf) < n:
try:
chunk = sock.recv(n - len(buf))
except (OSError, socket.timeout, TimeoutError):
return None
if not chunk:
return None
buf += chunk
return buf
# ─── KEX_INIT parsing ──────────────────────────────────────────────────────
def _parse_kex_init(payload: bytes) -> dict[str, str] | None:
"""
Parse SSH_MSG_KEXINIT payload and extract the 10 name-list fields.
Payload layout:
byte SSH_MSG_KEXINIT (20)
byte[16] cookie
10 × name-list:
uint32 length
byte[] utf-8 string (comma-separated algorithm names)
bool first_kex_packet_follows
uint32 reserved
Returns dict with keys: kex_algorithms, server_host_key_algorithms,
encryption_client_to_server, encryption_server_to_client,
mac_client_to_server, mac_server_to_client,
compression_client_to_server, compression_server_to_client,
languages_client_to_server, languages_server_to_client.
"""
if len(payload) < 1 + _KEX_INIT_COOKIE_LEN + 4:
return None
offset = 1 + _KEX_INIT_COOKIE_LEN # skip type byte + cookie
field_names = [
"kex_algorithms",
"server_host_key_algorithms",
"encryption_client_to_server",
"encryption_server_to_client",
"mac_client_to_server",
"mac_server_to_client",
"compression_client_to_server",
"compression_server_to_client",
"languages_client_to_server",
"languages_server_to_client",
]
fields: dict[str, str] = {}
for name in field_names:
if offset + 4 > len(payload):
return None
length = struct.unpack("!I", payload[offset : offset + 4])[0]
offset += 4
if offset + length > len(payload):
return None
fields[name] = payload[offset : offset + length].decode(
"utf-8", errors="replace"
)
offset += length
return fields
# ─── HASSH computation ──────────────────────────────────────────────────────
def _compute_hassh(kex: str, enc: str, mac: str, comp: str) -> str:
"""
Compute HASSHServer hash: MD5 of "kex;enc_s2c;mac_s2c;comp_s2c".
Returns 32-character lowercase hex digest.
"""
raw = f"{kex};{enc};{mac};{comp}"
return hashlib.md5(raw.encode("utf-8"), usedforsecurity=False).hexdigest()
# ─── Public API ─────────────────────────────────────────────────────────────
@_traced("prober.hassh_server")
def hassh_server(
host: str,
port: int,
timeout: float = 5.0,
) -> dict[str, Any] | None:
"""
Connect to an SSH server and compute its HASSHServer fingerprint.
Returns a dict with the hash, banner, and raw algorithm fields,
or None if the host is not running an SSH server on the given port.
"""
result = _ssh_connect(host, port, timeout)
if result is None:
return None
banner, payload = result
fields = _parse_kex_init(payload)
if fields is None:
return None
kex = fields["kex_algorithms"]
enc = fields["encryption_server_to_client"]
mac = fields["mac_server_to_client"]
comp = fields["compression_server_to_client"]
return {
"hassh_server": _compute_hassh(kex, enc, mac, comp),
"banner": banner,
"kex_algorithms": kex,
"encryption_s2c": enc,
"mac_s2c": mac,
"compression_s2c": comp,
}

506
decnet/prober/jarm.py Normal file
View File

@@ -0,0 +1,506 @@
"""
JARM TLS fingerprinting — pure stdlib implementation.
JARM sends 10 crafted TLS ClientHello packets to a target, each varying
TLS version, cipher suite order, extensions, and ALPN values. The
ServerHello responses are parsed and hashed to produce a 62-character
fingerprint that identifies the TLS server implementation.
Reference: https://github.com/salesforce/jarm
Only DECNET import is decnet.telemetry for tracing (zero-cost when disabled).
"""
from __future__ import annotations
import hashlib
import socket
import struct
import time
from typing import Any
from decnet.telemetry import traced as _traced
# ─── Constants ────────────────────────────────────────────────────────────────
JARM_EMPTY_HASH = "0" * 62
_INTER_PROBE_DELAY = 0.1 # seconds between probes to avoid IDS triggers
# TLS version bytes
_TLS_1_0 = b"\x03\x01"
_TLS_1_1 = b"\x03\x02"
_TLS_1_2 = b"\x03\x03"
_TLS_1_3 = b"\x03\x03" # TLS 1.3 uses 0x0303 in record layer
# TLS record types
_CONTENT_HANDSHAKE = 0x16
_HANDSHAKE_CLIENT_HELLO = 0x01
_HANDSHAKE_SERVER_HELLO = 0x02
# Extension types
_EXT_SERVER_NAME = 0x0000
_EXT_EC_POINT_FORMATS = 0x000B
_EXT_SUPPORTED_GROUPS = 0x000A
_EXT_SESSION_TICKET = 0x0023
_EXT_ENCRYPT_THEN_MAC = 0x0016
_EXT_EXTENDED_MASTER_SECRET = 0x0017
_EXT_SIGNATURE_ALGORITHMS = 0x000D
_EXT_SUPPORTED_VERSIONS = 0x002B
_EXT_PSK_KEY_EXCHANGE_MODES = 0x002D
_EXT_KEY_SHARE = 0x0033
_EXT_ALPN = 0x0010
_EXT_PADDING = 0x0015
# ─── Cipher suite lists per JARM spec ────────────────────────────────────────
# Forward cipher order (standard)
_CIPHERS_FORWARD = [
0x0016, 0x0033, 0x0067, 0xC09E, 0xC0A2, 0x009E, 0x0039, 0x006B,
0xC09F, 0xC0A3, 0x009F, 0x0045, 0x00BE, 0x0088, 0x00C4, 0x009A,
0xC008, 0xC009, 0xC023, 0xC0AC, 0xC0AE, 0xC02B, 0xC00A, 0xC024,
0xC0AD, 0xC0AF, 0xC02C, 0xC072, 0xC073, 0xCCA8, 0x1301, 0x1302,
0x1303, 0xC013, 0xC014, 0xC02F, 0x009C, 0xC02E, 0x002F, 0x0035,
0x000A, 0x0005, 0x0004,
]
# Reverse cipher order
_CIPHERS_REVERSE = list(reversed(_CIPHERS_FORWARD))
# TLS 1.3-only ciphers
_CIPHERS_TLS13 = [0x1301, 0x1302, 0x1303]
# Middle-out cipher order (interleaved from center)
def _middle_out(lst: list[int]) -> list[int]:
result: list[int] = []
mid = len(lst) // 2
for i in range(mid + 1):
if mid + i < len(lst):
result.append(lst[mid + i])
if mid - i >= 0 and mid - i != mid + i:
result.append(lst[mid - i])
return result
_CIPHERS_MIDDLE_OUT = _middle_out(_CIPHERS_FORWARD)
# Rare/uncommon extensions cipher list
_CIPHERS_RARE = [
0x0016, 0x0033, 0xC011, 0xC012, 0x0067, 0xC09E, 0xC0A2, 0x009E,
0x0039, 0x006B, 0xC09F, 0xC0A3, 0x009F, 0x0045, 0x00BE, 0x0088,
0x00C4, 0x009A, 0xC008, 0xC009, 0xC023, 0xC0AC, 0xC0AE, 0xC02B,
0xC00A, 0xC024, 0xC0AD, 0xC0AF, 0xC02C, 0xC072, 0xC073, 0xCCA8,
0x1301, 0x1302, 0x1303, 0xC013, 0xC014, 0xC02F, 0x009C, 0xC02E,
0x002F, 0x0035, 0x000A, 0x0005, 0x0004,
]
# ─── Probe definitions ────────────────────────────────────────────────────────
# Each probe: (tls_version, cipher_list, tls13_support, alpn, extensions_style)
# tls_version: record-layer version bytes
# cipher_list: which cipher suite ordering to use
# tls13_support: whether to include TLS 1.3 extensions (supported_versions, key_share, psk)
# alpn: ALPN protocol string or None
# extensions_style: "standard", "rare", or "no_extensions"
_PROBE_CONFIGS: list[dict[str, Any]] = [
# 0: TLS 1.2 forward
{"version": _TLS_1_2, "ciphers": _CIPHERS_FORWARD, "tls13": False, "alpn": None, "style": "standard"},
# 1: TLS 1.2 reverse
{"version": _TLS_1_2, "ciphers": _CIPHERS_REVERSE, "tls13": False, "alpn": None, "style": "standard"},
# 2: TLS 1.1 forward
{"version": _TLS_1_1, "ciphers": _CIPHERS_FORWARD, "tls13": False, "alpn": None, "style": "standard"},
# 3: TLS 1.3 forward
{"version": _TLS_1_2, "ciphers": _CIPHERS_FORWARD, "tls13": True, "alpn": "h2", "style": "standard"},
# 4: TLS 1.3 reverse
{"version": _TLS_1_2, "ciphers": _CIPHERS_REVERSE, "tls13": True, "alpn": "h2", "style": "standard"},
# 5: TLS 1.3 invalid (advertise 1.3 support but no key_share)
{"version": _TLS_1_2, "ciphers": _CIPHERS_FORWARD, "tls13": "no_key_share", "alpn": None, "style": "standard"},
# 6: TLS 1.3 middle-out
{"version": _TLS_1_2, "ciphers": _CIPHERS_MIDDLE_OUT, "tls13": True, "alpn": None, "style": "standard"},
# 7: TLS 1.0 forward
{"version": _TLS_1_0, "ciphers": _CIPHERS_FORWARD, "tls13": False, "alpn": None, "style": "standard"},
# 8: TLS 1.2 middle-out
{"version": _TLS_1_2, "ciphers": _CIPHERS_MIDDLE_OUT, "tls13": False, "alpn": None, "style": "standard"},
# 9: TLS 1.2 with rare extensions
{"version": _TLS_1_2, "ciphers": _CIPHERS_RARE, "tls13": False, "alpn": "http/1.1", "style": "rare"},
]
# ─── Extension builders ──────────────────────────────────────────────────────
def _ext(ext_type: int, data: bytes) -> bytes:
return struct.pack("!HH", ext_type, len(data)) + data
def _ext_sni(host: str) -> bytes:
host_bytes = host.encode("ascii")
# ServerNameList: length(2) + ServerName: type(1) + length(2) + name
sni_data = struct.pack("!HBH", len(host_bytes) + 3, 0, len(host_bytes)) + host_bytes
return _ext(_EXT_SERVER_NAME, sni_data)
def _ext_supported_groups() -> bytes:
groups = [0x0017, 0x0018, 0x0019, 0x001D, 0x0100, 0x0101] # secp256r1, secp384r1, secp521r1, x25519, ffdhe2048, ffdhe3072
data = struct.pack("!H", len(groups) * 2) + b"".join(struct.pack("!H", g) for g in groups)
return _ext(_EXT_SUPPORTED_GROUPS, data)
def _ext_ec_point_formats() -> bytes:
formats = b"\x00" # uncompressed only
return _ext(_EXT_EC_POINT_FORMATS, struct.pack("B", len(formats)) + formats)
def _ext_signature_algorithms() -> bytes:
algos = [
0x0401, 0x0501, 0x0601, # RSA PKCS1 SHA256/384/512
0x0201, # RSA PKCS1 SHA1
0x0403, 0x0503, 0x0603, # ECDSA SHA256/384/512
0x0203, # ECDSA SHA1
0x0804, 0x0805, 0x0806, # RSA-PSS SHA256/384/512
]
data = struct.pack("!H", len(algos) * 2) + b"".join(struct.pack("!H", a) for a in algos)
return _ext(_EXT_SIGNATURE_ALGORITHMS, data)
def _ext_supported_versions_13() -> bytes:
versions = [0x0304, 0x0303] # TLS 1.3, 1.2
data = struct.pack("B", len(versions) * 2) + b"".join(struct.pack("!H", v) for v in versions)
return _ext(_EXT_SUPPORTED_VERSIONS, data)
def _ext_psk_key_exchange_modes() -> bytes:
return _ext(_EXT_PSK_KEY_EXCHANGE_MODES, b"\x01\x01") # psk_dhe_ke
def _ext_key_share() -> bytes:
# x25519 key share with 32 random-looking bytes
key_data = b"\x00" * 32
entry = struct.pack("!HH", 0x001D, 32) + key_data # x25519 group
data = struct.pack("!H", len(entry)) + entry
return _ext(_EXT_KEY_SHARE, data)
def _ext_alpn(protocol: str) -> bytes:
proto_bytes = protocol.encode("ascii")
proto_entry = struct.pack("B", len(proto_bytes)) + proto_bytes
data = struct.pack("!H", len(proto_entry)) + proto_entry
return _ext(_EXT_ALPN, data)
def _ext_session_ticket() -> bytes:
return _ext(_EXT_SESSION_TICKET, b"")
def _ext_encrypt_then_mac() -> bytes:
return _ext(_EXT_ENCRYPT_THEN_MAC, b"")
def _ext_extended_master_secret() -> bytes:
return _ext(_EXT_EXTENDED_MASTER_SECRET, b"")
def _ext_padding(target_length: int, current_length: int) -> bytes:
pad_needed = target_length - current_length - 4 # 4 bytes for ext type + length
if pad_needed < 0:
return b""
return _ext(_EXT_PADDING, b"\x00" * pad_needed)
# ─── ClientHello builder ─────────────────────────────────────────────────────
def _build_client_hello(probe_index: int, host: str = "localhost") -> bytes:
"""
Construct one of 10 JARM-specified ClientHello packets.
Args:
probe_index: 0-9, selects the probe configuration
host: target hostname for SNI extension
Returns:
Complete TLS record bytes ready to send on the wire.
"""
cfg = _PROBE_CONFIGS[probe_index]
version: bytes = cfg["version"]
ciphers: list[int] = cfg["ciphers"]
tls13 = cfg["tls13"]
alpn: str | None = cfg["alpn"]
# Random (32 bytes)
random_bytes = b"\x00" * 32
# Session ID (32 bytes, all zeros)
session_id = b"\x00" * 32
# Cipher suites
cipher_bytes = b"".join(struct.pack("!H", c) for c in ciphers)
cipher_data = struct.pack("!H", len(cipher_bytes)) + cipher_bytes
# Compression methods (null only)
compression = b"\x01\x00"
# Extensions
extensions = b""
extensions += _ext_sni(host)
extensions += _ext_supported_groups()
extensions += _ext_ec_point_formats()
extensions += _ext_session_ticket()
extensions += _ext_encrypt_then_mac()
extensions += _ext_extended_master_secret()
extensions += _ext_signature_algorithms()
if tls13 == True: # noqa: E712
extensions += _ext_supported_versions_13()
extensions += _ext_psk_key_exchange_modes()
extensions += _ext_key_share()
elif tls13 == "no_key_share":
extensions += _ext_supported_versions_13()
extensions += _ext_psk_key_exchange_modes()
# Intentionally omit key_share
if alpn:
extensions += _ext_alpn(alpn)
ext_data = struct.pack("!H", len(extensions)) + extensions
# ClientHello body
body = (
version # client_version (2)
+ random_bytes # random (32)
+ struct.pack("B", len(session_id)) + session_id # session_id
+ cipher_data # cipher_suites
+ compression # compression_methods
+ ext_data # extensions
)
# Handshake header: type(1) + length(3)
handshake = struct.pack("B", _HANDSHAKE_CLIENT_HELLO) + struct.pack("!I", len(body))[1:] + body
# TLS record header: type(1) + version(2) + length(2)
record = struct.pack("B", _CONTENT_HANDSHAKE) + _TLS_1_0 + struct.pack("!H", len(handshake)) + handshake
return record
# ─── ServerHello parser ──────────────────────────────────────────────────────
def _parse_server_hello(data: bytes) -> str:
"""
Extract cipher suite and TLS version from a ServerHello response.
Returns a pipe-delimited string "cipher|version|extensions" that forms
one component of the JARM hash, or "|||" on parse failure.
"""
try:
if len(data) < 6:
return "|||"
# TLS record header
if data[0] != _CONTENT_HANDSHAKE:
return "|||"
struct.unpack_from("!H", data, 1)[0] # record_version (unused)
record_len = struct.unpack_from("!H", data, 3)[0]
hs = data[5: 5 + record_len]
if len(hs) < 4:
return "|||"
# Handshake header
if hs[0] != _HANDSHAKE_SERVER_HELLO:
return "|||"
hs_len = struct.unpack_from("!I", b"\x00" + hs[1:4])[0]
body = hs[4: 4 + hs_len]
if len(body) < 34:
return "|||"
pos = 0
# Server version
server_version = struct.unpack_from("!H", body, pos)[0]
pos += 2
# Random (32 bytes)
pos += 32
# Session ID
if pos >= len(body):
return "|||"
sid_len = body[pos]
pos += 1 + sid_len
# Cipher suite
if pos + 2 > len(body):
return "|||"
cipher = struct.unpack_from("!H", body, pos)[0]
pos += 2
# Compression method
if pos >= len(body):
return "|||"
pos += 1
# Parse extensions for supported_versions (to detect actual TLS 1.3)
actual_version = server_version
extensions_str = ""
if pos + 2 <= len(body):
ext_total = struct.unpack_from("!H", body, pos)[0]
pos += 2
ext_end = pos + ext_total
ext_types: list[str] = []
while pos + 4 <= ext_end and pos + 4 <= len(body):
ext_type = struct.unpack_from("!H", body, pos)[0]
ext_len = struct.unpack_from("!H", body, pos + 2)[0]
ext_types.append(f"{ext_type:04x}")
if ext_type == _EXT_SUPPORTED_VERSIONS and ext_len >= 2:
actual_version = struct.unpack_from("!H", body, pos + 4)[0]
pos += 4 + ext_len
extensions_str = "-".join(ext_types)
version_str = _version_to_str(actual_version)
cipher_str = f"{cipher:04x}"
return f"{cipher_str}|{version_str}|{extensions_str}"
except Exception:
return "|||"
def _version_to_str(version: int) -> str:
return {
0x0304: "tls13",
0x0303: "tls12",
0x0302: "tls11",
0x0301: "tls10",
0x0300: "ssl30",
}.get(version, f"{version:04x}")
# ─── Probe sender ────────────────────────────────────────────────────────────
@_traced("prober.jarm_send_probe")
def _send_probe(host: str, port: int, hello: bytes, timeout: float = 5.0) -> bytes | None:
"""
Open a TCP connection, send the ClientHello, and read the ServerHello.
Returns raw response bytes or None on any failure.
"""
try:
sock = socket.create_connection((host, port), timeout=timeout)
try:
sock.sendall(hello)
sock.settimeout(timeout)
response = b""
while True:
chunk = sock.recv(1484)
if not chunk:
break
response += chunk
# We only need the first TLS record (ServerHello)
if len(response) >= 5:
record_len = struct.unpack_from("!H", response, 3)[0]
if len(response) >= 5 + record_len:
break
return response if response else None
finally:
sock.close()
except (OSError, socket.error, socket.timeout):
return None
# ─── JARM hash computation ───────────────────────────────────────────────────
def _compute_jarm(responses: list[str]) -> str:
"""
Compute the final 62-character JARM hash from 10 probe response strings.
The first 30 characters are the raw cipher/version concatenation.
The remaining 32 characters are a truncated SHA256 of the extensions.
"""
if all(r == "|||" for r in responses):
return JARM_EMPTY_HASH
# Build the fuzzy hash
raw_parts: list[str] = []
ext_parts: list[str] = []
for r in responses:
parts = r.split("|")
if len(parts) >= 3 and parts[0] != "":
cipher = parts[0]
version = parts[1]
extensions = parts[2] if len(parts) > 2 else ""
# Map version to single char
ver_char = {
"tls13": "d", "tls12": "c", "tls11": "b",
"tls10": "a", "ssl30": "0",
}.get(version, "0")
raw_parts.append(f"{cipher}{ver_char}")
ext_parts.append(extensions)
else:
raw_parts.append("000")
ext_parts.append("")
# First 30 chars: cipher(4) + version(1) = 5 chars * 10 probes = 50... no
# JARM spec: first part is c|v per probe joined, then SHA256 of extensions
# Actual format: each response contributes 3 chars (cipher_first2 + ver_char)
# to the first 30, then all extensions hashed for the remaining 32.
fuzzy_raw = ""
for r in responses:
parts = r.split("|")
if len(parts) >= 3 and parts[0] != "":
cipher = parts[0] # 4-char hex
version = parts[1]
ver_char = {
"tls13": "d", "tls12": "c", "tls11": "b",
"tls10": "a", "ssl30": "0",
}.get(version, "0")
fuzzy_raw += f"{cipher[0:2]}{ver_char}"
else:
fuzzy_raw += "000"
# fuzzy_raw is 30 chars (3 * 10)
ext_str = ",".join(ext_parts)
ext_hash = hashlib.sha256(ext_str.encode()).hexdigest()[:32]
return fuzzy_raw + ext_hash
# ─── Public API ──────────────────────────────────────────────────────────────
@_traced("prober.jarm_hash")
def jarm_hash(host: str, port: int, timeout: float = 5.0) -> str:
"""
Compute the JARM fingerprint for a TLS server.
Sends 10 crafted ClientHello packets and hashes the responses.
Args:
host: target IP or hostname
port: target port
timeout: per-probe TCP timeout in seconds
Returns:
62-character JARM hash string, or all-zeros on total failure.
"""
responses: list[str] = []
for i in range(10):
hello = _build_client_hello(i, host=host)
raw = _send_probe(host, port, hello, timeout=timeout)
if raw is not None:
parsed = _parse_server_hello(raw)
responses.append(parsed)
else:
responses.append("|||")
if i < 9:
time.sleep(_INTER_PROBE_DELAY)
return _compute_jarm(responses)

227
decnet/prober/tcpfp.py Normal file
View File

@@ -0,0 +1,227 @@
"""
TCP/IP stack fingerprinting via SYN-ACK analysis.
Sends a crafted TCP SYN packet to a target host:port, captures the
SYN-ACK response, and extracts OS/tool-identifying characteristics:
TTL, window size, DF bit, MSS, window scale, SACK support, timestamps,
and TCP options ordering.
Uses scapy for packet crafting and parsing. Requires root/CAP_NET_RAW.
"""
from __future__ import annotations
import hashlib
import random
from typing import Any
from decnet.telemetry import traced as _traced
# Lazy-import scapy to avoid breaking non-root usage of HASSH/JARM.
# The actual import happens inside functions that need it.
# ─── TCP option short codes ─────────────────────────────────────────────────
_OPT_CODES: dict[str, str] = {
"MSS": "M",
"WScale": "W",
"SAckOK": "S",
"SAck": "S",
"Timestamp": "T",
"NOP": "N",
"EOL": "E",
"AltChkSum": "A",
"AltChkSumOpt": "A",
"UTO": "U",
}
# ─── Packet construction ───────────────────────────────────────────────────
@_traced("prober.tcpfp_send_syn")
def _send_syn(
host: str,
port: int,
timeout: float,
) -> Any | None:
"""
Craft a TCP SYN with common options and send it. Returns the
SYN-ACK response packet or None on timeout/failure.
"""
from scapy.all import IP, TCP, conf, sr1
# Suppress scapy's noisy output
conf.verb = 0
src_port = random.randint(49152, 65535) # nosec B311 — ephemeral port, not crypto
pkt = (
IP(dst=host)
/ TCP(
sport=src_port,
dport=port,
flags="S",
options=[
("MSS", 1460),
("NOP", None),
("WScale", 7),
("NOP", None),
("NOP", None),
("Timestamp", (0, 0)),
("SAckOK", b""),
("EOL", None),
],
)
)
try:
resp = sr1(pkt, timeout=timeout, verbose=0)
except (OSError, PermissionError):
return None
if resp is None:
return None
# Verify it's a SYN-ACK (flags == 0x12)
from scapy.all import TCP as TCPLayer
if not resp.haslayer(TCPLayer):
return None
if resp[TCPLayer].flags != 0x12: # SYN-ACK
return None
# Send RST to clean up half-open connection
_send_rst(host, port, src_port, resp)
return resp
def _send_rst(
host: str,
dport: int,
sport: int,
resp: Any,
) -> None:
"""Send RST to clean up the half-open connection."""
try:
from scapy.all import IP, TCP, send
rst = (
IP(dst=host)
/ TCP(
sport=sport,
dport=dport,
flags="R",
seq=resp.ack,
)
)
send(rst, verbose=0)
except Exception: # nosec B110 — best-effort RST cleanup
pass
# ─── Response parsing ───────────────────────────────────────────────────────
def _parse_synack(resp: Any) -> dict[str, Any]:
"""
Extract fingerprint fields from a scapy SYN-ACK response packet.
"""
from scapy.all import IP, TCP
ip_layer = resp[IP]
tcp_layer = resp[TCP]
# IP fields
ttl = ip_layer.ttl
df_bit = 1 if (ip_layer.flags & 0x2) else 0 # DF = bit 1
ip_id = ip_layer.id
# TCP fields
window_size = tcp_layer.window
# Parse TCP options
mss = 0
window_scale = -1
sack_ok = 0
timestamp = 0
options_order = _extract_options_order(tcp_layer.options)
for opt_name, opt_value in tcp_layer.options:
if opt_name == "MSS":
mss = opt_value
elif opt_name == "WScale":
window_scale = opt_value
elif opt_name in ("SAckOK", "SAck"):
sack_ok = 1
elif opt_name == "Timestamp":
timestamp = 1
return {
"ttl": ttl,
"window_size": window_size,
"df_bit": df_bit,
"ip_id": ip_id,
"mss": mss,
"window_scale": window_scale,
"sack_ok": sack_ok,
"timestamp": timestamp,
"options_order": options_order,
}
def _extract_options_order(options: list[tuple[str, Any]]) -> str:
"""
Map scapy TCP option tuples to a short-code string.
E.g. [("MSS", 1460), ("NOP", None), ("WScale", 7)] → "M,N,W"
"""
codes = []
for opt_name, _ in options:
code = _OPT_CODES.get(opt_name, "?")
codes.append(code)
return ",".join(codes)
# ─── Fingerprint computation ───────────────────────────────────────────────
def _compute_fingerprint(fields: dict[str, Any]) -> tuple[str, str]:
"""
Compute fingerprint raw string and SHA256 hash from parsed fields.
Returns (raw_string, hash_hex_32).
"""
raw = (
f"{fields['ttl']}:{fields['window_size']}:{fields['df_bit']}:"
f"{fields['mss']}:{fields['window_scale']}:{fields['sack_ok']}:"
f"{fields['timestamp']}:{fields['options_order']}"
)
h = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
return raw, h
# ─── Public API ─────────────────────────────────────────────────────────────
@_traced("prober.tcp_fingerprint")
def tcp_fingerprint(
host: str,
port: int,
timeout: float = 5.0,
) -> dict[str, Any] | None:
"""
Send a TCP SYN to host:port and fingerprint the SYN-ACK response.
Returns a dict with the hash, raw fingerprint string, and individual
fields, or None if no SYN-ACK was received.
Requires root/CAP_NET_RAW.
"""
resp = _send_syn(host, port, timeout)
if resp is None:
return None
fields = _parse_synack(resp)
raw, h = _compute_fingerprint(fields)
return {
"tcpfp_hash": h,
"tcpfp_raw": raw,
**fields,
}

478
decnet/prober/worker.py Normal file
View File

@@ -0,0 +1,478 @@
"""
DECNET-PROBER standalone worker.
Runs as a detached host-level process. Discovers attacker IPs by tailing the
collector's JSON log file, then fingerprints them via multiple active probes:
- JARM (TLS server fingerprinting)
- HASSHServer (SSH server fingerprinting)
- TCP/IP stack fingerprinting (OS/tool identification)
Results are written as RFC 5424 syslog + JSON to the same log files.
Target discovery is fully automatic — every unique attacker IP seen in the
log stream gets probed. No manual target list required.
Tech debt: writing directly to the collector's log files couples the
prober to the collector's file format. A future refactor should introduce
a shared log-sink abstraction.
"""
from __future__ import annotations
import asyncio
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
from decnet.logging import get_logger
from decnet.prober.hassh import hassh_server
from decnet.prober.jarm import JARM_EMPTY_HASH, jarm_hash
from decnet.prober.tcpfp import tcp_fingerprint
from decnet.telemetry import traced as _traced
logger = get_logger("prober")
# ─── Default ports per probe type ───────────────────────────────────────────
# JARM: common C2 callback / TLS server ports
DEFAULT_PROBE_PORTS: list[int] = [
443, 8443, 8080, 4443, 50050, 2222, 993, 995, 8888, 9001,
]
# HASSHServer: common SSH server ports
DEFAULT_SSH_PORTS: list[int] = [22, 2222, 22222, 2022]
# TCP/IP stack: probe on ports commonly open on attacker machines.
# Wide spread gives the best chance of a SYN-ACK for TTL/fingerprint extraction.
DEFAULT_TCPFP_PORTS: list[int] = [22, 80, 443, 8080, 8443, 445, 3389]
# ─── RFC 5424 formatting (inline, mirrors templates/*/decnet_logging.py) ─────
_FACILITY_LOCAL0 = 16
_SD_ID = "relay@55555"
_SEVERITY_INFO = 6
_SEVERITY_WARNING = 4
_MAX_HOSTNAME = 255
_MAX_APPNAME = 48
_MAX_MSGID = 32
def _sd_escape(value: str) -> str:
return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")
def _sd_element(fields: dict[str, Any]) -> str:
if not fields:
return "-"
params = " ".join(f'{k}="{_sd_escape(str(v))}"' for k, v in fields.items())
return f"[{_SD_ID} {params}]"
def _syslog_line(
event_type: str,
severity: int = _SEVERITY_INFO,
msg: str | None = None,
**fields: Any,
) -> str:
pri = f"<{_FACILITY_LOCAL0 * 8 + severity}>"
ts = datetime.now(timezone.utc).isoformat()
hostname = "decnet-prober"
appname = "prober"
msgid = (event_type or "-")[:_MAX_MSGID]
sd = _sd_element(fields)
message = f" {msg}" if msg else ""
return f"{pri}1 {ts} {hostname} {appname} - {msgid} {sd}{message}"
# ─── RFC 5424 parser (subset of collector's, for JSON generation) ─────────────
_RFC5424_RE = re.compile(
r"^<\d+>1 "
r"(\S+) " # 1: TIMESTAMP
r"(\S+) " # 2: HOSTNAME
r"(\S+) " # 3: APP-NAME
r"- " # PROCID
r"(\S+) " # 4: MSGID (event_type)
r"(.+)$", # 5: SD + MSG
)
_SD_BLOCK_RE = re.compile(r'\[relay@55555\s+(.*?)\]', re.DOTALL)
_PARAM_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
_IP_FIELDS = ("src_ip", "src", "client_ip", "remote_ip", "ip", "target_ip")
def _parse_to_json(line: str) -> dict[str, Any] | None:
m = _RFC5424_RE.match(line)
if not m:
return None
ts_raw, decky, service, event_type, sd_rest = m.groups()
fields: dict[str, str] = {}
msg = ""
if sd_rest.startswith("["):
block = _SD_BLOCK_RE.search(sd_rest)
if block:
for k, v in _PARAM_RE.findall(block.group(1)):
fields[k] = v.replace('\\"', '"').replace("\\\\", "\\").replace("\\]", "]")
msg_match = re.search(r'\]\s+(.+)$', sd_rest)
if msg_match:
msg = msg_match.group(1).strip()
attacker_ip = "Unknown"
for fname in _IP_FIELDS:
if fname in fields:
attacker_ip = fields[fname]
break
try:
ts_formatted = datetime.fromisoformat(ts_raw).strftime("%Y-%m-%d %H:%M:%S")
except ValueError:
ts_formatted = ts_raw
return {
"timestamp": ts_formatted,
"decky": decky,
"service": service,
"event_type": event_type,
"attacker_ip": attacker_ip,
"fields": fields,
"msg": msg,
"raw_line": line,
}
# ─── Log writer ──────────────────────────────────────────────────────────────
def _write_event(
log_path: Path,
json_path: Path,
event_type: str,
severity: int = _SEVERITY_INFO,
msg: str | None = None,
**fields: Any,
) -> None:
line = _syslog_line(event_type, severity=severity, msg=msg, **fields)
with open(log_path, "a", encoding="utf-8") as f:
f.write(line + "\n")
f.flush()
parsed = _parse_to_json(line)
if parsed:
with open(json_path, "a", encoding="utf-8") as f:
f.write(json.dumps(parsed) + "\n")
f.flush()
# ─── Target discovery from log stream ────────────────────────────────────────
@_traced("prober.discover_attackers")
def _discover_attackers(json_path: Path, position: int) -> tuple[set[str], int]:
"""
Read new JSON log lines from the given position and extract unique
attacker IPs. Returns (new_ips, new_position).
Only considers IPs that are not "Unknown" and come from events that
indicate real attacker interaction (not prober's own events).
"""
new_ips: set[str] = set()
if not json_path.exists():
return new_ips, position
size = json_path.stat().st_size
if size < position:
position = 0 # file rotated
if size == position:
return new_ips, position
with open(json_path, "r", encoding="utf-8", errors="replace") as f:
f.seek(position)
while True:
line = f.readline()
if not line:
break
if not line.endswith("\n"):
break # partial line
try:
record = json.loads(line.strip())
except json.JSONDecodeError:
position = f.tell()
continue
# Skip our own events
if record.get("service") == "prober":
position = f.tell()
continue
ip = record.get("attacker_ip", "Unknown")
if ip != "Unknown" and ip:
new_ips.add(ip)
position = f.tell()
return new_ips, position
# ─── Probe cycle ─────────────────────────────────────────────────────────────
@_traced("prober.probe_cycle")
def _probe_cycle(
targets: set[str],
probed: dict[str, dict[str, set[int]]],
jarm_ports: list[int],
ssh_ports: list[int],
tcpfp_ports: list[int],
log_path: Path,
json_path: Path,
timeout: float = 5.0,
) -> None:
"""
Probe all known attacker IPs with JARM, HASSH, and TCP/IP fingerprinting.
Args:
targets: set of attacker IPs to probe
probed: dict mapping IP -> {probe_type -> set of ports already probed}
jarm_ports: TLS ports for JARM fingerprinting
ssh_ports: SSH ports for HASSHServer fingerprinting
tcpfp_ports: ports for TCP/IP stack fingerprinting
log_path: RFC 5424 log file
json_path: JSON log file
timeout: per-probe TCP timeout
"""
for ip in sorted(targets):
ip_probed = probed.setdefault(ip, {})
# Phase 1: JARM (TLS fingerprinting)
_jarm_phase(ip, ip_probed, jarm_ports, log_path, json_path, timeout)
# Phase 2: HASSHServer (SSH fingerprinting)
_hassh_phase(ip, ip_probed, ssh_ports, log_path, json_path, timeout)
# Phase 3: TCP/IP stack fingerprinting
_tcpfp_phase(ip, ip_probed, tcpfp_ports, log_path, json_path, timeout)
@_traced("prober.jarm_phase")
def _jarm_phase(
ip: str,
ip_probed: dict[str, set[int]],
ports: list[int],
log_path: Path,
json_path: Path,
timeout: float,
) -> None:
"""JARM-fingerprint an IP on the given TLS ports."""
done = ip_probed.setdefault("jarm", set())
for port in ports:
if port in done:
continue
try:
h = jarm_hash(ip, port, timeout=timeout)
done.add(port)
if h == JARM_EMPTY_HASH:
continue
_write_event(
log_path, json_path,
"jarm_fingerprint",
target_ip=ip,
target_port=str(port),
jarm_hash=h,
msg=f"JARM {ip}:{port} = {h}",
)
logger.info("prober: JARM %s:%d = %s", ip, port, h)
except Exception as exc:
done.add(port)
_write_event(
log_path, json_path,
"prober_error",
severity=_SEVERITY_WARNING,
target_ip=ip,
target_port=str(port),
error=str(exc),
msg=f"JARM probe failed for {ip}:{port}: {exc}",
)
logger.warning("prober: JARM probe failed %s:%d: %s", ip, port, exc)
@_traced("prober.hassh_phase")
def _hassh_phase(
ip: str,
ip_probed: dict[str, set[int]],
ports: list[int],
log_path: Path,
json_path: Path,
timeout: float,
) -> None:
"""HASSHServer-fingerprint an IP on the given SSH ports."""
done = ip_probed.setdefault("hassh", set())
for port in ports:
if port in done:
continue
try:
result = hassh_server(ip, port, timeout=timeout)
done.add(port)
if result is None:
continue
_write_event(
log_path, json_path,
"hassh_fingerprint",
target_ip=ip,
target_port=str(port),
hassh_server_hash=result["hassh_server"],
ssh_banner=result["banner"],
kex_algorithms=result["kex_algorithms"],
encryption_s2c=result["encryption_s2c"],
mac_s2c=result["mac_s2c"],
compression_s2c=result["compression_s2c"],
msg=f"HASSH {ip}:{port} = {result['hassh_server']}",
)
logger.info("prober: HASSH %s:%d = %s", ip, port, result["hassh_server"])
except Exception as exc:
done.add(port)
_write_event(
log_path, json_path,
"prober_error",
severity=_SEVERITY_WARNING,
target_ip=ip,
target_port=str(port),
error=str(exc),
msg=f"HASSH probe failed for {ip}:{port}: {exc}",
)
logger.warning("prober: HASSH probe failed %s:%d: %s", ip, port, exc)
@_traced("prober.tcpfp_phase")
def _tcpfp_phase(
ip: str,
ip_probed: dict[str, set[int]],
ports: list[int],
log_path: Path,
json_path: Path,
timeout: float,
) -> None:
"""TCP/IP stack fingerprint an IP on the given ports."""
done = ip_probed.setdefault("tcpfp", set())
for port in ports:
if port in done:
continue
try:
result = tcp_fingerprint(ip, port, timeout=timeout)
done.add(port)
if result is None:
continue
_write_event(
log_path, json_path,
"tcpfp_fingerprint",
target_ip=ip,
target_port=str(port),
tcpfp_hash=result["tcpfp_hash"],
tcpfp_raw=result["tcpfp_raw"],
ttl=str(result["ttl"]),
window_size=str(result["window_size"]),
df_bit=str(result["df_bit"]),
mss=str(result["mss"]),
window_scale=str(result["window_scale"]),
sack_ok=str(result["sack_ok"]),
timestamp=str(result["timestamp"]),
options_order=result["options_order"],
msg=f"TCPFP {ip}:{port} = {result['tcpfp_hash']}",
)
logger.info("prober: TCPFP %s:%d = %s", ip, port, result["tcpfp_hash"])
except Exception as exc:
done.add(port)
_write_event(
log_path, json_path,
"prober_error",
severity=_SEVERITY_WARNING,
target_ip=ip,
target_port=str(port),
error=str(exc),
msg=f"TCPFP probe failed for {ip}:{port}: {exc}",
)
logger.warning("prober: TCPFP probe failed %s:%d: %s", ip, port, exc)
# ─── Main worker ─────────────────────────────────────────────────────────────
@_traced("prober.worker")
async def prober_worker(
log_file: str,
interval: int = 300,
timeout: float = 5.0,
ports: list[int] | None = None,
ssh_ports: list[int] | None = None,
tcpfp_ports: list[int] | None = None,
) -> None:
"""
Main entry point for the standalone prober process.
Discovers attacker IPs automatically by tailing the JSON log file,
then fingerprints each IP via JARM, HASSH, and TCP/IP stack probes.
Args:
log_file: base path for log files (RFC 5424 to .log, JSON to .json)
interval: seconds between probe cycles
timeout: per-probe TCP timeout
ports: JARM TLS ports (defaults to DEFAULT_PROBE_PORTS)
ssh_ports: HASSH SSH ports (defaults to DEFAULT_SSH_PORTS)
tcpfp_ports: TCP fingerprint ports (defaults to DEFAULT_TCPFP_PORTS)
"""
jarm_ports = ports or DEFAULT_PROBE_PORTS
hassh_ports = ssh_ports or DEFAULT_SSH_PORTS
tcp_ports = tcpfp_ports or DEFAULT_TCPFP_PORTS
all_ports_str = (
f"jarm={','.join(str(p) for p in jarm_ports)} "
f"ssh={','.join(str(p) for p in hassh_ports)} "
f"tcpfp={','.join(str(p) for p in tcp_ports)}"
)
log_path = Path(log_file)
json_path = log_path.with_suffix(".json")
log_path.parent.mkdir(parents=True, exist_ok=True)
logger.info(
"prober started interval=%ds %s log=%s",
interval, all_ports_str, log_path,
)
_write_event(
log_path, json_path,
"prober_startup",
interval=str(interval),
probe_ports=all_ports_str,
msg=f"DECNET-PROBER started, interval {interval}s, {all_ports_str}",
)
known_attackers: set[str] = set()
probed: dict[str, dict[str, set[int]]] = {} # IP -> {type -> ports}
log_position: int = 0
while True:
# Discover new attacker IPs from the log stream
new_ips, log_position = await asyncio.to_thread(
_discover_attackers, json_path, log_position,
)
if new_ips - known_attackers:
fresh = new_ips - known_attackers
known_attackers.update(fresh)
logger.info(
"prober: discovered %d new attacker(s), total=%d",
len(fresh), len(known_attackers),
)
if known_attackers:
await asyncio.to_thread(
_probe_cycle, known_attackers, probed,
jarm_ports, hassh_ports, tcp_ports,
log_path, json_path, timeout,
)
await asyncio.sleep(interval)

View File

@@ -0,0 +1,5 @@
"""DECNET profiler — standalone attacker profile builder worker."""
from decnet.profiler.worker import attacker_profile_worker
__all__ = ["attacker_profile_worker"]

View File

@@ -0,0 +1,602 @@
"""
Behavioral and timing analysis for DECNET attacker profiles.
Consumes the chronological `LogEvent` stream already built by
`decnet.correlation.engine.CorrelationEngine` and derives per-IP metrics:
- Inter-event timing statistics (mean / median / stdev / min / max)
- Coefficient-of-variation (jitter metric)
- Beaconing vs. interactive vs. scanning vs. brute_force vs. slow_scan
classification
- Tool attribution against known C2 frameworks (Cobalt Strike, Sliver,
Havoc, Mythic) using default beacon/jitter profiles — returns a list,
since multiple tools can be in use simultaneously
- Header-based tool detection (Nmap NSE, Gophish, Nikto, sqlmap, etc.)
from HTTP request events
- Recon → exfil phase sequencing (latency between the last recon event
and the first exfil-like event)
- OS / TCP fingerprint + retransmit rollup from sniffer-emitted events,
with TTL-based fallback when p0f returns no match
Pure-Python; no external dependencies. All functions are safe to call from
both sync and async contexts.
"""
from __future__ import annotations
import json
import re
import statistics
from collections import Counter
from typing import Any
from decnet.correlation.parser import LogEvent
from decnet.telemetry import traced as _traced, get_tracer as _get_tracer
# ─── Event-type taxonomy ────────────────────────────────────────────────────
# Sniffer-emitted packet events that feed into fingerprint rollup.
_SNIFFER_SYN_EVENT: str = "tcp_syn_fingerprint"
_SNIFFER_FLOW_EVENT: str = "tcp_flow_timing"
# Prober-emitted active-probe result (SYN-ACK fingerprint of attacker machine).
_PROBER_TCPFP_EVENT: str = "tcpfp_fingerprint"
# Canonical initial TTL for each coarse OS bucket. Used to derive hop
# distance when only the observed TTL is available (prober path).
_INITIAL_TTL: dict[str, int] = {
"linux": 64,
"windows": 128,
"embedded": 255,
}
# Events that signal "recon" phase (scans, probes, auth attempts).
_RECON_EVENT_TYPES: frozenset[str] = frozenset({
"scan", "connection", "banner", "probe",
"login_attempt", "auth", "auth_failure",
})
# Events that signal "exfil" / action-on-objective phase.
_EXFIL_EVENT_TYPES: frozenset[str] = frozenset({
"download", "upload", "file_transfer", "data_exfil",
"command", "exec", "query", "shell_input",
})
# Fields carrying payload byte counts (for "large payload" detection).
_PAYLOAD_SIZE_FIELDS: tuple[str, ...] = ("bytes", "size", "content_length")
# ─── C2 tool attribution signatures (beacon timing) ─────────────────────────
#
# Each entry lists the default beacon cadence profile of a popular C2.
# A profile *matches* an attacker when:
# - mean inter-event time is within ±`interval_tolerance` seconds, AND
# - jitter (cv = stdev / mean) is within ±`jitter_tolerance`
#
# Multiple matches are all returned (attacker may run multiple implants).
_TOOL_SIGNATURES: tuple[dict[str, Any], ...] = (
{
"name": "cobalt_strike",
"interval_s": 60.0,
"interval_tolerance_s": 8.0,
"jitter_cv": 0.20,
"jitter_tolerance": 0.05,
},
{
"name": "sliver",
"interval_s": 60.0,
"interval_tolerance_s": 10.0,
"jitter_cv": 0.30,
"jitter_tolerance": 0.08,
},
{
"name": "havoc",
"interval_s": 45.0,
"interval_tolerance_s": 8.0,
"jitter_cv": 0.10,
"jitter_tolerance": 0.03,
},
{
"name": "mythic",
"interval_s": 30.0,
"interval_tolerance_s": 6.0,
"jitter_cv": 0.15,
"jitter_tolerance": 0.03,
},
)
# ─── Header-based tool signatures ───────────────────────────────────────────
#
# Scanned against HTTP `request` events. `pattern` is a case-insensitive
# substring (or a regex anchored with ^ if it starts with that character).
# `header` is matched case-insensitively against the event's headers dict.
_HEADER_TOOL_SIGNATURES: tuple[dict[str, str], ...] = (
{"name": "nmap", "header": "user-agent", "pattern": "Nmap Scripting Engine"},
{"name": "gophish", "header": "x-mailer", "pattern": "gophish"},
{"name": "nikto", "header": "user-agent", "pattern": "Nikto"},
{"name": "sqlmap", "header": "user-agent", "pattern": "sqlmap"},
{"name": "nuclei", "header": "user-agent", "pattern": "Nuclei"},
{"name": "masscan", "header": "user-agent", "pattern": "masscan"},
{"name": "zgrab", "header": "user-agent", "pattern": "zgrab"},
{"name": "metasploit", "header": "user-agent", "pattern": "Metasploit"},
{"name": "curl", "header": "user-agent", "pattern": "^curl/"},
{"name": "python_requests", "header": "user-agent", "pattern": "python-requests"},
{"name": "gobuster", "header": "user-agent", "pattern": "gobuster"},
{"name": "dirbuster", "header": "user-agent", "pattern": "DirBuster"},
{"name": "hydra", "header": "user-agent", "pattern": "hydra"},
{"name": "wfuzz", "header": "user-agent", "pattern": "Wfuzz"},
)
# ─── TTL → coarse OS bucket (fallback when p0f returns nothing) ─────────────
def _os_from_ttl(ttl_str: str | None) -> str | None:
"""Derive a coarse OS guess from observed TTL when p0f has no match."""
if not ttl_str:
return None
try:
ttl = int(ttl_str)
except (TypeError, ValueError):
return None
if 55 <= ttl <= 70:
return "linux"
if 115 <= ttl <= 135:
return "windows"
if 235 <= ttl <= 255:
return "embedded"
return None
# ─── Timing stats ───────────────────────────────────────────────────────────
@_traced("profiler.timing_stats")
def timing_stats(events: list[LogEvent]) -> dict[str, Any]:
"""
Compute inter-arrival-time statistics across *events* (sorted by ts).
Returns a dict with:
mean_iat_s, median_iat_s, stdev_iat_s, min_iat_s, max_iat_s, cv,
event_count, duration_s
For n < 2 events the interval-based fields are None/0.
"""
if not events:
return {
"event_count": 0,
"duration_s": 0.0,
"mean_iat_s": None,
"median_iat_s": None,
"stdev_iat_s": None,
"min_iat_s": None,
"max_iat_s": None,
"cv": None,
}
sorted_events = sorted(events, key=lambda e: e.timestamp)
duration_s = (sorted_events[-1].timestamp - sorted_events[0].timestamp).total_seconds()
if len(sorted_events) < 2:
return {
"event_count": len(sorted_events),
"duration_s": round(duration_s, 3),
"mean_iat_s": None,
"median_iat_s": None,
"stdev_iat_s": None,
"min_iat_s": None,
"max_iat_s": None,
"cv": None,
}
iats = [
(sorted_events[i].timestamp - sorted_events[i - 1].timestamp).total_seconds()
for i in range(1, len(sorted_events))
]
# Exclude spuriously-negative (clock-skew) intervals.
iats = [v for v in iats if v >= 0]
if not iats:
return {
"event_count": len(sorted_events),
"duration_s": round(duration_s, 3),
"mean_iat_s": None,
"median_iat_s": None,
"stdev_iat_s": None,
"min_iat_s": None,
"max_iat_s": None,
"cv": None,
}
mean = statistics.fmean(iats)
median = statistics.median(iats)
stdev = statistics.pstdev(iats) if len(iats) > 1 else 0.0
cv = (stdev / mean) if mean > 0 else None
return {
"event_count": len(sorted_events),
"duration_s": round(duration_s, 3),
"mean_iat_s": round(mean, 3),
"median_iat_s": round(median, 3),
"stdev_iat_s": round(stdev, 3),
"min_iat_s": round(min(iats), 3),
"max_iat_s": round(max(iats), 3),
"cv": round(cv, 4) if cv is not None else None,
}
# ─── Behavior classification ────────────────────────────────────────────────
@_traced("profiler.classify_behavior")
def classify_behavior(stats: dict[str, Any], services_count: int) -> str:
"""
Coarse behavior bucket:
beaconing | interactive | scanning | brute_force | slow_scan | mixed | unknown
Heuristics (evaluated in priority order):
* `scanning` — ≥ 3 services touched OR mean IAT < 2 s, ≥ 3 events
* `brute_force` — 1 service, n ≥ 8, mean IAT < 5 s, CV < 0.6
* `beaconing` — CV < 0.35, mean IAT ≥ 5 s, ≥ 4 events
* `slow_scan` — ≥ 2 services, mean IAT ≥ 10 s, ≥ 4 events
* `interactive` — mean IAT < 5 s AND CV ≥ 0.5, ≥ 6 events
* `mixed` — catch-all for sessions with enough data
* `unknown` — too few data points
"""
n = stats.get("event_count") or 0
mean = stats.get("mean_iat_s")
cv = stats.get("cv")
if n < 3 or mean is None:
return "unknown"
# Slow scan / low-and-slow: multiple services with long gaps.
# Must be checked before generic scanning so slow multi-service sessions
# don't get mis-bucketed as a fast sweep.
if services_count >= 2 and mean >= 10.0 and n >= 4:
return "slow_scan"
# Scanning: broad service sweep (multi-service) or very rapid single-service bursts.
if n >= 3 and (
(services_count >= 3 and mean < 10.0)
or (services_count >= 2 and mean < 2.0)
):
return "scanning"
# Brute force: hammering one service rapidly and repeatedly.
if services_count == 1 and n >= 8 and mean < 5.0 and cv is not None and cv < 0.6:
return "brute_force"
# Beaconing: regular cadence over multiple events.
if cv is not None and cv < 0.35 and mean >= 5.0 and n >= 4:
return "beaconing"
# Interactive: short but irregular bursts (human or tool with think time).
if cv is not None and cv >= 0.5 and mean < 5.0 and n >= 6:
return "interactive"
return "mixed"
# ─── C2 tool attribution (beacon timing) ────────────────────────────────────
def guess_tools(mean_iat_s: float | None, cv: float | None) -> list[str]:
"""
Match (mean_iat, cv) against known C2 default beacon profiles.
Returns a list of all matching tool names (may be empty). Multiple
matches are all returned because an attacker can run several implants.
"""
if mean_iat_s is None or cv is None:
return []
hits: list[str] = []
for sig in _TOOL_SIGNATURES:
if abs(mean_iat_s - sig["interval_s"]) > sig["interval_tolerance_s"]:
continue
if abs(cv - sig["jitter_cv"]) > sig["jitter_tolerance"]:
continue
hits.append(sig["name"])
return hits
# Keep the old name as an alias so callers that expected a single string still
# compile, but mark it deprecated. Returns the first hit or None.
def guess_tool(mean_iat_s: float | None, cv: float | None) -> str | None:
"""Deprecated: use guess_tools() instead."""
hits = guess_tools(mean_iat_s, cv)
if len(hits) == 1:
return hits[0]
return None
# ─── Header-based tool detection ────────────────────────────────────────────
@_traced("profiler.detect_tools_from_headers")
def detect_tools_from_headers(events: list[LogEvent]) -> list[str]:
"""
Scan HTTP `request` events for tool-identifying headers.
Checks User-Agent, X-Mailer, and other headers case-insensitively
against `_HEADER_TOOL_SIGNATURES`. Returns a deduplicated list of
matched tool names in detection order.
"""
found: list[str] = []
seen: set[str] = set()
for e in events:
if e.event_type != "request":
continue
raw_headers = e.fields.get("headers")
if not raw_headers:
continue
# headers may arrive as a JSON string, a Python-repr string (legacy),
# or a dict already (in-memory / test paths).
if isinstance(raw_headers, str):
try:
headers: dict[str, str] = json.loads(raw_headers)
except (json.JSONDecodeError, ValueError):
# Backward-compat: events written before the JSON-encode fix
# were serialized as Python repr via str(dict). ast.literal_eval
# handles that safely (no arbitrary code execution).
try:
import ast as _ast
_parsed = _ast.literal_eval(raw_headers)
if isinstance(_parsed, dict):
headers = _parsed
else:
continue
except Exception: # nosec B112 — skip unparseable header values
continue
elif isinstance(raw_headers, dict):
headers = raw_headers
else:
continue
# Normalise header keys to lowercase for matching.
lc_headers: dict[str, str] = {k.lower(): str(v) for k, v in headers.items()}
for sig in _HEADER_TOOL_SIGNATURES:
name = sig["name"]
if name in seen:
continue
value = lc_headers.get(sig["header"])
if value is None:
continue
pattern = sig["pattern"]
if pattern.startswith("^"):
if re.match(pattern, value, re.IGNORECASE):
found.append(name)
seen.add(name)
else:
if pattern.lower() in value.lower():
found.append(name)
seen.add(name)
return found
# ─── Phase sequencing ───────────────────────────────────────────────────────
@_traced("profiler.phase_sequence")
def phase_sequence(events: list[LogEvent]) -> dict[str, Any]:
"""
Derive recon→exfil phase transition info.
Returns:
recon_end_ts : ISO timestamp of last recon-class event (or None)
exfil_start_ts : ISO timestamp of first exfil-class event (or None)
exfil_latency_s : seconds between them (None if not both present)
large_payload_count: count of events whose *fields* report a payload
≥ 1 MiB (heuristic for bulk data transfer)
"""
recon_end = None
exfil_start = None
large_payload_count = 0
for e in sorted(events, key=lambda x: x.timestamp):
if e.event_type in _RECON_EVENT_TYPES:
recon_end = e.timestamp
elif e.event_type in _EXFIL_EVENT_TYPES and exfil_start is None:
exfil_start = e.timestamp
for fname in _PAYLOAD_SIZE_FIELDS:
raw = e.fields.get(fname)
if raw is None:
continue
try:
if int(raw) >= 1_048_576:
large_payload_count += 1
break
except (TypeError, ValueError):
continue
latency: float | None = None
if recon_end is not None and exfil_start is not None and exfil_start >= recon_end:
latency = round((exfil_start - recon_end).total_seconds(), 3)
return {
"recon_end_ts": recon_end.isoformat() if recon_end else None,
"exfil_start_ts": exfil_start.isoformat() if exfil_start else None,
"exfil_latency_s": latency,
"large_payload_count": large_payload_count,
}
# ─── Sniffer rollup (OS fingerprint + retransmits) ──────────────────────────
@_traced("profiler.sniffer_rollup")
def sniffer_rollup(events: list[LogEvent]) -> dict[str, Any]:
"""
Roll up sniffer-emitted `tcp_syn_fingerprint` and `tcp_flow_timing`
events into a per-attacker summary.
OS guess priority:
1. Modal p0f label from os_guess field (if not "unknown"/empty).
2. TTL-based coarse bucket (linux / windows / embedded) as fallback.
Hop distance: median of non-zero reported values only.
"""
os_guesses: list[str] = []
ttl_values: list[str] = []
hops: list[int] = []
tcp_fp: dict[str, Any] | None = None
retransmits = 0
for e in events:
if e.event_type == _SNIFFER_SYN_EVENT:
og = e.fields.get("os_guess")
if og and og != "unknown":
os_guesses.append(og)
# Collect raw TTL for fallback OS derivation.
ttl_raw = e.fields.get("ttl") or e.fields.get("initial_ttl")
if ttl_raw:
ttl_values.append(ttl_raw)
# Only include hop distances that are valid and non-zero.
hop_raw = e.fields.get("hop_distance")
if hop_raw:
try:
hop_val = int(hop_raw)
if hop_val > 0:
hops.append(hop_val)
except (TypeError, ValueError):
pass
# Keep the latest fingerprint snapshot.
tcp_fp = {
"window": _int_or_none(e.fields.get("window")),
"wscale": _int_or_none(e.fields.get("wscale")),
"mss": _int_or_none(e.fields.get("mss")),
"options_sig": e.fields.get("options_sig", ""),
"has_sack": e.fields.get("has_sack") == "true",
"has_timestamps": e.fields.get("has_timestamps") == "true",
}
elif e.event_type == _SNIFFER_FLOW_EVENT:
try:
retransmits += int(e.fields.get("retransmits", "0"))
except (TypeError, ValueError):
pass
elif e.event_type == _PROBER_TCPFP_EVENT:
# Active-probe result: prober sent SYN to attacker, got SYN-ACK back.
# Field names differ from the passive sniffer (different emitter).
ttl_raw = e.fields.get("ttl")
if ttl_raw:
ttl_values.append(ttl_raw)
# Derive hop distance from observed TTL vs canonical initial TTL.
os_hint = _os_from_ttl(ttl_raw)
if os_hint:
initial = _INITIAL_TTL.get(os_hint)
if initial:
try:
hop_val = initial - int(ttl_raw)
if hop_val > 0:
hops.append(hop_val)
except (TypeError, ValueError):
pass
# Prober uses window_size/window_scale/options_order instead of
# the sniffer's window/wscale/options_sig.
tcp_fp = {
"window": _int_or_none(e.fields.get("window_size")),
"wscale": _int_or_none(e.fields.get("window_scale")),
"mss": _int_or_none(e.fields.get("mss")),
"options_sig": e.fields.get("options_order", ""),
"has_sack": e.fields.get("sack_ok") == "1",
"has_timestamps": e.fields.get("timestamp") == "1",
}
# Mode for the OS bucket — most frequently observed label.
os_guess: str | None = None
if os_guesses:
os_guess = Counter(os_guesses).most_common(1)[0][0]
else:
# TTL-based fallback: use the most common observed TTL value.
if ttl_values:
modal_ttl = Counter(ttl_values).most_common(1)[0][0]
os_guess = _os_from_ttl(modal_ttl)
# Median hop distance (robust to the occasional weird TTL).
hop_distance: int | None = None
if hops:
hop_distance = int(statistics.median(hops))
return {
"os_guess": os_guess,
"hop_distance": hop_distance,
"tcp_fingerprint": tcp_fp or {},
"retransmit_count": retransmits,
}
def _int_or_none(v: Any) -> int | None:
if v is None or v == "":
return None
try:
return int(v)
except (TypeError, ValueError):
return None
# ─── Composite: build the full AttackerBehavior record ──────────────────────
@_traced("profiler.build_behavior_record")
def build_behavior_record(events: list[LogEvent]) -> dict[str, Any]:
"""
Build the dict to persist in the `attacker_behavior` table.
Callers (profiler worker) pre-serialize JSON-typed fields; we do the
JSON encoding here to keep the repo layer schema-agnostic.
"""
# Timing stats are computed across *all* events (not filtered), because
# a C2 beacon often reuses the same "connection" event_type on each
# check-in. Filtering would throw that signal away.
stats = timing_stats(events)
services = {e.service for e in events}
behavior = classify_behavior(stats, len(services))
rollup = sniffer_rollup(events)
phase = phase_sequence(events)
# Combine beacon-timing tool matches with header-based detections.
beacon_tools = guess_tools(stats.get("mean_iat_s"), stats.get("cv"))
header_tools = detect_tools_from_headers(events)
all_tools: list[str] = list(dict.fromkeys(beacon_tools + header_tools)) # dedup, preserve order
# Promote TCP-level scanner identification to tool_guesses.
# p0f fingerprints nmap from the TCP handshake alone — this fires even
# when no HTTP service is present, making it far more reliable than the
# header-based path for raw port scans.
if rollup["os_guess"] == "nmap" and "nmap" not in all_tools:
all_tools.insert(0, "nmap")
# Beacon-specific projection: only surface interval/jitter when we've
# classified the flow as beaconing (otherwise these numbers are noise).
beacon_interval_s: float | None = None
beacon_jitter_pct: float | None = None
if behavior == "beaconing":
beacon_interval_s = stats.get("mean_iat_s")
cv = stats.get("cv")
beacon_jitter_pct = round(cv * 100, 2) if cv is not None else None
_tracer = _get_tracer("profiler")
with _tracer.start_as_current_span("profiler.behavior_summary") as _span:
_span.set_attribute("behavior_class", behavior)
_span.set_attribute("os_guess", rollup["os_guess"] or "unknown")
_span.set_attribute("tool_count", len(all_tools))
_span.set_attribute("event_count", stats.get("event_count", 0))
if all_tools:
_span.set_attribute("tools", ",".join(all_tools))
return {
"os_guess": rollup["os_guess"],
"hop_distance": rollup["hop_distance"],
"tcp_fingerprint": json.dumps(rollup["tcp_fingerprint"]),
"retransmit_count": rollup["retransmit_count"],
"behavior_class": behavior,
"beacon_interval_s": beacon_interval_s,
"beacon_jitter_pct": beacon_jitter_pct,
"tool_guesses": json.dumps(all_tools),
"timing_stats": json.dumps(stats),
"phase_sequence": json.dumps(phase),
}

215
decnet/profiler/worker.py Normal file
View File

@@ -0,0 +1,215 @@
"""
Attacker profile builder — incremental background worker.
Maintains a persistent CorrelationEngine and a log-ID cursor across cycles.
On cold start (first cycle or process restart), performs one full build from
all stored logs. Subsequent cycles fetch only new logs via the cursor,
ingest them into the existing engine, and rebuild profiles for affected IPs
only.
Complexity per cycle: O(new_logs + affected_ips) instead of O(total_logs²).
"""
from __future__ import annotations
import asyncio
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
from decnet.correlation.engine import CorrelationEngine
from decnet.correlation.parser import LogEvent
from decnet.logging import get_logger
from decnet.profiler.behavioral import build_behavior_record
from decnet.telemetry import traced as _traced, get_tracer as _get_tracer
from decnet.web.db.repository import BaseRepository
logger = get_logger("attacker_worker")
_BATCH_SIZE = 500
_STATE_KEY = "attacker_worker_cursor"
# Event types that indicate active command/query execution (not just connection/scan)
_COMMAND_EVENT_TYPES = frozenset({
"command", "exec", "query", "input", "shell_input",
"execute", "run", "sql_query", "redis_command",
})
# Fields that carry the executed command/query text
_COMMAND_FIELDS = ("command", "query", "input", "line", "sql", "cmd")
@dataclass
class _WorkerState:
engine: CorrelationEngine = field(default_factory=CorrelationEngine)
last_log_id: int = 0
initialized: bool = False
async def attacker_profile_worker(repo: BaseRepository, *, interval: int = 30) -> None:
"""Periodically updates the Attacker table incrementally. Designed to run as an asyncio Task."""
logger.info("attacker profile worker started interval=%ds", interval)
state = _WorkerState()
_saved_cursor = await repo.get_state(_STATE_KEY)
if _saved_cursor:
state.last_log_id = _saved_cursor.get("last_log_id", 0)
state.initialized = True
logger.info("attacker worker: resumed from cursor last_log_id=%d", state.last_log_id)
while True:
await asyncio.sleep(interval)
try:
await _incremental_update(repo, state)
except Exception as exc:
logger.error("attacker worker: update failed: %s", exc)
@_traced("profiler.incremental_update")
async def _incremental_update(repo: BaseRepository, state: _WorkerState) -> None:
was_cold = not state.initialized
affected_ips: set[str] = set()
while True:
batch = await repo.get_logs_after_id(state.last_log_id, limit=_BATCH_SIZE)
if not batch:
break
for row in batch:
event = state.engine.ingest(row["raw_line"])
if event and event.attacker_ip:
affected_ips.add(event.attacker_ip)
state.last_log_id = row["id"]
await asyncio.sleep(0) # yield to event loop after each batch
if len(batch) < _BATCH_SIZE:
break
state.initialized = True
if not affected_ips:
await repo.set_state(_STATE_KEY, {"last_log_id": state.last_log_id})
return
await _update_profiles(repo, state, affected_ips)
await repo.set_state(_STATE_KEY, {"last_log_id": state.last_log_id})
if was_cold:
logger.info("attacker worker: cold start rebuilt %d profiles", len(affected_ips))
else:
logger.info("attacker worker: updated %d profiles (incremental)", len(affected_ips))
@_traced("profiler.update_profiles")
async def _update_profiles(
repo: BaseRepository,
state: _WorkerState,
ips: set[str],
) -> None:
traversal_map = {t.attacker_ip: t for t in state.engine.traversals(min_deckies=2)}
bounties_map = await repo.get_bounties_for_ips(ips)
_tracer = _get_tracer("profiler")
for ip in ips:
events = state.engine._events.get(ip, [])
if not events:
continue
with _tracer.start_as_current_span("profiler.process_ip") as _span:
_span.set_attribute("attacker_ip", ip)
_span.set_attribute("event_count", len(events))
traversal = traversal_map.get(ip)
bounties = bounties_map.get(ip, [])
commands = _extract_commands_from_events(events)
record = _build_record(ip, events, traversal, bounties, commands)
attacker_uuid = await repo.upsert_attacker(record)
_span.set_attribute("is_traversal", traversal is not None)
_span.set_attribute("bounty_count", len(bounties))
_span.set_attribute("command_count", len(commands))
# Behavioral / fingerprint rollup lives in a sibling table so failures
# here never block the core attacker profile upsert.
try:
behavior = build_behavior_record(events)
await repo.upsert_attacker_behavior(attacker_uuid, behavior)
except Exception as exc:
_span.record_exception(exc)
logger.error("attacker worker: behavior upsert failed for %s: %s", ip, exc)
def _build_record(
ip: str,
events: list[LogEvent],
traversal: Any,
bounties: list[dict[str, Any]],
commands: list[dict[str, Any]],
) -> dict[str, Any]:
services = sorted({e.service for e in events})
deckies = (
traversal.deckies
if traversal
else _first_contact_deckies(events)
)
fingerprints = [b for b in bounties if b.get("bounty_type") == "fingerprint"]
credential_count = sum(1 for b in bounties if b.get("bounty_type") == "credential")
return {
"ip": ip,
"first_seen": min(e.timestamp for e in events),
"last_seen": max(e.timestamp for e in events),
"event_count": len(events),
"service_count": len(services),
"decky_count": len({e.decky for e in events}),
"services": json.dumps(services),
"deckies": json.dumps(deckies),
"traversal_path": traversal.path if traversal else None,
"is_traversal": traversal is not None,
"bounty_count": len(bounties),
"credential_count": credential_count,
"fingerprints": json.dumps(fingerprints),
"commands": json.dumps(commands),
"updated_at": datetime.now(timezone.utc),
}
def _first_contact_deckies(events: list[LogEvent]) -> list[str]:
"""Return unique deckies in first-contact order (for non-traversal attackers)."""
seen: list[str] = []
for e in sorted(events, key=lambda x: x.timestamp):
if e.decky not in seen:
seen.append(e.decky)
return seen
def _extract_commands_from_events(events: list[LogEvent]) -> list[dict[str, Any]]:
"""
Extract executed commands from LogEvent objects.
Works directly on LogEvent.fields (already a dict), so no JSON parsing needed.
"""
commands: list[dict[str, Any]] = []
for event in events:
if event.event_type not in _COMMAND_EVENT_TYPES:
continue
cmd_text: str | None = None
for key in _COMMAND_FIELDS:
val = event.fields.get(key)
if val:
cmd_text = str(val)
break
if not cmd_text:
continue
commands.append({
"service": event.service,
"decky": event.decky,
"command": cmd_text,
"timestamp": event.timestamp.isoformat(),
})
return commands

View File

@@ -13,6 +13,7 @@ class BaseService(ABC):
name: str # unique slug, e.g. "ssh", "smb"
ports: list[int] # ports this service listens on inside the container
default_image: str # Docker image tag, or "build" if a Dockerfile is needed
fleet_singleton: bool = False # True = runs once fleet-wide, not per-decky
@abstractmethod
def compose_fragment(

View File

@@ -1,26 +1,35 @@
from pathlib import Path
from decnet.services.base import BaseService
class ConpotService(BaseService):
"""ICS/SCADA honeypot covering Modbus (502), SNMP (161 UDP), and HTTP (80).
Uses the official honeynet/conpot image which ships a default ICS profile
that emulates a Siemens S7-200 PLC.
Uses a custom build context wrapping the official honeynet/conpot image
to fix Modbus binding to port 502.
"""
name = "conpot"
ports = [502, 161, 80]
default_image = "honeynet/conpot"
default_image = "build"
def compose_fragment(self, decky_name: str, log_target: str | None = None, service_cfg: dict | None = None) -> dict:
env = {
"CONPOT_TEMPLATE": "default",
"NODE_NAME": decky_name,
}
if log_target:
env["LOG_TARGET"] = log_target
return {
"image": "honeynet/conpot",
"build": {
"context": str(self.dockerfile_context()),
"args": {"BASE_IMAGE": "honeynet/conpot:latest"},
},
"container_name": f"{decky_name}-conpot",
"restart": "unless-stopped",
"environment": {
"CONPOT_TEMPLATE": "default",
},
"environment": env,
}
def dockerfile_context(self):
return None
return Path(__file__).parent.parent / "templates" / "conpot"

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "docker_api"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "docker_api"
class DockerAPIService(BaseService):

View File

@@ -2,7 +2,7 @@ from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "elasticsearch"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "elasticsearch"
class ElasticsearchService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "ftp"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "ftp"
class FTPService(BaseService):

View File

@@ -2,7 +2,7 @@ import json
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "http"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "http"
class HTTPService(BaseService):

59
decnet/services/https.py Normal file
View File

@@ -0,0 +1,59 @@
import json
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "https"
class HTTPSService(BaseService):
name = "https"
ports = [443]
default_image = "build"
def compose_fragment(
self,
decky_name: str,
log_target: str | None = None,
service_cfg: dict | None = None,
) -> dict:
cfg = service_cfg or {}
fragment: dict = {
"build": {"context": str(TEMPLATES_DIR)},
"container_name": f"{decky_name}-https",
"restart": "unless-stopped",
"environment": {
"NODE_NAME": decky_name,
},
}
if log_target:
fragment["environment"]["LOG_TARGET"] = log_target
# Optional persona overrides — only injected when explicitly set
if "server_header" in cfg:
fragment["environment"]["SERVER_HEADER"] = cfg["server_header"]
if "response_code" in cfg:
fragment["environment"]["RESPONSE_CODE"] = str(cfg["response_code"])
if "fake_app" in cfg:
fragment["environment"]["FAKE_APP"] = cfg["fake_app"]
if "extra_headers" in cfg:
val = cfg["extra_headers"]
fragment["environment"]["EXTRA_HEADERS"] = (
json.dumps(val) if isinstance(val, dict) else val
)
if "custom_body" in cfg:
fragment["environment"]["CUSTOM_BODY"] = cfg["custom_body"]
if "files" in cfg:
files_path = str(Path(cfg["files"]).resolve())
fragment["environment"]["FILES_DIR"] = "/opt/html_files"
fragment.setdefault("volumes", []).append(f"{files_path}:/opt/html_files:ro")
if "tls_cert" in cfg:
fragment["environment"]["TLS_CERT"] = cfg["tls_cert"]
if "tls_key" in cfg:
fragment["environment"]["TLS_KEY"] = cfg["tls_key"]
if "tls_cn" in cfg:
fragment["environment"]["TLS_CN"] = cfg["tls_cn"]
return fragment
def dockerfile_context(self) -> Path | None:
return TEMPLATES_DIR

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "imap"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "imap"
class IMAPService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "k8s"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "k8s"
class KubernetesAPIService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "ldap"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "ldap"
class LDAPService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "llmnr"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "llmnr"
class LLMNRService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "mongodb"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "mongodb"
class MongoDBService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "mqtt"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "mqtt"
class MQTTService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "mssql"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "mssql"
class MSSQLService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "mysql"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "mysql"
class MySQLService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "pop3"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "pop3"
class POP3Service(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "postgres"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "postgres"
class PostgresService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "rdp"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "rdp"
class RDPService(BaseService):

View File

@@ -1,46 +0,0 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "real_ssh"
class RealSSHService(BaseService):
"""
Fully interactive OpenSSH server — no honeypot emulation.
Used for the deaddeck (entry-point machine). Attackers get a real shell.
Credentials are intentionally weak to invite exploitation.
service_cfg keys:
password Root password (default: "admin")
hostname Override container hostname
"""
name = "real_ssh"
ports = [22]
default_image = "build"
def compose_fragment(
self,
decky_name: str,
log_target: str | None = None,
service_cfg: dict | None = None,
) -> dict:
cfg = service_cfg or {}
env: dict = {
"SSH_ROOT_PASSWORD": cfg.get("password", "admin"),
}
if "hostname" in cfg:
env["SSH_HOSTNAME"] = cfg["hostname"]
return {
"build": {"context": str(TEMPLATES_DIR)},
"container_name": f"{decky_name}-real-ssh",
"restart": "unless-stopped",
"cap_add": ["NET_BIND_SERVICE"],
"environment": env,
}
def dockerfile_context(self) -> Path:
return TEMPLATES_DIR

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "redis"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "redis"
class RedisService(BaseService):

View File

@@ -26,6 +26,8 @@ def _load_plugins() -> None:
continue
importlib.import_module(f"decnet.services.{module_info.name}")
for cls in BaseService.__subclasses__():
if not cls.__module__.startswith("decnet.services."):
continue
instance = cls()
_registry[instance.name] = instance
_loaded = True

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "sip"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "sip"
class SIPService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "smb"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "smb"
class SMBService(BaseService):

View File

@@ -2,7 +2,7 @@ from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "smtp"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "smtp"
class SMTPService(BaseService):

View File

@@ -0,0 +1,43 @@
from pathlib import Path
from decnet.services.base import BaseService
# Reuses the same template as the smtp service — only difference is
# SMTP_OPEN_RELAY=1 in the environment, which enables the open relay persona.
_TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "smtp"
class SMTPRelayService(BaseService):
"""SMTP open relay bait — accepts any RCPT TO and delivers messages."""
name = "smtp_relay"
ports = [25, 587]
default_image = "build"
def compose_fragment(
self,
decky_name: str,
log_target: str | None = None,
service_cfg: dict | None = None,
) -> dict:
cfg = service_cfg or {}
fragment: dict = {
"build": {"context": str(_TEMPLATES_DIR)},
"container_name": f"{decky_name}-smtp_relay",
"restart": "unless-stopped",
"cap_add": ["NET_BIND_SERVICE"],
"environment": {
"NODE_NAME": decky_name,
"SMTP_OPEN_RELAY": "1",
},
}
if log_target:
fragment["environment"]["LOG_TARGET"] = log_target
if "banner" in cfg:
fragment["environment"]["SMTP_BANNER"] = cfg["banner"]
if "mta" in cfg:
fragment["environment"]["SMTP_MTA"] = cfg["mta"]
return fragment
def dockerfile_context(self) -> Path:
return _TEMPLATES_DIR

View File

@@ -0,0 +1,41 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "sniffer"
class SnifferService(BaseService):
"""
Passive network sniffer deployed alongside deckies on the MACVLAN.
Captures TLS handshakes in promiscuous mode and extracts JA3/JA3S hashes
plus connection metadata. Requires NET_RAW + NET_ADMIN capabilities.
No inbound ports — purely passive.
"""
name = "sniffer"
ports: list[int] = []
default_image = "build"
fleet_singleton = True
def compose_fragment(
self,
decky_name: str,
log_target: str | None = None,
service_cfg: dict | None = None,
) -> dict:
fragment: dict = {
"build": {"context": str(TEMPLATES_DIR)},
"container_name": f"{decky_name}-sniffer",
"restart": "unless-stopped",
"cap_add": ["NET_RAW", "NET_ADMIN"],
"environment": {
"NODE_NAME": decky_name,
},
}
if log_target:
fragment["environment"]["LOG_TARGET"] = log_target
return fragment
def dockerfile_context(self) -> Path | None:
return TEMPLATES_DIR

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "snmp"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "snmp"
class SNMPService(BaseService):

View File

@@ -1,12 +1,26 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "cowrie"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "ssh"
class SSHService(BaseService):
"""
Interactive OpenSSH server for general-purpose deckies.
Replaced Cowrie emulation with a real sshd so fingerprinting tools and
experienced attackers cannot trivially identify the honeypot. Auth events,
sudo activity, and interactive commands are all forwarded to stdout as
RFC 5424 via the rsyslog bridge baked into the image.
service_cfg keys:
password Root password (default: "admin")
hostname Override container hostname
"""
name = "ssh"
ports = [22, 2222]
ports = [22]
default_image = "build"
def compose_fragment(
@@ -17,35 +31,29 @@ class SSHService(BaseService):
) -> dict:
cfg = service_cfg or {}
env: dict = {
"SSH_ROOT_PASSWORD": cfg.get("password", "admin"),
# NODE_NAME is the authoritative decky identifier for log
# attribution — matches the host path used for the artifacts
# bind mount below. The container hostname (optionally overridden
# via SSH_HOSTNAME) is cosmetic and may differ to keep the
# decoy looking heterogeneous.
"NODE_NAME": decky_name,
"COWRIE_HOSTNAME": decky_name,
"COWRIE_HONEYPOT_LISTEN_ENDPOINTS": "tcp:22:interface=0.0.0.0 tcp:2222:interface=0.0.0.0",
"COWRIE_SSH_LISTEN_ENDPOINTS": "tcp:22:interface=0.0.0.0 tcp:2222:interface=0.0.0.0",
}
if log_target:
host, port = log_target.rsplit(":", 1)
env["COWRIE_OUTPUT_TCP_ENABLED"] = "true"
env["COWRIE_OUTPUT_TCP_HOST"] = host
env["COWRIE_OUTPUT_TCP_PORT"] = port
# Optional persona overrides
if "kernel_version" in cfg:
env["COWRIE_HONEYPOT_KERNEL_VERSION"] = cfg["kernel_version"]
if "kernel_build_string" in cfg:
env["COWRIE_HONEYPOT_KERNEL_BUILD_STRING"] = cfg["kernel_build_string"]
if "hardware_platform" in cfg:
env["COWRIE_HONEYPOT_HARDWARE_PLATFORM"] = cfg["hardware_platform"]
if "ssh_banner" in cfg:
env["COWRIE_SSH_VERSION"] = cfg["ssh_banner"]
if "users" in cfg:
env["COWRIE_USERDB_ENTRIES"] = cfg["users"]
if "hostname" in cfg:
env["SSH_HOSTNAME"] = cfg["hostname"]
# File-catcher quarantine: bind-mount a per-decky host dir so attacker
# drops (scp/sftp/wget) are mirrored out-of-band for forensic analysis.
# The in-container path masquerades as systemd-coredump so `mount`/`df`
# from inside the container looks benign.
quarantine_host = f"/var/lib/decnet/artifacts/{decky_name}/ssh"
return {
"build": {"context": str(TEMPLATES_DIR)},
"container_name": f"{decky_name}-ssh",
"restart": "unless-stopped",
"cap_add": ["NET_BIND_SERVICE"],
"environment": env,
"volumes": [f"{quarantine_host}:/var/lib/systemd/coredump:rw"],
}
def dockerfile_context(self) -> Path:

View File

@@ -1,31 +1,47 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "telnet"
class TelnetService(BaseService):
"""
Real telnetd using busybox telnetd + rsyslog logging pipeline.
Replaced Cowrie emulation (which also started an SSH daemon on port 22)
with a real busybox telnetd so only port 23 is exposed and auth events
are logged as RFC 5424 via the same rsyslog bridge used by the SSH service.
service_cfg keys:
password Root password (default: "admin")
hostname Override container hostname
"""
name = "telnet"
ports = [23]
default_image = "cowrie/cowrie"
default_image = "build"
def compose_fragment(self, decky_name: str, log_target: str | None = None, service_cfg: dict | None = None) -> dict:
def compose_fragment(
self,
decky_name: str,
log_target: str | None = None,
service_cfg: dict | None = None,
) -> dict:
cfg = service_cfg or {}
env: dict = {
"COWRIE_HONEYPOT_HOSTNAME": decky_name,
"COWRIE_TELNET_ENABLED": "true",
"COWRIE_TELNET_LISTEN_ENDPOINTS": "tcp:23:interface=0.0.0.0",
# Disable SSH so this container is telnet-only
"COWRIE_SSH_ENABLED": "false",
"TELNET_ROOT_PASSWORD": cfg.get("password", "admin"),
}
if log_target:
host, port = log_target.rsplit(":", 1)
env["COWRIE_OUTPUT_TCP_ENABLED"] = "true"
env["COWRIE_OUTPUT_TCP_HOST"] = host
env["COWRIE_OUTPUT_TCP_PORT"] = port
if "hostname" in cfg:
env["TELNET_HOSTNAME"] = cfg["hostname"]
return {
"image": "cowrie/cowrie",
"build": {"context": str(TEMPLATES_DIR)},
"container_name": f"{decky_name}-telnet",
"restart": "unless-stopped",
"cap_add": ["NET_BIND_SERVICE"],
"environment": env,
}
def dockerfile_context(self):
return None
def dockerfile_context(self) -> Path:
return TEMPLATES_DIR

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "tftp"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "tftp"
class TFTPService(BaseService):

View File

@@ -1,7 +1,7 @@
from pathlib import Path
from decnet.services.base import BaseService
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "vnc"
TEMPLATES_DIR = Path(__file__).parent.parent / "templates" / "vnc"
class VNCService(BaseService):

Some files were not shown because too many files have changed in this diff Show More