merge testing -> tomerge/main #7
- Modify Rfc5424Formatter to read decnet_component from the LogRecord and use it as the RFC 5424 APP-NAME field (falls back to 'decnet')
- Add a get_logger(component) factory in decnet/logging/__init__.py with a _ComponentFilter that injects decnet_component on each record
- Wire all five layers to their component tag: cli -> 'cli', engine -> 'engine', api -> 'api' (api.py, ingester, routers), mutator -> 'mutator', collector -> 'collector'
- Add structured INFO/DEBUG/WARNING/ERROR log calls throughout each layer per the defined vocabulary; DEBUG calls are suppressed unless DECNET_DEVELOPER=true
- Add tests/test_logging.py covering the factory, the filter, formatter component-awareness, fallback behaviour, and level gating

Migrate the Attacker model from an IP-based to a UUID-based primary key, with auto-migration for the old schema. Add GET /attackers (paginated, search, sort) and GET /attackers/{uuid} API routes. Rewrite Attackers.tsx as a card grid with full threat info and create AttackerDetail.tsx as a dedicated detail page with back navigation, stats, a commands table, and fingerprints.

A new GET /attackers/{uuid}/commands?limit=&offset=&service= endpoint serves commands with server-side pagination and an optional service filter. The AttackerDetail frontend fetches commands from this endpoint with page controls. The service badge filter now drives both the API query and the local fingerprint filter.

templates/decnet_logging.py calls str(v) on all SD-PARAM values, turning a headers dict into its Python repr ("{'User-Agent': ...}") rather than JSON. detect_tools_from_headers() called json.loads() on that string and silently swallowed the error, returning [] for every HTTP event. The same bug prevented the ingester from extracting User-Agent bounty fingerprints.
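A tolerant reader for the two formats involved (valid JSON going forward, legacy Python repr in old DB rows) might look like the sketch below; `parse_headers` is a hypothetical helper, not the project's actual function:

```python
import ast
import json


def parse_headers(raw: str) -> dict:
    """Parse a headers field that may be JSON (new rows) or a
    Python repr string (legacy rows written via str(dict))."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        pass
    try:
        # Legacy fallback: "{'User-Agent': 'curl/8.0'}" is not JSON,
        # but ast.literal_eval can evaluate it safely.
        value = ast.literal_eval(raw)
        return value if isinstance(value, dict) else {}
    except (ValueError, SyntaxError):
        # Unparseable input: return an empty mapping rather than
        # silently breaking downstream tool detection.
        return {}
```

The key property is that the JSON path is tried first, so once writers emit proper JSON the fallback becomes dead code for new data.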
- templates/http/server.py: wrap the headers dict in json.dumps() before passing it to syslog_line, so the value is a valid JSON string in the syslog record
- behavioral.py: add an ast.literal_eval fallback for existing DB rows that were stored in the old Python repr format
- ingester.py: parse headers as a JSON string in _extract_bounty so User-Agent fingerprints are stored correctly going forward
- tests: add test_json_string_headers and test_python_repr_headers_fallback to exercise both formats in detect_tools_from_headers

When decnet.system.log is root-owned (e.g. created by a pre-fix 'sudo decnet deploy') and a subsequent non-root process tries to log, the InodeAwareRotatingFileHandler raised PermissionError out of emit(), which propagated up through logger.debug/info and killed the collector's log stream loop ('log stream ended ... reason=[Errno 13]'). Now matches stdlib behaviour: wrap _open() in try/except OSError and defer to handleError() on failure. Adds a regression test.

Also: the 'pyinstrument' keyword in scripts/profile/view.sh was matching memray-flamegraph-*.html files. Exclude the memray-* prefix.

decnet api 342916ca63

Under high-concurrency MySQL load, uvicorn cancels request tasks when clients disconnect. If cancellation lands mid-query, session.close() tries to ROLLBACK on a connection that aiomysql has already marked as closed, raising InterfaceError("Cancelled during execution") and leaving the connection checked out until GC, which the pool then warns about as a 'non-checked-in connection'. The old fallback tried sync.rollback() + sync.close(), but those still go through the async driver and fail the same way on a dead connection. Replace them with session.sync_session.invalidate(), which just flips the pool's internal record (no I/O, so it cannot be cancelled) and tells the pool to drop the connection immediately instead of waiting for garbage collection.

Two leaks remained after the inotifywait argv fix:
1. The bash running journal-relay showed its argv[1] (the script path) in /proc/PID/cmdline, producing a line like 'journal-relay /usr/libexec/udev/journal-relay'. Apply argv_zap.so to that bash too.
2. argv_zap previously hardcoded PR_SET_NAME to 'kmsg-watch', which was wrong for any caller other than inotifywait. The comm name now comes from ARGV_ZAP_COMM so each caller can pick its own (kmsg-watch for inotifywait, journal-relay for the watcher bash).
3. The capture.sh header started with 'SSH honeypot file-catcher', which is fatal if an attacker runs 'cat' on it. Rewritten as a plausible systemd-journal relay helper; stray 'attacker' / 'honeypot' words in mid-script comments were stripped too.

Adds the server-side wiring and frontend UI to surface files captured by the SSH honeypot for a given attacker.
- New repository method get_attacker_artifacts (abstract + SQLModel impl) that joins the attacker's IP to `file_captured` log rows.
- New route GET /attackers/{uuid}/artifacts.
- New router /artifacts/{decky}/{service}/{stored_as} that streams a quarantined file back to an authenticated viewer.
- AttackerDetail grows an ArtifactDrawer panel with per-file metadata (sha256, size, orig_path) and a download action.
- The ssh service fragment now sets NODE_NAME=decky_name so logs and the host-side artifacts bind-mount share the same decky identifier.

decnet forwarder CLI to run syslog-over-TLS forwarder a6430cac4c

decnet swarm {enroll,list,decommission} + deploy --mode swarm 1e8ca4cc05

decnet swarm deckies to list deployed shards by host 8914c27220

Adds /api/v1/swarm-updates/{hosts,push,push-self,rollback} behind require_admin. Reuses the existing UpdaterClient + tar_working_tree + the per-host asyncio.gather pattern from api_deploy_swarm.py; the tarball is built exactly once per /push request and fanned out to every selected worker. /hosts filters out decommissioned hosts and agent-only enrollments (no updater bundle = not a target).
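The build-once, fan-out-to-all shape described here can be sketched as follows; the function and parameter names are illustrative, not decnet's actual API:

```python
import asyncio


async def push_to_host(host: str, tarball: bytes) -> bool:
    """Hypothetical per-host push; a real client would POST the bytes."""
    await asyncio.sleep(0)  # stand-in for the HTTP round trip
    return True


async def push_update(hosts: list[str], build_tarball) -> dict[str, bool]:
    # Build the bundle exactly once, then fan it out concurrently.
    tarball = build_tarball()
    results = await asyncio.gather(
        *(push_to_host(h, tarball) for h in hosts),
        return_exceptions=True,  # one failed worker must not cancel the rest
    )
    return {
        host: not isinstance(res, Exception)
        for host, res in zip(hosts, results)
    }
```

Because `build_tarball()` is called outside the gather, N workers cost one tar pass plus N uploads, not N tar passes.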
Connection drops during /update-self are treated as success: the updater re-execs itself mid-response, so httpx always raises. Pydantic models live in decnet/web/db/models.py (single source of truth). 24 tests cover happy paths, rollback, transport failures, include_self ordering (skip on rolled-back agents), validation, and RBAC gating.

The module-level _require_env('DECNET_JWT_SECRET') call blocked `decnet agent` and `decnet updater` from starting on workers that legitimately have no business knowing the master's JWT signing key. Move the resolution into a module `__getattr__`: only consumers that actually read `decnet.env.DECNET_JWT_SECRET` trigger the validation, which in practice means only decnet.web.auth (master-side). Adds tests/test_env_lazy_jwt.py covering both the in-process lazy path and an out-of-process `decnet agent --help` subprocess check with a fully sanitized environment.

Rename log-file-path -> log-directory (maps to DECNET_LOG_DIRECTORY). The bundle now ships three systemd units rendered with agent_name/master_host and installs them into /etc/systemd/system/. Bootstrap replaces direct 'decnet X --daemon' calls with systemctl enable --now. Each unit pins DECNET_SYSTEM_LOGS so agent, forwarder, and deckies logs land at decnet.{agent,forwarder}.log and decnet.log under /var/log/decnet.

The create helpers short-circuited on name alone, so a prior macvlan deploy left Docker's decnet_lan network in place. A subsequent ipvlan deploy would no-op the network create, then container attach would try to add a macvlan port on enp0s3 that already had an ipvlan slave: EBUSY, agent 500, empty docker ps. Now, when the existing network's driver disagrees with the requested one, disconnect any live containers and drop the network before recreating it. A parent NIC can host one driver at a time.
Also: setup_host_{macvlan,ipvlan} opportunistically delete the opposite host-side helper so we don't leave cruft across driver swaps.

decnet status in agent mode f91ba9a16e

The bootstrap was installing into /opt/decnet/.venv with an editable `pip install -e .`, and /usr/local/bin/decnet pointed there. The updater writes releases to /opt/decnet/releases/active/ with a shared venv at /opt/decnet/venv, a parallel tree nothing on the box actually runs. Result: updates appeared to succeed (release dir rotated, SHA changed) but systemd kept executing the untouched bootstrap code. Changes:
- Bootstrap now installs directly into /opt/decnet/releases/active with the shared venv at /opt/decnet/venv and /opt/decnet/current symlinked, the same layout the updater rotates in and out of.
- /usr/local/bin/decnet -> /opt/decnet/venv/bin/decnet.
- run_update / run_update_self heal /usr/local/bin/decnet on every push, so already-enrolled hosts recover on the next update instead of needing a re-enroll.
- run_update / run_update_self now log each phase (receive, extract, pip install, rotate, restart, probe) so the updater log actually shows what happened.

Previously `decnet status` on an agent showed every microservice as DOWN, because deploy's auto-spawn was unihost-scoped and the agent CLI gate hid the per-host commands. Now:
- collect, probe, profiler, and sniffer drop out of MASTER_ONLY_COMMANDS (they run per-host; master-side work stays master-gated).
- mutate stays master-only (it orchestrates swarm-wide respawns).
- decnet/mutator/ is excluded from agent tarballs; it is never invoked there.
- The decnet/web exclusion is tightened: ship db/ + auth.py + dependencies.py (the profiler needs the repo singleton); drop api.py, swarm_api.py, ingester.py, router/, templates/.
- Four new systemd unit templates (decnet-collector/prober/profiler/sniffer) are shipped in every enrollment tarball.
- enroll_bootstrap.sh enables + starts all four alongside agent and forwarder at install time.
- The updater restarts the aux units on code push so they pick up the new release (best-effort: legacy enrollments without the units won't fail the update).
- The status table hides the Mutator + API rows in agent mode.

Agents already exposed POST /teardown; the master was missing the plumbing to reach it. Add:
- POST /api/v1/swarm/hosts/{uuid}/teardown, admin-gated. Body {decky_id: str|null}: null tears down the whole host, a value tears down one decky. On worker failure the master returns 502 and leaves the DB shards intact so master and agent stay aligned.
- BaseRepository.delete_decky_shard(name) + SQLModel impl for per-decky cleanup after a single-decky teardown.
- SwarmHosts page: "Teardown all" button (keeps the host enrolled).
- SwarmDeckies page: per-row "Teardown" button.
Also exclude setuptools' build/ staging dir from the enrollment tarball: `pip install -e` on the master generates build/lib/decnet_web/node_modules, and the bundle walker was leaking it to agents. Align pyproject's bandit exclude with the git-hook invocation so both skip decnet/templates/.

The nested list-comp `[f"{id}-{svc}" for svc in [d.services for d ...]]` iterated over a list of lists, so `svc` was the whole services list and the f-string stringified it -> `decky3-['sip']`. docker compose saw "no such service" and the per-decky teardown failed with a 500. Flatten: find the matching decky once, then iterate its services. No-op early on an unknown decky_id and on empty service lists. A regression test asserts the emitted compose args contain no '[' or quote characters.

Teardowns were synchronous all the way through: the POST blocked on the worker's docker-compose-down cycle (seconds to minutes), the frontend locked tearingDown to a single string so only one button could be armed at a time, and operators couldn't queue a second teardown until the first returned. On a flaky worker that meant staring at a spinner for the whole RTT.

Backend: POST /swarm/hosts/{uuid}/teardown returns 202 the instant the request is validated.
Affected shards flip to state='tearing_down' synchronously before the response, so the UI reflects progress immediately; the actual AgentClient call + DB cleanup then run in an asyncio.create_task (tracked in a module-level set to survive GC and to be drainable by tests). On failure the shard flips to 'teardown_failed' with the error recorded; nothing is re-raised, since there is no caller to catch it.

Frontend: swap tearingDown / decommissioning from 'string | null' to 'Set<string>'. Each button tracks its own in-flight state; the poll loop picks up the final shard state from the backend. Multiple teardowns can now be queued without blocking each other.

New POST /swarm/heartbeat on the swarm controller. Workers post every ~30s with the output of executor.status(); the master bumps SwarmHost.last_heartbeat and re-upserts each DeckyShard with a fresh DeckyConfig snapshot and runtime-derived state (running/degraded).

Security: CA-signed mTLS alone is not sufficient, since a decommissioned worker's still-valid cert could resurrect ghost shards. The endpoint extracts the presented peer cert (primary: scope["extensions"]["tls"], fallback: transport.get_extra_info("ssl_object")) and SHA-256-pins it to the SwarmHost.client_cert_fingerprint stored for the claimed host_uuid. Extraction is factored into _extract_peer_fingerprint so tests can exercise both uvicorn scope shapes and the both-unavailable fail-closed path without mocking uvicorn's TLS pipeline. Adds get_swarm_host_by_fingerprint to the repo interface (the SQLModel impl reuses the indexed client_cert_fingerprint column).
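The pinning step itself reduces to hashing the presented DER certificate and comparing it to the stored value. A sketch with illustrative names; the real extraction walks uvicorn's scope/transport objects, which is elided here:

```python
import hashlib
from typing import Optional


def peer_fingerprint(der_cert: Optional[bytes]) -> Optional[str]:
    """SHA-256 fingerprint of a DER-encoded client certificate."""
    if not der_cert:
        return None  # fail closed: no cert, no identity
    return hashlib.sha256(der_cert).hexdigest()


def heartbeat_allowed(der_cert: Optional[bytes], pinned: str) -> bool:
    """Accept a heartbeat only when the presented cert matches the
    fingerprint pinned at enrollment for the claimed host_uuid."""
    fp = peer_fingerprint(der_cert)
    return fp is not None and fp == pinned
```

Returning None when no cert is available is what makes the both-unavailable path fail closed rather than open.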