DECNET

Author	SHA1	Message	Date
anti	1e8b73c361	feat(config): add /etc/decnet/decnet.ini loader. New decnet/config_ini.py parses a role-scoped INI file via stdlib configparser and seeds os.environ via setdefault — real env vars still win, keeping full back-compat with .env.local flows. [decnet] holds role-agnostic keys (mode, disallow-master, log-file-path); the role section matching `mode` is loaded, the other is ignored silently so a worker never reads master-only keys (and vice versa). Loader is standalone in this commit — not wired into cli.py yet.	2026-04-19 03:10:51 -04:00
anti	9b1299458d	fix(env): resolve DECNET_JWT_SECRET lazily so agent/updater subcommands don't need it. The module-level _require_env('DECNET_JWT_SECRET') call blocked `decnet agent` and `decnet updater` from starting on workers that legitimately have no business knowing the master's JWT signing key. Move the resolution into a module `__getattr__`: only consumers that actually read `decnet.env.DECNET_JWT_SECRET` trigger the validation, which in practice means only decnet.web.auth (master-side). Adds tests/test_env_lazy_jwt.py covering both the in-process lazy path and an out-of-process `decnet agent --help` subprocess check with a fully sanitized environment.	2026-04-19 02:43:25 -04:00
anti	a266d6b17e	feat(web): Remote Updates API — dashboard endpoints for pushing code to workers. Adds /api/v1/swarm-updates/{hosts,push,push-self,rollback} behind require_admin. Reuses the existing UpdaterClient + tar_working_tree + the per-host asyncio.gather pattern from api_deploy_swarm.py; tarball is built exactly once per /push request and fanned out to every selected worker. /hosts filters out decommissioned hosts and agent-only enrollments (no updater bundle = not a target). Connection drops during /update-self are treated as success — the updater re-execs itself mid-response, so httpx always raises. Pydantic models live in decnet/web/db/models.py (single source of truth). 24 tests cover happy paths, rollback, transport failures, include_self ordering (skip on rolled-back agents), validation, and RBAC gating.	2026-04-19 01:01:09 -04:00
anti	f5a5fec607	feat(deploy): systemd units w/ capability-based hardening; updater restarts agent via systemctl. Add deploy/ unit files for every DECNET daemon (agent, updater, api, web, swarmctl, listener, forwarder). All run as User=decnet with NoNewPrivileges, ProtectSystem, PrivateTmp, LockPersonality; AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW only on the agent (MACVLAN/scapy). Existing api/web units migrated to /opt/decnet layout and the same hardening stanza. Make the updater's _spawn_agent systemd-aware: under systemd (detected via INVOCATION_ID + systemctl on PATH), `systemctl restart decnet-agent.service` replaces the Popen path so the new agent inherits the unit's ambient caps instead of the updater's empty set. _stop_agent becomes a no-op in that mode to avoid racing systemctl's own stop phase. Tests cover the dispatcher branch selection, MainPID parsing, and the systemd no-op stop.	2026-04-19 00:44:06 -04:00
anti	40d3e86e55	fix(updater): bootstrap fresh venv with deps; rebuild self-update argv from env. - _run_pip: on first venv use, install decnet with its full dep tree so the bootstrapped environment actually has typer/fastapi/uvicorn. Subsequent updates keep --no-deps for a near-no-op refresh. - run_update_self: do not reuse sys.argv to re-exec the updater. Inside the live process, sys.argv is the uvicorn subprocess invocation (--ssl-keyfile etc.), which the 'decnet updater' CLI rejects. Reconstruct the operator-visible command from env vars set by updater.server.run.	2026-04-18 23:51:41 -04:00
anti	ebeaf08a49	fix(updater): fall back to /proc scan when agent.pid is missing. If the agent was started outside the updater (manually, during dev, or from a prior systemd unit), there is no agent.pid for _stop_agent to target, so a successful code install leaves the old in-memory agent process still serving requests. Scan /proc for any decnet agent command and SIGTERM all matches so restart is reliable regardless of how the agent was originally launched.	2026-04-18 23:42:26 -04:00
anti	7765b36c50	feat(updater): remote self-update daemon with auto-rollback. Adds a separate `decnet updater` daemon on each worker that owns the agent's release directory and installs tarball pushes from the master over mTLS. A normal `/update` never touches the updater itself, so the updater is always a known-good rescuer if a bad agent push breaks /health — the rotation is reversed and the agent restarted against the previous release. `POST /update-self` handles updater upgrades explicitly (no auto-rollback). - decnet/updater/: executor, FastAPI app, uvicorn launcher - decnet/swarm/updater_client.py, tar_tree.py: master-side push - cli: `decnet updater`, `decnet swarm update [--host|--all] [--include-self] [--dry-run]`, `--updater` on `swarm enroll` - enrollment API issues a second cert (CN=updater@<host>) signed by the same CA; SwarmHost records updater_cert_fingerprint - tests: executor, app, CLI, tar tree, enroll-with-updater (37 new) - wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link	2026-04-18 21:40:21 -04:00
anti	8914c27220	feat(swarm): add `decnet swarm deckies` to list deployed shards by host. `swarm list` only shows enrolled workers — there was no way to see which deckies are running and where. Adds GET /swarm/deckies on the controller (joins DeckyShard with SwarmHost for name/address/status) plus the CLI wrapper with --host / --state filters and --json.	2026-04-18 21:10:07 -04:00
anti	4db9c7464c	fix(swarm): relocalize master-built config on worker before deploy. `deploy --mode swarm` was failing on every heterogeneous fleet: the master populates config.interface from its own box (detect_interface() → its default NIC), then ships that verbatim. The worker's deployer then calls get_host_ip(config.interface), hits 'ip addr show wlp6s0' on a VM whose NIC is enp0s3, and 500s. Fix: agent.executor._relocalize() runs on every swarm-mode deploy. Re-detects the worker's interface/subnet/gateway/host_ip locally and swaps them into the config before calling deployer.deploy(). When the worker's subnet doesn't match the master's, decky IPs are re-allocated from the worker's subnet via allocate_ips() so they're reachable. Unihost-mode configs are left untouched — they're already built against the local box and second-guessing them would be wrong. Validated against anti@192.168.1.13: master dispatched interface=wlp6s0, agent logged 'relocalized interface=enp0s3', deployer ran successfully, dry-run returned ok=deployed. 4 new tests cover both branches (matching-subnet preserves decky IPs; mismatch re-allocates), the end-to-end executor.deploy() path, and the unihost short-circuit.	2026-04-18 20:41:21 -04:00
anti	411a797120	feat(cli): add decnet swarm check wrapper for POST /swarm/check. The swarmctl API already exposes POST /swarm/check — an active mTLS probe that refreshes SwarmHost.status + last_heartbeat for every enrolled worker. The CLI was missing a wrapper, so operators had to curl the endpoint directly (which is how the VM validation run did it, and how the wiki Deployment-Modes / SWARM-Mode pages ended up documenting a command that didn't exist yet). Matches the existing list/enroll/decommission pattern: typer subcommand under swarm_app, --url override, Rich table output plus --json for scripting. Three tests: populated table, empty-swarm path, and --json emission.	2026-04-18 20:28:34 -04:00
anti	3da5a2c4ee	feat(cli): add decnet listener + --agent-dir on agent. New `decnet listener` command runs the master-side RFC 5425 syslog-TLS receiver as a standalone process (mirrors the `decnet api` / `decnet swarmctl` pattern, SIGTERM/SIGINT handlers, --daemon support). `decnet agent` now accepts --agent-dir so operators running the worker agent under sudo/root can point at a bundle outside /root/.decnet/agent (the path HOME resolves to under sudo). Both flags were needed to stand up the full SWARM pipeline end-to-end on a throwaway VM: mTLS control plane reachable, syslog-over-TLS wire confirmed via tcpdump, master-crash/resume proven with zero loss and zero duplication across 10 forwarded lines. pyproject: bump asyncmy floor to 0.2.11 (resolver already pulled this in).	2026-04-18 20:15:25 -04:00
anti	1e8ca4cc05	feat(swarm-cli): add `decnet swarm {enroll,list,decommission}` + `deploy --mode swarm`. New sub-app talks HTTP to the local swarm controller (127.0.0.1:8770 by default; override with --url or $DECNET_SWARMCTL_URL). - enroll: POSTs /swarm/enroll, prints fingerprint, optionally writes ca.crt/worker.crt/worker.key to --out-dir for scp to the worker. - list: renders enrolled workers as a rich table (with --status filter). - decommission: looks up uuid by --name, confirms, DELETEs. deploy --mode swarm now: 1. fetches enrolled+active workers from the controller, 2. round-robin-assigns host_uuid to each decky, 3. POSTs the sharded DecnetConfig to /swarm/deploy, 4. renders per-worker pass/fail in a results table. Exits non-zero if no workers exist or any worker's dispatch failed.	2026-04-18 19:52:37 -04:00
anti	a6430cac4c	feat(swarm): add `decnet forwarder` CLI to run syslog-over-TLS forwarder. The forwarder module existed but had no runner — this closes that gap so the worker-side process can actually be launched and runs isolated from the agent (asyncio.run + SIGTERM/SIGINT → stop_event). Guards: refuses to start without a worker cert bundle or a resolvable master host ($DECNET_SWARM_MASTER_HOST or --master-host).	2026-04-18 19:41:37 -04:00
anti	39d2077a3a	feat(swarm): syslog-over-TLS log pipeline (RFC 5425, TCP 6514). Worker-side log_forwarder tails the local RFC 5424 log file and ships each line as an octet-counted frame to the master over mTLS. Offset is persisted in a tiny local SQLite database so master outages never cause loss or duplication — reconnect resumes from the exact byte where the previous session left off. Impostor workers (cert not signed by DECNET CA) are rejected at TLS handshake. Master-side log_listener terminates mTLS on 0.0.0.0:6514, validates the client cert, extracts the peer CN as authoritative worker provenance, and appends each frame to the master's ingest log files. The attacker-controlled syslog HOSTNAME field is ignored — the CA-controlled CN is the only source of provenance. 7 tests added covering framing codec, offset persistence across reopens, end-to-end mTLS delivery, crash-resilience (offset survives restart, no duplicate shipping), and impostor-CA rejection. DECNET_SWARM_SYSLOG_PORT / DECNET_SWARM_MASTER_HOST env bindings added.	2026-04-18 19:33:58 -04:00
anti	e2d6f857b5	refactor(swarm): move router DTOs into decnet/web/db/models.py. _schemas.py was a local exception to the codebase convention. The rest of the app keeps all API request/response DTOs in decnet/web/db/models.py alongside UserResponse, DeployIniRequest, etc. — the swarm endpoints now follow the same convention (SwarmEnrollRequest, SwarmHostView, etc). Deletes decnet/web/router/swarm/_schemas.py.	2026-04-18 19:28:15 -04:00
anti	811136e600	refactor(swarm): one file per endpoint, matching existing router layout. Splits the three grouped router files into eight api_<verb>_<resource>.py modules under decnet/web/router/swarm/ to match the convention used by router/fleet/ and router/config/. Shared request/response models live in _schemas.py. Keeps each endpoint easy to locate and modify without stepping on siblings.	2026-04-18 19:23:06 -04:00
anti	63b0a58527	feat(swarm): master-side SWARM controller (swarmctl) + agent CLI. Adds decnet/web/swarm_api.py as an independent FastAPI app with routers for host enrollment, deployment dispatch (sharding DecnetConfig across enrolled workers via AgentClient), and active health probing. Runs as its own uvicorn subprocess via 'decnet swarmctl', mirroring the isolation pattern used by 'decnet api'. Also wires up the 'decnet agent' CLI entry for the worker side. 29 tests added under tests/swarm/test_swarm_api.py cover enrollment (including bundle generation + duplicate rejection), host CRUD, sharding correctness, non-swarm-mode rejection, teardown, and health probes with a stubbed AgentClient.	2026-04-18 19:18:33 -04:00
anti	cd0057c129	feat(swarm): DeckyConfig.host_uuid + fix agent log/status field refs. - decnet.models.DeckyConfig grows an optional 'host_uuid' (the SwarmHost that runs this decky). Defaults to None so legacy unihost state files deserialize unchanged. - decnet.agent.executor: replace non-existent config.name references with config.mode / config.interface in logs and status payload. - tests/swarm/test_state_schema.py covers legacy-dict roundtrip, field default, and swarm-mode assignments.	2026-04-18 19:10:25 -04:00
anti	0c77cdab32	feat(swarm): master AgentClient — mTLS httpx wrapper around worker API. decnet.swarm.client exposes: - MasterIdentity / ensure_master_identity(): the master's own CA-signed client bundle, issued once into ~/.decnet/ca/master/. - AgentClient: async-context httpx wrapper that talks to a worker agent over mTLS. health/status/deploy/teardown methods mirror the agent API. SSL context is built from a bare ssl.SSLContext(PROTOCOL_TLS_CLIENT) instead of httpx.create_ssl_context — the latter layers on default-CA and purpose logic that broke private-CA mTLS. Server cert is pinned by CA + chain, not DNS (workers enroll with arbitrary SANs). tests/swarm/test_client_agent_roundtrip.py spins uvicorn in-process with real certs on disk and verifies: - A CA-signed master client passes health + status calls. - An impostor whose cert comes from a different CA cannot connect.	2026-04-18 19:08:36 -04:00
anti	8257bcc031	feat(swarm): worker agent + fix pre-existing base_repo coverage test. Worker agent (decnet.agent): - mTLS FastAPI service exposing /deploy, /teardown, /status, /health, /mutate. uvicorn enforces CERT_REQUIRED with the DECNET CA pinned. - executor.py offloads the blocking deployer onto asyncio.to_thread so the event loop stays responsive. - server.py refuses to start without an enrolled bundle in ~/.decnet/agent/ — unauthenticated agents are not a supported mode. - docs/openapi disabled on the agent — narrow attack surface. tests/test_base_repo.py: DummyRepo was missing get_attacker_artifacts (pre-existing abstractmethod) and so could not be instantiated. Added the stub + coverage for the new swarm CRUD surface on BaseRepository.	2026-04-18 07:15:53 -04:00
anti	d3b90679c5	feat(swarm): PKI module — self-managed CA for master/worker mTLS. decnet.swarm.pki provides: - generate_ca() / ensure_ca() — self-signed root, PKCS8 PEM, 4096-bit. - issue_worker_cert() — per-worker keypair + cert signed by the CA with serverAuth + clientAuth EKU so the same identity backs the agent's HTTPS endpoint AND the syslog-over-TLS upstream. - write_worker_bundle() / load_worker_bundle() — persist with 0600 on private keys. - fingerprint() — SHA-256 DER hex for master-side pinning. tests/swarm/test_pki.py covers: - CA idempotency on disk. - Signed chain validates against CA subject. - SAN population (DNS + IP). - Bundle roundtrip with 0600 key perms. - End-to-end mTLS handshake between two CA-issued peers. - Cross-CA client rejection (handshake fails).	2026-04-18 07:09:58 -04:00
anti	6657d3e097	feat(swarm): add SwarmHost and DeckyShard tables + repo CRUD. Introduces the master-side persistence layer for swarm mode: - SwarmHost: enrolled worker metadata, cert fingerprint, heartbeat. - DeckyShard: per-decky host assignment, state, last error. Repo methods are added as default-raising on BaseRepository so unihost deployments are untouched; SQLModelRepository implements them (shared between the sqlite and mysql subclasses per the existing pattern).	2026-04-18 07:09:29 -04:00
anti	293da364a6	chores: fix linting	2026-04-18 06:46:10 -04:00
anti	2bf886e18e	feat(sniffer): probe ipvlan host iface when macvlan is absent. The host-side sniffer interface depends on the deploy's driver choice (--ipvlan flag). Instead of hardcoding HOST_MACVLAN_IFACE, probe both names and pick whichever exists; warn and disable cleanly if neither is present. Explicit DECNET_SNIFFER_IFACE still wins.	2026-04-18 05:37:20 -04:00
anti	8bdc5b98c9	feat(collector): parse real PROCID and extract IPs from logger kv pairs. - Relaxed RFC 5424 regex to accept either NILVALUE or a numeric PROCID; sshd / sudo go through rsyslog with their real PID, while syslog_bridge emitters keep using '-'. - Added a fallback pass that scans the MSG body for IP-shaped key=value tokens. This rescues attacker attribution for plain logger callers like the SSH PROMPT_COMMAND shim, which emits 'CMD … src=IP …' without SD-element params.	2026-04-18 05:37:08 -04:00
anti	41fd496128	feat(web): attacker artifacts endpoint + UI drawer. Adds the server-side wiring and frontend UI to surface files captured by the SSH honeypot for a given attacker. - New repository method get_attacker_artifacts (abstract + SQLModel impl) that joins the attacker's IP to `file_captured` log rows. - New route GET /attackers/{uuid}/artifacts. - New router /artifacts/{decky}/{service}/{stored_as} that streams a quarantined file back to an authenticated viewer. - AttackerDetail grows an ArtifactDrawer panel with per-file metadata (sha256, size, orig_path) and a download action. - ssh service fragment now sets NODE_NAME=decky_name so logs and the host-side artifacts bind-mount share the same decky identifier.	2026-04-18 05:36:48 -04:00
anti	8dd4c78b33	refactor: strip DECNET tokens from container-visible surface. Rename the container-side logging module decnet_logging → syslog_bridge (canonical at templates/syslog_bridge.py, synced into each template by the deployer). Drop the stale per-template copies; setuptools find was picking them up anyway. Swap useradd/USER/chown "decnet" for "logrelay" so no obvious token appears in the rendered container image. Apply the same cloaking pattern to the telnet template that SSH got: syslog pipe moves to /run/systemd/journal/syslog-relay and the relay is cat'd via exec -a "systemd-journal-fwd". rsyslog.d conf rename 99-decnet.conf → 50-journal-forward.conf. SSH capture script: /var/decnet/captured → /var/lib/systemd/coredump (real systemd path), logger tag decnet-capture → systemd-journal. Compose volume updated to match the new in-container quarantine path. SD element ID shifts decnet@55555 → relay@55555; synced across collector, parser, sniffer, prober, formatter, tests, and docs so the host-side pipeline still matches what containers emit.	2026-04-17 22:57:53 -04:00
anti	a773dddd5c	feat(ssh): capture attacker-dropped files with session attribution. inotifywait watches writable paths in the SSH decky and mirrors any file close_write/moved_to into a per-decky host-mounted quarantine dir. Each artifact carries a .meta.json with attacker attribution resolved by walking the writer PID's PPid chain to the sshd session leader, then cross-referencing ss and utmp for source IP/user/login time. Also emits an RFC 5424 syslog line per capture for SIEM correlation.	2026-04-17 22:20:05 -04:00
anti	fb69a06ab3	fix(db): detach session cleanup onto fresh task on cancellation. Previous attempt (shield + sync invalidate fallback) didn't work because shield only protects against cancellation from other tasks. When the caller task itself is cancelled mid-query, its next await re-raises CancelledError as soon as the shielded coroutine yields — rollback inside session.close() never completes, the aiomysql connection is orphaned, and the pool logs 'non-checked-in connection' when GC finally reaches it. Hand exception-path cleanup to loop.create_task() so the new task isn't subject to the caller's pending cancellation. close() (and the invalidate() fallback for a dead connection) runs to completion. Success path is unchanged — still awaits close() inline so callers see commit visibility and pool release before proceeding.	2026-04-17 21:13:43 -04:00
anti	1446f6da94	fix(db): invalidate pool connection when cancelled close fails. Under high-concurrency MySQL load, uvicorn cancels request tasks when clients disconnect. If cancellation lands mid-query, session.close() tries to ROLLBACK on a connection that aiomysql has already marked as closed — raising InterfaceError("Cancelled during execution") and leaving the connection checked-out until GC, which the pool then warns about as a 'non-checked-in connection'. The old fallback tried sync.rollback() + sync.close(), but those still go through the async driver and fail the same way on a dead connection. Replace them with session.sync_session.invalidate(), which just flips the pool's internal record — no I/O, so it can't be cancelled — and tells the pool to drop the connection immediately instead of waiting for garbage collection.	2026-04-17 21:04:04 -04:00
anti	e967aaabfb	perf: cache get_user_by_username on the login hot path. Locust @task(2) hammers /auth/login in steady state on top of the on_start burst. After caching the uuid-keyed user lookup and every other read endpoint, login alone accounted for 47% of total _execute at 500c/u — pure DB queueing on SELECT users WHERE username=?. 5s TTL, positive hits only (misses bypass so a freshly-created user can log in immediately). Password verify still runs against the cached hash, so security is unchanged — the only staleness window is that a changed password may keep accepting the old one for up to 5s until invalidate_user_cache fires (it's called on every write).	2026-04-17 20:36:39 -04:00
anti	255c2e5eb7	perf: cache auth user-lookup and admin list_users. The per-request SELECT users WHERE uuid=? in require_role was the hidden tax behind every authed endpoint — it kept _execute at ~60% across the profile even after the page caches landed. Even /health (with its DB and Docker probes cached) was still 52% _execute from this one query. - dependencies.py: 10s TTL cache on get_user_by_uuid, well below JWT expiry. invalidate_user_cache(uuid) is called on password change, role change, and user delete. - api_get_config.py: 5s TTL cache on the admin branch's list_users() (previously fetched on every /config call). Invalidated on user create/update/delete. - api_change_pass.py + api_manage_users.py: invalidation hooks on all user-mutating endpoints.	2026-04-17 19:56:39 -04:00
anti	2dd86fb3bb	perf: cache /bounty, /logs/histogram, /deckies; bump /config TTL to 5s. Round-2 follow-up: profile at 500c/u showed _execute still dominating the uncached read endpoints (/bounty 76%, /logs/histogram 73%, /deckies 56%). Same router-level TTL pattern as /stats — 5s window, asyncio.Lock to collapse concurrent calls into one DB hit. - /bounty: cache default unfiltered page (limit=50, offset=0, bounty_type=None, search=None). Filtered requests bypass. - /logs/histogram: cache default (interval_minutes=15, no filters). Filtered / non-default interval requests bypass. - /deckies: cache full response (endpoint takes no params). - /config: bump _STATE_TTL from 1.0 to 5.0 — admin writes are rare, 1s was too short for bursts to coalesce at high concurrency.	2026-04-17 19:30:11 -04:00
anti	3106d03135	perf(db): default pool_pre_ping=false for SQLite. SQLite is a local file — a SELECT 1 per session checkout is pure overhead. Env var DECNET_DB_POOL_PRE_PING stays for anyone running on a network-mounted volume. MySQL backend keeps its current default.	2026-04-17 19:11:07 -04:00
anti	3cc5ba36e8	fix(cli): keep FileNotFoundError handling on decnet api. Popen moved inside the try so a missing uvicorn falls through to the existing error message instead of crashing the CLI. test_cli was still patching the old subprocess.run entrypoint; switched both api command tests to patch subprocess.Popen / os.killpg to match the current path.	2026-04-17 19:09:15 -04:00
anti	6301504c0e	perf(api): TTL-cache /stats + unfiltered pagination counts. Every /stats call ran SELECT count(*) FROM logs + SELECT count(DISTINCT attacker_ip) FROM logs; every /logs and /attackers call ran an unfiltered count for the paginator. At 500 concurrent users these serialize through aiosqlite's worker threads and dominate wall time. Cache at the router layer (repo stays dialect-agnostic): - /stats response: 5s TTL - /logs total (only when no filters): 2s TTL - /attackers total (only when no filters): 2s TTL. Filtered paths bypass the cache. Pattern reused from api_get_config and api_get_health (asyncio.Lock + time.monotonic window + lazy lock).	2026-04-17 19:09:15 -04:00
anti	de4b64d857	perf(auth): avoid duplicate user lookup in require_role. require_role._check previously chained from get_current_user, which already loaded the user — then looked it up again. Inline the decode + single user fetch + must_change_password + role check so every authenticated request costs one SELECT users WHERE uuid=? instead of two.	2026-04-17 17:48:42 -04:00
anti	b5d7bf818f	feat(health): 3-tier status (healthy / degraded / unhealthy). Only database, docker, and ingestion_worker now count as critical (→ 503 unhealthy). attacker/sniffer/collector failures drop overall status to degraded (still 200) so the dashboard doesn't panic when a non-essential worker isn't running.	2026-04-17 17:48:42 -04:00
anti	a10aee282f	perf(ingester): batch log writes into bulk commits. The ingester now accumulates up to DECNET_BATCH_SIZE rows (default 100), waiting at most DECNET_BATCH_MAX_WAIT_MS (default 250ms), before flushing through repo.add_logs — one transaction, one COMMIT per batch instead of per row. Under attacker traffic this collapses N commits into ⌈N/100⌉ and takes most of the SQLite writer-lock contention off the hot path. Flush semantics are cancel-safe: _position only advances after a batch commits successfully, and the flush helper bails without touching the DB if the enclosing task is being cancelled (lifespan teardown). Un-flushed lines stay in the file and are re-read on next startup. Tests updated to assert on add_logs (bulk) instead of the per-row add_log that the ingester no longer uses, plus a new test that 250 lines flush in ≤5 calls.	2026-04-17 16:37:34 -04:00
anti	11b9e85874	feat(db): bulk add_logs for one-commit ingestion batches. Adds BaseRepository.add_logs (default: loops add_log for backwards compatibility) and a real single-session/single-commit implementation on SQLModelRepository. Introduces DECNET_BATCH_SIZE (default 100) and DECNET_BATCH_MAX_WAIT_MS (default 250) so the ingester can flush on either a size or a time bound when it adopts the new method. Ingester wiring is deferred to a later pass — the single-log path was deadlocking tests when flushed during lifespan teardown, so this change ships the DB primitive alone.	2026-04-17 16:23:09 -04:00
anti	45039bd621	fix(cache): lazy-init TTL cache locks to survive event-loop turnover. A module-level asyncio.Lock binds to the loop it was first awaited on. Under pytest-anyio (and xdist) each test spins up a new loop; any later test that hit /health or /config would wait on a lock owned by a dead loop and the whole worker would hang. Create the lock on first use and drop it in the test-reset helpers so a fresh loop always gets a fresh lock.	2026-04-17 16:23:00 -04:00
anti	4ea1c2ff4f	fix(health): move Docker client+ping off the event loop. Under CPU saturation the sync docker.from_env()/ping() calls could miss their socket timeout, cache _docker_healthy=False, and return 503 for the full 5s TTL window. Both calls now run on a thread so the event loop keeps serving other requests while Docker is being probed.	2026-04-17 15:43:51 -04:00
anti	bb8d782e42	fix(cli): kill uvicorn worker tree on Ctrl+C. With --workers > 1, SIGINT from the terminal raced uvicorn's supervisor: some workers got signaled directly, the supervisor respawned them, and the result behaved like a forkbomb. Start uvicorn in its own session and signal the whole process group (SIGTERM → 10s grace → SIGKILL) when we catch KeyboardInterrupt.	2026-04-17 15:32:08 -04:00
anti	342916ca63	feat(cli): expose --workers on `decnet api`. Forwards straight to uvicorn's --workers. Default stays at 1 so the single-worker efficiency direction is preserved; raising it is available for threat-actor load scenarios where the honeypot needs to soak real attack traffic without queueing on one event loop.	2026-04-17 15:22:45 -04:00
anti	32340bea0d	perf: migrate hot-path JSON serialization to orjson. stdlib json was FastAPI's default. Every response body, every SSE frame, and every add_log/state/payload write paid the stdlib encode cost. - pyproject.toml: add orjson>=3.10 as a core dep. - decnet/web/api.py: default_response_class=ORJSONResponse on the FastAPI app, so every endpoint return goes through orjson without touching call sites. Explicit JSONResponse sites in the validation exception handlers migrated to ORJSONResponse for consistency. - health endpoint's explicit JSONResponse → ORJSONResponse. - SSE stream (api_stream_events.py): 6 json.dumps call sites → orjson.dumps(...).decode() — the per-event frames that fire on every SSE tick. - sqlmodel_repo.py: encode sites on the log-insert path switched to orjson (fields, payload, state value). Parser sites (json.loads) left as-is for now — not on the measured hot path.	2026-04-17 15:07:28 -04:00
anti	f1e14280c0	perf: 1s TTL cache for /health DB probe and /config state reads. Locust hit /health and /config on every @task(3), so each request was firing repo.get_total_logs() and two repo.get_state() calls against aiosqlite — filling the driver queue for data that changes on the order of seconds, not milliseconds. Both caches follow the shape already used by the existing Docker cache: - asyncio.Lock with double-checked TTL so concurrent callers collapse into one DB hit per 1s window. - _reset_* helpers called from tests/api/conftest.py::setup_db so the module-level cache can't leak across tests. tests/test_health_config_cache.py asserts 50 concurrent callers produce exactly 1 repo call, and the cache expires after TTL.	2026-04-17 15:05:18 -04:00
anti	931f33fb06	perf: cache Docker daemon ping in /health (5s TTL). Creating a new docker.from_env() client per /health request opened a fresh unix-socket connection each time. Under load that's wasteful and hammers dockerd. Keep a module-level client + last-check timestamp; actually ping every 5 seconds, return cached state in between. Reset helper provided for tests.	2026-04-17 15:01:53 -04:00
anti	467511e997	db: switch MySQL driver to asyncmy, env-tune pool, serialize DDL. - aiomysql → asyncmy on both sides of the URL/import (faster, maintained). - Pool sizing now reads DECNET_DB_POOL_SIZE / MAX_OVERFLOW / RECYCLE / PRE_PING for both SQLite and MySQL engines so stress runs can bump them without code edits. - MySQL initialize() now wraps schema DDL in a GET_LOCK advisory lock so concurrent uvicorn workers racing create_all() don't hit 'Table was skipped since its definition is being modified by concurrent DDL'. - sqlite & mysql repo get_log_histogram use the shared _session() helper instead of session_factory() for consistency with the rest of the repo. - SSE stream_events docstring updated to asyncmy.	2026-04-17 15:01:49 -04:00
anti	3945e72e11	perf: run bcrypt on a thread so it doesn't block the event loop. verify_password / get_password_hash are CPU-bound and take ~250ms each at rounds=12. Called directly from async endpoints, they stall every other coroutine for that window — the single biggest single-worker bottleneck on the login path. Adds averify_password / ahash_password that wrap the sync versions in asyncio.to_thread. Sync versions stay put because _ensure_admin_user and tests still use them. 5 call sites updated: login, change-password, create-user, reset-password. tests/test_auth_async.py asserts parallel averify runs concurrently (~1x of a single verify, not 2x).	2026-04-17 14:52:22 -04:00
anti	bd406090a7	fix: re-seed admin password when still unfinalized (must_change_password=True). _ensure_admin_user was strict insert-if-missing: once a stale hash landed in decnet.db (e.g. from a deploy that used a different DECNET_ADMIN_PASSWORD), login silently 401'd because changing the env var later had no effect. Now on startup: if the admin still has must_change_password=True (they never finalized their own password), re-sync the hash from the current env var. Once the admin sets a real password, we leave it alone. Found via locustfile.py login storm — see tests/test_admin_seed.py. Note: this commit also bundles uncommitted pool-management work already present in sqlmodel_repo.py from prior sessions.	2026-04-17 14:49:13 -04:00


196 Commits
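Commit 1e8b73c361 describes an INI loader that seeds `os.environ` with `setdefault` and reads only the role section named by `mode`. Below is a minimal sketch of that precedence rule using stdlib `configparser`. It is not the contents of `decnet/config_ini.py`. The `DECNET_<KEY>` name mapping, the `seed_env_from_ini` function and the `agent`/`master` section names used in the test are assumptions.

```python
import configparser
import os
from collections.abc import MutableMapping


def _env_name(key: str) -> str:
    # Assumed mapping (hypothetical): "log-file-path" -> "DECNET_LOG_FILE_PATH".
    return "DECNET_" + key.upper().replace("-", "_")


def seed_env_from_ini(text: str, environ: MutableMapping[str, str] = os.environ) -> None:
    """Seed environ from an INI document without overriding existing values."""
    # interpolation=None so values containing "%" (paths, URLs) are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("decnet"):
        return

    # Role-agnostic keys first; setdefault leaves any real env var intact.
    for key, value in parser["decnet"].items():
        environ.setdefault(_env_name(key), value)

    # Load only the section named by the effective mode; any other role
    # section in the file is ignored silently.
    mode = environ.get("DECNET_MODE", "")
    if mode and mode != "decnet" and parser.has_section(mode):
        for key, value in parser[mode].items():
            environ.setdefault(_env_name(key), value)
```

Every value goes through `setdefault`, so a variable already exported by the shell or by a `.env.local` flow takes precedence over the INI file.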
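Commit f5a5fec607 detects systemd through `INVOCATION_ID` plus `systemctl` on `PATH`, and parses `MainPID`. Below is a sketch of both checks. It assumes the `MainPID=<n>` output of `systemctl show -p MainPID <unit>`, where 0 means the unit has no running main process. The function names are hypothetical and do not reproduce the updater's `_spawn_agent`.

```python
import os
import shutil
import subprocess
from collections.abc import Mapping
from typing import Optional

AGENT_UNIT = "decnet-agent.service"


def under_systemd(environ: Mapping[str, str] = os.environ) -> bool:
    # systemd sets INVOCATION_ID in the environment of the service processes it starts.
    return bool(environ.get("INVOCATION_ID")) and shutil.which("systemctl") is not None


def parse_main_pid(show_output: str) -> Optional[int]:
    """Parse `systemctl show -p MainPID <unit>` output such as "MainPID=1234"."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "MainPID" and value.strip().isdigit():
            pid = int(value.strip())
            return pid or None  # 0 means no running main process
    return None


def restart_agent_via_systemd() -> Optional[int]:
    # The restarted agent inherits the unit's AmbientCapabilities instead of
    # the (empty) capability set held by the calling updater process.
    subprocess.run(["systemctl", "restart", AGENT_UNIT], check=True)
    shown = subprocess.run(
        ["systemctl", "show", "-p", "MainPID", AGENT_UNIT],
        check=True, capture_output=True, text=True,
    )
    return parse_main_pid(shown.stdout)
```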
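Commit ebeaf08a49 falls back to scanning `/proc` when `agent.pid` is missing. One way to do this is to read each `/proc/<pid>/cmdline`, which holds the NUL-separated argv, and match the agent subcommand. The matching heuristic and the function names below are assumptions. `proc_root` is a parameter only so the scan can be tested against a fake directory.

```python
import os
import signal
from pathlib import Path


def _is_agent_argv(argv: list[str]) -> bool:
    # Assumed heuristic: an argv element named "decnet" (console script or
    # `python -m decnet`) immediately followed by the "agent" subcommand.
    return any(
        os.path.basename(arg) == "decnet" and nxt == "agent"
        for arg, nxt in zip(argv, argv[1:])
    )


def find_agent_pids(proc_root: str = "/proc") -> list[int]:
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/sys, ...
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited mid-scan or is not readable
        argv = [part.decode(errors="replace") for part in raw.split(b"\0") if part]
        if _is_agent_argv(argv):
            pids.append(int(entry.name))
    return sorted(pids)


def sigterm_all_agents(proc_root: str = "/proc") -> list[int]:
    signalled = []
    for pid in find_agent_pids(proc_root):
        if pid == os.getpid():
            continue  # never signal ourselves
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            continue  # already gone
        signalled.append(pid)
    return signalled
```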
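Commit 39d2077a3a ships each log line as an RFC 5425 octet-counted frame (`MSG-LEN SP SYSLOG-MSG`). Below is a sketch of the framing codec. One TLS read can carry several frames or only part of one, so the decoder buffers until a full frame is available. The 64 KiB cap is an assumed guard against a peer announcing an absurd length, not a value taken from DECNET.

```python
def encode_frame(msg: bytes) -> bytes:
    """RFC 5425 octet counting: SYSLOG-FRAME = MSG-LEN SP SYSLOG-MSG."""
    if not msg:
        raise ValueError("empty syslog message")
    return str(len(msg)).encode("ascii") + b" " + msg


class FrameDecoder:
    """Incremental decoder: feed raw TLS reads, get back complete messages."""

    def __init__(self, max_len: int = 64 * 1024) -> None:
        self._buf = bytearray()
        self._max_len = max_len  # assumed upper bound on a single frame
        self._max_digits = len(str(max_len))

    def feed(self, data: bytes) -> list[bytes]:
        self._buf.extend(data)
        frames: list[bytes] = []
        while True:
            sp = self._buf.find(b" ")
            if sp == -1:
                if len(self._buf) > self._max_digits:
                    raise ValueError("MSG-LEN too long or missing")
                break  # length prefix not complete yet
            prefix = bytes(self._buf[:sp])
            # MSG-LEN = NONZERO-DIGIT *DIGIT
            if not prefix.isdigit() or prefix.startswith(b"0"):
                raise ValueError(f"bad MSG-LEN {prefix!r}")
            length = int(prefix)
            if length > self._max_len:
                raise ValueError(f"frame of {length} bytes exceeds limit")
            end = sp + 1 + length
            if len(self._buf) < end:
                break  # wait for the rest of the message
            frames.append(bytes(self._buf[sp + 1:end]))
            del self._buf[:end]
        return frames
```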
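Commit 39d2077a3a also persists the forwarder's read offset in a small local SQLite database, so a restart resumes at the exact byte where the previous session stopped. Below is a sketch of that bookkeeping. It assumes a hypothetical `offsets` table and a caller-supplied `ship` callable that raises when delivery fails. Log rotation and truncation are not handled.

```python
import sqlite3
from collections.abc import Callable


class OffsetStore:
    """Per-file byte offset persisted in a small local SQLite database."""

    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS offsets "
            "(path TEXT PRIMARY KEY, byte_offset INTEGER NOT NULL)"
        )
        self._conn.commit()

    def get(self, path: str) -> int:
        row = self._conn.execute(
            "SELECT byte_offset FROM offsets WHERE path = ?", (path,)
        ).fetchone()
        return row[0] if row else 0

    def set(self, path: str, offset: int) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO offsets (path, byte_offset) VALUES (?, ?)",
                (path, offset),
            )

    def close(self) -> None:
        self._conn.close()


def forward_new_lines(store: OffsetStore, log_path: str, ship: Callable[[bytes], None]) -> int:
    """Ship every complete line past the stored offset; return how many were sent."""
    offset = store.get(log_path)
    sent = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # EOF, or a partial line the writer has not finished
            ship(line[:-1])              # raises on a broken connection
            offset += len(line)
            store.set(log_path, offset)  # advance only after a successful ship
            sent += 1
    return sent
```

The offset is written after each successful `ship`. A crash between the two calls can therefore at worst resend the one line that was in flight, and it never skips a line.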
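Commit d3b90679c5 pins workers by a SHA-256 fingerprint of the certificate's DER encoding. The sketch below computes that value with stdlib `ssl` and `hashlib`, not with whatever certificate library `decnet.swarm.pki` uses. Note that `ssl.PEM_cert_to_DER_cert` only base64-decodes the PEM body and does not validate the certificate. The comparison uses `hmac.compare_digest` as a precaution. The function names are hypothetical.

```python
import hashlib
import hmac
import ssl


def fingerprint_pem(pem: str) -> str:
    """SHA-256 over the certificate's DER bytes, lowercase hex."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()


def fingerprint_matches(der_cert: bytes, pinned_hex: str) -> bool:
    # der_cert would come from SSLSocket.getpeercert(binary_form=True).
    actual = hashlib.sha256(der_cert).hexdigest()
    return hmac.compare_digest(actual, pinned_hex.lower())
```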
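Commit 8bdc5b98c9 relaxes the RFC 5424 PROCID match to accept NILVALUE (`-`) or digits. It also adds a fallback that pulls IP-shaped `key=value` tokens out of the MSG body. Below is a simplified sketch of both. The regexes are assumptions: the structured-data pattern, for example, does not handle an escaped `]` inside a PARAM-VALUE, and only IPv4 is extracted.

```python
import ipaddress
import re
from typing import Optional

# Simplified RFC 5424 header: PROCID is "-" or digits. The SD pattern is
# illustrative only and ignores escaped "]" inside PARAM-VALUEs.
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>-|\d+) (?P<msgid>\S+) (?P<sd>-|(?:\[[^\]]*\])+)(?: (?P<msg>.*))?$"
)

# key=IPv4 tokens inside a free-form MSG body, e.g. "src=203.0.113.7".
_KV_IPV4 = re.compile(r"(\w+)=(\d{1,3}(?:\.\d{1,3}){3})\b")


def parse_line(line: str) -> Optional[dict[str, str]]:
    m = RFC5424.match(line)
    return m.groupdict(default="") if m else None


def extract_kv_ips(msg: str) -> dict[str, str]:
    found: dict[str, str] = {}
    for key, value in _KV_IPV4.findall(msg):
        try:
            ipaddress.IPv4Address(value)  # rejects shapes like 999.1.1.1
        except ValueError:
            continue
        found.setdefault(key, value)  # keep the first occurrence per key
    return found
```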
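Commit fb69a06ab3 hands exception-path session cleanup to a fresh task created with `loop.create_task()`, so the caller's pending cancellation cannot interrupt it. Below is a sketch of that control flow, with a minimal session protocol standing in for the SQLAlchemy session. `run_in_session` and `ClosableSession` are hypothetical names, and the `invalidate()` fallback from commit 1446f6da94 is omitted.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Protocol, TypeVar

T = TypeVar("T")

# Strong references to in-flight cleanup tasks: the event loop only keeps
# weak references, so an otherwise unreferenced task could vanish early.
_cleanup_tasks: set[asyncio.Task] = set()


class ClosableSession(Protocol):
    async def close(self) -> None: ...


async def run_in_session(
    session: ClosableSession, work: Callable[[ClosableSession], Awaitable[T]]
) -> T:
    try:
        result = await work(session)
    except BaseException:  # includes asyncio.CancelledError
        # Per the commit, the cancelled caller's next await can re-raise
        # CancelledError; a fresh task is not subject to that cancellation.
        task = asyncio.get_running_loop().create_task(session.close())
        _cleanup_tasks.add(task)
        task.add_done_callback(_cleanup_tasks.discard)
        raise
    # Success path: close inline so the caller sees the release before continuing.
    await session.close()
    return result
```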
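Commits f1e14280c0, 6301504c0e and 2dd86fb3bb share one TTL-cache shape: `time.monotonic()` expiry plus an `asyncio.Lock` with a second freshness check under the lock, so concurrent misses collapse into one DB hit. Commit 45039bd621 makes the lock lazy, so a new event loop never waits on a lock from a dead one. Below is a class-based sketch of that pattern. `TTLCache` is a hypothetical name; the source keeps module-level state with `_reset_*` helpers instead.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Single-value TTL cache that collapses concurrent misses into one fetch."""

    def __init__(self, ttl: float, fetch: Callable[[], Awaitable[T]]) -> None:
        self._ttl = ttl
        self._fetch = fetch
        self._value: Optional[T] = None  # None doubles as "nothing cached"
        self._expires_at = 0.0
        self._lock: Optional[asyncio.Lock] = None  # created lazily on first miss

    def _fresh(self) -> bool:
        return self._value is not None and time.monotonic() < self._expires_at

    async def get(self) -> T:
        if self._fresh():
            return self._value  # fast path, no lock
        if self._lock is None:
            # Created inside the running loop, never at import time.
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._fresh():  # another caller refreshed while we waited
                return self._value
            self._value = await self._fetch()
            self._expires_at = time.monotonic() + self._ttl
            return self._value

    def reset(self) -> None:
        """Test helper: drop value and lock so a new event loop gets a new lock."""
        self._value = None
        self._expires_at = 0.0
        self._lock = None
```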
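Commits 11b9e85874 and a10aee282f flush on whichever bound is hit first: `DECNET_BATCH_SIZE` rows (default 100) or `DECNET_BATCH_MAX_WAIT_MS` (default 250). Below is a sketch of that size-or-time batcher. It assumes rows arrive on an `asyncio.Queue` with `None` as a stop sentinel, whereas the real ingester tails a file and tracks `_position`. It also assumes `flush` performs one bulk commit, for example via `repo.add_logs`.

```python
import asyncio
import os
from collections.abc import Awaitable, Callable
from typing import Any

BATCH_SIZE = int(os.environ.get("DECNET_BATCH_SIZE", "100"))
BATCH_MAX_WAIT_MS = int(os.environ.get("DECNET_BATCH_MAX_WAIT_MS", "250"))


async def batch_consumer(
    queue: "asyncio.Queue[Any]",
    flush: Callable[[list[Any]], Awaitable[None]],
    batch_size: int = BATCH_SIZE,
    max_wait_ms: int = BATCH_MAX_WAIT_MS,
) -> None:
    """Flush when the batch is full or max_wait_ms after its first row; None stops."""
    loop = asyncio.get_running_loop()
    while True:
        first = await queue.get()
        if first is None:
            return
        batch = [first]
        deadline = loop.time() + max_wait_ms / 1000
        stop = False
        while len(batch) < batch_size:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break  # time bound reached
            try:
                item = await asyncio.wait_for(queue.get(), remaining)
            except asyncio.TimeoutError:
                break  # time bound reached while waiting
            if item is None:
                stop = True
                break
            batch.append(item)
        await flush(batch)  # one transaction / one COMMIT per batch
        if stop:
            return
```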
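Commit bb8d782e42 starts uvicorn in its own session and signals the whole process group on `KeyboardInterrupt`: SIGTERM first, then SIGKILL after a 10 s grace period. Below is a POSIX-only sketch of that escalation. `terminate_group` and `run_supervised` are hypothetical names.

```python
import os
import signal
import subprocess


def terminate_group(proc: subprocess.Popen, grace: float = 10.0) -> int:
    """SIGTERM the child's whole process group, escalating to SIGKILL after `grace` s."""
    # With start_new_session=True the child leads its own group, so pgid == pid.
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # group already gone
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()


def run_supervised(cmd: list[str]) -> int:
    # In its own session the child is outside the terminal's foreground group,
    # so Ctrl+C reaches only this process, which then stops the group itself.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        return terminate_group(proc)
```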