DECNET

Author	SHA1	Message	Date
anti	df18cb44cc	fix(swarm): don't paint healthy deckies as failed when a shard-sibling fails docker compose up is partial-success-friendly — a build failure on one service doesn't roll back the others. But the master was catching the agent's 500 and tagging every decky in the shard as 'failed' with the same error message. From the UI that looked like all three deckies died even though two were live on the worker. On dispatch exception, probe the agent's /status to learn which deckies actually have running containers, and upsert per-decky state accordingly. Only fall back to marking the whole shard failed if the status probe itself is unreachable. Enhance agent.executor.status() to include a 'runtime' map keyed by decky name with per-service container state, so the master has something concrete to consult.	2026-04-19 20:11:08 -04:00
anti	e8e11b2896	feat(web-ui): show decky IP on SwarmDeckies, drop compose-hash column Operators want to know what address to poke when triaging a swarm decky; the compose-hash column was debug scaffolding that never paid off. DeckyShard has no IP column (the deploy-time IP lives on DecnetConfig), so the list endpoint resolves it at read time by joining shards against the stored deployment state by decky_name. Missing lookups render as "—" rather than erroring — the list stays useful even after a master restart that hasn't persisted a config yet.	2026-04-19 19:48:27 -04:00
anti	5dad1bb315	feat(swarm): remote teardown API + UI (per-decky and per-host) Agents already exposed POST /teardown; the master was missing the plumbing to reach it. Add: - POST /api/v1/swarm/hosts/{uuid}/teardown — admin-gated. Body {decky_id: str\|null}: null tears the whole host, a value tears one decky. On worker failure the master returns 502 and leaves DB shards intact so master and agent stay aligned. - BaseRepository.delete_decky_shard(name) + sqlmodel impl for per-decky cleanup after a single-decky teardown. - SwarmHosts page: "Teardown all" button (keeps host enrolled). - SwarmDeckies page: per-row "Teardown" button. Also exclude setuptools' build/ staging dir from the enrollment tarball — `pip install -e` on the master generates build/lib/decnet_web/node_modules and the bundle walker was leaking it to agents. Align pyproject's bandit exclude with the git-hook invocation so both skip decnet/templates/.	2026-04-19 19:39:28 -04:00
anti	2bef3edb72	feat(swarm): unbundle master-only code from agent tarball + sync systemd units on update Agents now ship with collector/prober/sniffer as systemd services; mutator, profiler, web, and API stay master-only (profiler rebuilds attacker profiles against the master DB — no per-host DB exists). Expand _EXCLUDES to drop the full decnet/web, decnet/mutator, decnet/profiler, and decnet_web trees from the enrollment bundle. Updater now calls _heal_path_symlink + _sync_systemd_units after rotation so fleets pick up new unit files and /usr/local/bin/decnet tracks the shared venv without a manual reinstall. daemon-reload runs once per update when any unit changed. Fix _service_registry matchers to accept systemd-style /usr/local/bin/decnet cmdlines (psutil returns a list — join to string before substring-checking) so agent-mode `decnet status` reports collector/prober/sniffer correctly.	2026-04-19 19:19:17 -04:00
anti	6d7877c679	feat(swarm): per-host microservices as systemd units, mutator off agents Previously `decnet status` on an agent showed every microservice as DOWN because deploy's auto-spawn was unihost-scoped and the agent CLI gate hid the per-host commands. Now: - collect, probe, profiler, sniffer drop out of MASTER_ONLY_COMMANDS (they run per-host; master-side work stays master-gated). - mutate stays master-only (it orchestrates swarm-wide respawns). - decnet/mutator/ excluded from agent tarballs — never invoked there. - decnet/web exclusion tightened: ship db/ + auth.py + dependencies.py (profiler needs the repo singleton), drop api.py, swarm_api.py, ingester.py, router/, templates/. - Four new systemd unit templates (decnet-collector/prober/profiler/ sniffer) shipped in every enrollment tarball. - enroll_bootstrap.sh enables + starts all four alongside agent and forwarder at install time. - updater restarts the aux units on code push so they pick up the new release (best-effort — legacy enrollments without the units won't fail the update). - status table hides Mutator + API rows in agent mode.	2026-04-19 18:58:48 -04:00
anti	ee9ade4cd5	feat(enroll): strip master API and frontend from agent tarball Agents never run the FastAPI master app (decnet/web/) or serve the React frontend (decnet_web/) — they run decnet.agent, decnet.updater, and decnet.forwarder, none of which import decnet.web. Shipping the master tree bloats every enrollment payload and needlessly widens the worker's attack surface. Excluded paths are unreachable on the worker (all cli.py imports of decnet.web are inside master-only command bodies that the agent-mode gate strips). Tests assert neither tree leaks into the tarball.	2026-04-19 18:47:03 -04:00
anti	dad29249de	fix(updater): align bootstrap layout with updater; log update phases The bootstrap was installing into /opt/decnet/.venv with an editable `pip install -e .`, and /usr/local/bin/decnet pointed there. The updater writes releases to /opt/decnet/releases/active/ with a shared venv at /opt/decnet/venv — a parallel tree nothing on the box actually runs. Result: updates appeared to succeed (release dir rotated, SHA changed) but systemd kept executing the untouched bootstrap code. Changes: - Bootstrap now installs directly into /opt/decnet/releases/active with the shared venv at /opt/decnet/venv and /opt/decnet/current symlinked. Same layout the updater rotates in and out of. - /usr/local/bin/decnet -> /opt/decnet/venv/bin/decnet. - run_update / run_update_self heal /usr/local/bin/decnet on every push so already-enrolled hosts recover on the next update instead of needing a re-enroll. - run_update / run_update_self now log each phase (receive, extract, pip install, rotate, restart, probe) so the updater log actually shows what happened.	2026-04-19 18:39:11 -04:00
anti	a0a241f65d	feat(enroll): decnet-updater now runs under systemd, not a --daemon fork Bootstrap used to end with `decnet updater --daemon` which forks and detaches — invisible to systemctl, no auto-restart, dies on reboot. Ships a decnet-updater.service template matching the pattern of the other units (Restart=on-failure, log to /var/log/decnet/decnet.updater.log, certs from /etc/decnet/updater, install tree at /opt/decnet), bundles it alongside agent/forwarder/engine units, and the installer now `systemctl enable --now`s it when --with-updater is set.	2026-04-19 18:19:24 -04:00
anti	5df995fda1	feat(enroll): opt-in IPvlan per-agent for Wi-Fi-bridged VMs Wi-Fi APs bind one MAC per associated station, so VirtualBox/VMware guests bridged over Wi-Fi rotate the VM's DHCP lease when Docker's macvlan starts emitting container-MAC frames through the vNIC. Adds a `use_ipvlan` toggle on the Agent Enrollment tab (mirrors the updater daemon checkbox): flips the flag on SwarmHost, bakes `ipvlan=true` into the agent's decnet.ini, and `_worker_config` forces ipvlan=True on the per-host shard at dispatch. Safe no-op on wired/bare-metal agents.	2026-04-19 17:57:45 -04:00
anti	6d7567b6bb	fix(fleet): reset stale host_uuid on carried-over deckies before dispatch Deckies merged in from a prior deployment's saved state kept their original host_uuid — which dispatch_decnet_config then 404'd on if that host had since been decommissioned or re-enrolled at a different uuid. Before round-robin assignment, drop any host_uuid that isn't in the live swarm_hosts set so orphaned entries get reassigned instead of exploding with 'unknown host_uuid'.	2026-04-19 06:27:34 -04:00
anti	dbaccde143	fix(swarm-updates): offload tarball build to worker thread tar_working_tree (walks repo + gzips several MB) and detect_git_sha (shells out) were called directly on the event loop, so /swarm-updates/push and /swarm-updates/push-self froze every other request until the tarball was ready. Wrap both in asyncio.to_thread.	2026-04-19 06:21:27 -04:00
anti	b883f24ba2	fix(engine): pin docker compose project name to avoid empty-basename failure systemd daemons run with WorkingDirectory=/ by default; docker compose derives the project name from basename(cwd), which is empty at '/', and aborts with 'project name must not be empty'. Pass -p decnet explicitly so the project name is independent of cwd, and set WorkingDirectory=/opt/decnet on the three DECNET units so compose artifacts (decnet-compose.yml, build contexts) also land in the install dir.	2026-04-19 06:17:30 -04:00
anti	79db999030	feat(fleet): auto-swarm deploy — shard across enrolled workers when master POST /deckies/deploy now branches on DECNET_MODE + enrolled host presence: when the caller is a master with at least one reachable swarm host, round- robin host_uuids are assigned over new deckies and the config is dispatched via AgentClient. Falls back to local docker-compose otherwise. Extracts the dispatch loop from api_deploy_swarm into dispatch_decnet_config so both endpoints share the same shard/dispatch/persist path. Adds GET /system/deployment-mode for the UI to show 'will shard across N hosts' vs 'will deploy locally' before the operator clicks deploy.	2026-04-19 06:09:08 -04:00
anti	cb1a1d1270	fix(fleet): defer DecnetConfig build until deckies are expanded Stateless /api/v1/deckies/deploy previously instantiated DecnetConfig with deckies=[] so it could merge entries later — but DecnetConfig.deckies is min_length=1, so Pydantic raised and the global handler mapped it to 422 'Internal data consistency error'. Construct the config after build_deckies_from_ini returns at least one DeckyConfig.	2026-04-19 06:02:26 -04:00
anti	899ea559d9	feat(enroll): systemd units for agent/forwarder/engine + log-directory INI key Rename log-file-path -> log-directory (maps to DECNET_LOG_DIRECTORY). Bundle now ships three systemd units rendered with agent_name/master_host and installs them into /etc/systemd/system/. Bootstrap replaces direct 'decnet X --daemon' calls with systemctl enable --now. Each unit pins DECNET_SYSTEM_LOGS so agent, forwarder, and deckies logs land at decnet.{agent,forwarder}.log and decnet.log under /var/log/decnet.	2026-04-19 05:46:08 -04:00
anti	e67b6d7f73	refactor(swarm-mgmt): move agent/updater certs to /etc/decnet (root-owned)	2026-04-19 05:32:39 -04:00
anti	bc5f43c3f7	feat(swarm-mgmt): probe-on-read for GET /swarm/hosts heartbeat + status	2026-04-19 05:26:35 -04:00
anti	ff4c993617	refactor(swarm-mgmt): backfill host address from agent's .tgz source IP	2026-04-19 05:20:29 -04:00
anti	e32fdf9cbf	feat(swarm-mgmt): agent_host + updater opt-in; prevent duplicate forwarder spawn	2026-04-19 05:12:55 -04:00
anti	95ae175e1b	fix(swarm-mgmt): exclude .env from bundle, chmod +x decnet, mkdir log	2026-04-19 04:58:55 -04:00
anti	b4df9ea0a1	fix(swarm-mgmt): bundle URLs target master_host, not dashboard base_url	2026-04-19 04:52:20 -04:00
anti	c6f7de30d2	feat(swarm-mgmt): agent enrollment bundle flow + admin swarm endpoints	2026-04-19 04:25:57 -04:00
anti	a266d6b17e	feat(web): Remote Updates API — dashboard endpoints for pushing code to workers Adds /api/v1/swarm-updates/{hosts,push,push-self,rollback} behind require_admin. Reuses the existing UpdaterClient + tar_working_tree + the per-host asyncio.gather pattern from api_deploy_swarm.py; tarball is built exactly once per /push request and fanned out to every selected worker. /hosts filters out decommissioned hosts and agent-only enrollments (no updater bundle = not a target). Connection drops during /update-self are treated as success — the updater re-execs itself mid-response, so httpx always raises. Pydantic models live in decnet/web/db/models.py (single source of truth). 24 tests cover happy paths, rollback, transport failures, include_self ordering (skip on rolled-back agents), validation, and RBAC gating.	2026-04-19 01:01:09 -04:00
anti	7765b36c50	feat(updater): remote self-update daemon with auto-rollback Adds a separate `decnet updater` daemon on each worker that owns the agent's release directory and installs tarball pushes from the master over mTLS. A normal `/update` never touches the updater itself, so the updater is always a known-good rescuer if a bad agent push breaks /health — the rotation is reversed and the agent restarted against the previous release. `POST /update-self` handles updater upgrades explicitly (no auto-rollback). - decnet/updater/: executor, FastAPI app, uvicorn launcher - decnet/swarm/updater_client.py, tar_tree.py: master-side push - cli: `decnet updater`, `decnet swarm update [--host\|--all] [--include-self] [--dry-run]`, `--updater` on `swarm enroll` - enrollment API issues a second cert (CN=updater@<host>) signed by the same CA; SwarmHost records updater_cert_fingerprint - tests: executor, app, CLI, tar tree, enroll-with-updater (37 new) - wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link	2026-04-18 21:40:21 -04:00
anti	8914c27220	feat(swarm): add `decnet swarm deckies` to list deployed shards by host `swarm list` only shows enrolled workers — there was no way to see which deckies are running and where. Adds GET /swarm/deckies on the controller (joins DeckyShard with SwarmHost for name/address/status) plus the CLI wrapper with --host / --state filters and --json.	2026-04-18 21:10:07 -04:00
anti	e2d6f857b5	refactor(swarm): move router DTOs into decnet/web/db/models.py _schemas.py was a local exception to the codebase convention. The rest of the app keeps all API request/response DTOs in decnet/web/db/models.py alongside UserResponse, DeployIniRequest, etc. — the swarm endpoints now follow the same convention (SwarmEnrollRequest, SwarmHostView, etc). Deletes decnet/web/router/swarm/_schemas.py.	2026-04-18 19:28:15 -04:00
anti	811136e600	refactor(swarm): one file per endpoint, matching existing router layout Splits the three grouped router files into eight api_<verb>_<resource>.py modules under decnet/web/router/swarm/ to match the convention used by router/fleet/ and router/config/. Shared request/response models live in _schemas.py. Keeps each endpoint easy to locate and modify without stepping on siblings.	2026-04-18 19:23:06 -04:00
anti	63b0a58527	feat(swarm): master-side SWARM controller (swarmctl) + agent CLI Adds decnet/web/swarm_api.py as an independent FastAPI app with routers for host enrollment, deployment dispatch (sharding DecnetConfig across enrolled workers via AgentClient), and active health probing. Runs as its own uvicorn subprocess via 'decnet swarmctl', mirroring the isolation pattern used by 'decnet api'. Also wires up 'decnet agent' CLI entry for the worker side. 29 tests added under tests/swarm/test_swarm_api.py cover enrollment (including bundle generation + duplicate rejection), host CRUD, sharding correctness, non-swarm-mode rejection, teardown, and health probes with a stubbed AgentClient.	2026-04-18 19:18:33 -04:00
anti	6657d3e097	feat(swarm): add SwarmHost and DeckyShard tables + repo CRUD Introduces the master-side persistence layer for swarm mode: - SwarmHost: enrolled worker metadata, cert fingerprint, heartbeat. - DeckyShard: per-decky host assignment, state, last error. Repo methods are added as default-raising on BaseRepository so unihost deployments are untouched; SQLModelRepository implements them (shared between the sqlite and mysql subclasses per the existing pattern).	2026-04-18 07:09:29 -04:00
anti	293da364a6	chores: fix linting	2026-04-18 06:46:10 -04:00
anti	41fd496128	feat(web): attacker artifacts endpoint + UI drawer Adds the server-side wiring and frontend UI to surface files captured by the SSH honeypot for a given attacker. - New repository method get_attacker_artifacts (abstract + SQLModel impl) that joins the attacker's IP to `file_captured` log rows. - New route GET /attackers/{uuid}/artifacts. - New router /artifacts/{decky}/{service}/{stored_as} that streams a quarantined file back to an authenticated viewer. - AttackerDetail grows an ArtifactDrawer panel with per-file metadata (sha256, size, orig_path) and a download action. - ssh service fragment now sets NODE_NAME=decky_name so logs and the host-side artifacts bind-mount share the same decky identifier.	2026-04-18 05:36:48 -04:00
anti	fb69a06ab3	fix(db): detach session cleanup onto fresh task on cancellation Previous attempt (shield + sync invalidate fallback) didn't work because shield only protects against cancellation from other tasks. When the caller task itself is cancelled mid-query, its next await re-raises CancelledError as soon as the shielded coroutine yields — rollback inside session.close() never completes, the aiomysql connection is orphaned, and the pool logs 'non-checked-in connection' when GC finally reaches it. Hand exception-path cleanup to loop.create_task() so the new task isn't subject to the caller's pending cancellation. close() (and the invalidate() fallback for a dead connection) runs to completion. Success path is unchanged — still awaits close() inline so callers see commit visibility and pool release before proceeding.	2026-04-17 21:13:43 -04:00
anti	1446f6da94	fix(db): invalidate pool connection when cancelled close fails Under high-concurrency MySQL load, uvicorn cancels request tasks when clients disconnect. If cancellation lands mid-query, session.close() tries to ROLLBACK on a connection that aiomysql has already marked as closed — raising InterfaceError("Cancelled during execution") and leaving the connection checked-out until GC, which the pool then warns about as a 'non-checked-in connection'. The old fallback tried sync.rollback() + sync.close(), but those still go through the async driver and fail the same way on a dead connection. Replace them with session.sync_session.invalidate(), which just flips the pool's internal record — no I/O, so it can't be cancelled — and tells the pool to drop the connection immediately instead of waiting for garbage collection.	2026-04-17 21:04:04 -04:00
anti	e967aaabfb	perf: cache get_user_by_username on the login hot path Locust @task(2) hammers /auth/login in steady state on top of the on_start burst. After caching the uuid-keyed user lookup and every other read endpoint, login alone accounted for 47% of total _execute at 500c/u — pure DB queueing on SELECT users WHERE username=?. 5s TTL, positive hits only (misses bypass so a freshly-created user can log in immediately). Password verify still runs against the cached hash, so security is unchanged — the only staleness window is: a changed password accepts the old password for up to 5s until invalidate_user_cache fires (it's called on every write).	2026-04-17 20:36:39 -04:00
anti	255c2e5eb7	perf: cache auth user-lookup and admin list_users The per-request SELECT users WHERE uuid=? in require_role was the hidden tax behind every authed endpoint — it kept _execute at ~60% across the profile even after the page caches landed. Even /health (with its DB and Docker probes cached) was still 52% _execute from this one query. - dependencies.py: 10s TTL cache on get_user_by_uuid, well below JWT expiry. invalidate_user_cache(uuid) is called on password change, role change, and user delete. - api_get_config.py: 5s TTL cache on the admin branch's list_users() (previously fetched every /config call). Invalidated on user create/update/delete. - api_change_pass.py + api_manage_users.py: invalidation hooks on all user-mutating endpoints.	2026-04-17 19:56:39 -04:00
anti	2dd86fb3bb	perf: cache /bounty, /logs/histogram, /deckies; bump /config TTL to 5s Round-2 follow-up: profile at 500c/u showed _execute still dominating the uncached read endpoints (/bounty 76%, /logs/histogram 73%, /deckies 56%). Same router-level TTL pattern as /stats — 5s window, asyncio.Lock to collapse concurrent calls into one DB hit. - /bounty: cache default unfiltered page (limit=50, offset=0, bounty_type=None, search=None). Filtered requests bypass. - /logs/histogram: cache default (interval_minutes=15, no filters). Filtered / non-default interval requests bypass. - /deckies: cache full response (endpoint takes no params). - /config: bump _STATE_TTL from 1.0 to 5.0 — admin writes are rare, 1s was too short for bursts to coalesce at high concurrency.	2026-04-17 19:30:11 -04:00
anti	3106d03135	perf(db): default pool_pre_ping=false for SQLite SQLite is a local file — a SELECT 1 per session checkout is pure overhead. Env var DECNET_DB_POOL_PRE_PING stays for anyone running on a network-mounted volume. MySQL backend keeps its current default.	2026-04-17 19:11:07 -04:00
anti	6301504c0e	perf(api): TTL-cache /stats + unfiltered pagination counts Every /stats call ran SELECT count(*) FROM logs + SELECT count(DISTINCT attacker_ip) FROM logs; every /logs and /attackers call ran an unfiltered count for the paginator. At 500 concurrent users these serialize through aiosqlite's worker threads and dominate wall time. Cache at the router layer (repo stays dialect-agnostic): - /stats response: 5s TTL - /logs total (only when no filters): 2s TTL - /attackers total (only when no filters): 2s TTL Filtered paths bypass the cache. Pattern reused from api_get_config and api_get_health (asyncio.Lock + time.monotonic window + lazy lock).	2026-04-17 19:09:15 -04:00
anti	de4b64d857	perf(auth): avoid duplicate user lookup in require_role require_role._check previously chained from get_current_user, which already loaded the user — then looked it up again. Inline the decode + single user fetch + must_change_password + role check so every authenticated request costs one SELECT users WHERE uuid=? instead of two.	2026-04-17 17:48:42 -04:00
anti	b5d7bf818f	feat(health): 3-tier status (healthy / degraded / unhealthy) Only database, docker, and ingestion_worker now count as critical (→ 503 unhealthy). attacker/sniffer/collector failures drop overall status to degraded (still 200) so the dashboard doesn't panic when a non-essential worker isn't running.	2026-04-17 17:48:42 -04:00
anti	a10aee282f	perf(ingester): batch log writes into bulk commits The ingester now accumulates up to DECNET_BATCH_SIZE rows (default 100) or DECNET_BATCH_MAX_WAIT_MS (default 250ms) before flushing through repo.add_logs — one transaction, one COMMIT per batch instead of per row. Under attacker traffic this collapses N commits into ⌈N/100⌉ and takes most of the SQLite writer-lock contention off the hot path. Flush semantics are cancel-safe: _position only advances after a batch commits successfully, and the flush helper bails without touching the DB if the enclosing task is being cancelled (lifespan teardown). Un-flushed lines stay in the file and are re-read on next startup. Tests updated to assert on add_logs (bulk) instead of the per-row add_log that the ingester no longer uses, plus a new test that 250 lines flush in ≤5 calls.	2026-04-17 16:37:34 -04:00
anti	11b9e85874	feat(db): bulk add_logs for one-commit ingestion batches Adds BaseRepository.add_logs (default: loops add_log for backwards compatibility) and a real single-session/single-commit implementation on SQLModelRepository. Introduces DECNET_BATCH_SIZE (default 100) and DECNET_BATCH_MAX_WAIT_MS (default 250) so the ingester can flush on either a size or a time bound when it adopts the new method. Ingester wiring is deferred to a later pass — the single-log path was deadlocking tests when flushed during lifespan teardown, so this change ships the DB primitive alone.	2026-04-17 16:23:09 -04:00
anti	45039bd621	fix(cache): lazy-init TTL cache locks to survive event-loop turnover A module-level asyncio.Lock binds to the loop it was first awaited on. Under pytest-anyio (and xdist) each test spins up a new loop; any later test that hit /health or /config would wait on a lock owned by a dead loop and the whole worker would hang. Create the lock on first use and drop it in the test-reset helpers so a fresh loop always gets a fresh lock.	2026-04-17 16:23:00 -04:00
anti	4ea1c2ff4f	fix(health): move Docker client+ping off the event loop Under CPU saturation the sync docker.from_env()/ping() calls could miss their socket timeout, cache _docker_healthy=False, and return 503 for the full 5s TTL window. Both calls now run on a thread so the event loop keeps serving other requests while Docker is being probed.	2026-04-17 15:43:51 -04:00
anti	32340bea0d	perf: migrate hot-path JSON serialization to orjson stdlib json was FastAPI's default. Every response body, every SSE frame, and every add_log/state/payload write paid the stdlib encode cost. - pyproject.toml: add orjson>=3.10 as a core dep. - decnet/web/api.py: default_response_class=ORJSONResponse on the FastAPI app, so every endpoint return goes through orjson without touching call sites. Explicit JSONResponse sites in the validation exception handlers migrated to ORJSONResponse for consistency. - health endpoint's explicit JSONResponse → ORJSONResponse. - SSE stream (api_stream_events.py): 6 json.dumps call sites → orjson.dumps(...).decode() — the per-event frames that fire on every sse tick. - sqlmodel_repo.py: encode sites on the log-insert path switched to orjson (fields, payload, state value). Parser sites (json.loads) left as-is for now — not on the measured hot path.	2026-04-17 15:07:28 -04:00
anti	f1e14280c0	perf: 1s TTL cache for /health DB probe and /config state reads Locust hit /health and /config on every @task(3), so each request was firing repo.get_total_logs() and two repo.get_state() calls against aiosqlite — filling the driver queue for data that changes on the order of seconds, not milliseconds. Both caches follow the shape already used by the existing Docker cache: - asyncio.Lock with double-checked TTL so concurrent callers collapse into one DB hit per 1s window. - _reset_* helpers called from tests/api/conftest.py::setup_db so the module-level cache can't leak across tests. tests/test_health_config_cache.py asserts 50 concurrent callers produce exactly 1 repo call, and the cache expires after TTL.	2026-04-17 15:05:18 -04:00
anti	931f33fb06	perf: cache Docker daemon ping in /health (5s TTL) Creating a new docker.from_env() client per /health request opened a fresh unix-socket connection each time. Under load that's wasteful and hammers dockerd. Keep a module-level client + last-check timestamp; actually ping every 5 seconds, return cached state in between. Reset helper provided for tests.	2026-04-17 15:01:53 -04:00
anti	467511e997	db: switch MySQL driver to asyncmy, env-tune pool, serialize DDL - aiomysql → asyncmy on both sides of the URL/import (faster, maintained). - Pool sizing now reads DECNET_DB_POOL_SIZE / MAX_OVERFLOW / RECYCLE / PRE_PING for both SQLite and MySQL engines so stress runs can bump without code edits. - MySQL initialize() now wraps schema DDL in a GET_LOCK advisory lock so concurrent uvicorn workers racing create_all() don't hit 'Table was skipped since its definition is being modified by concurrent DDL'. - sqlite & mysql repo get_log_histogram use the shared _session() helper instead of session_factory() for consistency with the rest of the repo. - SSE stream_events docstring updated to asyncmy.	2026-04-17 15:01:49 -04:00
anti	3945e72e11	perf: run bcrypt on a thread so it doesn't block the event loop verify_password / get_password_hash are CPU-bound and take ~250ms each at rounds=12. Called directly from async endpoints, they stall every other coroutine for that window — the single biggest single-worker bottleneck on the login path. Adds averify_password / ahash_password that wrap the sync versions in asyncio.to_thread. Sync versions stay put because _ensure_admin_user and tests still use them. 5 call sites updated: login, change-password, create-user, reset-password. tests/test_auth_async.py asserts parallel averify runs concurrently (~1x of a single verify, not 2x).	2026-04-17 14:52:22 -04:00
anti	bd406090a7	fix: re-seed admin password when still unfinalized (must_change_password=True) _ensure_admin_user was strict insert-if-missing: once a stale hash landed in decnet.db (e.g. from a deploy that used a different DECNET_ADMIN_PASSWORD), login silently 401'd because changing the env var later had no effect. Now on startup: if the admin still has must_change_password=True (they never finalized their own password), re-sync the hash from the current env var. Once the admin sets a real password, we leave it alone. Found via locustfile.py login storm — see tests/test_admin_seed.py. Note: this commit also bundles uncommitted pool-management work already present in sqlmodel_repo.py from prior sessions.	2026-04-17 14:49:13 -04:00

1 2 3

131 Commits