DECNET

Author	SHA1	Message	Date
anti	148e51011c	feat(swarm): agent→master heartbeat with per-host cert pinning New POST /swarm/heartbeat on the swarm controller. Workers post every ~30s with the output of executor.status(); the master bumps SwarmHost.last_heartbeat and re-upserts each DeckyShard with a fresh DeckyConfig snapshot and runtime-derived state (running/degraded). Security: CA-signed mTLS alone is not sufficient — a decommissioned worker's still-valid cert could resurrect ghost shards. The endpoint extracts the presented peer cert (primary: scope["extensions"]["tls"], fallback: transport.get_extra_info("ssl_object")) and SHA-256-pins it to the SwarmHost.client_cert_fingerprint stored for the claimed host_uuid. Extraction is factored into _extract_peer_fingerprint so tests can exercise both uvicorn scope shapes and the both-unavailable fail-closed path without mocking uvicorn's TLS pipeline. Adds get_swarm_host_by_fingerprint to the repo interface (SQLModel impl reuses the indexed client_cert_fingerprint column).	2026-04-19 21:37:15 -04:00
anti	3ebd206bca	feat(swarm): persist DeckyConfig snapshot per shard + enrich list API Dispatch now writes the full serialised DeckyConfig into DeckyShard.decky_config (plus decky_ip as a cheap extract), so the master can render the same rich per-decky card the local-fleet view uses — hostname, distro, archetype, service_config, mutate_interval, last_mutated — without round-tripping to the worker on every page render. DeckyShardView gains the corresponding fields; the repository flattens the snapshot at read time. Pre-migration rows keep working (fields fall through as None/defaults). Columns are additive + nullable so SQLModel.metadata.create_all handles the change on both SQLite and MySQL. Backfill happens organically on the next dispatch or (in a follow-up) agent heartbeat.	2026-04-19 21:29:45 -04:00
anti	14250cacad	feat(swarm): self-destruct agent on decommission Decommissioning a worker from the dashboard (or swarm controller) now asks the agent to wipe its own install before the master forgets it. The agent stops decky containers + every decnet-* systemd unit, then deletes /opt/decnet, /etc/systemd/system/decnet-, /var/lib/decnet/, and /usr/local/bin/decnet. Logs under /var/log are preserved. The reaper runs as a detached /tmp script (start_new_session=True) so it survives the agent process being killed. Self-destruct dispatch is best-effort — a dead worker doesn't block master-side cleanup.	2026-04-19 20:47:09 -04:00
anti	9d68bb45c7	feat(web): async teardowns: 202 + background task, UI allows parallel queue. Teardowns were synchronous all the way through: POST blocked on the worker's docker-compose-down cycle (seconds to minutes), the frontend locked tearingDown to a single string so only one button could be armed at a time, and operators couldn't queue a second teardown until the first returned. On a flaky worker that meant staring at a spinner for the whole RTT. Backend: POST /swarm/hosts/{uuid}/teardown returns 202 the instant the request is validated. Affected shards flip to state='tearing_down' synchronously before the response so the UI reflects progress immediately, then the actual AgentClient call + DB cleanup run in an asyncio.create_task (tracked in a module-level set to survive GC and to be drainable by tests). On failure the shard flips to 'teardown_failed' with the error recorded; nothing is re-raised, since there's no caller to catch it. Frontend: swap tearingDown / decommissioning from 'string | null' to 'Set<string>'. Each button tracks its own in-flight state; the poll loop picks up the final shard state from the backend. Multiple teardowns can now be queued without blocking each other.	2026-04-19 20:30:56 -04:00
anti	07ec4bc269	fix(fleet): INI fully replaces prior decky state on redeploy Submitting an INI with a single [decky1] was silently redeploying the deckies from the previous deploy too. POST /deckies/deploy merged the new INI into the stored DecnetConfig by name, so a 1-decky INI on top of a prior 3-decky run still pushed 3 deckies to the worker. Those stale decky2/decky3 kept their old IPs, collided on the parent NIC, and the agent failed with 'Address already in use' — the deploy the operator never asked for. The INI is the source of truth for which deckies exist this deploy. Full replace: config.deckies = list(new_decky_configs). Operators who want to add more deckies should list them all in the INI. Update the deploy-limit test to reflect the new replace semantics, and add a regression test asserting prior state is dropped.	2026-04-19 20:24:29 -04:00
anti	df18cb44cc	fix(swarm): don't paint healthy deckies as failed when a shard-sibling fails docker compose up is partial-success-friendly — a build failure on one service doesn't roll back the others. But the master was catching the agent's 500 and tagging every decky in the shard as 'failed' with the same error message. From the UI that looked like all three deckies died even though two were live on the worker. On dispatch exception, probe the agent's /status to learn which deckies actually have running containers, and upsert per-decky state accordingly. Only fall back to marking the whole shard failed if the status probe itself is unreachable. Enhance agent.executor.status() to include a 'runtime' map keyed by decky name with per-service container state, so the master has something concrete to consult.	2026-04-19 20:11:08 -04:00
anti	e8e11b2896	feat(web-ui): show decky IP on SwarmDeckies, drop compose-hash column Operators want to know what address to poke when triaging a swarm decky; the compose-hash column was debug scaffolding that never paid off. DeckyShard has no IP column (the deploy-time IP lives on DecnetConfig), so the list endpoint resolves it at read time by joining shards against the stored deployment state by decky_name. Missing lookups render as "—" rather than erroring — the list stays useful even after a master restart that hasn't persisted a config yet.	2026-04-19 19:48:27 -04:00
anti	5dad1bb315	feat(swarm): remote teardown API + UI (per-decky and per-host). Agents already exposed POST /teardown; the master was missing the plumbing to reach it. Add: - POST /api/v1/swarm/hosts/{uuid}/teardown (admin-gated). Body {decky_id: str|null}: null tears the whole host, a value tears one decky. On worker failure the master returns 502 and leaves DB shards intact so master and agent stay aligned. - BaseRepository.delete_decky_shard(name) + sqlmodel impl for per-decky cleanup after a single-decky teardown. - SwarmHosts page: "Teardown all" button (keeps host enrolled). - SwarmDeckies page: per-row "Teardown" button. Also exclude setuptools' build/ staging dir from the enrollment tarball: `pip install -e` on the master generates build/lib/decnet_web/node_modules and the bundle walker was leaking it to agents. Align pyproject's bandit exclude with the git-hook invocation so both skip decnet/templates/.	2026-04-19 19:39:28 -04:00
anti	2bef3edb72	feat(swarm): unbundle master-only code from agent tarball + sync systemd units on update Agents now ship with collector/prober/sniffer as systemd services; mutator, profiler, web, and API stay master-only (profiler rebuilds attacker profiles against the master DB — no per-host DB exists). Expand _EXCLUDES to drop the full decnet/web, decnet/mutator, decnet/profiler, and decnet_web trees from the enrollment bundle. Updater now calls _heal_path_symlink + _sync_systemd_units after rotation so fleets pick up new unit files and /usr/local/bin/decnet tracks the shared venv without a manual reinstall. daemon-reload runs once per update when any unit changed. Fix _service_registry matchers to accept systemd-style /usr/local/bin/decnet cmdlines (psutil returns a list — join to string before substring-checking) so agent-mode `decnet status` reports collector/prober/sniffer correctly.	2026-04-19 19:19:17 -04:00
anti	6d7877c679	feat(swarm): per-host microservices as systemd units, mutator off agents. Previously `decnet status` on an agent showed every microservice as DOWN because deploy's auto-spawn was unihost-scoped and the agent CLI gate hid the per-host commands. Now: - collect, probe, profiler, sniffer drop out of MASTER_ONLY_COMMANDS (they run per-host; master-side work stays master-gated). - mutate stays master-only (it orchestrates swarm-wide respawns). - decnet/mutator/ excluded from agent tarballs (never invoked there). - decnet/web exclusion tightened: ship db/ + auth.py + dependencies.py (profiler needs the repo singleton), drop api.py, swarm_api.py, ingester.py, router/, templates/. - Four new systemd unit templates (decnet-collector/prober/profiler/sniffer) shipped in every enrollment tarball. - enroll_bootstrap.sh enables + starts all four alongside agent and forwarder at install time. - updater restarts the aux units on code push so they pick up the new release (best-effort: legacy enrollments without the units won't fail the update). - status table hides Mutator + API rows in agent mode.	2026-04-19 18:58:48 -04:00
anti	ee9ade4cd5	feat(enroll): strip master API and frontend from agent tarball Agents never run the FastAPI master app (decnet/web/) or serve the React frontend (decnet_web/) — they run decnet.agent, decnet.updater, and decnet.forwarder, none of which import decnet.web. Shipping the master tree bloats every enrollment payload and needlessly widens the worker's attack surface. Excluded paths are unreachable on the worker (all cli.py imports of decnet.web are inside master-only command bodies that the agent-mode gate strips). Tests assert neither tree leaks into the tarball.	2026-04-19 18:47:03 -04:00
anti	a0a241f65d	feat(enroll): decnet-updater now runs under systemd, not a --daemon fork Bootstrap used to end with `decnet updater --daemon` which forks and detaches — invisible to systemctl, no auto-restart, dies on reboot. Ships a decnet-updater.service template matching the pattern of the other units (Restart=on-failure, log to /var/log/decnet/decnet.updater.log, certs from /etc/decnet/updater, install tree at /opt/decnet), bundles it alongside agent/forwarder/engine units, and the installer now `systemctl enable --now`s it when --with-updater is set.	2026-04-19 18:19:24 -04:00
anti	5df995fda1	feat(enroll): opt-in IPvlan per-agent for Wi-Fi-bridged VMs Wi-Fi APs bind one MAC per associated station, so VirtualBox/VMware guests bridged over Wi-Fi rotate the VM's DHCP lease when Docker's macvlan starts emitting container-MAC frames through the vNIC. Adds a `use_ipvlan` toggle on the Agent Enrollment tab (mirrors the updater daemon checkbox): flips the flag on SwarmHost, bakes `ipvlan=true` into the agent's decnet.ini, and `_worker_config` forces ipvlan=True on the per-host shard at dispatch. Safe no-op on wired/bare-metal agents.	2026-04-19 17:57:45 -04:00
anti	6d7567b6bb	fix(fleet): reset stale host_uuid on carried-over deckies before dispatch Deckies merged in from a prior deployment's saved state kept their original host_uuid — which dispatch_decnet_config then 404'd on if that host had since been decommissioned or re-enrolled at a different uuid. Before round-robin assignment, drop any host_uuid that isn't in the live swarm_hosts set so orphaned entries get reassigned instead of exploding with 'unknown host_uuid'.	2026-04-19 06:27:34 -04:00
anti	dbaccde143	fix(swarm-updates): offload tarball build to worker thread tar_working_tree (walks repo + gzips several MB) and detect_git_sha (shells out) were called directly on the event loop, so /swarm-updates/push and /swarm-updates/push-self froze every other request until the tarball was ready. Wrap both in asyncio.to_thread.	2026-04-19 06:21:27 -04:00
anti	79db999030	feat(fleet): auto-swarm deploy: shard across enrolled workers when master. POST /deckies/deploy now branches on DECNET_MODE + enrolled host presence: when the caller is a master with at least one reachable swarm host, round-robin host_uuids are assigned over new deckies and the config is dispatched via AgentClient. Falls back to local docker-compose otherwise. Extracts the dispatch loop from api_deploy_swarm into dispatch_decnet_config so both endpoints share the same shard/dispatch/persist path. Adds GET /system/deployment-mode for the UI to show 'will shard across N hosts' vs 'will deploy locally' before the operator clicks deploy.	2026-04-19 06:09:08 -04:00
anti	cb1a1d1270	fix(fleet): defer DecnetConfig build until deckies are expanded Stateless /api/v1/deckies/deploy previously instantiated DecnetConfig with deckies=[] so it could merge entries later — but DecnetConfig.deckies is min_length=1, so Pydantic raised and the global handler mapped it to 422 'Internal data consistency error'. Construct the config after build_deckies_from_ini returns at least one DeckyConfig.	2026-04-19 06:02:26 -04:00
anti	899ea559d9	feat(enroll): systemd units for agent/forwarder/engine + log-directory INI key Rename log-file-path -> log-directory (maps to DECNET_LOG_DIRECTORY). Bundle now ships three systemd units rendered with agent_name/master_host and installs them into /etc/systemd/system/. Bootstrap replaces direct 'decnet X --daemon' calls with systemctl enable --now. Each unit pins DECNET_SYSTEM_LOGS so agent, forwarder, and deckies logs land at decnet.{agent,forwarder}.log and decnet.log under /var/log/decnet.	2026-04-19 05:46:08 -04:00
anti	e67b6d7f73	refactor(swarm-mgmt): move agent/updater certs to /etc/decnet (root-owned)	2026-04-19 05:32:39 -04:00
anti	bc5f43c3f7	feat(swarm-mgmt): probe-on-read for GET /swarm/hosts heartbeat + status	2026-04-19 05:26:35 -04:00
anti	ff4c993617	refactor(swarm-mgmt): backfill host address from agent's .tgz source IP	2026-04-19 05:20:29 -04:00
anti	e32fdf9cbf	feat(swarm-mgmt): agent_host + updater opt-in; prevent duplicate forwarder spawn	2026-04-19 05:12:55 -04:00
anti	95ae175e1b	fix(swarm-mgmt): exclude .env from bundle, chmod +x decnet, mkdir log	2026-04-19 04:58:55 -04:00
anti	b4df9ea0a1	fix(swarm-mgmt): bundle URLs target master_host, not dashboard base_url	2026-04-19 04:52:20 -04:00
anti	c6f7de30d2	feat(swarm-mgmt): agent enrollment bundle flow + admin swarm endpoints	2026-04-19 04:25:57 -04:00
anti	a266d6b17e	feat(web): Remote Updates API — dashboard endpoints for pushing code to workers Adds /api/v1/swarm-updates/{hosts,push,push-self,rollback} behind require_admin. Reuses the existing UpdaterClient + tar_working_tree + the per-host asyncio.gather pattern from api_deploy_swarm.py; tarball is built exactly once per /push request and fanned out to every selected worker. /hosts filters out decommissioned hosts and agent-only enrollments (no updater bundle = not a target). Connection drops during /update-self are treated as success — the updater re-execs itself mid-response, so httpx always raises. Pydantic models live in decnet/web/db/models.py (single source of truth). 24 tests cover happy paths, rollback, transport failures, include_self ordering (skip on rolled-back agents), validation, and RBAC gating.	2026-04-19 01:01:09 -04:00
anti	7765b36c50	feat(updater): remote self-update daemon with auto-rollback. Adds a separate `decnet updater` daemon on each worker that owns the agent's release directory and installs tarball pushes from the master over mTLS. A normal `/update` never touches the updater itself, so the updater is always a known-good rescuer if a bad agent push breaks /health: the rotation is reversed and the agent restarted against the previous release. `POST /update-self` handles updater upgrades explicitly (no auto-rollback). - decnet/updater/: executor, FastAPI app, uvicorn launcher - decnet/swarm/updater_client.py, tar_tree.py: master-side push - cli: `decnet updater`, `decnet swarm update [--host|--all] [--include-self] [--dry-run]`, `--updater` on `swarm enroll` - enrollment API issues a second cert (CN=updater@<host>) signed by the same CA; SwarmHost records updater_cert_fingerprint - tests: executor, app, CLI, tar tree, enroll-with-updater (37 new) - wiki: Remote-Updates page + sidebar + SWARM-Mode cross-link	2026-04-18 21:40:21 -04:00
anti	8914c27220	feat(swarm): add `decnet swarm deckies` to list deployed shards by host `swarm list` only shows enrolled workers — there was no way to see which deckies are running and where. Adds GET /swarm/deckies on the controller (joins DeckyShard with SwarmHost for name/address/status) plus the CLI wrapper with --host / --state filters and --json.	2026-04-18 21:10:07 -04:00
anti	e2d6f857b5	refactor(swarm): move router DTOs into decnet/web/db/models.py _schemas.py was a local exception to the codebase convention. The rest of the app keeps all API request/response DTOs in decnet/web/db/models.py alongside UserResponse, DeployIniRequest, etc. — the swarm endpoints now follow the same convention (SwarmEnrollRequest, SwarmHostView, etc). Deletes decnet/web/router/swarm/_schemas.py.	2026-04-18 19:28:15 -04:00
anti	811136e600	refactor(swarm): one file per endpoint, matching existing router layout Splits the three grouped router files into eight api_<verb>_<resource>.py modules under decnet/web/router/swarm/ to match the convention used by router/fleet/ and router/config/. Shared request/response models live in _schemas.py. Keeps each endpoint easy to locate and modify without stepping on siblings.	2026-04-18 19:23:06 -04:00
anti	63b0a58527	feat(swarm): master-side SWARM controller (swarmctl) + agent CLI Adds decnet/web/swarm_api.py as an independent FastAPI app with routers for host enrollment, deployment dispatch (sharding DecnetConfig across enrolled workers via AgentClient), and active health probing. Runs as its own uvicorn subprocess via 'decnet swarmctl', mirroring the isolation pattern used by 'decnet api'. Also wires up 'decnet agent' CLI entry for the worker side. 29 tests added under tests/swarm/test_swarm_api.py cover enrollment (including bundle generation + duplicate rejection), host CRUD, sharding correctness, non-swarm-mode rejection, teardown, and health probes with a stubbed AgentClient.	2026-04-18 19:18:33 -04:00
anti	41fd496128	feat(web): attacker artifacts endpoint + UI drawer Adds the server-side wiring and frontend UI to surface files captured by the SSH honeypot for a given attacker. - New repository method get_attacker_artifacts (abstract + SQLModel impl) that joins the attacker's IP to `file_captured` log rows. - New route GET /attackers/{uuid}/artifacts. - New router /artifacts/{decky}/{service}/{stored_as} that streams a quarantined file back to an authenticated viewer. - AttackerDetail grows an ArtifactDrawer panel with per-file metadata (sha256, size, orig_path) and a download action. - ssh service fragment now sets NODE_NAME=decky_name so logs and the host-side artifacts bind-mount share the same decky identifier.	2026-04-18 05:36:48 -04:00
anti	e967aaabfb	perf: cache get_user_by_username on the login hot path Locust @task(2) hammers /auth/login in steady state on top of the on_start burst. After caching the uuid-keyed user lookup and every other read endpoint, login alone accounted for 47% of total _execute at 500c/u — pure DB queueing on SELECT users WHERE username=?. 5s TTL, positive hits only (misses bypass so a freshly-created user can log in immediately). Password verify still runs against the cached hash, so security is unchanged — the only staleness window is: a changed password accepts the old password for up to 5s until invalidate_user_cache fires (it's called on every write).	2026-04-17 20:36:39 -04:00
anti	255c2e5eb7	perf: cache auth user-lookup and admin list_users The per-request SELECT users WHERE uuid=? in require_role was the hidden tax behind every authed endpoint — it kept _execute at ~60% across the profile even after the page caches landed. Even /health (with its DB and Docker probes cached) was still 52% _execute from this one query. - dependencies.py: 10s TTL cache on get_user_by_uuid, well below JWT expiry. invalidate_user_cache(uuid) is called on password change, role change, and user delete. - api_get_config.py: 5s TTL cache on the admin branch's list_users() (previously fetched every /config call). Invalidated on user create/update/delete. - api_change_pass.py + api_manage_users.py: invalidation hooks on all user-mutating endpoints.	2026-04-17 19:56:39 -04:00
anti	2dd86fb3bb	perf: cache /bounty, /logs/histogram, /deckies; bump /config TTL to 5s Round-2 follow-up: profile at 500c/u showed _execute still dominating the uncached read endpoints (/bounty 76%, /logs/histogram 73%, /deckies 56%). Same router-level TTL pattern as /stats — 5s window, asyncio.Lock to collapse concurrent calls into one DB hit. - /bounty: cache default unfiltered page (limit=50, offset=0, bounty_type=None, search=None). Filtered requests bypass. - /logs/histogram: cache default (interval_minutes=15, no filters). Filtered / non-default interval requests bypass. - /deckies: cache full response (endpoint takes no params). - /config: bump _STATE_TTL from 1.0 to 5.0 — admin writes are rare, 1s was too short for bursts to coalesce at high concurrency.	2026-04-17 19:30:11 -04:00
anti	6301504c0e	perf(api): TTL-cache /stats + unfiltered pagination counts Every /stats call ran SELECT count(*) FROM logs + SELECT count(DISTINCT attacker_ip) FROM logs; every /logs and /attackers call ran an unfiltered count for the paginator. At 500 concurrent users these serialize through aiosqlite's worker threads and dominate wall time. Cache at the router layer (repo stays dialect-agnostic): - /stats response: 5s TTL - /logs total (only when no filters): 2s TTL - /attackers total (only when no filters): 2s TTL Filtered paths bypass the cache. Pattern reused from api_get_config and api_get_health (asyncio.Lock + time.monotonic window + lazy lock).	2026-04-17 19:09:15 -04:00
anti	b5d7bf818f	feat(health): 3-tier status (healthy / degraded / unhealthy) Only database, docker, and ingestion_worker now count as critical (→ 503 unhealthy). attacker/sniffer/collector failures drop overall status to degraded (still 200) so the dashboard doesn't panic when a non-essential worker isn't running.	2026-04-17 17:48:42 -04:00
anti	45039bd621	fix(cache): lazy-init TTL cache locks to survive event-loop turnover A module-level asyncio.Lock binds to the loop it was first awaited on. Under pytest-anyio (and xdist) each test spins up a new loop; any later test that hit /health or /config would wait on a lock owned by a dead loop and the whole worker would hang. Create the lock on first use and drop it in the test-reset helpers so a fresh loop always gets a fresh lock.	2026-04-17 16:23:00 -04:00
anti	4ea1c2ff4f	fix(health): move Docker client+ping off the event loop Under CPU saturation the sync docker.from_env()/ping() calls could miss their socket timeout, cache _docker_healthy=False, and return 503 for the full 5s TTL window. Both calls now run on a thread so the event loop keeps serving other requests while Docker is being probed.	2026-04-17 15:43:51 -04:00
anti	32340bea0d	perf: migrate hot-path JSON serialization to orjson stdlib json was FastAPI's default. Every response body, every SSE frame, and every add_log/state/payload write paid the stdlib encode cost. - pyproject.toml: add orjson>=3.10 as a core dep. - decnet/web/api.py: default_response_class=ORJSONResponse on the FastAPI app, so every endpoint return goes through orjson without touching call sites. Explicit JSONResponse sites in the validation exception handlers migrated to ORJSONResponse for consistency. - health endpoint's explicit JSONResponse → ORJSONResponse. - SSE stream (api_stream_events.py): 6 json.dumps call sites → orjson.dumps(...).decode() — the per-event frames that fire on every sse tick. - sqlmodel_repo.py: encode sites on the log-insert path switched to orjson (fields, payload, state value). Parser sites (json.loads) left as-is for now — not on the measured hot path.	2026-04-17 15:07:28 -04:00
anti	f1e14280c0	perf: 1s TTL cache for /health DB probe and /config state reads Locust hit /health and /config on every @task(3), so each request was firing repo.get_total_logs() and two repo.get_state() calls against aiosqlite — filling the driver queue for data that changes on the order of seconds, not milliseconds. Both caches follow the shape already used by the existing Docker cache: - asyncio.Lock with double-checked TTL so concurrent callers collapse into one DB hit per 1s window. - _reset_* helpers called from tests/api/conftest.py::setup_db so the module-level cache can't leak across tests. tests/test_health_config_cache.py asserts 50 concurrent callers produce exactly 1 repo call, and the cache expires after TTL.	2026-04-17 15:05:18 -04:00
anti	931f33fb06	perf: cache Docker daemon ping in /health (5s TTL) Creating a new docker.from_env() client per /health request opened a fresh unix-socket connection each time. Under load that's wasteful and hammers dockerd. Keep a module-level client + last-check timestamp; actually ping every 5 seconds, return cached state in between. Reset helper provided for tests.	2026-04-17 15:01:53 -04:00
anti	467511e997	db: switch MySQL driver to asyncmy, env-tune pool, serialize DDL - aiomysql → asyncmy on both sides of the URL/import (faster, maintained). - Pool sizing now reads DECNET_DB_POOL_SIZE / MAX_OVERFLOW / RECYCLE / PRE_PING for both SQLite and MySQL engines so stress runs can bump without code edits. - MySQL initialize() now wraps schema DDL in a GET_LOCK advisory lock so concurrent uvicorn workers racing create_all() don't hit 'Table was skipped since its definition is being modified by concurrent DDL'. - sqlite & mysql repo get_log_histogram use the shared _session() helper instead of session_factory() for consistency with the rest of the repo. - SSE stream_events docstring updated to asyncmy.	2026-04-17 15:01:49 -04:00
anti	3945e72e11	perf: run bcrypt on a thread so it doesn't block the event loop verify_password / get_password_hash are CPU-bound and take ~250ms each at rounds=12. Called directly from async endpoints, they stall every other coroutine for that window — the single biggest single-worker bottleneck on the login path. Adds averify_password / ahash_password that wrap the sync versions in asyncio.to_thread. Sync versions stay put because _ensure_admin_user and tests still use them. 5 call sites updated: login, change-password, create-user, reset-password. tests/test_auth_async.py asserts parallel averify runs concurrently (~1x of a single verify, not 2x).	2026-04-17 14:52:22 -04:00
anti	9b59f8672e	chores: cleanup; added: viteconfig	2026-04-16 02:09:30 -04:00
anti	89099b903d	fix: resolve schemathesis and live test failures - Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode) - Add 400 response to all endpoints accepting JSON bodies (malformed input) - Add required 'title' field to schemathesis.toml for schemathesis 4.15+ - Add xdist_group markers to live tests with module-scoped fixtures to prevent xdist from distributing them across workers (fixture isolation)	2026-04-16 01:39:04 -04:00
anti	29578d9d99	fix: resolve all ruff and bandit lint/security issues - Remove unused Optional import (F401) in telemetry.py - Move imports above module-level code (E402) in web/db/models.py - Default API/web hosts to 127.0.0.1 instead of 0.0.0.0 (B104) - Add usedforsecurity=False to MD5 calls in JA3/HASSH fingerprinting (B324) - Annotate intentional try/except/pass blocks with nosec (B110) - Remove stale nosec comments that no longer suppress anything	2026-04-16 01:04:57 -04:00
anti	70d8ffc607	feat: complete OTEL tracing across all services with pipeline bridge and docs Extends tracing to every remaining module: all 23 API route handlers, correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp), profiler behavioral analysis, logging subsystem, engine, and mutator. Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords. Includes development/docs/TRACING.md with full span reference (76 spans), pipeline propagation architecture, quick start guide, and troubleshooting.	2026-04-16 00:58:08 -04:00
anti	c8f05df4d9	feat: overhaul behavioral profiler — multi-tool detection, improved classification, TTL OS fallback	2026-04-15 15:47:02 -04:00
anti	314e6c6388	fix: remove event-loop-blocking cold start; unify profiler to cursor-based incremental Cold start fetched all logs in one bulk query then processed them in a tight synchronous loop with no yields, blocking the asyncio event loop for seconds on datasets of 30K+ rows. This stalled every concurrent await — including the SSE stream generator's initial DB calls — causing the dashboard to show INITIALIZING SENSORS indefinitely. Changes: - Drop _cold_start() and get_all_logs_raw(); uninitialized state now runs the same cursor loop as incremental, starting from last_log_id=0 - Yield to the event loop after every _BATCH_SIZE rows (asyncio.sleep(0)) - Add SSE keepalive comment as first yield so the connection flushes before any DB work begins - Add Cache-Control/X-Accel-Buffering headers to StreamingResponse	2026-04-15 13:46:42 -04:00
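
Several perf commits in the log (6301504c0e, f1e14280c0, 45039bd621) describe the same router-level caching shape: an asyncio.Lock created on first use, a time.monotonic TTL window, and a double-checked read so concurrent callers collapse into one DB hit. A minimal sketch of that shape follows. The names TTLCache, fake_total_logs, and demo are hypothetical, and this is an illustration of the pattern as described in the messages, not DECNET's actual code.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class TTLCache:
    """Cache one async result for `ttl` seconds; concurrent callers share one fetch."""

    def __init__(self, fetch: Callable[[], Awaitable[int]], ttl: float) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._value: Optional[int] = None
        self._expires_at = 0.0
        # Created lazily on first use, never at import time, so the lock
        # is not tied to an event loop that may already be gone.
        self._lock: Optional[asyncio.Lock] = None

    def reset(self) -> None:
        # Test helper: drop both the cached value and the lock.
        self._value = None
        self._expires_at = 0.0
        self._lock = None

    async def get(self) -> int:
        # Fast path: serve from cache without touching the lock.
        if self._value is not None and time.monotonic() < self._expires_at:
            return self._value
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Second check: another caller may have refilled the cache
            # while this one was waiting for the lock.
            if self._value is not None and time.monotonic() < self._expires_at:
                return self._value
            self._value = await self._fetch()
            self._expires_at = time.monotonic() + self._ttl
            return self._value


async def demo() -> tuple[int, list[int]]:
    calls = 0

    async def fake_total_logs() -> int:
        # Stand-in for a slow repository call (hypothetical).
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return 1234

    cache = TTLCache(fake_total_logs, ttl=1.0)
    results = await asyncio.gather(*(cache.get() for _ in range(50)))
    return calls, list(results)
```

Running asyncio.run(demo()) issues 50 concurrent reads against one cache; only the first caller reaches the fetch function, and the rest return the cached value after the lock is released.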

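Commit 148e51011c pins the presented client certificate to the fingerprint stored for the claimed host, and fails closed when no peer certificate can be extracted. The check itself can be sketched as below. This assumes the stored fingerprint is a hex-encoded SHA-256 of the DER certificate, which the log does not confirm; fingerprint and peer_is_pinned are hypothetical names, and extracting the certificate from the ASGI scope or transport is left out.

```python
import hashlib
import hmac
from typing import Optional


def fingerprint(der_cert: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded certificate (assumed storage format)."""
    return hashlib.sha256(der_cert).hexdigest()


def peer_is_pinned(peer_der: Optional[bytes], stored_fingerprint: Optional[str]) -> bool:
    """Return True only if the presented cert matches the stored pin.

    Fails closed: a missing peer cert or a missing stored pin is a rejection.
    """
    if not peer_der or not stored_fingerprint:
        return False
    presented = fingerprint(peer_der).encode("ascii")
    expected = stored_fingerprint.lower().encode("utf-8")
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(presented, expected)
```

With a pin recorded at enrollment, a decommissioned worker whose row (and pin) has been deleted is rejected even though its certificate still chains to the CA, which is the ghost-shard case the commit message describes.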

68 Commits