Adds /api/v1/swarm-updates/{hosts,push,push-self,rollback} behind
require_admin. Reuses the existing UpdaterClient + tar_working_tree + the
per-host asyncio.gather pattern from api_deploy_swarm.py; tarball is
built exactly once per /push request and fanned out to every selected
worker. /hosts filters out decommissioned hosts and agent-only
enrollments (no updater bundle = not a target).
Connection drops during /update-self are treated as success — the
updater re-execs itself mid-response, so httpx always raises.
Pydantic models live in decnet/web/db/models.py (single source of
truth). 24 tests cover happy paths, rollback, transport failures,
include_self ordering (skip on rolled-back agents), validation, and
RBAC gating.
Adds the server-side wiring and frontend UI to surface files captured
by the SSH honeypot for a given attacker.
- New repository method get_attacker_artifacts (abstract + SQLModel
impl) that joins the attacker's IP to `file_captured` log rows.
- New route GET /attackers/{uuid}/artifacts.
- New router /artifacts/{decky}/{service}/{stored_as} that streams a
quarantined file back to an authenticated viewer.
- AttackerDetail grows an ArtifactDrawer panel with per-file metadata
(sha256, size, orig_path) and a download action.
- ssh service fragment now sets NODE_NAME=decky_name so logs and the
host-side artifacts bind-mount share the same decky identifier.
The per-request SELECT users WHERE uuid=? in require_role was the
hidden tax behind every authed endpoint — it kept _execute at ~60%
across the profile even after the page caches landed. Even /health
(with its DB and Docker probes cached) was still 52% _execute from
this one query.
- dependencies.py: 10s TTL cache on get_user_by_uuid, well below JWT
expiry. invalidate_user_cache(uuid) is called on password change,
role change, and user delete.
- api_get_config.py: 5s TTL cache on the admin branch's list_users()
(previously fetched every /config call). Invalidated on user
create/update/delete.
- api_change_pass.py + api_manage_users.py: invalidation hooks on
all user-mutating endpoints.
Round-2 follow-up: profile at 500c/u showed _execute still dominating
the uncached read endpoints (/bounty 76%, /logs/histogram 73%,
/deckies 56%). Same router-level TTL pattern as /stats — 5s window,
asyncio.Lock to collapse concurrent calls into one DB hit.
- /bounty: cache default unfiltered page (limit=50, offset=0,
bounty_type=None, search=None). Filtered requests bypass.
- /logs/histogram: cache default (interval_minutes=15, no filters).
Filtered / non-default interval requests bypass.
- /deckies: cache full response (endpoint takes no params).
- /config: bump _STATE_TTL from 1.0 to 5.0 — admin writes are rare,
1s was too short for bursts to coalesce at high concurrency.
Every /stats call ran SELECT count(*) FROM logs + SELECT count(DISTINCT
attacker_ip) FROM logs; every /logs and /attackers call ran an
unfiltered count for the paginator. At 500 concurrent users these
serialize through aiosqlite's worker threads and dominate wall time.
Cache at the router layer (repo stays dialect-agnostic):
- /stats response: 5s TTL
- /logs total (only when no filters): 2s TTL
- /attackers total (only when no filters): 2s TTL
Filtered paths bypass the cache. Pattern reused from api_get_config
and api_get_health (asyncio.Lock + time.monotonic window + lazy lock).
Only database, docker, and ingestion_worker now count as critical
(→ 503 unhealthy). attacker/sniffer/collector failures drop overall
status to degraded (still 200) so the dashboard doesn't panic when a
non-essential worker isn't running.
Adds BaseRepository.add_logs (default: loops add_log for backwards
compatibility) and a real single-session/single-commit implementation
on SQLModelRepository. Introduces DECNET_BATCH_SIZE (default 100) and
DECNET_BATCH_MAX_WAIT_MS (default 250) so the ingester can flush on
either a size or a time bound when it adopts the new method.
Ingester wiring is deferred to a later pass — the single-log path was
deadlocking tests when flushed during lifespan teardown, so this change
ships the DB primitive alone.
Locust hit /health and /config on every @task(3), so each request was
firing repo.get_total_logs() and two repo.get_state() calls against
aiosqlite — filling the driver queue for data that changes on the order
of seconds, not milliseconds.
Both caches follow the shape already used by the existing Docker cache:
- asyncio.Lock with double-checked TTL so concurrent callers collapse
into one DB hit per 1s window.
- _reset_* helpers called from tests/api/conftest.py::setup_db so the
module-level cache can't leak across tests.
tests/test_health_config_cache.py asserts 50 concurrent callers
produce exactly 1 repo call, and the cache expires after TTL.
verify_password / get_password_hash are CPU-bound and take ~250ms each
at rounds=12. Called directly from async endpoints, they stall every
other coroutine for that window — the single biggest single-worker
bottleneck on the login path.
Adds averify_password / ahash_password that wrap the sync versions in
asyncio.to_thread. Sync versions stay put because _ensure_admin_user and
tests still use them.
5 call sites updated: login, change-password, create-user, reset-password.
tests/test_auth_async.py asserts parallel averify runs concurrently (~1x
of a single verify, not 2x).
Root cause: test_schemathesis.py mutates decnet.web.auth.SECRET_KEY at
module-level import time, poisoning JWT verification for all other tests
in the same process — even when fuzz tests are deselected.
- Add pytest_ignore_collect hook in tests/api/conftest.py to skip
collecting test_schemathesis.py unless -m fuzz is selected
- Add --dist loadscope to addopts so xdist groups by module (protects
module-scoped fixtures in live tests)
- Remove now-unnecessary xdist_group markers from live test classes
The live test modules set DECNET_CONTRACT_TEST=true at module level,
which persisted across xdist workers and caused the mutate endpoint
to short-circuit before the mock was reached. Clear the env var in
affected tests with monkeypatch.delenv.
- Fixed CLI tests by patching local imports at source (psutil, os, Path).
- Fixed Collector tests by globalizing docker.from_env mock.
- Stabilized SSE stream tests via AsyncMock and immediate generator termination to prevent hangs.
- Achieved >80% coverage on CLI (84%), Collector (97%), and DB Repository (100%).
- Implemented SMTP Relay service tests (100%).
- Rebuild repo.engine and repo.session_factory per-test using unique
in-memory SQLite URIs — fixes KeyError: 'access_token' caused by
stale session_factory pointing at production DB
- Add @pytest.mark.fuzz to all Hypothesis and Schemathesis tests;
default run excludes them (addopts = -m 'not fuzz')
- Add missing fuzz tests to bounty, fleet, histogram, and repository
- Use tmp_path for state file in patch_state_file/mock_state_file to
eliminate file-path race conditions under xdist parallelism
- Set default addopts: -v -q -x -n logical (26 tests in ~7s)