Commit Graph

1062 Commits

Author SHA1 Message Date
9e8d0b0464 fix(ui): route palette drops + design-time remove through live API on active topologies
When topoStatus is active/degraded, editor.updateDecky enqueues into
the mutator queue and returns {kind:'enqueued'}.  The palette-drop
handler then short-circuits on that and never updates local state, so
a service dragged onto a deployed decky just vanishes — what ANTI saw
as 'no way to APPLY'.

Same gap on the design-time 'REMOVE SERVICE' button in the Inspector's
service detail panel: enqueue + no local update = chip stays.

Both now route through liveAddService / liveRemoveService when the
topology is active, hitting POST/DELETE /topologies/{id}/deckies/{name}/services
directly and patching local state from the response.  Pending
topologies still queue through the mutator (correct: no live
containers to mutate).

Hoisted serviceRegistry / liveAddService / liveRemoveService above
the palette-drop callback so the deps array doesn't trip the const
TDZ at render time.
2026-04-28 23:38:37 -04:00
463877b8fc fix(ui): hit /topologies/ with trailing slash to keep bearer
FastAPI's redirect_slashes=True 307s /topologies → /topologies/, and
the browser drops Authorization on the redirected URL — the topology
picker in the canary create modal was landing as 401 even for admins.
Hit the canonical (trailing-slash) path so the request resolves on the
first hop.
2026-04-28 23:18:39 -04:00
0e5484648f feat: forward decky.*.service.* on per-topology SSE stream
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:

* topology.{id}.>  — lifecycle (status, mutation.*) — unchanged.
* decky.>          — per-decky events, filtered by payload.topology_id
                     so a fleet decky sharing a name with a topology
                     decky doesn't leak across.

_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').

useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
2026-04-28 23:15:38 -04:00
e7d49d7237 feat(ui): live service add/remove on fleet DeckyCard
DeckyCard grows the same per-chip × + dashed '+ ADD' affordances we
just shipped on the MazeNET Inspector.  Wired to POST/DELETE
/api/v1/deckies/{name}/services{,/svc}; the response's services list
flows back through onServicesChanged to update the parent's deckies
state without a refetch.

Gated on isAdmin && !decky.swarm — swarm deckies live on a remote
agent and the W3 endpoint runs docker compose locally, same gap as
the canary planter has for agent-pinned topologies.  Out of scope
here; flagged as a known limitation.

stopPropagation on the inline buttons + add-row container keeps the
card-level click (which selects the decky for inspection) from firing
on intra-row interactions.
2026-04-28 23:13:46 -04:00
1a631c9400 fix(ui): narrow services type for Inspector live-add picker
ObservedNode.services is the literal tuple ['*']; narrowing inside the
.filter() callback was tripping TS2345.  We already gate the live
controls on node.kind !== 'observed', so casting to readonly string[]
inside the filter is safe and keeps the discriminated union strict
elsewhere.
2026-04-28 23:11:39 -04:00
2fabcd1c29 feat(ui): live service add/remove on MazeNET Inspector
When the topology is active/degraded the Inspector switches services
chips into live controls: each chip gets a × button that DELETEs to
the W3 endpoint, and a dashed '+ ADD' chip opens a typeahead picker
fed by useServiceRegistry().perDecky.

Pending topologies still use the existing design-time path
(onRemoveService → editor.updateDecky); the Inspector picks based on
topologyStatus, so an operator never accidentally hits a live API
call against a topology that isn't deployed yet.

The mutation handlers in MazeNET.tsx hit POST/DELETE
/api/v1/topologies/{id}/deckies/{name}/services{,/svc} and
optimistically apply the response's services list to local state.
Cross-tab reconciliation rides on the SSE forwarder shipped in the
follow-up commit.
2026-04-28 23:11:02 -04:00
06f208c86e feat: surface fleet_singleton flag on /topologies/services
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).

The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.

decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
2026-04-28 23:08:29 -04:00
4287e94deb feat(ui): file drops tab on CanaryTokens
CanaryTokens.tsx grows a third tab — File drops — alongside Tokens
and Blobs.  The page now covers every 'admin landed bytes on a decky'
operation in one place.

FileDropModal mirrors the canary CreateModal's shape: Fleet/MazeNET
toggle, topology+decky picker, absolute-path validation matching the
backend (DeckyFileDropRequest rejects relative + ..-traversal), mode
+ mtime offset inputs, and a -1w preset for backdating.  FileReader →
data URL → strip prefix → POST /api/v1/deckies/files.

The list is local-only (localStorage, capped at 200 entries).  W2's
backend doesn't persist drops by design — the endpoint is for staging
payloads, not as an audit trail.  CLEAR LIST button on the tab; no
DELETE button on rows since the local entry doesn't track whether the
file is still there (an attacker may have moved it).

Alt+D shortcut joins Alt+C; alt-key only per the Linux-meta-key rule.
2026-04-28 23:06:53 -04:00
c942d4d333 feat(ui): scope canary tokens to MazeNET topology deckies
CanaryTokens.tsx grows a Fleet/MazeNET toggle in the create modal.  In
topology mode we hydrate /topologies?status=active for the topology
picker, then GET /topologies/{id} on selection to repopulate the decky
picker — topology deckies have a different shape than fleet's /deckies
endpoint.

The tokens table gains a SCOPE column (chip: 'fleet' / 'topology'),
and a third filter dropdown alongside state.  The drawer's metadata
section shows a Scope row with a clickable jump-link back to the
MazeNET view at the right topology.

CanaryTokenRow grows a topology_id field so the drawer/list can
discriminate without re-fetching.
2026-04-28 23:04:13 -04:00
6ac8cac908 feat(deckies): live service add/remove without full redeploy
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes.  The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:

* add: validate against decnet.services.registry (rejects unknown +
  fleet_singleton); persist the new services list; re-render the
  per-scope compose file (so future redeploys reflect the change);
  run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
  compose so a future up -d doesn't bring it back.

Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list.  Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).

Four new admin endpoints:

* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}

ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
2026-04-28 22:51:42 -04:00
0bc4b05c73 feat(deckies): generic file drops on fleet + MazeNET deckies
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.

Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.

Path validation rejects relative paths and '..' traversal at the request
model layer.  Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
2026-04-28 22:43:34 -04:00
3fe999d706 feat(canary): allow custom canaries on MazeNET deckies via API
POST /api/v1/canary/tokens grows an optional topology_id field.  When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container.  Absent ⇒ fleet semantics, unchanged.

The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy).  GET /api/v1/canary/tokens accepts ?topology_id= as
a filter.  DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.

422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.
2026-04-28 22:34:45 -04:00
5802de1f86 feat(canary): seed baseline canaries on MazeNET deckies
Topology deploys now plant the configured canary baseline set on every
decky in the topology, mirroring the fleet-deploy hook. Containers are
resolved via resolve_topology_container — <decky>-ssh when the decky
exposes an ssh service, else the topology base container
decnet_t_<id8>_<decky>.

The planter's plant/revoke/seed_baseline grow an optional container=
kwarg; default preserves the fleet <name>-ssh resolution.
2026-04-28 22:30:11 -04:00
04b0637c24 feat(bounty): wire artifact download into BountyInspector drawer
The Vault page already shows file drops and stored mail (e3ddeb0) but
the inspector drawer had no download button — only the live-feed
ArtifactDrawer/MailDrawer offered raw byte retrieval. Add a DOWNLOAD
RAW action to BountyInspector that fires when bounty_type=artifact,
hitting /artifacts/{decky}/{stored_as}?service=<svc> with the bounty's
own service field (ssh or smtp). Mirrors ArtifactDrawer's blob handling
and 400/403/404 error mapping.

Also widen the icon/label vocabulary: artifact bounties get FileText
(file drops) or Mail (message_stored) instead of the generic Package,
and the inspector header chip mirrors the change.
2026-04-28 22:03:58 -04:00
e3ddeb0395 feat(bounty): surface file drops and stored mail in the Vault
The Bounty Vault page only read from the Bounty table, but
inotifywait-captured file drops (event_type=file_captured) and SMTP
quarantined messages (event_type=message_stored) were only landing in
the Logs table. AttackerDetail's tabs queried logs directly, so they
showed up per-attacker but were invisible on the global Vault page.

Mirror both events into Bounty as bounty_type=artifact with
payload.kind ∈ {file, mail} so the existing dedup
(bounty_type, attacker_ip, payload) collapses repeats by sha256. Add an
ARTIFACTS segment to the Vault filter row, plus dedicated render
branches: file drops show orig_path + size + writer attribution; mail
shows subject + From + attachment count + size, with the Mail icon
distinguishing them from FileText for file drops.

Forward-only — existing logs stay where they are. A backfill pass would
be straightforward (read Log WHERE event_type IN ('file_captured',
'message_stored') and feed each row through _extract_bounty) but is out
of scope here.
2026-04-28 19:42:54 -04:00
88f276e9e7 feat(collector): drop native unix daemon syslog from ingestion
sshd, pam_unix, sudo, CRON, systemd, kernel, rsyslogd, and dbus-daemon
all share the SSH/telnet decky containers and write to the same syslog
socket as DECNET's own emitters. Their output was being parsed and
ingested into the JSON stream, the dashboard, and the profiler — pure
noise: sshd's "Failed password for root from X" duplicates the
auth-helper's structured auth_attempt event, pam_unix repeats it again,
CRON/systemd say nothing about attacker behavior.

Drop these APP-NAMEs in _should_ingest before the JSON write and bus
publish. Raw .log file still captures everything for forensics. The
denylist is overridable with DECNET_COLLECTOR_DROP_APPS so operators
can extend it without code changes.
2026-04-28 19:21:39 -04:00
6055f9c837 fix(deckies): set MSGID=command on bash PROMPT_COMMAND syslog lines
Add --rfc5424 --msgid command to the logger invocation in SSH and telnet
decky bashrc. MSGID arrives as "command" instead of NIL, which is what
the profiler's _COMMAND_EVENT_TYPES filter expects. The parser heuristic
shipped in d4591b3 stays as a safety net for any future emitter that
forgets the flags or for inflight pre-rebuild containers.
2026-04-28 19:12:11 -04:00
d4591b38dc fix(profiler): aggregate bash PROMPT_COMMAND lines into attacker profile
SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"`
which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving
event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter
silently dropped them — the IP profile existed but no command transcripts
or artifacts. Confirmed in the wild: 44/48 events from one attacker were
event_type="-".

Rewrite event_type to "command" in both parsers when MSGID=NIL and the
msg starts with "CMD ". Correlation parser also extracts the cmd= payload
into fields["command"] so the profiler can build the transcript; collector
parser leaves fields={} to avoid duplicate pills in the dashboard.
2026-04-28 19:09:41 -04:00
862e4dbb31 merge: testing → main (reconcile 2-week divergence) 2026-04-28 18:36:00 -04:00
8033137be6 dev(ci): simplified pipeline
All checks were successful
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Merge testing → main (push) Has been skipped
CI / Lint (ruff) (push) Successful in 14s
CI / SAST (bandit) (push) Successful in 22s
CI / Dependency audit (pip-audit) (push) Successful in 25s
CI / Merge dev → testing (push) Successful in 10s
2026-04-28 18:16:26 -04:00
17097cc3dc chores(order): moved some markdown files to the development folder 2026-04-28 17:56:29 -04:00
89268f19fb dev(ci): modified ruff call to not check dev scripts/
All checks were successful
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 20s
CI / Dependency audit (pip-audit) (push) Successful in 23s
CI / Test (Standard) (3.11) (push) Successful in 13m27s
CI / Test (Live) (3.11) (push) Successful in 1m17s
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
CI / Merge dev → testing (push) Successful in 11s
2026-04-28 17:47:28 -04:00
150e5eb5be docs(readme): add Ko-Fi button
Some checks failed
CI / Lint (ruff) (push) Failing after 16s
CI / SAST (bandit) (push) Successful in 24s
CI / Dependency audit (pip-audit) (push) Successful in 27s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-28 16:15:02 -04:00
f2e01d8ea6 dev(ci): modified ci to not run fuzzing tests; done locally 2026-04-28 16:13:32 -04:00
15b2e7ba5c refactor(db): split credentials.py into a credentials/ subpackage
Splits the 459-line credentials.py into two submixins plus a composing
CredentialsMixin in credentials/__init__.py:

  _core.py    (~190)  Credential capture: upsert, list, filters,
                      per-attacker / per-secret reads, attacker_uuid
                      backfill
  reuse.py    (~270)  CredentialReuse correlation: upsert, candidate
                      mining, list/get + the _enrich_with_secret helper
                      that lifts the printable/b64 from underlying rows

_merge_unique stays with reuse.py (its only caller).
_enrich_with_secret stays with reuse.py — it's an internal helper of
list_credential_reuses / get_credential_reuse_by_id, never called
from the capture path.
2026-04-28 16:05:57 -04:00
3d00de8fd3 refactor(db): split attackers.py into an attackers/ subpackage
Splits the 494-line attackers.py into five submixin files plus a
composing AttackersMixin in attackers/__init__.py:

  _core.py       (~95)   Attacker CRUD + _deserialize_attacker
  behavior.py    (~110)  AttackerBehavior + _deserialize_behavior
  sessions.py    (~50)   SessionProfile read/write
  smtp.py        (~70)   SmtpTarget per-attacker + cross-attacker views
  activity.py    (~190)  log-derived activity (commands, leaks,
                         artifacts, stored mail, session log, transcripts)

IdentitiesMixin.list_observations_for_identity calls
self._deserialize_attacker; MRO resolves it onto AttackersCoreMixin
through the composed SQLModelRepository class.
2026-04-28 15:46:28 -04:00
5e7d68fde3 refactor(db): split topology.py into a topology/ subpackage
Splits the 694-line topology.py into five submixin files plus a
composing TopologyMixin in topology/__init__.py:

  _core.py       (~225)  topologies CRUD + _assert_pending /
                         _check_and_bump_version concurrency guards
  lans.py        (~115)  LAN CRUD
  deckies.py     (~130)  topology decky CRUD + list_running_topology_deckies
  edges.py        (~80)  edge CRUD + status-event log
  mutations.py   (~165)  live reconciler queue (atomic claim + state writes)

Sibling submixins call self._assert_pending and
self._check_and_bump_version; MRO resolves them onto TopologyCoreMixin
through the composed SQLModelRepository class.
2026-04-28 15:16:42 -04:00
20e89eb0a6 refactor(db): extract TopologyMixin
Moves the 31 MazeNET topology methods (topologies CRUD, LANs, deckies,
edges, status events, mutation queue) into sqlmodel_repo/topology.py.
Includes _assert_pending and _check_and_bump_version concurrency
guards.

This is the last domain extraction; sqlmodel_repo/__init__.py is now
~165 lines: lifecycle (initialize/reinitialize/migrations), the admin
self-heal seed, get_state/set_state, and the mixin composition.
2026-04-28 15:11:14 -04:00
7483d01311 refactor(db): extract IdentitiesMixin and CampaignsMixin
Splits the AttackerIdentity and Campaign clustering reads/writes into
sqlmodel_repo/identities.py and sqlmodel_repo/campaigns.py.

Both call _deserialize_attacker (identities only) which resolves
through AttackersMixin via MRO.
2026-04-28 15:07:39 -04:00
912171d053 refactor(db): extract AttackersMixin
Moves the 19 attacker-domain methods (core CRUD, behavior, sessions,
smtp targets, log-derived activity views) plus the _deserialize_attacker
and _deserialize_behavior helpers into sqlmodel_repo/attackers.py.
2026-04-28 15:04:51 -04:00
7ba8bafcaa refactor(db): extract CredentialsMixin
Moves the 12 credential and credential-reuse methods (incl. the
_merge_unique and _enrich_with_secret helpers) into
sqlmodel_repo/credentials.py.
2026-04-28 15:00:04 -04:00
5b1af331b9 refactor(db): extract CanaryMixin
Moves the 13 canary blob/token/trigger methods into
sqlmodel_repo/canary.py.
2026-04-28 14:55:52 -04:00
03b3c8855c refactor(db): extract OrchestratorMixin
Moves the 9 orchestrator event/email log + prune methods into
sqlmodel_repo/orchestrator.py.
2026-04-28 14:54:20 -04:00
555cd13f09 refactor(db): extract RealismMixin
Moves the 8 synthetic-file + realism-config methods into
sqlmodel_repo/realism.py.
2026-04-28 14:52:59 -04:00
9b845269c9 refactor(db): extract LogsMixin
Moves the 8 log methods (incl. get_stats_summary aggregator) into
sqlmodel_repo/logs.py. get_log_histogram remains an abstract dialect
override point; sqlite/mysql subclasses still override it via MRO.
2026-04-28 14:51:35 -04:00
a0aeba5abc refactor(db): extract FleetMixin and promote JSON helpers
Moves the 6 fleet-decky methods (incl. cross-source list_running_deckies
aggregator) into sqlmodel_repo/fleet.py. _serialize_json_fields and
_deserialize_json_fields move to _helpers.py since they're shared
across fleet, topology, and canary.
2026-04-28 14:50:01 -04:00
d989cd0461 refactor(db): extract WebhooksMixin
Moves the 9 webhook-subscription methods (CRUD + delivery
bookkeeping) into sqlmodel_repo/webhooks.py.
2026-04-28 14:47:42 -04:00
167f140b0e refactor(db): extract BountiesMixin
Moves the 5 bounty methods plus the cross-table purge_logs_and_bounties
helper into sqlmodel_repo/bounties.py.
2026-04-28 14:46:39 -04:00
c6804d79b6 refactor(db): extract DeckiesMixin
Moves the 4 decky-shard CRUD methods into sqlmodel_repo/deckies.py.
2026-04-28 14:45:15 -04:00
eebf9e4c97 refactor(db): extract AuthMixin
Moves the 7 user CRUD methods into sqlmodel_repo/auth.py.
_ensure_admin_user stays in __init__.py so DECNET_ADMIN_PASSWORD
remains addressable at the module path tests already monkeypatch.
2026-04-28 14:43:49 -04:00
99adbebe75 refactor(db): extract SwarmMixin
Moves the 7 swarm-host CRUD methods into sqlmodel_repo/swarm.py.
2026-04-28 14:42:58 -04:00
85c914e754 refactor(db): extract AttackerIntelMixin
Moves upsert_attacker_intel, get_attacker_intel_by_uuid,
and get_unenriched_attackers into sqlmodel_repo/attacker_intel.py.
Composed onto SQLModelRepository via mixin inheritance.
2026-04-28 14:40:36 -04:00
e16f47ad24 refactor(db): extract _safe_session/_detach_close to _helpers.py
Module-level session helpers move into sqlmodel_repo/_helpers.py.
__init__.py re-exports them so external import paths
(decnet.web.db.sqlmodel_repo._safe_session) keep resolving.
2026-04-28 14:38:26 -04:00
4167345d51 refactor(db): convert sqlmodel_repo.py to a package
Pure rename — the old monolithic 3505-line file becomes
decnet/web/db/sqlmodel_repo/__init__.py. No code changes.
Subsequent commits will extract per-domain mixins out of __init__.py
to mirror the topical layout used by decnet/web/db/models/.
2026-04-28 14:37:18 -04:00
6d8c90777d chore: remove vulture-flagged dead code, add whitelist
- plain.py: drop `or True` short-circuit + unreachable return; drop now-unused _HASH_HINTS
- ingester.py: drop unused `current_position` param from _flush_batch
- vulture_whitelist.py: document remaining false positives (FastAPI Depends side-effects, IMAP uid_mode where UID==seq)
2026-04-28 14:30:12 -04:00
b994250ef6 dev(ci): added CVE-2026-3219 to ignore vulns; no fix is yet available
Some checks failed
CI / Lint (ruff) (push) Successful in 13s
CI / SAST (bandit) (push) Successful in 21s
CI / Dependency audit (pip-audit) (push) Successful in 34s
CI / Test (Standard) (3.11) (push) Successful in 13m21s
CI / Test (Live) (3.11) (push) Successful in 1m45s
CI / Test (Fuzz) (3.11) (push) Failing after 3h1m24s
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
2026-04-28 14:24:57 -04:00
b4adc7246f fixed: deleted line from pyproject
Some checks failed
CI / Lint (ruff) (push) Successful in 17s
CI / SAST (bandit) (push) Successful in 24s
CI / Dependency audit (pip-audit) (push) Failing after 32s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-28 13:03:14 -04:00
674ac7dd13 test(db): cover BaseRepository.update_identity_fingerprints
DummyRepo couldn't instantiate — TLS-cert fingerprint rollup added a new
abstract method without a stub here. Add the override and a call site so
the abstract pass body is hit.
2026-04-28 13:01:37 -04:00
cc6abf7256 fix(tests/stress): eliminate 0-request flakes in locust runs
Three independent issues conspired to make stress tests record 0 requests:

1. Every virtual user did /auth/login in on_start. With 1000 users in a
   spike window, bcrypt-bound logins never finished and on_start failed
   for all users — aggregated requests stayed at 0. Pre-fetch a single
   admin token in the fixture (cached per-host) and pass it via
   DECNET_STRESS_TOKEN so locust users skip the login storm.

2. Locust exits non-zero on any request failure by default, causing
   run_locust to throw away an otherwise valid stats CSV. Pass
   --exit-code-on-error 0 so per-test assertions are the only fail gate.

3. test_stress_sustained ran two locust subprocesses against the same
   uvicorn. Phase 1's keep-alive connections wedged phase 2 into 0
   recorded requests ~2/3 of the time. Refactored stress_server into a
   start_stress_server() context manager and gave each phase its own
   uvicorn.

Stable 3/3 on full suite, 3/3 on test_stress_sustained alone.
2026-04-28 13:01:11 -04:00
681931d9bb docs(roadmap): tick certificate details and three sibling roadmap items 2026-04-28 11:41:17 -04:00