1491 Commits

Author SHA1 Message Date
af615f8d44 Merge release/1.1.1: ttp test fixes (v1.1.1)
Corrects stale confidence_max ceiling tests + documented topics set.
Test-only patch, no production change.
v1.1.1
2026-06-18 19:23:48 -04:00
d1974ca6f6 release: bump to v1.1.1 (test-only patch)
Corrects stale confidence_max ceiling tests + documented-topics set.
No production code change.
2026-06-18 19:23:39 -04:00
a26dfe4d47 fix(ttp): correct stale clip tests to ceiling semantics + document ATTACKER_FINGERPRINTED topic
confidence_max is a ceiling (min(base, ceiling)), not a multiplier — the
ASVS pass fixed this (BUG-8: min(base, base*ceiling) -> min(base, ceiling)),
but 4 lifter clip tests still encoded the old base*ceiling math (0.45/0.4/
0.35) and were masked by the make test-web bundle error fail-fast. All four
now assert the 0.5 ceiling. Separately, test_topics_matches_documented_set
lacked attacker.fingerprinted, which worker.py legitimately subscribes to
(JARM/HASSH/tcpfp/ipv6_leak -> TTP tagging). Located via turbovec + git pickaxe.

(cherry picked from commit f83b467c35649a06fa36f4b350e6666379cd71cb)
2026-06-18 19:22:54 -04:00
3a3392bdee Merge release/1.1: worker consolidation (v1.1.0)
batch + cpu supervisor groups, verified live: 2.57GB -> 1.83GB (-737MB).
Prefork deferred to 1.2.0.
v1.1.0
2026-06-18 18:52:53 -04:00
c7d5f3a086 release: bump to v1.1.0; add CHANGELOG
Worker consolidation release. batch + cpu supervisor groups (verified
live, -737MB / 2.57GB->1.83GB). Prefork (process-model change) deferred
to 1.2.0.
2026-06-18 18:52:31 -04:00
bce2c1940c feat(1.1): supervise cpu group with ProcessPoolExecutor kernel offload
Hosts clusterer/campaign-clusterer/attribution/reuse-correlate in one
process. The two O(n^2) connected-components kernels (cluster_observations,
cluster_identities) offload to ONE shared forkserver pool via decnet.offload
.run_kernel, so they run in parallel instead of serialising under the GIL.

- offload.run_kernel: pool when installed + offload_if holds, else inline.
  Standalone workers and all tests run inline => behaviour unchanged
  (424 clustering/correlation tests green).
- offload_if gates on input size (>=256) to skip pickle cost on small passes.
- forkserver (not fork): supervisor is multithreaded via bus clients.
- attribution/reuse co-located but not offloaded yet (lighter; same run_kernel
  path extends to them if profiling shows contention).
- systemd unit Conflicts= the 4 units it replaces; no docker/raw-socket priv.
2026-06-17 17:35:42 -04:00
6d7d2c0e24 chore: stop tracking development/ (internal design notes) 2026-06-17 17:29:21 -04:00
3df9770cec docs(1.1): batch group VERIFIED LIVE — 509MB -> 129MB (-380MB / 75%)
Controlled unit swap on mothership: single PID hosts all 4 workers,
shared repo + 30s reconcile pass confirmed working, no crashes. Live
delta beat the floor estimate. Batch group complete.
2026-06-17 17:28:18 -04:00
b1cda1b015 fix(1.1): add decnet-enrich.service to batch supervisor Conflicts=
enrich is a batch-group member; its individual unit must also be mutually
exclusive with the supervisor. Unit auto-renders via init.py glob of
deploy/decnet-*.service.j2 — no installer list change needed.
2026-06-17 17:26:09 -04:00
5b56c66bb5 docs(1.1): measured batch-group RAM delta (-350MB / ~70%)
4 live batch workers = 509MB; consolidated startup floor = 118.5MB
(imports 102.5 + repo pool 15.7 once vs 4x + bus 0.3). Side-effect-free
measurement (no mutation tick). Floor is a lower bound; live adds modest
per-worker working set. Exact live number pending controlled unit swap.
2026-06-17 16:57:21 -04:00
805e2b33fc docs(1.1): mark C5 batch group shipped, live-swap verification pending 2026-06-17 16:50:43 -04:00
3a46864f30 feat(1.1): decnet supervise batch group + systemd unit (C5)
Hosts reconcile/enrich/orchestrate/mutate in one process via the
supervision primitive: one import floor, one shared repo/DB pool instead
of 4. Static group registry (membership is architectural, not a knob);
factories lazy-import only the hosted workers. systemd unit Conflicts=
the individual units it replaces and documents the union-of-privileges
cost. Worker code unchanged — any member is extractable by editing _build_specs.
2026-06-17 16:50:09 -04:00
12aaa9d820 feat(1.1): in-process worker supervision primitive (C5)
supervise(): per-worker restart loop with exponential backoff (in-process
Restart=on-failure). run_group(): hosts workers as concurrent independently-
supervised tasks — one crash never cancels siblings (deliberately NOT
asyncio.TaskGroup, whose all-or-nothing cancel breaks isolation). SIGTERM/
SIGINT → graceful cancel-and-await. Tests cover restart, clean-exit,
crash-isolation, shutdown, empty group.
2026-06-17 16:48:32 -04:00
23075bcdcd docs(1.1): correct grouping to co-residency reality
forwarder/listener are role-split swarm singletons, not co-resident with
the herd — drop from grouping. Real master-resident batch group shares
the repo singleton (one DB pool when consolidated). Stage 1 = batch:
reconcile/enrich/orchestrate/mutate. webhook/canary deferred standalone.
2026-06-17 16:47:13 -04:00
fc43909221 docs(1.1): consolidation design — supervise by failure domain
HOW to consolidate: supervision-loop primitive (not TaskGroup, whose
all-or-nothing cancel breaks isolation); group by failure domain +
resource profile keeping per-group cgroup limits; every worker remains
config-extractable. Recommend process-groups now (~18->~9 units),
evaluate prefork+gc.freeze CoW on 3.14 as the higher-ceiling follow-on.
2026-06-17 16:40:19 -04:00
87eb986467 docs(1.1): correct strategy — consolidation is primary lever (C3)
Measurement after C2: all 25 CLI modules transitively pull the SQLModel
ORM, and the table chain is a hub (one table loads the whole registry).
Most workers genuinely use the DB, so lazy imports only help the 2-3
DB-less workers. Consolidation (pay the 86MB floor once for the herd)
is the reliable ~600MB win — promoted from fallback to primary.
2026-06-17 16:36:04 -04:00
4e2b1cdaf3 perf(1.1): lazy topology.generate re-export (C2)
topology/__init__ eagerly imported generator -> allocator -> repository ->
the full SQLModel ORM. Defer via PEP 562 __getattr__ so importing the
package doesn't drag the DB layer into DB-less workers. Public API
(from decnet.topology import generate) unchanged. Guard test locks it in.
2026-06-17 16:35:30 -04:00
825d7d72c9 docs(1.1): RAM footprint analysis + release plan
Fleet resident set ~2.57GB across 18 workers; ~1.5GB is the 86MB import
floor paid 18x. Pinned root cause: topology/__init__ eager re-export of
generate drags the full SQLModel ORM (26 tables, ~38MB) into every worker.
2026-06-17 16:32:54 -04:00
70e566db42 Merge feat/pro-cli-surface: pro CLI/daemon extension surface 2026-06-17 15:21:07 -04:00
62f5fb652e feat(pro): add pro CLI/daemon extension surface
The core CLI scans decnet/pro/cli/ and calls each module's register(app),
registered before the master-only gate so pro commands are mode-filtered like
the rest. Lets the Professional tier add commands and standalone daemon entry
points (decnet pro-<cmd> serve, supervised by a systemd unit). No-op in the
Community build (no decnet.pro). Test asserts the shipped pro group registers
when mounted; skips otherwise.
2026-06-17 15:21:06 -04:00
dd1b754f65 Merge feat/pro-extension-surfaces: multi-surface pro extension points 2026-06-17 15:02:28 -04:00
a47f99c449 feat(pro): generalize pro tier to multi-surface extension points
Move the pro mount decnet/services/pro/ -> decnet/pro/ so the Professional tier
can contribute to more than honeypots. The core wires each surface only when
decnet/pro/ is present (absence stays the entitlement gate):

* services  — registry scans decnet/pro/services/ (was decnet/services/pro/)
* API routes — decnet/pro/routes.py exposes ROUTERS, mounted under /api/v1
* web pages  — Vite aliases @pro to the pro frontend (community -> empty stub),
               App.tsx maps proRoutes into <Route>s, Layout renders a
               PROFESSIONAL nav group; both tree-shake out of the community build

Frontend gate mirrors the existing VITE_DECNET_DEVELOPER tree-shake pattern.
Tests: registry + router seams (backend), empty-stub contract (frontend).
2026-06-17 15:02:28 -04:00
80c92a6f80 Merge feat/open-core-tiers: community/professional tier split + dual-license 2026-06-17 13:23:53 -04:00
777606681e fix(tests): drop illegal @pytest.mark.anyio on anyio_backend fixture
Newer pytest raises 'Marks cannot be applied to fixtures' instead of ignoring
it. The async test methods already carry @pytest.mark.anyio, which is what
selects the backend; the fixture must not.
2026-06-17 13:22:35 -04:00
d90bc81060 feat(services): open-core community/professional tier split
Pro-tier honeypots load from an optional decnet/services/pro/ subpackage that
the registry auto-discovers when present; the Community build omits it, so the
directory's absence IS the entitlement gate (no runtime licence check). Recurse
subclasses so a pro service may extend a community one. Exclude pro from the
community wheel and git-ignore the path (it lives in the private
decnet-professional repo).

Add LICENSING.md documenting the dual-license: AGPL-3.0-or-later core plus a
commercial EULA for the Professional tier.
2026-06-17 13:22:35 -04:00
09b6a832ee docs(readme): add logo banner + badges to header 2026-06-16 19:13:17 -04:00
e95acbd4f2 Merge dev into main for v1.0.0 release 2026-06-16 19:04:04 -04:00
36353026a6 release: bump to v1.0.0; add PyPI metadata; stop globbing decnet_web into packages
Some checks failed
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Merge testing → main (push) Has been skipped
CI / Lint (ruff) (push) Successful in 15s
CI / Dependency audit (pip-audit) (push) Failing after 22s
CI / SAST (bandit) (push) Successful in 30s
CI / Merge dev → testing (push) Has been skipped
v1.0.0
2026-06-16 18:59:48 -04:00
a5cb20051f test(mazenet): add useMazeApi tests 2026-06-16 18:56:13 -04:00
a9c3f42ef9 feat(mazenet): topology editor updates; refine mutator ops materialisation 2026-06-16 18:55:20 -04:00
c9e4bf4022 feat(clustering): link identities by keystroke-rhythm proximity
Campaign clusterer gains a keystroke edge: when two identities'
kd_digraph_simhash centroids are within KD_HAMMING_MAX bits, a graded
weight (1.0 at identical, fading to 0 at the cutoff) feeds the campaign
graph. Supporting tier (0.6) — a typing match plus temporal overlap
reaches threshold, but typing alone never merges (FP guard against
coarse, noisy terminal timing).

Projects the column through IdentityFeatures + from_identity_row.
2026-06-16 17:09:42 -04:00
869d1eabb7 feat(clustering): roll session digraph SimHashes into identity centroid
The identity clusterer folds an identity's per-session
motor.digraph_simhash observations into one 8-byte bitwise-majority
centroid (denoises per-session jitter) and writes it to
AttackerIdentity.kd_digraph_simhash via update_identity_fingerprints —
the orphaned column is now populated. list_identities_for_clustering
projects it so the campaign clusterer can read it.

Extends the repo abstract + DummyRepo stub/coverage.
2026-06-16 17:05:34 -04:00
66c73ce59d feat(profiler): extract motor.digraph_simhash keystroke biometric
Per-session 64-bit SimHash of inter-keystroke digraph flight times:
walk single-char input events, accumulate flight time per (c1,c2),
bucket the median, Charikar-SimHash the bucketed pairs. Locality-
sensitive so the same typist is Hamming-close across sessions; pastes
and think-pauses break the chain; silent below the sample-size floor.

New shared decnet/util/simhash.py (simhash64/hamming64/bytes helpers).
Registered as a conditional Tier-A primitive (count 37->38); requires
behave-shell>=0.1.2.
2026-06-16 16:59:57 -04:00
372375194c refactor(db): run Alembic at boot, retire ad-hoc _migrate_* helpers
initialize() now delegates to _apply_schema(): real boots run
'alembic upgrade head' (schema owned by the migration history); tests
(DECNET_TESTING=1) keep create_all, which is faster and needs no upgrade
path. MySQL wraps the upgrade in the existing GET_LOCK advisory lock so
concurrent uvicorn workers don't race on DDL.

Deletes the three _migrate_* crimes (attackers-table legacy drop +
GeoIP backfill, TEXT->MEDIUMTEXT widening) — all now handled by the
baseline migration and the _BIG_TEXT model variants. Drops the test
file that only exercised the deleted helpers; adds tests pinning the
alembic-vs-create_all gate and guarding that every model table is in
the migration head.
2026-06-16 16:31:10 -04:00
ef4d67cbef build(db): add Alembic scaffolding + baseline migration
Introduce Alembic at v1. Migrations live inside the package
(decnet/web/db/migrations) so they ship with installs; alembic.ini at the
repo root drives the CLI. env.py is async and dual-backend, selecting the
engine from DECNET_DB_TYPE (mirroring db/factory.py) and reusing the app's
own connection when run programmatically.

The baseline captures all 39 tables. _BIG_TEXT round-trips as
Text().with_variant(MEDIUMTEXT, 'mysql'), so both backends get the right
column type from the migration. kd_digraph_simhash gains a sqlite BLOB
variant: BINARY(8) reflects as NUMERIC on SQLite and would otherwise trip
'alembic check' forever.
2026-06-16 16:30:29 -04:00
4f141c1a54 feat(web): stage live MazeNET edits behind an UPDATE button
Live topology edits fired one mutation per canvas action. That coupled
each edit to an immediate enqueue+apply, which (post-serialization)
raced the SSE refetch and duplicated optimistic placeholders, and gave
the user no chance to assemble a coherent changeset (add a net AND
bridge it) before any of it landed.

Live edits now STAGE: each editor primitive records its op and returns
immediately; the optimistic placeholders callers already draw are the
staged preview. The action button reads UPDATE (n) when live (DEPLOY
when pending) and flushes the batch through the slice-1 submit queue —
sequential, version-cursored, each awaited to a terminal state, stopping
loudly on the first failure with the unapplied remainder kept for retry.
REFRESH becomes DISCARD (n) to drop the batch. SSE refetch is paused
during a commit so per-mutation applied events don't wipe still-staged
placeholders mid-batch; one refetch reconciles at the end.

Also fix _dropArchetype, which bailed without an optimistic node on the
staged path, leaving a decky added to an uncommitted LAN invisible until
UPDATE.
2026-06-16 12:59:57 -04:00
f18bfee746 fix(web): serialize live topology mutations + surface failures loudly
Live MazeNET edits fired their mutations fire-and-forget: each canvas
action enqueued immediately and never awaited the result. Two failures
followed from that:

- expected_version is bumped at ENQUEUE (not at apply), so two ops fired
  back-to-back raced — the second carried a stale version and 409'd.
  Edits only worked when hand-paced (an SSE refetch landed between them).
- A failed mutation degrades the topology, but the only signal was a 4s
  toast, so the user saw DEGRADED with no cause.

useTopologyEditor now routes every live op through a serialized submit
queue: one enqueue in flight at a time (submission order preserved), an
optimistic expected_version cursor advanced per enqueue so back-to-back
ops (e.g. reparent's detach+attach) don't need a refetch between them,
and each mutation awaited to a terminal state. A 'failed' row throws
MutationFailedError, which the page pins as a persistent UPDATE FAILED
banner instead of a vanishing toast.

Slice 1 of the live-edit rework; stage+UPDATE-button batching and louder
backend materialisation reporting to follow.
2026-06-16 12:46:09 -04:00
5505de782f feat(web): wire local-decky teardown to DELETE /deckies/{name}
The Fleet UI only showed TEARDOWN for swarm-pinned deckies (POST
/swarm/hosts/{uuid}/teardown). Local deckies had no delete control though
the API now exposes DELETE /deckies/{name}.

teardown() branches on swarm vs local; the card's two-step arm/CONFIRM
button renders for any admin, keyed td:${host_uuid ?? 'local'}:${name}.
2026-06-16 12:15:48 -04:00
0c10869e26 feat(web): DELETE /deckies/{name} single-decky teardown endpoint
The Fleet module had no delete — neither UI nor API — though the engine
capability existed (engine.teardown(decky_id=...), exposed only via
`decnet teardown --id`). Wire it to HTTP.

DELETE /deckies/{name} (admin-gated, 204). Synchronous: a single decky's
compose stop/rm is quick, so it's awaited off-thread rather than the
202+lifecycle path deploy/mutate use for slow builds. The single-decky
teardown never touches the host macvlan interface, so it needs no extra
CAP_NET_ADMIN.

State consistency: engine.teardown removes the containers and the
fleet_deckies row but leaves the decky in decnet-state.json. Left as is, the
reconciler would see "present in JSON, absent from DB" and re-INSERT the row,
resurrecting the decky. So the handler prunes it from both decnet-state.json
and the DB deployment key after teardown; deleting the last decky clears
state entirely (DecnetConfig.deckies has min_length=1).

Route ordering: the dynamic DELETE /deckies/{decky_name} is registered AFTER
the fixed /deckies/* routes (Starlette matches in registration order), so it
no longer shadows DELETE /deckies/files (file-drop).

Tests cover 401/403/404/422, single-delete pruning, and last-decky clear.
2026-06-16 12:07:10 -04:00
8db593a544 test(api): repair pre-existing rotted tests (SSE ticket flow, password policy)
These had been red since the changes they cover landed — invisible because
the pre-commit gate runs mypy/ruff/bandit/pip-audit but NOT pytest, so failing
tests don't block commits and quietly accumulate.

- SSE stream/events auth migrated from ?token=<jwt> to a single-use ?ticket=
  (commit efb4e49d). Three tests still passed a raw JWT as ?token= and got
  401. Updated to mint a ticket via POST /auth/sse-ticket and pass ?ticket=
  (attacker events, topology events, /stream).
- The user-creation password policy is min_length=12; the RBAC admin-access
  test still used a 10-char password and was rejected. Bumped to a valid one.
2026-06-16 12:06:56 -04:00
9eb2803d04 chore(deps): bump cryptography/python-multipart/starlette for CVEs
pip-audit flagged fixable advisories in the web stack:
- cryptography  -> >=48.0.1  (GHSA-537c-gmf6-5ccf)
- python-multipart -> >=0.0.31 (CVE-2026-53538/53539/53540)
- starlette (transitive via fastapi) -> add direct floor >=1.3.1
  (CVE-2026-48817/48818/54282/54283)

Venv synced to cryptography 49.0.0, python-multipart 0.0.32, starlette
1.3.1; full tests/api/ suite green against the bump. Also drops the stray
browser-use[core] dev dep (the browser-use skill uses a global CLI; the
package is imported nowhere in DECNET).
2026-06-16 12:06:20 -04:00
207494f41e fix(web): duplicated deploy log lines + cancelled auto-close in DeployWizard
Two defects exposed after the deploy-success loop fix (verified live):

1. Duplicated / skipped transcript lines. The placeholder-log interval did
   `setLog(prev => [...prev, msgs[i]])` then `i++`. React 18 auto-batches
   setInterval updaters, so the updater ran after i had advanced and read the
   wrong index — skipping some lines ([NET], [SENSE]) and duplicating others
   ([TLS]). Fixed by capturing `const line = msgs[i]` before scheduling the
   update. A placeholderStartedRef also gates the effect to one run per deploy
   (reset in startDeploy) as defense-in-depth against re-render churn.

2. Wizard never closed on success. The completedRef guard combined with the
   auto-close effect's cleanup was self-defeating: a re-render inside the
   700ms window (e.g. the [OK] terminal-log append) ran the cleanup, clearing
   the pending close, and the guard then blocked rescheduling — so onComplete
   never fired. The timer now lives in a ref cleared only on unmount, so a
   scheduled close always fires exactly once regardless of re-renders.

Adds a regression test that a re-render during the close countdown does not
cancel the close. Verified end-to-end against the live instance: all 8 log
lines render once in order, the wizard auto-closes, and /deckies does not
storm.
2026-06-13 00:52:39 -04:00
2a9d1989b6 fix(web): stop runaway deploy-success loop in DeployWizard
After a successful deploy the wizard hammered GET /deckies and stacked
"DEPLOYED" toasts unbounded (~1.4/s, forever). Two compounding causes:

1. The auto-close effect's dep array includes onComplete, which the parent
   passes as an inline arrow (new reference every render). onComplete calls
   refresh(), re-rendering the parent, producing a new onComplete ref,
   re-running the effect, which reschedules onComplete — a feedback loop.
2. DeployWizard stayed permanently mounted (open only toggled a child
   overlay), so its hooks kept running with lifecycleDone===true after close.

Fix both: a completedRef guard makes the auto-close fire exactly once per
deploy regardless of effect re-runs (reset in startDeploy), and the parent
now mounts the wizard only while open so closing tears down its hooks and
clears the latent "reopen re-completes stale rows" path.

Lifecycle polling itself was never the runaway — it stops cleanly at
terminal status; the bounded ~25-poll build window is expected.

Adds a regression test asserting onComplete fires once across a simulated
re-render storm.
2026-06-13 00:19:39 -04:00
ab1151ee7f fix(fleet): read existing fleet from fleet_deckies, not State["deployment"] (BUG-2)
The web deploy collision-guard read the existing fleet from the DB
State["deployment"] key, while the UI/get_deckies() read decnet-state.json.
A fleet established via CLI/seed lands in neither path the guard consulted,
so existing_deckies was empty, the additive guard ran blind, and the
reconciler tore the running fleet down to the single submitted decky
(BUG-2: silent fleet wipe, HTTP 202, no warning).

Converge both reads on fleet_deckies — the engine-mirrored table written on
every deploy/teardown (CLI and web), which fleet/reconciler.py already
documents as the store the orchestrator, dashboard, and REST API see. Each
row's decky_config column is a full DeckyConfig dump, so it rehydrates
losslessly into the collision-guard input. The handler also commits the
intended fleet to fleet_deckies synchronously so rapid sequential deploys
read a current fleet and the dashboard observes the new shape immediately.

State["deployment"] is retained for now — the mutate handlers and the
mutator engine still coordinate through it; consolidating them is tracked
in development/ADR-001-FLEET-SOURCE-OF-TRUTH.md (open question 7).

Tests seed fleet_deckies directly (also modelling the CLI-seeded scenario)
rather than chaining real deploys through the skipped contract-test path.
2026-06-12 23:52:20 -04:00
408810b3e2 chore(security): pin TLSv1.2 floor on control-plane mTLS clients
Follow-up to V9.1.4 (which covered only the syslog forwarder/listener): set
ctx.minimum_version = TLSVersion.TLSv1_2 on the remaining DECNET-owned mTLS
client contexts — AgentClient (_build_client + _fetch_peer_fingerprint),
UpdaterClient (_build_client + _fetch_peer_fingerprint), and the updater
executor's worker context. Pure hardening, no behavior change for TLS1.2+
peers (confirmed by the existing mTLS round-trip suites).

Deliberately EXCLUDED — hardening these would be counterproductive:
- templates/https/server.py, templates/rdp/server.py: honeypot listeners,
  where looking weak/old is part of the deception.
- prober/tlscert.py: outbound TLS fingerprinting prober, which must speak
  whatever the attacker's target offers.

Added a floor-assertion test (spies httpx.AsyncClient to capture the real
verify= context).
2026-06-12 19:06:50 -04:00
efe4e49de6 fix(web): restore SSE streams via single-use ticket flow
The V3.1.1 backend change moved SSE auth off ?token=<JWT> onto a single-use
?ticket=, but the dashboard was never updated, so every live stream 401'd
('Could not validate credentials'). Add mintSseTicket() (POST /auth/sse-ticket
with the Bearer JWT, returns an opaque 60s single-use ticket) and refactor all
stream consumers to mint a fresh ticket at the top of each connect() — initial
and every reconnect — then open EventSource with ?ticket=. A reused single-use
ticket would 401-loop, so re-mint-per-connect is required.

Covers Dashboard /stream, LiveLogs, and the attacker/identity/campaign/
orchestrator/topology hooks. connect() is now async with an unmount guard
(cancelled flag checked after the await, before opening the stream); on a mint
401 the connect is skipped and the axios logout interceptor takes over.
2026-06-12 19:00:15 -04:00
593492411c feat(web): live password-strength checklist on change-password
The change-password form let the browser submit short passwords the API
then rejected with an opaque 'Schema structural violation' 400. Add a pure
validateNewPassword() util (>=12 chars, <=72 bytes, >=3 of 4 character
classes — constants tweakable) and a live ✓/✗ checklist above the submit
button so the user sees exactly what's missing. Submit is gated on
validity + confirm-match, so the form can no longer reach that 400.

- Fix minLength 8->12 on the Login change-password inputs and the UsersTab
  admin-reset guard (both lagged the API's min_length=12).
- Light-mode: render the checklist box fully white with black text (the
  neon-on-dark styling read as muddy grey); ✓/✗ icons keep a green/red cue.
- Advisory UX only — the API min_length=12 remains the enforcement boundary;
  character-class complexity is not server-enforced.
2026-06-12 18:59:46 -04:00
721122a7ef chore(types): enable warn_return_any and cast all no-any-return sites
Turn on mypy warn_return_any (pyproject) and resolve the 84 resulting
[no-any-return] errors across 43 files with typing.cast() at the return
sites — runtime no-ops that make the declared return type explicit where a
dependency (SQLAlchemy scalar/first/one, httpx .json(), subprocess, docker
SDK) hands back Any. No behavior change: no DTO/table field types altered, no
validation/coercion calls added, every cast reflects the true runtime type.

Locks in return-type strictness so the class of bug where a function silently
widens to Any can't regress. mypy decnet/ clean; adversarially verified
behavior-preserving (84 casts 1:1 with prior returns).

Bump tornado 6.5.5 -> 6.5.7 (CVE-2026-49854, transitive via snakeviz).
2026-06-12 18:21:22 -04:00
337520c7ad fix(security): close INFO ASVS findings — secret echo, TLS floor, mandatory tarball SHA, CORS/Content-Type guards, BUG-17
- V7.1.3: env known-insecure-default error no longer echoes the rejected secret value.
- V9.1.4: syslog-over-TLS forwarder + listener pin minimum_version=TLSv1_2.
- V12.1.2: updater tarball SHA-256 verification is now mandatory and fail-closed —
  /update and /update-self reject a missing digest (400), the executor rejects
  missing/mismatched digests before extract/apply. Every push path supplies it.
- V13.1.4: reject a wildcard '*' in DECNET_CORS_ORIGINS at startup.
- V13.1.5: enforce application/json on JSON write endpoints (415 otherwise),
  exempting multipart upload routes.
- BUG-17: SSE error log records the user uuid, not the resume cursor.

Also completes V2.1.7 consistently: the attacker-injectable PYTEST* env bypass is
replaced with explicit DECNET_TESTING=1 in the three remaining sites
(env.validate_public_binding, config logging, mysql url builder).

Tests added for every fix; unanimous adversarial review (no update-outage risk —
all push paths verified to send the digest).
2026-06-10 13:50:06 -04:00
245975a6dd fix(security): close LOW ASVS findings — env bypass, SSE/deployment authz, CN fail-close, password byte-limit, exception leaks, BUG-12..16
Auth/session (V2.1.7, V4.1.5, V4.1.6, V2.1.4/V2.1.5):
- env secret validation no longer bypassed by attacker-injectable PYTEST* env;
  gated on explicit DECNET_TESTING=1 (set only in conftest).
- must_change_password now enforced on the SSE header-JWT path, not just ticket mint.
- GET /system/deployment-mode requires viewer auth (was leaking role + topology size).
- CreateUser/ResetUser passwords min_length=12; passwords >72 bytes rejected
  explicitly instead of bcrypt silently truncating.

Swarm ingestion (V9.1.3, BUG-16):
- Log listener hard-rejects peers with unparseable/empty cert CN (fail closed,
  ingests nothing) instead of tagging 'unknown'.
- Shutdown handlers no longer swallow real errors (narrowed to CancelledError).

Info leakage (V7.1.2, V14.1.2):
- Exception text sanitized on swarm-update, health, tarpit, realism, file-drop,
  blank-topology endpoints (raw tc/docker stderr, DB/Docker errors logged
  server-side, generic detail returned). pyproject license corrected to AGPL-3.0.

Correctness (BUG-12..16):
- BUG-12 atomic credential upsert (UNIQUE constraint + IntegrityError retry,
  consistent principal_key canonicalization).
- BUG-13 rule-tail watermark uses >= with seen-id dedup (no same-second drop).
- BUG-14 worker wake cleared before wait (no lost wake during tick).
- BUG-15 intel gather tolerates an unexpected provider raise.
- BUG-16 see above.

Already-closed (verified, no change): V2.1.6, V5.1.3, V9.1.2. Accept-risk +
documented: V2.1.8 cache window, V3.1.3 idle timeout. Tests added for every fix;
unanimous adversarial review after two refute-fix rounds.
2026-06-10 13:27:14 -04:00