Commit Graph

1503 Commits

Author SHA1 Message Date
2ca6533666 fix(topology): anchor compose path to run dir, stop install-dir litter
_topology_compose_path returned a CWD-relative Path, so every
deploy/mutate/dry-run wrote decnet-topology-<id8>-compose.yml into the
process CWD (the install dir). Teardown computed the same relative path
against its own CWD, so when it differed the unlink() missed the orphan
and files accumulated forever.

Anchor to $DECNET_RUN_DIR (default /var/lib/decnet/topologies, tempdir
fallback) so write and teardown always agree regardless of CWD.
2026-06-18 21:24:00 -04:00
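A minimal sketch of the anchoring described in this commit, not the project's actual code. The function name, the "first 8 characters" reading of `<id8>`, and the tempdir subdirectory name are assumptions made for illustration; only `$DECNET_RUN_DIR`, the default directory and the file-name pattern come from the message.

```python
import os
import tempfile
from pathlib import Path

# Default taken from the commit message.
DEFAULT_RUN_DIR = Path("/var/lib/decnet/topologies")


def topology_compose_path(topology_id: str) -> Path:
    """Return an absolute compose path that does not depend on the CWD."""
    configured = os.environ.get("DECNET_RUN_DIR")
    run_dir = Path(configured) if configured else DEFAULT_RUN_DIR
    try:
        run_dir.mkdir(parents=True, exist_ok=True)
    except OSError:
        # Assumed fallback shape: the commit only says "tempdir fallback".
        run_dir = Path(tempfile.gettempdir()) / "decnet-topologies"
        run_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: <id8> means the first 8 characters of the topology id.
    return run_dir.resolve() / f"decnet-topology-{topology_id[:8]}-compose.yml"
```

Because the result is absolute, a writer and a teardown running from different working directories compute the same path, so the `unlink()` finds the file.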
bf66e875a5 chore: drop decnet.tar build artifact, gitignore it 2026-06-18 21:16:29 -04:00
b0bf31a31e feat(topology): scan-based creation wizard option (Pro contract + wiring)
Adds the @pro ScanImport contract (ProScanImportProps/ProScanImport) and
a null community stub, and slots a third SCAN-BASED card into
CreateTopologyWizard, gated on the pro panel being present so it
tree-shakes out of the community build. The scan->topology importer
itself ships in decnet/pro v1.2.0. CHANGELOG updated under [1.2.0].
2026-06-18 20:36:09 -04:00
d7a2b5b9cf Merge release/1.2: prefork consolidation (v1.2.0)
Prefork supervisor (decnet.prefork) + 'decnet fleet heavy' (profiler+ttp,
CoW-shared, ~412MB Pss vs 661MB). ATT&CK bundle -> decnet/data/ (19.1).
Removed 10 per-worker unit templates superseded by the supervisor groups
and the heavy fleet.
v1.2.0
2026-06-18 19:44:17 -04:00
c918538f35 release: bump to v1.2.0; finalize CHANGELOG
Prefork worker consolidation (decnet.prefork + decnet fleet heavy),
ATT&CK 19.1 relocation to decnet/data/, and removal of the 10 per-worker
unit templates superseded by the supervisor groups + heavy fleet.
2026-06-18 19:43:53 -04:00
beaa604811 chore(1.2): remove per-worker unit templates superseded by consolidation
The batch/cpu supervisor groups + heavy fleet replace 10 per-worker units
(reconciler/enrich/orchestrator/mutator/clusterer/campaign-clusterer/
attribution/reuse-correlator/profiler/ttp). Removed their deploy/*.service.j2
templates and rewired decnet.target to the 3 consolidated units. Dropped
test_orchestrator_unit.py (tested a removed unit). CLI commands (decnet ttp,
mutate, …) stay for manual runs; new units' Conflicts= still name the old
units defensively for hosts mid-migration.
2026-06-18 19:43:10 -04:00
7b0ff127c3 docs(1.2): heavy fleet verified live — ~412MB Pss vs 661MB; prefork helps base-floor-bound workers 2026-06-18 19:39:15 -04:00
419172ecfb docs(1.2): changelog — decnet fleet prefork command 2026-06-18 19:32:38 -04:00
fcc9a9aad1 feat(1.2): decnet fleet — prefork master for the heavy worker tier
Wires the prefork primitive into a CLI command. 'decnet fleet heavy' imports
the shared base floor once in the master, then forks profiler + ttp as
CoW-sharing child processes (own process/GIL, full isolation, shared ~71MB
floor). DB-only tier => systemd unit carries no extra privilege (prefork's
privilege-union cost is nil for this fleet). Unit Conflicts= the profiler/ttp
units it replaces. Heavy per-worker state (ATT&CK/ML) still loads per-child;
warming it in the master to share is deferred until a live RSS measurement
shows the big object graph CoW-shares rather than refcount-dirties.
2026-06-18 19:32:27 -04:00
1a765854ec fix(1.2): relocate ATT&CK bundle to decnet/data/, bump 19.0 -> 19.1
Bundle pointer moved from repo root to decnet/data/ (with LICENSE.txt),
gitignored + fetched on demand (51MB, MITRE-licensed). Version pin bumped
19.0->19.1 with the new sha256; license unchanged. All _REPO_BUNDLE test
constants repointed. Fixes test-web failures after the repo-root bundle
was deleted.
2026-06-18 19:25:50 -04:00
a5e11f7d86 chore(1.2): open 1.2.0 dev cycle (version 1.2.0.dev0, CHANGELOG Unreleased) 2026-06-18 19:25:11 -04:00
74096b6df0 feat(1.2): prefork supervisor primitive + tests (C, CoW gate passed)
CoW measurement on CPython 3.14: forked idle child keeps ~71MB shared,
dirties ~1MB private; working child ~26MB. PEP 683 immortal objects keep
code/module pages clean so gc.freeze() is unnecessary (freeze==nofreeze).

prefork.run_fleet: master imports the base floor once, forks one child
per worker (own process/GIL, CoW-shared floor), reaps + restarts with
backoff, graceful SIGTERM->SIGKILL shutdown. Not yet wired to a command
(that lands when 1.2 picks the target worker set).
2026-06-18 19:24:15 -04:00
af615f8d44 Merge release/1.1.1: ttp test fixes (v1.1.1)
Corrects stale confidence_max ceiling tests + documented-topics set.
Test-only patch, no production change.
v1.1.1
2026-06-18 19:23:48 -04:00
d1974ca6f6 release: bump to v1.1.1 (test-only patch)
Corrects stale confidence_max ceiling tests + documented-topics set.
No production code change.
2026-06-18 19:23:39 -04:00
a26dfe4d47 fix(ttp): correct stale clip tests to ceiling semantics + document ATTACKER_FINGERPRINTED topic
confidence_max is a ceiling (min(base, ceiling)), not a multiplier — the
ASVS pass fixed this (BUG-8: min(base, base*ceiling) -> min(base, ceiling)),
but 4 lifter clip tests still encoded the old base*ceiling math (0.45/0.4/
0.35) and were masked by the make test-web bundle error fail-fast. All four
now assert the 0.5 ceiling. Separately, test_topics_matches_documented_set
lacked attacker.fingerprinted, which worker.py legitimately subscribes to
(JARM/HASSH/tcpfp/ipv6_leak -> TTP tagging). Located via turbovec + git pickaxe.

(cherry picked from commit f83b467c35649a06fa36f4b350e6666379cd71cb)
2026-06-18 19:22:54 -04:00
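The two clip formulas named in this commit can be compared side by side. This is an illustrative sketch: the function names are hypothetical, and the base values 0.9/0.8/0.7 are inferred from the stale expectations (0.45/0.4/0.35) under a 0.5 ceiling rather than taken from the tests themselves.

```python
def clip_as_multiplier(base: float, ceiling: float) -> float:
    """Old behaviour (BUG-8): the ceiling was effectively a multiplier."""
    return min(base, base * ceiling)


def clip_as_ceiling(base: float, ceiling: float) -> float:
    """Fixed behaviour: confidence_max is an absolute upper bound."""
    return min(base, ceiling)


if __name__ == "__main__":
    for base in (0.9, 0.8, 0.7):  # assumed bases, see the note above
        old = clip_as_multiplier(base, 0.5)
        new = clip_as_ceiling(base, 0.5)
        print(f"base={base} old={old} new={new}")
```

Under ceiling semantics, every base above the ceiling clips to the same value, which is why all four corrected tests assert 0.5.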
3a3392bdee Merge release/1.1: worker consolidation (v1.1.0)
batch + cpu supervisor groups, verified live: 2.57GB -> 1.83GB (-737MB).
Prefork deferred to 1.2.0.
v1.1.0
2026-06-18 18:52:53 -04:00
c7d5f3a086 release: bump to v1.1.0; add CHANGELOG
Worker consolidation release. batch + cpu supervisor groups (verified
live, -737MB / 2.57GB->1.83GB). Prefork (process-model change) deferred
to 1.2.0.
2026-06-18 18:52:31 -04:00
bce2c1940c feat(1.1): supervise cpu group with ProcessPoolExecutor kernel offload
Hosts clusterer/campaign-clusterer/attribution/reuse-correlate in one
process. The two O(n^2) connected-components kernels (cluster_observations,
cluster_identities) offload to ONE shared forkserver pool via decnet.offload
.run_kernel, so they run in parallel instead of serialising under the GIL.

- offload.run_kernel: pool when installed + offload_if holds, else inline.
  Standalone workers and all tests run inline => behaviour unchanged
  (424 clustering/correlation tests green).
- offload_if gates on input size (>=256) to skip pickle cost on small passes.
- forkserver (not fork): supervisor is multithreaded via bus clients.
- attribution/reuse co-located but not offloaded yet (lighter; same run_kernel
  path extends to them if profiling shows contention).
- systemd unit Conflicts= the 4 units it replaces; no docker/raw-socket priv.
2026-06-17 17:35:42 -04:00
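The "pool when installed + offload_if holds, else inline" rule can be sketched as follows. This is a simplified synchronous illustration, not `decnet.offload` itself: `install_pool`, the module-level `_pool` and the exact `run_kernel` signature are assumptions. Only the size gate of 256 comes from the commit.

```python
from concurrent.futures import Executor
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Assumed module-level slot; standalone workers and tests leave it as None.
_pool: Optional[Executor] = None


def install_pool(pool: Optional[Executor]) -> None:
    """Install (or clear) the shared executor used for offloading."""
    global _pool
    _pool = pool


def _big_enough(data: Sequence) -> bool:
    # Size gate from the commit: skip pickle cost on small passes.
    return len(data) >= 256


def run_kernel(
    fn: Callable[[Sequence], T],
    data: Sequence,
    *,
    offload_if: Callable[[Sequence], bool] = _big_enough,
) -> T:
    """Run fn(data) in the pool when one is installed and the gate holds."""
    if _pool is not None and offload_if(data):
        return _pool.submit(fn, data).result()
    return fn(data)  # inline path: behaviour identical to calling fn directly
```

In a supervisor, the installed executor would presumably be a `ProcessPoolExecutor` created with `mp_context=multiprocessing.get_context("forkserver")`, matching the commit's note that plain `fork` is unsafe in a multithreaded process.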
6d7d2c0e24 chore: stop tracking development/ (internal design notes) 2026-06-17 17:29:21 -04:00
3df9770cec docs(1.1): batch group VERIFIED LIVE — 509MB -> 129MB (-380MB / 75%)
Controlled unit swap on mothership: single PID hosts all 4 workers,
shared repo + 30s reconcile pass confirmed working, no crashes. Live
delta beat the floor estimate. Batch group complete.
2026-06-17 17:28:18 -04:00
b1cda1b015 fix(1.1): add decnet-enrich.service to batch supervisor Conflicts=
enrich is a batch-group member; its individual unit must also be mutually
exclusive with the supervisor. Unit auto-renders via init.py glob of
deploy/decnet-*.service.j2 — no installer list change needed.
2026-06-17 17:26:09 -04:00
5b56c66bb5 docs(1.1): measured batch-group RAM delta (-350MB / ~70%)
4 live batch workers = 509MB; consolidated startup floor = 118.5MB
(imports 102.5 + repo pool 15.7 once vs 4x + bus 0.3). Side-effect-free
measurement (no mutation tick). Floor is a lower bound; live adds modest
per-worker working set. Exact live number pending controlled unit swap.
2026-06-17 16:57:21 -04:00
805e2b33fc docs(1.1): mark C5 batch group shipped, live-swap verification pending 2026-06-17 16:50:43 -04:00
3a46864f30 feat(1.1): decnet supervise batch group + systemd unit (C5)
Hosts reconcile/enrich/orchestrate/mutate in one process via the
supervision primitive: one import floor, one shared repo/DB pool instead
of 4. Static group registry (membership is architectural, not a knob);
factories lazy-import only the hosted workers. systemd unit Conflicts=
the individual units it replaces and documents the union-of-privileges
cost. Worker code unchanged — any member is extractable by editing _build_specs.
2026-06-17 16:50:09 -04:00
12aaa9d820 feat(1.1): in-process worker supervision primitive (C5)
supervise(): per-worker restart loop with exponential backoff (in-process
Restart=on-failure). run_group(): hosts workers as concurrent independently-
supervised tasks — one crash never cancels siblings (deliberately NOT
asyncio.TaskGroup, whose all-or-nothing cancel breaks isolation). SIGTERM/
SIGINT → graceful cancel-and-await. Tests cover restart, clean-exit,
crash-isolation, shutdown, empty group.
2026-06-17 16:48:32 -04:00
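A rough sketch of the supervision shape this commit describes, under stated assumptions: the signatures, the delay parameters and the "clean exit is not restarted" rule (read from "Restart=on-failure") are illustrative guesses, and signal handling is left out.

```python
import asyncio
from typing import Awaitable, Callable, Dict

WorkerFactory = Callable[[], Awaitable[None]]


async def supervise(
    factory: WorkerFactory, *, base_delay: float = 1.0, max_delay: float = 60.0
) -> None:
    """Restart one worker on failure with exponential backoff."""
    delay = base_delay
    while True:
        try:
            await factory()
            return  # clean exit: assumed not to be restarted
        except Exception:
            # CancelledError is a BaseException, so shutdown passes through.
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)


async def run_group(workers: Dict[str, WorkerFactory], **backoff: float) -> None:
    """Host workers as independently supervised tasks (no TaskGroup)."""
    tasks = [
        asyncio.create_task(supervise(factory, **backoff), name=name)
        for name, factory in workers.items()
    ]
    if not tasks:
        return
    try:
        # supervise() absorbs crashes, so one failure never cancels siblings.
        await asyncio.gather(*tasks)
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Each worker's failures are caught inside its own `supervise()` loop, so the group-level `gather` never sees an exception that would tear down the other members. That is the isolation property `asyncio.TaskGroup` would not provide.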
23075bcdcd docs(1.1): correct grouping to co-residency reality
forwarder/listener are role-split swarm singletons, not co-resident with
the herd — drop from grouping. Real master-resident batch group shares
the repo singleton (one DB pool when consolidated). Stage 1 = batch:
reconcile/enrich/orchestrate/mutate. webhook/canary deferred standalone.
2026-06-17 16:47:13 -04:00
fc43909221 docs(1.1): consolidation design — supervise by failure domain
HOW to consolidate: supervision-loop primitive (not TaskGroup, whose
all-or-nothing cancel breaks isolation); group by failure domain +
resource profile keeping per-group cgroup limits; every worker remains
config-extractable. Recommend process-groups now (~18->~9 units),
evaluate prefork+gc.freeze CoW on 3.14 as the higher-ceiling follow-on.
2026-06-17 16:40:19 -04:00
87eb986467 docs(1.1): correct strategy — consolidation is primary lever (C3)
Measurement after C2: all 25 CLI modules transitively pull the SQLModel
ORM, and the table chain is a hub (one table loads the whole registry).
Most workers genuinely use the DB, so lazy imports only help the 2-3
DB-less workers. Consolidation (pay the 86MB floor once for the herd)
is the reliable ~600MB win — promoted from fallback to primary.
2026-06-17 16:36:04 -04:00
4e2b1cdaf3 perf(1.1): lazy topology.generate re-export (C2)
topology/__init__ eagerly imported generator -> allocator -> repository ->
the full SQLModel ORM. Defer via PEP 562 __getattr__ so importing the
package doesn't drag the DB layer into DB-less workers. Public API
(from decnet.topology import generate) unchanged. Guard test locks it in.
2026-06-17 16:35:30 -04:00
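The PEP 562 deferral can be demonstrated in a self-contained way by attaching `__getattr__` to a synthetic module object. In a real package the same function would sit at module level in `__init__.py`. The helper name and the `colorsys` stand-in below are purely illustrative and unrelated to decnet.

```python
import importlib
import sys
import types
from typing import Dict


def make_lazy_package(name: str, lazy_exports: Dict[str, str]) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails.
        target = lazy_exports.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(target), attr)
        setattr(pkg, attr, value)  # cache so later lookups skip this hook
        return value

    pkg.__getattr__ = __getattr__  # PEP 562 hook lives in the module __dict__
    return pkg


if __name__ == "__main__":
    sys.modules.pop("colorsys", None)
    demo = make_lazy_package("demo_pkg", {"rgb_to_hsv": "colorsys"})
    print("colorsys" in sys.modules)  # heavy import not triggered yet
    demo.rgb_to_hsv(1.0, 0.0, 0.0)    # first access performs the import
    print("colorsys" in sys.modules)
```

Importing the package stays cheap, and the public name still resolves on first use, which is how the commit keeps `from decnet.topology import generate` working.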
825d7d72c9 docs(1.1): RAM footprint analysis + release plan
Fleet resident set ~2.57GB across 18 workers; ~1.5GB is the 86MB import
floor paid 18x. Pinned root cause: topology/__init__ eager re-export of
generate drags the full SQLModel ORM (26 tables, ~38MB) into every worker.
2026-06-17 16:32:54 -04:00
70e566db42 Merge feat/pro-cli-surface: pro CLI/daemon extension surface 2026-06-17 15:21:07 -04:00
62f5fb652e feat(pro): add pro CLI/daemon extension surface
The core CLI scans decnet/pro/cli/ and calls each module's register(app),
registered before the master-only gate so pro commands are mode-filtered like
the rest. Lets the Professional tier add commands and standalone daemon entry
points (decnet pro-<cmd> serve, supervised by a systemd unit). No-op in the
Community build (no decnet.pro). Test asserts the shipped pro group registers
when mounted; skips otherwise.
2026-06-17 15:21:06 -04:00
dd1b754f65 Merge feat/pro-extension-surfaces: multi-surface pro extension points 2026-06-17 15:02:28 -04:00
a47f99c449 feat(pro): generalize pro tier to multi-surface extension points
Move the pro mount decnet/services/pro/ -> decnet/pro/ so the Professional tier
can contribute to more than honeypots. The core wires each surface only when
decnet/pro/ is present (absence stays the entitlement gate):

* services  — registry scans decnet/pro/services/ (was decnet/services/pro/)
* API routes — decnet/pro/routes.py exposes ROUTERS, mounted under /api/v1
* web pages  — Vite aliases @pro to the pro frontend (community -> empty stub),
               App.tsx maps proRoutes into <Route>s, Layout renders a
               PROFESSIONAL nav group; both tree-shake out of the community build

Frontend gate mirrors the existing VITE_DECNET_DEVELOPER tree-shake pattern.
Tests: registry + router seams (backend), empty-stub contract (frontend).
2026-06-17 15:02:28 -04:00
80c92a6f80 Merge feat/open-core-tiers: community/professional tier split + dual-license 2026-06-17 13:23:53 -04:00
777606681e fix(tests): drop illegal @pytest.mark.anyio on anyio_backend fixture
Newer pytest raises 'Marks cannot be applied to fixtures' instead of ignoring
it. The async test methods already carry @pytest.mark.anyio, which is what
selects the backend; the fixture must not.
2026-06-17 13:22:35 -04:00
d90bc81060 feat(services): open-core community/professional tier split
Pro-tier honeypots load from an optional decnet/services/pro/ subpackage that
the registry auto-discovers when present; the Community build omits it, so the
directory's absence IS the entitlement gate (no runtime licence check). Recurse
subclasses so a pro service may extend a community one. Exclude pro from the
community wheel and git-ignore the path (it lives in the private
decnet-professional repo).

Add LICENSING.md documenting the dual-license: AGPL-3.0-or-later core plus a
commercial EULA for the Professional tier.
2026-06-17 13:22:35 -04:00
09b6a832ee docs(readme): add logo banner + badges to header 2026-06-16 19:13:17 -04:00
e95acbd4f2 Merge dev into main for v1.0.0 release 2026-06-16 19:04:04 -04:00
36353026a6 release: bump to v1.0.0; add PyPI metadata; stop globbing decnet_web into packages
v1.0.0
2026-06-16 18:59:48 -04:00
a5cb20051f test(mazenet): add useMazeApi tests 2026-06-16 18:56:13 -04:00
a9c3f42ef9 feat(mazenet): topology editor updates; refine mutator ops materialisation 2026-06-16 18:55:20 -04:00
c9e4bf4022 feat(clustering): link identities by keystroke-rhythm proximity
Campaign clusterer gains a keystroke edge: when two identities'
kd_digraph_simhash centroids are within KD_HAMMING_MAX bits, a graded
weight (1.0 at identical, fading to 0 at the cutoff) feeds the campaign
graph. Supporting tier (0.6) — a typing match plus temporal overlap
reaches threshold, but typing alone never merges (FP guard against
coarse, noisy terminal timing).

Projects the column through IdentityFeatures + from_identity_row.
2026-06-16 17:09:42 -04:00
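One way to read the graded keystroke weight is as a linear fade over Hamming distance. The sketch below makes several assumptions: the value of `KD_HAMMING_MAX` is a placeholder (the commit does not state it), the fade is taken to be linear, and the function name is hypothetical.

```python
KD_HAMMING_MAX = 12  # placeholder value; the real cutoff is not given


def keystroke_weight(a: bytes, b: bytes, cutoff: int = KD_HAMMING_MAX) -> float:
    """Return 1.0 for identical 8-byte centroids, fading to 0.0 at the cutoff."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected 8-byte SimHash centroids")
    # Hamming distance between the two 64-bit values.
    distance = bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")
    if distance >= cutoff:
        return 0.0
    return 1.0 - distance / cutoff  # assumed linear fade
```

The resulting weight would then be scaled by the supporting-tier factor (0.6) before entering the campaign graph, which is why a typing match alone stays below the merge threshold.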
869d1eabb7 feat(clustering): roll session digraph SimHashes into identity centroid
The identity clusterer folds an identity's per-session
motor.digraph_simhash observations into one 8-byte bitwise-majority
centroid (denoises per-session jitter) and writes it to
AttackerIdentity.kd_digraph_simhash via update_identity_fingerprints —
the orphaned column is now populated. list_identities_for_clustering
projects it so the campaign clusterer can read it.

Extends the repo abstract + DummyRepo stub/coverage.
2026-06-16 17:05:34 -04:00
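A bitwise-majority centroid over 8-byte hashes can be sketched as below. The function name, the big-endian byte order and the tie rule (ties resolve to 0) are assumptions, since the commit does not specify them.

```python
from typing import Sequence


def majority_centroid(hashes: Sequence[bytes]) -> bytes:
    """Fold per-session 8-byte SimHashes into one bitwise-majority value."""
    if not hashes:
        raise ValueError("need at least one hash")
    counts = [0] * 64
    for h in hashes:
        value = int.from_bytes(h, "big")
        for bit in range(64):
            counts[bit] += (value >> bit) & 1
    centroid = 0
    for bit, ones in enumerate(counts):
        # A bit is set only when a strict majority of sessions set it.
        if 2 * ones > len(hashes):
            centroid |= 1 << bit
    return centroid.to_bytes(8, "big")
```

A bit that flips in a single noisy session is outvoted by the others, which is the denoising effect the commit refers to.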
66c73ce59d feat(profiler): extract motor.digraph_simhash keystroke biometric
Per-session 64-bit SimHash of inter-keystroke digraph flight times:
walk single-char input events, accumulate flight time per (c1,c2),
bucket the median, Charikar-SimHash the bucketed pairs. Locality-
sensitive so the same typist is Hamming-close across sessions; pastes
and think-pauses break the chain; silent below the sample-size floor.

New shared decnet/util/simhash.py (simhash64/hamming64/bytes helpers).
Registered as a conditional Tier-A primitive (count 37->38); requires
behave-shell>=0.1.2.
2026-06-16 16:59:57 -04:00
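For orientation, a generic Charikar SimHash over string features looks roughly like this. The names `simhash64` and `hamming64` appear in the commit, but the signatures, the BLAKE2b feature hash and the unweighted voting here are assumptions, not the contents of decnet/util/simhash.py.

```python
import hashlib
from typing import Iterable


def simhash64(features: Iterable[str]) -> int:
    """Charikar SimHash: each feature votes +1/-1 on each of 64 bit positions."""
    counts = [0] * 64
    for feature in features:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, votes in enumerate(counts):
        if votes > 0:
            result |= 1 << bit
    return result


def hamming64(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit values."""
    return bin((a ^ b) & 0xFFFF_FFFF_FFFF_FFFF).count("1")


if __name__ == "__main__":
    # Hypothetical encoding of bucketed digraph pairs as "c1c2:bucket".
    session_a = ["th:3", "he:2", "er:4", "re:3", "in:2"]
    session_b = ["th:3", "he:2", "er:4", "re:3", "in:5"]
    print(hamming64(simhash64(session_a), simhash64(session_b)))
```

Because every feature contributes to every bit position, changing a few bucketed pairs tends to flip only a few output bits. That locality is the property that lets sessions from the same typist land Hamming-close.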
372375194c refactor(db): run Alembic at boot, retire ad-hoc _migrate_* helpers
initialize() now delegates to _apply_schema(): real boots run
'alembic upgrade head' (schema owned by the migration history); tests
(DECNET_TESTING=1) keep create_all, which is faster and needs no upgrade
path. MySQL wraps the upgrade in the existing GET_LOCK advisory lock so
concurrent uvicorn workers don't race on DDL.

Deletes the three _migrate_* crimes (attackers-table legacy drop +
GeoIP backfill, TEXT->MEDIUMTEXT widening) — all now handled by the
baseline migration and the _BIG_TEXT model variants. Drops the test
file that only exercised the deleted helpers; adds tests pinning the
alembic-vs-create_all gate and guarding that every model table is in
the migration head.
2026-06-16 16:31:10 -04:00
ef4d67cbef build(db): add Alembic scaffolding + baseline migration
Introduce Alembic at v1. Migrations live inside the package
(decnet/web/db/migrations) so they ship with installs; alembic.ini at the
repo root drives the CLI. env.py is async and dual-backend, selecting the
engine from DECNET_DB_TYPE (mirroring db/factory.py) and reusing the app's
own connection when run programmatically.

The baseline captures all 39 tables. _BIG_TEXT round-trips as
Text().with_variant(MEDIUMTEXT, 'mysql'), so both backends get the right
column type from the migration. kd_digraph_simhash gains a sqlite BLOB
variant: BINARY(8) reflects as NUMERIC on SQLite and would otherwise trip
'alembic check' forever.
2026-06-16 16:30:29 -04:00
4f141c1a54 feat(web): stage live MazeNET edits behind an UPDATE button
Live topology edits fired one mutation per canvas action. That coupled
each edit to an immediate enqueue+apply, which (post-serialization)
raced the SSE refetch and duplicated optimistic placeholders, and gave
the user no chance to assemble a coherent changeset (add a net AND
bridge it) before any of it landed.

Live edits now STAGE: each editor primitive records its op and returns
immediately; the optimistic placeholders callers already draw are the
staged preview. The action button reads UPDATE (n) when live (DEPLOY
when pending) and flushes the batch through the slice-1 submit queue —
sequential, version-cursored, each awaited to a terminal state, stopping
loudly on the first failure with the unapplied remainder kept for retry.
REFRESH becomes DISCARD (n) to drop the batch. SSE refetch is paused
during a commit so per-mutation applied events don't wipe still-staged
placeholders mid-batch; one refetch reconciles at the end.

Also fix _dropArchetype, which bailed without an optimistic node on the
staged path, leaving a decky added to an uncommitted LAN invisible until
UPDATE.
2026-06-16 12:59:57 -04:00
f18bfee746 fix(web): serialize live topology mutations + surface failures loudly
Live MazeNET edits fired their mutations fire-and-forget: each canvas
action enqueued immediately and never awaited the result. Two failures
followed from that:

- expected_version is bumped at ENQUEUE (not at apply), so two ops fired
  back-to-back raced — the second carried a stale version and 409'd.
  Edits only worked when hand-paced (an SSE refetch landed between them).
- A failed mutation degrades the topology, but the only signal was a 4s
  toast, so the user saw DEGRADED with no cause.

useTopologyEditor now routes every live op through a serialized submit
queue: one enqueue in flight at a time (submission order preserved), an
optimistic expected_version cursor advanced per enqueue so back-to-back
ops (e.g. reparent's detach+attach) don't need a refetch between them,
and each mutation awaited to a terminal state. A 'failed' row throws
MutationFailedError, which the page pins as a persistent UPDATE FAILED
banner instead of a vanishing toast.

Slice 1 of the live-edit rework; stage+UPDATE-button batching and louder
backend materialisation reporting to follow.
2026-06-16 12:46:09 -04:00
5505de782f feat(web): wire local-decky teardown to DELETE /deckies/{name}
The Fleet UI only showed TEARDOWN for swarm-pinned deckies (POST
/swarm/hosts/{uuid}/teardown). Local deckies had no delete control though
the API now exposes DELETE /deckies/{name}.

teardown() branches on swarm vs local; the card's two-step arm/CONFIRM
button renders for any admin, keyed td:${host_uuid ?? 'local'}:${name}.
2026-06-16 12:15:48 -04:00