DECNET

Author	SHA1	Message	Date
anti	f2b3393669	chore: relicense to AGPL-3.0-or-later and add SPDX headers Replaces LICENSE (GPLv3 -> AGPLv3) and prepends `SPDX-License-Identifier: AGPL-3.0-or-later` to every source file across decnet/, decnet_web/, tests/, scripts/, and tools/. Rationale: closes the GPLv3 ASP loophole so any party operating a modified DECNET as a network service must offer their modified source. Personal copyright (Samuel Paschuan) + inbound=outbound contributions make a future unilateral relicense infeasible. - LICENSE: full AGPL-3.0 text (gnu.org/licenses/agpl-3.0.txt) - COPYRIGHT: project copyright notice - tools/add_spdx_headers.py: idempotent header injector (shebang- and PEP 263-aware) Touches 1565 source files (.py, .ts, .tsx, .js, .jsx, .css, .sh). No behavior change; comments only.	2026-05-22 21:04:16 -04:00
anti	4743c8f733	feat(api): /deckies/deploy and /mutate become 202 fire-and-forget This is the unblock for the wizard hang. Both endpoints used to run docker compose synchronously inside the HTTP handler -- on master (unihost) or via asyncio.gather of worker /deploy POSTs at 600s timeout each (swarm) -- blocking every other API request. New flow: 1. Commit the new config shape to repo state (fast). 2. Create one DeckyLifecycle row per decky (status=pending). 3. Spawn asyncio.create_task(run_deploy / run_mutate) -- the lifecycle runner drives rows through running -> succeeded|failed and emits decky.<name>.lifecycle on the bus. 4. Return 202 with {lifecycle_ids: [...]}. Wizard polls GET /deckies/lifecycle?ids=... (next commit). mutator/engine.py gains pick_new_services() -- shared between the async API path and the watch-loop's synchronous mutate_decky(). DeployResponse grows lifecycle_ids[]. The old dispatch_decnet_config helper still exists for the CLI swarm-deploy command path; it just isn't called from the API handler anymore. Test changes: 200 -> 202, drop dispatch_decnet_config mocks (handler no longer calls it), assert lifecycle_ids in response + committed state matches expectations.	2026-05-22 16:40:55 -04:00
anti	ade8bbe30a	feat(agent): real worker-side /mutate with master swarm dispatch - Implement /mutate handler: load_state, update services + last_mutated, save_state, write_compose, compose up -d via asyncio.to_thread. 404 for missing state / unknown decky_id. dry_run short-circuits before any side effect. - Add AgentClient.mutate(decky_id, services, *, dry_run=False) using _TIMEOUT_DEPLOY (compose up can pull/build, exceeds control timeout). - mutator/engine.py: in swarm mode with decky.host_uuid set, resolve worker via _resolve_swarm_host and dispatch through AgentClient.mutate instead of writing a compose file on master. Master-resident deckies (unihost mode, or swarm with host_uuid=None) keep the local path.	2026-05-22 16:14:46 -04:00
anti	ee24a7551f	fix(types): T7 — eliminate all remaining 38 mypy errors; fix DeckyRow subscript in engine tests	2026-05-01 02:07:53 -04:00
anti	b9684254f0	fix(types): T5 — narrow AsyncClient|None with inline if; rename loop variable t→task to avoid no-redef	2026-05-01 01:53:10 -04:00
anti	fc1f0914b7	refactor(topology): introduce TopologyRepository protocol with DTO return types Replace repo: BaseRepository with a structural TopologyRepository protocol in persistence.py and allocator.py. All read methods now return typed DTOs (TopologySummary, LANRow, DeckyRow, EdgeRow) instead of raw dicts, eliminating silent field-shape regressions across the topology subsystem. TopologySummary gains email_personas and language_default so api_personas.py can continue reading those fields via attribute access. hydrate() converts DTOs to dicts before passing to _backfill_decky_configs, keeping the mutable working-state function dict-based at its boundary. All production callers (router handlers, mutator, CLI, heartbeat) migrated from dict/get access to attribute access. 134 tests pass.	2026-04-30 23:51:41 -04:00
anti	d314470d7f	fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts TopologyDecky rows where state='running'. No code path was flipping that state away from the default 'pending', so the count read 0/N even when every container was running fine — the dashboard was lying. Two complementary fixes: 1. deploy_topology — after the post-deploy compose ps verification, reconcile each TopologyDecky.state from the corresponding base container's docker state. running → 'running'; anything else → 'failed'. Reuses the ps_rows already gathered for the ACTIVE-vs-DEGRADED status decision; no extra docker hit. 2. apply_add_decky — _materialise_decky_spawn now returns True/False; on True the row is updated to state='running' before _assert_valid_after. Catches the case where a decky added via the live mutator queue stays at 'pending' indefinitely (the deployer's reconcile only runs on a fresh deploy_topology pass). Existing topology deckies in active topologies will still read as 'pending' until the next deploy_topology runs, since this is forward-only. An operator-side fix is to teardown + redeploy or run the (forthcoming) reconcile-on-startup pass.	2026-04-29 11:09:32 -04:00
anti	57e527534c	fix(mutator): auto-fall-back to legacy builder when buildx wedges live decky add apply_add_decky's compose-up was hard-failing whenever the operator's ~/.docker/buildx/activity/ landed on a read-only mount — the wedge detection in _compose_with_retry correctly refuses to retry (would just leak more mounts), but for live materialisation we don't want a wedged buildx state to abort an admin's mutation. ANTI hit it on adding decky-a977: 'failed to update builder last activity time: ... read-only file system → buildx wedge detected → returned non-zero'. _compose_up_with_buildkit_fallback wraps _compose_with_retry: on a CalledProcessError whose stderr matches both wedge signatures (_BUILDX_WEDGE_SIGNATURE + _BUILDX_EROFS_SIGNATURE), it logs a warning with the manual recovery steps + retries once with DOCKER_BUILDKIT=0 set. The legacy non-buildx builder doesn't use the activity dir and isn't affected. Wired into the two paths that pass --build: * _materialise_decky_spawn (apply_add_decky) * _materialise_decky_services_diff (apply_update_decky service add) _materialise_decky_recreate_base doesn't build — it just recreates a container from an existing image — so it's not affected. Operator-facing log message points at the manual fix (rm -rf ~/.docker/buildx/activity + docker buildx create) so they can recover at their leisure; we don't ATTEMPT the recovery because the activity dir might be RO for a reason (zfs/btrfs snapshot, etc.) that an automated rm would be wrong to fight.	2026-04-29 10:59:04 -04:00
anti	892219ec87	feat(mutator): refuse forwards_l3 promotion on non-DMZ deckies apply_update_decky's flip path now refuses to promote a decky to gateway unless its home LAN is a DMZ. The compose generator publishes host ports for forwards_l3=True; a non-DMZ gateway would shadow the host's port space without anything legitimately able to reach the service. Same posture as the existing 'forwards_l3 flip on live requires force=true' guard — refused before any DB write so a bad mutation leaves zero side-effects. The check is intentionally NOT a standing _RULES invariant — the codebase uses forwards_l3 for two semantics: 1. Generic L3 forwarding (internal bridge deckies routing between their multi-home LANs). The generator writes this on internal bridges via bridge_forward_probability; legitimately non-DMZ. 2. DMZ gateway (host-port publisher). Only meaningful on DMZ. Standing validation can't enforce DMZ-homing without breaking case 1. The guard fires only on the explicit user-driven flip path where the operator's intent is unambiguously case 2. Generator output and internal-bridge attachments bypass the check. check_gateway_homed_in_dmz lives in validate.py for callers that want the explicit form (and for the test surface), but is not a standing rule — comment in _RULES explains the asymmetry.	2026-04-29 00:38:51 -04:00
anti	a27e3f5e0f	fix(tests+mutator): unbreak the docker-shadow test env + let mutator delete from active Two related fixes that came out of running the W5 tests locally: 1. tests/__init__.py — empty file, makes 'tests/' a package so pytest stops inserting it into sys.path. Without it, 'tests/docker/' (the docker-image test category) shadowed the installed docker SDK on every engine-touching test in the repo: module 'docker' has no attribute 'DockerClient' Pytest's default --import-mode=prepend was the culprit; making tests/ a package is the cheapest fix and doesn't change --import-mode for the whole tree. 2. delete_topology_decky / delete_topology_edge / delete_lan grow an 'enforce_pending: bool = True' kwarg. Default preserves the HTTP CRUD guard (api_decky_crud / api_edge_crud / api_lan_crud get the 409 for free). apply_remove_decky / apply_detach_decky / apply_remove_lan now pass enforce_pending=False — the mutator queue is the live-editing surface and has its own active-topology gating; the repo's pending-only guard was for design-time CRUD that mustn't bypass it. Without this, apply_remove_decky was silently broken on active topologies pre-W5; W5's new test surfaced it on first run. 10/10 new W5 tests pass; 58/58 across mutator + topology suites.	2026-04-29 00:24:17 -04:00
anti	98c929894c	feat(mutator): selective materialisation for apply_update_decky + tests apply_update_decky now discriminates three sub-cases: * services list changed → diff old vs new and call _materialise_decky_services_diff (compose up -d for added, stop + rm -f for removed). Mirrors services_live's pattern but doesn't import it — mutator-routed mutations carry a different bus surface (mutation.applied) than the direct API path (decky.<name>.service_added). * forwards_l3 flipped → port publishing changes, which docker can only apply at container-create time. Gated on payload['force'] is true; default raises MutationError so a half-thinking operator can't stomp a live decky. When force=true, _materialise_decky_recreate_base does compose up -d --no-deps --force-recreate. Pre-checked BEFORE the DB write so a refused mutation leaves zero side-effects. * coord-only (x/y) → DB only, no docker work. Ships tests/mutator/test_ops_materialisation.py with focused coverage for every new helper: add_decky/remove_decky/attach_decky/ detach_decky/update_decky/update_lan paths against an active topology, with compose primitives + docker SDK mocked at the source modules so the helpers' lazy imports pick up the stubs. Also covers the pending-topology skip and the force-flag gating.	2026-04-29 00:18:20 -04:00
anti	e3afec4e70	feat(mutator): live network.disconnect for apply_detach_decky Symmetric to apply_attach_decky — after deleting the multi-home edge from the DB, calls the docker SDK to drop the base container's interface in the now-detached LAN. Service containers lose visibility automatically (they share the base's netns). Idempotency: 'not connected' / 'no such' APIError is logged at info and treated as success.	2026-04-29 00:15:39 -04:00
anti	f347a3a736	feat(mutator): live network.connect for apply_attach_decky After the DB writes that record the multi-home edge, calls the docker SDK directly to add an interface to the base container's netns: client.networks.get(<topology bridge>).connect(<base>, ipv4_address=ip) Non-destructive — the base keeps running, no recreate. Service containers automatically see the new interface because they share the base's netns via network_mode: service:<base>. Idempotency: docker APIError with 'already' / 'endpoint exists' is logged at info and treated as success. Other errors log + leave the DB row in place; an operator retry will hit the same path.	2026-04-29 00:15:11 -04:00
anti	eed55619cb	feat(mutator): live teardown for apply_remove_decky Captures the decky's name and services list before delete_topology_decky runs (the helper needs both as compose targets even though the DB row is gone), then calls _materialise_decky_remove which stops + rm -f's the base + per-service containers via 'docker compose stop / rm -f'. Re-renders the per-topology compose AFTER the stop/rm so a future 'compose up -d' on the file doesn't try to bring the decky back.	2026-04-29 00:14:44 -04:00
anti	8c06190e69	feat(mutator): live spawn for apply_add_decky + shared materialisation helpers Adds _materialise_decky_{spawn,remove,connect,disconnect,services_diff,recreate_base} helpers alongside the existing _materialise_lan_change. Each follows the same skip rules: bail when topology is not active/degraded, when agent-pinned, or when docker calls fail (logged, not re-raised — DB remains source of truth). apply_add_decky now calls _materialise_decky_spawn after the DB writes. The helper: * re-renders the per-topology compose so it lists the new decky; * runs 'compose up -d --no-deps --build <decky_base> <decky>-<svc>...' in a worker thread (matches engine/services_live's pattern). Service container targets are filtered through get_service() so fleet_singleton services are skipped — they don't have per-decky compose entries. Gateway (forwards_l3=True) deckies need no special-case here; the compose generator already emits the host 'ports:' block for them. Subsequent commits wire the other apply_* ops to the matching helpers. Tests for the full set ship in the workstream's last commit.	2026-04-29 00:14:18 -04:00
anti	578cdf9e2e	fix(mutator): reject hostile apply_update_lan changes on live topologies subnet and is_dmz are pinned at deploy time — live deckies bind to the bridge with IPs allocated from the old subnet, and is_dmz flips the docker network's internal flag which can't be changed while containers are attached. Today the op happily wrote the new value into the DB and left docker on the old one, drifting the two surfaces. apply_update_lan now raises MutationError when topology status is active or degraded and the patch touches subnet or is_dmz. Coord (x/y) and rename updates still pass through; renames don't currently have a live caller and the bridge's docker name keys off the lan name in the renderer, so the next deploy will reconcile. This matches the posture taken by _materialise_lan_change for live LAN add/remove (commit `472c84b`).	2026-04-29 00:12:44 -04:00
anti	472c84b9c8	fix(mutator): materialise live LAN add/remove on docker, not just the DB apply_add_lan and apply_remove_lan were DB-only — they wrote/deleted the topology_lans row but never created or destroyed the docker bridge network. Adding a LAN to a deployed topology silently did nothing on the substrate side; any decky later attached to it had nowhere to bind. Both ops now call a shared _materialise_lan_change helper after the DB write. When the topology is active/degraded and not pinned to a swarm agent, the helper: * creates / removes the docker bridge network (internal=True for non-DMZ LANs, mirroring engine/deployer.deploy_topology), * re-renders the per-topology compose file so future redeploys reflect the change. Failures are logged, not re-raised — the DB row stays as source of truth so an operator can retry without leaking inconsistent state. Agent-pinned topologies are skipped; the next agent push reconciles. apply_add_decky / apply_attach_decky have the same gap and are not fixed here — multi-homing a running container needs careful recreate-vs-network-connect handling and is its own commit. Without those, dropping a decky into a freshly-added LAN still won't spawn a container; only the LAN itself is now live.	2026-04-29 00:00:02 -04:00
anti	862e4dbb31	merge: testing → main (reconcile 2-week divergence)	2026-04-28 18:36:00 -04:00
anti	b2e4706a14	Refactor: implemented Repository Factory and Async Mutator Engine. Decoupled storage logic and enforced Dependency Injection across CLI and Web API. Updated documentation.	2026-04-12 07:48:17 -04:00
anti	f78104e1c8	fix: resolve all ruff lint errors and SQLite UNIQUE constraint issue Ruff fixes (20 errors → 0): - F401: Remove unused imports (DeckyConfig, random_hostname, IniConfig, COMPOSE_FILE, sys, patch) across cli.py, mutator/engine.py, templates/ftp, templates/rdp, test_mysql.py, test_postgres.py - F541: Remove extraneous f-prefixes on strings with no placeholders in templates/imap, test_ftp_live, test_http_live - E741: Rename ambiguous variable 'l' to descriptive names (line, entry, part) across conftest.py, test_ftp_live, test_http_live, test_mongodb_live, test_pop3, test_ssh SQLite fix: - Change _initialize_sync() admin seeding from SELECT-then-INSERT to INSERT OR IGNORE, preventing IntegrityError when admin user already exists from a previous run	2026-04-12 02:17:50 -04:00
anti	c384a3103a	refactor: separate engine, collector, mutator, and fleet into independent subpackages - decnet/engine/ — container lifecycle (deploy, teardown, status); _kill_api removed - decnet/collector/ — Docker log streaming (moved from web/collector.py) - decnet/mutator/ — mutation engine (no longer imports from cli or duplicates deployer code) - decnet/fleet.py — shared decky-building logic extracted from cli.py Cross-contamination eliminated: - web router no longer imports from decnet.cli - mutator no longer imports from decnet.cli - cli no longer imports from decnet.web - _kill_api() moved to cli (process management, not engine concern) - _compose_with_retry duplicate removed from mutator	2026-04-12 00:26:22 -04:00
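
Commit f78104e1c8 describes replacing a SELECT-then-INSERT admin seed with INSERT OR IGNORE so that a second run does not raise IntegrityError. The following is a minimal sketch of that idea, not the project's actual `_initialize_sync()`: the `users` table, its columns, and the `seed_admin` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def seed_admin(conn: sqlite3.Connection, username: str, password_hash: str) -> None:
    """Insert the admin row only if no row with this username exists yet.

    INSERT OR IGNORE skips the insert on a UNIQUE/PRIMARY KEY conflict
    instead of raising sqlite3.IntegrityError, so the call is safe to repeat.
    """
    conn.execute(
        "INSERT OR IGNORE INTO users (username, password_hash) VALUES (?, ?)",
        (username, password_hash),
    )
    conn.commit()


# Hypothetical schema: the real DECNET schema is not shown in the log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT NOT NULL)"
)

seed_admin(conn, "admin", "hash-1")  # first run: row is inserted
seed_admin(conn, "admin", "hash-2")  # later run: conflict is ignored, no exception
```

Note that the ignored insert leaves the existing row untouched, so a changed seed password would not overwrite one already stored.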

21 Commits
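
The 202 fire-and-forget flow from commit 4743c8f733 (create one lifecycle row per decky, spawn a background task, return the lifecycle ids immediately) can be sketched with plain asyncio. This is an illustration under stated assumptions, not the DECNET handler: the `Lifecycle` dataclass, the in-memory `LIFECYCLES` dict, and the injected `do_deploy` callable are hypothetical stand-ins for the real DeckyLifecycle rows, repository, and docker compose call; bus events are omitted.

```python
import asyncio
import itertools
from dataclasses import dataclass


@dataclass
class Lifecycle:  # hypothetical stand-in for a DeckyLifecycle row
    id: int
    decky: str
    status: str = "pending"


_ids = itertools.count(1)
LIFECYCLES: dict[int, Lifecycle] = {}      # stand-in for the repository
_background: set[asyncio.Task] = set()     # keep strong refs so tasks are not GC'd


async def run_deploy(rows, do_deploy):
    """Drive each row through running -> succeeded|failed."""
    for row in rows:
        row.status = "running"
        try:
            await do_deploy(row.decky)
            row.status = "succeeded"
        except Exception:
            row.status = "failed"


async def deploy_handler(deckies, do_deploy):
    """Record pending rows, start the slow work in the background, return 202."""
    rows = [Lifecycle(next(_ids), name) for name in deckies]
    for row in rows:
        LIFECYCLES[row.id] = row
    task = asyncio.create_task(run_deploy(rows, do_deploy))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return 202, {"lifecycle_ids": [row.id for row in rows]}


async def _demo():
    async def fake_deploy(name):  # hypothetical deploy; fails for one decky
        await asyncio.sleep(0)
        if name == "decky-bad":
            raise RuntimeError("compose failed")

    status, body = await deploy_handler(["decky-a", "decky-bad"], fake_deploy)
    pending = [LIFECYCLES[i].status for i in body["lifecycle_ids"]]
    await asyncio.gather(*_background)  # a real client would poll instead
    final = [LIFECYCLES[i].status for i in body["lifecycle_ids"]]
    return status, pending, final


status, pending, final = asyncio.run(_demo())
```

The handler returns before any deploy work starts, which is why the rows still read `pending` at response time; a polling client would later observe the terminal states.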