DECNET

Author	SHA1	Message	Date
anti	6725197d58	test(web): transcripts API + attacker-transcripts router coverage Paging, truncation surfacing, admin gate, path traversal, sid-regex and decky-mismatch rejection for /transcripts; mirror coverage for /attackers/{uuid}/transcripts. Flips the Session Recording box in the roadmap (sessrec pty relay now shipping end-to-end).	2026-04-21 23:11:40 -04:00
anti	6e522c5a55	feat(web): transcripts API + repository lookups Adds get_attacker_transcripts (mirror of artifacts for session_recorded logs) and get_session_log for sid→shard resolution. New /api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events out of the shared JSONL day-shard via an mtime-keyed byte-offset index — never scans the whole shard per request. New /api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both endpoints admin-gated.	2026-04-21 23:06:39 -04:00
anti	8f25ff677f	feat(engine,api): add orphan topology resource reaper Topology rows deleted without a proper teardown leave Docker containers and bridge networks behind, holding IPAM pools that cause 403 "Pool overlaps" on the next deploy at the same subnet. - engine/reaper.py walks the local Docker daemon, extracts the 8-char topology prefix from every decnet_t_* resource, and force-removes containers + networks whose prefix is not in the repo. - POST /api/v1/topologies/reap-orphans (admin-only) returns a report of live/orphan prefixes and what was removed. - Resources belonging to live topologies are never touched; per-resource errors are captured without aborting the sweep.	2026-04-21 22:13:44 -04:00
anti	c266d1b6e3	feat(mutator,web): add_decky op — create-and-attach in one mutation apply_attach_decky requires an existing decky, so the MazeNET editor had no way to grow a live topology: creating a new decky on active topologies 409'd on the direct-CRUD createDecky call. - Backend: new apply_add_decky that creates the decky row + its home-LAN edge atomically, auto-allocating an IP if none pinned. Post-apply validation still runs. Added to DISPATCH + _MUTATION_OPS Literal + CLI help text. - Tests: 3 new ops tests (happy path, duplicate-name rejection, missing-LAN rejection) plus dispatch coverage update. - Frontend: useTopologyEditor gains addDeckyToLan() composite. Pending routes through createDecky + attachEdge as before; active routes through a single add_decky enqueue. MazeNET.tsx drag-archetype, duplicate, DMZ-gateway, and ctx-menu add-decky paths all use the composite so active topologies stop 409'ing on new-decky drops.	2026-04-21 20:13:39 -04:00
anti	cbb394a160	feat(ingester): publish system.log per committed batch (DEBT-031 worker 6) Ingester connects the bus at startup, emits a batch-committed summary (component/flushed/position) after each successful _flush_batch. Zero-row flushes are suppressed so the topic stays meaningful. Complements the collector's per-line system.log publishes: collector signals ingress, ingester signals DB-persisted progress. Federation forwarder (worker 8) will subscribe to the batch-committed leaf to trigger its upstream push. Bus stays optional: publish_safely swallows failures, get_bus() can return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully functional.	2026-04-21 16:58:49 -04:00
anti	f611e7363b	feat(mutator,web): live topology mutation pipeline backend (DEBT-030) Wire the mutator and web API into the service bus so live-topology edits flow sub-second from enqueue to UI: - Mutator publishes every state transition on the bus (mutation.applying/applied/failed + topology.status). Fire-and-forget; DB stays source of truth. - Mutator watch loop subscribes to topology.*.mutation.enqueued and wakes early via asyncio.Event — the 10s poll becomes a fallback heartbeat, not the primary dispatch trigger. - POST /topologies/{id}/mutations publishes mutation.enqueued after the DB write succeeds. - New GET /topologies/{id}/events SSE route: snapshot on connect (status + in-flight mutations), live forwards topology.{id}.> bus events, 15s keepalive. ?token= auth mirrors /stream. - New decnet/bus/app.py — process-wide lazy bus singleton for the API, closed cleanly on lifespan shutdown.	2026-04-21 14:38:25 -04:00
anti	071312fc0c	feat(web/api): expose archetype catalog endpoint /api/v1/topologies/archetypes returns the archetype registry (slug, display name, description, preferred services/distros, nmap_os fingerprint) so the frontend wizard can render a live catalog instead of hardcoding a copy.	2026-04-21 10:24:01 -04:00
anti	542637c0dc	feat(web/api): support PATCH on proxy and CORS The web bundle proxy handled GET/POST/PUT/DELETE but not PATCH or preflight OPTIONS, which broke browser calls to PATCH endpoints behind the static-bundle server. CORS middleware had the same gap.	2026-04-21 10:23:55 -04:00
anti	12e18b75db	feat(swarm): expose needs_resync on TopologySummary + upsert record_error Two small observability follow-ups to the phase-1 agent/topology wiring: TopologySummary now carries needs_resync so operators can see the heartbeat's resync flag via the topology list/detail API without dropping into the DB. TopologyStore.record_error becomes an upsert — when a docker/compose failure fires during the first materialise (put() never reached), we still land a marker row so GET /topology/state surfaces the error and the next heartbeat carries an empty applied_version_hash. That empty hash is what master's heartbeat check relies on to flag the topology for resync instead of assuming the apply succeeded.	2026-04-21 01:41:30 -04:00
anti	e8f9c955b3	feat(swarm): heartbeat-driven topology resync for agent-pinned deployments Agent heartbeats now carry an applied-topology snapshot. The master heartbeat handler compares the reported version_hash against what canonical_hash yields for the hydrated topology pinned to that host and flags Topology.needs_resync on divergence (or when the agent reports no topology at all while master expects one). The mutator watch loop gains reconcile_agent_resyncs, which re-pushes the current hydrated blob via AgentClient.apply_topology without touching status, then clears the flag on success. Push failures leave the flag set so the next tick retries.	2026-04-21 01:35:12 -04:00
anti	5a0cf5d7c8	feat(topology): add target_host_uuid to pin topologies to swarm agents Adds the `target_host_uuid` FK on `Topology` plus wiring through the two create endpoints (`POST /topologies`, `POST /topologies/blank`). Validates the mode/host pair: `mode='agent'` now requires a known, routable host; `mode='unihost'` must leave the field unset. Surfaced on `TopologySummary` so list/detail responses expose it. Purely additive at the schema level — existing unihost flows unchanged (field defaults to `NULL`). Step 1 of the agent <-> topology integration.	2026-04-21 01:19:45 -04:00
anti	b261e8e5fa	feat(topology): add teardown endpoint + UI button Active/degraded/failed/deploying topologies cannot be deleted without first transitioning to torn_down, but the UI had no way to trigger that. Add POST /topologies/{id}/teardown mirroring the deploy endpoint (background task, 202 Accepted), and a click-to-arm TEARDOWN button on the topology list card that shows whenever the row is in a teardown-eligible state.	2026-04-20 23:41:37 -04:00
anti	be4e1b1891	feat(mazenet): auto-bridge new LANs to the DMZ gateway When a non-DMZ LAN is created via POST /lans, look up the topology's gateway (decky with forwards_l3=True attached to the DMZ) and insert an edge binding it to the new LAN. The gateway becomes multi-homed to every internal LAN automatically, so DMZ_ORPHAN cannot arise from ordinary editor use. Also fixes delete_lan: the home-decky guard used scalar_one_or_none, which blew up when the gateway already had >1 'other' LAN edge. Switch to scalars().first() — we only need to know some other edge exists, not a unique one.	2026-04-20 23:07:19 -04:00
anti	cc9765e54e	fix(mazenet): drop fictional host-mode on DMZ gateway stub POST /topologies/blank seeded the gateway decky with archetype=host-gateway + network_mode=host, but neither was wired: no compose writer reads network_mode and host-gateway is not a real archetype. Replace with archetype=deaddeck + forwards_l3=true so the gateway is a normal multi-homed bridge decky, consistent with how compose.py interprets forwards_l3 (sysctl + NET_ADMIN). Edge marked is_bridge=true, forwards_l3=true so downstream readers (generator, compose, validator) see a real bridge attachment.	2026-04-20 23:06:54 -04:00
anti	d06b04221f	feat(api/topology): live mutation queue endpoints (POST/GET /mutations)	2026-04-20 19:38:55 -04:00
anti	ff0b2efbb0	feat(api/topology): pending-only child CRUD for LANs, deckies, edges	2026-04-20 19:37:16 -04:00
anti	999113e3c3	feat(api/topology): POST/DELETE/deploy endpoints for MazeNET topologies	2026-04-20 19:34:35 -04:00
anti	38db76dd14	fix(api): document 400 on topology read endpoints for schemathesis contract DECNET's app-level RequestValidationError handler remaps structural 422→400, including query/path constraint violations (limit bounds, the next-subnet base pattern, etc.). Schemathesis fuzzing will drive those code paths and fail response_schema_conformance unless 400 is declared in responses={}. Adds the entry to every phase-3 read route.	2026-04-20 18:30:32 -04:00
anti	f182c98ffa	feat(api): phase 3 step 2 — topology read endpoints (list/get/status/catalog) GET /api/v1/topologies — paginated list with status filter. Extends repo.list_topologies() to accept limit/offset and adds count_topologies() for the total envelope field. GET /api/v1/topologies/{id} — hydrated TopologyDetail; 404 if missing. GET /api/v1/topologies/{id}/status-events — audit trail, limit-capped. Catalog helpers for the phase-4 canvas UI: * GET /topologies/services — full service catalog. * GET /topologies/next-subnet?base=172.20 — wraps SubnetAllocator against reserved_subnets across non-torn-down topologies. * GET /topologies/{id}/lans/{lan_id}/next-ip — IPAllocator pre-seeded with existing decky IPs in that LAN. All read routes are viewer-or-admin. Sub-routers are included in an order that keeps literal catalog paths (/services, /next-subnet) from being shadowed by the /{topology_id} trie branch.	2026-04-20 18:25:33 -04:00
anti	2379b2aeda	feat(api): phase 3 step 1 — topology request/response models + router skeleton Add Pydantic DTOs in decnet/web/db/models.py covering every phase-3 endpoint shape: TopologyGenerateRequest, TopologySummary/Detail, child create/update requests, MutationEnqueueRequest (Literal op guard), MutationRow with JSON-payload decoder, validation/version/not-editable error envelopes, and the three catalog responses. Create decnet/web/router/topology/ as an import-safe package exporting topology_router (prefix /topologies) — sub-routers land step-by-step in subsequent commits. Mount under the main api router alongside swarm_mgmt. tests/api/topology/test_models.py pins repo-dict ↔ DTO parity so future repo-row drift breaks the contract test before the endpoints.	2026-04-20 18:16:30 -04:00
anti	a76b9ecdf9	feat(mazenet): step 7 — topology_mutations queue + mutator reconciler Adds the live-mutation pipeline for active/degraded topologies: * TopologyMutation table with composite index (state, topology_id) so the watch-loop guard query stays O(log n). * claim_next_mutation is a single atomic UPDATE ... WHERE state='pending' so racing reconcilers deterministically pick one winner; losers see rowcount=0 and skip. * reconcile_topologies drains pending rows per live topology, applies via decnet.mutator.ops.dispatch, and on failure marks the mutation failed + transitions topology to degraded. * run_watch_loop gains a gated branch: flat-fleet mutate_all runs every tick unchanged; the reconciler only enters when the cheap has_pending_topology_mutation guard returns True. * apply_* ops re-check hard invariants (names, IP collisions, subnet overlap, known services, service_config shape) after every mutation so the repo never lands in an invalid state. * CLI: 'decnet topology mutate' / 'mutations' subcommands.	2026-04-20 18:02:37 -04:00
anti	91df57d36b	feat(topology): pending-only mutation repo methods with cascade + guards MazeNET phase 2 step 6. Equips the repo layer with the CRUD the web editor needs before deploy. - TopologyNotEditable exception: raised when a pending-only method hits a non-pending topology. The intent is "free-form edits stop at deploy; the mutator (step 7) takes over for live topologies." - _assert_pending helper checks status inside the session. - update_lan / update_topology_decky accept enforce_pending=True for pre-deploy callers (existing internal callers default to False so behavior is unchanged). - delete_lan: cascades edges; refuses if any decky has only one edge (= this LAN is its home) to prevent orphans. - delete_topology_decky: cascades edges. - delete_topology_edge: bare-bones removal. All four mutators accept expected_version for optimistic concurrency. Existing tests continue to pass (no behavior change for persist/deploy).	2026-04-20 17:50:29 -04:00
anti	9afaac7612	feat(topology): nullable layout coords on LAN + TopologyDecky MazeNET phase 2 step 5. Pure storage — the generator emits None for x/y and the web canvas fills them in later. No logic changes; no compose, deploy, or validator impact.	2026-04-20 17:48:29 -04:00
anti	e475c0957e	feat(topology): optimistic concurrency via Topology.version + expected_version MazeNET phase 2 step 4. Readies the repo layer for concurrent editors (web canvas + CLI + mutator) without lost-write races. - Topology.version: monotonically bumped on supervised child-row writes. - VersionConflict exception carries {current, expected} for the UI. - _check_and_bump_version helper reads Topology in the same session, compares against expected_version, raises on mismatch, bumps on match. Commit happens in the caller's existing transaction so check+bump+write are atomic per mutation. - add_lan / update_lan / add_topology_decky / update_topology_decky / add_topology_edge accept expected_version=None by default, preserving every existing caller's behavior. When expected_version is None, no check runs and version stays put — internal callers (persist) that don't care about concurrency keep working unchanged.	2026-04-20 17:47:28 -04:00
anti	47cd200e1d	feat(mazenet): repo methods for topology/LAN/decky/edge/status events Adds topology CRUD to BaseRepository (NotImplementedError defaults) and implements them in SQLModelRepository: create/get/list/delete topologies, add/update/list LANs and TopologyDeckies, add/list edges, plus an atomic update_topology_status that appends a TopologyStatusEvent in the same transaction. Cascade delete sweeps children before the topology row. Covered by tests/topology/test_repo.py (roundtrip, per-topology name uniqueness, status event log, cascade delete, status filter) and an extension to tests/test_base_repo.py for the NotImplementedError surface.	2026-04-20 16:43:49 -04:00
anti	096a35b24a	feat(mazenet): add topology schema to models.py Introduces five new SQLModel tables for MazeNET (nested deception topologies): Topology, LAN, TopologyDecky, TopologyEdge, and TopologyStatusEvent. DeckyShard is intentionally not touched — TopologyDecky is a purpose-built sibling for MazeNET's lifecycle (topology-scoped UUIDs, per-topology name uniqueness). Part of MazeNET v1 (nested self-container network-of-networks).	2026-04-20 16:40:10 -04:00
anti	8a2876fe86	fix(api): document missing HTTP status codes on router endpoints Schemathesis was failing CI on routes that returned status codes not declared in their OpenAPI responses= dicts. Adds the missing codes across swarm_updates, swarm_mgmt, swarm, fleet and attackers routers. Also adds 400 to every POST/PUT/PATCH that accepts a JSON body — Starlette returns 400 on malformed/non-UTF8 bodies before FastAPI's 422 validation runs, which schemathesis fuzzing trips every time. No handler logic changed.	2026-04-20 15:25:02 -04:00
anti	af9d59d3ee	fixed(api): documentation	2026-04-20 13:20:42 -04:00
anti	2febd921bc	fix(models): added length validation to the common name, which per RFC 5280 must be at most 64 characters	2026-04-20 01:26:07 -04:00
anti	5b70a34c94	fix(routes): added undocumented responses	2026-04-20 01:23:07 -04:00
anti	d1b7e94325	fix(swarm): inject peer cert into ASGI scope for uvicorn <= 0.44 Uvicorn's h11/httptools HTTP protocols don't populate scope['extensions']['tls'], so /swarm/heartbeat's per-request cert pinning was 403ing every call despite CERT_REQUIRED validating the cert at handshake. Patch RequestResponseCycle.__init__ on both protocol modules to read the peer cert off the asyncio transport and write DER bytes into scope['extensions']['tls']['client_cert_chain']. Importing the module from swarm_api.py auto-installs the patch in the swarmctl uvicorn worker before any request is served.	2026-04-19 22:09:11 -04:00
anti	e411063075	feat(swarm): ship host_uuid + swarmctl-port in agent enroll bundle The rendered /etc/decnet/decnet.ini now carries host-uuid and swarmctl-port in [agent], which config_ini seeds into DECNET_HOST_UUID and DECNET_SWARMCTL_PORT. Gives the worker a stable self-identity for the heartbeat loop — the INI never has to be rewritten because cert pinning is the real gate (a rotated UUID with a matching CA-signed cert would still be blocked by SHA-256 fingerprint mismatch against the stored SwarmHost row). Also adds DECNET_MASTER_HOST so the agent can find the swarmctl URL via the INI's existing master-host key.	2026-04-19 21:44:23 -04:00
anti	148e51011c	feat(swarm): agent→master heartbeat with per-host cert pinning New POST /swarm/heartbeat on the swarm controller. Workers post every ~30s with the output of executor.status(); the master bumps SwarmHost.last_heartbeat and re-upserts each DeckyShard with a fresh DeckyConfig snapshot and runtime-derived state (running/degraded). Security: CA-signed mTLS alone is not sufficient — a decommissioned worker's still-valid cert could resurrect ghost shards. The endpoint extracts the presented peer cert (primary: scope["extensions"]["tls"], fallback: transport.get_extra_info("ssl_object")) and SHA-256-pins it to the SwarmHost.client_cert_fingerprint stored for the claimed host_uuid. Extraction is factored into _extract_peer_fingerprint so tests can exercise both uvicorn scope shapes and the both-unavailable fail-closed path without mocking uvicorn's TLS pipeline. Adds get_swarm_host_by_fingerprint to the repo interface (SQLModel impl reuses the indexed client_cert_fingerprint column).	2026-04-19 21:37:15 -04:00
anti	3ebd206bca	feat(swarm): persist DeckyConfig snapshot per shard + enrich list API Dispatch now writes the full serialised DeckyConfig into DeckyShard.decky_config (plus decky_ip as a cheap extract), so the master can render the same rich per-decky card the local-fleet view uses — hostname, distro, archetype, service_config, mutate_interval, last_mutated — without round-tripping to the worker on every page render. DeckyShardView gains the corresponding fields; the repository flattens the snapshot at read time. Pre-migration rows keep working (fields fall through as None/defaults). Columns are additive + nullable so SQLModel.metadata.create_all handles the change on both SQLite and MySQL. Backfill happens organically on the next dispatch or (in a follow-up) agent heartbeat.	2026-04-19 21:29:45 -04:00
anti	14250cacad	feat(swarm): self-destruct agent on decommission Decommissioning a worker from the dashboard (or swarm controller) now asks the agent to wipe its own install before the master forgets it. The agent stops decky containers + every decnet-* systemd unit, then deletes /opt/decnet, /etc/systemd/system/decnet-, /var/lib/decnet/, and /usr/local/bin/decnet. Logs under /var/log are preserved. The reaper runs as a detached /tmp script (start_new_session=True) so it survives the agent process being killed. Self-destruct dispatch is best-effort — a dead worker doesn't block master-side cleanup.	2026-04-19 20:47:09 -04:00
anti	9d68bb45c7	feat(web): async teardowns — 202 + background task, UI allows parallel queue Teardowns were synchronous all the way through: POST blocked on the worker's docker-compose-down cycle (seconds to minutes), the frontend locked tearingDown to a single string so only one button could be armed at a time, and operators couldn't queue a second teardown until the first returned. On a flaky worker that meant staring at a spinner for the whole RTT. Backend: POST /swarm/hosts/{uuid}/teardown returns 202 the instant the request is validated. Affected shards flip to state='tearing_down' synchronously before the response so the UI reflects progress immediately, then the actual AgentClient call + DB cleanup run in an asyncio.create_task (tracked in a module-level set to survive GC and to be drainable by tests). On failure the shard flips to 'teardown_failed' with the error recorded — nothing is re-raised, since there's no caller to catch it. Frontend: swap tearingDown / decommissioning from 'string | null' to 'Set<string>'. Each button tracks its own in-flight state; the poll loop picks up the final shard state from the backend. Multiple teardowns can now be queued without blocking each other.	2026-04-19 20:30:56 -04:00
anti	07ec4bc269	fix(fleet): INI fully replaces prior decky state on redeploy Submitting an INI with a single [decky1] was silently redeploying the deckies from the previous deploy too. POST /deckies/deploy merged the new INI into the stored DecnetConfig by name, so a 1-decky INI on top of a prior 3-decky run still pushed 3 deckies to the worker. Those stale decky2/decky3 kept their old IPs, collided on the parent NIC, and the agent failed with 'Address already in use' — the deploy the operator never asked for. The INI is the source of truth for which deckies exist this deploy. Full replace: config.deckies = list(new_decky_configs). Operators who want to add more deckies should list them all in the INI. Update the deploy-limit test to reflect the new replace semantics, and add a regression test asserting prior state is dropped.	2026-04-19 20:24:29 -04:00
anti	df18cb44cc	fix(swarm): don't paint healthy deckies as failed when a shard-sibling fails docker compose up is partial-success-friendly — a build failure on one service doesn't roll back the others. But the master was catching the agent's 500 and tagging every decky in the shard as 'failed' with the same error message. From the UI that looked like all three deckies died even though two were live on the worker. On dispatch exception, probe the agent's /status to learn which deckies actually have running containers, and upsert per-decky state accordingly. Only fall back to marking the whole shard failed if the status probe itself is unreachable. Enhance agent.executor.status() to include a 'runtime' map keyed by decky name with per-service container state, so the master has something concrete to consult.	2026-04-19 20:11:08 -04:00
anti	e8e11b2896	feat(web-ui): show decky IP on SwarmDeckies, drop compose-hash column Operators want to know what address to poke when triaging a swarm decky; the compose-hash column was debug scaffolding that never paid off. DeckyShard has no IP column (the deploy-time IP lives on DecnetConfig), so the list endpoint resolves it at read time by joining shards against the stored deployment state by decky_name. Missing lookups render as "—" rather than erroring — the list stays useful even after a master restart that hasn't persisted a config yet.	2026-04-19 19:48:27 -04:00
anti	5dad1bb315	feat(swarm): remote teardown API + UI (per-decky and per-host) Agents already exposed POST /teardown; the master was missing the plumbing to reach it. Add: - POST /api/v1/swarm/hosts/{uuid}/teardown — admin-gated. Body {decky_id: str|null}: null tears the whole host, a value tears one decky. On worker failure the master returns 502 and leaves DB shards intact so master and agent stay aligned. - BaseRepository.delete_decky_shard(name) + sqlmodel impl for per-decky cleanup after a single-decky teardown. - SwarmHosts page: "Teardown all" button (keeps host enrolled). - SwarmDeckies page: per-row "Teardown" button. Also exclude setuptools' build/ staging dir from the enrollment tarball — `pip install -e` on the master generates build/lib/decnet_web/node_modules and the bundle walker was leaking it to agents. Align pyproject's bandit exclude with the git-hook invocation so both skip decnet/templates/.	2026-04-19 19:39:28 -04:00
anti	2bef3edb72	feat(swarm): unbundle master-only code from agent tarball + sync systemd units on update Agents now ship with collector/prober/sniffer as systemd services; mutator, profiler, web, and API stay master-only (profiler rebuilds attacker profiles against the master DB — no per-host DB exists). Expand _EXCLUDES to drop the full decnet/web, decnet/mutator, decnet/profiler, and decnet_web trees from the enrollment bundle. Updater now calls _heal_path_symlink + _sync_systemd_units after rotation so fleets pick up new unit files and /usr/local/bin/decnet tracks the shared venv without a manual reinstall. daemon-reload runs once per update when any unit changed. Fix _service_registry matchers to accept systemd-style /usr/local/bin/decnet cmdlines (psutil returns a list — join to string before substring-checking) so agent-mode `decnet status` reports collector/prober/sniffer correctly.	2026-04-19 19:19:17 -04:00
anti	6d7877c679	feat(swarm): per-host microservices as systemd units, mutator off agents Previously `decnet status` on an agent showed every microservice as DOWN because deploy's auto-spawn was unihost-scoped and the agent CLI gate hid the per-host commands. Now: - collect, probe, profiler, sniffer drop out of MASTER_ONLY_COMMANDS (they run per-host; master-side work stays master-gated). - mutate stays master-only (it orchestrates swarm-wide respawns). - decnet/mutator/ excluded from agent tarballs — never invoked there. - decnet/web exclusion tightened: ship db/ + auth.py + dependencies.py (profiler needs the repo singleton), drop api.py, swarm_api.py, ingester.py, router/, templates/. - Four new systemd unit templates (decnet-collector/prober/profiler/sniffer) shipped in every enrollment tarball. - enroll_bootstrap.sh enables + starts all four alongside agent and forwarder at install time. - updater restarts the aux units on code push so they pick up the new release (best-effort — legacy enrollments without the units won't fail the update). - status table hides Mutator + API rows in agent mode.	2026-04-19 18:58:48 -04:00
anti	ee9ade4cd5	feat(enroll): strip master API and frontend from agent tarball Agents never run the FastAPI master app (decnet/web/) or serve the React frontend (decnet_web/) — they run decnet.agent, decnet.updater, and decnet.forwarder, none of which import decnet.web. Shipping the master tree bloats every enrollment payload and needlessly widens the worker's attack surface. Excluded paths are unreachable on the worker (all cli.py imports of decnet.web are inside master-only command bodies that the agent-mode gate strips). Tests assert neither tree leaks into the tarball.	2026-04-19 18:47:03 -04:00
anti	dad29249de	fix(updater): align bootstrap layout with updater; log update phases The bootstrap was installing into /opt/decnet/.venv with an editable `pip install -e .`, and /usr/local/bin/decnet pointed there. The updater writes releases to /opt/decnet/releases/active/ with a shared venv at /opt/decnet/venv — a parallel tree nothing on the box actually runs. Result: updates appeared to succeed (release dir rotated, SHA changed) but systemd kept executing the untouched bootstrap code. Changes: - Bootstrap now installs directly into /opt/decnet/releases/active with the shared venv at /opt/decnet/venv and /opt/decnet/current symlinked. Same layout the updater rotates in and out of. - /usr/local/bin/decnet -> /opt/decnet/venv/bin/decnet. - run_update / run_update_self heal /usr/local/bin/decnet on every push so already-enrolled hosts recover on the next update instead of needing a re-enroll. - run_update / run_update_self now log each phase (receive, extract, pip install, rotate, restart, probe) so the updater log actually shows what happened.	2026-04-19 18:39:11 -04:00
anti	a0a241f65d	feat(enroll): decnet-updater now runs under systemd, not a --daemon fork Bootstrap used to end with `decnet updater --daemon` which forks and detaches — invisible to systemctl, no auto-restart, dies on reboot. Ships a decnet-updater.service template matching the pattern of the other units (Restart=on-failure, log to /var/log/decnet/decnet.updater.log, certs from /etc/decnet/updater, install tree at /opt/decnet), bundles it alongside agent/forwarder/engine units, and the installer now `systemctl enable --now`s it when --with-updater is set.	2026-04-19 18:19:24 -04:00
anti	5df995fda1	feat(enroll): opt-in IPvlan per-agent for Wi-Fi-bridged VMs Wi-Fi APs bind one MAC per associated station, so VirtualBox/VMware guests bridged over Wi-Fi rotate the VM's DHCP lease when Docker's macvlan starts emitting container-MAC frames through the vNIC. Adds a `use_ipvlan` toggle on the Agent Enrollment tab (mirrors the updater daemon checkbox): flips the flag on SwarmHost, bakes `ipvlan=true` into the agent's decnet.ini, and `_worker_config` forces ipvlan=True on the per-host shard at dispatch. Safe no-op on wired/bare-metal agents.	2026-04-19 17:57:45 -04:00
anti	6d7567b6bb	fix(fleet): reset stale host_uuid on carried-over deckies before dispatch Deckies merged in from a prior deployment's saved state kept their original host_uuid — which dispatch_decnet_config then 404'd on if that host had since been decommissioned or re-enrolled at a different uuid. Before round-robin assignment, drop any host_uuid that isn't in the live swarm_hosts set so orphaned entries get reassigned instead of exploding with 'unknown host_uuid'.	2026-04-19 06:27:34 -04:00
anti	dbaccde143	fix(swarm-updates): offload tarball build to worker thread tar_working_tree (walks repo + gzips several MB) and detect_git_sha (shells out) were called directly on the event loop, so /swarm-updates/push and /swarm-updates/push-self froze every other request until the tarball was ready. Wrap both in asyncio.to_thread.	2026-04-19 06:21:27 -04:00
anti	b883f24ba2	fix(engine): pin docker compose project name to avoid empty-basename failure systemd daemons run with WorkingDirectory=/ by default; docker compose derives the project name from basename(cwd), which is empty at '/', and aborts with 'project name must not be empty'. Pass -p decnet explicitly so the project name is independent of cwd, and set WorkingDirectory=/opt/decnet on the three DECNET units so compose artifacts (decnet-compose.yml, build contexts) also land in the install dir.	2026-04-19 06:17:30 -04:00
anti	79db999030	feat(fleet): auto-swarm deploy — shard across enrolled workers when master POST /deckies/deploy now branches on DECNET_MODE + enrolled host presence: when the caller is a master with at least one reachable swarm host, round-robin host_uuids are assigned over new deckies and the config is dispatched via AgentClient. Falls back to local docker-compose otherwise. Extracts the dispatch loop from api_deploy_swarm into dispatch_decnet_config so both endpoints share the same shard/dispatch/persist path. Adds GET /system/deployment-mode for the UI to show 'will shard across N hosts' vs 'will deploy locally' before the operator clicks deploy.	2026-04-19 06:09:08 -04:00
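
Note on a76b9ecdf9 (topology_mutations queue): the commit message describes claim_next_mutation as a single atomic UPDATE ... WHERE state='pending', where racing reconcilers that lose see rowcount=0 and skip. A minimal sketch of that pattern follows, using the standard-library sqlite3 module. The table layout, the claimed_by column, the 'applying' state name, and the helper names make_db and try_claim are assumptions for illustration only; they are not taken from the DECNET source, and the real method may select the row differently.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory queue table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topology_mutations ("
        " id INTEGER PRIMARY KEY,"
        " topology_id TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'pending',"
        " claimed_by TEXT)"
    )
    # Composite index on (state, topology_id), as the commit message mentions.
    conn.execute(
        "CREATE INDEX ix_mut_state_topology"
        " ON topology_mutations (state, topology_id)"
    )
    return conn


def try_claim(conn, mutation_id, worker):
    """Claim one mutation with a guarded UPDATE; True only for the winner."""
    cur = conn.execute(
        "UPDATE topology_mutations"
        " SET state = 'applying', claimed_by = ?"
        " WHERE id = ? AND state = 'pending'",
        (worker, mutation_id),
    )
    conn.commit()
    # rowcount is 1 if this call flipped the row, 0 if someone else already did.
    return cur.rowcount == 1


conn = make_db()
conn.execute("INSERT INTO topology_mutations (topology_id) VALUES ('t1')")
first = try_claim(conn, 1, "reconciler-a")
second = try_claim(conn, 1, "reconciler-b")
print(first, second)  # True False
```

The state check lives in the WHERE clause, so the test and the write happen in one statement; there is no window between reading the state and updating it in which a second reconciler could also win.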


168 Commits