DECNET

Author	SHA1	Message	Date
anti	5f4005c47a	feat(tarpit): port-selective tc netem tarpit mode with live log events - GET/POST/DELETE /api/v1/deckies/{name}/tarpit (admin write, viewer GET) - get_container_veth() + get_container_pid() in network.py via iflink/ip-link - TarpitRule SQLModel table + TarpitMixin repo (upsert/get/delete/list) - Background tarpit_watcher_worker: polls /proc/{pid}/net/tcp every 15s, emits tarpit_enter/tarpit_exit log events (edge-triggered, with duration) - tarpit_enabled/tarpit_disabled logs on operator POST/DELETE actions	2026-04-29 18:49:42 -04:00
anti	2fc5f1bdc5	feat(canary): auto-deregister fingerprint slug after first valid beacon Once a fingerprint canary's HTTP beacon passes all 4 validation layers and the trigger row lands, the token is immediately set to state=revoked and canary.<id>.revoked is published on the bus. The slug lookup is tightened to only return planted tokens, so subsequent requests to the same URL silently return the transparent GIF without persisting anything (stealth posture preserved). Plain http/dns canaries with no fingerprint_nonce are not affected. Changes: - sqlmodel_repo/canary.py: add state == "planted" filter to get_canary_token_by_slug so revoked slugs resolve to None - worker.py: after record_canary_trigger, if parsed_fp survived all layers and token has a fingerprint_nonce, call update_canary_token_state("revoked") + publish CANARY_REVOKED; errors are best-effort (trigger row already landed) - test_worker_http.py: assert state=revoked in test_fp_valid_nonce_persists; new test_fp_deregisters_slug_after_valid_hit (second hit records nothing); new test_plain_http_canary_not_deregistered (env_file stays planted)	2026-04-29 17:49:31 -04:00
anti	b26dd8f529	feat(canary): API-trashing defense: 4-layer fingerprint validation Adds per-mint nonce gating, structural shape validation, mint UUID consistency checks, and a per-(token, IP) rate limiter to the canary worker so attackers who extract a canary from a decky filesystem cannot poison fingerprint forensics by replaying or forging ?d= submissions. Changes: base.py: fingerprint_nonce: Optional[str] added to CanaryArtifact so generators can surface the nonce to the cultivator without coupling the generator directly to DB code. obfuscator.py: nonce_for(callback_token, mint_uuid): HMAC-SHA256 keyed on DECNET_CANARY_FINGERPRINT_SECRET, truncated to 16 hex chars. FingerprintSecretMissing raised at mint time if env var is unset. render_fingerprint_js() now accepts nonce= and substitutes MINT_NONCE. fingerprint_payload.js: New MINT_NONCE placeholder. Appended as &k= on all beacon URLs (bare-open, single-shot, chunked). Using &k= avoids colliding with &n= (chunk total). fingerprint_html.py / fingerprint_svg.py: Derive nonce via nonce_for() and pass to render_fingerprint_js(). Set artifact.fingerprint_nonce so the cultivator can persist it. cultivator.py: Passes fingerprint_nonce into create_canary_token() when present on the artifact; NULL for all non-fingerprint generators. canary.py (model): fingerprint_nonce: Optional[str] = Field(default=None, max_length=16) added to CanaryToken. None for non-fingerprint tokens. worker.py: _extract_fingerprint now returns (meta_dict, parsed_fp) tuple. _record_hit accepts parsed_fp + raw_nonce and runs 4 layers after token lookup: nonce match, shape check, mint UUID consistency, rate limit. Each failure sets _fp_invalid_* flag and drops structured _fp. Trigger row always lands regardless. tests/canary/conftest.py: Session-scoped autouse fixture sets DECNET_CANARY_FINGERPRINT_SECRET so fingerprint generator and worker tests work offline. tests: 5 new worker HTTP tests and 2 new generator tests covering each validation layer.	2026-04-29 17:41:04 -04:00
anti	f86dc79990	feat(canary): ship Node helper with wheel + install-toolchain CLI The fingerprint canaries' obfuscator shells out to a Node helper that require()s javascript-obfuscator. Without this commit, a fresh pip install decnet would land the .py modules but not the .js helper / package.json, and there'd be no documented way to provision Node side. * pyproject.toml - extend tool.setuptools.package-data to ship canary/_obfuscate_helper.js, canary/fingerprint_payload.js, and canary/package.json with the wheel. * decnet/cli/canary.py - new "decnet canary-install-toolchain" subcommand. Resolves decnet.canary.__file__'s dir, runs npm install --omit=dev there, exits non-zero with a clear message if npm is missing or install fails. Idempotent - safe to call every API service start. * deploy/decnet-api.service.j2 - non-fatal ExecStartPre that calls the new subcommand. Leading '-' so a missing Node toolchain only degrades fingerprint canaries (loud at mint time) without keeping the API from booting. * tests/canary/test_cli.py - registration smoke test, missing-npm exit path, and a mocked-subprocess test asserting the right argv and cwd land on npm. Realism cultivator already has a broad except Exception around cultivate() in scheduler.py:195-211, so a missing toolchain on a host running the realism tick degrades to an inert noise file with no extra plumbing.	2026-04-29 16:53:27 -04:00
anti	907ade9142	feat(realism): wire fingerprint_html/svg through taxonomy + UI The two new fingerprint canary generators existed at the API level since `f64e78f` but weren't visible to the realism engine or the operator-facing dashboard. Threads them through every place that enumerates canary content classes. Backend: * realism/taxonomy.py - two new ContentClass members (CANARY_FINGERPRINT_HTML, CANARY_FINGERPRINT_SVG); enum is wire-visible (synthetic_files.content_class column + bus discrim) so we add at the bottom, never reorder. * canary/cultivator.py - class-to-generator dispatch, kind mapping (both http), and default placement paths (~/Documents/asset_directory.html and network_topology.svg). * realism/naming.py + bodies.py - _name_canary / _body_canary entries. * realism/planner.py - added to _DEFAULT_CANARY_CLASS_WEIGHTS and the _CANARY_CLASSES classification set. Frontend: * decnet_web/src/realism/labels.ts - display labels. * decnet_web/src/components/RealismConfig/RealismConfig.tsx - default canary weight rows so operators see them in the realism config UI. * decnet_web/src/components/SyntheticFiles/SyntheticFiles.tsx - added to the CONTENT_CLASSES allow-list so filter dropdowns show them. Also: re-applied the nosec B404/B603 markers on canary/obfuscator.py; the first commit's pre-commit autoformatter stripped them. Tests: extended tests/realism/test_taxonomy.py's stability assertion to include the two new values. Full canary + realism suites pass (362 / 2 skipped).	2026-04-29 16:44:03 -04:00
anti	de6d5cd1a8	fix(canary): include fingerprint_* in KNOWN_GENERATORS stability test	2026-04-29 16:26:09 -04:00
anti	dd807bc55e	feat(canary): worker decodes ?d=/?o=/?s=&i=&n=&d= fingerprint params The fingerprint payload beacons fingerprint data as base64url JSON in GET query params: ?o=1 for the bare-open beacon, ?d=<blob> for a single-shot dump, or ?s/i/n/d=<chunk> for chunked dumps. Until now those params were buried inside request_path; consumers had to parse the URL themselves. Worker now extracts them in _extract_fingerprint and merges into raw_headers under reserved _fp* keys: * _fp_open — bare-open marker * _fp — decoded fingerprint dict (single-shot path) * _fp_sid/idx/total/chunk — chunked metadata + raw base64 (reassembly is a downstream concern, not the worker's job) * _fp_decode_error / _fp_oversize — failure markers for trash dumps Per-chunk size capped at 8KB so an attacker spamming /c/<known_slug> can't inflate trigger rows indefinitely. Decode failures degrade gracefully — the trigger row still records the hit, just with a _fp_decode_error flag instead of structured fingerprint data. Tests cover the single-shot decode, bare-open flag, chunked metadata, malformed input, and oversize drop paths.	2026-04-29 16:25:17 -04:00
anti	f64e78f78c	feat(canary): fingerprint_html + fingerprint_svg generators Two new synthesised-artifact generators that bake the obfuscated fingerprint payload into plausible-looking decoy files: * fingerprint_html — a mundane "Internal Asset Directory" page with a small table of fake hosts; the obfuscated payload is inlined at the bottom of <body>. Visible content (row pool slice, sync timestamp) also varies per mint via SHA-256-derived stable ints, so two extracted canaries don't diff to zero even on the rendered surface. * fingerprint_svg — standalone SVG with an embedded <script> CDATA block. SVG <script> only fires for top-level loads / <object> / <iframe>; <img>-referenced renders are safely inert. Both derive the mint UUID via uuid.uuid5 from the callback token, so re-mints are byte-identical (preserving the generator determinism contract) AND the same token produces the same mint UUID across HTML and SVG variants — the worker can correlate beacons across artifact shapes. Wired into the factory + KNOWN_GENERATORS, default placement paths under ~/Documents/asset_directory.html and ~/Documents/network_topology.svg for both linux and windows personas. Tests cover determinism, per-token divergence, structural validity (DOCTYPE/SVG headers), and that the beacon URL stays inside the obfuscated string array (not in plaintext). The two new entries skip in test_generators.py when Node toolchain is absent so bare CI checkouts still pass.	2026-04-29 16:22:18 -04:00
anti	12cd7ad9cb	feat(canary): per-mint JS obfuscator wrapper + fingerprint payload Adds the load-bearing primitives for obfuscated browser-fingerprinting canaries. Step 3 (HTML/SVG generators) and step 4 (worker-side fingerprint ingestion) build on top of these. * decnet/canary/obfuscator.py - javascript-obfuscator wrapper. Seed and polymorphic config bits both derive from the callback token, so output is byte-identical for the same mint (preserving the generator determinism contract from base.py) and structurally distinct across mints. * decnet/canary/fingerprint_payload.js - port of canary-self-test.html with the rendering UI stripped. Two placeholders (BEACON_URL, MINT_UUID) substituted before obfuscation. MVP beacon strategy: bare-open GET pixel first, then base64url-encoded fingerprint as query params on subsequent GETs (chunked above ~6KB) so the existing worker records hits before step-4 lands. * decnet/canary/_obfuscate_helper.js - Node subprocess helper that reads code+options JSON from stdin and writes obfuscated JS to stdout. Vendored javascript-obfuscator under decnet/canary/. * tests/canary/test_obfuscator.py - determinism, per-mint divergence, template substitution, Node syntax check, error path.	2026-04-29 16:16:37 -04:00
anti	eefab020d4	fix(swarm): propagate service mutations to worker agent via shard re-dispatch Add/remove/update_config on a fleet decky living on a swarm worker — and on an agent-pinned topology — used to run the master's local docker-compose only, which has no containers for the remote decky. The mutation persisted on master and silently no-op'd on the worker. - Fleet swarm: lookup DeckyShard.host_uuid; if found, rebuild a single-host shard from master state and call dispatch_decnet_config — same proven path as POST /swarm/deploy. Skip local _compose (no containers to touch). - Topology agent-pinned: call decnet.engine.deployer.resync_agent_topology (existing helper) to push the latest hydrated blob to the worker. - Local-only deckies: behaviour unchanged. - Tests: 5 new in tests/engine/test_services_live_swarm.py covering all three mutations on a swarm fleet decky (no local _compose, dispatch fires with the right host's deckies), plus apply=False save-only path (no dispatch), plus regression that local-only fleet add still runs local compose. Bus signal `decky.{name}.service_config_changed` keeps publishing as an audit trail; it is not the propagation trigger.	2026-04-29 12:51:16 -04:00
anti	94b06ee862	feat(services): initial config on ADD SERVICE — schema modal in DeckyCard, MazeNET drag, and Inspector - DeckyServiceAddRequest gains an optional `config: dict` field, validated against the service's config_schema before any state mutation (400 on bad type, no half-written rows). - Engine: add_service threads `config` into _add_topology_service / _add_fleet_service, persisting validated cfg to decky_config.service_config BEFORE compose regen so the first `up -d --build` materialises the env on the new container. No follow-up apply needed. - Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by: * DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component) * MazeNET Inspector's ADD SERVICE picker * MazeNET palette drag-drop onto a deployed decky Empty-schema services short-circuit to a one-click add (no modal flash). Operator can cancel; errors surface in the modal. - Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent on bad types, back-compat empty-config. - Drive-by: fix stale repo-method names in test_services_live.py (create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper, service.added → service_added topic).	2026-04-29 12:44:47 -04:00
anti	77ceb9d6f3	feat(services): config schemas for the rest of the registry + textarea base64 transport - Declarative config_schema on RDP, Telnet, MySQL, Redis, SMTP, SMTP_Relay matching the keys each service already reads at compose time. - TODO marker on the 19 services that accept service_cfg but never read it, so future contributors know where to plug schemas in. - Wizard base64-wraps all textarea values at INI emit (DeckyFleet buildIni); validate_cfg detects the b64: sentinel and decodes back to UTF-8. Plain raw strings still pass through for direct API submitters. - HTTPS image entrypoint accepts PEM content or path in TLS_CERT/TLS_KEY: detects a BEGIN header, writes content to /opt/tls/, and re-exports the on-disk path so server.py keeps reading paths. - Tests cover schema/compose alignment for each new service plus textarea base64 round-trip (incl. UTF-8) and HTTPS PEM end-to-end.	2026-04-29 12:23:56 -04:00
anti	d8fa7cc73d	feat(ui): per-service config in the deploy wizard's CONFIGURATION step Setting a password, banner or TLS material AFTER deployment forces a container recreate on every change. The deploy wizard now lets the operator set service config up-front so the initial build has the right env from the start. Mechanics: - Extracted the schema-driven field rendering out of ServiceConfigForm into a standalone ServiceConfigFields component (no API/buttons, just inputs + onChange). ServiceConfigForm now delegates to it. - Wizard step 2 (CONFIGURATION) renders one accordion block per selected service; clicking a service reveals its schema-driven inputs and a 'N set' badge tracks how many overrides are populated. Removing a service (back to step 1) drops its config so the INI doesn't carry orphans. - _buildIni emits one [<prefix>.<svc>] group subsection per service with at least one override. The INI loader's prefix-matcher applies it to every ${prefix}-NN decky in the batch, so one block covers all clones. - Multi-line string values (PEM textareas etc.) are escaped as \n on the way into INI; downstream consumers re-expand.	2026-04-29 12:08:17 -04:00
anti	97260daf8d	fix(ui): make .info-banner usable inside the deploy-wizard modal PersonaGeneration.css scopes .info-banner under .persona-gen-root, which doesn't match elements rendered inside the Modal portal — so the wizard's CONFIGURATION-step banner I just added rendered as plain text. Add a page-unscoped .info-banner rule in DeckyFleet.css with the same visual treatment (faint bg, violet left rule) so any modal context picks it up.	2026-04-29 12:01:42 -04:00
anti	8d3f5c646a	fix(network): accept CAP_NET_ADMIN in lieu of euid==0 for macvlan setup The systemd unit grants AmbientCapabilities=CAP_NET_ADMIN so the API service can program host-side macvlan/ipvlan interfaces without running as root, but setup_host_macvlan/_ipvlan rejected with euid!=0 before even trying — making web-driven 'decnet deploy' impossible under the privilege model the unit advertises. Replace _require_root with _require_net_admin, which reads CapEff from /proc/self/status and accepts the cap (bit 12) as well as euid==0. No libcap dep — pure /proc parse.	2026-04-29 11:56:40 -04:00
anti	5912608f78	fix(ui): wizard CONFIGURATION step + drop bogus --archetype custom preview The CONFIGURATION step had a stale disabled placeholder textarea ("per-service overrides") from before the schema-driven Inspector landed. Replaced with a one-line info banner pointing at the Inspector, which is now where per-service config actually lives. The DEPLOY step's CLI preview was rendering '--archetype custom' when pickMode==='services', but no such archetype is registered — only the preset archetypes plus 'services' (free-form list). Drop the --archetype line entirely in the services-mode preview so the rendered command reflects what the API actually receives.	2026-04-29 11:56:29 -04:00
anti	ba0e7ca476	style(ui): rebuild ServiceConfigForm in inspector terminal vocabulary Previous CSS lived in DeckyFleet.css only, so when the form rendered inside MazeNET Inspector the inputs fell back to browser defaults (white-on-white, oversized labels, mismatched buttons). New ServiceConfigForm.css ships with the component itself: small uppercase tracking-1 labels at 0.6rem (matches kvs .k), dark transparent inputs with violet focus, matrix-green text inside inputs, custom select chevron, dedicated svc-cfg-btn that visually mirrors maze-btn.small, password reveal toggle, and a 96px label column so labels never wrap into the input. Help text drops to 0.58rem dim under the input. Works identically in both surfaces.	2026-04-29 11:50:35 -04:00
anti	e51666ee14	fix(ui): stop ServiceConfigForm from re-fetching schema every render The schema useEffect depended on currentConfig, which the parent passes as a fresh `{}` literal on every render — referentially new each time, so the effect re-ran and the GET /services/.../schema hammered the server. Schema fetch now only depends on serviceSlug; form seeding from currentConfig moved to a separate effect keyed on JSON-stringified config so a real change reseeds but referential churn doesn't.	2026-04-29 11:48:20 -04:00
anti	bd7f2dfaed	feat(ui): schema-driven ServiceConfigForm in Fleet & MazeNET inspectors ServiceConfigForm.tsx fetches /topologies/services/{slug}/schema and renders typed inputs (string/password/int/bool/textarea/enum) with reveal toggles for secrets. SAVE persists via PUT (no restart); APPLY persists + force-recreates the service container after a confirm dialog (matches the forwards_l3 pattern). Mounts: - DeckyFleet DeckyCard: clicking a service tag toggles the form below the EXPOSED row, gated on liveServicesEnabled (admin + non-swarm). - MazeNET Inspector: renders the form above REMOVE SERVICE when a service is selected on a non-observed decky. UI test plan is manual — no jsdom test infra in decnet_web yet.	2026-04-29 11:41:43 -04:00
anti	75b1ce3a31	feat(api): per-service config schema endpoint + PUT/POST update+apply for fleet & topology - GET /topologies/services/{name}/schema serves the declared ServiceConfigField metadata so the Inspector can auto-render forms. - PUT /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the validated dict (DB + compose); container untouched (Save). - POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive). - New engine helper update_service_config wires both fleet and topology paths through the existing _persist_fleet_change / _rerender_topology_compose machinery; emits decky.<name>.service_config_changed on the bus.	2026-04-29 11:38:06 -04:00
anti	54b1fbed14	feat(services): declarative config_schema on BaseService + SSH/HTTP/HTTPS descriptors ServiceConfigField dataclass + BaseService.validate_cfg coerce/drop submitted service_cfg dicts against per-service typed schemas. SSH/HTTP/HTTPS now declare the keys they already read in compose_fragment, so the upcoming Inspector form has metadata to render from instead of hardcoded inputs per service.	2026-04-29 11:28:53 -04:00
anti	d314470d7f	fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts TopologyDecky rows where state='running'. No code path was flipping that state away from the default 'pending', so the count read 0/N even when every container was running fine — the dashboard was lying. Two complementary fixes: 1. deploy_topology — after the post-deploy compose ps verification, reconcile each TopologyDecky.state from the corresponding base container's docker state. running → 'running'; anything else → 'failed'. Reuses the ps_rows already gathered for the ACTIVE-vs-DEGRADED status decision; no extra docker hit. 2. apply_add_decky — _materialise_decky_spawn now returns True/False; on True the row is updated to state='running' before _assert_valid_after. Catches the case where a decky added via the live mutator queue stays at 'pending' indefinitely (the deployer's reconcile only runs on a fresh deploy_topology pass). Existing topology deckies in active topologies will still read as 'pending' until the next deploy_topology runs, since this is forward-only. An operator-side fix is to teardown + redeploy or run the (forthcoming) reconcile-on-startup pass.	2026-04-29 11:09:32 -04:00
anti	57e527534c	fix(mutator): auto-fall-back to legacy builder when buildx wedges live decky add apply_add_decky's compose-up was hard-failing whenever the operator's ~/.docker/buildx/activity/ landed on a read-only mount — the wedge detection in _compose_with_retry correctly refuses to retry (would just leak more mounts), but for live materialisation we don't want a wedged buildx state to abort an admin's mutation. ANTI hit it on adding decky-a977: 'failed to update builder last activity time: ... read-only file system → buildx wedge detected → returned non-zero'. _compose_up_with_buildkit_fallback wraps _compose_with_retry: on a CalledProcessError whose stderr matches both wedge signatures (_BUILDX_WEDGE_SIGNATURE + _BUILDX_EROFS_SIGNATURE), it logs a warning with the manual recovery steps + retries once with DOCKER_BUILDKIT=0 set. The legacy non-buildx builder doesn't use the activity dir and isn't affected. Wired into the two paths that pass --build: * _materialise_decky_spawn (apply_add_decky) * _materialise_decky_services_diff (apply_update_decky service add) _materialise_decky_recreate_base doesn't build — it just recreates a container from an existing image — so it's not affected. Operator-facing log message points at the manual fix (rm -rf ~/.docker/buildx/activity + docker buildx create) so they can recover at their leisure; we don't ATTEMPT the recovery because the activity dir might be RO for a reason (zfs/btrfs snapshot, etc.) that an automated rm would be wrong to fight.	2026-04-29 10:59:04 -04:00
anti	892219ec87	feat(mutator): refuse forwards_l3 promotion on non-DMZ deckies apply_update_decky's flip path now refuses to promote a decky to gateway unless its home LAN is a DMZ. The compose generator publishes host ports for forwards_l3=True; a non-DMZ gateway would shadow the host's port space without anything legitimately able to reach the service. Same posture as the existing 'forwards_l3 flip on live requires force=true' guard — refused before any DB write so a bad mutation leaves zero side-effects. The check is intentionally NOT a standing _RULES invariant — the codebase uses forwards_l3 for two semantics: 1. Generic L3 forwarding (internal bridge deckies routing between their multi-home LANs). The generator writes this on internal bridges via bridge_forward_probability; legitimately non-DMZ. 2. DMZ gateway (host-port publisher). Only meaningful on DMZ. Standing validation can't enforce DMZ-homing without breaking case 1. The guard fires only on the explicit user-driven flip path where the operator's intent is unambiguously case 2. Generator output and internal-bridge attachments bypass the check. check_gateway_homed_in_dmz lives in validate.py for callers that want the explicit form (and for the test surface), but is not a standing rule — comment in _RULES explains the asymmetry.	2026-04-29 00:38:51 -04:00
anti	c002c5a4f1	feat(ui): forwards_l3 toggle in Inspector with destructive-recreate confirm W5's apply_update_decky now accepts a forwards_l3 flip on a live topology only when payload['force'] is true (the unforced flip raises MutationError to keep half-thinking operators from killing in-container state). Until this commit there was no UI surface that could even submit such a flip. Inspector grows a 'PROMOTE TO GATEWAY' / 'DEMOTE GATEWAY' button when a (non-observed) decky is selected. The handler: * On pending topologies → submits via editor.updateDecky immediately. No confirm dialog; no live containers to disturb. * On active/degraded topologies → window.confirm() explaining the destructive base recreate ('In-container state is lost; active sessions to it drop'), then submits with extras.force=true. useTopologyEditor.updateDecky grows an optional extras arg that threads force: true into the queued mutation payload. The pending CRUD path ignores it (no force needed when no containers exist). MazeNET.tsx wires a toggleGateway callback that handles the optimistic local state update, surfaces an enqueue toast on the active path, and lets the SSE forwarder reconcile when mutation.applied lands.	2026-04-29 00:29:46 -04:00
anti	a27e3f5e0f	fix(tests+mutator): unbreak the docker-shadow test env + let mutator delete from active Two related fixes that came out of running the W5 tests locally: 1. tests/__init__.py — empty file, makes 'tests/' a package so pytest stops inserting it into sys.path. Without it, 'tests/docker/' (the docker-image test category) shadowed the installed docker SDK on every engine-touching test in the repo: module 'docker' has no attribute 'DockerClient' Pytest's default --import-mode=prepend was the culprit; making tests/ a package is the cheapest fix and doesn't change --import-mode for the whole tree. 2. delete_topology_decky / delete_topology_edge / delete_lan grow an 'enforce_pending: bool = True' kwarg. Default preserves the HTTP CRUD guard (api_decky_crud / api_edge_crud / api_lan_crud get the 409 for free). apply_remove_decky / apply_detach_decky / apply_remove_lan now pass enforce_pending=False — the mutator queue is the live-editing surface and has its own active-topology gating; the repo's pending-only guard was for design-time CRUD that mustn't bypass it. Without this, apply_remove_decky was silently broken on active topologies pre-W5; W5's new test surfaced it on first run. 10/10 new W5 tests pass; 58/58 across mutator + topology suites.	2026-04-29 00:24:17 -04:00
anti	98c929894c	feat(mutator): selective materialisation for apply_update_decky + tests apply_update_decky now discriminates three sub-cases: * services list changed → diff old vs new and call _materialise_decky_services_diff (compose up -d for added, stop + rm -f for removed). Mirrors services_live's pattern but doesn't import it — mutator-routed mutations carry a different bus surface (mutation.applied) than the direct API path (decky.<name>.service_added). * forwards_l3 flipped → port publishing changes, which docker can only apply at container-create time. Gated on payload['force'] is true; default raises MutationError so a half-thinking operator can't stomp a live decky. When force=true, _materialise_decky_recreate_base does compose up -d --no-deps --force-recreate. Pre-checked BEFORE the DB write so a refused mutation leaves zero side-effects. * coord-only (x/y) → DB only, no docker work. Ships tests/mutator/test_ops_materialisation.py with focused coverage for every new helper: add_decky/remove_decky/attach_decky/ detach_decky/update_decky/update_lan paths against an active topology, with compose primitives + docker SDK mocked at the source modules so the helpers' lazy imports pick up the stubs. Also covers the pending-topology skip and the force-flag gating.	2026-04-29 00:18:20 -04:00
anti	e3afec4e70	feat(mutator): live network.disconnect for apply_detach_decky Symmetric to apply_attach_decky — after deleting the multi-home edge from the DB, calls the docker SDK to drop the base container's interface in the now-detached LAN. Service containers lose visibility automatically (they share the base's netns). Idempotency: 'not connected' / 'no such' APIError is logged at info and treated as success.	2026-04-29 00:15:39 -04:00
anti	f347a3a736	feat(mutator): live network.connect for apply_attach_decky After the DB writes that record the multi-home edge, calls the docker SDK directly to add an interface to the base container's netns: client.networks.get(<topology bridge>).connect(<base>, ipv4_address=ip) Non-destructive — the base keeps running, no recreate. Service containers automatically see the new interface because they share the base's netns via network_mode: service:<base>. Idempotency: docker APIError with 'already' / 'endpoint exists' is logged at info and treated as success. Other errors log + leave the DB row in place; an operator retry will hit the same path.	2026-04-29 00:15:11 -04:00
anti	eed55619cb	feat(mutator): live teardown for apply_remove_decky Captures the decky's name and services list before delete_topology_decky runs (the helper needs both as compose targets even though the DB row is gone), then calls _materialise_decky_remove which stops + rm -f's the base + per-service containers via 'docker compose stop / rm -f'. Re-renders the per-topology compose AFTER the stop/rm so a future 'compose up -d' on the file doesn't try to bring the decky back.	2026-04-29 00:14:44 -04:00
anti	8c06190e69	feat(mutator): live spawn for apply_add_decky + shared materialisation helpers Adds _materialise_decky_{spawn,remove,connect,disconnect,services_diff,recreate_base} helpers alongside the existing _materialise_lan_change. Each follows the same skip rules: bail when topology is not active/degraded, when agent-pinned, or when docker calls fail (logged, not re-raised — DB remains source of truth). apply_add_decky now calls _materialise_decky_spawn after the DB writes. The helper: * re-renders the per-topology compose so it lists the new decky; * runs 'compose up -d --no-deps --build <decky_base> <decky>-<svc>...' in a worker thread (matches engine/services_live's pattern). Service container targets are filtered through get_service() so fleet_singleton services are skipped — they don't have per-decky compose entries. Gateway (forwards_l3=True) deckies need no special-case here; the compose generator already emits the host 'ports:' block for them. Subsequent commits wire the other apply_* ops to the matching helpers. Tests for the full set ship in the workstream's last commit.	2026-04-29 00:14:18 -04:00
anti	578cdf9e2e	fix(mutator): reject hostile apply_update_lan changes on live topologies subnet and is_dmz are pinned at deploy time — live deckies bind to the bridge with IPs allocated from the old subnet, and is_dmz flips the docker network's internal flag which can't be changed while containers are attached. Today the op happily wrote the new value into the DB and left docker on the old one, drifting the two surfaces. apply_update_lan now raises MutationError when topology status is active or degraded and the patch touches subnet or is_dmz. Coord (x/y) and rename updates still pass through; renames don't currently have a live caller and the bridge's docker name keys off the lan name in the renderer, so the next deploy will reconcile. This matches the posture taken by _materialise_lan_change for live LAN add/remove (commit `472c84b`).	2026-04-29 00:12:44 -04:00
anti	2731b2608b	fix(ui): keep multi-homed deckies in their home LAN on rehydrate list_topology_edges has no ORDER BY, so SQL row order is undefined. After apply_attach_decky added a bridge edge to a second LAN, on refetch the bridge edge could come back first — firstLanFor then picked it as the decky's home and the visualization 'teleported' the decky into the other LAN (the bug ANTI saw immediately after connecting two deckies across LANs). Hydration now prefers the non-bridge edge (is_bridge=false) as home. apply_add_decky writes is_bridge=false for the original edge; apply_attach_decky writes is_bridge=true for subsequent multi-homing edges. Picking the non-bridge edge is stable across row reordering. Two-pass implementation: pass 1 sets pinned homes (DMZ for gateways, non-bridge for others); pass 2 fills any gap with the first edge (legacy rows where is_bridge was never written).	2026-04-29 00:01:29 -04:00
anti	472c84b9c8	fix(mutator): materialise live LAN add/remove on docker, not just the DB apply_add_lan and apply_remove_lan were DB-only — they wrote/deleted the topology_lans row but never created or destroyed the docker bridge network. Adding a LAN to a deployed topology silently did nothing on the substrate side; any decky later attached to it had nowhere to bind. Both ops now call a shared _materialise_lan_change helper after the DB write. When the topology is active/degraded and not pinned to a swarm agent, the helper: * creates / removes the docker bridge network (internal=True for non-DMZ LANs, mirroring engine/deployer.deploy_topology), * re-renders the per-topology compose file so future redeploys reflect the change. Failures are logged, not re-raised — the DB row stays as source of truth so an operator can retry without leaking inconsistent state. Agent-pinned topologies are skipped; the next agent push reconciles. apply_add_decky / apply_attach_decky have the same gap and are not fixed here — multi-homing a running container needs careful recreate-vs-network-connect handling and is its own commit. Without those, dropping a decky into a freshly-added LAN still won't spawn a container; only the LAN itself is now live.	2026-04-29 00:00:02 -04:00
anti	bbed52a962	fix(bus): topic segments can't contain dots — service.added → service_added Bus topic segments are NATS-style tokens and the validator at bus/topics.py:402 rejects '.', '*', '>', whitespace. My W3 constants 'service.added' / 'service.removed' tripped this on every live add/remove call: ValueError: topic segment 'service.added' may not contain '.', ... Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'. Aligned the SSE forwarder's name mapping (decky.<name>.service_added → SSE event 'decky.service_added') and the frontend's useTopologyStream listener + MazeNET.tsx event handler. Also updated the wiki entry with a note about the underscore.	2026-04-28 23:53:25 -04:00
anti	d595240f55	fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash deploy_topology was flipping to ACTIVE the moment 'compose up -d' returned 0, but compose returns 0 as soon as containers are started. A service that crashes on boot (port bind failure, bad image, missing entrypoint) left the topology row sitting at ACTIVE indefinitely while half the substrate was dead. After compose returns, we now run 'compose ps --all --format json', parse the newline-delimited per-container rows, and downgrade to DEGRADED with a reason listing the first eight unhealthy containers if anything isn't in state='running'. Operators see real state on the topology page instead of an optimistic flag. _compose_ps swallows compose-level errors (returns []) so an unrelated docker hiccup doesn't gate the success path — the existing in-flight exception path still catches genuine deploy failures with FAILED.	2026-04-28 23:39:50 -04:00
anti	9e8d0b0464	fix(ui): route palette drops + design-time remove through live API on active topologies When topoStatus is active/degraded, editor.updateDecky enqueues into the mutator queue and returns {kind:'enqueued'}. The palette-drop handler then short-circuits on that and never updates local state, so a service dragged onto a deployed decky just vanishes — what ANTI saw as 'no way to APPLY'. Same gap on the design-time 'REMOVE SERVICE' button in the Inspector's service detail panel: enqueue + no local update = chip stays. Both now route through liveAddService / liveRemoveService when the topology is active, hitting POST/DELETE /topologies/{id}/deckies/{name}/services directly and patching local state from the response. Pending topologies still queue through the mutator (correct: no live containers to mutate). Hoisted serviceRegistry / liveAddService / liveRemoveService above the palette-drop callback so the deps array doesn't trip the const TDZ at render time.	2026-04-28 23:38:37 -04:00
anti	463877b8fc	fix(ui): hit /topologies/ with trailing slash to keep bearer FastAPI's redirect_slashes=True 307s /topologies → /topologies/, and the browser drops Authorization on the redirected URL — the topology picker in the canary create modal was landing as 401 even for admins. Hit the canonical (trailing-slash) path so the request resolves on the first hop.	2026-04-28 23:18:39 -04:00
anti	0e5484648f	feat: forward decky.*.service.* on per-topology SSE stream The /topologies/{id}/events SSE proxy now subscribes to two bus patterns concurrently and merges them through a bounded asyncio.Queue: * topology.{id}.>: lifecycle (status, mutation.*), unchanged. * decky.>: per-decky events, filtered by payload.topology_id so a fleet decky sharing a name with a topology decky doesn't leak across. _sse_name_for routes 'decky.<name>.service.added' to the SSE event name 'decky.service.added' (kept the prefix so the frontend doesn't collide with topology lifecycle events that share leaf names like 'status'). useTopologyStream surfaces the two new event names; MazeNET.tsx's onStreamEvent optimistically patches the matching node's services list so a second tab reflects shape changes without a refetch.	2026-04-28 23:15:38 -04:00
anti	e7d49d7237	feat(ui): live service add/remove on fleet DeckyCard DeckyCard grows the same per-chip × + dashed '+ ADD' affordances we just shipped on the MazeNET Inspector. Wired to POST/DELETE /api/v1/deckies/{name}/services{,/svc}; the response's services list flows back through onServicesChanged to update the parent's deckies state without a refetch. Gated on isAdmin && !decky.swarm — swarm deckies live on a remote agent and the W3 endpoint runs docker compose locally, same gap as the canary planter has for agent-pinned topologies. Out of scope here; flagged as a known limitation. stopPropagation on the inline buttons + add-row container keeps the card-level click (which selects the decky for inspection) from firing on intra-row interactions.	2026-04-28 23:13:46 -04:00
anti	1a631c9400	fix(ui): narrow services type for Inspector live-add picker ObservedNode.services is the literal tuple ['*']; narrowing inside the .filter() callback was tripping TS2345. We already gate the live controls on node.kind !== 'observed', so casting to readonly string[] inside the filter is safe and keeps the discriminated union strict elsewhere.	2026-04-28 23:11:39 -04:00
anti	2fabcd1c29	feat(ui): live service add/remove on MazeNET Inspector When the topology is active/degraded the Inspector switches services chips into live controls: each chip gets a × button that DELETEs to the W3 endpoint, and a dashed '+ ADD' chip opens a typeahead picker fed by useServiceRegistry().perDecky. Pending topologies still use the existing design-time path (onRemoveService → editor.updateDecky); the Inspector picks based on topologyStatus, so an operator never accidentally hits a live API call against a topology that isn't deployed yet. The mutation handlers in MazeNET.tsx hit POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc} and optimistically apply the response's services list to local state. Cross-tab reconciliation rides on the SSE forwarder shipped in the follow-up commit.	2026-04-28 23:11:02 -04:00
anti	06f208c86e	feat: surface fleet_singleton flag on /topologies/services Adds a fleet_singletons array to ServiceCatalogResponse so per-decky add UIs can filter out services like LLMNR that run once fleet-wide (and would 422 server-side at the live add endpoint). The existing 'services: list[str]' field is unchanged for back-compat with MazeNET/useMazeApi.ts:257; the new field is additive. decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a module-scoped cache (registry only changes on BYOS install / plugin drop, neither of which happens mid-session) and exposes a precomputed .perDecky list so consumers don't need to re-derive the diff.	2026-04-28 23:08:29 -04:00
anti	4287e94deb	feat(ui): file drops tab on CanaryTokens CanaryTokens.tsx grows a third tab — File drops — alongside Tokens and Blobs. The page now covers every 'admin landed bytes on a decky' operation in one place. FileDropModal mirrors the canary CreateModal's shape: Fleet/MazeNET toggle, topology+decky picker, absolute-path validation matching the backend (DeckyFileDropRequest rejects relative + ..-traversal), mode + mtime offset inputs, and a -1w preset for backdating. FileReader → data URL → strip prefix → POST /api/v1/deckies/files. The list is local-only (localStorage, capped at 200 entries). W2's backend doesn't persist drops by design — the endpoint is for staging payloads, not as an audit trail. CLEAR LIST button on the tab; no DELETE button on rows since the local entry doesn't track whether the file is still there (an attacker may have moved it). Alt+D shortcut joins Alt+C; alt-key only per the Linux-meta-key rule.	2026-04-28 23:06:53 -04:00
anti	c942d4d333	feat(ui): scope canary tokens to MazeNET topology deckies CanaryTokens.tsx grows a Fleet/MazeNET toggle in the create modal. In topology mode we hydrate /topologies?status=active for the topology picker, then GET /topologies/{id} on selection to repopulate the decky picker — topology deckies have a different shape than fleet's /deckies endpoint. The tokens table gains a SCOPE column (chip: 'fleet' / 'topology'), and a third filter dropdown alongside state. The drawer's metadata section shows a Scope row with a clickable jump-link back to the MazeNET view at the right topology. CanaryTokenRow grows a topology_id field so the drawer/list can discriminate without re-fetching.	2026-04-28 23:04:13 -04:00
anti	6ac8cac908	feat(deckies): live service add/remove without full redeploy decnet.engine.services_live exposes add_service / remove_service for both fleet and topology decky scopes. The host's _compose() wrapper already supported per-service targeting (up --no-deps -d <svc>, stop, rm -f); what was missing was the orchestration around it: * add: validate against decnet.services.registry (rejects unknown + fleet_singleton); persist the new services list; re-render the per-scope compose file (so future redeploys reflect the change); run docker compose up -d --no-deps --build <decky>-<svc>. * remove: stop + rm -f the service container; persist; re-render compose so a future up -d doesn't bring it back. Both publish decky.<name>.service.added / .removed on the bus, with the post-mutation services list. Topic constants added to decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored). Four new admin endpoints: * POST/DELETE /api/v1/deckies/{name}/services{,/svc} * POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc} ServiceMutationError messages are mapped at the API boundary to 404 (decky/topology missing), 409 (idempotency violation), 422 (unknown or fleet_singleton service).	2026-04-28 22:51:42 -04:00
anti	0bc4b05c73	feat(deckies): generic file drops on fleet + MazeNET deckies Extracts the docker-exec-with-base64-stdin pattern out of canary/planter and orchestrator/drivers/ssh into a shared decnet.decky_io package. Both consumers now delegate; the canary planter test still proves the contract end-to-end. Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops. Container resolution is shared with the canary path: topology_id absent means fleet (<name>-ssh), present routes through resolve_decky_container which picks <name>-ssh when the topology decky exposes ssh, else the topology base container decnet_t_<id8>_<name>. Path validation rejects relative paths and '..' traversal at the request model layer. Bad base64 → 400; unknown topology → 404; decky not in topology → 422; docker exec failure → 409.	2026-04-28 22:43:34 -04:00
anti	3fe999d706	feat(canary): allow custom canaries on MazeNET deckies via API POST /api/v1/canary/tokens grows an optional topology_id field. When present, the server hydrates the topology, validates the named decky is in it, and resolves the docker container via planter.resolve_topology_container — <name>-ssh if the decky exposes ssh, else the topology base container. Absent ⇒ fleet semantics, unchanged. The token row gets a nullable topology_id column (no migration helper per pre-v1 policy). GET /api/v1/canary/tokens accepts ?topology_id= as a filter. DELETE re-resolves the container at revoke time so a redeployed topology is still reachable. 422 when the named decky isn't in the topology; 404 when the topology itself doesn't exist.	2026-04-28 22:34:45 -04:00
anti	5802de1f86	feat(canary): seed baseline canaries on MazeNET deckies Topology deploys now plant the configured canary baseline set on every decky in the topology, mirroring the fleet-deploy hook. Containers are resolved via resolve_topology_container — <decky>-ssh when the decky exposes an ssh service, else the topology base container decnet_t_<id8>_<decky>. The planter's plant/revoke/seed_baseline grow an optional container= kwarg; default preserves the fleet <name>-ssh resolution.	2026-04-28 22:30:11 -04:00
anti	04b0637c24	feat(bounty): wire artifact download into BountyInspector drawer The Vault page already shows file drops and stored mail (`e3ddeb0`) but the inspector drawer had no download button — only the live-feed ArtifactDrawer/MailDrawer offered raw byte retrieval. Add a DOWNLOAD RAW action to BountyInspector that fires when bounty_type=artifact, hitting /artifacts/{decky}/{stored_as}?service=<svc> with the bounty's own service field (ssh or smtp). Mirrors ArtifactDrawer's blob handling and 400/403/404 error mapping. Also widen the icon/label vocabulary: artifact bounties get FileText (file drops) or Mail (message_stored) instead of the generic Package, and the inspector header chip mirrors the change.	2026-04-28 22:03:58 -04:00

257 Commits