250 Commits

Author SHA1 Message Date
f64e78f78c feat(canary): fingerprint_html + fingerprint_svg generators
Two new synthesised-artifact generators that bake the obfuscated
fingerprint payload into plausible-looking decoy files:

* fingerprint_html — a mundane "Internal Asset Directory" page with a
  small table of fake hosts; the obfuscated payload is inlined at the
  bottom of <body>. Visible content (row pool slice, sync timestamp)
  also varies per mint via SHA-256-derived stable ints, so two
  extracted canaries don't diff to zero even on the rendered surface.
* fingerprint_svg — standalone SVG with an embedded <script> CDATA
  block. SVG <script> only fires for top-level loads / <object> /
  <iframe>; <img>-referenced renders are safely inert.

Both derive the mint UUID via uuid.uuid5 from the callback token, so
re-mints are byte-identical (preserving the generator determinism
contract) AND the same token produces the same mint UUID across HTML
and SVG variants — the worker can correlate beacons across artifact
shapes.
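
A minimal sketch of the derivation described above, assuming a project-specific namespace UUID (the commit does not say which namespace is used). `CANARY_NAMESPACE` and `mint_uuid_for` are hypothetical names; only the use of `uuid.uuid5` comes from the commit message.

```python
import uuid

# Hypothetical namespace: the commit does not name the one actually used.
CANARY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/canary")


def mint_uuid_for(callback_token: str) -> uuid.UUID:
    """Derive a name-based (version 5) UUID from the callback token.

    The same token always yields the same UUID, so an HTML and an SVG
    artifact minted from one token share a mint UUID.
    """
    return uuid.uuid5(CANARY_NAMESPACE, callback_token)


html_mint = mint_uuid_for("token-abc")   # as used by an HTML generator
svg_mint = mint_uuid_for("token-abc")    # as used by an SVG generator
other_mint = mint_uuid_for("token-xyz")  # a different mint
```

Because uuid5 is a pure function of namespace and name, a re-mint needs no stored state to reproduce the same identifier.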

Wired into the factory + KNOWN_GENERATORS, default placement paths
under ~/Documents/asset_directory.html and ~/Documents/network_topology.svg
for both linux and windows personas. Tests cover determinism, per-token
divergence, structural validity (DOCTYPE/SVG headers), and that the
beacon URL stays inside the obfuscated string array (not in plaintext).
The two new entries skip in test_generators.py when Node toolchain is
absent so bare CI checkouts still pass.
2026-04-29 16:22:18 -04:00
12cd7ad9cb feat(canary): per-mint JS obfuscator wrapper + fingerprint payload
Adds the load-bearing primitives for obfuscated browser-fingerprinting
canaries. Step 3 (HTML/SVG generators) and step 4 (worker-side
fingerprint ingestion) build on top of these.

* decnet/canary/obfuscator.py - javascript-obfuscator wrapper. Seed
  and polymorphic config bits both derive from the callback token, so
  output is byte-identical for the same mint (preserving the generator
  determinism contract from base.py) and structurally distinct across
  mints.
* decnet/canary/fingerprint_payload.js - port of canary-self-test.html
  with the rendering UI stripped. Two placeholders (BEACON_URL,
  MINT_UUID) substituted before obfuscation. MVP beacon strategy:
  bare-open GET pixel first, then base64url-encoded fingerprint as
  query params on subsequent GETs (chunked above ~6KB) so the existing
  worker records hits before step-4 lands.
* decnet/canary/_obfuscate_helper.js - Node subprocess helper that
  reads code+options JSON from stdin and writes obfuscated JS to
  stdout. Vendored javascript-obfuscator under decnet/canary/.
* tests/canary/test_obfuscator.py - determinism, per-mint divergence,
  template substitution, Node syntax check, error path.
2026-04-29 16:16:37 -04:00
eefab020d4 fix(swarm): propagate service mutations to worker agent via shard re-dispatch
Add/remove/update_config on a fleet decky living on a swarm worker — and on
an agent-pinned topology — used to run the master's local docker-compose only,
which has no containers for the remote decky. The mutation persisted on master
and silently no-op'd on the worker.

- Fleet swarm: lookup DeckyShard.host_uuid; if found, rebuild a single-host
  shard from master state and call dispatch_decnet_config — same proven path
  as POST /swarm/deploy. Skip local _compose (no containers to touch).
- Topology agent-pinned: call decnet.engine.deployer.resync_agent_topology
  (existing helper) to push the latest hydrated blob to the worker.
- Local-only deckies: behaviour unchanged.
- Tests: 5 new in tests/engine/test_services_live_swarm.py covering all
  three mutations on a swarm fleet decky (no local _compose, dispatch fires
  with the right host's deckies), plus apply=False save-only path (no
  dispatch), plus regression that local-only fleet add still runs local compose.

Bus signal `decky.{name}.service_config_changed` keeps publishing as an
audit trail; it is not the propagation trigger.
2026-04-29 12:51:16 -04:00
94b06ee862 feat(services): initial config on ADD SERVICE — schema modal in DeckyCard, MazeNET drag, and Inspector
- DeckyServiceAddRequest gains an optional `config: dict` field, validated
  against the service's config_schema before any state mutation (400 on
  bad type, no half-written rows).
- Engine: add_service threads `config` into _add_topology_service /
  _add_fleet_service, persisting validated cfg to decky_config.service_config
  BEFORE compose regen so the first `up -d --build` materialises the env on
  the new container. No follow-up apply needed.
- Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by:
    * DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component)
    * MazeNET Inspector's ADD SERVICE picker
    * MazeNET palette drag-drop onto a deployed decky
  Empty-schema services short-circuit to a one-click add (no modal flash).
  Operator can cancel; errors surface in the modal.
- Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent
  on bad types, back-compat empty-config.
- Drive-by: fix stale repo-method names in test_services_live.py
  (create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper,
  service.added → service_added topic).
2026-04-29 12:44:47 -04:00
77ceb9d6f3 feat(services): config schemas for the rest of the registry + textarea base64 transport
- Declarative config_schema on RDP, Telnet, MySQL, Redis, SMTP, SMTP_Relay
  matching the keys each service already reads at compose time.
- TODO marker on the 19 services that accept service_cfg but never read it,
  so future contributors know where to plug schemas in.
- Wizard base64-wraps all textarea values at INI emit (DeckyFleet
  buildIni); validate_cfg detects the b64: sentinel and decodes back to
  UTF-8. Plain raw strings still pass through for direct API submitters.
- HTTPS image entrypoint accepts PEM content or path in TLS_CERT/TLS_KEY:
  detects a BEGIN header, writes content to /opt/tls/, and re-exports
  the on-disk path so server.py keeps reading paths.
- Tests cover schema/compose alignment for each new service plus
  textarea base64 round-trip (incl. UTF-8) and HTTPS PEM end-to-end.
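
The textarea transport in the list above could look roughly like this. It is a sketch, not the project's code: `wrap_textarea` and `unwrap_value` are hypothetical names, and only the `b64:` sentinel and the UTF-8 round-trip come from the commit message.

```python
import base64

B64_SENTINEL = "b64:"  # sentinel named in the commit message


def wrap_textarea(value: str) -> str:
    """Encode a multi-line value so it survives as a single INI line."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return B64_SENTINEL + encoded


def unwrap_value(value: str) -> str:
    """Decode sentinel-prefixed values; pass plain strings through unchanged."""
    if value.startswith(B64_SENTINEL):
        payload = value[len(B64_SENTINEL):]
        return base64.b64decode(payload).decode("utf-8")
    return value


pem = "-----BEGIN CERTIFICATE-----\nZm9v\n-----END CERTIFICATE-----\n"
wire = wrap_textarea(pem)
```

The wrapped form contains no newlines, which is the property the INI emit step needs.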
2026-04-29 12:23:56 -04:00
d8fa7cc73d feat(ui): per-service config in the deploy wizard's CONFIGURATION step
Setting a password, banner or TLS material AFTER deployment forces a
container recreate on every change. The deploy wizard now lets the
operator set service config up-front so the initial build has the
right env from the start.

Mechanics:
- Extracted the schema-driven field rendering out of ServiceConfigForm
  into a standalone ServiceConfigFields component (no API/buttons,
  just inputs + onChange).  ServiceConfigForm now delegates to it.
- Wizard step 2 (CONFIGURATION) renders one accordion block per
  selected service; clicking a service reveals its schema-driven
  inputs and a 'N set' badge tracks how many overrides are populated.
  Removing a service (back to step 1) drops its config so the INI
  doesn't carry orphans.
- _buildIni emits one [<prefix>.<svc>] group subsection per service
  with at least one override.  The INI loader's prefix-matcher
  applies it to every ${prefix}-NN decky in the batch, so one block
  covers all clones.
- Multi-line string values (PEM textareas etc.) are escaped as \n
  on the way into INI; downstream consumers re-expand.
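
As an illustration of the prefix matching described above, here is a small sketch. It assumes that "NN" means one or more decimal digits and that the service name is the last dot-separated part of the section name; `service_for_decky` is a hypothetical helper, not the loader's real function.

```python
import re
from typing import Optional


def service_for_decky(section: str, decky_name: str) -> Optional[str]:
    """Return the service a [<prefix>.<svc>] section configures for this decky.

    Returns None when the section is not a group subsection or when the
    decky name is not of the form <prefix>-NN (assumed: NN = digits).
    """
    prefix, sep, service = section.rpartition(".")
    if not sep or not prefix or not service:
        return None
    if re.fullmatch(re.escape(prefix) + r"-\d+", decky_name):
        return service
    return None
```

With this rule a single `[web.ssh]` block would apply to `web-01`, `web-02`, and so on, but not to `db-01`.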
2026-04-29 12:08:17 -04:00
97260daf8d fix(ui): make .info-banner usable inside the deploy-wizard modal
PersonaGeneration.css scopes .info-banner under .persona-gen-root,
which doesn't match elements rendered inside the Modal portal —
so the wizard's CONFIGURATION-step banner I just added rendered
as plain text.

Add a page-unscoped .info-banner rule in DeckyFleet.css with the
same visual treatment (faint bg, violet left rule) so any modal
context picks it up.
2026-04-29 12:01:42 -04:00
8d3f5c646a fix(network): accept CAP_NET_ADMIN in lieu of euid==0 for macvlan setup
The systemd unit grants AmbientCapabilities=CAP_NET_ADMIN so the API
service can program host-side macvlan/ipvlan interfaces without
running as root, but setup_host_macvlan/_ipvlan rejected with euid!=0
before even trying — making web-driven 'decnet deploy' impossible
under the privilege model the unit advertises.

Replace _require_root with _require_net_admin, which reads CapEff
from /proc/self/status and accepts the cap (bit 12) as well as
euid==0. No libcap dep — pure /proc parse.
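
A sketch of the /proc parse described above, under the assumption that the check is split into a pure parsing function plus a thin wrapper. The function names are hypothetical; bit 12 for CAP_NET_ADMIN and the `CapEff` field of `/proc/self/status` are as stated in the commit.

```python
import os

CAP_NET_ADMIN_BIT = 12  # CAP_NET_ADMIN's number in the Linux capability list


def has_net_admin(status_text: str, euid: int) -> bool:
    """True when euid is 0 or CapEff has the CAP_NET_ADMIN bit set."""
    if euid == 0:
        return True
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The field is a hexadecimal bitmask, e.g. "CapEff:\t0000000000001000".
            cap_eff = int(line.split(":", 1)[1].strip(), 16)
            return bool(cap_eff & (1 << CAP_NET_ADMIN_BIT))
    return False


def require_net_admin() -> None:
    """Raise PermissionError unless the process may program network interfaces."""
    with open("/proc/self/status", encoding="ascii") as handle:
        status_text = handle.read()
    if not has_net_admin(status_text, os.geteuid()):
        raise PermissionError("need euid 0 or CAP_NET_ADMIN")
```

Keeping the parse separate from the file read makes the bit test checkable without special privileges.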
2026-04-29 11:56:40 -04:00
5912608f78 fix(ui): wizard CONFIGURATION step + drop bogus --archetype custom preview
The CONFIGURATION step had a stale disabled placeholder textarea
("per-service overrides") from before the schema-driven Inspector
landed. Replaced with a one-line info banner pointing at the Inspector,
which is now where per-service config actually lives.

The DEPLOY step's CLI preview was rendering '--archetype custom' when
pickMode==='services', but no such archetype is registered — only the
preset archetypes plus 'services' (free-form list). Drop the
--archetype line entirely in the services-mode preview so the rendered
command reflects what the API actually receives.
2026-04-29 11:56:29 -04:00
ba0e7ca476 style(ui): rebuild ServiceConfigForm in inspector terminal vocabulary
Previous CSS lived in DeckyFleet.css only, so when the form rendered
inside MazeNET Inspector the inputs fell back to browser defaults
(white-on-white, oversized labels, mismatched buttons).

New ServiceConfigForm.css ships with the component itself: small
uppercase tracking-1 labels at 0.6rem (matches kvs .k), dark
transparent inputs with violet focus, matrix-green text inside
inputs, custom select chevron, dedicated svc-cfg-btn that visually
mirrors maze-btn.small, password reveal toggle, and a 96px label
column so labels never wrap into the input. Help text drops to
0.58rem dim under the input. Works identically in both surfaces.
2026-04-29 11:50:35 -04:00
e51666ee14 fix(ui): stop ServiceConfigForm from re-fetching schema every render
The schema useEffect depended on currentConfig, which the parent
passes as a fresh `{}` literal on every render — referentially new
each time, so the effect re-ran and the GET /services/.../schema
hammered the server.

Schema fetch now only depends on serviceSlug; form seeding from
currentConfig moved to a separate effect keyed on JSON-stringified
config so a real change reseeds but referential churn doesn't.
2026-04-29 11:48:20 -04:00
bd7f2dfaed feat(ui): schema-driven ServiceConfigForm in Fleet & MazeNET inspectors
ServiceConfigForm.tsx fetches /topologies/services/{slug}/schema and renders
typed inputs (string/password/int/bool/textarea/enum) with reveal toggles for
secrets. SAVE persists via PUT (no restart); APPLY persists + force-recreates
the service container after a confirm dialog (matches the forwards_l3 pattern).

Mounts:
- DeckyFleet DeckyCard: clicking a service tag toggles the form below the
  EXPOSED row, gated on liveServicesEnabled (admin + non-swarm).
- MazeNET Inspector: renders the form above REMOVE SERVICE when a service
  is selected on a non-observed decky.

UI test plan is manual — no jsdom test infra in decnet_web yet.
2026-04-29 11:41:43 -04:00
75b1ce3a31 feat(api): per-service config schema endpoint + PUT/POST update+apply for fleet & topology
- GET /topologies/services/{name}/schema serves the declared ServiceConfigField
  metadata so the Inspector can auto-render forms.
- PUT  /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the
  validated dict (DB + compose); container untouched (Save).
- POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then
  force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive).
- New engine helper update_service_config wires both fleet and topology paths
  through the existing _persist_fleet_change / _rerender_topology_compose
  machinery; emits decky.<name>.service_config_changed on the bus.
2026-04-29 11:38:06 -04:00
54b1fbed14 feat(services): declarative config_schema on BaseService + SSH/HTTP/HTTPS descriptors
ServiceConfigField dataclass + BaseService.validate_cfg coerce/drop submitted
service_cfg dicts against per-service typed schemas. SSH/HTTP/HTTPS now declare
the keys they already read in compose_fragment, so the upcoming Inspector form
has metadata to render from instead of hardcoded inputs per service.
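
A rough sketch of the coerce/drop behaviour, assuming a field carries only a key and a type name. The real `ServiceConfigField` very likely has more attributes, and the free function `validate_cfg` below stands in for the `BaseService` method; the type names and truthy/falsy spellings are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class ServiceConfigField:
    key: str
    type: str = "string"  # assumed type names: "string", "int", "bool"


_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _coerce(field: ServiceConfigField, raw: Any) -> Any:
    """Coerce one submitted value to the field's declared type."""
    if field.type == "int":
        if isinstance(raw, bool):
            raise ValueError(f"{field.key}: expected an integer, got a boolean")
        return int(raw)  # raises ValueError on non-numeric strings
    if field.type == "bool":
        if isinstance(raw, bool):
            return raw
        text = str(raw).strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
        raise ValueError(f"{field.key}: not a boolean: {raw!r}")
    return str(raw)


def validate_cfg(schema: Iterable[ServiceConfigField],
                 submitted: Dict[str, Any]) -> Dict[str, Any]:
    """Keep declared keys (coerced); silently drop unknown keys."""
    fields = {field.key: field for field in schema}
    clean: Dict[str, Any] = {}
    for key, raw in submitted.items():
        field = fields.get(key)
        if field is None:
            continue  # unknown key: dropped, not an error
        clean[key] = _coerce(field, raw)
    return clean


SSH_SCHEMA = [ServiceConfigField("port", "int"), ServiceConfigField("banner")]
```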
2026-04-29 11:28:53 -04:00
d314470d7f fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right
Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts
TopologyDecky rows where state='running'.  No code path was flipping
that state away from the default 'pending', so the count read 0/N
even when every container was running fine — the dashboard was lying.

Two complementary fixes:

1. deploy_topology — after the post-deploy compose ps verification,
   reconcile each TopologyDecky.state from the corresponding base
   container's docker state.  running → 'running'; anything else →
   'failed'.  Reuses the ps_rows already gathered for the
   ACTIVE-vs-DEGRADED status decision; no extra docker hit.

2. apply_add_decky — _materialise_decky_spawn now returns True/False;
   on True the row is updated to state='running' before
   _assert_valid_after.  Catches the case where a decky added via the
   live mutator queue stays at 'pending' indefinitely (the deployer's
   reconcile only runs on a fresh deploy_topology pass).

Existing topology deckies in active topologies will still read as
'pending' until the next deploy_topology runs, since this is
forward-only.  An operator-side fix is to teardown + redeploy or run
the (forthcoming) reconcile-on-startup pass.
2026-04-29 11:09:32 -04:00
57e527534c fix(mutator): auto-fall-back to legacy builder when buildx wedges live decky add
apply_add_decky's compose-up was hard-failing whenever the operator's
~/.docker/buildx/activity/ landed on a read-only mount — the wedge
detection in _compose_with_retry correctly refuses to retry (would
just leak more mounts), but for live materialisation we don't want a
wedged buildx state to abort an admin's mutation.  ANTI hit it on
adding decky-a977: 'failed to update builder last activity time: ...
read-only file system → buildx wedge detected → returned non-zero'.

_compose_up_with_buildkit_fallback wraps _compose_with_retry: on a
CalledProcessError whose stderr matches both wedge signatures
(_BUILDX_WEDGE_SIGNATURE + _BUILDX_EROFS_SIGNATURE), it logs a
warning with the manual recovery steps + retries once with
DOCKER_BUILDKIT=0 set.  The legacy non-buildx builder doesn't use
the activity dir and isn't affected.
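
The fallback wrapper described above might be shaped like this. It is a sketch under assumptions: the two signature strings are guesses taken from the quoted error text (the real constants' values are not shown), and the injectable `run` callable is a hypothetical stand-in for `_compose_with_retry`.

```python
import logging
import os
import subprocess
from typing import Callable, Mapping, Optional, Sequence

log = logging.getLogger("mutator.sketch")

# Assumed values; the commit only names the constants, not their contents.
WEDGE_SIGNATURE = "failed to update builder last activity time"
EROFS_SIGNATURE = "read-only file system"

Runner = Callable[[Sequence[str], Mapping[str, str]], None]


def compose_up_with_fallback(args: Sequence[str], run: Runner,
                             env: Optional[Mapping[str, str]] = None) -> str:
    """Run compose once; on a buildx wedge, retry once with the legacy builder."""
    base_env = dict(os.environ if env is None else env)
    try:
        run(args, base_env)
        return "buildkit"
    except subprocess.CalledProcessError as exc:
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode("utf-8", "replace")
        # Both signatures must match; anything else is a real failure.
        if WEDGE_SIGNATURE not in stderr or EROFS_SIGNATURE not in stderr:
            raise
    log.warning("buildx wedged on a read-only activity dir; "
                "retrying once with DOCKER_BUILDKIT=0")
    run(args, {**base_env, "DOCKER_BUILDKIT": "0"})
    return "legacy"
```

A real runner could wrap `subprocess.run([...], env=env, check=True, capture_output=True, text=True)`; the retry is attempted exactly once, so a failure of the legacy builder still propagates.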

Wired into the two paths that pass --build:
* _materialise_decky_spawn (apply_add_decky)
* _materialise_decky_services_diff (apply_update_decky service add)

_materialise_decky_recreate_base doesn't build — it just recreates a
container from an existing image — so it's not affected.

Operator-facing log message points at the manual fix
(rm -rf ~/.docker/buildx/activity + docker buildx create) so they
can recover at their leisure; we don't ATTEMPT the recovery because
the activity dir might be RO for a reason (zfs/btrfs snapshot, etc.)
that an automated rm would be wrong to fight.
2026-04-29 10:59:04 -04:00
892219ec87 feat(mutator): refuse forwards_l3 promotion on non-DMZ deckies
apply_update_decky's flip path now refuses to promote a decky to
gateway unless its home LAN is a DMZ.  The compose generator publishes
host ports for forwards_l3=True; a non-DMZ gateway would shadow the
host's port space without anything legitimately able to reach the
service.  Same posture as the existing 'forwards_l3 flip on live
requires force=true' guard — refused before any DB write so a bad
mutation leaves zero side-effects.

The check is intentionally NOT a standing _RULES invariant — the
codebase uses forwards_l3 for two semantics:

  1. Generic L3 forwarding (internal bridge deckies routing between
     their multi-home LANs).  The generator writes this on internal
     bridges via bridge_forward_probability; legitimately non-DMZ.
  2. DMZ gateway (host-port publisher).  Only meaningful on DMZ.

Standing validation can't enforce DMZ-homing without breaking case 1.
The guard fires only on the explicit user-driven flip path where the
operator's intent is unambiguously case 2.  Generator output and
internal-bridge attachments bypass the check.

check_gateway_homed_in_dmz lives in validate.py for callers that want
the explicit form (and for the test surface), but is not a standing
rule — comment in _RULES explains the asymmetry.
2026-04-29 00:38:51 -04:00
c002c5a4f1 feat(ui): forwards_l3 toggle in Inspector with destructive-recreate confirm
W5's apply_update_decky now accepts a forwards_l3 flip on a live
topology only when payload['force'] is true (the unforced flip raises
MutationError to keep half-thinking operators from killing
in-container state).  Until this commit there was no UI surface that
could even submit such a flip.

Inspector grows a 'PROMOTE TO GATEWAY' / 'DEMOTE GATEWAY' button when
a (non-observed) decky is selected.  The handler:

* On pending topologies → submits via editor.updateDecky immediately.
  No confirm dialog; no live containers to disturb.
* On active/degraded topologies → window.confirm() explaining the
  destructive base recreate ('In-container state is lost; active
  sessions to it drop'), then submits with extras.force=true.

useTopologyEditor.updateDecky grows an optional extras arg that
threads force: true into the queued mutation payload.  The pending
CRUD path ignores it (no force needed when no containers exist).

MazeNET.tsx wires a toggleGateway callback that handles the
optimistic local state update, surfaces an enqueue toast on the
active path, and lets the SSE forwarder reconcile when
mutation.applied lands.
2026-04-29 00:29:46 -04:00
a27e3f5e0f fix(tests+mutator): unbreak the docker-shadow test env + let mutator delete from active
Two related fixes that came out of running the W5 tests locally:

1. tests/__init__.py — empty file, makes 'tests/' a package so pytest
   stops inserting it into sys.path.  Without it, 'tests/docker/'
   (the docker-image test category) shadowed the installed docker SDK
   on every engine-touching test in the repo:

     module 'docker' has no attribute 'DockerClient'

   Pytest's default --import-mode=prepend was the culprit; making
   tests/ a package is the cheapest fix and doesn't change
   --import-mode for the whole tree.

2. delete_topology_decky / delete_topology_edge / delete_lan grow an
   'enforce_pending: bool = True' kwarg.  Default preserves the HTTP
   CRUD guard (api_decky_crud / api_edge_crud / api_lan_crud get the
   409 for free).  apply_remove_decky / apply_detach_decky /
   apply_remove_lan now pass enforce_pending=False — the mutator
   queue is the live-editing surface and has its own active-topology
   gating; the repo's pending-only guard was for design-time CRUD
   that mustn't bypass it.  Without this, apply_remove_decky was
   silently broken on active topologies pre-W5; W5's new test
   surfaced it on first run.

10/10 new W5 tests pass; 58/58 across mutator + topology suites.
2026-04-29 00:24:17 -04:00
98c929894c feat(mutator): selective materialisation for apply_update_decky + tests
apply_update_decky now discriminates three sub-cases:

* services list changed → diff old vs new and call
  _materialise_decky_services_diff (compose up -d for added,
  stop + rm -f for removed).  Mirrors services_live's pattern but
  doesn't import it — mutator-routed mutations carry a different bus
  surface (mutation.applied) than the direct API path
  (decky.<name>.service_added).
* forwards_l3 flipped → port publishing changes, which docker can
  only apply at container-create time.  Gated on payload['force']
  being true; by default the op raises MutationError so a
  half-thinking operator can't stomp a live decky.  When force=true,
  _materialise_decky_recreate_base does compose up -d --no-deps
  --force-recreate.  Pre-checked BEFORE the DB write so a refused
  mutation leaves zero side-effects.
* coord-only (x/y) → DB only, no docker work.
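
The three sub-cases above can be sketched as a pure planning step that runs before any DB write. Everything here is illustrative: `plan_update`, the dict shapes, and the action names are hypothetical, and letting one update carry both a services change and a forwards_l3 flip is an assumption.

```python
from typing import Any, List, Mapping, Tuple


class MutationError(Exception):
    """Stand-in for the mutator's error type."""


def plan_update(old: Mapping[str, Any], new: Mapping[str, Any],
                force: bool = False) -> List[Tuple[str, Any]]:
    """Decide which docker-side work an update needs; [] means DB only."""
    plan: List[Tuple[str, Any]] = []
    if bool(old.get("forwards_l3")) != bool(new.get("forwards_l3")):
        # Refuse before anything is written, so a rejected flip has no side effects.
        if not force:
            raise MutationError("forwards_l3 flip on a live decky requires force=true")
        plan.append(("recreate_base", None))
    old_services = set(old.get("services", ()))
    new_services = set(new.get("services", ()))
    added = sorted(new_services - old_services)
    removed = sorted(old_services - new_services)
    if added or removed:
        plan.append(("services_diff", {"added": added, "removed": removed}))
    return plan
```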

Ships tests/mutator/test_ops_materialisation.py with focused coverage
for every new helper: add_decky/remove_decky/attach_decky/
detach_decky/update_decky/update_lan paths against an active
topology, with compose primitives + docker SDK mocked at the source
modules so the helpers' lazy imports pick up the stubs.  Also covers
the pending-topology skip and the force-flag gating.
2026-04-29 00:18:20 -04:00
e3afec4e70 feat(mutator): live network.disconnect for apply_detach_decky
Symmetric to apply_attach_decky — after deleting the multi-home edge
from the DB, calls the docker SDK to drop the base container's
interface in the now-detached LAN.  Service containers lose
visibility automatically (they share the base's netns).

Idempotency: 'not connected' / 'no such' APIError is logged at info
and treated as success.
2026-04-29 00:15:39 -04:00
f347a3a736 feat(mutator): live network.connect for apply_attach_decky
After the DB writes that record the multi-home edge, calls the docker
SDK directly to add an interface to the base container's netns:

  client.networks.get(<topology bridge>).connect(<base>, ipv4_address=ip)

Non-destructive — the base keeps running, no recreate.  Service
containers automatically see the new interface because they share
the base's netns via network_mode: service:<base>.

Idempotency: docker APIError with 'already' / 'endpoint exists' is
logged at info and treated as success.  Other errors log + leave the
DB row in place; an operator retry will hit the same path.
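
The idempotency rule above could be expressed as a small wrapper. This sketch takes the connect call and the error type as parameters so it runs without the docker SDK installed; `connect_idempotent` is a hypothetical name, and the marker strings are the ones quoted in the commit.

```python
import logging
from typing import Callable, Type

log = logging.getLogger("mutator.sketch")

BENIGN_MARKERS = ("already", "endpoint exists")


def connect_idempotent(connect: Callable[[], None],
                       api_error: Type[Exception]) -> bool:
    """Run a network connect; treat 'already connected' style errors as success.

    In real use `connect` would wrap something like
    client.networks.get(name).connect(container, ipv4_address=ip)
    and `api_error` would be docker.errors.APIError.
    """
    try:
        connect()
    except api_error as exc:
        message = str(exc).lower()
        if any(marker in message for marker in BENIGN_MARKERS):
            log.info("interface already attached, treating as success: %s", exc)
            return True
        # Logged, not re-raised: the DB row stays so a retry hits the same path.
        log.error("network connect failed: %s", exc)
        return False
    return True
```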
2026-04-29 00:15:11 -04:00
eed55619cb feat(mutator): live teardown for apply_remove_decky
Captures the decky's name and services list before delete_topology_decky
runs (the helper needs both as compose targets even though the DB row
is gone), then calls _materialise_decky_remove which stops + rm -f's
the base + per-service containers via 'docker compose stop / rm -f'.

Re-renders the per-topology compose AFTER the stop/rm so a future
'compose up -d' on the file doesn't try to bring the decky back.
2026-04-29 00:14:44 -04:00
8c06190e69 feat(mutator): live spawn for apply_add_decky + shared materialisation helpers
Adds _materialise_decky_{spawn,remove,connect,disconnect,services_diff,recreate_base}
helpers alongside the existing _materialise_lan_change.  Each follows
the same skip rules: bail when topology is not active/degraded, when
agent-pinned, or when docker calls fail (logged, not re-raised — DB
remains source of truth).

apply_add_decky now calls _materialise_decky_spawn after the DB writes.
The helper:

* re-renders the per-topology compose so it lists the new decky;
* runs 'compose up -d --no-deps --build <decky_base> <decky>-<svc>...'
  in a worker thread (matches engine/services_live's pattern).

Service container targets are filtered through get_service() so
fleet_singleton services are skipped — they don't have per-decky
compose entries.  Gateway (forwards_l3=True) deckies need no
special-case here; the compose generator already emits the host
'ports:' block for them.

Subsequent commits wire the other apply_* ops to the matching
helpers.  Tests for the full set ship in the workstream's last
commit.
2026-04-29 00:14:18 -04:00
578cdf9e2e fix(mutator): reject hostile apply_update_lan changes on live topologies
subnet and is_dmz are pinned at deploy time — live deckies bind to
the bridge with IPs allocated from the old subnet, and is_dmz flips
the docker network's internal flag which can't be changed while
containers are attached.  Today the op happily wrote the new value
into the DB and left docker on the old one, drifting the two surfaces.

apply_update_lan now raises MutationError when topology status is
active or degraded and the patch touches subnet or is_dmz.  Coord
(x/y) and rename updates still pass through; renames don't currently
have a live caller and the bridge's docker name keys off the lan name
in the renderer, so the next deploy will reconcile.
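
A minimal sketch of the guard, assuming a field counts as "touched" when its key is present in the patch (whether the real op also compares against the stored value is not stated). `check_lan_patch` is a hypothetical name and `MutationError` is a local stand-in.

```python
from typing import Any, Mapping


class MutationError(Exception):
    """Stand-in for the mutator's error type."""


LIVE_STATUSES = frozenset({"active", "degraded"})
PINNED_LAN_FIELDS = frozenset({"subnet", "is_dmz"})


def check_lan_patch(topology_status: str, patch: Mapping[str, Any]) -> None:
    """Reject patches to deploy-time-pinned LAN fields on a live topology."""
    if topology_status not in LIVE_STATUSES:
        return  # pending topologies have no docker state to drift from
    touched = sorted(PINNED_LAN_FIELDS & set(patch))
    if touched:
        raise MutationError(
            f"cannot change {', '.join(touched)} on a {topology_status} topology")
```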

This matches the posture taken by _materialise_lan_change for live
LAN add/remove (commit 472c84b).
2026-04-29 00:12:44 -04:00
2731b2608b fix(ui): keep multi-homed deckies in their home LAN on rehydrate
list_topology_edges has no ORDER BY, so SQL row order is undefined.
After apply_attach_decky added a bridge edge to a second LAN, on
refetch the bridge edge could come back first — firstLanFor then
picked it as the decky's home and the visualization 'teleported' the
decky into the other LAN (the bug ANTI saw immediately after
connecting two deckies across LANs).

Hydration now prefers the non-bridge edge (is_bridge=false) as home.
apply_add_decky writes is_bridge=false for the original edge;
apply_attach_decky writes is_bridge=true for subsequent multi-homing
edges.  Picking the non-bridge edge is stable across row reordering.

Two-pass implementation: pass 1 sets pinned homes (DMZ for gateways,
non-bridge for others); pass 2 fills any gap with the first edge
(legacy rows where is_bridge was never written).
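
The two-pass selection can be illustrated as follows. The actual hydration code lives in the frontend; this Python sketch only mirrors the described logic, and the data shapes (`gateways`, `dmz_lans`, edge dicts) are assumptions.

```python
from typing import Any, Dict, Iterable, List, Mapping, Set


def pick_home_lans(edges: List[Mapping[str, Any]],
                   gateways: Iterable[str],
                   dmz_lans: Iterable[str]) -> Dict[str, str]:
    """Pick each decky's home LAN independently of edge row order."""
    gateway_set: Set[str] = set(gateways)
    dmz_set: Set[str] = set(dmz_lans)
    homes: Dict[str, str] = {}

    # Pass 1: pinned homes. Gateways live in a DMZ LAN; everything else
    # lives on its non-bridge edge (is_bridge explicitly False).
    for edge in edges:
        name, lan = edge["decky"], edge["lan"]
        if name in homes:
            continue
        if name in gateway_set:
            if lan in dmz_set:
                homes[name] = lan
        elif edge.get("is_bridge") is False:
            homes[name] = lan

    # Pass 2: legacy rows without is_bridge fall back to their first edge.
    for edge in edges:
        homes.setdefault(edge["decky"], edge["lan"])
    return homes
```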
2026-04-29 00:01:29 -04:00
472c84b9c8 fix(mutator): materialise live LAN add/remove on docker, not just the DB
apply_add_lan and apply_remove_lan were DB-only — they wrote/deleted
the topology_lans row but never created or destroyed the docker bridge
network.  Adding a LAN to a deployed topology silently did nothing on
the substrate side; any decky later attached to it had nowhere to bind.

Both ops now call a shared _materialise_lan_change helper after the DB
write.  When the topology is active/degraded and not pinned to a swarm
agent, the helper:

* creates / removes the docker bridge network (internal=True for
  non-DMZ LANs, mirroring engine/deployer.deploy_topology),
* re-renders the per-topology compose file so future redeploys reflect
  the change.

Failures are logged, not re-raised — the DB row stays as source of
truth so an operator can retry without leaking inconsistent state.
Agent-pinned topologies are skipped; the next agent push reconciles.

apply_add_decky / apply_attach_decky have the same gap and are not
fixed here — multi-homing a running container needs careful
recreate-vs-network-connect handling and is its own commit.  Without
those, dropping a decky into a freshly-added LAN still won't spawn a
container; only the LAN itself is now live.
2026-04-29 00:00:02 -04:00
bbed52a962 fix(bus): topic segments can't contain dots — service.added → service_added
Bus topic segments are NATS-style tokens and the validator at
bus/topics.py:402 rejects '.', '*', '>', whitespace.  My W3 constants
'service.added' / 'service.removed' tripped this on every live
add/remove call:

  ValueError: topic segment 'service.added' may not contain '.', ...

Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'.
Aligned the SSE forwarder's name mapping (decky.<name>.service_added →
SSE event 'decky.service_added') and the frontend's
useTopologyStream listener + MazeNET.tsx event handler.  Also updated
the wiki entry with a note about the underscore.
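
For illustration, a segment validator in the spirit of the one described above. This is not the code at bus/topics.py; `validate_segment` and `build_topic` are hypothetical names, and only the set of rejected characters comes from the commit.

```python
FORBIDDEN_CHARS = (".", "*", ">")


def validate_segment(segment: str) -> str:
    """Reject characters that have meaning in NATS-style topic patterns."""
    has_forbidden = any(ch in segment for ch in FORBIDDEN_CHARS)
    has_space = any(ch.isspace() for ch in segment)
    if has_forbidden or has_space:
        raise ValueError(
            f"topic segment {segment!r} may not contain '.', '*', '>' or whitespace")
    return segment


def build_topic(*segments: str) -> str:
    """Join validated segments with '.', the only place a dot may appear."""
    return ".".join(validate_segment(segment) for segment in segments)


topic = build_topic("decky", "web-01", "service_added")
```

A leaf such as `service.added` fails here because the dot would silently add a fourth segment to the topic.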
2026-04-28 23:53:25 -04:00
d595240f55 fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash
deploy_topology was flipping to ACTIVE the moment 'compose up -d'
returned 0, but compose returns 0 as soon as containers are *started*.
A service that crashes on boot (port bind failure, bad image, missing
entrypoint) left the topology row sitting at ACTIVE indefinitely while
half the substrate was dead.

After compose returns, we now run 'compose ps --all --format json',
parse the newline-delimited per-container rows, and downgrade to
DEGRADED with a reason listing the first eight unhealthy containers if
anything isn't in state='running'.  Operators see real state on the
topology page instead of an optimistic flag.
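
The post-deploy check could be sketched like this. It assumes each output line is a JSON object with `Name` and `State` keys, which is how recent Docker Compose releases format `ps --format json`; the function name and the reason format are hypothetical.

```python
import json
from typing import List


def unhealthy_containers(ps_stdout: str, limit: int = 8) -> List[str]:
    """Parse newline-delimited `compose ps` rows; list non-running containers."""
    unhealthy: List[str] = []
    for line in ps_stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        state = row.get("State", "unknown")
        if state != "running":
            unhealthy.append(f"{row.get('Name', '?')}={state}")
    return unhealthy[:limit]


sample = "\n".join([
    '{"Name": "decky-a-base", "State": "running"}',
    '{"Name": "decky-a-ssh", "State": "exited"}',
    '{"Name": "decky-a-http", "State": "restarting"}',
])
reason = ", ".join(unhealthy_containers(sample))
```

An empty result would keep the topology ACTIVE; a non-empty one supplies the DEGRADED reason text.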

_compose_ps swallows compose-level errors (returns []) so an unrelated
docker hiccup doesn't gate the success path — the existing in-flight
exception path still catches genuine deploy failures with FAILED.
2026-04-28 23:39:50 -04:00
9e8d0b0464 fix(ui): route palette drops + design-time remove through live API on active topologies
When topoStatus is active/degraded, editor.updateDecky enqueues into
the mutator queue and returns {kind:'enqueued'}.  The palette-drop
handler then short-circuits on that and never updates local state, so
a service dragged onto a deployed decky just vanishes — what ANTI saw
as 'no way to APPLY'.

Same gap on the design-time 'REMOVE SERVICE' button in the Inspector's
service detail panel: enqueue + no local update = chip stays.

Both now route through liveAddService / liveRemoveService when the
topology is active, hitting POST/DELETE /topologies/{id}/deckies/{name}/services
directly and patching local state from the response.  Pending
topologies still queue through the mutator (correct: no live
containers to mutate).

Hoisted serviceRegistry / liveAddService / liveRemoveService above
the palette-drop callback so the deps array doesn't trip the const
TDZ at render time.
2026-04-28 23:38:37 -04:00
463877b8fc fix(ui): hit /topologies/ with trailing slash to keep bearer
FastAPI's redirect_slashes=True 307s /topologies → /topologies/, and
the browser drops Authorization on the redirected URL — the topology
picker in the canary create modal was landing as 401 even for admins.
Hit the canonical (trailing-slash) path so the request resolves on the
first hop.
2026-04-28 23:18:39 -04:00
0e5484648f feat: forward decky.*.service.* on per-topology SSE stream
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:

* topology.{id}.>  — lifecycle (status, mutation.*) — unchanged.
* decky.>          — per-decky events, filtered by payload.topology_id
                     so a fleet decky sharing a name with a topology
                     decky doesn't leak across.
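
Merging two subscriptions through a bounded queue might look like the sketch below. The bus client API is not shown in the commit, so each subscription is modelled as a plain async iterator of dict events paired with a filter predicate; `merge_streams` is a hypothetical name, and handling of a source that raises is deliberately left out.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List, Tuple

Event = Dict[str, str]
_DONE = object()  # sentinel: one source has finished


async def merge_streams(
    sources: List[Tuple[AsyncIterator[Event], Callable[[Event], bool]]],
    maxsize: int = 64,
) -> AsyncIterator[Event]:
    """Yield events from all sources as they arrive, with backpressure."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def pump(source: AsyncIterator[Event],
                   keep: Callable[[Event], bool]) -> None:
        async for event in source:
            if keep(event):
                await queue.put(event)  # waits while the queue is full
        await queue.put(_DONE)

    tasks = [asyncio.create_task(pump(src, keep)) for src, keep in sources]
    remaining = len(tasks)
    try:
        while remaining:
            item = await queue.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield item
    finally:
        # Consumer finished or disconnected: stop any pumps still running.
        for task in tasks:
            task.cancel()
```

The per-source predicate is where a filter on the payload's topology_id would go for the `decky.>` pattern, while the topology pattern would pass everything through.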

_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').

useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
2026-04-28 23:15:38 -04:00
e7d49d7237 feat(ui): live service add/remove on fleet DeckyCard
DeckyCard grows the same per-chip × + dashed '+ ADD' affordances we
just shipped on the MazeNET Inspector.  Wired to POST/DELETE
/api/v1/deckies/{name}/services{,/svc}; the response's services list
flows back through onServicesChanged to update the parent's deckies
state without a refetch.

Gated on isAdmin && !decky.swarm — swarm deckies live on a remote
agent and the W3 endpoint runs docker compose locally, same gap as
the canary planter has for agent-pinned topologies.  Out of scope
here; flagged as a known limitation.

stopPropagation on the inline buttons + add-row container keeps the
card-level click (which selects the decky for inspection) from firing
on intra-row interactions.
2026-04-28 23:13:46 -04:00
1a631c9400 fix(ui): narrow services type for Inspector live-add picker
ObservedNode.services is the literal tuple ['*']; narrowing inside the
.filter() callback was tripping TS2345.  We already gate the live
controls on node.kind !== 'observed', so casting to readonly string[]
inside the filter is safe and keeps the discriminated union strict
elsewhere.
2026-04-28 23:11:39 -04:00
2fabcd1c29 feat(ui): live service add/remove on MazeNET Inspector
When the topology is active/degraded the Inspector switches services
chips into live controls: each chip gets a × button that DELETEs to
the W3 endpoint, and a dashed '+ ADD' chip opens a typeahead picker
fed by useServiceRegistry().perDecky.

Pending topologies still use the existing design-time path
(onRemoveService → editor.updateDecky); the Inspector picks based on
topologyStatus, so an operator never accidentally hits a live API
call against a topology that isn't deployed yet.

The mutation handlers in MazeNET.tsx hit POST/DELETE
/api/v1/topologies/{id}/deckies/{name}/services{,/svc} and
optimistically apply the response's services list to local state.
Cross-tab reconciliation rides on the SSE forwarder shipped in the
follow-up commit.
2026-04-28 23:11:02 -04:00
06f208c86e feat: surface fleet_singleton flag on /topologies/services
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).

The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.

decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
2026-04-28 23:08:29 -04:00
4287e94deb feat(ui): file drops tab on CanaryTokens
CanaryTokens.tsx grows a third tab — File drops — alongside Tokens
and Blobs.  The page now covers every 'admin landed bytes on a decky'
operation in one place.

FileDropModal mirrors the canary CreateModal's shape: Fleet/MazeNET
toggle, topology+decky picker, absolute-path validation matching the
backend (DeckyFileDropRequest rejects relative + ..-traversal), mode
+ mtime offset inputs, and a -1w preset for backdating.  FileReader →
data URL → strip prefix → POST /api/v1/deckies/files.

The list is local-only (localStorage, capped at 200 entries).  W2's
backend doesn't persist drops by design: the endpoint is for staging
payloads, not an audit trail.  The tab has a CLEAR LIST button but no
per-row DELETE button, since the local entry doesn't track whether the
file is still there (an attacker may have moved it).

Alt+D shortcut joins Alt+C; alt-key only per the Linux-meta-key rule.
2026-04-28 23:06:53 -04:00
c942d4d333 feat(ui): scope canary tokens to MazeNET topology deckies
CanaryTokens.tsx grows a Fleet/MazeNET toggle in the create modal.  In
topology mode we hydrate /topologies?status=active for the topology
picker, then GET /topologies/{id} on selection to repopulate the decky
picker — topology deckies have a different shape than fleet's /deckies
endpoint.

The tokens table gains a SCOPE column (chip: 'fleet' / 'topology'),
and a third filter dropdown alongside state.  The drawer's metadata
section shows a Scope row with a clickable jump-link back to the
MazeNET view at the right topology.

CanaryTokenRow grows a topology_id field so the drawer/list can
discriminate without re-fetching.
2026-04-28 23:04:13 -04:00
6ac8cac908 feat(deckies): live service add/remove without full redeploy
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes.  The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:

* add: validate against decnet.services.registry (rejects unknown +
  fleet_singleton); persist the new services list; re-render the
  per-scope compose file (so future redeploys reflect the change);
  run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
  compose so a future up -d doesn't bring it back.

Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list.  Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).

Four new admin endpoints:

* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}

ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
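
A minimal sketch of the add-side validation and the status mapping
described above. The Reason enum, the reason attribute and the helper
names are hypothetical (the commit maps on error messages, not on an
enum), and treating "service already present" as the 409 idempotency
case is an assumption.

```python
from enum import Enum


class Reason(Enum):
    NOT_FOUND = "not_found"              # decky/topology missing
    CONFLICT = "conflict"                # idempotency violation
    INVALID_SERVICE = "invalid_service"  # unknown or fleet_singleton


class ServiceMutationError(Exception):
    """Illustrative stand-in; the real class may carry different fields."""

    def __init__(self, reason: Reason, message: str) -> None:
        super().__init__(message)
        self.reason = reason


STATUS_BY_REASON = {
    Reason.NOT_FOUND: 404,
    Reason.CONFLICT: 409,
    Reason.INVALID_SERVICE: 422,
}


def validate_add(service, current, registry, fleet_singletons):
    """Return the post-mutation services list or raise ServiceMutationError."""
    if service not in registry:
        raise ServiceMutationError(Reason.INVALID_SERVICE, f"unknown service: {service}")
    if service in fleet_singletons:
        raise ServiceMutationError(Reason.INVALID_SERVICE, f"{service} is a fleet singleton")
    if service in current:
        raise ServiceMutationError(Reason.CONFLICT, f"{service} already present")
    return [*current, service]


def status_for(exc: ServiceMutationError) -> int:
    """Translate a mutation error into the HTTP status for the API boundary."""
    return STATUS_BY_REASON[exc.reason]
```

Keeping the mapping in one table means the engine layer stays free of
HTTP concerns and the route handlers only need a single except clause.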
2026-04-28 22:51:42 -04:00
0bc4b05c73 feat(deckies): generic file drops on fleet + MazeNET deckies
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.

Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.

Path validation rejects relative paths and '..' traversal at the request
model layer.  Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
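
The path rule above could look roughly like this. validate_drop_path is
a hypothetical name, and the sketch assumes POSIX-style paths; the real
check lives on the request model and may differ in detail.

```python
from pathlib import PurePosixPath


def validate_drop_path(path: str) -> str:
    """Accept only absolute paths with no '..' component."""
    parsed = PurePosixPath(path)
    if not parsed.is_absolute():
        raise ValueError("path must be absolute")
    # PurePosixPath does not collapse '..', so it is still visible in .parts.
    if ".." in parsed.parts:
        raise ValueError("path must not contain '..'")
    return path
```

Checking path components rather than substrings means a file name such
as "notes..txt" is still accepted while "/tmp/../etc/passwd" is not.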
2026-04-28 22:43:34 -04:00
3fe999d706 feat(canary): allow custom canaries on MazeNET deckies via API
POST /api/v1/canary/tokens grows an optional topology_id field.  When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container.  Absent ⇒ fleet semantics, unchanged.

The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy).  GET /api/v1/canary/tokens accepts ?topology_id= as
a filter.  DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.

422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.
2026-04-28 22:34:45 -04:00
5802de1f86 feat(canary): seed baseline canaries on MazeNET deckies
Topology deploys now plant the configured canary baseline set on every
decky in the topology, mirroring the fleet-deploy hook. Containers are
resolved via resolve_topology_container — <decky>-ssh when the decky
exposes an ssh service, else the topology base container
decnet_t_<id8>_<decky>.

The planter's plant/revoke/seed_baseline grow an optional container=
kwarg; default preserves the fleet <name>-ssh resolution.
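
A sketch of the resolution rule, for illustration only. The function
name and signature are hypothetical, and reading <id8> as the first
eight characters of the topology id is an assumption.

```python
from typing import Iterable, Optional


def resolve_container_name(
    decky: str,
    services: Iterable[str],
    topology_id: Optional[str] = None,
) -> str:
    """Pick the docker container a plant/revoke should exec into."""
    # Fleet scope (no topology) always targets the ssh container.
    if topology_id is None:
        return f"{decky}-ssh"
    # Topology deckies that expose ssh use the same naming.
    if "ssh" in set(services):
        return f"{decky}-ssh"
    # Otherwise fall back to the topology base container.
    # Assumption: <id8> is the first 8 characters of the topology id.
    return f"decnet_t_{topology_id[:8]}_{decky}"
```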
2026-04-28 22:30:11 -04:00
04b0637c24 feat(bounty): wire artifact download into BountyInspector drawer
The Vault page already shows file drops and stored mail (e3ddeb0) but
the inspector drawer had no download button — only the live-feed
ArtifactDrawer/MailDrawer offered raw byte retrieval. Add a DOWNLOAD
RAW action to BountyInspector that fires when bounty_type=artifact,
hitting /artifacts/{decky}/{stored_as}?service=<svc> with the bounty's
own service field (ssh or smtp). Mirrors ArtifactDrawer's blob handling
and 400/403/404 error mapping.

Also widen the icon/label vocabulary: artifact bounties get FileText
(file drops) or Mail (message_stored) instead of the generic Package,
and the inspector header chip mirrors the change.
2026-04-28 22:03:58 -04:00
e3ddeb0395 feat(bounty): surface file drops and stored mail in the Vault
The Bounty Vault page only read from the Bounty table, but
inotifywait-captured file drops (event_type=file_captured) and SMTP
quarantined messages (event_type=message_stored) were only landing in
the Logs table. AttackerDetail's tabs queried logs directly, so they
showed up per-attacker but were invisible on the global Vault page.

Mirror both events into Bounty as bounty_type=artifact with
payload.kind ∈ {file, mail} so the existing dedup
(bounty_type, attacker_ip, payload) collapses repeats by sha256. Add an
ARTIFACTS segment to the Vault filter row, plus dedicated render
branches: file drops show orig_path + size + writer attribution; mail
shows subject + From + attachment count + size, with the Mail icon
distinguishing them from FileText for file drops.

Forward-only — existing logs stay where they are. A backfill pass would
be straightforward (read Log WHERE event_type IN ('file_captured',
'message_stored') and feed each row through _extract_bounty) but is out
of scope here.
2026-04-28 19:42:54 -04:00
88f276e9e7 feat(collector): drop native unix daemon syslog from ingestion
sshd, pam_unix, sudo, CRON, systemd, kernel, rsyslogd, and dbus-daemon
all share the SSH/telnet decky containers and write to the same syslog
socket as DECNET's own emitters. Their output was being parsed and
ingested into the JSON stream, the dashboard, and the profiler — pure
noise: sshd's "Failed password for root from X" duplicates the
auth-helper's structured auth_attempt event, pam_unix repeats it again,
CRON/systemd say nothing about attacker behavior.

Drop these APP-NAMEs in _should_ingest before the JSON write and bus
publish. Raw .log file still captures everything for forensics. The
denylist is overridable with DECNET_COLLECTOR_DROP_APPS so operators
can extend it without code changes.
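
Roughly, the filter could be shaped like this. The comma-separated
format for DECNET_COLLECTOR_DROP_APPS and the additive (extend rather
than replace) semantics are assumptions, as are the helper names; only
the variable name and the default APP-NAMEs come from this commit.

```python
import os
from typing import Mapping, Optional

DEFAULT_DROP_APPS = frozenset({
    "sshd", "pam_unix", "sudo", "CRON",
    "systemd", "kernel", "rsyslogd", "dbus-daemon",
})


def load_drop_apps(environ: Optional[Mapping[str, str]] = None) -> frozenset:
    """Build the APP-NAME denylist from the defaults plus the env override."""
    env = os.environ if environ is None else environ
    raw = env.get("DECNET_COLLECTOR_DROP_APPS", "")
    # Assumption: comma-separated names, merged into the defaults.
    extra = {name.strip() for name in raw.split(",") if name.strip()}
    return DEFAULT_DROP_APPS | extra


def should_ingest(app_name: str, drop_apps: frozenset) -> bool:
    """True when a line should reach the JSON stream and the bus."""
    return app_name not in drop_apps
```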
2026-04-28 19:21:39 -04:00
6055f9c837 fix(deckies): set MSGID=command on bash PROMPT_COMMAND syslog lines
Add --rfc5424 --msgid command to the logger invocation in SSH and telnet
decky bashrc. MSGID arrives as "command" instead of NIL, which is what
the profiler's _COMMAND_EVENT_TYPES filter expects. The parser heuristic
shipped in d4591b3 stays as a safety net for any future emitter that
forgets the flags and for in-flight containers that predate the rebuild.
2026-04-28 19:12:11 -04:00
d4591b38dc fix(profiler): aggregate bash PROMPT_COMMAND lines into attacker profile
SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"`
which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving
event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter
silently dropped them: the IP profile existed but had no command transcripts
or artifacts. Confirmed in the wild: 44/48 events from one attacker were
event_type="-".

Rewrite event_type to "command" in both parsers when MSGID=NIL and the
msg starts with "CMD ". Correlation parser also extracts the cmd= payload
into fields["command"] so the profiler can build the transcript; collector
parser leaves fields={} to avoid duplicate pills in the dashboard.
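
The heuristic, sketched under assumptions: ParsedEvent and
classify_command are hypothetical names, and the "cmd=<text>" layout at
the end of the message is assumed from the description above rather
than taken from the parsers themselves.

```python
from dataclasses import dataclass, field

NIL = "-"  # RFC 5424 NILVALUE, which is how a missing MSGID arrives


@dataclass
class ParsedEvent:
    msgid: str
    msg: str
    event_type: str = NIL
    fields: dict = field(default_factory=dict)


def classify_command(event: ParsedEvent, extract_fields: bool) -> ParsedEvent:
    """Rewrite event_type to 'command' for bash PROMPT_COMMAND lines."""
    if event.msgid == NIL and event.msg.startswith("CMD "):
        event.event_type = "command"
        if extract_fields:
            # Correlation-parser behaviour: keep the command text so a
            # transcript can be built. Assumes cmd= is the last key.
            _, sep, command = event.msg.partition("cmd=")
            if sep:
                event.fields["command"] = command.strip()
    return event
```

Calling it with extract_fields=False mirrors the collector parser,
which leaves fields empty to avoid duplicate pills in the dashboard.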
2026-04-28 19:09:41 -04:00
862e4dbb31 merge: testing → main (reconcile 2-week divergence) 2026-04-28 18:36:00 -04:00
DECNET CI
499836c9e4 chore: auto-release v0.2 [skip ci] v0.2 2026-04-13 11:50:02 +00:00
bb9c782c41 Merge pull request 'tofix/merge-testing-to-main' (#6) from tofix/merge-testing-to-main into main
Reviewed-on: #6
2026-04-13 13:49:47 +02:00