subnet and is_dmz are pinned at deploy time — live deckies bind to
the bridge with IPs allocated from the old subnet, and is_dmz flips
the docker network's internal flag which can't be changed while
containers are attached. Today the op happily wrote the new value
into the DB and left docker on the old one, drifting the two surfaces.
apply_update_lan now raises MutationError when topology status is
active or degraded and the patch touches subnet or is_dmz. Coord
(x/y) and rename updates still pass through; renames don't currently
have a live caller and the bridge's docker name keys off the lan name
in the renderer, so the next deploy will reconcile.
This matches the posture taken by _materialise_lan_change for live
LAN add/remove (commit 472c84b).
list_topology_edges has no ORDER BY, so SQL row order is undefined.
After apply_attach_decky added a bridge edge to a second LAN, on
refetch the bridge edge could come back first — firstLanFor then
picked it as the decky's home and the visualization 'teleported' the
decky into the other LAN (the bug ANTI saw immediately after
connecting two deckies across LANs).
Hydration now prefers the non-bridge edge (is_bridge=false) as home.
apply_add_decky writes is_bridge=false for the original edge;
apply_attach_decky writes is_bridge=true for subsequent multi-homing
edges. Picking the non-bridge edge is stable across row reordering.
Two-pass implementation: pass 1 sets pinned homes (DMZ for gateways,
non-bridge for others); pass 2 fills any gap with the first edge
(legacy rows where is_bridge was never written).
apply_add_lan and apply_remove_lan were DB-only — they wrote/deleted
the topology_lans row but never created or destroyed the docker bridge
network. Adding a LAN to a deployed topology silently did nothing on
the substrate side; any decky later attached to it had nowhere to bind.
Both ops now call a shared _materialise_lan_change helper after the DB
write. When the topology is active/degraded and not pinned to a swarm
agent, the helper:
* creates / removes the docker bridge network (internal=True for
non-DMZ LANs, mirroring engine/deployer.deploy_topology),
* re-renders the per-topology compose file so future redeploys reflect
the change.
Failures are logged, not re-raised — the DB row stays as source of
truth so an operator can retry without leaking inconsistent state.
Agent-pinned topologies are skipped; the next agent push reconciles.
apply_add_decky / apply_attach_decky have the same gap and are not
fixed here — multi-homing a running container needs careful
recreate-vs-network-connect handling and is its own commit. Without
those, dropping a decky into a freshly-added LAN still won't spawn a
container; only the LAN itself is now live.
Bus topic segments are NATS-style tokens and the validator at
bus/topics.py:402 rejects '.', '*', '>', whitespace. My W3 constants
'service.added' / 'service.removed' tripped this on every live
add/remove call:
ValueError: topic segment 'service.added' may not contain '.', ...
Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'.
Aligned the SSE forwarder's name mapping (decky.<name>.service_added →
SSE event 'decky.service_added') and the frontend's
useTopologyStream listener + MazeNET.tsx event handler. Also updated
the wiki entry with a note about the underscore.
deploy_topology was flipping to ACTIVE the moment 'compose up -d'
returned 0, but compose returns 0 as soon as containers are *started*.
A service that crashes on boot (port bind failure, bad image, missing
entrypoint) left the topology row sitting at ACTIVE indefinitely while
half the substrate was dead.
After compose returns, we now run 'compose ps --all --format json',
parse the newline-delimited per-container rows, and downgrade to
DEGRADED with a reason listing the first eight unhealthy containers if
anything isn't in state='running'. Operators see real state on the
topology page instead of an optimistic flag.
_compose_ps swallows compose-level errors (returns []) so an unrelated
docker hiccup doesn't gate the success path — the existing in-flight
exception path still catches genuine deploy failures with FAILED.
When topoStatus is active/degraded, editor.updateDecky enqueues into
the mutator queue and returns {kind:'enqueued'}. The palette-drop
handler then short-circuits on that and never updates local state, so
a service dragged onto a deployed decky just vanishes — what ANTI saw
as 'no way to APPLY'.
Same gap on the design-time 'REMOVE SERVICE' button in the Inspector's
service detail panel: enqueue + no local update = chip stays.
Both now route through liveAddService / liveRemoveService when the
topology is active, hitting POST/DELETE /topologies/{id}/deckies/{name}/services
directly and patching local state from the response. Pending
topologies still queue through the mutator (correct: no live
containers to mutate).
Hoisted serviceRegistry / liveAddService / liveRemoveService above
the palette-drop callback so the deps array doesn't trip the const
TDZ at render time.
FastAPI's redirect_slashes=True 307s /topologies → /topologies/, and
the browser drops Authorization on the redirected URL — the topology
picker in the canary create modal was landing as 401 even for admins.
Hit the canonical (trailing-slash) path so the request resolves on the
first hop.
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:
* topology.{id}.> — lifecycle (status, mutation.*) — unchanged.
* decky.> — per-decky events, filtered by payload.topology_id
so a fleet decky sharing a name with a topology
decky doesn't leak across.
_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').
useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
DeckyCard grows the same per-chip × + dashed '+ ADD' affordances we
just shipped on the MazeNET Inspector. Wired to POST/DELETE
/api/v1/deckies/{name}/services{,/svc}; the response's services list
flows back through onServicesChanged to update the parent's deckies
state without a refetch.
Gated on isAdmin && !decky.swarm — swarm deckies live on a remote
agent and the W3 endpoint runs docker compose locally, same gap as
the canary planter has for agent-pinned topologies. Out of scope
here; flagged as a known limitation.
stopPropagation on the inline buttons + add-row container keeps the
card-level click (which selects the decky for inspection) from firing
on intra-row interactions.
ObservedNode.services is the literal tuple ['*']; narrowing inside the
.filter() callback was tripping TS2345. We already gate the live
controls on node.kind !== 'observed', so casting to readonly string[]
inside the filter is safe and keeps the discriminated union strict
elsewhere.
When the topology is active/degraded the Inspector switches services
chips into live controls: each chip gets a × button that DELETEs to
the W3 endpoint, and a dashed '+ ADD' chip opens a typeahead picker
fed by useServiceRegistry().perDecky.
Pending topologies still use the existing design-time path
(onRemoveService → editor.updateDecky); the Inspector picks based on
topologyStatus, so an operator never accidentally hits a live API
call against a topology that isn't deployed yet.
The mutation handlers in MazeNET.tsx hit POST/DELETE
/api/v1/topologies/{id}/deckies/{name}/services{,/svc} and
optimistically apply the response's services list to local state.
Cross-tab reconciliation rides on the SSE forwarder shipped in the
follow-up commit.
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).
The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.
decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
CanaryTokens.tsx grows a third tab — File drops — alongside Tokens
and Blobs. The page now covers every 'admin landed bytes on a decky'
operation in one place.
FileDropModal mirrors the canary CreateModal's shape: Fleet/MazeNET
toggle, topology+decky picker, absolute-path validation matching the
backend (DeckyFileDropRequest rejects relative + ..-traversal), mode
+ mtime offset inputs, and a -1w preset for backdating. FileReader →
data URL → strip prefix → POST /api/v1/deckies/files.
The list is local-only (localStorage, capped at 200 entries). W2's
backend doesn't persist drops by design — the endpoint is for staging
payloads, not as an audit trail. CLEAR LIST button on the tab; no
DELETE button on rows since the local entry doesn't track whether the
file is still there (an attacker may have moved it).
Alt+D shortcut joins Alt+C; alt-key only per the Linux-meta-key rule.
CanaryTokens.tsx grows a Fleet/MazeNET toggle in the create modal. In
topology mode we hydrate /topologies?status=active for the topology
picker, then GET /topologies/{id} on selection to repopulate the decky
picker — topology deckies have a different shape than fleet's /deckies
endpoint.
The tokens table gains a SCOPE column (chip: 'fleet' / 'topology'),
and a third filter dropdown alongside state. The drawer's metadata
section shows a Scope row with a clickable jump-link back to the
MazeNET view at the right topology.
CanaryTokenRow grows a topology_id field so the drawer/list can
discriminate without re-fetching.
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes. The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:
* add: validate against decnet.services.registry (rejects unknown +
fleet_singleton); persist the new services list; re-render the
per-scope compose file (so future redeploys reflect the change);
run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
compose so a future up -d doesn't bring it back.
Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list. Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).
Four new admin endpoints:
* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}
ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.
Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.
Path validation rejects relative paths and '..' traversal at the request
model layer. Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
POST /api/v1/canary/tokens grows an optional topology_id field. When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container. Absent ⇒ fleet semantics, unchanged.
The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy). GET /api/v1/canary/tokens accepts ?topology_id= as
a filter. DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.
422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.
Topology deploys now plant the configured canary baseline set on every
decky in the topology, mirroring the fleet-deploy hook. Containers are
resolved via resolve_topology_container — <decky>-ssh when the decky
exposes an ssh service, else the topology base container
decnet_t_<id8>_<decky>.
The planter's plant/revoke/seed_baseline grow an optional container=
kwarg; default preserves the fleet <name>-ssh resolution.
The Vault page already shows file drops and stored mail (e3ddeb0) but
the inspector drawer had no download button — only the live-feed
ArtifactDrawer/MailDrawer offered raw byte retrieval. Add a DOWNLOAD
RAW action to BountyInspector that fires when bounty_type=artifact,
hitting /artifacts/{decky}/{stored_as}?service=<svc> with the bounty's
own service field (ssh or smtp). Mirrors ArtifactDrawer's blob handling
and 400/403/404 error mapping.
Also widen the icon/label vocabulary: artifact bounties get FileText
(file drops) or Mail (message_stored) instead of the generic Package,
and the inspector header chip mirrors the change.
The Bounty Vault page only read from the Bounty table, but
inotifywait-captured file drops (event_type=file_captured) and SMTP
quarantined messages (event_type=message_stored) were only landing in
the Logs table. AttackerDetail's tabs queried logs directly, so they
showed up per-attacker but were invisible on the global Vault page.
Mirror both events into Bounty as bounty_type=artifact with
payload.kind ∈ {file, mail} so the existing dedup
(bounty_type, attacker_ip, payload) collapses repeats by sha256. Add an
ARTIFACTS segment to the Vault filter row, plus dedicated render
branches: file drops show orig_path + size + writer attribution; mail
shows subject + From + attachment count + size, with the Mail icon
distinguishing them from FileText for file drops.
Forward-only — existing logs stay where they are. A backfill pass would
be straightforward (read Log WHERE event_type IN ('file_captured',
'message_stored') and feed each row through _extract_bounty) but is out
of scope here.
sshd, pam_unix, sudo, CRON, systemd, kernel, rsyslogd, and dbus-daemon
all share the SSH/telnet decky containers and write to the same syslog
socket as DECNET's own emitters. Their output was being parsed and
ingested into the JSON stream, the dashboard, and the profiler — pure
noise: sshd's "Failed password for root from X" duplicates the
auth-helper's structured auth_attempt event, pam_unix repeats it again,
CRON/systemd say nothing about attacker behavior.
Drop these APP-NAMEs in _should_ingest before the JSON write and bus
publish. Raw .log file still captures everything for forensics. The
denylist is overridable with DECNET_COLLECTOR_DROP_APPS so operators
can extend it without code changes.
Add --rfc5424 --msgid command to the logger invocation in SSH and telnet
decky bashrc. MSGID arrives as "command" instead of NIL, which is what
the profiler's _COMMAND_EVENT_TYPES filter expects. The parser heuristic
shipped in d4591b3 stays as a safety net for any future emitter that
forgets the flags or for inflight pre-rebuild containers.
SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"`
which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving
event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter
silently dropped them — the IP profile existed but no command transcripts
or artifacts. Confirmed in the wild: 44/48 events from one attacker were
event_type="-".
Rewrite event_type to "command" in both parsers when MSGID=NIL and the
msg starts with "CMD ". Correlation parser also extracts the cmd= payload
into fields["command"] so the profiler can build the transcript; collector
parser leaves fields={} to avoid duplicate pills in the dashboard.