- TopologyList header now uses .page-header + .page-title-group +
.page-sub like Dashboard/Attackers/DeckyFleet; title typography and
separator match the rest of the app.
- Pluralisation fix: '0 topologyies' → '0 TOPOLOGIES', singular '1
TOPOLOGY'.
- When the list is empty the EmptyState renders in its own flex
container that fills the viewport so the card is centered both
axes, with bumped icon/title/hint sizing for the hero treatment.
delete_topology_cascade manually deletes status_events, edges, deckies
and lans but overlooked topology_mutations, so deleting any topology
that ever had a mutation enqueued (i.e. edits while active|degraded)
failed with an FK IntegrityError. Add the missing DELETE and extend
the cascade test to seed a mutation row.
MazeNET header now reports '{running}/{total} DECKIES RUNNING' so
operators can see per-topology runtime status at a glance.
Dashboard ACTIVE DECKIES counters used to reflect only the fleet state
file; TopologyDecky rows (MazeNET deployments) are now added in —
deployed_deckies = fleet + all topology rows, active_deckies = fleet
(no runtime field) + topology rows whose state is 'running'.
Hovering the empty-state row in LiveLogs/Dashboard tables briefly lit
the full-width td with the data-row glow. Tag the placeholder tr with
.empty-row and scope the .logs-table hover rule to :not(.empty-row).
Base .empty-state now flex-centers its icon/title/hint/CTA with a
140px min-height so icon-bearing empty states in the Dashboard side
panels (DECKIES UNDER SIEGE, TOP ATTACKERS) stop looking cramped.
Component-scoped rules (attackers-root, bounty-root, logs-root)
remain more specific and are unaffected.
- New ShortcutsHelp modal enumerates global, nav G-chord and palette
bindings; openable via ? (Shift+/) or the command palette.
- / dispatches a global decnet:focus-search event; Attackers, Bounty
and LiveLogs listen and focus their in-page search inputs (pages
without a local search are skipped per plan).
- Respects the existing editable-element guard and Alt+K palette
toggle; no rebinds to prior shortcuts.
Replace ad-hoc empty-state markup across Dashboard, TopologyList,
LiveLogs, Attackers, Bounty, AttackerDetail, SwarmHosts, RemoteUpdates
and CommandPalette with the new <EmptyState> component. Themed icons
+ hints improve discoverability; TopologyList and SwarmHosts gain
CTAs to their respective creation flows.
Each page gets its own scoped stylesheet and is rewritten around the
shared design language: filter bars, paginated lists, empty-state
blocks, BountyInspector drawer. Behavioural surface is unchanged —
same API calls, same routes, same RBAC gating.
Rewrites Dashboard.tsx around three stacked panels — live interactions,
deckies-under-siege, and top-attackers — each with its own header,
empty state, and status accents. Dashboard.css fills in the supporting
grid + type system.
- CommandPalette (Alt+K): fuzzy action launcher with keyboard nav.
- Toasts: ephemeral notification stack + provider.
- useGlobalHotkeys: Alt+K palette toggle, G-chord navigation
(G D/F/M/L/B/A/S/U/E/C), respects editable-element focus.
- Layout/App: wire ToastProvider at root, mount the palette inside the
authed shell, introduce the global search box in the top bar.
- MazeNETRoute now renders TopologyList inline when no ?topology is
present, instead of bouncing through a redirect.
- index.css: a few global token tweaks consumed by the new chrome.
Fixes a latent breakage: Config.tsx and MazeNET already imported
./Toasts/useToast but the directory was never committed.
The DELETE path on a topology whose containers are still up is a
footgun — even if the backend rejects the delete, surfacing the
button invites mistakes. Gate it so DELETE only shows for pending,
failed, and torn-down topologies. Active/degraded/deploying topologies
must be torn down first, which then reveals DELETE again.
POST /topologies/{id}/lans previously called _auto_attach_gateway()
whenever a non-DMZ LAN was created, which wired the DMZ gateway decky
to every new subnet. That's why a deployed gateway ended up with
eth0..ethN on every LAN regardless of what the user drew in MazeNET.
Drop the auto-attach helper entirely. The DMZ_ORPHAN deploy-time
validator (decnet/topology/validate.py:65-110) stays strict — users
must explicitly wire the gateway to each subnet they want bridged,
which is the whole point of having a topology editor.
useMazeApi.ts: drop stale auto-bridge reference from comment.
ArtifactDrawer, SessionDrawer, CreateTopologyWizard all now:
- close on ESC
- trap Tab/Shift+Tab focus within the panel
- lock body scroll while open
- restore prior focus on unmount
Uses the new useEscapeKey + useFocusTrap hooks. No visual changes;
the bespoke CSS shells (ctw-*, inline drawer styling) are preserved.
- Modal: shared backdrop/panel with ESC-close, backdrop-click-close,
focus trap, body scroll lock; supports center + drawer-right variants,
matrix/violet accents, default/wide widths.
- EmptyState: icon + title + hint + optional CTA; compact variant
for tight rails.
- useEscapeKey, useFocusTrap: reusable hooks powering Modal; will also
be adopted by CommandPalette and ContextMenu in follow-up commits.
No retrofits yet — primitives only. tsc clean.
Pan drag previously required mousedown on the bare canvas (target ===
currentTarget). When zoomed in, net-boxes cover most of the viewport
so there was no bare grid to grab. Drop the guard — node/header/port/
resize handlers all call stopPropagation() already, so only net-box
body mousedowns bubble up to start the pan, which is exactly what
we want.
Wheel-to-zoom anchored at the cursor, ZOOM IN/OUT toolbar buttons, and
a live zoom% in the status bar. Pan layer gets transform-origin 0 0 and
a scale(zoom) factor; grid pattern tile scales with zoom; edge SVG is
overflow:visible so long edges don't clip at high zoom. World-space
hit-testing, resize deltas, and palette drops all divide by zoom.
Reset View zeroes pan AND zoom.
Clicking a service tag selects it (stops node drag), extends Selection
discriminant with {type:'service',id,nodeId}, and renders an inspector
panel showing proto/port/subnet/risk chip + REMOVE SERVICE button
(gated off for observed nodes and degraded topologies). Service-tag
styling now pulls `risk` from DEFAULT_SERVICES metadata instead of
node.status alone.
Reverse of init, step-by-step: systemctl disable --now decnet.target,
remove every decnet-*.service + decnet.target unit file, drop the
polkit rule, drop the tmpfiles.d entry, daemon-reload, remove
/etc/decnet + /etc/decnet/config.ini, /run/decnet, /opt/decnet, and
userdel/groupdel the decnet identity.
Preserves /var/lib/decnet and /var/log/decnet by default — those
hold operator data. Pass `--deinit --purge` to rm -rf them too.
Idempotent on a clean host (every step prints [SKIP]). Honours
--dry-run.
5 new tests cover the full-undo path, --purge, idempotent clean-host
deinit, dry-run side-effect-free behaviour, and the --purge without
--deinit guard.
Creates the decnet system user/group, installs every unit file from
deploy/ into /etc/systemd/system, drops the polkit rule, seeds
/opt/decnet + /var/{lib,log}/decnet + /etc/decnet + /run/decnet,
writes a placeholder /etc/decnet/config.ini, applies the new
tmpfiles.d entry so /run/decnet survives reboots, daemon-reloads,
and `systemctl enable --now decnet.target`.
Idempotent (re-runs print [SKIP] on already-configured items),
--dry-run previews the plan without touching anything, --no-start
defers the target start, --force overwrites even matching unit
files. Master-only (added to MASTER_ONLY_COMMANDS).
9 orchestration tests cover the non-root gate, dry-run, useradd/
groupadd argv, SKIP on present user/group, unit-file idempotency,
--force overwrite, --no-start suppression, happy path, and the
"deploy/ not found" error message.
Units + polkit rule + systemd_control helper + start endpoints +
installed flag + UI wiring all landed. SWARM-host start/stop and
crash-quarantine policy stay as named deferrals.
Per-row START button enabled iff `installed && status !== 'ok'`;
tooltip explains why it's disabled ("Unit not installed" /
"Already running"). Transient `starting` state shows `...` on the
button and auto-clears after 15s so the UI never gets stuck if the
heartbeat is slow.
START ALL WORKERS button in the header calls /workers/start-all and
renders the three counts in the toast:
`STARTED · N · ALREADY RUNNING · M · FAILED · K (first failure: …)`.
Tone flips to alert when K > 0.
POST /api/v1/workers/{name}/start — 202 on acceptance, 404 unknown
worker, 503 if the unit file is not installed, 502 if systemctl
returns non-zero (stderr snippet in detail, full stack logged).
Admin only.
POST /api/v1/workers/start-all — best-effort: walks the worker list
in dependency order (bus → api → data-plane), skips already-active
and uninstalled units, aggregates outcomes into
{started, already_running, failed[]}. Returns 200 even on partial
failure; the caller reads the three lists.
Both endpoints delegate to the systemd_control helper, so the attack
surface for "what gets executed" is locked to `decnet-<validated-name>
.service` at two layers (router KNOWN_WORKERS + helper regex).
Ships the backend half of Config → Workers:
* Worker registry aggregates `system.*.health` + `system.bus.health`
heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop
out of a 90s window (3× the 30s heartbeat interval).
* `GET /api/v1/workers` returns the snapshot plus `bus_connected`
(so the UI can explain "all UNKNOWN" when the bus socket is down)
and a per-row `installed` flag populated from
`systemctl list-unit-files decnet-*.service` (cached 30s).
* `POST /api/v1/workers/{name}/stop` publishes a stop intent on
`system.<name>.control`; workers listen via the shared control
listener in `bus/publish.py`.
* Heartbeat + control listener wired into collector / profiler /
sniffer / prober / mutator worker loops. API self-heartbeats too
so the panel always has one ground-truth row.
* Topic helper `system_control(name)` + tests covering builder
validation, control listener shutdown path, and the API surface
(auth gating, bus-connected field, unknown-name 404).
Adds `StartFailure` / `StartAllResponse` models in anticipation of
the upcoming start endpoints (DEBT-034).
Thin async wrapper over `systemctl` — never shell=True, always
create_subprocess_exec. Unit names are built from
`decnet-<validated-name>.service`; the regex check is defence in depth
on top of the router-level KNOWN_WORKERS validation.
Exposes start / stop / is_active / list_installed; last is cached for
30s to keep the Workers panel cheap under REFRESH spam. On non-systemd
hosts list_installed returns an empty set, so the UI renders with
every row marked not-installed instead of 500-ing.
Scoped rule — matches only `decnet-<name>.service` and `decnet.target`.
Any unit outside that regex falls through to the default polkit policy.
Required so the API (running as the `decnet` user) can invoke
`systemctl start decnet-<name>.service` non-interactively.
Adds the five missing worker units plus a grouping target so
`systemctl start decnet.target` brings the whole fleet up in order.
Sniffer gets CAP_NET_RAW for scapy; collector and mutator join the
docker supplementary group for docker.sock access. Repoints
Documentation= across all existing units to the canonical
git.resacachile.cl wiki.
Add tests/service_testing/test_instance_seed.py — pins NODE_NAME to assert
determinism of seeded functions and sweeps NODE_NAMEs to assert cross-fleet
divergence. Conftest gains load_real_instance_seed() so template tests see
the real seeding behavior instead of a stub. Existing template tests updated
to pin NODE_NAME and match seeded outputs.
Every service template now pulls version strings, cluster/node UUIDs, auth
salts, greeting banners, and uptime from the seeded per-instance RNG instead
of hard-coded defaults. Scanners sweeping the fleet now see legitimately
diverging fingerprints per decky while each decky's own responses stay
internally consistent across restarts.
Covers elasticsearch, ftp, http, https, ldap, mongodb, mqtt, mssql, mysql,
postgres, redis, and smtp templates.
Each decky now gets a deterministic-per-instance seeded RNG derived from
NODE_NAME, so cluster UUIDs, version strings, uptime, and credentials diverge
across the fleet while staying stable within one container. The canonical
helper lives at decnet/templates/instance_seed.py; the deployer copies it into
every active template build context alongside syslog_bridge.py. Dockerfiles
COPY it to /opt/ so server.py can import it.
Connection-time jitter intentionally stays unseeded — two hits to the same
decky must not replay the same latency curve.
The ssh and telnet services hard-coded /var/lib/decnet/artifacts as the host
quarantine mount. Read it from DECNET_ARTIFACTS_ROOT with the same default so
dev/rootless deploys can point it elsewhere.
Paging, truncation surfacing, admin gate, path traversal, sid-regex and
decky-mismatch rejection for /transcripts; mirror coverage for
/attackers/{uuid}/transcripts. Flips the Session Recording box in the
roadmap (sessrec pty relay now shipping end-to-end).
Adds asciinema-player dependency, SessionDrawer.tsx that pages the
transcripts API (500 events per request) and rebuilds a v2 .cast blob
for playback, and a Session Transcripts section in AttackerDetail that
deep-links into the drawer. Truncation banner surfaces the 10 MB
per-session cap when it's been hit.
Adds get_attacker_transcripts (mirror of artifacts for session_recorded
logs) and get_session_log for sid→shard resolution. New
/api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events
out of the shared JSONL day-shard via an mtime-keyed byte-offset index
— never scans the whole shard per request. New
/api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both
endpoints admin-gated.
Build login-session into both images as the swapped root shell, add a
quarantine bind mount for telnet (symmetric to SSH), seed transcripts/
dir and service discriminant at entrypoint. Deployer syncs sessrec.c +
Makefile into each build context alongside the existing syslog_bridge
helper. sessrec falls back to /etc/sessrec.service when env is stripped
(busybox /bin/login).
New decnet/templates/_shared/sessrec/ — a small C program installed as the
login shell in SSH / Telnet deckies. Forkpty-relays /bin/bash, records each
chunk as an asciinema v2 event into a shared JSONL day-shard keyed by sid,
and emits one RFC 5424 session_recorded line on exit (direct to PID 1's
stdout, same pattern syslog_bridge.py uses).
Storage: one shard per (decky, UTC day) at
/var/lib/systemd/coredump/transcripts/sessions-YYYY-MM-DD.jsonl. Concurrent
appends are lock-free: each write is chunked below PIPE_BUF so O_APPEND
interleaves atomically. Per-session cap 10 MB with a trunc sentinel; disk-
free precheck (<200 MB) falls through to plain bash with a session_skipped
log event. Attacker src_ip resolves from \$SSH_CONNECTION, getpeername(0),
or utmp in that order. SIGWINCH appends a 'r' resize event so ncurses
replays stay aligned.
Stealth for v1: /etc/passwd shell-swap to /usr/libexec/login-session
(plausible login-machinery path) + prctl comm disguise. Full LD_PRELOAD
argv-zap is deferred — sshd strips LD_PRELOAD from the session env, so
wiring the existing argv_zap.so into this path needs a separate wrapper.
DEBT-033 opened for size-based day-shard rotation; v1's disk-free precheck
covers the worst case but can be blinded by a one-shot disk fill.
Exposes POST /topologies/reap-orphans via an arm-to-confirm button in
the topology list header. Shows a transient status line with removal
counts or the error. Admin-only on the backend; non-admins see the 403.
Topology rows deleted without a proper teardown leave Docker containers
and bridge networks behind, holding IPAM pools that cause 403 "Pool
overlaps" on the next deploy at the same subnet.
- engine/reaper.py walks the local Docker daemon, extracts the 8-char
topology prefix from every decnet_t_* resource, and force-removes
containers + networks whose prefix is not in the repo.
- POST /api/v1/topologies/reap-orphans (admin-only) returns a report
of live/orphan prefixes and what was removed.
- Resources belonging to live topologies are never touched; per-resource
errors are captured without aborting the sweep.
When create_bridge_network or compose-up raised mid-deploy, the
deployer marked the topology FAILED and re-raised — but left every
network it had already created alive. The next deploy attempt tripped
over the orphans with 'Pool overlaps with other one on this address
space' (IPAM conflict).
Track networks created in the current attempt; on exception, tear down
the started compose stack (if any), remove the networks in reverse
order, and delete the compose file before marking FAILED. Rollback
errors are logged but never mask the original failure.
Covered by a new regression test that drives a docker client which
succeeds once then raises, and asserts every created network is also
removed.
useTopologyEditor imported 'UseMazeApi' but the actual exported type
is 'MazeApi'. tsc --noEmit missed it because the file isn't in the
default tsconfig include path; tsc -b (project references, used by
'npm run build') catches it.
apply_attach_decky requires an existing decky, so the MazeNET editor
had no way to grow a live topology: creating a new decky on active
topologies 409'd on the direct-CRUD createDecky call.
- Backend: new apply_add_decky that creates the decky row + its
home-LAN edge atomically, auto-allocating an IP if none pinned.
Post-apply validation still runs. Added to DISPATCH + _MUTATION_OPS
Literal + CLI help text.
- Tests: 3 new ops tests (happy path, duplicate-name rejection,
missing-LAN rejection) plus dispatch coverage update.
- Frontend: useTopologyEditor gains addDeckyToLan() composite. Pending
routes through createDecky + attachEdge as before; active routes
through a single add_decky enqueue. MazeNET.tsx drag-archetype,
duplicate, DMZ-gateway, and ctx-menu add-decky paths all use the
composite so active topologies stop 409'ing on new-decky drops.
useTopologyEditor now branches on topoStatus: pending keeps direct CRUD,
active/degraded routes through enqueueMutation with expected_version.
Every primitive returns a tagged PrimitiveResult; callers skip local
state updates on enqueued and wait for the SSE mutation.applied refetch
to reflect DB truth.
- remove_lan/remove_decky/detach_decky: direct name-keyed enqueues.
- update_decky/update_lan: services/x/y lifted to top-level payload keys,
remainder placed under patch (matches apply_update_* contract).
- attach_decky: enqueued with decky+lan names; requires the decky to
already exist (Phase B step 3 adds the create+attach composite).
- createDecky stays direct-CRUD this pass — no add_decky op yet, so
new-decky drag will 409 on active until a follow-up commit.
- MazeNET surfaces mutation.failed payload.reason/error into actionErr
so the status bar tells the user WHY a queue op was rejected.
apply_update_decky only merged payload.patch into decky_config. Since
services is a separate DB column, there was no way to replace a decky's
services list via a mutation. Add a top-level services key to the op
payload that maps straight onto the services column.
Unblocks the MazeNET editor routing service-add/service-drop actions
through the mutation queue on active topologies.
Phase B step 1 of DEBT-030: introduce a status-aware editor hook that
wraps useMazeApi. Every primitive currently pass-throughs to direct
CRUD and returns {kind: 'applied', data} — behavior is unchanged.
Follow-up commits route active/degraded topologies through
enqueueMutation when status != pending.
Also tighten the SSE LIVE indicator: flip setStreamLive(true) only on
snapshot, mutation.*, or status events, not on any incidental frame.
The mutation-event stream landed this session closes the "deckies are
atomic nodes" gap for service-list changes, but substrate identity is
really ``(service, implementation_fingerprint)``. A base-image
rebuild that rotates OpenSSH 8.4 → 9.2 without changing the service
list is invisible to the correlation graph today because the prober's
dedup set is in-memory and per-run — no cross-run diff, no
"fingerprint changed" event.
DEBT-032 documents the required piece: a per-(decky, service,
probe_type) persistence layer + diff-on-change emission, with the
correlator's existing mutation-marker interleaving pattern as the
model for fingerprint markers. A mutation-vs-fingerprint divergence
detector then falls out of the data model for free — fingerprint drift
without a preceding mutation ⇒ substrate_divergence finding.