Commit Graph

545 Commits

Author SHA1 Message Date
e14527b382 feat(web): reskin Attackers, Bounty, and LiveLogs pages
Each page gets its own scoped stylesheet and is rewritten around the
shared design language: filter bars, paginated lists, empty-state
blocks, BountyInspector drawer. Behavioural surface is unchanged —
same API calls, same routes, same RBAC gating.
2026-04-22 17:15:35 -04:00
1518475946 feat(web/dashboard): reskin with richer live-activity panels
Rewrites Dashboard.tsx around three stacked panels — live interactions,
deckies-under-siege, and top-attackers — each with its own header,
empty state, and status accents. Dashboard.css fills in the supporting
grid + type system.
2026-04-22 17:15:27 -04:00
ccbe949238 feat(web): command palette, toasts, and global shell chrome
- CommandPalette (Alt+K): fuzzy action launcher with keyboard nav.
- Toasts: ephemeral notification stack + provider.
- useGlobalHotkeys: Alt+K palette toggle, G-chord navigation
  (G D/F/M/L/B/A/S/U/E/C), respects editable-element focus.
- Layout/App: wire ToastProvider at root, mount the palette inside the
  authed shell, introduce the global search box in the top bar.
- MazeNETRoute now renders TopologyList inline when no ?topology is
  present, instead of bouncing through a redirect.
- index.css: a few global token tweaks consumed by the new chrome.

Fixes a latent breakage: Config.tsx and MazeNET already imported
./Toasts/useToast but the directory was never committed.
2026-04-22 17:15:19 -04:00
dca6eddd5f feat(web/topology): hide DELETE on running topologies
The DELETE path on a topology whose containers are still up is a
footgun — even if the backend rejects the delete, surfacing the
button invites mistakes. Gate it so DELETE only shows for pending,
failed, and torn-down topologies. Active/degraded/deploying topologies
must be torn down first, which then reveals DELETE again.
2026-04-22 17:14:17 -04:00
6f537f52c2 fix(topology): remove DMZ gateway auto-attach on LAN create
POST /topologies/{id}/lans previously called _auto_attach_gateway()
whenever a non-DMZ LAN was created, which wired the DMZ gateway decky
to every new subnet. That's why a deployed gateway ended up with
eth0..ethN on every LAN regardless of what the user drew in MazeNET.

Drop the auto-attach helper entirely. The DMZ_ORPHAN deploy-time
validator (decnet/topology/validate.py:65-110) stays strict — users
must explicitly wire the gateway to each subnet they want bridged,
which is the whole point of having a topology editor.

useMazeApi.ts: drop stale auto-bridge reference from comment.
2026-04-22 17:14:09 -04:00
8632cee40a feat(web): retrofit drawers + CreateTopologyWizard with ESC/focus-trap
ArtifactDrawer, SessionDrawer, CreateTopologyWizard all now:
- close on ESC
- trap Tab/Shift+Tab focus within the panel
- lock body scroll while open
- restore prior focus on unmount

Uses the new useEscapeKey + useFocusTrap hooks. No visual changes;
the bespoke CSS shells (ctw-*, inline drawer styling) are preserved.
2026-04-22 17:09:45 -04:00
d0463c2c16 feat(web): add Modal + EmptyState primitives and a11y hooks
- Modal: shared backdrop/panel with ESC-close, backdrop-click-close,
  focus trap, body scroll lock; supports center + drawer-right variants,
  matrix/violet accents, default/wide widths.
- EmptyState: icon + title + hint + optional CTA; compact variant
  for tight rails.
- useEscapeKey, useFocusTrap: reusable hooks powering Modal; will also
  be adopted by CommandPalette and ContextMenu in follow-up commits.

No retrofits yet — primitives only. tsc clean.
2026-04-22 17:04:37 -04:00
73ccf12678 fix(web): allow MazeNET canvas pan from inside net-box body
Pan drag previously required mousedown on the bare canvas (target ===
currentTarget). When zoomed in, net-boxes cover most of the viewport
so there was no bare grid to grab. Drop the guard — node/header/port/
resize handlers all call stopPropagation() already, so only net-box
body mousedowns bubble up to start the pan, which is exactly what
we want.
2026-04-22 16:43:38 -04:00
ef60b086ba feat(web): MazeNET canvas pan + zoom (0.25×–2.5×)
Wheel-to-zoom anchored at the cursor, ZOOM IN/OUT toolbar buttons, and
a live zoom% in the status bar. Pan layer gets transform-origin 0 0 and
a scale(zoom) factor; grid pattern tile scales with zoom; edge SVG is
overflow:visible so long edges don't clip at high zoom. World-space
hit-testing, resize deltas, and palette drops all divide by zoom.
Reset View zeroes pan AND zoom.
2026-04-22 16:40:47 -04:00
1f429cd00e feat(web): MazeNET 7b — service-level selection + inspector panel
Clicking a service tag selects it (stops node drag), extends Selection
discriminant with {type:'service',id,nodeId}, and renders an inspector
panel showing proto/port/subnet/risk chip + REMOVE SERVICE button
(gated off for observed nodes and degraded topologies). Service-tag
styling now pulls `risk` from DEFAULT_SERVICES metadata instead of
node.status alone.
2026-04-22 15:56:55 -04:00
6fbac5d057 feat(web): MazeNET 7a — canvas chrome + node-head visuals
Toolbar (RESET VIEW / AUTO-LAYOUT), status bar (GRAPH LIVE + pan + as-of
timestamp), 4-row legend, and archetype icon + status dot in each node
head.
2026-04-22 15:54:11 -04:00
91111ea7ee feat(cli): add decnet init --deinit to undo a previous bootstrap
Reverse of init, step-by-step: systemctl disable --now decnet.target,
remove every decnet-*.service + decnet.target unit file, drop the
polkit rule, drop the tmpfiles.d entry, daemon-reload, remove
/etc/decnet + /etc/decnet/config.ini, /run/decnet, /opt/decnet, and
userdel/groupdel the decnet identity.

Preserves /var/lib/decnet and /var/log/decnet by default — those
hold operator data. Pass `--deinit --purge` to rm -rf them too.
Idempotent on a clean host (every step prints [SKIP]). Honours
--dry-run.

5 new tests cover the full-undo path, --purge, idempotent clean-host
deinit, dry-run side-effect-free behaviour, and the --purge without
--deinit guard.
2026-04-22 14:31:56 -04:00
3dae44c652 feat(cli): add decnet init one-shot master-host bootstrap
Creates the decnet system user/group, installs every unit file from
deploy/ into /etc/systemd/system, drops the polkit rule, seeds
/opt/decnet + /var/{lib,log}/decnet + /etc/decnet + /run/decnet,
writes a placeholder /etc/decnet/config.ini, applies the new
tmpfiles.d entry so /run/decnet survives reboots, daemon-reloads,
and `systemctl enable --now decnet.target`.

Idempotent (re-runs print [SKIP] on already-configured items),
--dry-run previews the plan without touching anything, --no-start
defers the target start, --force overwrites even matching unit
files. Master-only (added to MASTER_ONLY_COMMANDS).

9 orchestration tests cover the non-root gate, dry-run, useradd/
groupadd argv, SKIP on present user/group, unit-file idempotency,
--force overwrite, --no-start suppression, happy path, and the
"deploy/ not found" error message.
2026-04-22 14:28:11 -04:00
6d769edce0 docs(debt): mark DEBT-034 (worker supervisor) shipped
Units + polkit rule + systemd_control helper + start endpoints +
installed flag + UI wiring all landed. SWARM-host start/stop and
crash-quarantine policy stay as named deferrals.
2026-04-22 14:14:22 -04:00
49a6a674e6 feat(web): wire Workers panel START + START ALL buttons
Per-row START button enabled iff `installed && status !== 'ok'`;
tooltip explains why it's disabled ("Unit not installed" /
"Already running"). Transient `starting` state shows `...` on the
button and auto-clears after 15s so the UI never gets stuck if the
heartbeat is slow.

START ALL WORKERS button in the header calls /workers/start-all and
renders the three counts in the toast:
`STARTED · N · ALREADY RUNNING · M · FAILED · K (first failure: …)`.
Tone flips to alert when K > 0.
2026-04-22 14:13:58 -04:00
13ea916943 feat(workers): add start + start-all endpoints (systemd supervisor)
POST /api/v1/workers/{name}/start — 202 on acceptance, 404 unknown
worker, 503 if the unit file is not installed, 502 if systemctl
returns non-zero (stderr snippet in detail, full stack logged).
Admin only.

POST /api/v1/workers/start-all — best-effort: walks the worker list
in dependency order (bus → api → data-plane), skips already-active
and uninstalled units, aggregates outcomes into
{started, already_running, failed[]}. Returns 200 even on partial
failure; the caller reads the three lists.

Both endpoints delegate to the systemd_control helper, so the attack
surface for "what gets executed" is locked to `decnet-<validated-name>
.service` at two layers (router KNOWN_WORKERS + helper regex).
2026-04-22 14:12:29 -04:00
0fbb07c2ec feat(workers): bus-backed Workers panel (registry, control, installed flag)
Ships the backend half of Config → Workers:

* Worker registry aggregates `system.*.health` + `system.bus.health`
  heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop
  out of a 90s window (3× the 30s heartbeat interval).
* `GET /api/v1/workers` returns the snapshot plus `bus_connected`
  (so the UI can explain "all UNKNOWN" when the bus socket is down)
  and a per-row `installed` flag populated from
  `systemctl list-unit-files decnet-*.service` (cached 30s).
* `POST /api/v1/workers/{name}/stop` publishes a stop intent on
  `system.<name>.control`; workers listen via the shared control
  listener in `bus/publish.py`.
* Heartbeat + control listener wired into collector / profiler /
  sniffer / prober / mutator worker loops. API self-heartbeats too
  so the panel always has one ground-truth row.
* Topic helper `system_control(name)` + tests covering builder
  validation, control listener shutdown path, and the API surface
  (auth gating, bus-connected field, unknown-name 404).

Adds `StartFailure` / `StartAllResponse` models in anticipation of
the upcoming start endpoints (DEBT-034).
2026-04-22 14:10:39 -04:00
fcaac648a4 feat(web): add systemd_control helper for worker unit management
Thin async wrapper over `systemctl` — never shell=True, always
create_subprocess_exec. Unit names are built from
`decnet-<validated-name>.service`; the regex check is defence in depth
on top of the router-level KNOWN_WORKERS validation.

Exposes start / stop / is_active / list_installed; last is cached for
30s to keep the Workers panel cheap under REFRESH spam. On non-systemd
hosts list_installed returns an empty set, so the UI renders with
every row marked not-installed instead of 500-ing.
2026-04-22 14:08:35 -04:00
a41ef52249 chore(polkit): allow decnet group to manage decnet-*.service without password
Scoped rule — matches only `decnet-<name>.service` and `decnet.target`.
Any unit outside that regex falls through to the default polkit policy.
Required so the API (running as the `decnet` user) can invoke
`systemctl start decnet-<name>.service` non-interactively.
2026-04-22 14:07:17 -04:00
f21453afdc chore(systemd): add units for collector/profiler/sniffer/prober/mutator + decnet.target
Adds the five missing worker units plus a grouping target so
`systemctl start decnet.target` brings the whole fleet up in order.
Sniffer gets CAP_NET_RAW for scapy; collector and mutator join the
docker supplementary group for docker.sock access. Repoints
Documentation= across all existing units to the canonical
git.resacachile.cl wiki.
2026-04-22 14:06:42 -04:00
90d0c3b206 chore(deps): add pydeps to dev extras
Dev-only dependency-graph tool; no runtime impact.
2026-04-22 09:24:35 -04:00
a63708a3d1 test(templates): cover instance_seed helper and update service tests
Add tests/service_testing/test_instance_seed.py — pins NODE_NAME to assert
determinism of seeded functions and sweeps NODE_NAMEs to assert cross-fleet
divergence. Conftest gains load_real_instance_seed() so template tests see
the real seeding behavior instead of a stub. Existing template tests updated
to pin NODE_NAME and match seeded outputs.
2026-04-22 09:24:28 -04:00
3fb84ac5d0 feat(templates): per-instance stealth via instance_seed in service servers
Every service template now pulls version strings, cluster/node UUIDs, auth
salts, greeting banners, and uptime from the seeded per-instance RNG instead
of hard-coded defaults. Scanners sweeping the fleet now see legitimately
diverging fingerprints per decky while each decky's own responses stay
internally consistent across restarts.

Covers elasticsearch, ftp, http, https, ldap, mongodb, mqtt, mssql, mysql,
postgres, redis, and smtp templates.
2026-04-22 09:24:16 -04:00
51e9e263ca feat(templates): add instance_seed stealth helper and wire into template builds
Each decky now gets a deterministic-per-instance seeded RNG derived from
NODE_NAME, so cluster UUIDs, version strings, uptime, and credentials diverge
across the fleet while staying stable within one container. The canonical
helper lives at decnet/templates/instance_seed.py; the deployer copies it into
every active template build context alongside syslog_bridge.py. Dockerfiles
COPY it to /opt/ so server.py can import it.

Connection-time jitter intentionally stays unseeded — two hits to the same
decky must not replay the same latency curve.
2026-04-22 09:24:04 -04:00
6bbb2376f7 refactor(services): make artifact root configurable via DECNET_ARTIFACTS_ROOT
The ssh and telnet services hard-coded /var/lib/decnet/artifacts as the host
quarantine mount. Read it from DECNET_ARTIFACTS_ROOT with the same default so
dev/rootless deploys can point it elsewhere.
2026-04-22 09:23:36 -04:00
6725197d58 test(web): transcripts API + attacker-transcripts router coverage
Paging, truncation surfacing, admin gate, path traversal, sid-regex and
decky-mismatch rejection for /transcripts; mirror coverage for
/attackers/{uuid}/transcripts. Flips the Session Recording box in the
roadmap (sessrec pty relay now shipping end-to-end).
2026-04-21 23:11:40 -04:00
246a82774b feat(web): SessionDrawer + Sessions tab in AttackerDetail
Adds asciinema-player dependency, SessionDrawer.tsx that pages the
transcripts API (500 events per request) and rebuilds a v2 .cast blob
for playback, and a Session Transcripts section in AttackerDetail that
deep-links into the drawer. Truncation banner surfaces the 10 MB
per-session cap when it's been hit.
2026-04-21 23:08:39 -04:00
6e522c5a55 feat(web): transcripts API + repository lookups
Adds get_attacker_transcripts (mirror of artifacts for session_recorded
logs) and get_session_log for sid→shard resolution. New
/api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events
out of the shared JSONL day-shard via an mtime-keyed byte-offset index
— never scans the whole shard per request. New
/api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both
endpoints admin-gated.
2026-04-21 23:06:39 -04:00
a58d42e492 feat(templates): wire SSH+Telnet to sessrec transcript recorder
Build login-session into both images as the swapped root shell, add a
quarantine bind mount for telnet (symmetric to SSH), seed transcripts/
dir and service discriminant at entrypoint. Deployer syncs sessrec.c +
Makefile into each build context alongside the existing syslog_bridge
helper. sessrec falls back to /etc/sessrec.service when env is stripped
(busybox /bin/login).
2026-04-21 23:03:42 -04:00
4596c1d69a feat(templates): add sessrec pty transcript recorder
New decnet/templates/_shared/sessrec/ — a small C program installed as the
login shell in SSH / Telnet deckies. Forkpty-relays /bin/bash, records each
chunk as an asciinema v2 event into a shared JSONL day-shard keyed by sid,
and emits one RFC 5424 session_recorded line on exit (direct to PID 1's
stdout, same pattern syslog_bridge.py uses).

Storage: one shard per (decky, UTC day) at
/var/lib/systemd/coredump/transcripts/sessions-YYYY-MM-DD.jsonl. Concurrent
appends are lock-free: each write is chunked below PIPE_BUF so O_APPEND
interleaves atomically. Per-session cap 10 MB with a trunc sentinel; disk-
free precheck (<200 MB) falls through to plain bash with a session_skipped
log event. Attacker src_ip resolves from \$SSH_CONNECTION, getpeername(0),
or utmp in that order. SIGWINCH appends a 'r' resize event so ncurses
replays stay aligned.

Stealth for v1: /etc/passwd shell-swap to /usr/libexec/login-session
(plausible login-machinery path) + prctl comm disguise. Full LD_PRELOAD
argv-zap is deferred — sshd strips LD_PRELOAD from the session env, so
wiring the existing argv_zap.so into this path needs a separate wrapper.

DEBT-033 opened for size-based day-shard rotation; v1's disk-free precheck
covers the worst case but can be blinded by a one-shot disk fill.
2026-04-21 22:56:42 -04:00
3d047f2100 feat(web): wire REAP ORPHANS button in topology list
Exposes POST /topologies/reap-orphans via an arm-to-confirm button in
the topology list header. Shows a transient status line with removal
counts or the error. Admin-only on the backend; non-admins see the 403.
2026-04-21 22:17:04 -04:00
8f25ff677f feat(engine,api): add orphan topology resource reaper
Topology rows deleted without a proper teardown leave Docker containers
and bridge networks behind, holding IPAM pools that cause 403 "Pool
overlaps" on the next deploy at the same subnet.

- engine/reaper.py walks the local Docker daemon, extracts the 8-char
  topology prefix from every decnet_t_* resource, and force-removes
  containers + networks whose prefix is not in the repo.
- POST /api/v1/topologies/reap-orphans (admin-only) returns a report
  of live/orphan prefixes and what was removed.
- Resources belonging to live topologies are never touched; per-resource
  errors are captured without aborting the sweep.
2026-04-21 22:13:44 -04:00
85bb0e2f65 fix(engine): roll back partial Docker state on deploy failure
When create_bridge_network or compose-up raised mid-deploy, the
deployer marked the topology FAILED and re-raised — but left every
network it had already created alive. The next deploy attempt tripped
over the orphans with 'Pool overlaps with other one on this address
space' (IPAM conflict).

Track networks created in the current attempt; on exception, tear down
the started compose stack (if any), remove the networks in reverse
order, and delete the compose file before marking FAILED. Rollback
errors are logged but never mask the original failure.

Covered by a new regression test that drives a docker client which
succeeds once then raises, and asserts every created network is also
removed.
2026-04-21 20:23:03 -04:00
9ea0abc321 fix(web): correct MazeApi type import in useTopologyEditor
useTopologyEditor imported 'UseMazeApi' but the actual exported type
is 'MazeApi'. tsc --noEmit missed it because the file isn't in the
default tsconfig include path; tsc -b (project references, used by
'npm run build') catches it.
2026-04-21 20:15:24 -04:00
c266d1b6e3 feat(mutator,web): add_decky op — create-and-attach in one mutation
apply_attach_decky requires an existing decky, so the MazeNET editor
had no way to grow a live topology: creating a new decky on active
topologies 409'd on the direct-CRUD createDecky call.

- Backend: new apply_add_decky that creates the decky row + its
  home-LAN edge atomically, auto-allocating an IP if none pinned.
  Post-apply validation still runs. Added to DISPATCH + _MUTATION_OPS
  Literal + CLI help text.
- Tests: 3 new ops tests (happy path, duplicate-name rejection,
  missing-LAN rejection) plus dispatch coverage update.
- Frontend: useTopologyEditor gains addDeckyToLan() composite. Pending
  routes through createDecky + attachEdge as before; active routes
  through a single add_decky enqueue. MazeNET.tsx drag-archetype,
  duplicate, DMZ-gateway, and ctx-menu add-decky paths all use the
  composite so active topologies stop 409'ing on new-decky drops.
2026-04-21 20:13:39 -04:00
8fd166470f feat(web): route editor actions through mutation queue on active topologies
useTopologyEditor now branches on topoStatus: pending keeps direct CRUD,
active/degraded routes through enqueueMutation with expected_version.
Every primitive returns a tagged PrimitiveResult; callers skip local
state updates on enqueued and wait for the SSE mutation.applied refetch
to reflect DB truth.

- remove_lan/remove_decky/detach_decky: direct name-keyed enqueues.
- update_decky/update_lan: services/x/y lifted to top-level payload keys,
  remainder placed under patch (matches apply_update_* contract).
- attach_decky: enqueued with decky+lan names; requires the decky to
  already exist (Phase B step 3 adds the create+attach composite).
- createDecky stays direct-CRUD this pass — no add_decky op yet, so
  new-decky drag will 409 on active until a follow-up commit.
- MazeNET surfaces mutation.failed payload.reason/error into actionErr
  so the status bar tells the user WHY a queue op was rejected.
2026-04-21 19:58:29 -04:00
a93cbe76f9 feat(mutator): update_decky payload accepts top-level services list
apply_update_decky only merged payload.patch into decky_config. Since
services is a separate DB column, there was no way to replace a decky's
services list via a mutation. Add a top-level services key to the op
payload that maps straight onto the services column.

Unblocks the MazeNET editor routing service-add/service-drop actions
through the mutation queue on active topologies.
2026-04-21 19:56:58 -04:00
aa848d5260 feat(web): useTopologyEditor skeleton + explicit streamLive gate
Phase B step 1 of DEBT-030: introduce a status-aware editor hook that
wraps useMazeApi. Every primitive currently pass-throughs to direct
CRUD and returns {kind: 'applied', data} — behavior is unchanged.
Follow-up commits route active/degraded topologies through
enqueueMutation when status != pending.

Also tighten the SSE LIVE indicator: flip setStreamLive(true) only on
snapshot, mutation.*, or status events, not on any incidental frame.
2026-04-21 19:54:55 -04:00
cf5ba5cf2a docs(debt): open DEBT-032 — prober can't detect fingerprint rotation
The mutation-event stream landed this session closes the "deckies are
atomic nodes" gap for service-list changes, but substrate identity is
really ``(service, implementation_fingerprint)``.  A base-image
rebuild that rotates OpenSSH 8.4 → 9.2 without changing the service
list is invisible to the correlation graph today because the prober's
dedup set is in-memory and per-run — no cross-run diff, no
"fingerprint changed" event.

DEBT-032 documents the required piece: a per-(decky, service,
probe_type) persistence layer + diff-on-change emission, with the
correlator's existing mutation-marker interleaving pattern as the
model for fingerprint markers.  A mutation-vs-fingerprint divergence
detector then falls out of the data model for free — fingerprint drift
without a preceding mutation ⇒ substrate_divergence finding.
2026-04-21 19:38:41 -04:00
d4d8a2ad0d feat(correlation): interleave mutation markers into attacker traversals
Parser now tags ``mutator`` / ``decky_mutated`` lines with ``kind="mutation"``
so the engine can route them into a sibling ``_mutations`` index keyed
by decky name instead of the per-IP attacker index.  ``traversals()``
joins the two streams: every attacker gets a ``mutations_during`` list
of markers from touched deckies bounded by their first/last-seen
window.  ``AttackerTraversal.to_dict()`` grows a ``mutations_during``
field and a ``timeline`` that chronologically interleaves hops and
markers, so an ``SSH at T5 → mutation at T6 → HTTP at T7`` substrate
transition is visible to UI consumers instead of reading as a silent
discontinuity.

The existing hops-only JSON shape is preserved; old clients that
ignore unknown keys keep working.
2026-04-21 19:37:35 -04:00
bf5ed7abbb feat(engine): emit creation/retirement mutation events on deploy/teardown
Close the lifecycle loop for the correlation graph: every decky now
enters the substrate with an explicit `trigger=creation` event
(old_services=[] ⇒ new_services=<initial>) and leaves it with
`trigger=retirement` (old=<current> ⇒ new=[]).  With scheduled/operator
mutations already flowing through emit_decky_mutated, the entire decky
lifecycle is now a well-formed sequence of mutation events — the
correlator can fold substrate_state(t) at any T by replaying them.

Lazy-imports mutator.events to dodge the engine↔mutator circular
dependency.  Bus is None at CLI sites; the syslog write is what the
correlator consumes.  Emission is soft-failing so a broken log path
never aborts a deploy.
2026-04-21 19:35:05 -04:00
fa0cdb3ab5 feat(mutator): route mutate_decky through emit_decky_mutated with trigger
Mutator now emits one decky_mutated event (RFC 5424 + bus) per
successful mutation instead of the inline decky.<id>.state bus
publish.  The previous state topic published new_services only;
mutation events carry old/new/trigger, which is what the correlation
engine needs to interleave substrate-change markers into attacker
traversals.

- mutate_decky gains trigger: MutationTrigger = "operator" and
  captures old_services before the shuffle; replaces the inline
  _publish_safely(decky.<id>.state) with emit_decky_mutated(...).
- mutate_all derives trigger internally: operator when force or
  only-filter is set (CLI --all, API mutate-now, UI bus request);
  scheduled on interval ticks.  Passed through to each mutate_decky
  call.
- Tests updated: the old decky.<id>.state assertion is replaced
  with decky.<id>.mutation topic + mutation payload shape; 3 new
  tests cover trigger derivation for scheduled / force / only paths.
  26 tests in test_mutator.py green; 116 across mutator + topology
  + bus.
2026-04-21 19:31:31 -04:00
f875350d75 feat(mutator): emit_decky_mutated helper — RFC 5424 + bus in one call
First step toward making mutation events first-class nodes in the
correlation graph. Today the graph silently reflects post-mutation
state with no marker of the transition; this helper lands the
emitter the mutator and deploy paths will call.

- decnet/mutator/events.py: emit_decky_mutated(bus, *, decky,
  old_services, new_services, trigger, actor=None, log_path=None)
  writes an RFC 5424 line (service=mutator, hostname=<decky>,
  MSGID=decky_mutated, SD params for old/new services + trigger +
  optional actor) to DECNET_INGEST_LOG_FILE, then fire-and-forget
  publishes on decky.<id>.mutation. Either side failing is soft —
  the other path still completes.
- MutationTrigger Literal covers creation, retirement, scheduled,
  operator, behavioral, healer, federation. Reserved values for v2/v3
  (behavioral + federation) stay nullable so the schema is stable.
- decnet/bus/topics.py: DECKY_MUTATION constant + decky_mutation(id)
  builder. Distinct from DECKY_STATE ("current shape") because a
  mutation is a transition event, not a steady-state snapshot.
- Empty-set symmetry: creation emits old_services=[], retirement
  emits new_services=[]. Every decky lifecycle becomes a well-formed
  fold sequence on the correlator side.
- 4 new tests: FakeBus + correlator parser round-trip; creation and
  retirement empty-set cases; bus=None still writes syslog;
  unwritable log path doesn't block bus publish. 95 tests green
  across test_mutator + tests/bus.
2026-04-21 19:29:21 -04:00
e23c6c4ee4 feat(mutator): bus-wake on decky mutate_request; adaptive sleep; heartbeat
The flat-fleet mutator was DB-poll-only and noisy — it logged
"no active deployment found" every 10s on idle hosts and ran
mutate_all at a fixed tick regardless of when the next decky
was due.

- mutate_all returns seconds-until-next-due; watch loop sleeps
  min(next_due, poll_interval_secs) with a 1s floor.
- "No deployment" is now idle, not an error: edge-triggered log
  on present<->absent transition instead of every tick.
- mutate_decky publishes decky.<name>.state on successful compose
  so UIs react in real time.
- New decky.*.mutate_request subscription lets API/CLI/UI force
  an immediate mutation of a specific decky without waiting for
  its interval; target name feeds mutate_all(only={...}).
- system.mutator.health heartbeat via run_health_heartbeat helper,
  bringing the mutator in line with DEBT-031 workers.

Tests: next_due return, only= filter, decky.<name>.state publish
on success, no publish on compose failure. Full mutator+topology-
mutator+bus suite (109) green.
2026-04-21 19:28:01 -04:00
f76fc09caf docs(debt): mark DEBT-031 resolved; document deferrals
All nine service workers now participate in the host-local bus: sniffer,
prober, correlator (via profiler), profiler, collector, ingester, agent,
forwarder, updater.  Pre-bus behavior is preserved end-to-end for
DECNET_BUS_ENABLED=false and get_bus() failures.

Three items intentionally deferred: realism-probe decky.{id}.state
(needs a realism probe path that doesn't exist yet), correlator session
boundaries (needs session state), and bus-wake subscriptions (publishes
landed; wake side wired to no subscriber today).
2026-04-21 17:02:57 -04:00
5c0631e12c feat(agent,forwarder,updater): publish system.<worker>.health heartbeats (DEBT-031 workers 7-9)
All three workers now share a run_health_heartbeat helper in
decnet.bus.publish.  Each publishes system.<worker>.health on a 30s tick
with {worker, ts} plus optional per-worker extras.  Subscribers can
watch system.*.health to see every DECNET worker on a host at once.

- agent: heartbeat runs inside the FastAPI lifespan alongside the
  existing master-facing heartbeat; bus-disabled path is a no-op.
- forwarder: heartbeat task spawned at run_forwarder entry, cancelled
  in the finally block so a crashed master loop never leaks the task.
- updater: new FastAPI lifespan hosts the heartbeat.

Heartbeat helper swallows extra() failures and is cancellation-safe so
lifespan teardown never hangs on it.
2026-04-21 17:02:10 -04:00
cbb394a160 feat(ingester): publish system.log per committed batch (DEBT-031 worker 6)
Ingester connects the bus at startup, emits a batch-committed summary
(component/flushed/position) after each successful _flush_batch.  Zero-
row flushes are suppressed so the topic stays meaningful.

Complements the collector's per-line system.log publishes: collector
signals ingress, ingester signals DB-persisted progress.  Federation
forwarder (worker 8) will subscribe to the batch-committed leaf to
trigger its upstream push.

Bus stays optional: publish_safely swallows failures, get_bus() can
return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully
functional.
2026-04-21 16:58:49 -04:00
a448dbe283 feat(collector): publish system.log per ingested event (DEBT-031 worker 5)
log_collector_worker connects the bus at startup, builds a thread-safe
system.log publisher, and hands it to each container-stream thread
through _stream_container's new publish_fn parameter.  Publishing fires
right after the JSON record is written — same rate-limiter path, no
extra parsing, compact payload (decky/service/event_type/attacker_ip/
timestamp) so subscribers can redraw without re-reading the DB.

Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false the
factory returns a no-op publisher and the stream thread calls it
unconditionally.  Hook failures are logged and never abort the thread.
2026-04-21 16:57:21 -04:00
67c2e30f89 feat(profiler): publish attacker.scored per profile upsert (DEBT-031 worker 4)
The profiler worker threads its bus publisher through _WorkerState so
_update_profiles can emit a compact attacker.scored event for every
upsert.  Payload carries the headline counts (event/service/decky/
bounty/credential) plus is_traversal, so the MazeNET attacker pool can
redraw without a round-trip.

Bus stays optional: publish_attacker=None when DECNET_BUS_ENABLED=false
or get_bus() fails, and hook exceptions are logged without breaking the
upsert path.
2026-04-21 16:54:40 -04:00
e51b65d7c3 feat(correlation,profiler): publish attacker.observed on first sighting (DEBT-031 worker 3)
CorrelationEngine gains an optional publish_fn hook fired once per unique
attacker IP.  The profiler worker — sole caller of the engine today —
carries the bus physically, builds a thread-safe publisher, and wraps it
with the attacker.observed topic before handing it in.

Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false, the
engine runs publish_fn=None and the worker degrades to DB-only.  Hook
failures log a warning and never break ingestion.
2026-04-21 16:53:03 -04:00