Commit Graph

521 Commits

Author SHA1 Message Date
6bbb2376f7 refactor(services): make artifact root configurable via DECNET_ARTIFACTS_ROOT
The ssh and telnet services hard-coded /var/lib/decnet/artifacts as the host
quarantine mount. Read it from DECNET_ARTIFACTS_ROOT with the same default so
dev/rootless deploys can point it elsewhere.
2026-04-22 09:23:36 -04:00
6725197d58 test(web): transcripts API + attacker-transcripts router coverage
Paging, truncation surfacing, admin gate, path traversal, sid-regex and
decky-mismatch rejection for /transcripts; mirror coverage for
/attackers/{uuid}/transcripts. Flips the Session Recording box in the
roadmap (sessrec pty relay now shipping end-to-end).
2026-04-21 23:11:40 -04:00
246a82774b feat(web): SessionDrawer + Sessions tab in AttackerDetail
Adds asciinema-player dependency, SessionDrawer.tsx that pages the
transcripts API (500 events per request) and rebuilds a v2 .cast blob
for playback, and a Session Transcripts section in AttackerDetail that
deep-links into the drawer. Truncation banner surfaces the 10 MB
per-session cap when it's been hit.
2026-04-21 23:08:39 -04:00
6e522c5a55 feat(web): transcripts API + repository lookups
Adds get_attacker_transcripts (mirror of artifacts for session_recorded
logs) and get_session_log for sid→shard resolution. New
/api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events
out of the shared JSONL day-shard via an mtime-keyed byte-offset index
— never scans the whole shard per request. New
/api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both
endpoints admin-gated.
2026-04-21 23:06:39 -04:00
a58d42e492 feat(templates): wire SSH+Telnet to sessrec transcript recorder
Build login-session into both images as the swapped root shell, add a
quarantine bind mount for telnet (symmetric to SSH), seed transcripts/
dir and service discriminant at entrypoint. Deployer syncs sessrec.c +
Makefile into each build context alongside the existing syslog_bridge
helper. sessrec falls back to /etc/sessrec.service when env is stripped
(busybox /bin/login).
2026-04-21 23:03:42 -04:00
4596c1d69a feat(templates): add sessrec pty transcript recorder
New decnet/templates/_shared/sessrec/ — a small C program installed as the
login shell in SSH / Telnet deckies. Forkpty-relays /bin/bash, records each
chunk as an asciinema v2 event into a shared JSONL day-shard keyed by sid,
and emits one RFC 5424 session_recorded line on exit (direct to PID 1's
stdout, same pattern syslog_bridge.py uses).

Storage: one shard per (decky, UTC day) at
/var/lib/systemd/coredump/transcripts/sessions-YYYY-MM-DD.jsonl. Concurrent
appends are lock-free: each write is chunked below PIPE_BUF so O_APPEND
interleaves atomically. Per-session cap 10 MB with a trunc sentinel; disk-
free precheck (<200 MB) falls through to plain bash with a session_skipped
log event. Attacker src_ip resolves from \$SSH_CONNECTION, getpeername(0),
or utmp in that order. SIGWINCH appends a 'r' resize event so ncurses
replays stay aligned.

Stealth for v1: /etc/passwd shell-swap to /usr/libexec/login-session
(plausible login-machinery path) + prctl comm disguise. Full LD_PRELOAD
argv-zap is deferred — sshd strips LD_PRELOAD from the session env, so
wiring the existing argv_zap.so into this path needs a separate wrapper.

DEBT-033 opened for size-based day-shard rotation; v1's disk-free precheck
covers the worst case but can be blinded by a one-shot disk fill.
2026-04-21 22:56:42 -04:00
3d047f2100 feat(web): wire REAP ORPHANS button in topology list
Exposes POST /topologies/reap-orphans via an arm-to-confirm button in
the topology list header. Shows a transient status line with removal
counts or the error. Admin-only on the backend; non-admins see the 403.
2026-04-21 22:17:04 -04:00
8f25ff677f feat(engine,api): add orphan topology resource reaper
Topology rows deleted without a proper teardown leave Docker containers
and bridge networks behind, holding IPAM pools that cause 403 "Pool
overlaps" on the next deploy at the same subnet.

- engine/reaper.py walks the local Docker daemon, extracts the 8-char
  topology prefix from every decnet_t_* resource, and force-removes
  containers + networks whose prefix is not in the repo.
- POST /api/v1/topologies/reap-orphans (admin-only) returns a report
  of live/orphan prefixes and what was removed.
- Resources belonging to live topologies are never touched; per-resource
  errors are captured without aborting the sweep.
2026-04-21 22:13:44 -04:00
85bb0e2f65 fix(engine): roll back partial Docker state on deploy failure
When create_bridge_network or compose-up raised mid-deploy, the
deployer marked the topology FAILED and re-raised — but left every
network it had already created alive. The next deploy attempt tripped
over the orphans with 'Pool overlaps with other one on this address
space' (IPAM conflict).

Track networks created in the current attempt; on exception, tear down
the started compose stack (if any), remove the networks in reverse
order, and delete the compose file before marking FAILED. Rollback
errors are logged but never mask the original failure.

Covered by a new regression test that drives a docker client which
succeeds once then raises, and asserts every created network is also
removed.
2026-04-21 20:23:03 -04:00
9ea0abc321 fix(web): correct MazeApi type import in useTopologyEditor
useTopologyEditor imported 'UseMazeApi' but the actual exported type
is 'MazeApi'. tsc --noEmit missed it because the file isn't in the
default tsconfig include path; tsc -b (project references, used by
'npm run build') catches it.
2026-04-21 20:15:24 -04:00
c266d1b6e3 feat(mutator,web): add_decky op — create-and-attach in one mutation
apply_attach_decky requires an existing decky, so the MazeNET editor
had no way to grow a live topology: creating a new decky on active
topologies 409'd on the direct-CRUD createDecky call.

- Backend: new apply_add_decky that creates the decky row + its
  home-LAN edge atomically, auto-allocating an IP if none pinned.
  Post-apply validation still runs. Added to DISPATCH + _MUTATION_OPS
  Literal + CLI help text.
- Tests: 3 new ops tests (happy path, duplicate-name rejection,
  missing-LAN rejection) plus dispatch coverage update.
- Frontend: useTopologyEditor gains addDeckyToLan() composite. Pending
  routes through createDecky + attachEdge as before; active routes
  through a single add_decky enqueue. MazeNET.tsx drag-archetype,
  duplicate, DMZ-gateway, and ctx-menu add-decky paths all use the
  composite so active topologies stop 409'ing on new-decky drops.
2026-04-21 20:13:39 -04:00
8fd166470f feat(web): route editor actions through mutation queue on active topologies
useTopologyEditor now branches on topoStatus: pending keeps direct CRUD,
active/degraded routes through enqueueMutation with expected_version.
Every primitive returns a tagged PrimitiveResult; callers skip local
state updates on enqueued and wait for the SSE mutation.applied refetch
to reflect DB truth.

- remove_lan/remove_decky/detach_decky: direct name-keyed enqueues.
- update_decky/update_lan: services/x/y lifted to top-level payload keys,
  remainder placed under patch (matches apply_update_* contract).
- attach_decky: enqueued with decky+lan names; requires the decky to
  already exist (Phase B step 3 adds the create+attach composite).
- createDecky stays direct-CRUD this pass — no add_decky op yet, so
  new-decky drag will 409 on active until a follow-up commit.
- MazeNET surfaces mutation.failed payload.reason/error into actionErr
  so the status bar tells the user WHY a queue op was rejected.
2026-04-21 19:58:29 -04:00
a93cbe76f9 feat(mutator): update_decky payload accepts top-level services list
apply_update_decky only merged payload.patch into decky_config. Since
services is a separate DB column, there was no way to replace a decky's
services list via a mutation. Add a top-level services key to the op
payload that maps straight onto the services column.

Unblocks the MazeNET editor routing service-add/service-drop actions
through the mutation queue on active topologies.
2026-04-21 19:56:58 -04:00
aa848d5260 feat(web): useTopologyEditor skeleton + explicit streamLive gate
Phase B step 1 of DEBT-030: introduce a status-aware editor hook that
wraps useMazeApi. Every primitive currently pass-throughs to direct
CRUD and returns {kind: 'applied', data} — behavior is unchanged.
Follow-up commits route active/degraded topologies through
enqueueMutation when status != pending.

Also tighten the SSE LIVE indicator: flip setStreamLive(true) only on
snapshot, mutation.*, or status events, not on any incidental frame.
2026-04-21 19:54:55 -04:00
cf5ba5cf2a docs(debt): open DEBT-032 — prober can't detect fingerprint rotation
The mutation-event stream landed this session closes the "deckies are
atomic nodes" gap for service-list changes, but substrate identity is
really ``(service, implementation_fingerprint)``.  A base-image
rebuild that rotates OpenSSH 8.4 → 9.2 without changing the service
list is invisible to the correlation graph today because the prober's
dedup set is in-memory and per-run — no cross-run diff, no
"fingerprint changed" event.

DEBT-032 documents the required piece: a per-(decky, service,
probe_type) persistence layer + diff-on-change emission, with the
correlator's existing mutation-marker interleaving pattern as the
model for fingerprint markers.  A mutation-vs-fingerprint divergence
detector then falls out of the data model for free — fingerprint drift
without a preceding mutation ⇒ substrate_divergence finding.
2026-04-21 19:38:41 -04:00
d4d8a2ad0d feat(correlation): interleave mutation markers into attacker traversals
Parser now tags ``mutator`` / ``decky_mutated`` lines with ``kind="mutation"``
so the engine can route them into a sibling ``_mutations`` index keyed
by decky name instead of the per-IP attacker index.  ``traversals()``
joins the two streams: every attacker gets a ``mutations_during`` list
of markers from touched deckies bounded by their first/last-seen
window.  ``AttackerTraversal.to_dict()`` grows a ``mutations_during``
field and a ``timeline`` that chronologically interleaves hops and
markers, so an ``SSH at T5 → mutation at T6 → HTTP at T7`` substrate
transition is visible to UI consumers instead of reading as a silent
discontinuity.

The existing hops-only JSON shape is preserved; old clients that
ignore unknown keys keep working.
2026-04-21 19:37:35 -04:00
bf5ed7abbb feat(engine): emit creation/retirement mutation events on deploy/teardown
Close the lifecycle loop for the correlation graph: every decky now
enters the substrate with an explicit `trigger=creation` event
(old_services=[] ⇒ new_services=<initial>) and leaves it with
`trigger=retirement` (old=<current> ⇒ new=[]).  With scheduled/operator
mutations already flowing through emit_decky_mutated, the entire decky
lifecycle is now a well-formed sequence of mutation events — the
correlator can fold substrate_state(t) at any T by replaying them.

Lazy-imports mutator.events to dodge the engine↔mutator circular
dependency.  Bus is None at CLI sites; the syslog write is what the
correlator consumes.  Emission is soft-failing so a broken log path
never aborts a deploy.
2026-04-21 19:35:05 -04:00
fa0cdb3ab5 feat(mutator): route mutate_decky through emit_decky_mutated with trigger
Mutator now emits one decky_mutated event (RFC 5424 + bus) per
successful mutation instead of the inline decky.<id>.state bus
publish.  The previous state topic published new_services only;
mutation events carry old/new/trigger, which is what the correlation
engine needs to interleave substrate-change markers into attacker
traversals.

- mutate_decky gains trigger: MutationTrigger = "operator" and
  captures old_services before the shuffle; replaces the inline
  _publish_safely(decky.<id>.state) with emit_decky_mutated(...).
- mutate_all derives trigger internally: operator when force or
  only-filter is set (CLI --all, API mutate-now, UI bus request);
  scheduled on interval ticks.  Passed through to each mutate_decky
  call.
- Tests updated: the old decky.<id>.state assertion is replaced
  with decky.<id>.mutation topic + mutation payload shape; 3 new
  tests cover trigger derivation for scheduled / force / only paths.
  26 tests in test_mutator.py green; 116 across mutator + topology
  + bus.
2026-04-21 19:31:31 -04:00
f875350d75 feat(mutator): emit_decky_mutated helper — RFC 5424 + bus in one call
First step toward making mutation events first-class nodes in the
correlation graph. Today the graph silently reflects post-mutation
state with no marker of the transition; this helper lands the
emitter the mutator and deploy paths will call.

- decnet/mutator/events.py: emit_decky_mutated(bus, *, decky,
  old_services, new_services, trigger, actor=None, log_path=None)
  writes an RFC 5424 line (service=mutator, hostname=<decky>,
  MSGID=decky_mutated, SD params for old/new services + trigger +
  optional actor) to DECNET_INGEST_LOG_FILE, then fire-and-forget
  publishes on decky.<id>.mutation. Either side failing is soft —
  the other path still completes.
- MutationTrigger Literal covers creation, retirement, scheduled,
  operator, behavioral, healer, federation. Reserved values for v2/v3
  (behavioral + federation) stay nullable so the schema is stable.
- decnet/bus/topics.py: DECKY_MUTATION constant + decky_mutation(id)
  builder. Distinct from DECKY_STATE ("current shape") because a
  mutation is a transition event, not a steady-state snapshot.
- Empty-set symmetry: creation emits old_services=[], retirement
  emits new_services=[]. Every decky lifecycle becomes a well-formed
  fold sequence on the correlator side.
- 4 new tests: FakeBus + correlator parser round-trip; creation and
  retirement empty-set cases; bus=None still writes syslog;
  unwritable log path doesn't block bus publish. 95 tests green
  across test_mutator + tests/bus.
2026-04-21 19:29:21 -04:00
e23c6c4ee4 feat(mutator): bus-wake on decky mutate_request; adaptive sleep; heartbeat
The flat-fleet mutator was DB-poll-only and noisy — it logged
"no active deployment found" every 10s on idle hosts and ran
mutate_all at a fixed tick regardless of when the next decky
was due.

- mutate_all returns seconds-until-next-due; watch loop sleeps
  min(next_due, poll_interval_secs) with a 1s floor.
- "No deployment" is now idle, not an error: edge-triggered log
  on present<->absent transition instead of every tick.
- mutate_decky publishes decky.<name>.state on successful compose
  so UIs react in real time.
- New decky.*.mutate_request subscription lets API/CLI/UI force
  an immediate mutation of a specific decky without waiting for
  its interval; target name feeds mutate_all(only={...}).
- system.mutator.health heartbeat via run_health_heartbeat helper,
  bringing the mutator in line with DEBT-031 workers.

Tests: next_due return, only= filter, decky.<name>.state publish
on success, no publish on compose failure. Full mutator+topology-
mutator+bus suite (109) green.
2026-04-21 19:28:01 -04:00
f76fc09caf docs(debt): mark DEBT-031 resolved; document deferrals
All nine service workers now participate in the host-local bus: sniffer,
prober, correlator (via profiler), profiler, collector, ingester, agent,
forwarder, updater.  Pre-bus behavior is preserved end-to-end for
DECNET_BUS_ENABLED=false and get_bus() failures.

Three items intentionally deferred: realism-probe decky.{id}.state
(needs a realism probe path that doesn't exist yet), correlator session
boundaries (needs session state), and bus-wake subscriptions (publishes
landed; wake side wired to no subscriber today).
2026-04-21 17:02:57 -04:00
5c0631e12c feat(agent,forwarder,updater): publish system.<worker>.health heartbeats (DEBT-031 workers 7-9)
All three workers now share a run_health_heartbeat helper in
decnet.bus.publish.  Each publishes system.<worker>.health on a 30s tick
with {worker, ts} plus optional per-worker extras.  Subscribers can
watch system.*.health to see every DECNET worker on a host at once.

- agent: heartbeat runs inside the FastAPI lifespan alongside the
  existing master-facing heartbeat; bus-disabled path is a no-op.
- forwarder: heartbeat task spawned at run_forwarder entry, cancelled
  in the finally block so a crashed master loop never leaks the task.
- updater: new FastAPI lifespan hosts the heartbeat.

Heartbeat helper swallows extra() failures and is cancellation-safe so
lifespan teardown never hangs on it.
2026-04-21 17:02:10 -04:00
cbb394a160 feat(ingester): publish system.log per committed batch (DEBT-031 worker 6)
Ingester connects the bus at startup, emits a batch-committed summary
(component/flushed/position) after each successful _flush_batch.  Zero-
row flushes are suppressed so the topic stays meaningful.

Complements the collector's per-line system.log publishes: collector
signals ingress, ingester signals DB-persisted progress.  Federation
forwarder (worker 8) will subscribe to the batch-committed leaf to
trigger its upstream push.

Bus stays optional: publish_safely swallows failures, get_bus() can
return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully
functional.
2026-04-21 16:58:49 -04:00
a448dbe283 feat(collector): publish system.log per ingested event (DEBT-031 worker 5)
log_collector_worker connects the bus at startup, builds a thread-safe
system.log publisher, and hands it to each container-stream thread
through _stream_container's new publish_fn parameter.  Publishing fires
right after the JSON record is written — same rate-limiter path, no
extra parsing, compact payload (decky/service/event_type/attacker_ip/
timestamp) so subscribers can redraw without re-reading the DB.

Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false the
factory returns a no-op publisher and the stream thread calls it
unconditionally.  Hook failures are logged and never abort the thread.
2026-04-21 16:57:21 -04:00
67c2e30f89 feat(profiler): publish attacker.scored per profile upsert (DEBT-031 worker 4)
The profiler worker threads its bus publisher through _WorkerState so
_update_profiles can emit a compact attacker.scored event for every
upsert.  Payload carries the headline counts (event/service/decky/
bounty/credential) plus is_traversal, so the MazeNET attacker pool can
redraw without a round-trip.

Bus stays optional: publish_attacker=None when DECNET_BUS_ENABLED=false
or get_bus() fails, and hook exceptions are logged without breaking the
upsert path.
2026-04-21 16:54:40 -04:00
e51b65d7c3 feat(correlation,profiler): publish attacker.observed on first sighting (DEBT-031 worker 3)
CorrelationEngine gains an optional publish_fn hook fired once per unique
attacker IP.  The profiler worker — sole caller of the engine today —
carries the bus physically, builds a thread-safe publisher, and wraps it
with the attacker.observed topic before handing it in.

Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false, the
engine runs publish_fn=None and the worker degrades to DB-only.  Hook
failures log a warning and never break ingestion.
2026-04-21 16:53:03 -04:00
34d9e37ab0 feat(prober): publish attacker.fingerprinted on the bus (DEBT-031)
Each successful JARM / HASSH / TCPfp probe fans out an
attacker.fingerprinted event; the probe family goes in event.type so a
single subscription covers all three.  Payload carries the attacker IP,
port, and probe-specific hash — enough for the MazeNET live map to
render fingerprint info on observed attackers.

Lifts the thread-safe publisher helper out of the sniffer worker into
decnet/bus/publish.py so the prober (and every future worker with a
to_thread hot path) can reuse it without copy-pasting the
run_coroutine_threadsafe dance.  Sniffer rewires onto the shared helper
in passing.

Adds ATTACKER_FINGERPRINTED as a new leaf — distinct from
ATTACKER_OBSERVED (correlator's first-sight signal) because an active
probe result is additional evidence about an already-observed attacker.

Note: the plan's decky.{id}.state realism-probe publish path is
deferred — the current prober fingerprints attackers, not decky
realism.  Will revisit when realism probes exist.
2026-04-21 16:47:55 -04:00
7f497ac552 feat(sniffer): publish decky.{id}.traffic on the bus (DEBT-031)
SnifferEngine gains an optional publish_fn hook, invoked after the
dedup + syslog write for traffic-summary events only (tls_session,
tcp_flow_timing, tcp_syn_fingerprint) — intermediate parser artifacts
like tls_client_hello stay off the bus.

The sniffer worker wires get_bus() + a thread-safe shim that marshals
sync calls from the scapy sniff thread back onto the asyncio loop via
run_coroutine_threadsafe.  Bus failure at startup degrades cleanly to
publish-off mode; publish failures at runtime never escape the sniff
thread.
2026-04-21 16:35:50 -04:00
f3eaab5d37 refactor(bus): extract publish_safely + extend topics for DEBT-031
Shared publish_safely helper at decnet/bus/publish.py so the nine
workers about to be wired into the bus don't each copy-paste the
"never raise back at the caller" contract. Mutator drops its private
copy and imports the canonical one.

topics.py gains the attacker.* hierarchy (observed, scored,
session.started, session.ended) and a system_health(worker) builder
for per-worker health heartbeats — both prerequisites for the worker
rollout under DEBT-031.
2026-04-21 16:32:30 -04:00
e083bbe17c docs(debt): add DEBT-031 — workers publish/subscribe to bus if available
Per-worker integration of the service bus shipped in DEBT-029. Publishes
are fire-and-forget; subscribes wake polling loops. Bus stays optional —
if get_bus() fails or DECNET_BUS_ENABLED=false, workers log once and
continue in poll-only mode (mirrors decnet/mutator/engine.py:run_watch_loop).
2026-04-21 14:49:45 -04:00
d97a32e2d0 docs(dev): resolve DEBT-030 phase A + add mutator-family bus smoke
- scripts/bus/smoke-mutator.sh: boots decnet bus, subscribes to
  topology.>, publishes one event per mutation-lifecycle state plus
  a topology.status transition, asserts all four land on the
  subscriber. Cheap E2E for the topic hierarchy the mutator + SSE
  route rely on.
- development/DEBT.md: mark DEBT-030  resolved (Phase A) with a
  summary of what shipped; flag the optimistic staged-buffer editor
  as Phase B follow-up, not debt.
2026-04-21 14:39:25 -04:00
1968f6e741 test(mutator,web): cover bus publishes, bus-wake, and SSE events route
- tests/topology/test_mutator.py: reconcile_topologies publishes
  applying+applied on success, applying+failed+status on failure; and
  stays safe when bus=None. _wake_on_enqueue sets its asyncio.Event
  on every matching enqueue event.
- tests/api/topology/test_mutations.py: POST /mutations publishes
  mutation.enqueued after a successful DB write, via a FakeBus
  injected in place of the app-wide bus singleton.
- tests/api/topology/test_events_stream.py: SSE route returns 401
  unauthenticated, 404 for unknown topologies, and (driving the
  async generator directly) emits a snapshot on connect plus
  forwards a published mutation.applied as an `event: mutation.applied`
  SSE frame.
2026-04-21 14:39:12 -04:00
8ecb9e6c2d feat(web/mazenet): subscribe to topology SSE stream in editor
Wire the MazeNET editor to the new /topologies/{id}/events SSE route
so live (active|degraded) topologies reflect mutator state transitions
without reload:

- useTopologyStream hook opens an EventSource against
  /topologies/{id}/events?token=<jwt>, with 3s reconnect matching the
  dashboard's /stream consumer. Callback refs avoid tearing down the
  connection on consumer rerenders.
- useMazeApi gains enqueueMutation(topologyId, op, payload,
  expectedVersion?) — thin wrapper over POST /mutations.
- MazeNET.tsx opens the stream only when topoStatus is active|degraded
  (pending editors have nothing to stream) and refetches on
  mutation.applied|failed|status events. Header shows a LIVE /
  CONNECTING… indicator.

Phase A slice — Apply (N changes) with an optimistic staged buffer
lands in a follow-up; the hooks + API method it'll need are already
here.
2026-04-21 14:38:58 -04:00
f611e7363b feat(mutator,web): live topology mutation pipeline backend (DEBT-030)
Wire the mutator and web API into the service bus so live-topology
edits flow sub-second from enqueue to UI:

- Mutator publishes every state transition on the bus (mutation.applying
  /applied/failed + topology.status). Fire-and-forget; DB stays source
  of truth.
- Mutator watch loop subscribes to topology.*.mutation.enqueued and
  wakes early via asyncio.Event — the 10s poll becomes a fallback
  heartbeat, not the primary dispatch trigger.
- POST /topologies/{id}/mutations publishes mutation.enqueued after
  the DB write succeeds.
- New GET /topologies/{id}/events SSE route: snapshot on connect
  (status + in-flight mutations), live forwards topology.{id}.>
  bus events, 15s keepalive. ?token= auth mirrors /stream.
- New decnet/bus/app.py — process-wide lazy bus singleton for the
  API, closed cleanly on lifespan shutdown.
2026-04-21 14:38:25 -04:00
f0349632c3 chore(bus): add scripts/bus/ smoke + manual test helpers
start.sh boots a local bus on /tmp (no root, no decnet group).
sub.py / pub.py are thin CLIs over UnixSocketBus for manual poking.
smoke.sh is a self-contained end-to-end check — spawns a worker,
subscribes, publishes, asserts delivery, cleans up.
2026-04-21 14:03:30 -04:00
fbf289ff63 feat(bus): host-local UNIX-socket pub/sub worker (DEBT-029)
Land the `decnet bus` worker and `get_bus()` factory. Transport is a
host-local UNIX-domain socket (0660, group=decnet); authz is the file
mode. Wire framing is a tiny verb-line + 4-byte-BE length + orjson body.
NATS-style wildcard topics (`*`, `>`). At-most-once, fire-and-forget —
DB stays the source of truth. `FakeBus` / `NullBus` for tests and the
disabled path. Cross-host federation is deferred to a future
`--bridge-tcp` mode; DEBT-030 is master-only and unblocked.
2026-04-21 13:49:02 -04:00
4481a947d4 docs(dev): tick shipped items on the roadmap
Plugin SDK docs and the 250-user / 100-req-per-second API targets are
met; mark them done.
2026-04-21 10:24:50 -04:00
1b64453aa7 feat(web/fleet): redesign DeckyFleet view with archetype wizard
Port the design-handoff layout into a scoped DeckyFleet.css (no more
piggybacking on Dashboard.css). Add an archetype-first creation wizard
that consumes /api/v1/topologies/archetypes, falling back to the
MazeNET ARCHETYPES constant when the endpoint is unavailable.
2026-04-21 10:24:43 -04:00
4727ea0af2 feat(web/mazenet): polish editor UX
Canvas grew a deployed prop so nodes can visually distinguish "live in
docker" from "planned". ContextMenu learned nested submenus with
ChevronRight affordance; NetBox renders a ShieldAlert for DMZ LANs;
Palette got additional lucide icons. Dead PendingChange union pulled
out of types.ts — Phase-3 mutation ops are driven by the API layer now,
not a frontend type.
2026-04-21 10:24:32 -04:00
59d618d25f feat(web): topologies nav entry and /mazenet route guard
New /topologies page lists topologies; a bare /mazenet now redirects
there since the editor has no meaning without ?topology=<id>. Wizard
picks up a note style + tweaked copy.
2026-04-21 10:24:23 -04:00
d9f3824086 test(topology): cover compose labels and tolerate docker filter kwarg
test_compose asserts the new decnet.topology.* labels land on both base
deckies (role=base, no service marker) and service fragments
(service=true). The stub docker client in test_deploy grew a filters
kwarg so it keeps matching the real .networks.list(filters=...) call
signature now used by the deployer.
2026-04-21 10:24:15 -04:00
071312fc0c feat(web/api): expose archetype catalog endpoint
/api/v1/topologies/archetypes returns the archetype registry (slug,
display name, description, preferred services/distros, nmap_os
fingerprint) so the frontend wizard can render a live catalog instead
of hardcoding a copy.
2026-04-21 10:24:01 -04:00
542637c0dc feat(web/api): support PATCH on proxy and CORS
The web bundle proxy handled GET/POST/PUT/DELETE but not PATCH or
preflight OPTIONS, which broke browser calls to PATCH endpoints behind
the static-bundle server. CORS middleware had the same gap.
2026-04-21 10:23:55 -04:00
1b29a7692c feat(cli/db): include topology tables in db reset
db reset drops-and-recreates a fixed table set in FK order. Topology
tables weren't in the list, so reset left orphan topology rows behind
and a fresh MazeNET deploy could collide with stale child records.
2026-04-21 10:23:49 -04:00
e75198cca9 feat(cli/topology): add delete command and null-safe show
topology delete cascades children (LANs, deckies, edges, mutations) but
refuses while containers are still running — teardown is prerequisite.
show stopped assuming every decky carried a full decky_config blob;
MazeNET-generated deckies only get hydrated on deploy, so fall back to
top-level name/services when the config isn't there.
2026-04-21 10:23:37 -04:00
0cdcfe2653 feat(agent/collector): topology-label discovery and master-authoritative supersede
Legacy fleet deckies live in decnet-state.json; MazeNET topology
containers don't. Tag them at compose-time with
decnet.topology.service=true and let the collector match on that label.
Spin up the agent's log collector on the first successful /topology/apply
(not in the lifespan — that would break the no-docker-on-boot invariant)
and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE,
separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running
both roles can't forward its own ingest back to itself.

When master pushes a topology that differs from whatever is pinned
locally, teardown the predecessor and accept the new one. Refusing with
409 left the agent stranded after partial deploys. record_error now
persists the hydrated blob so a later teardown can still walk the LAN
list — otherwise a half-failed apply strands containers + bridges with
no breadcrumb back to them.
2026-04-21 10:23:10 -04:00
050607e00d feat(web): two-step topology creation wizard pinned to target host
Replaces the single-line name input with a modal that mirrors the
design-handoff DeployWizard shape (backdrop + violet-bordered panel,
wizard-step tabs, card-picker body):

- Step 1 — TARGET: a RUN LOCALLY card plus one card per enrolled
  swarm host. Non-routable hosts render disabled with their status as
  the tooltip. Selecting an agent pins the topology via
  target_host_uuid; local stays unihost.
- Step 2 — TYPE: BLANK (POST /topologies/blank) or SEED-BASED
  (POST /topologies/ with depth, branching, deckies-per-LAN, optional
  seed). Name is required on both.

Existing navigate-to-editor-on-create behavior is preserved.
2026-04-21 01:48:05 -04:00
12e18b75db feat(swarm): expose needs_resync on TopologySummary + upsert record_error
Two small observability follow-ups to the phase-1 agent/topology wiring:

TopologySummary now carries needs_resync so operators can see the
heartbeat's resync flag via the topology list/detail API without
dropping into the DB.

TopologyStore.record_error becomes an upsert — when a docker/compose
failure fires during the first materialise (put() never reached), we
still land a marker row so GET /topology/state surfaces the error and
the next heartbeat carries an empty applied_version_hash. That empty
hash is what master's heartbeat check relies on to flag the topology
for resync instead of assuming the apply succeeded.
2026-04-21 01:41:30 -04:00
0a14dbc9f4 test(agent): pin no-auto-restore-on-boot invariant for topology cache
Four regression tests guarding Step 8 of the agent/topology wiring:

- Lifespan startup must not call docker.from_env even with a populated
  topology.db — replace docker with a boom-stub and assert zero calls.
- GET /topology/state returns the cached row verbatim without
  re-materialising bridges/containers; live observation is read-only.
- Static guard: TopologyStore must not grow a restore/replay/reapply
  method without someone re-reading the module docstring.
- Raw sqlite read + a second TopologyStore instance confirm the store
  is passive — nothing scrubs stale rows on open, which is the
  behaviour master's resync flow depends on.
2026-04-21 01:37:05 -04:00
e8f9c955b3 feat(swarm): heartbeat-driven topology resync for agent-pinned deployments
Agent heartbeats now carry an applied-topology snapshot. The master
heartbeat handler compares the reported version_hash against what
canonical_hash yields for the hydrated topology pinned to that host
and flags Topology.needs_resync on divergence (or when the agent
reports no topology at all while master expects one).

The mutator watch loop gains reconcile_agent_resyncs, which re-pushes
the current hydrated blob via AgentClient.apply_topology without
touching status, then clears the flag on success. Push failures leave
the flag set so the next tick retries.
2026-04-21 01:35:12 -04:00