DECNET

Author	SHA1	Message	Date
anti	5704e8fcce	fix(topology): delete topology_mutations in delete-cascade delete_topology_cascade manually deletes status_events, edges, deckies and lans but overlooked topology_mutations, so deleting any topology that ever had a mutation enqueued (i.e. edits while active|degraded) failed with an FK IntegrityError. Add the missing DELETE and extend the cascade test to seed a mutation row.	2026-04-22 17:50:30 -04:00
anti	3f460bab84	feat(web): show MazeNET decky running count + roll into dashboard MazeNET header now reports '{running}/{total} DECKIES RUNNING' so operators can see per-topology runtime status at a glance. Dashboard ACTIVE DECKIES counters used to reflect only the fleet state file; TopologyDecky rows (MazeNET deployments) are now added in — deployed_deckies = fleet + all topology rows, active_deckies = fleet (no runtime field) + topology rows whose state is 'running'.	2026-04-22 17:48:04 -04:00
anti	6f537f52c2	fix(topology): remove DMZ gateway auto-attach on LAN create POST /topologies/{id}/lans previously called _auto_attach_gateway() whenever a non-DMZ LAN was created, which wired the DMZ gateway decky to every new subnet. That's why a deployed gateway ended up with eth0..ethN on every LAN regardless of what the user drew in MazeNET. Drop the auto-attach helper entirely. The DMZ_ORPHAN deploy-time validator (decnet/topology/validate.py:65-110) stays strict — users must explicitly wire the gateway to each subnet they want bridged, which is the whole point of having a topology editor. useMazeApi.ts: drop stale auto-bridge reference from comment.	2026-04-22 17:14:09 -04:00
anti	91111ea7ee	feat(cli): add `decnet init --deinit` to undo a previous bootstrap Reverse of init, step-by-step: systemctl disable --now decnet.target, remove every decnet-*.service + decnet.target unit file, drop the polkit rule, drop the tmpfiles.d entry, daemon-reload, remove /etc/decnet + /etc/decnet/config.ini, /run/decnet, /opt/decnet, and userdel/groupdel the decnet identity. Preserves /var/lib/decnet and /var/log/decnet by default — those hold operator data. Pass `--deinit --purge` to rm -rf them too. Idempotent on a clean host (every step prints [SKIP]). Honours --dry-run. 5 new tests cover the full-undo path, --purge, idempotent clean-host deinit, dry-run side-effect-free behaviour, and the --purge without --deinit guard.	2026-04-22 14:31:56 -04:00
anti	3dae44c652	feat(cli): add `decnet init` one-shot master-host bootstrap Creates the decnet system user/group, installs every unit file from deploy/ into /etc/systemd/system, drops the polkit rule, seeds /opt/decnet + /var/{lib,log}/decnet + /etc/decnet + /run/decnet, writes a placeholder /etc/decnet/config.ini, applies the new tmpfiles.d entry so /run/decnet survives reboots, daemon-reloads, and `systemctl enable --now decnet.target`. Idempotent (re-runs print [SKIP] on already-configured items), --dry-run previews the plan without touching anything, --no-start defers the target start, --force overwrites even matching unit files. Master-only (added to MASTER_ONLY_COMMANDS). 9 orchestration tests cover the non-root gate, dry-run, useradd/groupadd argv, SKIP on present user/group, unit-file idempotency, --force overwrite, --no-start suppression, happy path, and the "deploy/ not found" error message.	2026-04-22 14:28:11 -04:00
anti	13ea916943	feat(workers): add start + start-all endpoints (systemd supervisor) POST /api/v1/workers/{name}/start — 202 on acceptance, 404 unknown worker, 503 if the unit file is not installed, 502 if systemctl returns non-zero (stderr snippet in detail, full stack logged). Admin only. POST /api/v1/workers/start-all — best-effort: walks the worker list in dependency order (bus → api → data-plane), skips already-active and uninstalled units, aggregates outcomes into {started, already_running, failed[]}. Returns 200 even on partial failure; the caller reads the three lists. Both endpoints delegate to the systemd_control helper, so the attack surface for "what gets executed" is locked to `decnet-<validated-name>.service` at two layers (router KNOWN_WORKERS + helper regex).	2026-04-22 14:12:29 -04:00
anti	0fbb07c2ec	feat(workers): bus-backed Workers panel (registry, control, installed flag) Ships the backend half of Config → Workers: * Worker registry aggregates `system..health` + `system.bus.health` heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop out of a 90s window (3× the 30s heartbeat interval). `GET /api/v1/workers` returns the snapshot plus `bus_connected` (so the UI can explain "all UNKNOWN" when the bus socket is down) and a per-row `installed` flag populated from `systemctl list-unit-files decnet-.service` (cached 30s). `POST /api/v1/workers/{name}/stop` publishes a stop intent on `system.<name>.control`; workers listen via the shared control listener in `bus/publish.py`. * Heartbeat + control listener wired into collector / profiler / sniffer / prober / mutator worker loops. API self-heartbeats too so the panel always has one ground-truth row. * Topic helper `system_control(name)` + tests covering builder validation, control listener shutdown path, and the API surface (auth gating, bus-connected field, unknown-name 404). Adds `StartFailure` / `StartAllResponse` models in anticipation of the upcoming start endpoints (DEBT-034).	2026-04-22 14:10:39 -04:00
anti	fcaac648a4	feat(web): add systemd_control helper for worker unit management Thin async wrapper over `systemctl` — never shell=True, always create_subprocess_exec. Unit names are built from `decnet-<validated-name>.service`; the regex check is defence in depth on top of the router-level KNOWN_WORKERS validation. Exposes start / stop / is_active / list_installed; last is cached for 30s to keep the Workers panel cheap under REFRESH spam. On non-systemd hosts list_installed returns an empty set, so the UI renders with every row marked not-installed instead of 500-ing.	2026-04-22 14:08:35 -04:00
anti	3fb84ac5d0	feat(templates): per-instance stealth via instance_seed in service servers Every service template now pulls version strings, cluster/node UUIDs, auth salts, greeting banners, and uptime from the seeded per-instance RNG instead of hard-coded defaults. Scanners sweeping the fleet now see legitimately diverging fingerprints per decky while each decky's own responses stay internally consistent across restarts. Covers elasticsearch, ftp, http, https, ldap, mongodb, mqtt, mssql, mysql, postgres, redis, and smtp templates.	2026-04-22 09:24:16 -04:00
anti	51e9e263ca	feat(templates): add instance_seed stealth helper and wire into template builds Each decky now gets a deterministic-per-instance seeded RNG derived from NODE_NAME, so cluster UUIDs, version strings, uptime, and credentials diverge across the fleet while staying stable within one container. The canonical helper lives at decnet/templates/instance_seed.py; the deployer copies it into every active template build context alongside syslog_bridge.py. Dockerfiles COPY it to /opt/ so server.py can import it. Connection-time jitter intentionally stays unseeded — two hits to the same decky must not replay the same latency curve.	2026-04-22 09:24:04 -04:00
anti	6bbb2376f7	refactor(services): make artifact root configurable via DECNET_ARTIFACTS_ROOT The ssh and telnet services hard-coded /var/lib/decnet/artifacts as the host quarantine mount. Read it from DECNET_ARTIFACTS_ROOT with the same default so dev/rootless deploys can point it elsewhere.	2026-04-22 09:23:36 -04:00
anti	6725197d58	test(web): transcripts API + attacker-transcripts router coverage Paging, truncation surfacing, admin gate, path traversal, sid-regex and decky-mismatch rejection for /transcripts; mirror coverage for /attackers/{uuid}/transcripts. Flips the Session Recording box in the roadmap (sessrec pty relay now shipping end-to-end).	2026-04-21 23:11:40 -04:00
anti	6e522c5a55	feat(web): transcripts API + repository lookups Adds get_attacker_transcripts (mirror of artifacts for session_recorded logs) and get_session_log for sid→shard resolution. New /api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events out of the shared JSONL day-shard via an mtime-keyed byte-offset index — never scans the whole shard per request. New /api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both endpoints admin-gated.	2026-04-21 23:06:39 -04:00
anti	a58d42e492	feat(templates): wire SSH+Telnet to sessrec transcript recorder Build login-session into both images as the swapped root shell, add a quarantine bind mount for telnet (symmetric to SSH), seed transcripts/ dir and service discriminant at entrypoint. Deployer syncs sessrec.c + Makefile into each build context alongside the existing syslog_bridge helper. sessrec falls back to /etc/sessrec.service when env is stripped (busybox /bin/login).	2026-04-21 23:03:42 -04:00
anti	4596c1d69a	feat(templates): add sessrec pty transcript recorder New decnet/templates/_shared/sessrec/ — a small C program installed as the login shell in SSH / Telnet deckies. Forkpty-relays /bin/bash, records each chunk as an asciinema v2 event into a shared JSONL day-shard keyed by sid, and emits one RFC 5424 session_recorded line on exit (direct to PID 1's stdout, same pattern syslog_bridge.py uses). Storage: one shard per (decky, UTC day) at /var/lib/systemd/coredump/transcripts/sessions-YYYY-MM-DD.jsonl. Concurrent appends are lock-free: each write is chunked below PIPE_BUF so O_APPEND interleaves atomically. Per-session cap 10 MB with a trunc sentinel; disk-free precheck (<200 MB) falls through to plain bash with a session_skipped log event. Attacker src_ip resolves from $SSH_CONNECTION, getpeername(0), or utmp in that order. SIGWINCH appends a 'r' resize event so ncurses replays stay aligned. Stealth for v1: /etc/passwd shell-swap to /usr/libexec/login-session (plausible login-machinery path) + prctl comm disguise. Full LD_PRELOAD argv-zap is deferred — sshd strips LD_PRELOAD from the session env, so wiring the existing argv_zap.so into this path needs a separate wrapper. DEBT-033 opened for size-based day-shard rotation; v1's disk-free precheck covers the worst case but can be blinded by a one-shot disk fill.	2026-04-21 22:56:42 -04:00
anti	8f25ff677f	feat(engine,api): add orphan topology resource reaper Topology rows deleted without a proper teardown leave Docker containers and bridge networks behind, holding IPAM pools that cause 403 "Pool overlaps" on the next deploy at the same subnet. - engine/reaper.py walks the local Docker daemon, extracts the 8-char topology prefix from every decnet_t_* resource, and force-removes containers + networks whose prefix is not in the repo. - POST /api/v1/topologies/reap-orphans (admin-only) returns a report of live/orphan prefixes and what was removed. - Resources belonging to live topologies are never touched; per-resource errors are captured without aborting the sweep.	2026-04-21 22:13:44 -04:00
anti	85bb0e2f65	fix(engine): roll back partial Docker state on deploy failure When create_bridge_network or compose-up raised mid-deploy, the deployer marked the topology FAILED and re-raised — but left every network it had already created alive. The next deploy attempt tripped over the orphans with 'Pool overlaps with other one on this address space' (IPAM conflict). Track networks created in the current attempt; on exception, tear down the started compose stack (if any), remove the networks in reverse order, and delete the compose file before marking FAILED. Rollback errors are logged but never mask the original failure. Covered by a new regression test that drives a docker client which succeeds once then raises, and asserts every created network is also removed.	2026-04-21 20:23:03 -04:00
anti	c266d1b6e3	feat(mutator,web): add_decky op — create-and-attach in one mutation apply_attach_decky requires an existing decky, so the MazeNET editor had no way to grow a live topology: creating a new decky on active topologies 409'd on the direct-CRUD createDecky call. - Backend: new apply_add_decky that creates the decky row + its home-LAN edge atomically, auto-allocating an IP if none pinned. Post-apply validation still runs. Added to DISPATCH + _MUTATION_OPS Literal + CLI help text. - Tests: 3 new ops tests (happy path, duplicate-name rejection, missing-LAN rejection) plus dispatch coverage update. - Frontend: useTopologyEditor gains addDeckyToLan() composite. Pending routes through createDecky + attachEdge as before; active routes through a single add_decky enqueue. MazeNET.tsx drag-archetype, duplicate, DMZ-gateway, and ctx-menu add-decky paths all use the composite so active topologies stop 409'ing on new-decky drops.	2026-04-21 20:13:39 -04:00
anti	a93cbe76f9	feat(mutator): update_decky payload accepts top-level services list apply_update_decky only merged payload.patch into decky_config. Since services is a separate DB column, there was no way to replace a decky's services list via a mutation. Add a top-level services key to the op payload that maps straight onto the services column. Unblocks the MazeNET editor routing service-add/service-drop actions through the mutation queue on active topologies.	2026-04-21 19:56:58 -04:00
anti	d4d8a2ad0d	feat(correlation): interleave mutation markers into attacker traversals Parser now tags ``mutator`` / ``decky_mutated`` lines with ``kind="mutation"`` so the engine can route them into a sibling ``_mutations`` index keyed by decky name instead of the per-IP attacker index. ``traversals()`` joins the two streams: every attacker gets a ``mutations_during`` list of markers from touched deckies bounded by their first/last-seen window. ``AttackerTraversal.to_dict()`` grows a ``mutations_during`` field and a ``timeline`` that chronologically interleaves hops and markers, so an ``SSH at T5 → mutation at T6 → HTTP at T7`` substrate transition is visible to UI consumers instead of reading as a silent discontinuity. The existing hops-only JSON shape is preserved; old clients that ignore unknown keys keep working.	2026-04-21 19:37:35 -04:00
anti	bf5ed7abbb	feat(engine): emit creation/retirement mutation events on deploy/teardown Close the lifecycle loop for the correlation graph: every decky now enters the substrate with an explicit `trigger=creation` event (old_services=[] ⇒ new_services=<initial>) and leaves it with `trigger=retirement` (old=<current> ⇒ new=[]). With scheduled/operator mutations already flowing through emit_decky_mutated, the entire decky lifecycle is now a well-formed sequence of mutation events — the correlator can fold substrate_state(t) at any T by replaying them. Lazy-imports mutator.events to dodge the engine↔mutator circular dependency. Bus is None at CLI sites; the syslog write is what the correlator consumes. Emission is soft-failing so a broken log path never aborts a deploy.	2026-04-21 19:35:05 -04:00
anti	fa0cdb3ab5	feat(mutator): route mutate_decky through emit_decky_mutated with trigger Mutator now emits one decky_mutated event (RFC 5424 + bus) per successful mutation instead of the inline decky.<id>.state bus publish. The previous state topic published new_services only; mutation events carry old/new/trigger, which is what the correlation engine needs to interleave substrate-change markers into attacker traversals. - mutate_decky gains trigger: MutationTrigger = "operator" and captures old_services before the shuffle; replaces the inline _publish_safely(decky.<id>.state) with emit_decky_mutated(...). - mutate_all derives trigger internally: operator when force or only-filter is set (CLI --all, API mutate-now, UI bus request); scheduled on interval ticks. Passed through to each mutate_decky call. - Tests updated: the old decky.<id>.state assertion is replaced with decky.<id>.mutation topic + mutation payload shape; 3 new tests cover trigger derivation for scheduled / force / only paths. 26 tests in test_mutator.py green; 116 across mutator + topology + bus.	2026-04-21 19:31:31 -04:00
anti	f875350d75	feat(mutator): emit_decky_mutated helper — RFC 5424 + bus in one call First step toward making mutation events first-class nodes in the correlation graph. Today the graph silently reflects post-mutation state with no marker of the transition; this helper lands the emitter the mutator and deploy paths will call. - decnet/mutator/events.py: emit_decky_mutated(bus, *, decky, old_services, new_services, trigger, actor=None, log_path=None) writes an RFC 5424 line (service=mutator, hostname=<decky>, MSGID=decky_mutated, SD params for old/new services + trigger + optional actor) to DECNET_INGEST_LOG_FILE, then fire-and-forget publishes on decky.<id>.mutation. Either side failing is soft — the other path still completes. - MutationTrigger Literal covers creation, retirement, scheduled, operator, behavioral, healer, federation. Reserved values for v2/v3 (behavioral + federation) stay nullable so the schema is stable. - decnet/bus/topics.py: DECKY_MUTATION constant + decky_mutation(id) builder. Distinct from DECKY_STATE ("current shape") because a mutation is a transition event, not a steady-state snapshot. - Empty-set symmetry: creation emits old_services=[], retirement emits new_services=[]. Every decky lifecycle becomes a well-formed fold sequence on the correlator side. - 4 new tests: FakeBus + correlator parser round-trip; creation and retirement empty-set cases; bus=None still writes syslog; unwritable log path doesn't block bus publish. 95 tests green across test_mutator + tests/bus.	2026-04-21 19:29:21 -04:00
anti	e23c6c4ee4	feat(mutator): bus-wake on decky mutate_request; adaptive sleep; heartbeat The flat-fleet mutator was DB-poll-only and noisy — it logged "no active deployment found" every 10s on idle hosts and ran mutate_all at a fixed tick regardless of when the next decky was due. - mutate_all returns seconds-until-next-due; watch loop sleeps min(next_due, poll_interval_secs) with a 1s floor. - "No deployment" is now idle, not an error: edge-triggered log on present<->absent transition instead of every tick. - mutate_decky publishes decky.<name>.state on successful compose so UIs react in real time. - New decky.*.mutate_request subscription lets API/CLI/UI force an immediate mutation of a specific decky without waiting for its interval; target name feeds mutate_all(only={...}). - system.mutator.health heartbeat via run_health_heartbeat helper, bringing the mutator in line with DEBT-031 workers. Tests: next_due return, only= filter, decky.<name>.state publish on success, no publish on compose failure. Full mutator+topology-mutator+bus suite (109) green.	2026-04-21 19:28:01 -04:00
anti	5c0631e12c	feat(agent,forwarder,updater): publish system.<worker>.health heartbeats (DEBT-031 workers 7-9) All three workers now share a run_health_heartbeat helper in decnet.bus.publish. Each publishes system.<worker>.health on a 30s tick with {worker, ts} plus optional per-worker extras. Subscribers can watch system.*.health to see every DECNET worker on a host at once. - agent: heartbeat runs inside the FastAPI lifespan alongside the existing master-facing heartbeat; bus-disabled path is a no-op. - forwarder: heartbeat task spawned at run_forwarder entry, cancelled in the finally block so a crashed master loop never leaks the task. - updater: new FastAPI lifespan hosts the heartbeat. Heartbeat helper swallows extra() failures and is cancellation-safe so lifespan teardown never hangs on it.	2026-04-21 17:02:10 -04:00
anti	cbb394a160	feat(ingester): publish system.log per committed batch (DEBT-031 worker 6) Ingester connects the bus at startup, emits a batch-committed summary (component/flushed/position) after each successful _flush_batch. Zero-row flushes are suppressed so the topic stays meaningful. Complements the collector's per-line system.log publishes: collector signals ingress, ingester signals DB-persisted progress. Federation forwarder (worker 8) will subscribe to the batch-committed leaf to trigger its upstream push. Bus stays optional: publish_safely swallows failures, get_bus() can return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully functional.	2026-04-21 16:58:49 -04:00
anti	a448dbe283	feat(collector): publish system.log per ingested event (DEBT-031 worker 5) log_collector_worker connects the bus at startup, builds a thread-safe system.log publisher, and hands it to each container-stream thread through _stream_container's new publish_fn parameter. Publishing fires right after the JSON record is written — same rate-limiter path, no extra parsing, compact payload (decky/service/event_type/attacker_ip/timestamp) so subscribers can redraw without re-reading the DB. Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false the factory returns a no-op publisher and the stream thread calls it unconditionally. Hook failures are logged and never abort the thread.	2026-04-21 16:57:21 -04:00
anti	67c2e30f89	feat(profiler): publish attacker.scored per profile upsert (DEBT-031 worker 4) The profiler worker threads its bus publisher through _WorkerState so _update_profiles can emit a compact attacker.scored event for every upsert. Payload carries the headline counts (event/service/decky/bounty/credential) plus is_traversal, so the MazeNET attacker pool can redraw without a round-trip. Bus stays optional: publish_attacker=None when DECNET_BUS_ENABLED=false or get_bus() fails, and hook exceptions are logged without breaking the upsert path.	2026-04-21 16:54:40 -04:00
anti	e51b65d7c3	feat(correlation,profiler): publish attacker.observed on first sighting (DEBT-031 worker 3) CorrelationEngine gains an optional publish_fn hook fired once per unique attacker IP. The profiler worker — sole caller of the engine today — carries the bus physically, builds a thread-safe publisher, and wraps it with the attacker.observed topic before handing it in. Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false, the engine runs publish_fn=None and the worker degrades to DB-only. Hook failures log a warning and never break ingestion.	2026-04-21 16:53:03 -04:00
anti	34d9e37ab0	feat(prober): publish attacker.fingerprinted on the bus (DEBT-031) Each successful JARM / HASSH / TCPfp probe fans out an attacker.fingerprinted event; the probe family goes in event.type so a single subscription covers all three. Payload carries the attacker IP, port, and probe-specific hash — enough for the MazeNET live map to render fingerprint info on observed attackers. Lifts the thread-safe publisher helper out of the sniffer worker into decnet/bus/publish.py so the prober (and every future worker with a to_thread hot path) can reuse it without copy-pasting the run_coroutine_threadsafe dance. Sniffer rewires onto the shared helper in passing. Adds ATTACKER_FINGERPRINTED as a new leaf — distinct from ATTACKER_OBSERVED (correlator's first-sight signal) because an active probe result is additional evidence about an already-observed attacker. Note: the plan's decky.{id}.state realism-probe publish path is deferred — the current prober fingerprints attackers, not decky realism. Will revisit when realism probes exist.	2026-04-21 16:47:55 -04:00
anti	7f497ac552	feat(sniffer): publish decky.{id}.traffic on the bus (DEBT-031) SnifferEngine gains an optional publish_fn hook, invoked after the dedup + syslog write for traffic-summary events only (tls_session, tcp_flow_timing, tcp_syn_fingerprint) — intermediate parser artifacts like tls_client_hello stay off the bus. The sniffer worker wires get_bus() + a thread-safe shim that marshals sync calls from the scapy sniff thread back onto the asyncio loop via run_coroutine_threadsafe. Bus failure at startup degrades cleanly to publish-off mode; publish failures at runtime never escape the sniff thread.	2026-04-21 16:35:50 -04:00
anti	f3eaab5d37	refactor(bus): extract publish_safely + extend topics for DEBT-031 Shared publish_safely helper at decnet/bus/publish.py so the nine workers about to be wired into the bus don't each copy-paste the "never raise back at the caller" contract. Mutator drops its private copy and imports the canonical one. topics.py gains the attacker.* hierarchy (observed, scored, session.started, session.ended) and a system_health(worker) builder for per-worker health heartbeats — both prerequisites for the worker rollout under DEBT-031.	2026-04-21 16:32:30 -04:00
anti	f611e7363b	feat(mutator,web): live topology mutation pipeline backend (DEBT-030) Wire the mutator and web API into the service bus so live-topology edits flow sub-second from enqueue to UI: - Mutator publishes every state transition on the bus (mutation.applying/applied/failed + topology.status). Fire-and-forget; DB stays source of truth. - Mutator watch loop subscribes to topology.*.mutation.enqueued and wakes early via asyncio.Event — the 10s poll becomes a fallback heartbeat, not the primary dispatch trigger. - POST /topologies/{id}/mutations publishes mutation.enqueued after the DB write succeeds. - New GET /topologies/{id}/events SSE route: snapshot on connect (status + in-flight mutations), live forwards topology.{id}.> bus events, 15s keepalive. ?token= auth mirrors /stream. - New decnet/bus/app.py — process-wide lazy bus singleton for the API, closed cleanly on lifespan shutdown.	2026-04-21 14:38:25 -04:00
anti	fbf289ff63	feat(bus): host-local UNIX-socket pub/sub worker (DEBT-029) Land the `decnet bus` worker and `get_bus()` factory. Transport is a host-local UNIX-domain socket (0660, group=decnet); authz is the file mode. Wire framing is a tiny verb-line + 4-byte-BE length + orjson body. NATS-style wildcard topics (`*`, `>`). At-most-once, fire-and-forget — DB stays the source of truth. `FakeBus` / `NullBus` for tests and the disabled path. Cross-host federation is deferred to a future `--bridge-tcp` mode; DEBT-030 is master-only and unblocked.	2026-04-21 13:49:02 -04:00
anti	071312fc0c	feat(web/api): expose archetype catalog endpoint /api/v1/topologies/archetypes returns the archetype registry (slug, display name, description, preferred services/distros, nmap_os fingerprint) so the frontend wizard can render a live catalog instead of hardcoding a copy.	2026-04-21 10:24:01 -04:00
anti	542637c0dc	feat(web/api): support PATCH on proxy and CORS The web bundle proxy handled GET/POST/PUT/DELETE but not PATCH or preflight OPTIONS, which broke browser calls to PATCH endpoints behind the static-bundle server. CORS middleware had the same gap.	2026-04-21 10:23:55 -04:00
anti	1b29a7692c	feat(cli/db): include topology tables in db reset db reset drops-and-recreates a fixed table set in FK order. Topology tables weren't in the list, so reset left orphan topology rows behind and a fresh MazeNET deploy could collide with stale child records.	2026-04-21 10:23:49 -04:00
anti	e75198cca9	feat(cli/topology): add delete command and null-safe show topology delete cascades children (LANs, deckies, edges, mutations) but refuses while containers are still running — teardown is prerequisite. show stopped assuming every decky carried a full decky_config blob; MazeNET-generated deckies only get hydrated on deploy, so fall back to top-level name/services when the config isn't there.	2026-04-21 10:23:37 -04:00
anti	0cdcfe2653	feat(agent/collector): topology-label discovery and master-authoritative supersede Legacy fleet deckies live in decnet-state.json; MazeNET topology containers don't. Tag them at compose-time with decnet.topology.service=true and let the collector match on that label. Spin up the agent's log collector on the first successful /topology/apply (not in the lifespan — that would break the no-docker-on-boot invariant) and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE, separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running both roles can't forward its own ingest back to itself. When master pushes a topology that differs from whatever is pinned locally, tear down the predecessor and accept the new one. Refusing with 409 left the agent stranded after partial deploys. record_error now persists the hydrated blob so a later teardown can still walk the LAN list — otherwise a half-failed apply strands containers + bridges with no breadcrumb back to them.	2026-04-21 10:23:10 -04:00
anti	12e18b75db	feat(swarm): expose needs_resync on TopologySummary + upsert record_error Two small observability follow-ups to the phase-1 agent/topology wiring: TopologySummary now carries needs_resync so operators can see the heartbeat's resync flag via the topology list/detail API without dropping into the DB. TopologyStore.record_error becomes an upsert — when a docker/compose failure fires during the first materialise (put() never reached), we still land a marker row so GET /topology/state surfaces the error and the next heartbeat carries an empty applied_version_hash. That empty hash is what master's heartbeat check relies on to flag the topology for resync instead of assuming the apply succeeded.	2026-04-21 01:41:30 -04:00
anti	e8f9c955b3	feat(swarm): heartbeat-driven topology resync for agent-pinned deployments Agent heartbeats now carry an applied-topology snapshot. The master heartbeat handler compares the reported version_hash against what canonical_hash yields for the hydrated topology pinned to that host and flags Topology.needs_resync on divergence (or when the agent reports no topology at all while master expects one). The mutator watch loop gains reconcile_agent_resyncs, which re-pushes the current hydrated blob via AgentClient.apply_topology without touching status, then clears the flag on success. Push failures leave the flag set so the next tick retries.	2026-04-21 01:35:12 -04:00
anti	05d1ebbaaa	feat(engine): route agent-pinned topologies via AgentClient deploy_topology and teardown_topology now branch on target_host_uuid. When set: - Hydrate the topology locally (validator runs exactly as before). - Compute canonical_hash; push {hydrated, version_hash} to the pinned agent through AgentClient.apply_topology. - Status machine still moves PENDING -> DEPLOYING -> ACTIVE on 2xx, PENDING -> DEPLOYING -> FAILED on error; master remains the sole owner of the row. Teardown flips to TEARING_DOWN, fires /topology/teardown, then TORN_DOWN — we log a warning on agent error but still settle to TORN_DOWN so operators can delete the row (agent garbage is cleaned on the next re-enroll). Unihost deploys are unchanged — the field defaults to NULL so every existing flow takes the local path. Step 6 of the agent <-> topology integration.	2026-04-21 01:27:59 -04:00
anti	5f8a746d6e	feat(swarm): AgentClient topology apply/teardown/state methods Three new RPCs mirroring the existing deploy/teardown/status pattern: - apply_topology(hydrated, version_hash) — long-timeout (600s) for image pulls + compose up. - teardown_topology(topology_id) — 300s timeout; enough for a stubborn compose-down without hanging a heartbeat. - get_topology_state() — short control-plane read for reconcile. The per-call timeout swap uses the same trick as .deploy(). Step 5 of the agent <-> topology integration.	2026-04-21 01:26:21 -04:00
anti	13cb0ff38e	feat(agent): topology apply/teardown/state endpoints New mTLS-protected routes on the agent: - POST /topology/apply — master pushes {hydrated, version_hash}. Validates the hash matches locally (serialisation drift guard), runs the topology through the same validator/composer pipeline used master-side, then creates bridges + compose up + records the apply in topology.db. - POST /topology/teardown — dismantles compose, removes bridges, clears topology.db. Idempotent. - GET /topology/state — returns applied row + live docker observation for the heartbeat. Implementation lives in decnet/agent/topology_ops.py; it reuses the private compose helpers from decnet.engine.deployer so we don't duplicate compose/project-name plumbing. The apply path is sync under the hood (docker SDK + subprocess); we hop to a thread so the event loop keeps servicing other agent traffic. v1 is one-topology-per-agent; cross-topology apply returns 409. Step 4 of the agent <-> topology integration.	2026-04-21 01:25:15 -04:00
anti	aea3e7e05b	feat(agent): sqlite-backed topology_store as applied-state cache Single-row sqlite tracking which topology the agent last applied and its version hash. Sync/stdlib, same pattern as the log-forwarder offset store. v1 is one-topology-per-agent; attempting to apply a different topology over a populated row raises AlreadyApplied so the endpoint can return 409. observed() snapshots live docker state (decnet-topology-* bridges + decnet-* containers) for the heartbeat. The store is a cache, not authority — no auto-restore on boot. Master remains the only source of truth. Step 3 of the agent <-> topology integration.	2026-04-21 01:22:01 -04:00
anti	98465af226	feat(topology): canonical_hash for applied-state comparison Tiny pure helper both master and agent will use to answer "is the applied state the one we expect?". SHA-256 of canonical JSON with volatile keys (timestamps, status, version, canvas x/y/w/h) stripped so the hash only captures deployment-relevant state. Step 2 of the agent <-> topology integration.	2026-04-21 01:20:42 -04:00
anti	5a0cf5d7c8	feat(topology): add target_host_uuid to pin topologies to swarm agents Adds the `target_host_uuid` FK on `Topology` plus wiring through the two create endpoints (`POST /topologies`, `POST /topologies/blank`). Validates the mode/host pair: `mode='agent'` now requires a known, routable host; `mode='unihost'` must leave the field unset. Surfaced on `TopologySummary` so list/detail responses expose it. Purely additive at the schema level — existing unihost flows unchanged (field defaults to `NULL`). Step 1 of the agent <-> topology integration.	2026-04-21 01:19:45 -04:00
anti	b261e8e5fa	feat(topology): add teardown endpoint + UI button Active/degraded/failed/deploying topologies cannot be deleted without first transitioning to torn_down, but the UI had no way to trigger that. Add POST /topologies/{id}/teardown mirroring the deploy endpoint (background task, 202 Accepted), and a click-to-arm TEARDOWN button on the topology list card that shows whenever the row is in a teardown-eligible state.	2026-04-20 23:41:37 -04:00
anti	c37d1f09c6	feat(deployer): warn when userland-proxy masks attacker source IPs MazeNET publishes gateway ports on the host via Docker. With the default userland-proxy enabled, attacker connections appear to originate from the bridge gateway instead of the real remote IP. Log a soft warning at deploy time when the topology publishes any ports and docker info reports UserlandProxy=true, pointing the operator at the daemon.json toggle. Best-effort: daemon talk failures silently no-op.	2026-04-20 23:37:59 -04:00
anti	4d2e38f616	fix(network): sweep orphan Docker bridges that squat on our subnet A prior half-torn-down topology can leave a bridge network alive under a different name that still owns our intended subnet. Docker then rejects our create with 'Pool overlaps with other one on this address space', and the topology deploy fails. Extend create_bridge_network to sweep any unused bridge whose IPAM subnet matches the one we're about to claim (skipping networks with running containers — those are live use).	2026-04-20 23:19:42 -04:00


323 Commits