delete_topology_cascade manually deletes status_events, edges, deckies
and lans but overlooked topology_mutations, so deleting any topology
that ever had a mutation enqueued (i.e. edits while active|degraded)
failed with an FK IntegrityError. Add the missing DELETE and extend
the cascade test to seed a mutation row.
MazeNET header now reports '{running}/{total} DECKIES RUNNING' so
operators can see per-topology runtime status at a glance.
Dashboard ACTIVE DECKIES counters used to reflect only the fleet state
file; TopologyDecky rows (MazeNET deployments) are now added in —
deployed_deckies = fleet + all topology rows, active_deckies = fleet
(no runtime field) + topology rows whose state is 'running'.
POST /topologies/{id}/lans previously called _auto_attach_gateway()
whenever a non-DMZ LAN was created, which wired the DMZ gateway decky
to every new subnet. That's why a deployed gateway ended up with
eth0..ethN on every LAN regardless of what the user drew in MazeNET.
Drop the auto-attach helper entirely. The DMZ_ORPHAN deploy-time
validator (decnet/topology/validate.py:65-110) stays strict — users
must explicitly wire the gateway to each subnet they want bridged,
which is the whole point of having a topology editor.
useMazeApi.ts: drop stale auto-bridge reference from comment.
Reverse of init, step-by-step: systemctl disable --now decnet.target,
remove every decnet-*.service + decnet.target unit file, drop the
polkit rule, drop the tmpfiles.d entry, daemon-reload, remove
/etc/decnet + /etc/decnet/config.ini, /run/decnet, /opt/decnet, and
userdel/groupdel the decnet identity.
Preserves /var/lib/decnet and /var/log/decnet by default — those
hold operator data. Pass `--deinit --purge` to rm -rf them too.
Idempotent on a clean host (every step prints [SKIP]). Honours
--dry-run.
5 new tests cover the full-undo path, --purge, idempotent clean-host
deinit, dry-run side-effect-free behaviour, and the --purge without
--deinit guard.
Creates the decnet system user/group, installs every unit file from
deploy/ into /etc/systemd/system, drops the polkit rule, seeds
/opt/decnet + /var/{lib,log}/decnet + /etc/decnet + /run/decnet,
writes a placeholder /etc/decnet/config.ini, applies the new
tmpfiles.d entry so /run/decnet survives reboots, daemon-reloads,
and `systemctl enable --now decnet.target`.
Idempotent (re-runs print [SKIP] on already-configured items),
--dry-run previews the plan without touching anything, --no-start
defers the target start, --force overwrites even matching unit
files. Master-only (added to MASTER_ONLY_COMMANDS).
9 orchestration tests cover the non-root gate, dry-run, useradd/
groupadd argv, SKIP on present user/group, unit-file idempotency,
--force overwrite, --no-start suppression, happy path, and the
"deploy/ not found" error message.
POST /api/v1/workers/{name}/start — 202 on acceptance, 404 unknown
worker, 503 if the unit file is not installed, 502 if systemctl
returns non-zero (stderr snippet in detail, full stack logged).
Admin only.
POST /api/v1/workers/start-all — best-effort: walks the worker list
in dependency order (bus → api → data-plane), skips already-active
and uninstalled units, aggregates outcomes into
{started, already_running, failed[]}. Returns 200 even on partial
failure; the caller reads the three lists.
Both endpoints delegate to the systemd_control helper, so the attack
surface for "what gets executed" is locked to `decnet-<validated-name>
.service` at two layers (router KNOWN_WORKERS + helper regex).
Ships the backend half of Config → Workers:
* Worker registry aggregates `system.*.health` + `system.bus.health`
heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop
out of a 90s window (3× the 30s heartbeat interval).
* `GET /api/v1/workers` returns the snapshot plus `bus_connected`
(so the UI can explain "all UNKNOWN" when the bus socket is down)
and a per-row `installed` flag populated from
`systemctl list-unit-files decnet-*.service` (cached 30s).
* `POST /api/v1/workers/{name}/stop` publishes a stop intent on
`system.<name>.control`; workers listen via the shared control
listener in `bus/publish.py`.
* Heartbeat + control listener wired into collector / profiler /
sniffer / prober / mutator worker loops. API self-heartbeats too
so the panel always has one ground-truth row.
* Topic helper `system_control(name)` + tests covering builder
validation, control listener shutdown path, and the API surface
(auth gating, bus-connected field, unknown-name 404).
Adds `StartFailure` / `StartAllResponse` models in anticipation of
the upcoming start endpoints (DEBT-034).
Thin async wrapper over `systemctl` — never shell=True, always
create_subprocess_exec. Unit names are built from
`decnet-<validated-name>.service`; the regex check is defence in depth
on top of the router-level KNOWN_WORKERS validation.
Exposes start / stop / is_active / list_installed; last is cached for
30s to keep the Workers panel cheap under REFRESH spam. On non-systemd
hosts list_installed returns an empty set, so the UI renders with
every row marked not-installed instead of 500-ing.
Every service template now pulls version strings, cluster/node UUIDs, auth
salts, greeting banners, and uptime from the seeded per-instance RNG instead
of hard-coded defaults. Scanners sweeping the fleet now see legitimately
diverging fingerprints per decky while each decky's own responses stay
internally consistent across restarts.
Covers elasticsearch, ftp, http, https, ldap, mongodb, mqtt, mssql, mysql,
postgres, redis, and smtp templates.
Each decky now gets a deterministic-per-instance seeded RNG derived from
NODE_NAME, so cluster UUIDs, version strings, uptime, and credentials diverge
across the fleet while staying stable within one container. The canonical
helper lives at decnet/templates/instance_seed.py; the deployer copies it into
every active template build context alongside syslog_bridge.py. Dockerfiles
COPY it to /opt/ so server.py can import it.
Connection-time jitter intentionally stays unseeded — two hits to the same
decky must not replay the same latency curve.
The ssh and telnet services hard-coded /var/lib/decnet/artifacts as the host
quarantine mount. Read it from DECNET_ARTIFACTS_ROOT with the same default so
dev/rootless deploys can point it elsewhere.
Paging, truncation surfacing, admin gate, path traversal, sid-regex and
decky-mismatch rejection for /transcripts; mirror coverage for
/attackers/{uuid}/transcripts. Flips the Session Recording box in the
roadmap (sessrec pty relay now shipping end-to-end).
Adds get_attacker_transcripts (mirror of artifacts for session_recorded
logs) and get_session_log for sid→shard resolution. New
/api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events
out of the shared JSONL day-shard via an mtime-keyed byte-offset index
— never scans the whole shard per request. New
/api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both
endpoints admin-gated.
Build login-session into both images as the swapped root shell, add a
quarantine bind mount for telnet (symmetric to SSH), seed transcripts/
dir and service discriminant at entrypoint. Deployer syncs sessrec.c +
Makefile into each build context alongside the existing syslog_bridge
helper. sessrec falls back to /etc/sessrec.service when env is stripped
(busybox /bin/login).
New decnet/templates/_shared/sessrec/ — a small C program installed as the
login shell in SSH / Telnet deckies. Forkpty-relays /bin/bash, records each
chunk as an asciinema v2 event into a shared JSONL day-shard keyed by sid,
and emits one RFC 5424 session_recorded line on exit (direct to PID 1's
stdout, same pattern syslog_bridge.py uses).
Storage: one shard per (decky, UTC day) at
/var/lib/systemd/coredump/transcripts/sessions-YYYY-MM-DD.jsonl. Concurrent
appends are lock-free: each write is chunked below PIPE_BUF so O_APPEND
interleaves atomically. Per-session cap 10 MB with a trunc sentinel; disk-
free precheck (<200 MB) falls through to plain bash with a session_skipped
log event. Attacker src_ip resolves from \$SSH_CONNECTION, getpeername(0),
or utmp in that order. SIGWINCH appends a 'r' resize event so ncurses
replays stay aligned.
Stealth for v1: /etc/passwd shell-swap to /usr/libexec/login-session
(plausible login-machinery path) + prctl comm disguise. Full LD_PRELOAD
argv-zap is deferred — sshd strips LD_PRELOAD from the session env, so
wiring the existing argv_zap.so into this path needs a separate wrapper.
DEBT-033 opened for size-based day-shard rotation; v1's disk-free precheck
covers the worst case but can be blinded by a one-shot disk fill.
Topology rows deleted without a proper teardown leave Docker containers
and bridge networks behind, holding IPAM pools that cause 403 "Pool
overlaps" on the next deploy at the same subnet.
- engine/reaper.py walks the local Docker daemon, extracts the 8-char
topology prefix from every decnet_t_* resource, and force-removes
containers + networks whose prefix is not in the repo.
- POST /api/v1/topologies/reap-orphans (admin-only) returns a report
of live/orphan prefixes and what was removed.
- Resources belonging to live topologies are never touched; per-resource
errors are captured without aborting the sweep.
When create_bridge_network or compose-up raised mid-deploy, the
deployer marked the topology FAILED and re-raised — but left every
network it had already created alive. The next deploy attempt tripped
over the orphans with 'Pool overlaps with other one on this address
space' (IPAM conflict).
Track networks created in the current attempt; on exception, tear down
the started compose stack (if any), remove the networks in reverse
order, and delete the compose file before marking FAILED. Rollback
errors are logged but never mask the original failure.
Covered by a new regression test that drives a docker client which
succeeds once then raises, and asserts every created network is also
removed.
apply_attach_decky requires an existing decky, so the MazeNET editor
had no way to grow a live topology: creating a new decky on active
topologies 409'd on the direct-CRUD createDecky call.
- Backend: new apply_add_decky that creates the decky row + its
home-LAN edge atomically, auto-allocating an IP if none pinned.
Post-apply validation still runs. Added to DISPATCH + _MUTATION_OPS
Literal + CLI help text.
- Tests: 3 new ops tests (happy path, duplicate-name rejection,
missing-LAN rejection) plus dispatch coverage update.
- Frontend: useTopologyEditor gains addDeckyToLan() composite. Pending
routes through createDecky + attachEdge as before; active routes
through a single add_decky enqueue. MazeNET.tsx drag-archetype,
duplicate, DMZ-gateway, and ctx-menu add-decky paths all use the
composite so active topologies stop 409'ing on new-decky drops.
apply_update_decky only merged payload.patch into decky_config. Since
services is a separate DB column, there was no way to replace a decky's
services list via a mutation. Add a top-level services key to the op
payload that maps straight onto the services column.
Unblocks the MazeNET editor routing service-add/service-drop actions
through the mutation queue on active topologies.
Parser now tags ``mutator`` / ``decky_mutated`` lines with ``kind="mutation"``
so the engine can route them into a sibling ``_mutations`` index keyed
by decky name instead of the per-IP attacker index. ``traversals()``
joins the two streams: every attacker gets a ``mutations_during`` list
of markers from touched deckies bounded by their first/last-seen
window. ``AttackerTraversal.to_dict()`` grows a ``mutations_during``
field and a ``timeline`` that chronologically interleaves hops and
markers, so an ``SSH at T5 → mutation at T6 → HTTP at T7`` substrate
transition is visible to UI consumers instead of reading as a silent
discontinuity.
The existing hops-only JSON shape is preserved; old clients that
ignore unknown keys keep working.
Close the lifecycle loop for the correlation graph: every decky now
enters the substrate with an explicit `trigger=creation` event
(old_services=[] ⇒ new_services=<initial>) and leaves it with
`trigger=retirement` (old=<current> ⇒ new=[]). With scheduled/operator
mutations already flowing through emit_decky_mutated, the entire decky
lifecycle is now a well-formed sequence of mutation events — the
correlator can fold substrate_state(t) at any T by replaying them.
Lazy-imports mutator.events to dodge the engine↔mutator circular
dependency. Bus is None at CLI sites; the syslog write is what the
correlator consumes. Emission is soft-failing so a broken log path
never aborts a deploy.
Mutator now emits one decky_mutated event (RFC 5424 + bus) per
successful mutation instead of the inline decky.<id>.state bus
publish. The previous state topic published new_services only;
mutation events carry old/new/trigger, which is what the correlation
engine needs to interleave substrate-change markers into attacker
traversals.
- mutate_decky gains trigger: MutationTrigger = "operator" and
captures old_services before the shuffle; replaces the inline
_publish_safely(decky.<id>.state) with emit_decky_mutated(...).
- mutate_all derives trigger internally: operator when force or
only-filter is set (CLI --all, API mutate-now, UI bus request);
scheduled on interval ticks. Passed through to each mutate_decky
call.
- Tests updated: the old decky.<id>.state assertion is replaced
with decky.<id>.mutation topic + mutation payload shape; 3 new
tests cover trigger derivation for scheduled / force / only paths.
26 tests in test_mutator.py green; 116 across mutator + topology
+ bus.
First step toward making mutation events first-class nodes in the
correlation graph. Today the graph silently reflects post-mutation
state with no marker of the transition; this helper lands the
emitter the mutator and deploy paths will call.
- decnet/mutator/events.py: emit_decky_mutated(bus, *, decky,
old_services, new_services, trigger, actor=None, log_path=None)
writes an RFC 5424 line (service=mutator, hostname=<decky>,
MSGID=decky_mutated, SD params for old/new services + trigger +
optional actor) to DECNET_INGEST_LOG_FILE, then fire-and-forget
publishes on decky.<id>.mutation. Either side failing is soft —
the other path still completes.
- MutationTrigger Literal covers creation, retirement, scheduled,
operator, behavioral, healer, federation. Reserved values for v2/v3
(behavioral + federation) stay nullable so the schema is stable.
- decnet/bus/topics.py: DECKY_MUTATION constant + decky_mutation(id)
builder. Distinct from DECKY_STATE ("current shape") because a
mutation is a transition event, not a steady-state snapshot.
- Empty-set symmetry: creation emits old_services=[], retirement
emits new_services=[]. Every decky lifecycle becomes a well-formed
fold sequence on the correlator side.
- 4 new tests: FakeBus + correlator parser round-trip; creation and
retirement empty-set cases; bus=None still writes syslog;
unwritable log path doesn't block bus publish. 95 tests green
across test_mutator + tests/bus.
The flat-fleet mutator was DB-poll-only and noisy — it logged
"no active deployment found" every 10s on idle hosts and ran
mutate_all at a fixed tick regardless of when the next decky
was due.
- mutate_all returns seconds-until-next-due; watch loop sleeps
min(next_due, poll_interval_secs) with a 1s floor.
- "No deployment" is now idle, not an error: edge-triggered log
on present<->absent transition instead of every tick.
- mutate_decky publishes decky.<name>.state on successful compose
so UIs react in real time.
- New decky.*.mutate_request subscription lets API/CLI/UI force
an immediate mutation of a specific decky without waiting for
its interval; target name feeds mutate_all(only={...}).
- system.mutator.health heartbeat via run_health_heartbeat helper,
bringing the mutator in line with DEBT-031 workers.
Tests: next_due return, only= filter, decky.<name>.state publish
on success, no publish on compose failure. Full mutator+topology-
mutator+bus suite (109) green.
All three workers now share a run_health_heartbeat helper in
decnet.bus.publish. Each publishes system.<worker>.health on a 30s tick
with {worker, ts} plus optional per-worker extras. Subscribers can
watch system.*.health to see every DECNET worker on a host at once.
- agent: heartbeat runs inside the FastAPI lifespan alongside the
existing master-facing heartbeat; bus-disabled path is a no-op.
- forwarder: heartbeat task spawned at run_forwarder entry, cancelled
in the finally block so a crashed master loop never leaks the task.
- updater: new FastAPI lifespan hosts the heartbeat.
Heartbeat helper swallows extra() failures and is cancellation-safe so
lifespan teardown never hangs on it.
Ingester connects the bus at startup, emits a batch-committed summary
(component/flushed/position) after each successful _flush_batch. Zero-
row flushes are suppressed so the topic stays meaningful.
Complements the collector's per-line system.log publishes: collector
signals ingress, ingester signals DB-persisted progress. Federation
forwarder (worker 8) will subscribe to the batch-committed leaf to
trigger its upstream push.
Bus stays optional: publish_safely swallows failures, get_bus() can
return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully
functional.
log_collector_worker connects the bus at startup, builds a thread-safe
system.log publisher, and hands it to each container-stream thread
through _stream_container's new publish_fn parameter. Publishing fires
right after the JSON record is written — same rate-limiter path, no
extra parsing, compact payload (decky/service/event_type/attacker_ip/
timestamp) so subscribers can redraw without re-reading the DB.
Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false the
factory returns a no-op publisher and the stream thread calls it
unconditionally. Hook failures are logged and never abort the thread.
The profiler worker threads its bus publisher through _WorkerState so
_update_profiles can emit a compact attacker.scored event for every
upsert. Payload carries the headline counts (event/service/decky/
bounty/credential) plus is_traversal, so the MazeNET attacker pool can
redraw without a round-trip.
Bus stays optional: publish_attacker=None when DECNET_BUS_ENABLED=false
or get_bus() fails, and hook exceptions are logged without breaking the
upsert path.
CorrelationEngine gains an optional publish_fn hook fired once per unique
attacker IP. The profiler worker — sole caller of the engine today —
carries the bus physically, builds a thread-safe publisher, and wraps it
with the attacker.observed topic before handing it in.
Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false, the
engine runs publish_fn=None and the worker degrades to DB-only. Hook
failures log a warning and never break ingestion.
Each successful JARM / HASSH / TCPfp probe fans out an
attacker.fingerprinted event; the probe family goes in event.type so a
single subscription covers all three. Payload carries the attacker IP,
port, and probe-specific hash — enough for the MazeNET live map to
render fingerprint info on observed attackers.
Lifts the thread-safe publisher helper out of the sniffer worker into
decnet/bus/publish.py so the prober (and every future worker with a
to_thread hot path) can reuse it without copy-pasting the
run_coroutine_threadsafe dance. Sniffer rewires onto the shared helper
in passing.
Adds ATTACKER_FINGERPRINTED as a new leaf — distinct from
ATTACKER_OBSERVED (correlator's first-sight signal) because an active
probe result is additional evidence about an already-observed attacker.
Note: the plan's decky.{id}.state realism-probe publish path is
deferred — the current prober fingerprints attackers, not decky
realism. Will revisit when realism probes exist.
SnifferEngine gains an optional publish_fn hook, invoked after the
dedup + syslog write for traffic-summary events only (tls_session,
tcp_flow_timing, tcp_syn_fingerprint) — intermediate parser artifacts
like tls_client_hello stay off the bus.
The sniffer worker wires get_bus() + a thread-safe shim that marshals
sync calls from the scapy sniff thread back onto the asyncio loop via
run_coroutine_threadsafe. Bus failure at startup degrades cleanly to
publish-off mode; publish failures at runtime never escape the sniff
thread.
Shared publish_safely helper at decnet/bus/publish.py so the nine
workers about to be wired into the bus don't each copy-paste the
"never raise back at the caller" contract. Mutator drops its private
copy and imports the canonical one.
topics.py gains the attacker.* hierarchy (observed, scored,
session.started, session.ended) and a system_health(worker) builder
for per-worker health heartbeats — both prerequisites for the worker
rollout under DEBT-031.
Wire the mutator and web API into the service bus so live-topology
edits flow sub-second from enqueue to UI:
- Mutator publishes every state transition on the bus (mutation.applying
/applied/failed + topology.status). Fire-and-forget; DB stays source
of truth.
- Mutator watch loop subscribes to topology.*.mutation.enqueued and
wakes early via asyncio.Event — the 10s poll becomes a fallback
heartbeat, not the primary dispatch trigger.
- POST /topologies/{id}/mutations publishes mutation.enqueued after
the DB write succeeds.
- New GET /topologies/{id}/events SSE route: snapshot on connect
(status + in-flight mutations), live forwards topology.{id}.>
bus events, 15s keepalive. ?token= auth mirrors /stream.
- New decnet/bus/app.py — process-wide lazy bus singleton for the
API, closed cleanly on lifespan shutdown.
Land the `decnet bus` worker and `get_bus()` factory. Transport is a
host-local UNIX-domain socket (0660, group=decnet); authz is the file
mode. Wire framing is a tiny verb-line + 4-byte-BE length + orjson body.
NATS-style wildcard topics (`*`, `>`). At-most-once, fire-and-forget —
DB stays the source of truth. `FakeBus` / `NullBus` for tests and the
disabled path. Cross-host federation is deferred to a future
`--bridge-tcp` mode; DEBT-030 is master-only and unblocked.
/api/v1/topologies/archetypes returns the archetype registry (slug,
display name, description, preferred services/distros, nmap_os
fingerprint) so the frontend wizard can render a live catalog instead
of hardcoding a copy.
The web bundle proxy handled GET/POST/PUT/DELETE but not PATCH or
preflight OPTIONS, which broke browser calls to PATCH endpoints behind
the static-bundle server. CORS middleware had the same gap.
db reset drops-and-recreates a fixed table set in FK order. Topology
tables weren't in the list, so reset left orphan topology rows behind
and a fresh MazeNET deploy could collide with stale child records.
topology delete cascades children (LANs, deckies, edges, mutations) but
refuses while containers are still running — teardown is prerequisite.
show stopped assuming every decky carried a full decky_config blob;
MazeNET-generated deckies only get hydrated on deploy, so fall back to
top-level name/services when the config isn't there.
Legacy fleet deckies live in decnet-state.json; MazeNET topology
containers don't. Tag them at compose-time with
decnet.topology.service=true and let the collector match on that label.
Spin up the agent's log collector on the first successful /topology/apply
(not in the lifespan — that would break the no-docker-on-boot invariant)
and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE,
separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running
both roles can't forward its own ingest back to itself.
When master pushes a topology that differs from whatever is pinned
locally, teardown the predecessor and accept the new one. Refusing with
409 left the agent stranded after partial deploys. record_error now
persists the hydrated blob so a later teardown can still walk the LAN
list — otherwise a half-failed apply strands containers + bridges with
no breadcrumb back to them.
Two small observability follow-ups to the phase-1 agent/topology wiring:
TopologySummary now carries needs_resync so operators can see the
heartbeat's resync flag via the topology list/detail API without
dropping into the DB.
TopologyStore.record_error becomes an upsert — when a docker/compose
failure fires during the first materialise (put() never reached), we
still land a marker row so GET /topology/state surfaces the error and
the next heartbeat carries an empty applied_version_hash. That empty
hash is what master's heartbeat check relies on to flag the topology
for resync instead of assuming the apply succeeded.
Agent heartbeats now carry an applied-topology snapshot. The master
heartbeat handler compares the reported version_hash against what
canonical_hash yields for the hydrated topology pinned to that host
and flags Topology.needs_resync on divergence (or when the agent
reports no topology at all while master expects one).
The mutator watch loop gains reconcile_agent_resyncs, which re-pushes
the current hydrated blob via AgentClient.apply_topology without
touching status, then clears the flag on success. Push failures leave
the flag set so the next tick retries.
deploy_topology and teardown_topology now branch on
target_host_uuid. When set:
- Hydrate the topology locally (validator runs exactly as before).
- Compute canonical_hash; push {hydrated, version_hash} to the
pinned agent through AgentClient.apply_topology.
- Status machine still moves PENDING -> DEPLOYING -> ACTIVE on 2xx,
PENDING -> DEPLOYING -> FAILED on error; master remains the sole
owner of the row.
Teardown flips to TEARING_DOWN, fires /topology/teardown, then
TORN_DOWN — we log a warning on agent error but still settle to
TORN_DOWN so operators can delete the row (agent garbage is cleaned
on the next re-enroll).
Unihost deploys are unchanged — the field defaults to NULL so every
existing flow takes the local path.
Step 6 of the agent <-> topology integration.
Three new RPCs mirroring the existing deploy/teardown/status pattern:
- apply_topology(hydrated, version_hash) — long-timeout (600s) for
image pulls + compose up.
- teardown_topology(topology_id) — 300s timeout; enough for a
stubborn compose-down without hanging a heartbeat.
- get_topology_state() — short control-plane read for reconcile.
The per-call timeout swap uses the same trick as .deploy().
Step 5 of the agent <-> topology integration.
New mTLS-protected routes on the agent:
- POST /topology/apply — master pushes {hydrated, version_hash}.
Validates the hash matches locally (serialisation drift guard),
runs the topology through the same validator/composer pipeline
used master-side, then creates bridges + compose up + records the
apply in topology.db.
- POST /topology/teardown — dismantles compose, removes bridges,
clears topology.db. Idempotent.
- GET /topology/state — returns applied row + live docker
observation for the heartbeat.
Implementation lives in decnet/agent/topology_ops.py; it reuses the
private compose helpers from decnet.engine.deployer so we don't
duplicate compose/project-name plumbing. The apply path is sync
under the hood (docker SDK + subprocess); we hop to a thread so the
event loop keeps servicing other agent traffic.
v1 is one-topology-per-agent; cross-topology apply returns 409.
Step 4 of the agent <-> topology integration.
Single-row sqlite tracking which topology the agent last applied and
its version hash. Sync/stdlib, same pattern as the log-forwarder
offset store. v1 is one-topology-per-agent; attempting to apply a
different topology over a populated row raises AlreadyApplied so the
endpoint can return 409. observed() snapshots live docker state
(decnet-topology-* bridges + decnet-* containers) for the heartbeat.
The store is a cache, not authority — no auto-restore on boot.
Master remains the only source of truth.
Step 3 of the agent <-> topology integration.
Tiny pure helper both master and agent will use to answer "is the
applied state the one we expect?". SHA-256 of canonical JSON with
volatile keys (timestamps, status, version, canvas x/y/w/h) stripped
so the hash only captures deployment-relevant state.
Step 2 of the agent <-> topology integration.
Adds the `target_host_uuid` FK on `Topology` plus wiring through the
two create endpoints (`POST /topologies`, `POST /topologies/blank`).
Validates the mode/host pair: `mode='agent'` now requires a known,
routable host; `mode='unihost'` must leave the field unset.
Surfaced on `TopologySummary` so list/detail responses expose it.
Purely additive at the schema level — existing unihost flows unchanged
(field defaults to `NULL`).
Step 1 of the agent <-> topology integration.
Active/degraded/failed/deploying topologies cannot be deleted
without first transitioning to torn_down, but the UI had no way
to trigger that. Add POST /topologies/{id}/teardown mirroring the
deploy endpoint (background task, 202 Accepted), and a
click-to-arm TEARDOWN button on the topology list card that shows
whenever the row is in a teardown-eligible state.
MazeNET publishes gateway ports on the host via Docker. With the
default userland-proxy enabled, attacker connections appear to
originate from the bridge gateway instead of the real remote IP.
Log a soft warning at deploy time when the topology publishes any
ports and docker info reports UserlandProxy=true, pointing the
operator at the daemon.json toggle. Best-effort: daemon talk
failures silently no-op.
A prior half-torn-down topology can leave a bridge network alive under
a different name that still owns our intended subnet. Docker then
rejects our create with 'Pool overlaps with other one on this address
space', and the topology deploy fails.
Extend create_bridge_network to sweep any unused bridge whose IPAM
subnet matches the one we're about to claim (skipping networks with
running containers — those are live use).