DECNET

Author	SHA1	Message	Date
anti	2bcef50ac5	feat(webhooks): circuit breaker auto-disables misbehaving subscriptions After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed deliveries, the worker calls trip_webhook_circuit(uuid, ts) which flips enabled=False and stamps auto_disabled_at. The worker sets its reload flag so the next dispatch epoch stops consuming events for the tripped sub entirely — one dead receiver can't poison the shared egress pool anymore. Operator clears the trip via PATCH — setting enabled=True when the sub was previously disabled clears auto_disabled_at, zeros consecutive_failures, and clears last_error. Admin-pause → re-enable hits the same path harmlessly. Three observable states now distinguishable in the UI: - Active enabled=True, auto_disabled_at=NULL - Admin-paused enabled=False, auto_disabled_at=NULL - Tripped enabled=False, auto_disabled_at=<ts> UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and a "N TRIPPED" count in the page header. Hover tooltip tells the operator how to reset ("Re-enable via Edit"). record_webhook_failure now returns the new consecutive_failures count so the worker can compare against the threshold without a second roundtrip. trip_webhook_circuit is idempotent — re-tripping just re-stamps auto_disabled_at. Closes THREAT_MODEL WH-02 and DEBT-037 §1.	2026-04-24 16:24:33 -04:00
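The trip and reset behaviour described in 2bcef50ac5 can be sketched as a small in-memory state machine. This is only an illustration: `Subscription`, `record_failure`, `re_enable` and `state` are hypothetical stand-ins (the real code works on database rows through `record_webhook_failure` and `trip_webhook_circuit`), and treating the threshold as an inclusive `>=` comparison is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Mirrors the documented default of DECNET_WEBHOOK_CIRCUIT_THRESHOLD.
CIRCUIT_THRESHOLD = 5


@dataclass
class Subscription:
    # Hypothetical in-memory stand-in for a WebhookSubscription row.
    enabled: bool = True
    consecutive_failures: int = 0
    auto_disabled_at: Optional[float] = None
    last_error: Optional[str] = None


def record_failure(sub: Subscription, error: str, now: float) -> int:
    """Bump the counter, trip the circuit at the threshold, return the new count."""
    sub.consecutive_failures += 1
    sub.last_error = error
    if sub.consecutive_failures >= CIRCUIT_THRESHOLD:
        sub.enabled = False
        sub.auto_disabled_at = now  # re-tripping just re-stamps the timestamp
    return sub.consecutive_failures


def re_enable(sub: Subscription) -> None:
    """Operator reset; harmless when the sub was only admin-paused."""
    sub.enabled = True
    sub.auto_disabled_at = None
    sub.consecutive_failures = 0
    sub.last_error = None


def state(sub: Subscription) -> str:
    """Map the two columns onto the three observable states."""
    if sub.enabled:
        return "active"
    return "tripped" if sub.auto_disabled_at is not None else "admin-paused"
```

Returning the new count from `record_failure` is what lets a caller compare against the threshold without a second lookup.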
anti	638236113d	feat(webhooks): non-blocking http:// warning + WH-03 accepted risk WebhookResponse now carries a `warnings: list[str]` field. When the subscription's URL starts with http://, an `insecure_url` advisory is surfaced on every GET/CREATE without blocking the request. HMAC still detects tampering regardless of transport (only read-confidentiality is lost over plaintext), and test/dev environments without TLS stay usable. Matches the operator-trust posture already established by DA-06 (admin-on-admin protection is out of scope). The alternative, hard rejection at admin time, was considered and declined; warning-plus-visibility is the right shape. THREAT_MODEL WH-03 accepted risk registered; revisit triggers are multi-admin delegation, a regulated customer, or an operator ticket asking for a DECNET_WEBHOOK_REQUIRE_HTTPS enforcement knob.	2026-04-24 15:53:30 -04:00
anti	b70845a85d	feat(webhooks): subscription CRUD + HMAC-signed delivery client Introduces the webhook egress foundation — a new WebhookSubscription table, admin-gated CRUD under /api/v1/webhooks, and the shared delivery client that both the test-ping route and the upcoming worker will use. No worker yet; this commit is API + model + client only. Simple-mode enum (AttackerDetail / DeckyStatus / SystemStatus) expands to bus-topic patterns at the router layer; storage is always the raw pattern list. Advanced mode lets admins supply raw NATS-style patterns directly. Filter-at-subscribe: the worker (next commit) will subscribe to the union of patterns across enabled subscriptions. Delivery client handles HMAC-SHA256 signing (X-DECNET-Signature), retry on 429/5xx/network errors with jittered backoff, no-retry on 4xx. Secrets never leave the server on GET/LIST — only the create response carries the secret for copy-out. CRUD routes publish WEBHOOK_SUBSCRIPTIONS_CHANGED on the bus after every mutation so the (future) worker can hot-reload. Opens DEBT-037 for the deferred items (circuit breaker, dead-letter, batch delivery, payload templates, secret-at-rest).	2026-04-24 15:30:05 -04:00
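A minimal sketch of the signing and retry rules described in b70845a85d, using only the standard library. The function names are hypothetical. The hex encoding of the `X-DECNET-Signature` value is also an assumption, because the commit message names the header and the algorithm but not the wire format.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Receiver-side check; compare_digest avoids leaking timing information."""
    return hmac.compare_digest(sign_payload(secret, body), header_value)


def should_retry(status_code: int) -> bool:
    """Retry on 429 and 5xx; any other status is not retried."""
    return status_code == 429 or 500 <= status_code < 600
```

A receiver would recompute the digest over the exact bytes it received and reject the delivery when `verify_signature` returns `False`.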
anti	162f7c1194	feat(api/sse): per-user connection cap + viewer-safe invariant New decnet/web/sse_limits.py provides sse_connection_slot, an async context manager that counts live SSE connections per user UUID and raises 429 when a per-user cap is exceeded (default 5, override via DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their first async with, so the cap check fires before any stream data is yielded. The cap must sit inside the generator — StreamingResponse returns before the generator body runs, so a handler-level wrapper would release the slot immediately. Put prefetch + slot + loop all under the one async with. Also documents F6/I (role leakage) as mitigated-by-construction via handler docstrings: every event type on both streams wraps data already reachable via viewer-gated REST, so no per-event filter is needed until a new event family is introduced. The invariant is written into the handler docstrings so a future PR can't silently add admin-only events. Resolves THREAT_MODEL F6/I and F6/D.	2026-04-24 15:01:20 -04:00
anti	df84981954	feat(api): pin response_model on dict-returning mutation routes Every mutation route that returned an untyped dict now declares response_model at the decorator. MessageResponse covers the eight {"message": ...} envelopes (change-password, mutate-decky, mutate-interval, update-deployment-limit, update-global-mutation-interval, delete-user, update-user-role, reset-user-password). Purpose-built models cover the richer shapes (DeployResponse for /deckies/deploy, PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans, UserResponse for /config/users). 204-No-Content and Response/ORJSONResponse routes stay as-is. The wire shape for clients is unchanged: the envelopes already only shipped a message field. What changes is that if a handler accidentally returns a richer dict (e.g. a full user row including password_hash), the response is silently stripped to the declared fields at serialization time. Also flips F4/D "expensive LIKE" to accepted (new DA-09): the /logs and /attackers search routes LIKE-scan unbounded columns, but both are admin-gated, limit-capped, and fall under the operator rate-limit scope per DA-04. FTS5 stays a performance TODO, not a security blocker.	2026-04-24 14:27:58 -04:00
anti	a935bf2663	feat(api): cap offset on list-topologies and transcript endpoints The other five query endpoints (/logs, /attackers, /attacker-commands, /bounties, /topologies/{id}) already declared le=2147483647 on offset; these two were inconsistently uncapped. Bring them in line to close the F4/D deep-pagination row. Also resolves F4/T (ORM sort injection — already mitigated by the regex pattern on /attackers sort_by, no other route accepts a column name) and F4/D (limit cap — already universal) with code pointers.	2026-04-24 14:14:25 -04:00
anti	99ccd41bb5	feat(api/artifacts): explicit Content-Disposition + X-Content-Type-Options Harden the attacker-controlled artifact download path (F7) with explicit response headers instead of relying on Starlette's defaults (which only emit attachment for non-ASCII filenames and never set nosniff). Also resolves the THREAT_MODEL F7 path-traversal row (containment check was already in _resolve_artifact_path) and the fleet-deploy detail=str(e) audit (all four sites are admin-gated deliberate validator UX or structured worker-response fields).	2026-04-24 13:24:34 -04:00
anti	1e7703d64d	refactor(db): name the keystroke-dynamics thresholds + add max_pause_gap Follow-ups on `9232031` per review: - Module-level constants KD_PAUSE_BURST_MAX_S (0.2s), KD_PAUSE_THINK_MAX_S (1.5s), KD_START_OF_ACTION_IDLE_S (2.0s). Docstrings reference them by name; future calibration against real session data only has to touch one place. Threshold for "started a new action" raised from 1s → 2s — 1s catches too much mid-command hesitation to be empirically bimodal. - New column kd_max_pause_gap (seconds). The distracted bucket count alone can't distinguish one 3s pause from three 60s pauses; max-gap carries that signal in one cheap scalar (vs widening the histogram to a fourth bucket). - Scope-framing docstring above the whole kd_* section: intended use is session clustering / tooling attribution, explicitly NOT biometric identity, admission decisions, or ML-driven user ID. Keeps a future well-intentioned contributor from walking the project into legal/ethics territory by accident. - TODO comment on kd_top_bigrams: v1's JSON-in-TEXT is fine for "show the top digraphs on the attacker page". If bigram-similarity queries become hot, promote to a session_bigram_stats(sid, bigram, count, mean_iat_s) table or Postgres JSONB + GIN. Neither changes the write-side ingester materially. No new migration helper — pre-v1 schema additions go through create_all on fresh DBs; the existing _migrate_session_profile_table stays but does not get extended. Alembic lands at v1 and sweeps all the ad-hoc migrations at once.	2026-04-24 10:49:38 -04:00
anti	9232031ec7	feat(db): extend SessionProfile schema with DEBT-036 keystroke features Adds the three signal columns motivated by the manual keystroke analysis in DEBT-036 directly to the SessionProfile table. Pre-v1 so we modify the schema in place — Alembic arrives at v1. Columns: - kd_top_bigrams (TEXT) — JSON of top-N most-common digraphs with mean IAT per bigram. Complements kd_digraph_simhash ("same typist?") with "same typist in same mental state?" (tired / rested / distracted shifts bigram-specific IATs measurably). - kd_start_of_action_latency (REAL/DOUBLE) — median IAT of the first keystroke after an idle gap > 1s. Separates "initiating a command" from "executing a remembered one"; real humans have measurable start-of-action latency, bots don't. - kd_pause_hist_burst / _think / _distracted (INT) — three-bucket histogram (counts, <0.2s / 0.2-1.5s / >1.5s). More discriminating than the existing flat burst_ratio / think_ratio pair: C2 operators concentrate in burst with a thin tail; opportunistic humans have a fat think bucket and a long distracted tail. Both backends get an idempotent ADD COLUMN migration (_migrate_session_profile_table) wired into initialize() alongside the existing _migrate_attackers_table path — guards on PRAGMA table_info (SQLite) / information_schema.COLUMNS (MySQL) so reruns are safe. PII discipline comment on kd_digraph_simhash and kd_top_bigrams: both operate on bigram CHARACTERS, never on raw input stream content. Attacker passwords typed over SSH must not land here. Test updated for the MySQL initialize() migration-order contract.	2026-04-24 10:45:48 -04:00
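The three-bucket pause histogram from 9232031ec7, together with the named thresholds and `kd_max_pause_gap` from 1e7703d64d, can be sketched as below. The constant names and values come from those commit messages; `pause_features` is a hypothetical helper, and the handling of gaps that land exactly on a threshold is an assumption.

```python
from typing import Dict, List

KD_PAUSE_BURST_MAX_S = 0.2   # gaps below this count as "burst"
KD_PAUSE_THINK_MAX_S = 1.5   # gaps above this count as "distracted"


def pause_features(timestamps: List[float]) -> Dict[str, float]:
    """Bucket inter-keystroke gaps (seconds) and report the longest gap."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "burst": sum(1 for g in gaps if g < KD_PAUSE_BURST_MAX_S),
        "think": sum(
            1 for g in gaps if KD_PAUSE_BURST_MAX_S <= g <= KD_PAUSE_THINK_MAX_S
        ),
        "distracted": sum(1 for g in gaps if g > KD_PAUSE_THINK_MAX_S),
        # One scalar that separates a single 3s pause from several 60s pauses.
        "max_pause_gap": max(gaps, default=0.0),
    }
```

Only keystroke timestamps are consumed here, which is in line with the PII note that raw input content must never reach these columns.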
anti	323077b383	fix(web/transcripts): fall back to shard-scan when Log row has no shard_path sessrec.c emits the session_recorded SD blob with sid/service/src_ip/duration_s/bytes/truncated; it never emitted shard_path. The web handler still asked for fields.shard_path, got "", tripped the sessions-YYYY-MM-DD.jsonl basename regex and returned 400 "invalid shard name" for every legitimate transcript request. Handler now: - Fast-paths when fields.shard_path IS present and validates (for any future emitter or ingester that backfills it). - Otherwise enumerates sessions-YYYY-MM-DD.jsonl shards under ARTIFACTS_ROOT/{decky}/{service}/transcripts/ (newest first) and returns the first one whose per-sid index contains our sid. - Security invariant preserved: only files whose basename matches the _SHARD_BASENAME_RE are ever opened, and they always resolve inside ARTIFACTS_ROOT. A forged fields.shard_path is silently ignored. - Soft-fails OSError/PermissionError on the transcripts dir (decky containers often write it with a uid the API can't read) and returns 404 instead of a 500 traceback. test_forged_shard_path_blocked updated to match the new semantics: forgery is ignored, the real shard is served via fallback. The invariant (no /etc/passwd access) is still asserted by the fact that status is 200 with data from the test shard.	2026-04-24 01:18:40 -04:00
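The security invariant in 323077b383 (basename allow-list plus containment under the artifacts root, with a soft failure on unreadable directories) might look roughly like this. The regex is an assumed reading of the sessions-YYYY-MM-DD.jsonl naming, and `safe_shard` and `candidate_shards` are hypothetical helpers rather than the handler's real functions.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumed pattern; the real _SHARD_BASENAME_RE may differ in detail.
SHARD_BASENAME_RE = re.compile(r"sessions-\d{4}-\d{2}-\d{2}\.jsonl")


def safe_shard(root: Path, transcripts_dir: Path, name: str) -> Optional[Path]:
    """Return the resolved shard path, or None if it fails either check."""
    if not SHARD_BASENAME_RE.fullmatch(name):
        return None  # forged or malformed names are ignored, not opened
    candidate = (transcripts_dir / name).resolve()
    if not candidate.is_relative_to(root.resolve()):  # Python 3.9+
        return None
    return candidate


def candidate_shards(root: Path, transcripts_dir: Path) -> List[Path]:
    """List valid shards newest first; unreadable directories yield nothing."""
    try:
        # ISO dates sort lexicographically, so reverse order is newest first.
        names = sorted((p.name for p in transcripts_dir.iterdir()), reverse=True)
    except OSError:  # also covers PermissionError and a missing directory
        return []
    shards = (safe_shard(root, transcripts_dir, n) for n in names)
    return [s for s in shards if s is not None]
```

A caller would walk `candidate_shards` in order, stop at the first shard whose per-sid index contains the requested sid, and answer 404 when the list is empty.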
anti	bfff212a05	fix(api): gate embedded Docker-log collector on DECNET_EMBED_COLLECTOR The API lifespan unconditionally spawned log_collector_worker, appending every container line to DECNET_INGEST_LOG_FILE. On hosts that also run decnet-collector.service (installed by 'decnet init') that's two tailers writing the same events to the same file — the ingester then inserts each event twice and the dashboard shows every command duplicated. Add DECNET_EMBED_COLLECTOR (default false), matching the existing DECNET_EMBED_PROFILER and DECNET_EMBED_SNIFFER pattern directly above this block. Single-process dev setups without systemd can flip it on to restore the all-in-one behaviour; multi-process production gets the single-writer invariant by default.	2026-04-24 00:47:37 -04:00
anti	d61e143b71	fix(stress): unblock Locust runs from login rate-limit self-DoS Locust spawns N virtual users (default 1000), all from 127.0.0.1 as admin. /auth/login is rate-limited 10/5min per-IP AND per-username, so the 11th on_start() got 429 and a RuntimeError. A @task(2) login in the task weights turned the whole run into a 429 factory even after ramp-up. And _login_with_retry treated 429 as non-retryable, so there was no graceful degradation path. Three changes, one root cause: - decnet/web/limiter.py: read DECNET_LIMITER_ENABLED (default true). When false, slowapi's Limiter(enabled=False) makes @limiter.limit a no-op. Default ships unchanged; nobody should ever release with this off. - tests/stress/conftest.py: set DECNET_LIMITER_ENABLED=false in the uvicorn subprocess env. Stress tests measure throughput, not rate limiting. - tests/stress/locustfile.py: drop the @task(2) login — it added zero coverage (every user already logs in at on_start) and only generated contention. Teach _login_with_retry to honour 429 + Retry-After so a Locust pointed at a limiter-enabled server degrades gracefully instead of crashing on_start.	2026-04-24 00:13:15 -04:00
anti	26d04d5eb8	fix(db): SessionProfile.kd_digraph_simhash must be BINARY(8), not BLOB MySQL can't index a BLOB/TEXT column without a prefix length, so create_all() on a fresh MySQL schema blew up with "BLOB/TEXT column 'kd_digraph_simhash' used in key specification without a key length". SimHashes are a fixed 8 bytes — the variable-length type was a SQLAlchemy-side auto-mapping from 'Optional[bytes]', not an actual schema requirement. Switch to BINARY(8), which is portable: MySQL gets a fixed-width indexable BINARY, SQLite treats it as BLOB and doesn't care about key length.	2026-04-23 22:06:38 -04:00
anti	0eb0b32c7a	refactor(swarm): enroll bundle switches from exclude list to include list Exclude lists fail open — anything new at the master's repo root (venvs, logs, dev notes, .env.local, local DB dumps) silently leaks into every agent bundle. On this box a stray .311 venv (335 MB) + logs/ (220 MB) bloated the tarball to ~150 MB and blew test_enroll_bundle timeouts. Replace _EXCLUDES + _is_excluded with _INCLUDED_ROOT_FILES + _INCLUDED_DIRS + _EXCLUDED_DECNET_SUBTREES and iterate via os.walk with in-place dirnames[:] pruning so master-only subtrees (decnet/web, decnet/mutator, decnet/profiler) and __pycache__ aren't descended into at all. Bundle contents are now strictly: pyproject.toml + the decnet/ package minus the three master-only subtrees. Synthetic entries (INI, certs, systemd units) unchanged — they were always added inline, not from the tree walk. test_enroll_bundle.py: 20/20 pass in 24s (was timing out at 15s/test).	2026-04-23 21:47:47 -04:00
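The in-place `dirnames[:]` pruning mentioned in 0eb0b32c7a is a standard `os.walk` idiom. A minimal sketch follows, assuming a hypothetical `walk_included` helper that takes the set of relative subtrees to skip; the real bundle builder also applies its include lists for root files and directories, which are omitted here.

```python
import os
from typing import Iterator, Set


def walk_included(root: str, pruned: Set[str]) -> Iterator[str]:
    """Yield file paths relative to root, never descending into pruned dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Assigning to dirnames[:] mutates the list os.walk iterates over,
        # so removed subtrees are skipped entirely instead of filtered later.
        dirnames[:] = sorted(
            d
            for d in dirnames
            if d != "__pycache__"
            and os.path.normpath(os.path.join(rel_dir, d)) not in pruned
        )
        for name in sorted(filenames):
            yield os.path.normpath(os.path.join(rel_dir, name))
```

Pruning at walk time is what keeps large stray directories from costing anything: they are never listed, let alone added to the tarball.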
anti	ffc275f051	feat(geoip): country-code enrichment via RIR delegated-stats Populates Attacker.country_code + country_source (MVP) using the five RIR delegated-stats files (ARIN/RIPE/APNIC/LACNIC/AFRINIC). Offline, license-free, no outbound traffic that could burn honeypot stealth. - decnet.geoip package with factory/base/lookup + rir/ subpackage (fetch/parse/provider) mirroring the db + bus factory convention - Profiler._build_record calls enrich_ip on every upsert - Idempotent ALTER TABLE migrations for both SQLite and MySQL - decnet geoip refresh/lookup CLI (master-only) - /var/lib/decnet/geoip seeded by decnet init - DECNET_GEOIP_ENABLED=false kill-switch; set in tests/conftest.py so unit tests never trigger the first-access fetch	2026-04-23 21:12:38 -04:00
anti	ef4179ea1f	feat(api): opaque 500 handler + error_id correlation for unhandled exceptions Registers a generic @app.exception_handler(Exception) that catches anything uncaught in route handlers / dependencies. Prod response is opaque: {detail: 'Internal Server Error', error_id: <uuid4 hex>}. Dev mode (DECNET_DEVELOPER=True) adds exception_type and traceback fields so failures are debuggable without tailing server logs. The error_id is logged alongside the full traceback server-side, letting operators correlate a user's 500 report with the exact exception via `grep <error_id> /var/log/decnet.log`. FastAPI's own HTTPException routing and the existing RequestValidationError / ValidationError / RateLimitExceeded handlers still take precedence — this handler only fires on genuinely-uncaught exceptions. Flips threat model F1/I 'traceback / stack trace leakage' from ? to M and logs a follow-up checklist entry for 4 detail=str(e) sites in the fleet deploy router (admin-gated, different threat class, separate audit).	2026-04-23 14:07:32 -04:00
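The response shape described in ef4179ea1f can be sketched without any web framework. `build_500_payload` is a hypothetical helper; in the real handler the `developer_mode` flag would come from DECNET_DEVELOPER, and the full traceback would be logged server-side next to the `error_id`.

```python
import traceback
import uuid
from typing import Any, Dict, Tuple


def build_500_payload(
    exc: BaseException, developer_mode: bool
) -> Tuple[str, Dict[str, Any]]:
    """Return (error_id, response body) for an uncaught exception."""
    error_id = uuid.uuid4().hex
    body: Dict[str, Any] = {"detail": "Internal Server Error", "error_id": error_id}
    if developer_mode:
        # Extra fields are only exposed in developer mode.
        body["exception_type"] = type(exc).__name__
        body["traceback"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return error_id, body
```

Returning the `error_id` separately makes it easy to write the same identifier into the server log line that carries the traceback.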
anti	2f4f81e5de	feat(api): rate-limit /auth/login + scaffold threat model Adds slowapi two-bucket rate limit on /auth/login — 10 attempts per 5 minutes per-IP AND per-username, tripping either → 429. Per-IP catches botnets hitting one account; per-username catches distributed credential stuffing against one account. In-memory storage: dashboard API is single-process, Redis is disproportionate for v1. X-Forwarded-For is deliberately NOT trusted (spoofable); reverse-proxy deployments get one shared bucket per proxy IP. Logged in the threat model as accepted risk DA-08, to be revisited when a verified-proxy config lands. Also scaffolds development/THREAT_MODEL.md with STRIDE-per-element methodology, system-context DFD, and Dashboard↔API as the first fully worked component (7 sub-flows, ~50 threat entries). F1 Authn ships with 3 threats mitigated: rate limit (new), uniform 401 (verified already in place), bcrypt length clamp (verified already in place via Pydantic max_length=72).	2026-04-23 13:25:28 -04:00
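The two-bucket rule from 2f4f81e5de (10 attempts per 5 minutes, per-IP and per-username, either bucket trips) can be illustrated with a small sliding-window limiter. This is not slowapi's algorithm or API; `TwoBucketLimiter` is a hypothetical class, and the choice not to count rejected attempts against the buckets is an assumption.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class TwoBucketLimiter:
    """In-memory sliding-window limiter keyed by client IP and by username."""

    def __init__(
        self,
        limit: int = 10,
        window_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _over(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop attempts that fell out of the window
        return len(hits) >= self.limit

    def allow(self, ip: str, username: str) -> bool:
        """Return False (think HTTP 429) if either bucket is full."""
        now = self.clock()
        keys = (f"ip:{ip}", f"user:{username}")
        if any([self._over(k, now) for k in keys]):
            return False
        for k in keys:
            self._hits[k].append(now)
        return True
```

The injectable `clock` exists only so the window can be exercised in tests without sleeping.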
anti	8cbb7834ef	feat(web): SMTP victim-domain + stored-mail panels on attacker detail Adds GET /attackers/{uuid}/smtp-targets (viewer) and GET /attackers/{uuid}/mail (admin) endpoints, plus two new sections on the attacker detail page: VICTIM DOMAINS rollup (aggregate-only, federation-gossip-safe) and STORED MAIL with a drawer that decodes headers, lists attachments, and downloads the raw .eml via the existing artifact endpoint (?service=smtp).	2026-04-22 22:33:53 -04:00
anti	d43303251d	feat(profiler): track SMTP victim domains per attacker New SmtpTarget table records each (attacker, domain) pair observed via the SMTP honeypots. Only the domain is stored — local-parts are dropped at ingestion, so this table holds no user-identifying data beyond the target organisation's identity. The profiler worker extracts domains from rcpt_to / rcpt_denied / message_accepted events, normalizes them (lowercase, strip local-part, drop blocked TLDs), and upserts one row per pair with a running count + first_seen / last_seen. Three repo methods shipped: * increment_smtp_target(attacker, domain) — upsert + bump * list_smtp_targets(attacker) — per-attacker view * smtp_target_seen(domain) — cross-attacker aggregate, shaped as the federation-gossip RPC that V2 will expose. The gossip-query shape is load-bearing: each operator can answer "have any of your attackers targeted corp1.com?" without leaking which attackers or when — the aggregate returns a bool + total count + first/last seen, nothing else.	2026-04-22 22:23:27 -04:00
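The normalization step in d43303251d (lowercase, strip local-part, drop blocked TLDs) might look like the following. `normalize_rcpt_domain` is a hypothetical helper, and the `BLOCKED_TLDS` contents are purely illustrative, since the commit message does not say which TLDs are dropped.

```python
from typing import Optional

# Illustrative blocklist only; the real list is not given in the commit message.
BLOCKED_TLDS = frozenset({"local", "localhost", "invalid", "test"})


def normalize_rcpt_domain(address: str) -> Optional[str]:
    """Reduce an SMTP recipient to its lowercase domain, or None if unusable."""
    addr = address.strip().strip("<>")
    _local, sep, domain = addr.rpartition("@")  # the local-part is discarded
    if not sep:
        return None
    domain = domain.strip().rstrip(".").lower()
    if not domain or domain.rsplit(".", 1)[-1] in BLOCKED_TLDS:
        return None
    return domain
```

Discarding `_local` before anything is stored is what keeps the table free of user-identifying data.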
anti	c50448995b	feat(smtp): capture full messages + attachments to disk SMTP template now writes each accepted DATA body as a .eml file into a bind-mounted per-decky quarantine dir and emits a `message_stored` log with sha256, size, decoded headers, and an attachment manifest (filename + sha256 + size + content-type). Attachment hashing uses the decoded payload so operators can match against VT / MalwareBazaar directly. Body accumulator is capped at SMTP_MAX_BODY_BYTES (default 10 MB, matching the EHLO SIZE advert) so a streaming client can't OOM the container. The existing /api/v1/artifacts/{decky}/{stored_as} endpoint now takes an optional ?service= query param (defaults to ssh for back-compat) and can serve .eml files out of the smtp subdir. Forensic metadata rides the normal log pipeline, same as SSH file_captured.	2026-04-22 22:17:50 -04:00
anti	d47a84c90b	refactor(models): split models.py into topical submodules decnet/web/db/models.py was approaching 1000 lines across User/Log/ Attacker/Swarm/Topology/Workers/Updater/Health domains. Split into a package with one module per domain; __init__.py re-exports every symbol so all 52 call sites keep importing from decnet.web.db.models unchanged.	2026-04-22 21:55:41 -04:00
anti	119b4e8724	feat(db): add session_profile table for keystroke-dynamics fingerprints New purpose-built table with schema_version column committed from day one so V2 federation gossip can cluster sessions across operators without retrofitting. Ships with the empty write path (upsert_session_profile); ingestion of keystroke features (IKI moments, control-char rates, digraph SimHash) is tracked as V2 work. Closes gap #2 from SIGNAL_CAPTURE_AUDIT.md.	2026-04-22 21:39:17 -04:00
anti	d3321324eb	feat(sniffer): capture SSH client banner from TCP stream Parse RFC 4253 §4.2 identification strings from the first attacker→decky data segment on TCP/22; emit ssh_client_banner syslog events and bus fan-out. Profiler's sniffer_rollup dedupes observed banners into a new AttackerBehavior.ssh_client_banners JSON column. Closes gap #3 from SIGNAL_CAPTURE_AUDIT.md.	2026-04-22 21:37:01 -04:00
anti	8181f39ae2	feat(profiler): persist raw SSH KEX algorithm ordering Prober already emits kex_algorithms in hassh_fingerprint syslog events, but the raw ordered list was only queryable via the generic bounty store. Add a dedicated AttackerBehavior.kex_order_raw column (TEXT, JSON list) so post-v1 KEX-order fingerprinting has a typed, indexable home. Pipeline: - sniffer_rollup() now consumes hassh_fingerprint events and collects distinct kex_algorithms strings across ports. - build_behavior_record() JSON-encodes the list (NULL when empty). - sqlmodel_repo._deserialize_behavior() parses it back into a list. Closes pre-v1 gap #1 from SIGNAL_CAPTURE_AUDIT.md.	2026-04-22 21:29:46 -04:00
anti	5704e8fcce	fix(topology): delete topology_mutations in delete-cascade delete_topology_cascade manually deletes status_events, edges, deckies and lans but overlooked topology_mutations, so deleting any topology that ever had a mutation enqueued (i.e. edits while active|degraded) failed with an FK IntegrityError. Add the missing DELETE and extend the cascade test to seed a mutation row.	2026-04-22 17:50:30 -04:00
anti	3f460bab84	feat(web): show MazeNET decky running count + roll into dashboard MazeNET header now reports '{running}/{total} DECKIES RUNNING' so operators can see per-topology runtime status at a glance. Dashboard ACTIVE DECKIES counters used to reflect only the fleet state file; TopologyDecky rows (MazeNET deployments) are now added in — deployed_deckies = fleet + all topology rows, active_deckies = fleet (no runtime field) + topology rows whose state is 'running'.	2026-04-22 17:48:04 -04:00
anti	6f537f52c2	fix(topology): remove DMZ gateway auto-attach on LAN create POST /topologies/{id}/lans previously called _auto_attach_gateway() whenever a non-DMZ LAN was created, which wired the DMZ gateway decky to every new subnet. That's why a deployed gateway ended up with eth0..ethN on every LAN regardless of what the user drew in MazeNET. Drop the auto-attach helper entirely. The DMZ_ORPHAN deploy-time validator (decnet/topology/validate.py:65-110) stays strict — users must explicitly wire the gateway to each subnet they want bridged, which is the whole point of having a topology editor. useMazeApi.ts: drop stale auto-bridge reference from comment.	2026-04-22 17:14:09 -04:00
anti	13ea916943	feat(workers): add start + start-all endpoints (systemd supervisor) POST /api/v1/workers/{name}/start: 202 on acceptance, 404 unknown worker, 503 if the unit file is not installed, 502 if systemctl returns non-zero (stderr snippet in detail, full stack logged). Admin only. POST /api/v1/workers/start-all: best-effort; walks the worker list in dependency order (bus → api → data-plane), skips already-active and uninstalled units, aggregates outcomes into {started, already_running, failed[]}. Returns 200 even on partial failure; the caller reads the three lists. Both endpoints delegate to the systemd_control helper, so the attack surface for "what gets executed" is locked to `decnet-<validated-name>.service` at two layers (router KNOWN_WORKERS + helper regex).	2026-04-22 14:12:29 -04:00
anti	0fbb07c2ec	feat(workers): bus-backed Workers panel (registry, control, installed flag) Ships the backend half of Config → Workers: * Worker registry aggregates `system..health` + `system.bus.health` heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop out of a 90s window (3× the 30s heartbeat interval). `GET /api/v1/workers` returns the snapshot plus `bus_connected` (so the UI can explain "all UNKNOWN" when the bus socket is down) and a per-row `installed` flag populated from `systemctl list-unit-files decnet-.service` (cached 30s). `POST /api/v1/workers/{name}/stop` publishes a stop intent on `system.<name>.control`; workers listen via the shared control listener in `bus/publish.py`. * Heartbeat + control listener wired into collector / profiler / sniffer / prober / mutator worker loops. API self-heartbeats too so the panel always has one ground-truth row. * Topic helper `system_control(name)` + tests covering builder validation, control listener shutdown path, and the API surface (auth gating, bus-connected field, unknown-name 404). Adds `StartFailure` / `StartAllResponse` models in anticipation of the upcoming start endpoints (DEBT-034).	2026-04-22 14:10:39 -04:00
anti	fcaac648a4	feat(web): add systemd_control helper for worker unit management Thin async wrapper over `systemctl` — never shell=True, always create_subprocess_exec. Unit names are built from `decnet-<validated-name>.service`; the regex check is defence in depth on top of the router-level KNOWN_WORKERS validation. Exposes start / stop / is_active / list_installed; last is cached for 30s to keep the Workers panel cheap under REFRESH spam. On non-systemd hosts list_installed returns an empty set, so the UI renders with every row marked not-installed instead of 500-ing.	2026-04-22 14:08:35 -04:00
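The two-layer validation described in fcaac648a4 (allow-list plus regex, then `create_subprocess_exec` with no shell) can be sketched as follows. The `KNOWN_WORKERS` contents, the regex and the function names are assumptions made for illustration; only the `decnet-<validated-name>.service` naming comes from the commit message.

```python
import asyncio
import re

# Assumed allow-list and pattern; the real router and helper may differ.
KNOWN_WORKERS = frozenset({"collector", "profiler", "sniffer", "prober", "mutator"})
_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,31}")


def unit_name(worker: str) -> str:
    """Build the unit name, rejecting anything outside both validation layers."""
    if worker not in KNOWN_WORKERS or not _NAME_RE.fullmatch(worker):
        raise ValueError(f"unknown worker: {worker!r}")
    return f"decnet-{worker}.service"


async def systemctl(action: str, worker: str) -> int:
    """Run `systemctl <action> <unit>` without a shell and return its exit code."""
    proc = await asyncio.create_subprocess_exec(
        "systemctl",
        action,
        unit_name(worker),  # passed as one argv element, never interpolated
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    await proc.communicate()
    return proc.returncode
```

Because the arguments go to `create_subprocess_exec` as a list, a hostile name could at worst become an odd argument to `systemctl`, and the validation rejects it before that point.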
anti	6725197d58	test(web): transcripts API + attacker-transcripts router coverage Paging, truncation surfacing, admin gate, path traversal, sid-regex and decky-mismatch rejection for /transcripts; mirror coverage for /attackers/{uuid}/transcripts. Flips the Session Recording box in the roadmap (sessrec pty relay now shipping end-to-end).	2026-04-21 23:11:40 -04:00
anti	6e522c5a55	feat(web): transcripts API + repository lookups Adds get_attacker_transcripts (mirror of artifacts for session_recorded logs) and get_session_log for sid→shard resolution. New /api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events out of the shared JSONL day-shard via an mtime-keyed byte-offset index — never scans the whole shard per request. New /api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both endpoints admin-gated.	2026-04-21 23:06:39 -04:00
anti	8f25ff677f	feat(engine,api): add orphan topology resource reaper Topology rows deleted without a proper teardown leave Docker containers and bridge networks behind, holding IPAM pools that cause 403 "Pool overlaps" on the next deploy at the same subnet. - engine/reaper.py walks the local Docker daemon, extracts the 8-char topology prefix from every decnet_t_* resource, and force-removes containers + networks whose prefix is not in the repo. - POST /api/v1/topologies/reap-orphans (admin-only) returns a report of live/orphan prefixes and what was removed. - Resources belonging to live topologies are never touched; per-resource errors are captured without aborting the sweep.	2026-04-21 22:13:44 -04:00
anti	c266d1b6e3	feat(mutator,web): add_decky op — create-and-attach in one mutation apply_attach_decky requires an existing decky, so the MazeNET editor had no way to grow a live topology: creating a new decky on active topologies 409'd on the direct-CRUD createDecky call. - Backend: new apply_add_decky that creates the decky row + its home-LAN edge atomically, auto-allocating an IP if none pinned. Post-apply validation still runs. Added to DISPATCH + _MUTATION_OPS Literal + CLI help text. - Tests: 3 new ops tests (happy path, duplicate-name rejection, missing-LAN rejection) plus dispatch coverage update. - Frontend: useTopologyEditor gains addDeckyToLan() composite. Pending routes through createDecky + attachEdge as before; active routes through a single add_decky enqueue. MazeNET.tsx drag-archetype, duplicate, DMZ-gateway, and ctx-menu add-decky paths all use the composite so active topologies stop 409'ing on new-decky drops.	2026-04-21 20:13:39 -04:00
anti	cbb394a160	feat(ingester): publish system.log per committed batch (DEBT-031 worker 6) Ingester connects the bus at startup, emits a batch-committed summary (component/flushed/position) after each successful _flush_batch. Zero-row flushes are suppressed so the topic stays meaningful. Complements the collector's per-line system.log publishes: collector signals ingress, ingester signals DB-persisted progress. Federation forwarder (worker 8) will subscribe to the batch-committed leaf to trigger its upstream push. Bus stays optional: publish_safely swallows failures, get_bus() can return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully functional.	2026-04-21 16:58:49 -04:00
anti	f611e7363b	feat(mutator,web): live topology mutation pipeline backend (DEBT-030) Wire the mutator and web API into the service bus so live-topology edits flow sub-second from enqueue to UI: - Mutator publishes every state transition on the bus (mutation.applying/applied/failed + topology.status). Fire-and-forget; DB stays source of truth. - Mutator watch loop subscribes to topology.*.mutation.enqueued and wakes early via asyncio.Event; the 10s poll becomes a fallback heartbeat, not the primary dispatch trigger. - POST /topologies/{id}/mutations publishes mutation.enqueued after the DB write succeeds. - New GET /topologies/{id}/events SSE route: snapshot on connect (status + in-flight mutations), live forwards topology.{id}.> bus events, 15s keepalive. ?token= auth mirrors /stream. - New decnet/bus/app.py: process-wide lazy bus singleton for the API, closed cleanly on lifespan shutdown.	2026-04-21 14:38:25 -04:00
anti	071312fc0c	feat(web/api): expose archetype catalog endpoint /api/v1/topologies/archetypes returns the archetype registry (slug, display name, description, preferred services/distros, nmap_os fingerprint) so the frontend wizard can render a live catalog instead of hardcoding a copy.	2026-04-21 10:24:01 -04:00
anti	542637c0dc	feat(web/api): support PATCH on proxy and CORS The web bundle proxy handled GET/POST/PUT/DELETE but not PATCH or preflight OPTIONS, which broke browser calls to PATCH endpoints behind the static-bundle server. CORS middleware had the same gap.	2026-04-21 10:23:55 -04:00
anti	12e18b75db	feat(swarm): expose needs_resync on TopologySummary + upsert record_error Two small observability follow-ups to the phase-1 agent/topology wiring: TopologySummary now carries needs_resync so operators can see the heartbeat's resync flag via the topology list/detail API without dropping into the DB. TopologyStore.record_error becomes an upsert — when a docker/compose failure fires during the first materialise (put() never reached), we still land a marker row so GET /topology/state surfaces the error and the next heartbeat carries an empty applied_version_hash. That empty hash is what master's heartbeat check relies on to flag the topology for resync instead of assuming the apply succeeded.	2026-04-21 01:41:30 -04:00
anti	e8f9c955b3	feat(swarm): heartbeat-driven topology resync for agent-pinned deployments Agent heartbeats now carry an applied-topology snapshot. The master heartbeat handler compares the reported version_hash against what canonical_hash yields for the hydrated topology pinned to that host and flags Topology.needs_resync on divergence (or when the agent reports no topology at all while master expects one). The mutator watch loop gains reconcile_agent_resyncs, which re-pushes the current hydrated blob via AgentClient.apply_topology without touching status, then clears the flag on success. Push failures leave the flag set so the next tick retries.	2026-04-21 01:35:12 -04:00
anti	5a0cf5d7c8	feat(topology): add target_host_uuid to pin topologies to swarm agents Adds the `target_host_uuid` FK on `Topology` plus wiring through the two create endpoints (`POST /topologies`, `POST /topologies/blank`). Validates the mode/host pair: `mode='agent'` now requires a known, routable host; `mode='unihost'` must leave the field unset. Surfaced on `TopologySummary` so list/detail responses expose it. Purely additive at the schema level — existing unihost flows unchanged (field defaults to `NULL`). Step 1 of the agent <-> topology integration.	2026-04-21 01:19:45 -04:00
anti	b261e8e5fa	feat(topology): add teardown endpoint + UI button Active/degraded/failed/deploying topologies cannot be deleted without first transitioning to torn_down, but the UI had no way to trigger that. Add POST /topologies/{id}/teardown mirroring the deploy endpoint (background task, 202 Accepted), and a click-to-arm TEARDOWN button on the topology list card that shows whenever the row is in a teardown-eligible state.	2026-04-20 23:41:37 -04:00
anti	be4e1b1891	feat(mazenet): auto-bridge new LANs to the DMZ gateway When a non-DMZ LAN is created via POST /lans, look up the topology's gateway (decky with forwards_l3=True attached to the DMZ) and insert an edge binding it to the new LAN. The gateway becomes multi-homed to every internal LAN automatically, so DMZ_ORPHAN cannot arise from ordinary editor use. Also fixes delete_lan: the home-decky guard used scalar_one_or_none, which blew up when the gateway already had >1 'other' LAN edge. Switch to scalars().first() — we only need to know some other edge exists, not a unique one.	2026-04-20 23:07:19 -04:00
anti	cc9765e54e	fix(mazenet): drop fictional host-mode on DMZ gateway stub POST /topologies/blank seeded the gateway decky with archetype=host-gateway + network_mode=host, but neither was wired: no compose writer reads network_mode and host-gateway is not a real archetype. Replace with archetype=deaddeck + forwards_l3=true so the gateway is a normal multi-homed bridge decky, consistent with how compose.py interprets forwards_l3 (sysctl + NET_ADMIN). Edge marked is_bridge=true, forwards_l3=true so downstream readers (generator, compose, validator) see a real bridge attachment.	2026-04-20 23:06:54 -04:00
anti	d06b04221f	feat(api/topology): live mutation queue endpoints (POST/GET /mutations)	2026-04-20 19:38:55 -04:00
anti	ff0b2efbb0	feat(api/topology): pending-only child CRUD for LANs, deckies, edges	2026-04-20 19:37:16 -04:00
anti	999113e3c3	feat(api/topology): POST/DELETE/deploy endpoints for MazeNET topologies	2026-04-20 19:34:35 -04:00
anti	38db76dd14	fix(api): document 400 on topology read endpoints for schemathesis contract DECNET's app-level RequestValidationError handler remaps structural 422→400, including query/path constraint violations (limit bounds, the next-subnet base pattern, etc.). Schemathesis fuzzing will drive those code paths and fail response_schema_conformance unless 400 is declared in responses={}. Adds the entry to every phase-3 read route.	2026-04-20 18:30:32 -04:00
anti	f182c98ffa	feat(api): phase 3 step 2 — topology read endpoints (list/get/status/catalog) GET /api/v1/topologies — paginated list with status filter. Extends repo.list_topologies() to accept limit/offset and adds count_topologies() for the total envelope field. GET /api/v1/topologies/{id} — hydrated TopologyDetail; 404 if missing. GET /api/v1/topologies/{id}/status-events — audit trail, limit-capped. Catalog helpers for the phase-4 canvas UI: * GET /topologies/services — full service catalog. * GET /topologies/next-subnet?base=172.20 — wraps SubnetAllocator against reserved_subnets across non-torn-down topologies. * GET /topologies/{id}/lans/{lan_id}/next-ip — IPAllocator pre-seeded with existing decky IPs in that LAN. All read routes are viewer-or-admin. Sub-routers are included in an order that keeps literal catalog paths (/services, /next-subnet) from being shadowed by the /{topology_id} trie branch.	2026-04-20 18:25:33 -04:00
anti	2379b2aeda	feat(api): phase 3 step 1 — topology request/response models + router skeleton Add Pydantic DTOs in decnet/web/db/models.py covering every phase-3 endpoint shape: TopologyGenerateRequest, TopologySummary/Detail, child create/update requests, MutationEnqueueRequest (Literal op guard), MutationRow with JSON-payload decoder, validation/version/not-editable error envelopes, and the three catalog responses. Create decnet/web/router/topology/ as an import-safe package exporting topology_router (prefix /topologies) — sub-routers land step-by-step in subsequent commits. Mount under the main api router alongside swarm_mgmt. tests/api/topology/test_models.py pins repo-dict ↔ DTO parity so future repo-row drift breaks the contract test before the endpoints.	2026-04-20 18:16:30 -04:00
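
The divergence check described in e8f9c955b3 and 12e18b75db can be sketched roughly as follows. This is an illustration, not the DECNET implementation: the body of canonical_hash here (SHA-256 over key-sorted JSON) and the needs_resync helper with its signature are assumptions, and only the decision rule is taken from the commit messages. A missing or empty reported hash counts as divergence whenever master expects a topology on that host.

```python
import hashlib
import json
from typing import Any, Optional


def canonical_hash(topology: dict[str, Any]) -> str:
    """Hypothetical stand-in for the real canonical_hash:
    SHA-256 over compact JSON with sorted keys, so key order does not matter."""
    blob = json.dumps(topology, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def needs_resync(expected: Optional[dict[str, Any]],
                 reported_hash: Optional[str]) -> bool:
    """Decide whether master should flag a pinned topology for resync.

    expected      -- hydrated topology master has pinned to the host, or None
    reported_hash -- version hash carried by the agent heartbeat, or None/""
    """
    if expected is None:
        # Nothing is pinned to this host, so there is nothing to reconcile.
        return False
    if not reported_hash:
        # Agent reports no topology (or an empty hash after a failed apply)
        # while master expects one.
        return True
    # Otherwise flag only when the hashes differ.
    return reported_hash != canonical_hash(expected)
```

Under these assumptions a reconcile loop would re-push the hydrated blob for every topology where needs_resync returns True and clear the flag only after a successful push, leaving it set on failure so the next tick retries.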

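The mode/host pairing rule from 5a0cf5d7c8 can be illustrated with a small stand-alone check. The function name validate_mode_host, the routable_hosts set, and the rejection of unknown modes are all hypothetical. The commit only states that mode='agent' requires a known, routable host and that mode='unihost' must leave target_host_uuid unset.

```python
from typing import Optional


def validate_mode_host(mode: str,
                       target_host_uuid: Optional[str],
                       routable_hosts: set[str]) -> None:
    """Raise ValueError when the mode/host pair is inconsistent.

    routable_hosts is an assumed stand-in for whatever lookup the real
    endpoint performs to decide that a host is known and routable.
    """
    if mode == "agent":
        if target_host_uuid is None:
            raise ValueError("mode='agent' requires target_host_uuid")
        if target_host_uuid not in routable_hosts:
            raise ValueError("target_host_uuid is not a known, routable host")
    elif mode == "unihost":
        if target_host_uuid is not None:
            raise ValueError("mode='unihost' must leave target_host_uuid unset")
    else:
        # Assumption: any other mode is rejected outright.
        raise ValueError(f"unknown mode: {mode!r}")
```

A call such as validate_mode_host("unihost", None, set()) passes silently, which matches the commit's note that existing unihost flows are unchanged when the field defaults to NULL.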
198 Commits