DECNET

Author	SHA1	Message	Date
anti	578cdf9e2e	fix(mutator): reject hostile apply_update_lan changes on live topologies subnet and is_dmz are pinned at deploy time — live deckies bind to the bridge with IPs allocated from the old subnet, and is_dmz flips the docker network's internal flag which can't be changed while containers are attached. Today the op happily wrote the new value into the DB and left docker on the old one, drifting the two surfaces. apply_update_lan now raises MutationError when topology status is active or degraded and the patch touches subnet or is_dmz. Coord (x/y) and rename updates still pass through; renames don't currently have a live caller and the bridge's docker name keys off the lan name in the renderer, so the next deploy will reconcile. This matches the posture taken by _materialise_lan_change for live LAN add/remove (commit `472c84b`).	2026-04-29 00:12:44 -04:00
anti	472c84b9c8	fix(mutator): materialise live LAN add/remove on docker, not just the DB apply_add_lan and apply_remove_lan were DB-only — they wrote/deleted the topology_lans row but never created or destroyed the docker bridge network. Adding a LAN to a deployed topology silently did nothing on the substrate side; any decky later attached to it had nowhere to bind. Both ops now call a shared _materialise_lan_change helper after the DB write. When the topology is active/degraded and not pinned to a swarm agent, the helper: * creates / removes the docker bridge network (internal=True for non-DMZ LANs, mirroring engine/deployer.deploy_topology), * re-renders the per-topology compose file so future redeploys reflect the change. Failures are logged, not re-raised — the DB row stays as source of truth so an operator can retry without leaking inconsistent state. Agent-pinned topologies are skipped; the next agent push reconciles. apply_add_decky / apply_attach_decky have the same gap and are not fixed here — multi-homing a running container needs careful recreate-vs-network-connect handling and is its own commit. Without those, dropping a decky into a freshly-added LAN still won't spawn a container; only the LAN itself is now live.	2026-04-29 00:00:02 -04:00
anti	bbed52a962	fix(bus): topic segments can't contain dots — service.added → service_added Bus topic segments are NATS-style tokens and the validator at bus/topics.py:402 rejects '.', '*', '>', whitespace. My W3 constants 'service.added' / 'service.removed' tripped this on every live add/remove call: ValueError: topic segment 'service.added' may not contain '.', ... Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'. Aligned the SSE forwarder's name mapping (decky.<name>.service_added → SSE event 'decky.service_added') and the frontend's useTopologyStream listener + MazeNET.tsx event handler. Also updated the wiki entry with a note about the underscore.	2026-04-28 23:53:25 -04:00
anti	d595240f55	fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash deploy_topology was flipping to ACTIVE the moment 'compose up -d' returned 0, but compose returns 0 as soon as containers are started. A service that crashes on boot (port bind failure, bad image, missing entrypoint) left the topology row sitting at ACTIVE indefinitely while half the substrate was dead. After compose returns, we now run 'compose ps --all --format json', parse the newline-delimited per-container rows, and downgrade to DEGRADED with a reason listing the first eight unhealthy containers if anything isn't in state='running'. Operators see real state on the topology page instead of an optimistic flag. _compose_ps swallows compose-level errors (returns []) so an unrelated docker hiccup doesn't gate the success path — the existing in-flight exception path still catches genuine deploy failures with FAILED.	2026-04-28 23:39:50 -04:00
anti	0e5484648f	feat: forward decky.*.service.* on per-topology SSE stream The /topologies/{id}/events SSE proxy now subscribes to two bus patterns concurrently and merges them through a bounded asyncio.Queue: * topology.{id}.>: lifecycle (status, mutation.*), unchanged. * decky.>: per-decky events, filtered by payload.topology_id so a fleet decky sharing a name with a topology decky doesn't leak across. _sse_name_for routes 'decky.<name>.service.added' to the SSE event name 'decky.service.added' (kept the prefix so the frontend doesn't collide with topology lifecycle events that share leaf names like 'status'). useTopologyStream surfaces the two new event names; MazeNET.tsx's onStreamEvent optimistically patches the matching node's services list so a second tab reflects shape changes without a refetch.	2026-04-28 23:15:38 -04:00
anti	06f208c86e	feat: surface fleet_singleton flag on /topologies/services Adds a fleet_singletons array to ServiceCatalogResponse so per-decky add UIs can filter out services like LLMNR that run once fleet-wide (and would 422 server-side at the live add endpoint). The existing 'services: list[str]' field is unchanged for back-compat with MazeNET/useMazeApi.ts:257; the new field is additive. decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a module-scoped cache (registry only changes on BYOS install / plugin drop, neither of which happens mid-session) and exposes a precomputed .perDecky list so consumers don't need to re-derive the diff.	2026-04-28 23:08:29 -04:00
anti	6ac8cac908	feat(deckies): live service add/remove without full redeploy decnet.engine.services_live exposes add_service / remove_service for both fleet and topology decky scopes. The host's _compose() wrapper already supported per-service targeting (up --no-deps -d <svc>, stop, rm -f); what was missing was the orchestration around it: * add: validate against decnet.services.registry (rejects unknown + fleet_singleton); persist the new services list; re-render the per-scope compose file (so future redeploys reflect the change); run docker compose up -d --no-deps --build <decky>-<svc>. * remove: stop + rm -f the service container; persist; re-render compose so a future up -d doesn't bring it back. Both publish decky.<name>.service.added / .removed on the bus, with the post-mutation services list. Topic constants added to decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored). Four new admin endpoints: * POST/DELETE /api/v1/deckies/{name}/services{,/svc} * POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc} ServiceMutationError messages are mapped at the API boundary to 404 (decky/topology missing), 409 (idempotency violation), 422 (unknown or fleet_singleton service).	2026-04-28 22:51:42 -04:00
anti	0bc4b05c73	feat(deckies): generic file drops on fleet + MazeNET deckies Extracts the docker-exec-with-base64-stdin pattern out of canary/planter and orchestrator/drivers/ssh into a shared decnet.decky_io package. Both consumers now delegate; the canary planter test still proves the contract end-to-end. Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops. Container resolution is shared with the canary path: topology_id absent means fleet (<name>-ssh), present routes through resolve_decky_container which picks <name>-ssh when the topology decky exposes ssh, else the topology base container decnet_t_<id8>_<name>. Path validation rejects relative paths and '..' traversal at the request model layer. Bad base64 → 400; unknown topology → 404; decky not in topology → 422; docker exec failure → 409.	2026-04-28 22:43:34 -04:00
anti	3fe999d706	feat(canary): allow custom canaries on MazeNET deckies via API POST /api/v1/canary/tokens grows an optional topology_id field. When present, the server hydrates the topology, validates the named decky is in it, and resolves the docker container via planter.resolve_topology_container — <name>-ssh if the decky exposes ssh, else the topology base container. Absent ⇒ fleet semantics, unchanged. The token row gets a nullable topology_id column (no migration helper per pre-v1 policy). GET /api/v1/canary/tokens accepts ?topology_id= as a filter. DELETE re-resolves the container at revoke time so a redeployed topology is still reachable. 422 when the named decky isn't in the topology; 404 when the topology itself doesn't exist.	2026-04-28 22:34:45 -04:00
anti	5802de1f86	feat(canary): seed baseline canaries on MazeNET deckies Topology deploys now plant the configured canary baseline set on every decky in the topology, mirroring the fleet-deploy hook. Containers are resolved via resolve_topology_container — <decky>-ssh when the decky exposes an ssh service, else the topology base container decnet_t_<id8>_<decky>. The planter's plant/revoke/seed_baseline grow an optional container= kwarg; default preserves the fleet <name>-ssh resolution.	2026-04-28 22:30:11 -04:00
anti	e3ddeb0395	feat(bounty): surface file drops and stored mail in the Vault The Bounty Vault page only read from the Bounty table, but inotifywait-captured file drops (event_type=file_captured) and SMTP quarantined messages (event_type=message_stored) were only landing in the Logs table. AttackerDetail's tabs queried logs directly, so they showed up per-attacker but were invisible on the global Vault page. Mirror both events into Bounty as bounty_type=artifact with payload.kind ∈ {file, mail} so the existing dedup (bounty_type, attacker_ip, payload) collapses repeats by sha256. Add an ARTIFACTS segment to the Vault filter row, plus dedicated render branches: file drops show orig_path + size + writer attribution; mail shows subject + From + attachment count + size, with the Mail icon distinguishing them from FileText for file drops. Forward-only — existing logs stay where they are. A backfill pass would be straightforward (read Log WHERE event_type IN ('file_captured', 'message_stored') and feed each row through _extract_bounty) but is out of scope here.	2026-04-28 19:42:54 -04:00
anti	88f276e9e7	feat(collector): drop native unix daemon syslog from ingestion sshd, pam_unix, sudo, CRON, systemd, kernel, rsyslogd, and dbus-daemon all share the SSH/telnet decky containers and write to the same syslog socket as DECNET's own emitters. Their output was being parsed and ingested into the JSON stream, the dashboard, and the profiler — pure noise: sshd's "Failed password for root from X" duplicates the auth-helper's structured auth_attempt event, pam_unix repeats it again, CRON/systemd say nothing about attacker behavior. Drop these APP-NAMEs in _should_ingest before the JSON write and bus publish. Raw .log file still captures everything for forensics. The denylist is overridable with DECNET_COLLECTOR_DROP_APPS so operators can extend it without code changes.	2026-04-28 19:21:39 -04:00
anti	6055f9c837	fix(deckies): set MSGID=command on bash PROMPT_COMMAND syslog lines Add --rfc5424 --msgid command to the logger invocation in SSH and telnet decky bashrc. MSGID arrives as "command" instead of NIL, which is what the profiler's _COMMAND_EVENT_TYPES filter expects. The parser heuristic shipped in `d4591b3` stays as a safety net for any future emitter that forgets the flags or for inflight pre-rebuild containers.	2026-04-28 19:12:11 -04:00
anti	d4591b38dc	fix(profiler): aggregate bash PROMPT_COMMAND lines into attacker profile SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"` which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter silently dropped them — the IP profile existed but no command transcripts or artifacts. Confirmed in the wild: 44/48 events from one attacker were event_type="-". Rewrite event_type to "command" in both parsers when MSGID=NIL and the msg starts with "CMD ". Correlation parser also extracts the cmd= payload into fields["command"] so the profiler can build the transcript; collector parser leaves fields={} to avoid duplicate pills in the dashboard.	2026-04-28 19:09:41 -04:00
anti	862e4dbb31	merge: testing → main (reconcile 2-week divergence)	2026-04-28 18:36:00 -04:00
anti	15b2e7ba5c	refactor(db): split credentials.py into a credentials/ subpackage Splits the 459-line credentials.py into two submixins plus a composing CredentialsMixin in credentials/__init__.py: _core.py (~190) Credential capture: upsert, list, filters, per-attacker / per-secret reads, attacker_uuid backfill reuse.py (~270) CredentialReuse correlation: upsert, candidate mining, list/get + the _enrich_with_secret helper that lifts the printable/b64 from underlying rows _merge_unique stays with reuse.py (its only caller). _enrich_with_secret stays with reuse.py — it's an internal helper of list_credential_reuses / get_credential_reuse_by_id, never called from the capture path.	2026-04-28 16:05:57 -04:00
anti	3d00de8fd3	refactor(db): split attackers.py into an attackers/ subpackage Splits the 494-line attackers.py into five submixin files plus a composing AttackersMixin in attackers/__init__.py: _core.py (~95) Attacker CRUD + _deserialize_attacker behavior.py (~110) AttackerBehavior + _deserialize_behavior sessions.py (~50) SessionProfile read/write smtp.py (~70) SmtpTarget per-attacker + cross-attacker views activity.py (~190) log-derived activity (commands, leaks, artifacts, stored mail, session log, transcripts) IdentitiesMixin.list_observations_for_identity calls self._deserialize_attacker; MRO resolves it onto AttackersCoreMixin through the composed SQLModelRepository class.	2026-04-28 15:46:28 -04:00
anti	5e7d68fde3	refactor(db): split topology.py into a topology/ subpackage Splits the 694-line topology.py into five submixin files plus a composing TopologyMixin in topology/__init__.py: _core.py (~225) topologies CRUD + _assert_pending / _check_and_bump_version concurrency guards lans.py (~115) LAN CRUD deckies.py (~130) topology decky CRUD + list_running_topology_deckies edges.py (~80) edge CRUD + status-event log mutations.py (~165) live reconciler queue (atomic claim + state writes) Sibling submixins call self._assert_pending and self._check_and_bump_version; MRO resolves them onto TopologyCoreMixin through the composed SQLModelRepository class.	2026-04-28 15:16:42 -04:00
anti	20e89eb0a6	refactor(db): extract TopologyMixin Moves the 31 MazeNET topology methods (topologies CRUD, LANs, deckies, edges, status events, mutation queue) into sqlmodel_repo/topology.py. Includes _assert_pending and _check_and_bump_version concurrency guards. This is the last domain extraction; sqlmodel_repo/__init__.py is now ~165 lines: lifecycle (initialize/reinitialize/migrations), the admin self-heal seed, get_state/set_state, and the mixin composition.	2026-04-28 15:11:14 -04:00
anti	7483d01311	refactor(db): extract IdentitiesMixin and CampaignsMixin Splits the AttackerIdentity and Campaign clustering reads/writes into sqlmodel_repo/identities.py and sqlmodel_repo/campaigns.py. Both call _deserialize_attacker (identities only) which resolves through AttackersMixin via MRO.	2026-04-28 15:07:39 -04:00
anti	912171d053	refactor(db): extract AttackersMixin Moves the 19 attacker-domain methods (core CRUD, behavior, sessions, smtp targets, log-derived activity views) plus the _deserialize_attacker and _deserialize_behavior helpers into sqlmodel_repo/attackers.py.	2026-04-28 15:04:51 -04:00
anti	7ba8bafcaa	refactor(db): extract CredentialsMixin Moves the 12 credential and credential-reuse methods (incl. the _merge_unique and _enrich_with_secret helpers) into sqlmodel_repo/credentials.py.	2026-04-28 15:00:04 -04:00
anti	5b1af331b9	refactor(db): extract CanaryMixin Moves the 13 canary blob/token/trigger methods into sqlmodel_repo/canary.py.	2026-04-28 14:55:52 -04:00
anti	03b3c8855c	refactor(db): extract OrchestratorMixin Moves the 9 orchestrator event/email log + prune methods into sqlmodel_repo/orchestrator.py.	2026-04-28 14:54:20 -04:00
anti	555cd13f09	refactor(db): extract RealismMixin Moves the 8 synthetic-file + realism-config methods into sqlmodel_repo/realism.py.	2026-04-28 14:52:59 -04:00
anti	9b845269c9	refactor(db): extract LogsMixin Moves the 8 log methods (incl. get_stats_summary aggregator) into sqlmodel_repo/logs.py. get_log_histogram remains an abstract dialect override point; sqlite/mysql subclasses still override it via MRO.	2026-04-28 14:51:35 -04:00
anti	a0aeba5abc	refactor(db): extract FleetMixin and promote JSON helpers Moves the 6 fleet-decky methods (incl. cross-source list_running_deckies aggregator) into sqlmodel_repo/fleet.py. _serialize_json_fields and _deserialize_json_fields move to _helpers.py since they're shared across fleet, topology, and canary.	2026-04-28 14:50:01 -04:00
anti	d989cd0461	refactor(db): extract WebhooksMixin Moves the 9 webhook-subscription methods (CRUD + delivery bookkeeping) into sqlmodel_repo/webhooks.py.	2026-04-28 14:47:42 -04:00
anti	167f140b0e	refactor(db): extract BountiesMixin Moves the 5 bounty methods plus the cross-table purge_logs_and_bounties helper into sqlmodel_repo/bounties.py.	2026-04-28 14:46:39 -04:00
anti	c6804d79b6	refactor(db): extract DeckiesMixin Moves the 4 decky-shard CRUD methods into sqlmodel_repo/deckies.py.	2026-04-28 14:45:15 -04:00
anti	eebf9e4c97	refactor(db): extract AuthMixin Moves the 7 user CRUD methods into sqlmodel_repo/auth.py. _ensure_admin_user stays in __init__.py so DECNET_ADMIN_PASSWORD remains addressable at the module path tests already monkeypatch.	2026-04-28 14:43:49 -04:00
anti	99adbebe75	refactor(db): extract SwarmMixin Moves the 7 swarm-host CRUD methods into sqlmodel_repo/swarm.py.	2026-04-28 14:42:58 -04:00
anti	85c914e754	refactor(db): extract AttackerIntelMixin Moves upsert_attacker_intel, get_attacker_intel_by_uuid, and get_unenriched_attackers into sqlmodel_repo/attacker_intel.py. Composed onto SQLModelRepository via mixin inheritance.	2026-04-28 14:40:36 -04:00
anti	e16f47ad24	refactor(db): extract _safe_session/_detach_close to _helpers.py Module-level session helpers move into sqlmodel_repo/_helpers.py. __init__.py re-exports them so external import paths (decnet.web.db.sqlmodel_repo._safe_session) keep resolving.	2026-04-28 14:38:26 -04:00
anti	4167345d51	refactor(db): convert sqlmodel_repo.py to a package Pure rename — the old monolithic 3505-line file becomes decnet/web/db/sqlmodel_repo/__init__.py. No code changes. Subsequent commits will extract per-domain mixins out of __init__.py to mirror the topical layout used by decnet/web/db/models/.	2026-04-28 14:37:18 -04:00
anti	6d8c90777d	chore: remove vulture-flagged dead code, add whitelist - plain.py: drop `or True` short-circuit + unreachable return; drop now-unused _HASH_HINTS - ingester.py: drop unused `current_position` param from _flush_batch - vulture_whitelist.py: document remaining false positives (FastAPI Depends side-effects, IMAP uid_mode where UID==seq)	2026-04-28 14:30:12 -04:00
anti	72cc928ebf	feat(prober-cert): roll up fingerprints onto AttackerIdentity Brings the federation-gossip columns on AttackerIdentity to life — ja3_hashes, hassh_hashes, and the new tls_cert_sha256 — by projecting the union of every member observation's fingerprints JSON onto the identity at clusterer create / link / merge time. - decnet/profiler/identity_rollup.py: pure extract_fp_summaries() reads the production bounty shape (payload.fingerprint_type + payload.{ja3,hash,cert_sha256}) and returns deduped+sorted JSON list[str] per family, or None when a family has no signal so the column stays NULL instead of '[]'. - BaseRepository.update_identity_fingerprints + SQLModel impl: one idempotent write that overwrites the three summary columns and bumps updated_at. - ConnectedComponentsClusterer: after every per-component reconciliation (fresh-create OR existing-merge+link), recomputes and writes the rollup for the target identity. Wrapped in a best-effort helper so a write failure logs but never breaks the tick. - Tests: extract_fp_summaries unit (dedup, sort determinism, unknown types ignored, malformed JSON, nested-stringified payloads, non-string values); end-to-end clusterer ticks populate the columns on create + on later observation links; no-fingerprint clusters keep the columns NULL.	2026-04-28 11:28:54 -04:00
anti	5f8149daee	feat(prober-cert): capture leaf TLS cert after successful JARM JARM probes are crafted ClientHellos with weird ciphers — they never complete a real handshake, so the peer cert isn't reachable from those sockets. After a non-empty JARM hash proves the port speaks TLS, do a separate ssl.wrap_socket() against the same (ip, port) to fetch and parse the leaf cert. - decnet/prober/tlscert.py: fetch + parse via cryptography lib; swallows all connect/handshake/parse failures (returns None). - decnet/prober/worker.py::_capture_tls_cert: emits a tls_certificate event with subject_cn / issuer / SANs / validity / SHA-256 + publishes on the bus. Wired from _jarm_phase only when JARM succeeds, so non-TLS ports never trigger a second connect. - Tests cover happy path, cert-fetch failure, defense-in-depth crash, empty-JARM skip, publish_fn, and parser edge cases (garbage DER, empty bytes, missing SAN extension, non-self-signed).	2026-04-28 11:14:44 -04:00
anti	4749c972e5	feat(prober-cert): schema for active TLS cert capture Adds storage for TLS certificate details collected from attacker-run servers by the active prober (sibling to the existing JARM probe). - AttackerIdentity.tls_cert_sha256 / Campaign.tls_cert_sha256: JSON list[str] columns mirroring ja3_hashes / hassh_hashes for federation gossip. - ingester clause 9b: emits a 'tls_certificate' fingerprint bounty when a prober event carries subject_cn (disjoint from the existing sniffer-gated clause). - Prober-side capture (ssl.wrap_socket follow-up after JARM) and profiler rollup land in sibling commits.	2026-04-28 11:09:25 -04:00
anti	ccc8619387	fix(test-schemathesis): disable rate limiter in fuzz subprocess Schemathesis fires up to 3000 examples per endpoint. POST /auth/login caps at 10/5min per IP, so the second example onward returns 429 and the positive_data_acceptance check flags it as RejectedPositiveData (its allowed-status list is hardcoded in schemathesis to 2xx/401/403/404/409/5xx, so OpenAPI tweaks can't fix it). DECNET_LIMITER_ENABLED=false exists for exactly this case (see limiter.py docstring on stress/load testing). Reverts the custom_openapi shim from `5d88346` / `9b1168c` — the endpoint already declares 429 in its responses= map (api_login.py:38), and the shim turned out to address a problem that wasn't there. Drop the companion test along with it.	2026-04-28 09:51:49 -04:00
anti	9b1168ce0b	fix(api): scope 429 OpenAPI injection to rate-limited routes Previous commit advertised 429 on every operation. Only routes decorated with @limiter.limit can actually return slowapi's 429 — currently just POST /api/v1/auth/login. Documenting it elsewhere is dishonest and would mislead clients into expecting a response the server cannot produce. Walk slowapi's _route_limits / _dynamic_route_limits registries to identify decorated endpoints, match them to FastAPI routes by {module}.{name}, and only inject 429 on those. Existing per-route 429 declarations (e.g. SSE connection-cap on events streams via sse_limits) are untouched.	2026-04-28 01:00:34 -04:00
anti	5d883466a2	fix(api): advertise 429 on every operation in OpenAPI SlowAPI middleware can short-circuit any request with 429 once a per-route or per-IP rate limit fires (e.g. POST /api/v1/auth/login is capped at 10/5min). The OpenAPI spec did not declare 429 on any operation, so schemathesis flagged legitimate rate-limit responses as RejectedPositiveData / status-code-nonconformance failures. Override app.openapi to inject a generic 429 response object on every HTTP operation in the generated schema. Add a contract test that fails if any operation drops the 429 advertisement.	2026-04-28 00:58:37 -04:00
anti	8344b539c8	fix(ssh-template): drop sshd/pam_unix native chatter at rsyslog OpenSSH's native syslog ("Failed password", "Connection from", "Connection closed by …") and the pam_unix lines emitted from sshd's PAM stack add no signal beyond what auth-helper already captures as structured login_attempt events. They cluttered the dashboard and arrived without an SD wrapper, forcing prose-IP heuristics in the collector. Add a `:programname, isequal, "sshd" stop` rule above the forwarding actions in /etc/rsyslog.d/50-journal-forward.conf. pam_unix lines from sshd inherit programname=sshd so the same rule covers both. sudo / login / su pam_unix lines keep flowing (different programname), so post-login privilege escalation telemetry is preserved.	2026-04-27 23:26:53 -04:00
anti	9350ce195a	fix(collector,correlation): extract attacker IP from sshd/pam free-form prose Native sshd and pam_unix lines route through rsyslog without the relay@55555 SD wrapper and without key=value pairs, so attacker_ip fell through to "Unknown". Add a prose-IP fallback to both parsers: anchored patterns (from/rhost/client/src) win first so we never pick the local listener in "Connection from X port Y on Z port 22", with a bare-IPv4 scan as the last resort.	2026-04-27 23:16:42 -04:00
anti	3c571cce5a	fix(correlation): prober events no longer count as attacker traversal The prober writes events with hostname=decnet-prober and target_ip= <the attacker being fingerprinted>. The parser pulls target_ip into attacker_ip (it's one of _IP_FIELDS), which is correct for indexing fingerprints under the attacker — but it had a side effect: every fingerprinted attacker had two distinct deckies on file (the real decoy they touched + decnet-prober) and the correlation engine's traversals() classified that as lateral movement. Live dashboard showed bogus "dmz-gateway -> decnet-prober" paths and TRAVERSAL badges on attackers who'd done nothing but knock on the front door. The prober is internal infrastructure, not a hop. Filter the "decnet-" namespace out of distinct-decky counts and hop paths in the engine. Fingerprints stay attached to the attacker profile via the existing per-IP event index — just no longer as traversal.	2026-04-27 23:02:23 -04:00
anti	e03a6d10a0	fix(collector): retry on event-stream errors and add periodic reconciler Hit live on first VPS deploy: a window between the initial client.containers.list() snapshot and the client.events() start-event stream let topology service containers slip through, requiring an operator restart for them to be picked up. Two fixes: * `_watch_events` now wraps the events() call in a retry loop with exponential backoff (1s -> 30s cap). A docker.errors.APIError, daemon reload, or SDK stream-decode hiccup used to make the executor task return cleanly, leaving the collector "running" with no event subscription. Future container starts were silently dropped until the unit was restarted. * New `_reconcile_loop` async task ticks every DECNET_COLLECTOR_RECONCILE_S (default 30s), re-scans client.containers.list(), and calls _spawn for any service container not already in `active`. Belt to the event watcher's suspenders: even if a start event is dropped during a reconnect window, the reconciler picks it up within one cycle. Also prunes finished futures from `active` so the dict's bounded by current container count rather than agent lifetime churn.	2026-04-27 22:56:13 -04:00
anti	c5db1d7ba2	fix(config-ini): strip inline # and ; comments from values The module docstring teaches inline comments — `mode = master # or "agent"` is the canonical example for the [decnet] section. Python's configparser ignores those by default unless inline_comment_prefixes is set explicitly, so the comment became part of the value and downstream validators rejected it ("mode must be 'agent' or 'master', got 'master # or \"agent\"'"). Hit live on first VPS deploy: every CLI invocation crashed at import time with a stack trace that didn't make it obvious the docstring's example was the trigger. Now the parser does what the docs promise.	2026-04-27 22:55:58 -04:00
anti	0b1a17b4eb	fix(agent): pass --always-recreate-deps so service netns shares stay fresh Decky service containers join their base via `network_mode: container:<base>` and Docker binds that share at service start time. If `docker compose up` recreates a base (e.g. ports: changes after a forwards_l3 toggle) but decides services are unchanged, services keep a stale FD into the destroyed namespace and end up with only `lo` — so external traffic hits a closed port on the live base and gets RST. Hit live on the first VPS deploy: external SSH to the dmz-gateway was refused while sshd was listening, because base and service netns inodes had drifted apart. `--always-recreate-deps` makes compose rebuild every dependent whenever its base is recreated, removing the race entirely.	2026-04-27 22:55:48 -04:00
anti	0a525ebd37	fix(web): proxy follows DECNET_API_HOST instead of hardcoding 127.0.0.1 The dashboard's /api/* proxy hardcoded 127.0.0.1 as the target host. That works when the API binds to a wildcard or to loopback, but breaks the moment an operator binds the API to a specific address — e.g. a Tailscale IP for tailnet-only deploys: the API stops listening on loopback entirely and the proxy gets ECONNREFUSED on every request. The web command now reads DECNET_API_HOST and falls back to loopback only when the API is on a wildcard (0.0.0.0 / :: / unset). A new --api-host flag overrides at the CLI level.	2026-04-27 22:55:25 -04:00
anti	673bc5b819	ops(init): ship logrotate config so /var/log/decnet can't fill the disk Without rotation, the syslog listener and per-host collector grow /var/log/decnet/ without bound — a noisy attacker (or an active probe storm) fills the disk in hours on a small VPS. New deploy/logrotate.d/decnet caps at 7 daily rotations or 100 MiB, whichever comes first, and uses copytruncate because the ingester and forwarder hold the files open via Python and won't reopen on a rename rotation. Wire install / remove into `decnet init` and `decnet init --deinit` alongside the existing tmpfiles.d / polkit handling.	2026-04-27 21:26:13 -04:00
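
Note on commit c5db1d7ba2 (config-ini): the inline-comment behaviour it describes can be reproduced with the standard library alone. The sketch below is illustrative and is not the project's parser. The function name `load_mode` and the sample text are assumptions; only the `[decnet]` section and the `mode = master # or "agent"` example come from the commit message.

```python
import configparser

# Sample INI text modelled on the docstring example quoted in the commit.
SAMPLE_INI = '''
[decnet]
mode = master # or "agent"
'''


def load_mode(text: str, strip_inline: bool = True) -> str:
    """Return the [decnet] mode value (hypothetical helper).

    With strip_inline=False the parser uses configparser's default
    (inline_comment_prefixes=None), so the trailing comment stays in the value.
    """
    prefixes = ("#", ";") if strip_inline else None
    parser = configparser.ConfigParser(inline_comment_prefixes=prefixes)
    parser.read_string(text)
    return parser.get("decnet", "mode")


if __name__ == "__main__":
    print(repr(load_mode(SAMPLE_INI, strip_inline=False)))  # comment kept in value
    print(repr(load_mode(SAMPLE_INI)))                      # comment stripped
```

Passing `inline_comment_prefixes` explicitly is what makes the difference: without it, a validator that expects exactly `agent` or `master` receives the comment text as part of the value.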


796 Commits
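
Note on commit bbed52a962 (bus): the segment rule that rejected `service.added` can be sketched as follows. This is a hedged illustration, not the validator in bus/topics.py. The names `validate_segment` and `decky_topic`, the error wording, and the empty-segment check are assumptions; the rejected characters ('.', '*', '>', whitespace) and the `DECKY_SERVICE_ADDED = 'service_added'` constant are taken from the commit message.

```python
# Characters the commit message says a topic segment may not contain
# (whitespace is checked separately below).
FORBIDDEN = (".", "*", ">")

# Constant value as given in the commit message.
DECKY_SERVICE_ADDED = "service_added"


def validate_segment(segment: str) -> str:
    """Hypothetical validator: return the segment or raise ValueError."""
    if not segment:
        # Assumption: empty segments are rejected too.
        raise ValueError("topic segment may not be empty")
    bad = sorted({ch for ch in segment if ch in FORBIDDEN or ch.isspace()})
    if bad:
        raise ValueError(
            f"topic segment {segment!r} contains forbidden characters {bad!r}"
        )
    return segment


def decky_topic(name: str, leaf: str) -> str:
    """Join validated segments into a dotted topic: decky.<name>.<leaf>."""
    return ".".join(["decky", validate_segment(name), validate_segment(leaf)])


if __name__ == "__main__":
    print(decky_topic("web01", DECKY_SERVICE_ADDED))
    try:
        decky_topic("web01", "service.added")
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Because the dot is the segment separator, a leaf such as `service.added` would silently become two segments if it were accepted, which is why the underscore form is the safer constant.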