DECNET

Author	SHA1	Message	Date
anti	9350ce195a	fix(collector,correlation): extract attacker IP from sshd/pam free-form prose Native sshd and pam_unix lines route through rsyslog without the relay@55555 SD wrapper and without key=value pairs, so attacker_ip fell through to "Unknown". Add a prose-IP fallback to both parsers: anchored patterns (from/rhost/client/src) win first so we never pick the local listener in "Connection from X port Y on Z port 22", with a bare-IPv4 scan as the last resort.	2026-04-27 23:16:42 -04:00
anti	e03a6d10a0	fix(collector): retry on event-stream errors and add periodic reconciler Hit live on first VPS deploy: a window between the initial client.containers.list() snapshot and the client.events() start-event stream let topology service containers slip through, requiring an operator restart for them to be picked up. Two fixes: * `_watch_events` now wraps the events() call in a retry loop with exponential backoff (1s -> 30s cap). A docker.errors.APIError, daemon reload, or SDK stream-decode hiccup used to make the executor task return cleanly, leaving the collector "running" with no event subscription. Future container starts were silently dropped until the unit was restarted. * New `_reconcile_loop` async task ticks every DECNET_COLLECTOR_RECONCILE_S (default 30s), re-scans client.containers.list(), and calls _spawn for any service container not already in `active`. Belt to the event watcher's suspenders: even if a start event is dropped during a reconnect window, the reconciler picks it up within one cycle. Also prunes finished futures from `active` so the dict's bounded by current container count rather than agent lifetime churn.	2026-04-27 22:56:13 -04:00
anti	817ce32e6d	fix(collector): label-based fleet container discovery The events watcher's start-event filter previously called _load_service_container_names(), which reads decnet-state.json on every event. decnet deploy writes that state file out-of-band with docker compose up, so a container's start event could arrive before the state was committed — the watcher then dropped the event silently and never tailed the container's stdout. The visible symptom was an empty Credentials view (and Logs/Bounty) after a fresh deploy until the collector was manually restarted. Fix: stamp decnet.fleet.{service,decky,service_name} labels on every fleet service container at compose-time, and let the collector recognize either the fleet or topology label without touching the state file. The state-file name match remains as a fallback for legacy containers that predate the new labels.	2026-04-25 08:11:21 -04:00
anti	0fbb07c2ec	feat(workers): bus-backed Workers panel (registry, control, installed flag) Ships the backend half of Config → Workers: * Worker registry aggregates `system..health` + `system.bus.health` heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop out of a 90s window (3× the 30s heartbeat interval). `GET /api/v1/workers` returns the snapshot plus `bus_connected` (so the UI can explain "all UNKNOWN" when the bus socket is down) and a per-row `installed` flag populated from `systemctl list-unit-files decnet-.service` (cached 30s). `POST /api/v1/workers/{name}/stop` publishes a stop intent on `system.<name>.control`; workers listen via the shared control listener in `bus/publish.py`. * Heartbeat + control listener wired into collector / profiler / sniffer / prober / mutator worker loops. API self-heartbeats too so the panel always has one ground-truth row. * Topic helper `system_control(name)` + tests covering builder validation, control listener shutdown path, and the API surface (auth gating, bus-connected field, unknown-name 404). Adds `StartFailure` / `StartAllResponse` models in anticipation of the upcoming start endpoints (DEBT-034).	2026-04-22 14:10:39 -04:00
anti	a448dbe283	feat(collector): publish system.log per ingested event (DEBT-031 worker 5) log_collector_worker connects the bus at startup, builds a thread-safe system.log publisher, and hands it to each container-stream thread through _stream_container's new publish_fn parameter. Publishing fires right after the JSON record is written — same rate-limiter path, no extra parsing, compact payload (decky/service/event_type/attacker_ip/ timestamp) so subscribers can redraw without re-reading the DB. Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false the factory returns a no-op publisher and the stream thread calls it unconditionally. Hook failures are logged and never abort the thread.	2026-04-21 16:57:21 -04:00
anti	0cdcfe2653	feat(agent/collector): topology-label discovery and master-authoritative supersede Legacy fleet deckies live in decnet-state.json; MazeNET topology containers don't. Tag them at compose-time with decnet.topology.service=true and let the collector match on that label. Spin up the agent's log collector on the first successful /topology/apply (not in the lifespan — that would break the no-docker-on-boot invariant) and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE, separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running both roles can't forward its own ingest back to itself. When master pushes a topology that differs from whatever is pinned locally, tear down the predecessor and accept the new one. Refusing with 409 left the agent stranded after partial deploys. record_error now persists the hydrated blob so a later teardown can still walk the LAN list — otherwise a half-failed apply strands containers + bridges with no breadcrumb back to them.	2026-04-21 10:23:10 -04:00
anti	8bdc5b98c9	feat(collector): parse real PROCID and extract IPs from logger kv pairs - Relaxed RFC 5424 regex to accept either NILVALUE or a numeric PROCID; sshd / sudo go through rsyslog with their real PID, while syslog_bridge emitters keep using '-'. - Added a fallback pass that scans the MSG body for IP-shaped key=value tokens. This rescues attacker attribution for plain logger callers like the SSH PROMPT_COMMAND shim, which emits 'CMD … src=IP …' without SD-element params.	2026-04-18 05:37:08 -04:00
anti	8dd4c78b33	refactor: strip DECNET tokens from container-visible surface Rename the container-side logging module decnet_logging → syslog_bridge (canonical at templates/syslog_bridge.py, synced into each template by the deployer). Drop the stale per-template copies; setuptools find was picking them up anyway. Swap useradd/USER/chown "decnet" for "logrelay" so no obvious token appears in the rendered container image. Apply the same cloaking pattern to the telnet template that SSH got: syslog pipe moves to /run/systemd/journal/syslog-relay and the relay is cat'd via exec -a "systemd-journal-fwd". rsyslog.d conf rename 99-decnet.conf → 50-journal-forward.conf. SSH capture script: /var/decnet/captured → /var/lib/systemd/coredump (real systemd path), logger tag decnet-capture → systemd-journal. Compose volume updated to match the new in-container quarantine path. SD element ID shifts decnet@55555 → relay@55555; synced across collector, parser, sniffer, prober, formatter, tests, and docs so the host-side pipeline still matches what containers emit.	2026-04-17 22:57:53 -04:00
anti	29578d9d99	fix: resolve all ruff and bandit lint/security issues - Remove unused Optional import (F401) in telemetry.py - Move imports above module-level code (E402) in web/db/models.py - Default API/web hosts to 127.0.0.1 instead of 0.0.0.0 (B104) - Add usedforsecurity=False to MD5 calls in JA3/HASSH fingerprinting (B324) - Annotate intentional try/except/pass blocks with nosec (B110) - Remove stale nosec comments that no longer suppress anything	2026-04-16 01:04:57 -04:00
anti	04db13afae	feat: cross-stage trace propagation and granular per-event spans Collector now creates a span per event and injects W3C trace context into JSON records. Ingester extracts that context and creates child spans, connecting the full event journey: collector -> ingester -> db.add_log + extract_bounty -> db.add_bounty. Profiler now creates per-IP spans inside update_profiles with rich attributes (event_count, is_traversal, bounty_count, command_count). Traces in Jaeger now show the complete execution map from capture through ingestion and profiling.	2026-04-15 23:52:13 -04:00
anti	65ddb0b359	feat: add OpenTelemetry distributed tracing across all DECNET services Gated by DECNET_DEVELOPER_TRACING env var (default off, zero overhead). When enabled, traces flow through FastAPI routes, background workers (collector, ingester, profiler, sniffer, prober), engine/mutator operations, and all DB calls via TracedRepository proxy. Includes Jaeger docker-compose for local dev and 18 unit tests.	2026-04-15 23:23:13 -04:00
anti	a1ca5d699b	fix: use dedicated thread pools for collector and sniffer workers The collector spawned one permanent thread per Docker container via asyncio.to_thread(), saturating the default asyncio executor. This starved short-lived to_thread(load_state) calls in get_deckies() and get_stats_summary(), causing the SSE stream and deckies endpoints to hang indefinitely while other DB-only endpoints worked fine. Give the collector and sniffer their own ThreadPoolExecutor so they never compete with the default pool.	2026-04-15 22:57:03 -04:00
anti	11d749f13d	fix: wire prober tcpfp_fingerprint events into sniffer_rollup for OS/hop detection The active prober emits tcpfp_fingerprint events with TTL, window, MSS etc. from the attacker's SYN-ACK. These were invisible to the behavioral profiler for two reasons: 1. target_ip (prober's field name for attacker IP) was not in _IP_FIELDS in collector/worker.py or correlation/parser.py, so the profiler re-parsed raw_lines and got attacker_ip=None, never attributing prober events to the attacker profile. 2. sniffer_rollup only handled tcp_syn_fingerprint (passive sniffer) and ignored tcpfp_fingerprint (active prober). Prober events use different field names: window_size/window_scale/sack_ok vs window/wscale/has_sack. Changes: - Add target_ip to _IP_FIELDS in collector and parser - Add _PROBER_TCPFP_EVENT and _INITIAL_TTL table to behavioral.py - sniffer_rollup now processes tcpfp_fingerprint: maps field names, derives OS from TTL via _os_from_ttl, computes hop_distance = initial_ttl - observed - Expand prober DEFAULT_TCPFP_PORTS to [22,80,443,8080,8443,445,3389] for better SYN-ACK coverage on attacker machines - Add 4 tests covering prober OS detection, hop distance, and field mapping	2026-04-15 17:36:40 -04:00
anti	a4798946c1	fix: add remote_addr to IP field lookup so http/https/k8s events are attributed correctly Templates for http, https, k8s, and docker_api log the client IP as remote_addr (Flask's request.remote_addr) instead of src_ip. The collector and correlation parser only checked src_ip/src/client_ip/remote_ip/ip, so every request event from those services was stored with attacker_ip="Unknown" and never associated with any attacker profile. Adding remote_addr to _IP_FIELDS in both collector/worker.py and correlation/parser.py fixes attribution. The profiler cursor was also reset to 0 so the worker performs a cold rebuild and re-ingests existing events with the corrected field mapping.	2026-04-15 17:23:33 -04:00
anti	935a9a58d2	fix: reopen collector log handles after deletion or log rotation Replaces the single persistent open() with inode-based reopen logic. If decnet.log or decnet.json is deleted or renamed by logrotate, the next write detects the stale inode, closes the old handle, and creates a fresh file — preventing silent data loss to orphaned inodes.	2026-04-15 14:04:54 -04:00
anti	f6cb90ee66	perf: rate-limit connect/disconnect events in collector to spare ingester Connection-lifecycle events (connect, disconnect, accept, close) fire once per TCP connection. During a portscan or credential-stuffing run this firehoses the SQLite ingester with tiny WAL writes and starves all reads until the queue drains. The collector now deduplicates these events by (attacker_ip, decky, service, event_type) over a 1-second window before writing to the .json ingestion stream. The raw .log file is untouched, so rsyslog/SIEM still see every event for forensic fidelity. Tunable via DECNET_COLLECTOR_RL_WINDOW_SEC and DECNET_COLLECTOR_RL_EVENT_TYPES.	2026-04-15 12:04:04 -04:00
anti	df3f04c10e	revert: undo service badge filter, parser normalization, and SSH relay Reverts commits `8c249f6`, `a6c7cfd`, `7ff5703`. The SSH log relay approach requires container redeployment and doesn't retroactively fix existing attacker profiles. Rolling back to reassess the approach.	2026-04-14 02:14:46 -04:00
anti	7ff5703250	feat: SSH log relay emits proper DECNET syslog for sshd events New log_relay.py replaces raw 'cat' on the rsyslog pipe. Intercepts sshd and bash lines and re-emits them as structured RFC 5424 events: login_success, session_opened, disconnect, connection_closed, command. Parsers updated to accept non-nil PROCID (sshd uses PID).	2026-04-14 02:07:35 -04:00
anti	a6c7cfdf66	fix: normalize SSH bash CMD lines to service=ssh, event_type=command The SSH honeypot logs commands via PROMPT_COMMAND logger as: <14>1 ... bash - - - CMD uid=0 pwd=/root cmd=ls These lines had service=bash and event_type=-, so the attacker worker never recognized them as commands. Both the collector and correlation parsers now detect the CMD pattern and normalize to service=ssh, event_type=command, with uid/pwd/command in fields.	2026-04-14 01:54:36 -04:00
anti	035499f255	feat: add component-aware RFC 5424 application logging system - Modify Rfc5424Formatter to read decnet_component from LogRecord and use it as RFC 5424 APP-NAME field (falls back to 'decnet') - Add get_logger(component) factory in decnet/logging/__init__.py with _ComponentFilter that injects decnet_component on each record - Wire all five layers to their component tag: cli -> 'cli', engine -> 'engine', api -> 'api' (api.py, ingester, routers), mutator -> 'mutator', collector -> 'collector' - Add structured INFO/DEBUG/WARNING/ERROR log calls throughout each layer per the defined vocabulary; DEBUG calls are suppressed unless DECNET_DEVELOPER=true - Add tests/test_logging.py covering factory, filter, formatter component-awareness, fallback behaviour, and level gating	2026-04-13 07:39:01 -04:00
anti	c384a3103a	refactor: separate engine, collector, mutator, and fleet into independent subpackages - decnet/engine/ — container lifecycle (deploy, teardown, status); _kill_api removed - decnet/collector/ — Docker log streaming (moved from web/collector.py) - decnet/mutator/ — mutation engine (no longer imports from cli or duplicates deployer code) - decnet/fleet.py — shared decky-building logic extracted from cli.py Cross-contamination eliminated: - web router no longer imports from decnet.cli - mutator no longer imports from decnet.cli - cli no longer imports from decnet.web - _kill_api() moved to cli (process management, not engine concern) - _compose_with_retry duplicate removed from mutator	2026-04-12 00:26:22 -04:00

21 Commits
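
Sketch for 9350ce195a (prose-IP fallback). The commit describes trying anchored patterns (from/rhost/client/src) before a bare-IPv4 scan, so that the local listener address in "Connection from X port Y on Z port 22" is never picked. The following is a minimal illustration of that ordering, not the project's actual parser: the function name `extract_prose_ip` and both regexes are assumptions made for this example.

```python
import re

# Loose IPv4 shape; it does not validate that each octet is <= 255.
_IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"

# Hypothetical anchored pattern: a keyword followed by '=' or whitespace, then an IP.
_ANCHORED = re.compile(
    rf"\b(?:from|rhost|client|src)[=\s]+({_IPV4})\b", re.IGNORECASE
)

# Last resort: the first IPv4-shaped token anywhere in the message.
_BARE = re.compile(rf"\b({_IPV4})\b")


def extract_prose_ip(msg: str) -> str:
    """Return the attacker IP found in free-form text, or "Unknown"."""
    match = _ANCHORED.search(msg) or _BARE.search(msg)
    return match.group(1) if match else "Unknown"
```

Because the anchored search runs first, a line that mentions both the remote peer and the local listener resolves to the peer; the bare scan only applies when no keyword is present.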
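
Sketch for e03a6d10a0 (event-stream retry). The commit says `_watch_events` now wraps the events() call in a retry loop with exponential backoff from 1s up to a 30s cap. The helper below shows one way such a loop could look; `watch_with_backoff`, `stream_once`, and `should_stop` are hypothetical names, and the real code deals with the Docker SDK, which is left out here so the example stays self-contained.

```python
import time
from typing import Callable


def watch_with_backoff(
    stream_once: Callable[[], None],
    should_stop: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    initial_s: float = 1.0,
    cap_s: float = 30.0,
) -> None:
    """Call stream_once() repeatedly until should_stop() returns True.

    A failure doubles the wait before the next attempt (capped at cap_s);
    a clean return resets the wait to initial_s. Either way the loop
    resubscribes instead of exiting, so the watcher never ends up
    "running" without an event subscription.
    """
    delay = initial_s
    while not should_stop():
        try:
            stream_once()
            failed = False
        except Exception:  # in practice: API errors, stream-decode errors
            failed = True
        if not failed:
            delay = initial_s
        sleep(delay)
        if failed:
            delay = min(delay * 2, cap_s)
```

With the defaults, consecutive failures wait 1, 2, 4, 8, 16, 30, 30, ... seconds. The periodic reconciler from the same commit is a separate task and is not shown.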
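
Sketch for f6cb90ee66 (lifecycle-event rate limiting). The commit deduplicates connect/disconnect/accept/close events by (attacker_ip, decky, service, event_type) over a 1-second window before they reach the .json ingestion stream. A minimal version of that idea, assuming a hypothetical `LifecycleRateLimiter` class and an injectable clock for testing, might be:

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple

Key = Tuple[str, str, str, str]


class LifecycleRateLimiter:
    """Suppress repeated lifecycle events for the same key within a window."""

    def __init__(
        self,
        window_s: float = 1.0,
        event_types: FrozenSet[str] = frozenset(
            {"connect", "disconnect", "accept", "close"}
        ),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window_s = window_s
        self._event_types = event_types
        self._clock = clock
        # Simplification: this dict is never pruned; a long-running
        # worker would need to evict old keys.
        self._last_seen: Dict[Key, float] = {}

    def allow(self, attacker_ip: str, decky: str, service: str, event_type: str) -> bool:
        """Return True if the event should be written to the ingestion stream."""
        if event_type not in self._event_types:
            return True  # only lifecycle events are rate-limited
        key = (attacker_ip, decky, service, event_type)
        now = self._clock()
        last = self._last_seen.get(key)
        if last is not None and now - last < self._window_s:
            return False
        self._last_seen[key] = now
        return True
```

In this sketch only the structured stream would consult `allow()`; the raw .log write stays unconditional, matching the commit's note that rsyslog/SIEM still see every event.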
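
Sketch for 935a9a58d2 (reopen after rotation). The commit replaces a single persistent open() with inode-based reopen logic: each write compares the inode currently at the path with the inode of the open handle and reopens on mismatch. The class below illustrates the technique under POSIX semantics; `ReopeningWriter` is a hypothetical name, and comparing only `st_ino` (not also `st_dev`) is a simplification.

```python
import os
from typing import IO, Optional


class ReopeningWriter:
    """Append-only writer that reopens its file after deletion or rename."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._fh: Optional[IO[str]] = None
        self._inode: Optional[int] = None

    def _ensure_open(self) -> IO[str]:
        try:
            current = os.stat(self._path).st_ino
        except FileNotFoundError:
            current = None  # file was deleted or rotated away
        if self._fh is None or current != self._inode:
            if self._fh is not None:
                self._fh.close()  # drop the handle to the orphaned inode
            self._fh = open(self._path, "a", encoding="utf-8")
            self._inode = os.fstat(self._fh.fileno()).st_ino
        return self._fh

    def write_line(self, line: str) -> None:
        fh = self._ensure_open()
        fh.write(line + "\n")
        fh.flush()

    def close(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None
            self._inode = None
```

The cost is one stat() per write, which buys detection of a rotated file on the very next write without any signal from logrotate.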
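
Sketch for 11d749f13d (OS and hop distance from TTL). The commit derives an OS guess from the observed TTL and computes hop_distance = initial_ttl - observed. The contents of the project's `_INITIAL_TTL` table are not shown in the log, so the values below are the commonly cited defaults (64, 128, 255) with illustrative labels; treat both the table and the name `os_and_hops` as assumptions.

```python
from typing import Tuple

# Assumed table: commonly cited default initial TTLs, not the project's own.
_ASSUMED_INITIAL_TTLS = {
    64: "linux/unix",
    128: "windows",
    255: "network device",
}


def os_and_hops(observed_ttl: int) -> Tuple[str, int]:
    """Guess the sender's OS family and hop distance from an observed TTL.

    The initial TTL is taken to be the smallest known default that is
    >= the observed value, since each router hop decrements TTL by one.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError(f"TTL out of range: {observed_ttl}")
    for initial in sorted(_ASSUMED_INITIAL_TTLS):
        if observed_ttl <= initial:
            return _ASSUMED_INITIAL_TTLS[initial], initial - observed_ttl
    raise AssertionError("unreachable: 255 bounds every valid TTL")
```

The heuristic misclassifies hosts that sit more hops away than the gap between two defaults (for example a Windows host more than 64 hops out), so the result is best read as a hint.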