DECNET

Author	SHA1	Message	Date
anti	efc98285aa	fix(webhook/worker): self-heal when bus starts late or restarts Before: if the bus was unreachable at worker start, we logged "running in idle mode" once and parked on shutdown forever. systemd doesn't guarantee bus is fully up before the webhook worker starts, so a race on boot left the worker permanently dead until restart. Now: wrap the whole bus-use in an outer reconnect loop. while not shutdown: try: connect() except: sleep(RECONNECT_SECS) ; continue try: run_with_bus(...) # heartbeat + dispatch except: log+close ; reconnect on next iter Clean consequence: if the bus dies mid-operation the dispatch loop's subscriptions raise inside the consumer tasks, `_run_with_bus` exits, the outer loop closes the stale connection and reconnects. No partial state leaks across epochs — fresh bus, fresh subs, fresh heartbeat. Interval is 60s by default, overridable via DECNET_WEBHOOK_BUS_RECONNECT_SECS. Shutdown wakes the wait so systemctl stop doesn't hang for a minute. Test added: flaky get_bus that fails once, then returns a live FakeBus — asserts retry + successful delivery. get_app_bus() in decnet/bus/app.py already has a 2s backoff retry so the FastAPI hot path self-heals; this commit brings the standalone webhook worker in line with the same posture.	2026-04-24 16:39:38 -04:00
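The outer reconnect loop described in efc98285aa can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: `run_worker`, `FakeBus`, `make_flaky_connect`, and the `connect` / `run_with_bus` callables are hypothetical names, and the 60-second default only mirrors the value the commit message states for DECNET_WEBHOOK_BUS_RECONNECT_SECS.

```python
import asyncio

RECONNECT_SECS = 60.0  # assumed default, mirroring the commit message


async def run_worker(connect, run_with_bus, shutdown: asyncio.Event,
                     reconnect_secs: float = RECONNECT_SECS) -> int:
    """Outer reconnect loop; returns how many bus epochs were started."""
    epochs = 0
    while not shutdown.is_set():
        try:
            bus = await connect()
        except Exception:
            # Bus unreachable: wait before retrying, but wake early on
            # shutdown so a stop request does not hang for a full interval.
            try:
                await asyncio.wait_for(shutdown.wait(), timeout=reconnect_secs)
            except asyncio.TimeoutError:
                pass
            continue
        epochs += 1
        try:
            await run_with_bus(bus, shutdown)  # heartbeat + dispatch
        except Exception:
            pass  # a real worker would log the failure here
        finally:
            await bus.close()  # never carry a stale connection into the next epoch
    return epochs


class FakeBus:
    """Hypothetical stand-in for a live bus connection."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


def make_flaky_connect():
    """Build a connect() that fails once, then returns a FakeBus."""
    calls = {"n": 0}

    async def connect():
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("bus not up yet")
        return FakeBus()

    return connect, calls


async def demo():
    shutdown = asyncio.Event()
    connect, calls = make_flaky_connect()

    async def run_with_bus(bus, shutdown):
        shutdown.set()  # pretend one delivery succeeded, then stop

    epochs = await run_worker(connect, run_with_bus, shutdown, reconnect_secs=0.01)
    return epochs, calls["n"]
```

Running `asyncio.run(demo())` exercises the same shape as the test the commit describes: the first connect fails, the second succeeds, and exactly one epoch runs.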
anti	f0ee6ff97e	feat(workers): enroll webhook worker in the Workers panel registry Add "webhook" to KNOWN_WORKERS + the start-all preferred order so the Config → Workers panel picks up the row automatically: heartbeat subscription, start/stop controls via the existing systemd helper (decnet-webhook.service.j2 already lands via decnet init's unit glob), and the status-dot lifecycle all come for free. Placed between mutator and the swarm-only agent/forwarder/updater trio — matches the intended startup sequence (bus → api → data-plane workers → egress → swarm management). No frontend change needed; Config.tsx reads the worker list dynamically from GET /api/v1/workers.	2026-04-24 16:34:14 -04:00
anti	ba155b70e1	fix(cli/db-reset): drive table list from SQLModel.metadata, not hardcoded The hardcoded _DB_RESET_TABLES tuple had drifted — session_profile, smtp_targets, and webhook_subscriptions were all missing, so `decnet db-reset --i-know-what-im-doing drop-tables` silently left them behind. Running it on a post-webhook install then letting SQLModel.metadata.create_all() re-create tables produced a partial schema: old rows survived, new columns didn't land, and endpoints 500'd on the missing columns (e.g. auto_disabled_at after the circuit breaker merge). Replace the hardcoded list with `SQLModel.metadata.sorted_tables`, reversed for DROP safety (children first). Any future model addition is auto-enrolled — no manual step, no more drift. No behavior change on reset semantics; the SET FOREIGN_KEY_CHECKS=0 fence still covers any edge case the sort order misses.	2026-04-24 16:31:10 -04:00
anti	2bcef50ac5	feat(webhooks): circuit breaker auto-disables misbehaving subscriptions After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed deliveries, the worker calls trip_webhook_circuit(uuid, ts) which flips enabled=False and stamps auto_disabled_at. The worker sets its reload flag so the next dispatch epoch stops consuming events for the tripped sub entirely — one dead receiver can't poison the shared egress pool anymore. Operator clears the trip via PATCH — setting enabled=True when the sub was previously disabled clears auto_disabled_at, zeros consecutive_failures, and clears last_error. Admin-pause → re-enable hits the same path harmlessly. Three observable states now distinguishable in the UI: - Active enabled=True, auto_disabled_at=NULL - Admin-paused enabled=False, auto_disabled_at=NULL - Tripped enabled=False, auto_disabled_at=<ts> UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and a "N TRIPPED" count in the page header. Hover tooltip tells the operator how to reset ("Re-enable via Edit"). record_webhook_failure now returns the new consecutive_failures count so the worker can compare against the threshold without a second roundtrip. trip_webhook_circuit is idempotent — re-tripping just re-stamps auto_disabled_at. Closes THREAT_MODEL WH-02 and DEBT-037 §1.	2026-04-24 16:24:33 -04:00
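The three observable states and the trip/reset transitions from 2bcef50ac5 can be modelled in a few lines. This is a sketch under assumptions: `Sub`, `record_failure`, `reenable`, and `state` are hypothetical stand-ins for the real row and repository functions, and the threshold of 5 is the default quoted in the commit message.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

CIRCUIT_THRESHOLD = 5  # default quoted for DECNET_WEBHOOK_CIRCUIT_THRESHOLD


@dataclass
class Sub:
    """Hypothetical stand-in for a webhook subscription row."""
    enabled: bool = True
    consecutive_failures: int = 0
    auto_disabled_at: Optional[datetime] = None
    last_error: Optional[str] = None


def record_failure(sub: Sub, error: str, now: datetime) -> int:
    """Count a failed delivery; trip the circuit once the threshold is reached."""
    sub.consecutive_failures += 1
    sub.last_error = error
    if sub.consecutive_failures >= CIRCUIT_THRESHOLD:
        sub.enabled = False
        sub.auto_disabled_at = now  # re-tripping just re-stamps this
    return sub.consecutive_failures


def reenable(sub: Sub) -> None:
    """Operator reset: clears the trip stamp, the counter, and the last error."""
    sub.enabled = True
    sub.auto_disabled_at = None
    sub.consecutive_failures = 0
    sub.last_error = None


def state(sub: Sub) -> str:
    """Map the two columns onto the three states the UI distinguishes."""
    if sub.enabled:
        return "active"
    return "tripped" if sub.auto_disabled_at is not None else "admin-paused"
```

Keeping `auto_disabled_at` separate from `enabled` is what lets an admin pause and a tripped circuit share one re-enable path while still looking different in the UI.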
anti	ee682eef65	feat(web/webhooks): surface manual FIRE button per row The per-row test-delivery action already existed as an icon-only ⚡ zap in the ACTIONS column — backed by POST /webhooks/{uuid}/test, which fires a synthetic test.ping event through the normal HMAC-signed delivery path with retries disabled. Too easy to miss. Replace the icon-only button with a labeled [⚡ FIRE] violet-accented button so it reads as an emphasized dev-tool action right next to edit/delete. Tooltip now spells out the backend endpoint and "fire a synthetic test event" intent. No backend change. Widens the actions column to 180px to accommodate the label.	2026-04-24 16:15:47 -04:00
anti	731063b96e	chore(scripts): mock webhook receiver for local DECNET testing Python stdlib ThreadingHTTPServer that accepts any POST path, optionally verifies HMAC against --secret / $DECNET_MOCK_SECRET, and pretty-prints each delivery with topic / event-id / signature status. Pass --fail 503 to exercise the worker's retry/backoff path. Point a webhook at http://localhost:8765/ and you'll see every delivery land with color-coded HMAC OK / MISMATCH / UNVERIFIED badges. No deps.	2026-04-24 16:13:59 -04:00
anti	4d10eba7a7	fix(web/webhooks): match LiveLogs page-header convention The webhooks page used a bespoke .webhooks-header wrapper that didn't line up with the rest of the dashboard (Fleet / Logs / Swarm all use the .<page>-root + .page-header + .page-title-group + .actions pattern). Swapped to that convention: - .webhooks-root wrapper, matching .logs-root / .fleet-root spacing. - H1 "WEBHOOKS" in .page-title-group; subtitle shows `N CONFIGURED · M ENABLED [· K FAILING] [· L INSECURE]` in .page-sub, same voice as the LOGS stream summary. - Actions (CREATE WEBHOOK, DELETE SELECTED) sit in .actions. - Table lives in a proper .logs-section shell with a .section-header carrying the Webhook icon + "SUBSCRIPTIONS" title. - All scoped button overrides (violet/alert/warn/ghost) copied from the LiveLogs scope so theme switches behave identically. Also improve error messaging: extractErrorDetail now maps 401 to "Session expired" and 403 to "Insufficient permissions (admin only)" instead of falling through to the generic "Failed to load webhooks". Helps users who hit the page as viewer or with a stale token see why it failed.	2026-04-24 16:11:20 -04:00
anti	59c405d9e5	feat(web): Webhooks page + ALERTS nav group New /webhooks admin page with table-based subscription management: - CREATE WEBHOOK (inline form row — no modal) with simple-event checkboxes (AttackerDetail / DeckyStatus / SystemStatus) that expand to bus-topic patterns server-side, and an advanced-mode textarea for raw NATS-style patterns. - Bulk-select + DELETE SELECTED with two-click arm pattern. - Per-row test-ping (zap), pencil edit, and delete actions. - Last-fired timestamp column. - Yellow banner surfacing insecure_url warnings (WH-03): http:// is allowed but flagged so operators see it on every page load. - Post-create secret modal — the secret is shown exactly once with a COPY button and a clear "won't see this again" notice. Sidebar nav regrouped: /live-logs and /webhooks now live under a new ALERTS NavGroup (Bell icon). The alertCount badge rides the Live Logs sub-item. Command palette gains a "Webhooks" GO TO entry with the `G W` chord. Side-fix: useFocusSearch.ts was failing the build under verbatimModuleSyntax (pre-existing, unrelated). Split the React import to satisfy tsc; no behavioural change.	2026-04-24 16:03:53 -04:00
anti	c2ff8d1a4f	docs(debt): DEBT-037 — webhook delivery guarantees beyond MVP The webhook MVP shipped with deliberate deferrals; this entry names them so future PRs know exactly what's left to close: circuit breaker, dead-letter table, delivery audit log, batch/coalescing, per-subscription rate limiting, payload templates per destination, and secret encryption at rest. Non-negotiable even at MVP scope (HMAC signing, bus-off degraded mode, jittered retry backoff) is called out explicitly to prevent future contributors from weakening it under the banner of "simplification."	2026-04-24 16:03:33 -04:00
anti	638236113d	feat(webhooks): non-blocking http:// warning + WH-03 accepted risk WebhookResponse now carries a `warnings: list[str]` field. When the subscription's URL starts with http://, an `insecure_url` advisory is surfaced on every GET/CREATE without blocking the request. HMAC still detects tampering regardless of transport — only read-confidentiality is lost over plaintext — and test/dev environments without TLS stay usable. Matches the operator-trust posture already established by DA-06 (admin-on-admin protection is out of scope). The alternative — hard rejection at admin time — was considered and declined; warning-plus-visibility is the right shape. THREAT_MODEL WH-03 accepted risk registered; revisit triggers are multi-admin delegation, a regulated customer, or an operator ticket asking for a DECNET_WEBHOOK_REQUIRE_HTTPS enforcement knob.	2026-04-24 15:53:30 -04:00
anti	f84bf82f6c	docs(webhook): roadmap tick + threat-model component - DEVELOPMENT.md: tick the "Real-time alerting" roadmap item with a note that Slack/Telegram-specific senders remain per-destination follow-ups (they accept generic webhook payloads already). - THREAT_MODEL.md: new Component 2 — DECNET↔External webhook destination. DFD, full STRIDE table, WH-01 (secret at rest) and WH-02 (half-dead-receiver retry waste) registered as accepted risks pointing at DEBT-037 for post-MVP hardening. Checklist lists two open items: OpenAPI schema omits `secret`, and http:// URL rejection at admin time.	2026-04-24 15:48:14 -04:00
anti	e6127a81a1	feat(webhook): worker + CLI + systemd unit Introduces the `decnet webhook` long-running worker that consumes the internal bus and POSTs matching events to configured subscriptions. Design: one task per (subscription, pattern) pair. Each task opens its own bus subscription, iterates events, and dispatches via the shared deliver() client. No intermediate queue, no in-memory filter matching — the bus's own pattern matcher is the filter. Reloads on `system.webhook.subscriptions_changed` signals from the CRUD router, with a 60s fallback timer in case a signal is lost. Shutdown propagates via CancelledError on the outer task; all inner subscription tasks are cancelled and awaited in a finally block. Bus unavailable → worker stays up in idle mode per the DEBT-031 pattern, logging one warning. Registered as a master-only CLI command (agents don't configure webhooks — the subscription store lives on master). systemd unit mirrors the profiler template; added to decnet.target Wants= list so `systemctl start decnet.target` brings it up alongside everything else. `decnet init` auto-picks up the new .service.j2 via its existing `glob("decnet-*.service.j2")` sweep.	2026-04-24 15:46:11 -04:00
anti	b70845a85d	feat(webhooks): subscription CRUD + HMAC-signed delivery client Introduces the webhook egress foundation — a new WebhookSubscription table, admin-gated CRUD under /api/v1/webhooks, and the shared delivery client that both the test-ping route and the upcoming worker will use. No worker yet; this commit is API + model + client only. Simple-mode enum (AttackerDetail / DeckyStatus / SystemStatus) expands to bus-topic patterns at the router layer; storage is always the raw pattern list. Advanced mode lets admins supply raw NATS-style patterns directly. Filter-at-subscribe: the worker (next commit) will subscribe to the union of patterns across enabled subscriptions. Delivery client handles HMAC-SHA256 signing (X-DECNET-Signature), retry on 429/5xx/network errors with jittered backoff, no-retry on 4xx. Secrets never leave the server on GET/LIST — only the create response carries the secret for copy-out. CRUD routes publish WEBHOOK_SUBSCRIPTIONS_CHANGED on the bus after every mutation so the (future) worker can hot-reload. Opens DEBT-037 for the deferred items (circuit breaker, dead-letter, batch delivery, payload templates, secret-at-rest).	2026-04-24 15:30:05 -04:00
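The signing and retry rules listed in b70845a85d (HMAC-SHA256 signature, retry on 429/5xx/network errors, no retry on other 4xx, jittered backoff) might look roughly like this. It is a sketch only: the hex encoding of the signature, the function names, and the base/cap values of the backoff are assumptions, since the commit message does not specify them.

```python
import hashlib
import hmac
import random
from typing import Callable, Optional


def sign(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 over the raw request body (hex encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver-side check, using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, body), signature)


def should_retry(status: Optional[int]) -> bool:
    """status=None models a network error; 429 and 5xx are retried."""
    if status is None:
        return True
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)
```

A receiver such as the mock script in 731063b96e would call something like `verify(secret, raw_body, header_value)` before trusting a delivery; the jitter keeps many failing subscriptions from retrying in lockstep.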
anti	162f7c1194	feat(api/sse): per-user connection cap + viewer-safe invariant New decnet/web/sse_limits.py provides sse_connection_slot, an async context manager that counts live SSE connections per user UUID and raises 429 when a per-user cap is exceeded (default 5, override via DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their first async with, so the cap check fires before any stream data is yielded. The cap must sit inside the generator — StreamingResponse returns before the generator body runs, so a handler-level wrapper would release the slot immediately. Put prefetch + slot + loop all under the one async with. Also documents F6/I (role leakage) as mitigated-by-construction via handler docstrings: every event type on both streams wraps data already reachable via viewer-gated REST, so no per-event filter is needed until a new event family is introduced. The invariant is written into the handler docstrings so a future PR can't silently add admin-only events. Resolves THREAT_MODEL F6/I and F6/D.	2026-04-24 15:01:20 -04:00
anti	df84981954	feat(api): pin response_model on dict-returning mutation routes Every mutation route that returned an untyped dict now declares response_model at the decorator. MessageResponse covers the eight {"message": ...} envelopes (change-password, mutate-decky, mutate-interval, update-deployment-limit, update-global-mutation-interval, delete-user, update-user-role, reset-user-password). Purpose-built models cover the richer shapes (DeployResponse for /deckies/deploy, PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans, UserResponse for /config/users). 204-No-Content and Response/ORJSONResponse routes stay as-is. The wire shape for clients is unchanged — the envelopes already only shipped a message field. What changes is that a handler which accidentally returns a richer dict (e.g. a full user row including password_hash) would be silently stripped to the declared fields at serialization time. Also flips F4/D "expensive LIKE" to accepted (new DA-09) — the /logs and /attackers search routes LIKE-scan unbounded columns, but both are admin-gated, limit-capped, and operator rate-limit scope per DA-04. FTS5 stays a performance TODO, not a security blocker.	2026-04-24 14:27:58 -04:00
anti	a935bf2663	feat(api): cap offset on list-topologies and transcript endpoints The other five query endpoints (/logs, /attackers, /attacker-commands, /bounties, /topologies/{id}) already declared le=2147483647 on offset; these two were inconsistently uncapped. Bring them in line to close the F4/D deep-pagination row. Also resolves F4/T (ORM sort injection — already mitigated by the regex pattern on /attackers sort_by, no other route accepts a column name) and F4/D (limit cap — already universal) with code pointers.	2026-04-24 14:14:25 -04:00
anti	e53b580767	test(api): RBAC contract test — viewer JWT on every classified route New test walks app.routes, classifies each APIRoute as admin/viewer/open by identity-matching require_admin / require_viewer closures inside the route's dependency tree, then asserts: - admin routes return 403 to a viewer JWT - viewer routes return neither 401 nor 403 to a viewer JWT SSE routes skipped (separate scope under F6). Role hints deliberately NOT encoded in the OpenAPI spec — classification stays server-side so /openapi.json can't be used to enumerate admin routes. Resolves THREAT_MODEL F2/I + F5/E; paired with the existing test_schemathesis.py::test_auth_enforcement (401-half coverage).	2026-04-24 14:00:12 -04:00
anti	99ccd41bb5	feat(api/artifacts): explicit Content-Disposition + X-Content-Type-Options Harden the attacker-controlled artifact download path (F7) with explicit response headers instead of relying on Starlette's defaults (which only emit attachment for non-ASCII filenames and never set nosniff). Also resolves the THREAT_MODEL F7 path-traversal row (containment check was already in _resolve_artifact_path) and the fleet-deploy detail=str(e) audit (all four sites are admin-gated deliberate validator UX or structured worker-response fields).	2026-04-24 13:24:34 -04:00
anti	ec1079e78b	feat(profiler): wire p0f-v2 matcher into sniffer_rollup priority chain The ~30-signature hand-rolled p0f-lite table in decnet/sniffer/p0f.py misses most real-world attackers (yesterday's SLOW SCAN being a textbook case — 9 hours of events, 19 hits, os_guess = NULL). The 375-sig vendored p0f v2 DB was already there; this commit actually calls it. New resolution chain in sniffer_rollup: 1. Enabled OS-fingerprint providers (p0f-v2 default, via DECNET_OSFP_PROVIDERS) tried in declared order. Provider with highest-confidence match across all enabled sources wins. 2. Modal os_guess label from the sniffer's hand-rolled p0f.py. Kept as fallback because v2's DB predates post-2006 kernels. 3. TTL bucket (linux / windows / embedded). Coarse but never wrong. Wiring details: - _match_via_osfp_providers: never raises — factory / provider failures collapse to None and the chain falls through to the old modal-label / TTL path. A corrupt .fp file or misconfigured DECNET_OSFP_PROVIDERS must never wedge a profile rebuild. - tcp_fp_context tracks whether the LATEST tcp_fp snapshot came from a passive SYN ('syn' → p0f.fp) or an active prober probe ('synack' → p0fa.fp). Routes to the right sig list. - initial-TTL normalisation via decnet.sniffer.p0f.initial_ttl. Observation's TTL may be N hops below the OS's initial; v2 signatures match on the canonical bucket. Soft-field semantics on Signature.score(): df and total_len are now skip-checked when the observation is missing them. Sniffer doesn't currently emit either SD field; a literal-constraint sig shouldn't hard-reject a match solely because of upstream incompleteness. Hard fields (window, ttl, options_sig, quirks) still hard-reject on absent/mismatched input — those are the real discriminators. Promote df / total_len back to hard the moment the sniffer starts emitting them. +2 integration tests on TestSnifferRollup, +2 soft-field tests on test_signature. 
Full regression: 166 tests across tests/prober/osfp + tests/profiler all green.	2026-04-24 11:56:50 -04:00
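The three-step resolution chain from ec1079e78b (providers, then modal label, then TTL bucket) can be sketched as a single fallback function. Everything here is illustrative: `resolve_os`, the `(label, confidence)` tuple shape, and the TTL-to-label mapping (64 to linux, 128 to windows, anything else to embedded) are assumptions, and `initial_ttl` is a simplified guess at what decnet.sniffer.p0f.initial_ttl does.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

Match = Tuple[str, float]  # (os label, confidence); hypothetical shape


def initial_ttl(observed: int) -> int:
    """Round an observed TTL up to the nearest common initial value (assumed buckets)."""
    for bucket in (32, 64, 128, 255):
        if observed <= bucket:
            return bucket
    return 255


def resolve_os(obs: dict,
               providers: Sequence[Callable[[dict], Optional[Match]]],
               modal_labels: Sequence[str],
               observed_ttl: int) -> str:
    # 1. Enabled providers: the highest-confidence match wins. A failing
    #    provider collapses to None and never aborts the chain.
    best: Optional[Match] = None
    for provider in providers:
        try:
            match = provider(obs)
        except Exception:
            match = None
        if match is not None and (best is None or match[1] > best[1]):
            best = match
    if best is not None:
        return best[0]
    # 2. Modal label from the hand-rolled table, if it produced any guesses.
    if modal_labels:
        return Counter(modal_labels).most_common(1)[0][0]
    # 3. Coarse TTL bucket (this mapping is an assumption).
    return {64: "linux", 128: "windows"}.get(initial_ttl(observed_ttl), "embedded")
```

The `try/except` around each provider is the part that mirrors the "never raises" wiring detail: a corrupt signature file degrades the answer but does not block the profile rebuild.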
anti	8a430bf725	feat(prober/osfp): P0fV2Provider + factory dispatch - decnet/prober/osfp/p0f/provider.py: P0fV2Provider loads the four vendored .fp files into per-context signature lists (syn / synack / rst / stray) and matches via highest-specificity score across the relevant list. Also auto-picks up p0f-decnet.fp if present (GPL-3.0 additions land there later, empty for now). - decnet/prober/osfp/factory.py: get_provider / get_all_providers / reset_cache, mirrors decnet/geoip/factory exactly. Env-dispatched via DECNET_OSFP_PROVIDERS (default "p0f-v2"). Reserved names "nmap-osdb" (pending Fyodor's grant) and "decnet-observed" (our future curated DB) raise NotImplementedError — visible on the factory surface so a typo doesn't silently fall through. - decnet/prober/osfp/__init__.py now re-exports the public API so callers use `from decnet.prober.osfp import get_provider` without reaching into submodules (upholds the provider-subpackage rule). 15 new provider+factory tests covering: - All four DB contexts load (262/61/46/6 sigs per inventory). - Known-good Linux 2.6 SYN + Linux 2.2 SYN-ACK match end-to-end. - Unknown observations / contexts return None, not raise. - Factory memoises, env override honoured, unsupported names raise. - Reserved names raise NotImplementedError (not silent None). `sniffer_rollup` wiring lands in the next commit.	2026-04-24 11:50:46 -04:00
anti	41ff6b4b03	feat(prober/osfp): p0f v2 .fp parser + Signature scoring First code layer of the OS-fingerprinting work on top of yesterday's vendored p0f v2 database. Three new modules, all pure (no I/O outside of the parser's file read): - decnet/prober/osfp/base.py — Provider protocol + OsMatch dataclass matching the established Provider convention in decnet/geoip and decnet/bus. Docstring spells out the never-raise invariant: malformed input returns None, so a single bad event can't wedge a whole attacker-profile rebuild. - decnet/prober/osfp/p0f/signature.py — Signature dataclass + three predicate helpers (WindowSpec / IntSpec / OptionToken) encoding the p0f v2 DSL's wildcard / modulo / MSS-multiple / MTU-multiple semantics. Scoring is our extension on top of upstream p0f's first-match-wins policy: each signature carries a precomputed specificity in [0, 1] so the factory can pick the most-specific match when multiple signatures fire against one observation. - decnet/prober/osfp/p0f/format.py — .fp line parser. Every shipped field variant from the DSL spec at the top of p0f.fp is covered (Snn / Tnn / %nnn / * for window; T0 vs T; -/@/* os-genre prefixes; quirks as concatenated single-letter flags; '.' sentinels for no-options / no-quirks). Malformed lines log a warning and skip instead of aborting the whole file — 1 bad row must not cost the other 374. 20 parser tests + 14 scoring tests. Full vendored-DB smoke tests confirm all 375 signatures parse round-trip (262 SYN + 61 SYN-ACK + 46 RST + 6 stray) and every computed specificity lands in [0, 1].	2026-04-24 11:47:54 -04:00
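The window-field variants that 41ff6b4b03 lists for the p0f v2 DSL (`*`, `%nnn`, `Snn`, `Tnn`, or a literal) can be sketched as one small predicate. This is not the code in decnet/prober/osfp/p0f/signature.py; the class below only borrows the `WindowSpec` name, and it assumes the usual p0f v2 reading where `Snn` means nn times the MSS and `Tnn` means nn times the MTU, with MTU taken as MSS + 40.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WindowSpec:
    """Illustrative window-size predicate for one signature field."""
    kind: str       # "any", "exact", "mod", "mss", or "mtu"
    value: int = 0

    @classmethod
    def parse(cls, token: str) -> "WindowSpec":
        if token == "*":
            return cls("any")
        if token.startswith("%"):
            return cls("mod", int(token[1:]))
        if token.startswith("S"):
            return cls("mss", int(token[1:]))
        if token.startswith("T"):
            return cls("mtu", int(token[1:]))
        return cls("exact", int(token))

    def matches(self, window: int, mss: Optional[int] = None) -> bool:
        if self.kind == "any":
            return True
        if self.kind == "exact":
            return window == self.value
        if self.kind == "mod":
            return window % self.value == 0
        if mss is None:
            return False  # S/T forms cannot be checked without an MSS option
        unit = mss if self.kind == "mss" else mss + 40  # assumed MTU = MSS + 40
        return window == self.value * unit
```

For example, a window of 5840 with an MSS of 1460 satisfies `S4`, while a window of 6000 with the same MSS satisfies `T4`.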
anti	620e1f5b1d	feat(prober): vendor p0f v2 TCP/IP fingerprint database (LGPL-2.1 → GPLv3 via §3) Ships the p0f v2.0.8 signature database for passive + active OS fingerprinting. 375 total signatures across four probe contexts: - p0f.fp (262 sigs) — passive SYN fingerprints - p0fa.fp ( 61 sigs) — SYN-ACK response, for active probes - p0fr.fp ( 46 sigs) — RST response quirks - p0fo.fp ( 6 sigs) — "stray" packet fingerprints Replaces reliance on the 10-signature hand-rolled p0f-lite table in decnet/sniffer/p0f.py for any match job the upstream DB covers. Keeping the hand-rolled table as a fallback for modern kernels the v2 DB pre-dates — v2 froze in 2006 so post-Win10 / post-Linux-3.x kernels won't match against upstream directly. DECNET-authored additions will go in a sibling p0f-decnet.fp under GPLv3 (not yet committed; added as the ingester observes real honeypot traffic). Provenance (full chain in data/README.md): - Source: Debian snapshot of p0f_2.0.8.orig.tar.gz - SHA1 matches Debian-recorded 7b4d5b2f24af4b5a299979134bc7f6d7b1eaf875 - Files byte-identical to upstream tarball (verified by hash) License chain: - Upstream: LGPL-2.1 (doc/COPYING preserved verbatim as data/LICENSE.p0f-upstream, Michal Zalewski's copyright intact). - DECNET uses the LGPL-2.1 §3 explicit permission to convert to any version of the GPL. These files, as consumed in DECNET, are effectively GPL-3.0. Chain documented in data/README.md so an auditor sees the full reasoning. - LGPL-2.1 → GPL-3.0 §3 conversion is a settled compat path; same mechanism the kernel uses for LGPL userland glue and many other projects apply daily. Rejected path — nmap-os-db under NPSL — because NPSL adds restrictions GPLv3 §7 prohibits us from accepting. An email is out to Fyodor requesting an open-source-author exception grant, but we don't block on it: p0f v2 is a genuine accuracy improvement in its own right, and adding nmap-osdb later (if granted) plugs into the same provider interface with zero refactor. 
Directory layout mirrors the established provider-subpackage pattern (see decnet/geoip/, decnet/bus/) per the feedback_provider_subpackages memory: base + factory + impl/ subpackages, no flat files. Parser + matcher + factory wiring land in the next commit sequence.	2026-04-24 11:39:33 -04:00
anti	011445b77a	chore(license): add GPL-3.0-or-later LICENSE + pyproject metadata DECNET had no LICENSE file and no license metadata in pyproject.toml despite intent being GPLv3. Legally that meant the code was "all rights reserved" by default, so anyone distributing it (including via GitHub clones, mirrors, or the forthcoming swarm enroll bundles) was technically in violation even though the operator's own intent was copyleft. - Add canonical GPL-3.0 text from gnu.org/licenses/gpl-3.0.txt as LICENSE (verbatim, 674 lines). - Add license = "GPL-3.0-or-later" and license-files = ["LICENSE"] to pyproject.toml [project] (SPDX identifier per PEP 639). - Add the matching OSI classifier plus a few other standard ones (Python 3.11, Linux, Security, Network Monitoring, Beta) that pyproject was silently missing. Prereq for the forthcoming p0f-db vendoring: establishing DECNET's own license explicitly closes the first question an auditor would ask about any third-party data we embed.	2026-04-24 11:35:59 -04:00
anti	1e7703d64d	refactor(db): name the keystroke-dynamics thresholds + add max_pause_gap Follow-ups on `9232031` per review: - Module-level constants KD_PAUSE_BURST_MAX_S (0.2s), KD_PAUSE_THINK_MAX_S (1.5s), KD_START_OF_ACTION_IDLE_S (2.0s). Docstrings reference them by name; future calibration against real session data only has to touch one place. Threshold for "started a new action" raised from 1s → 2s — 1s catches too much mid-command hesitation to be empirically bimodal. - New column kd_max_pause_gap (seconds). The distracted bucket count alone can't distinguish one 3s pause from three 60s pauses; max-gap carries that signal in one cheap scalar (vs widening the histogram to a fourth bucket). - Scope-framing docstring above the whole kd_* section: intended use is session clustering / tooling attribution, explicitly NOT biometric identity, admission decisions, or ML-driven user ID. Keeps a future well-intentioned contributor from walking the project into legal/ethics territory by accident. - TODO comment on kd_top_bigrams: v1's JSON-in-TEXT is fine for "show the top digraphs on the attacker page". If bigram-similarity queries become hot, promote to a session_bigram_stats(sid, bigram, count, mean_iat_s) table or Postgres JSONB + GIN. Neither changes the write-side ingester materially. No new migration helper — pre-v1 schema additions go through create_all on fresh DBs; the existing _migrate_session_profile_table stays but does not get extended. Alembic lands at v1 and sweeps all the ad-hoc migrations at once.	2026-04-24 10:49:38 -04:00
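The three-bucket pause histogram and the max-gap scalar from 1e7703d64d can be computed from keystroke timestamps alone. The sketch below is an assumption about how an ingester might do it, not existing code (DEBT-036 notes the ingester does not exist yet): `pause_features` and its dictionary keys are hypothetical, while the two threshold constants use the names and values given in the commit message.

```python
from typing import Dict, Sequence

KD_PAUSE_BURST_MAX_S = 0.2   # gaps below this count as "burst"
KD_PAUSE_THINK_MAX_S = 1.5   # gaps up to this count as "think"; longer is "distracted"


def pause_features(timestamps: Sequence[float]) -> Dict[str, float]:
    """Bucket inter-keystroke gaps and record the longest one.

    Only timing is inspected; no keystroke content is read, in line with
    the PII discipline the schema commits describe.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    features = {"burst": 0, "think": 0, "distracted": 0}
    for gap in gaps:
        if gap < KD_PAUSE_BURST_MAX_S:
            features["burst"] += 1
        elif gap <= KD_PAUSE_THINK_MAX_S:
            features["think"] += 1
        else:
            features["distracted"] += 1
    # The max gap separates one short stall from a few very long ones,
    # which the distracted count on its own cannot do.
    features["max_pause_gap"] = max(gaps, default=0.0)
    return features
```

With timestamps `[0.0, 0.1, 0.6, 4.0, 4.05]` the gaps are roughly 0.1, 0.5, 3.4, and 0.05 seconds, giving two burst gaps, one think gap, one distracted gap, and a max gap of about 3.4 seconds.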
anti	9232031ec7	feat(db): extend SessionProfile schema with DEBT-036 keystroke features Adds the three signal columns motivated by the manual keystroke analysis in DEBT-036 directly to the SessionProfile table. Pre-v1 so we modify the schema in place — Alembic arrives at v1. Columns: - kd_top_bigrams (TEXT) — JSON of top-N most-common digraphs with mean IAT per bigram. Complements kd_digraph_simhash ("same typist?") with "same typist in same mental state?" (tired / rested / distracted shifts bigram-specific IATs measurably). - kd_start_of_action_latency (REAL/DOUBLE) — median IAT of the first keystroke after an idle gap > 1s. Separates "initiating a command" from "executing a remembered one"; real humans have measurable start-of-action latency, bots don't. - kd_pause_hist_burst / _think / _distracted (INT) — three-bucket histogram (counts, <0.2s / 0.2-1.5s / >1.5s). More discriminating than the existing flat burst_ratio / think_ratio pair: C2 operators concentrate in burst with a thin tail; opportunistic humans have a fat think bucket and a long distracted tail. Both backends get an idempotent ADD COLUMN migration (_migrate_session_profile_table) wired into initialize() alongside the existing _migrate_attackers_table path — guards on PRAGMA table_info (SQLite) / information_schema.COLUMNS (MySQL) so reruns are safe. PII discipline comment on kd_digraph_simhash and kd_top_bigrams: both operate on bigram CHARACTERS, never on raw input stream content. Attacker passwords typed over SSH must not land here. Test updated for the MySQL initialize() migration-order contract.	2026-04-24 10:45:48 -04:00
anti	3787f7e5ec	docs(debt): DEBT-036 — session-profile ingester (keystroke dynamics) The SessionProfile SQLModel table has shipped with every column nullable since session-recording v1 landed — because the ingester that populates them from the [t,"i",d] events in the transcript shards does not exist yet (known as gap #2 in SIGNAL_CAPTURE_AUDIT). A manual keystroke-dynamics pass over one real session (wget scanme.nmap.orgh) trivially recovered CoV ≈ 0.74 (human band), a 467 ms semantic pause before the URL argument, tight intra-word bigrams (ge 79 ms, t<space> 83 ms), and slow start-of-action latency (w→g 225 ms) — all signals the existing schema columns were designed to hold. So the missing piece is purely the ingester. Entry captures: - the manual case as the motivating + sanity-check target (ingester should produce CoV ≈ 0.74 ± 0.05 on the same shard), - three schema extensions the manual analysis suggests beyond what the table carries today: kd_start_of_action_latency_ms, kd_pause_hist_{burst,think,distracted}, kd_top_bigrams, - a non-PII discipline line: raw keystroke content (including captured passwords) MUST NOT land in SessionProfile columns — only timing and frequency aggregates. Poll-driven ingestion can ship first; the bus-trigger path piggybacks on DEBT-031's deferred session-boundary topics.	2026-04-24 10:41:55 -04:00
anti	df67cb8a46	fix(web/session): don't stopPropagation on drawer panel — breaks player clicks The drawer used onClick={onClose} on the backdrop + onClick={e => e.stopPropagation()} on the panel to stop inside-clicks from closing the drawer. That pattern is fine for most React trees, but React's stopPropagation() also aborts the NATIVE DOM event — and asciinema-player wires its click-to-play handler via document-level event delegation. So every click inside the drawer (including the big play button) died at the panel boundary and never reached the player's dispatcher. Confirmed end-to-end by calling window.__ap.play() directly from DevTools: playback started, cast rendered in full, ended event fired. Swap to the idiomatic target===currentTarget guard on the backdrop so only genuine backdrop clicks close the drawer; everything inside (including native-delegated handlers) gets its events untouched. All the debug instrumentation from `b5c6b8a`, `4424138`, `6d031ae`, and `f032ece` (cast logging, lifecycle listeners, window.__ap) is reverted here — symptom root-cause is known, it was event delegation not the parser or the cast.	2026-04-24 10:35:11 -04:00
anti	6d031ae18c	debug(web/session): expose player instance as window.__ap The parse path works (metadata event fires with duration: 24.58s, idle event fires); next unknown is whether clicking play even reaches core.play(). Stash the player on window so the operator can call __ap.play() from DevTools to diff UI-click vs direct-call behaviour and see whether 'play' / 'playing' events fire. To be reverted once we pin the failure.	2026-04-24 10:31:31 -04:00
anti	442413870d	fix(web/session): subscribe to metadata/playing/idle/errored/reset/seeked too The original short subscribe list missed 'metadata' — which is the one that carries the parsed duration + theme + marker info AFTER _initializeDriver (the step that actually parses the cast). Without it we only saw 'ready' (= UI mounted, parse not yet run) and jumped to conclusions about the parser. Add the full lifecycle set so the next repro pins which step the player is actually getting stuck at.	2026-04-24 10:28:28 -04:00
anti	b5c6b8a073	fix(web/session): preload cast so parse runs at mount, not click Without preload:true the player only parses the recording when the user first clicks play. Any parse error during that lazy step bypasses our lifecycle instrumentation (we only see "ready", which just means UI mounted), and from the user's POV the play button stays black because they never see the actual failure. Forcing preload makes the driver's init() run synchronously-ish with the "ready" dispatch, so getDuration() resolves to a real number (or we see an "errored" event with a payload that tells us why).	2026-04-24 10:25:42 -04:00
anti	4a8b13b392	fix(web/session): instrument player lifecycle to catch async init failures The sync try/catch around AsciinemaPlayer.create() misses async failures in the player's internal init() promise — those land as unhandled rejections and are invisible from the component's POV. Subscribe to every lifecycle event (ready / play / pause / ended / error / errored / loading) and log the resolved duration. If the parser produces zero events despite a well-formed cast, duration resolves to 0 / NaN / rejected — one of those signals will point at whichever frame the render path is silently failing at.	2026-04-24 10:21:26 -04:00
anti	f032ece678	fix(web/session): log the cast to console when player mounts Diagnostic for the persistent "player mounts with chrome but plays black" symptom after the blob-URL fix. The player now gets {data: cast} correctly and parses at least enough to render the control bar, but duration shows --:-- and the terminal stays blank. Log the first 400 chars of the built cast + event/cols/rows so the operator can confirm in DevTools whether the malformed input is the cast itself or something downstream in the asciinema parser.	2026-04-24 10:17:57 -04:00
anti	e684feb1fe	fix(web/session): feed asciinema-player inline data, not a Blob URL SessionDrawer built a cast blob, pushed it through URL.createObjectURL, and passed the blob URL to AsciinemaPlayer.create(). That's racy with useEffect's cleanup: each new page of events re-fires the effect, the cleanup revokes the URL, and the player's already-in-flight async loadRecording() lands on a dead URL with no visible error — result was a centered play button with an empty black pane, playback never starts. asciinema-player v3's recording driver accepts {data: <string>} as a first-class source (see core-DnNOMtZn.js:905-930 doFetch — string/ArrayBuffer data is wrapped in `new Response(value)` and handed to the parser). Skip the blob detour entirely, pass the cast text inline. Also filter events to valid asciicast channels (o/i/r) before feeding so a future stray SD field can't derail the parser, and log mount errors to console for next-time debugging.	2026-04-24 01:26:07 -04:00
anti	ec2360a5da	docs(debt): DEBT-035 — artifacts written as the container uid, not the API's Tracks the durable follow-up to `323077b`. The transcripts soft-fail shipped in that commit keeps the API from 500-ing on /var/lib/decnet/artifacts/** permission mismatches, but the real issue is that decoy containers write artifacts under a uid the API can't read — today's workaround is a manual `sudo chown -R` after every new deploy. Three design options documented (container-runs-as-host-uid, setgid + shared group, inotify sidecar) with a recommendation, plus an acceptance criterion: fresh init + deploy + record session → the API can read the transcripts with no manual chown.	2026-04-24 01:21:09 -04:00
anti	323077b383	fix(web/transcripts): fall back to shard-scan when Log row has no shard_path sessrec.c emits the session_recorded SD blob with sid/service/src_ip/duration_s/bytes/truncated — it never emitted shard_path. The web handler still asked for fields.shard_path, got "", tripped the sessions-YYYY-MM-DD.jsonl basename regex and returned 400 "invalid shard name" for every legitimate transcript request. Handler now: - Fast-paths when fields.shard_path IS present and validates (for any future emitter or ingester that backfills it). - Otherwise enumerates sessions-YYYY-MM-DD.jsonl shards under ARTIFACTS_ROOT/{decky}/{service}/transcripts/ (newest first) and returns the first one whose per-sid index contains our sid. - Security invariant preserved: only files whose basename matches the _SHARD_BASENAME_RE are ever opened, and they always resolve inside ARTIFACTS_ROOT. A forged fields.shard_path is silently ignored. - Soft-fails OSError/PermissionError on the transcripts dir (decky containers often write it with a uid the API can't read) — returns 404 instead of a 500 traceback. test_forged_shard_path_blocked updated to match the new semantics: forgery is ignored, the real shard is served via fallback. The invariant (no /etc/passwd access) is still asserted by the fact that status is 200 with data from the test shard.	2026-04-24 01:18:40 -04:00
anti	215251a122	fix(deploy): template --group on the bus ExecStart too decnet-bus.service.j2 ran with User={{ user }} / Group={{ group }} but the actual bus CLI invocation hardcoded --group decnet. The bus chowns /run/decnet/bus.sock to that group at 0660 — so when an operator ran `decnet init --group anti`, the socket ended up owned by decnet:decnet while every worker (agent, api, collector, forwarder, prober, updater) ran as anti and got EACCES on connect(). Each worker's bus-wiring catches the error, logs a warning, sets bus=None, and carries on — which is correct for the data-plane but silently kills Workers-panel heartbeats (run_health_heartbeat(None, ...) no-ops). So half the worker grid showed UNKNOWN even though systemctl confirmed the processes were alive. Swap the hardcoded --group decnet for --group {{ group }} so the socket is owned by the same group the workers run under.	2026-04-24 01:09:55 -04:00
anti	e4ccf30133	fix(init): template the polkit rule on --group too polkit rule 50-decnet-workers.rules hardcoded isInGroup("decnet"), so when 'decnet init --group anti' installed systemd units as User=anti / Group=anti, the API (running as anti) could no longer systemctl start/stop decnet-*.service — polkit fell back to 'interactive authentication required', which in a daemon context is a hard fail: START FAILED · COLLECTOR — Failed to start decnet-collector.service: Access denied as the requested operation requires interactive authentication. Rename the rule to .j2, parameterise the group on {{ group }}, and route _install_polkit through _render_template / _write_rendered_if_changed. Now the polkit rule matches whatever group was passed to 'decnet init'. Test fixture updated to seed the .j2 variant.	2026-04-24 01:07:16 -04:00
anti	08436433ef	fix(deploy): relocate StandardOutput/StandardError below multi-line ExecStart Four templates use backslash line-continuation on ExecStart (decnet-bus, decnet-forwarder, decnet-listener, decnet-updater). My earlier sed inserted StandardOutput= and StandardError= right after the first ExecStart= line, which split the command and systemd fed those two lines back to the binary as extra positional arguments — the bus in particular crashed with: Got unexpected extra argument (StandardOutput=append:/var/log/decnet/decnet.bus.log) Walk the ExecStart block (follow \-continuation lines) and insert the two Standard* directives AFTER the last continuation line. The nine single-line ExecStart templates are unaffected in shape but re-written through the same path to keep the whole set uniform.	2026-04-24 01:03:58 -04:00
anti	311da4098e	fix(logging): don't crash the process if the system log can't be opened _configure_logging opened InodeAwareRotatingFileHandler against DECNET_SYSTEM_LOGS (default: relative decnet.system.log) without guarding OSError. Under systemd with ProtectSystem=full + ProtectHome=read-only and no writable path baked into the unit, the first import of decnet.config raised OSError and the daemon died before it could even print a useful error — the root-cause log line showed up in journalctl as a stack trace rather than a warning. Wrap the handler attachment in try/except OSError and log a single WARNING via the already-installed stream handler. stderr is always attached, so losing the file handler means operators tail journalctl / docker logs instead — the daemon keeps running.	2026-04-24 01:00:42 -04:00
anti	d4b714dc39	fix(deploy): wire per-unit log files on master systemd services The agent-side enroll-bundle templates (decnet/web/templates/) always set DECNET_SYSTEM_LOGS + StandardOutput/StandardError to a per-unit file under /var/log/decnet. The master-side init templates (deploy/) never did, so every 'decnet init'-installed service: - inherited the default DECNET_SYSTEM_LOGS=decnet.system.log — a relative path, landing in the unit's WorkingDirectory. All 13 units shared the same cwd and fought for the same file, or more often just failed to write it under ProtectSystem=full, - emitted stdout/stderr to the journal by default, which is fine for uvicorn's INFO banter but makes per-service grepping a pain when you're chasing a single worker's trace. Mirror the agent-side wiring on all 13 master templates: - Environment=DECNET_SYSTEM_LOGS=/var/log/decnet/decnet.<name>.log - StandardOutput=append:/var/log/decnet/decnet.<name>.log - StandardError=append:/var/log/decnet/decnet.<name>.log /var/log/decnet is already in ReadWritePaths so ProtectSystem=full stays compatible. Operators now get a dedicated /var/log/decnet/decnet.<unit>.log per service, both from the app's structured logger and from any stray stderr — journalctl still works too, but no longer the only option.	2026-04-24 00:57:23 -04:00
anti	c282f74bd4	fix(web/dashboard): wrap long kv-chips instead of blowing out the EVENT column Key:value chips in the live-feed event cell used the default .chip style, which is white-space: nowrap + inline-flex. A long cmd: value (attacker-controlled shell strings, URLs, base64 payloads) stretched the chip horizontally past the column, pushing the whole table into horizontal scroll and clipping subsequent columns off-screen. Add a chip-kv variant that allows the value to wrap inside a max-width: 100% chip (word-break: break-word, overflow-wrap: anywhere for dense strings with no natural break). The key-label stays on the first line via flex-shrink: 0. Short values (uid: 0, user: root) stay tight; long ones wrap onto multiple lines inside the chip. Also set minWidth: 0 on the EVENT td + nested flex containers so flex children honour the column width instead of growing to fit content. Added title={k: v} on each chip for full-value hover in case the wrap is still clipped.	2026-04-24 00:51:31 -04:00
anti	bfff212a05	fix(api): gate embedded Docker-log collector on DECNET_EMBED_COLLECTOR The API lifespan unconditionally spawned log_collector_worker, appending every container line to DECNET_INGEST_LOG_FILE. On hosts that also run decnet-collector.service (installed by 'decnet init') that's two tailers writing the same events to the same file — the ingester then inserts each event twice and the dashboard shows every command duplicated. Add DECNET_EMBED_COLLECTOR (default false), matching the existing DECNET_EMBED_PROFILER and DECNET_EMBED_SNIFFER pattern directly above this block. Single-process dev setups without systemd can flip it on to restore the all-in-one behaviour; multi-process production gets the single-writer invariant by default.	2026-04-24 00:47:37 -04:00
anti	edc8297af3	fix(init): gate userdel/groupdel on --purge to avoid nuking the operator Every plain `decnet deinit` ran userdel + groupdel unconditionally. In dev the operator may pass `--user $USER --group $USER` to avoid file ownership churn against a source checkout — at which point deinit would cheerfully delete their own login account. Move user/group removal behind --purge, matching the existing behaviour for /var/lib/decnet + /var/log/decnet. Help text updated: --purge now clearly advertises that it also wipes the service user/group, with an explicit warning to only run it when `decnet init` created the account in the first place. Test updated: plain --deinit must NOT invoke userdel/groupdel; --deinit --purge must.	2026-04-24 00:38:51 -04:00
anti	38832d87d5	fix(init): thread --user / --group through systemd unit templates Every decnet-*.service.j2 hardcoded User=decnet / Group=decnet. The init CLI accepted --user / --group and used them for useradd, chown, /etc/decnet ownership and ReadWritePaths — but the Jinja context omitted them entirely, so sudo decnet init --install-dir $PWD --user anti --group anti rendered User=decnet Group=decnet into every unit, which at best ran the workers as a user that didn't match the files (fails to read the venv / config), and at worst spun a parallel system user the operator never asked for. Swap the hardcoded lines to {{ user }} / {{ group }} across all 13 templates and add both to the Jinja context in _install_units.	2026-04-24 00:36:23 -04:00
anti	51012eaa67	feat(init): decouple venv from install_dir; fail loud if no venv exists The systemd unit templates hardcoded {{ install_dir }}/venv/bin/decnet. On production hosts enroll_bootstrap.sh creates exactly that path so it worked. On dev boxes where the operator runs `sudo decnet init` against a source checkout with a differently-named venv (.venv, .311, .312), every decnet-*.service looped forever in auto-restart with: Failed at step EXEC spawning .../venv/bin/decnet: No such file or directory Templates now use {{ venv_dir }} as an independent Jinja2 var. `decnet init` adds --venv-dir (explicit override), otherwise autodetects: 1. $VIRTUAL_ENV (only when inside --install-dir, so a user-home venv never gets baked into a root-owned unit), 2. {install_dir}/venv (production default; what enroll_bootstrap creates), 3. {install_dir}/{.venv,.311,.312,.313} (common dev conventions). Init aborts before any file writes if nothing resolves — an operator-friendly error beats journalctl spam on every unit restart. python3-venv doesn't set a persistent system variable — $VIRTUAL_ENV lives in the activated shell only — so this has to be decided + baked in at init time; there's no way for systemd to "inherit the current venv" at unit start. Test mode (--prefix) skips venv validation so the existing test suite doesn't need to stub up a venv tree per case.	2026-04-24 00:29:49 -04:00
anti	cb692d570a	feat(cli): status queries systemd for every decnet-* unit 'decnet status' used to psutil-scan for cmdlines matching hand-coded service launch args. That worked on dev boxes running workers via 'python -m decnet.cli ...' but missed the systemd reality on real hosts: units may be installed but not started, failed, or in auto-restart — all invisible to a cmdline grep. New behaviour: status calls `systemctl list-units --type=service --all --output=json 'decnet-*.service'` and renders the unit/load/active/sub/description matrix. One view works for masters, agents, and mixed hosts — iterates over whatever 'decnet-*' units were installed by 'decnet init' / the enroll-bundle. Agent/master mode filtering is no longer needed in the CLI; the host literally does not have master-only units installed if it enrolled as an agent. The psutil path survives as a fallback for boxes without systemd (dev laptops, CI containers, minimal init systems) so the command stays useful there. Clearly labelled 'psutil fallback' in the table title so operators know which view they're looking at.	2026-04-24 00:23:00 -04:00
anti	d61e143b71	fix(stress): unblock Locust runs from login rate-limit self-DoS Locust spawns N virtual users (default 1000), all from 127.0.0.1 as admin. /auth/login is rate-limited 10/5min per-IP AND per-username, so the 11th on_start() got 429 and a RuntimeError. A @task(2) login in the task weights turned the whole run into a 429 factory even after ramp-up. And _login_with_retry treated 429 as non-retryable, so there was no graceful degradation path. Three changes, one root cause: - decnet/web/limiter.py: read DECNET_LIMITER_ENABLED (default true). When false, slowapi's Limiter(enabled=False) makes @limiter.limit a no-op. Default ships unchanged; nobody should ever release with this off. - tests/stress/conftest.py: set DECNET_LIMITER_ENABLED=false in the uvicorn subprocess env. Stress tests measure throughput, not rate limiting. - tests/stress/locustfile.py: drop the @task(2) login — it added zero coverage (every user already logs in at on_start) and only generated contention. Teach _login_with_retry to honour 429 + Retry-After so a Locust pointed at a limiter-enabled server degrades gracefully instead of crashing on_start.	2026-04-24 00:13:15 -04:00
anti	ae92948e22	test(live): align mqtt/postgres/mysql live tests with honeypot + loop realities Three unrelated test-correctness fixes exposed by running tests/live: - test_mqtt_live: honeypot defaults to auth-required (post-2018 realistic broker). Anonymous CONNECT is rejected with CONNACK rc=5, which the "accept" / "subscribe" tests misread as a failure. Pass MQTT_ACCEPT_ALL=1 via a new env= override on the live_service factory so only those two tests opt into accept-all. - test_postgres_live::test_auth_hash_logged: connected with dbname='prod', which isn't in the honeypot's per-instance DB list, so Postgres (correctly) rejected at startup before asking for a password — blowing past the auth event the test asserts on. Target 'postgres' (always in _BASE_DBS) to reach the auth stage. - test_mysql_backend_live: the module-scoped mysql_test_db_url fixture is bound to the module loop, but function-scoped tests default to their own per-function loops. Any reuse of the asyncmy pool then tripped "Future attached to a different loop". Pin the whole module with pytest.mark.asyncio(loop_scope='module').	2026-04-23 22:06:55 -04:00
anti	26d04d5eb8	fix(db): SessionProfile.kd_digraph_simhash must be BINARY(8), not BLOB MySQL can't index a BLOB/TEXT column without a prefix length, so create_all() on a fresh MySQL schema blew up with "BLOB/TEXT column 'kd_digraph_simhash' used in key specification without a key length". SimHashes are a fixed 8 bytes — the variable-length type was a SQLAlchemy-side auto-mapping from 'Optional[bytes]', not an actual schema requirement. Switch to BINARY(8), which is portable: MySQL gets a fixed-width indexable BINARY, SQLite treats it as BLOB and doesn't care about key length.	2026-04-23 22:06:38 -04:00
anti	f0b0967b16	chore(gitignore): ignore dev-host noise (.311 venv, wiki clone, scratch logs) - .311/ and .3[0-9][0-9]/ + .venv/ — cpython-version-suffixed venvs (common convention) now covered alongside the existing .venv/. - wiki-checkout/ — local nested clone of the wiki; never a submodule. - hang.log / schem / .pytest.log — scratch dumps from saved pytest output redirections. - deps.txt — pydeps-style dependency graph from local analysis runs. No tracked files affected; just stops new working-tree noise from showing up in git status.	2026-04-23 21:52:40 -04:00
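
Commit 08436433ef above describes walking the ExecStart block (following backslash continuations) and placing the two Standard* directives after the last continuation line. A minimal sketch of that walk, assuming a hypothetical helper name (`insert_after_execstart`) and an illustrative unit body; the repository's actual rewrite code is not shown in the log and may differ:

```python
from typing import List


def insert_after_execstart(unit_text: str, directives: List[str]) -> str:
    """Insert directives after the last line of the first ExecStart block.

    Hypothetical helper for illustration only. A line ending in a
    backslash is treated as continued on the next line, so a multi-line
    ExecStart command is never split.
    """
    lines = unit_text.splitlines()
    out: List[str] = []
    inserted = False
    i = 0
    while i < len(lines):
        out.append(lines[i])
        if not inserted and lines[i].lstrip().startswith("ExecStart="):
            # Follow backslash continuations to the end of the command.
            while lines[i].rstrip().endswith("\\") and i + 1 < len(lines):
                i += 1
                out.append(lines[i])
            # Only now is it safe to add new directives.
            out.extend(directives)
            inserted = True
        i += 1
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative unit body; the binary path and flags are assumptions.
    unit = (
        "[Service]\n"
        "ExecStart=/opt/decnet/venv/bin/decnet bus \\\n"
        "    --group decnet\n"
        "Restart=always\n"
    )
    log = "/var/log/decnet/decnet.bus.log"
    print(insert_after_execstart(
        unit,
        [f"StandardOutput=append:{log}", f"StandardError=append:{log}"],
    ))
```

Single-line ExecStart units pass through the same path: the continuation loop simply runs zero times and the directives land on the next line, which matches the commit's note that the nine single-line templates are unaffected in shape.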

643 Commits