Commit Graph

300 Commits

Author SHA1 Message Date
c78ab032bd fix(xff): truncate LEAKED IPs + ROTATION badge for rotation attacks
`for i in $(seq 1 100); do curl -H "X-Forwarded-For: 191.100.20.$i" ...`
was dumping 100 distinct IPs into AttackerDetail's LEAKED IPs row,
drowning the rest of the ORIGIN section. The 100-IP wall is itself a
signal (WAF-bypass-list probing) that deserves a short badge, not a
flood.

Backend:
- get_attacker_ip_leaks gains `limit: int = 10` parameter — caller
  only ever needs a sample, not the full set.
- New count_attacker_ip_leaks() returns the unbounded COUNT(*) via
  one cheap SQL aggregate.
- Detail endpoint returns {ip_leaks: [first 10], ip_leaks_total: N}
  so the UI can render a rotation badge independent of list length.

UI:
- New LeakedIPsRow component. First 5 distinct IPs rendered inline
  with hover tooltips (unchanged). When > 5, a `+ N more` expand
  button reveals the rest of the sample; when total exceeds the
  10-row cap, a subtle `(+M beyond sample)` note appears.
- When total ≥ 20, a red `ROTATION · N` tag renders leading the
  row with a tooltip explaining the semantic: "almost certainly
  XFF-rotation / WAF-bypass probing, not a real attribution leak."

DB churn is deliberately not capped — 100k rows × ~500 B is tolerable.
If it becomes a problem we can add an ingester-side count-and-skip;
for now the UX fix is the whole story.

Added test_ip_leaks_total_reported_separately_from_list asserting
the endpoint shape matches what the UI consumes.
2026-04-24 18:25:46 -04:00
ca39552692 feat(ua): classify User-Agent into scanner/cli/library/bot/nonstandard
Every http_useragent bounty now carries a `category` label plus an
optional tool name and a signals list. The main analytic win is the
`nonstandard` bucket — UAs like "FUCKYOU/1.0" or custom one-off
scanner labels that don't match any known pattern, which today
silently blend into the generic fingerprint list.

Buckets (priority order):

- scanner: nmap, nuclei, sqlmap, gobuster, nikto, masscan, zgrab,
  ffuf, wpscan, katana, burp, acunetix, nessus, openvas, arachni,
  whatweb, wappalyzer, etc.
- cli:  curl, wget, httpie, xh, fetch.
- library: python-requests, aiohttp, httpx, urllib, Go stdlib, Java,
  okhttp, Apache HttpClient, axios, node-fetch, got, undici, PHP,
  Guzzle, Ruby stdlib, Faraday, .NET, PostmanRuntime, Insomnia, etc.
- bot:  anything containing bot / crawler / spider / slurp / monitor
  (catches Googlebot, bingbot, Baiduspider — many of which ship a
  Mozilla/5.0 prefix, so the bot check runs BEFORE the browser
  regex).
- browser: Mozilla/5.0-prefixed UAs that aren't bots.
- nonstandard: anything else. The interesting bucket.
- empty: literal empty User-Agent header.

Side signals computed regardless of category: suspicious_short (<8
chars), suspicious_long (>512 chars), nonprintable (control chars),
injection_like (SQLi / XSS / path-traversal / Log4Shell markers).
A sqlmap UA with a literal SQL-injection payload embedded fires
category=scanner + injection_like — the combination tells the
analyst the tool is being operated manually vs. on default config.

Classification is deterministic (same UA string → same tuple) so
add_bounty's payload-hash dedup continues to collapse repeat rows.

UI renderer upgraded from FpGeneric to a dedicated FpUserAgent that
colours the category tag by risk (scanner=alert-red,
nonstandard=warn-yellow, browser=accent-green, etc.) and renders
each signal as its own chip. Makes the interesting rows pop in the
fingerprints panel.

Also fixed: the ingester was using `_headers.get("User-Agent") or
_headers.get("user-agent")`, which short-circuits away empty-string
UAs. An explicit empty UA is itself a signal (real clients always
send something) — now captured.
2026-04-24 18:17:18 -04:00
6d1d69443a fix(xff): split leak from spoof — loopback/private claims aren't leaks
An attacker hitting /admin with `X-Forwarded-For: 127.0.0.1` was
previously flagged as an IP leak. It isn't — that's the classic
IP-allowlist / WAF-bypass payload ("treat me as localhost and skip
your auth checks"). Misclassifying it as "LEAKED IPs" in the UI
confuses analysts and burns trust in the signal.

Split by claim category. After pulling the left-most claimed IP
from the proxy header, classify:

- public (routable) → bounty_type=ip_leak (real attribution leak;
  the attacker's upstream proxy forwarded their real IP).
- loopback / private / link-local / multicast / reserved /
  unspecified → bounty_type=fingerprint, fingerprint_type=
  spoofed_source (WAF-bypass / allowlist-probing attempt; the
  attacker is telling us they know what XFF does).
- unparseable → dropped.

Same extraction pipeline; diverges only at the last step. A new
shared _classify_proxy_header_claim returns (kind, payload);
_detect_ip_leak keeps its public-only contract for backward-
compat; _detect_spoofed_source is the new sibling.

UI renderer FpSpoofedSource shows the claimed IP in warn color with
the claim_category tag (LOOPBACK / PRIVATE / ...) and a WAF-BYPASS
ATTEMPT badge — distinct visual from the "LEAKED IPs" row which
stays reserved for genuine public-IP leaks.

Test addresses updated: RFC 5737 doc ranges (198.51.100.0/24,
203.0.113.0/24) are flagged `is_reserved` in Python's ipaddress
module, so they now correctly belong to the spoof bucket — tests
that meant to exercise real public IPs now use 8.8.8.8 / 1.1.1.1 /
Cloudflare DNS. Added eleven new tests locking the classifier +
the two detectors' mutual exclusion.
2026-04-24 18:06:29 -04:00
2c876b4d86 fix(bounties): strip per-request fields from fingerprint payloads
add_bounty dedups on (attacker_ip, bounty_type, full payload JSON).
Three fingerprint-family bounties (http_useragent, ip_leak,
http_quirks) were including method/path / header_count in their
payloads — fields that vary per request — so a scanner hitting 100
paths produced 100 rows instead of 1, which is what was swelling
AttackerDetail.

Payloads now carry identity-only fields:

- http_useragent: {fingerprint_type, value}. UA + path combinations
  no longer collide; one row per distinct User-Agent string.
- ip_leak: {source_ip, real_ip_claim, source_header, headers_seen}.
  One row per distinct (proxy source, leaked IP, leaking header)
  triple; repeat hits with the same header on different paths dedup.
- http_quirks: {fingerprint_type, order_hash, order, casing_hash,
  casing_category, stable_count, tool_guess}. No more header_count
  (included volatile headers; Cookie-presence variance broke dedup).

Per-request context (path, method, etc.) was never load-bearing for
analysts — the logs table already answers "when + where" at
per-event resolution. The bounty table is for stable identity.

UI:
- FpHttpQuirks renderer drops the method/path footer line and the
  header_count/duplicates tags; shows stable_count instead.
- LEAKED-IPs tooltip on AttackerDetail swaps "X on GET /path" for
  "Leaked via X; source 203.0.113.42" — same information, stable.

Tests add a "payload stable across paths and methods" assertion on
http_quirks — locks the contract so a future regression that sneaks
a per-request field back in fails loudly.

Existing duplicate bounty rows don't retroactively collapse.
Dev: `decnet db-reset --i-know-what-im-doing drop-tables` and
restart. Prod: one SQL pass to dedup by (attacker_ip, bounty_type,
payload) — trivial but not automated.
2026-04-24 17:58:54 -04:00
dccb410bb3 feat(http): header-quirks fingerprint — order + casing + tool guess
Per-request HTTP fingerprint derived from the header dict we already
log. Captures:

- order_hash: SHA-256 prefix (16 hex) over the lowercased header-name
  sequence, minus volatile/per-request headers (Content-Length,
  Cookie, Authorization, XFF family, trace IDs). Stable identity for
  a given client stack regardless of which target / path is hit.
- casing_hash: same shape but over the per-header casing category
  (Title-Case / lower / UPPER / mixed). Attackers frequently spoof
  User-Agent but forget their stack sends `user-agent` while browsers
  send `User-Agent`.
- tool_guess: prefix match against curl / python-requests /
  Go-http-client / nmap-nse signatures. Cheap, best-effort — the
  hash is the hard signal.
- duplicates: reserved for when the HTTP template switches from
  dict(request.headers) to a list form; today it always fires empty
  because dict() collapses duplicates.

Payload is a fingerprint bounty (bounty_type="fingerprint",
fingerprint_type="http_quirks"). Bounty dedup collapses identical
hashes per attacker — one row per distinct fingerprint — so a chatty
scanner doesn't spam the vault, but a tool-chain change from the
same IP surfaces as a new row.

UI renderer (FpHttpQuirks) shows the two hashes, tool guess badge in
violet, casing/count tags, and a collapsible header-order list.
Added to the passiveTypes group so it nests with JA3/JA4L/etc. in
the AttackerDetail fingerprints panel.

One library note: the naive "title-case" classifier failed on tokens
like `X-Forwarded-For` because Python's "".islower() returns False
so `p[1:].islower()` rejects single-letter tokens like the `X`.
Fix: explicitly accept single-char tokens when uppercase.
2026-04-24 17:51:40 -04:00
2a0c5ca410 feat(attackers): XFF mismatch detection — attacker IP leak bounties
Attackers routinely front their scanners with VPNs/proxies, so the
TCP source we log is the proxy egress, not the real host. But a
surprising number of attacker setups are misconfigured: the proxy
forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP
/ CDN-variant) header. From our side that's a free attribution leak.

New _detect_ip_leak extractor in decnet/web/ingester.py fires at
ingest time per HTTP request. Logic:

1. Require service=http, source_ip present, headers present.
2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or
   CIDRs) → legitimate reverse-proxy forwarding, skip.
3. Walk proxy-family headers in priority order: Forwarded (RFC 7239)
   → X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP.
4. Extract the left-most parseable IP from the winning header.
5. If that IP differs from the TCP source → emit a bounty with
   bounty_type="ip_leak" carrying {source_ip, real_ip_claim,
   source_header, headers_seen, path, method}.

Storage is the existing Bounty table — no schema change; de-dup is
handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so
repeat requests with the same leaked IP don't spam.

AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN
listing distinct real_ip_claim values; hover tooltip shows the source
header + path of the most recent leak. Only shown when at least one
ip_leak bounty exists.

RFC 7239 Forwarded parser handles the full vocabulary — bare IPv4,
IPv4:port, quoted, IPv6 in brackets, IPv6 with port — returning only
IPs that actually parse.

Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For
mismatches". Phase 3 of the three-phase Attacker Intelligence series
(phases 1: scanned-vs-interacted, 2: PTR records already shipped).

DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's
"revisit when verified-proxy config lands" note — same token set
future rate-limit work will consume.
2026-04-24 17:39:03 -04:00
5a34371009 feat(attackers): PTR record (reverse DNS) enrichment
Resolve each attacker IP's rDNS name once at first sighting, store on
Attacker.ptr_record, render on AttackerDetail under ORIGIN. Many
attackers run infrastructure with forgotten rDNS that instantly
identifies them once surfaced: scan-node-42.shodan.io,
shady-vps.leasecloud.net, etc.

Resolver lives in decnet/geoip/ptr.py — colocated with enrich_ip
because the shape matches (take an IP, return supplementary
metadata, never raise). Uses the OS resolver via socket.gethostbyaddr
offloaded to the default executor, wrapped with asyncio.wait_for
timeout=2s so a slow authoritative NS can't stall the profiler tick.

Profiler side: _WorkerState grows a ptr_attempted: set[str] bounding
resolution to once per worker lifetime. Cold-start batches resolve
concurrently (Semaphore(_PTR_CONCURRENCY=10)) so a backlog doesn't
serialize 2s ceilings. _build_record gains a keyword-only ptr_record
parameter that, when _UNSET, omits the key from the record dict —
upsert_attacker's attribute-merge loop then preserves whatever's
stored on the row. Explicit None is a "fresh failed attempt" signal
and gets written through.

Env kill-switch DECNET_PTR_ENABLED=false for locked-down deploys
where egress DNS is forbidden. Private / loopback / link-local /
multicast / reserved addresses short-circuit before any DNS call.
IPv6 reverse DNS works transparently through the stdlib resolver.

Schema change — run once on upgrade:

  ALTER TABLE attackers
    ADD COLUMN ptr_record VARCHAR(256) NULL DEFAULT NULL;

Or drop-and-recreate on dev boxes (db-reset's SQLModel.metadata-driven
table discovery now picks it up automatically since ba155b7).

tests/conftest.py disables DECNET_PTR_ENABLED globally for the same
reason it disables DECNET_GEOIP_ENABLED — unit tests must never hit
the network. tests/geoip/test_ptr.py re-enables explicitly via an
autouse fixture.
2026-04-24 17:26:40 -04:00
351a8939c3 feat(attackers): scanned vs. interacted service bucketing on detail page
Adds a new card on AttackerDetail: SCANNED · N services | INTERACTED
WITH · M services. Distinguishes port-scanners (N high, M=0) from
actual engagement (M>0) at a glance — the analyst's first question
when triaging a new attacker row.

Classifier lives in decnet/correlation/event_kinds.py, a single
source of truth for the event-type vocabulary:

- INTERACTION_EVENT_TYPES — command-family (command/exec/query/...),
  SMTP engagement (mail_from/rcpt_to/message_accepted), file/payload
  activity (file_captured/upload/download_attempt/retr), pub/sub
  (publish/subscribe), recorded TTY sessions.
- NOISE_EVENT_TYPES — DECNET-internal (startup/shutdown/parse_error/
  unknown_*).
- Everything else defaults to scan. Conservative by design: new
  template verbs show up as "scanned" until explicitly promoted.

Bucket logic: a service is "interacted" if ≥1 of its events
classifies as interaction; otherwise "scanned" if ≥1 scan event;
noise-only services drop. Disjoint by construction.

Deliberate no-schema path: compute on-the-fly in the detail endpoint
via SELECT DISTINCT service, event_type FROM logs. Small result set
(tens of pairs per attacker), cost is trivial vs. the existing
behavior/commands queries. Trade-off: one more DB round-trip per
detail view in exchange for zero ALTER TABLE migration pain and
immediate classifier-change feedback loop.

Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of
interactions that carry executable text), with a comment pointing at
the new canonical module.

Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral
Profiling — Services actively interacted with".
2026-04-24 17:12:20 -04:00
ce6b4a4174 fix(web/api): scope DB-retry sleep so tests don't starve background tasks
test_lifespan_db_retry patched decnet.web.api.asyncio.sleep to skip the
DB-retry backoff. Problem: asyncio is a shared module — the patch leaks
to every caller that looked up asyncio.sleep via `import asyncio`,
including run_health_heartbeat's own sleep loop. That heartbeat task
spawns inside the same lifespan; with its sleep mocked, the while-loop
spins tight, starves cancellation, and leaves an orphan task that
pytest-timeout eventually signals — surfacing as the 'Task exception
was never retrieved' warnings the user saw when running the suite.

Fix: give decnet.web.api a local binding `_retry_sleep = asyncio.sleep`
for the DB-retry wait, and have the test patch that instead. Narrowly
scoped, no impact on asyncio.sleep callers elsewhere.

Test timing before: 12s with --timeout=10 (interrupted by signal).
Test timing after: 0.58s. Full tests/web slice: 27s → 7.1s with the
spurious warnings gone.
2026-04-24 17:11:44 -04:00
efc98285aa fix(webhook/worker): self-heal when bus starts late or restarts
Before: if the bus was unreachable at worker start, we logged
"running in idle mode" once and parked on shutdown forever. systemd
doesn't guarantee bus is fully up before the webhook worker starts,
so a race on boot left the worker permanently dead until restart.

Now: wrap the whole bus-use in an outer reconnect loop.

  while not shutdown:
    try: connect()
    except: sleep(RECONNECT_SECS) ; continue
    try: run_with_bus(...)       # heartbeat + dispatch
    except: log+close ; reconnect on next iter

Clean consequence: if the bus dies mid-operation the dispatch loop's
subscriptions raise inside the consumer tasks, `_run_with_bus` exits,
the outer loop closes the stale connection and reconnects. No partial
state leaks across epochs — fresh bus, fresh subs, fresh heartbeat.

Interval is 60s by default, overridable via
DECNET_WEBHOOK_BUS_RECONNECT_SECS. Shutdown wakes the wait so
systemctl stop doesn't hang for a minute.

Test added: flaky get_bus that fails once, then returns a live
FakeBus — asserts retry + successful delivery.

get_app_bus() in decnet/bus/app.py already has a 2s backoff retry so
the FastAPI hot path self-heals; this commit brings the standalone
webhook worker in line with the same posture.
2026-04-24 16:39:38 -04:00
2bcef50ac5 feat(webhooks): circuit breaker auto-disables misbehaving subscriptions
After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts) which
flips enabled=False and stamps auto_disabled_at. The worker sets its
reload flag so the next dispatch epoch stops consuming events for the
tripped sub entirely — one dead receiver can't poison the shared
egress pool anymore.

Operator clears the trip via PATCH — setting enabled=True when the
sub was previously disabled clears auto_disabled_at, zeros
consecutive_failures, and clears last_error. Admin-pause → re-enable
hits the same path harmlessly.

Three observable states now distinguishable in the UI:
- Active              enabled=True,  auto_disabled_at=NULL
- Admin-paused        enabled=False, auto_disabled_at=NULL
- Tripped             enabled=False, auto_disabled_at=<ts>

UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and
a "N TRIPPED" count in the page header. Hover tooltip tells the
operator how to reset ("Re-enable via Edit").

record_webhook_failure now returns the new consecutive_failures count
so the worker can compare against the threshold without a second
roundtrip. trip_webhook_circuit is idempotent — re-tripping just
re-stamps auto_disabled_at.

Closes THREAT_MODEL WH-02 and DEBT-037 §1.
2026-04-24 16:24:33 -04:00
638236113d feat(webhooks): non-blocking http:// warning + WH-03 accepted risk
WebhookResponse now carries a `warnings: list[str]` field. When the
subscription's URL starts with http://, an `insecure_url` advisory is
surfaced on every GET/CREATE without blocking the request. HMAC still
detects tampering regardless of transport — only read-confidentiality
is lost over plaintext — and test/dev environments without TLS stay
usable.

Matches the operator-trust posture already established by DA-06
(admin-on-admin protection is out of scope). The alternative — hard
rejection at admin time — was considered and declined; warning-plus-
visibility is the right shape.

THREAT_MODEL WH-03 accepted risk registered; revisit triggers are
multi-admin delegation, a regulated customer, or an operator ticket
asking for a DECNET_WEBHOOK_REQUIRE_HTTPS enforcement knob.
2026-04-24 15:53:30 -04:00
e6127a81a1 feat(webhook): worker + CLI + systemd unit
Introduces the `decnet webhook` long-running worker that consumes the
internal bus and POSTs matching events to configured subscriptions.

Design: one task per (subscription, pattern) pair. Each task opens
its own bus subscription, iterates events, and dispatches via the
shared deliver() client. No intermediate queue, no in-memory filter
matching — the bus's own pattern matcher is the filter. Reloads on
`system.webhook.subscriptions_changed` signals from the CRUD router,
with a 60s fallback timer in case a signal is lost.

Shutdown propagates via CancelledError on the outer task; all inner
subscription tasks are cancelled and awaited in a finally block.
Bus unavailable → worker stays up in idle mode per the DEBT-031
pattern, logging one warning.

Registered as a master-only CLI command (agents don't configure
webhooks — the subscription store lives on master). systemd unit
mirrors the profiler template; added to decnet.target Wants= list so
`systemctl start decnet.target` brings it up alongside everything
else. `decnet init` auto-picks up the new .service.j2 via its
existing `glob("decnet-*.service.j2")` sweep.
2026-04-24 15:46:11 -04:00
b70845a85d feat(webhooks): subscription CRUD + HMAC-signed delivery client
Introduces the webhook egress foundation — a new WebhookSubscription
table, admin-gated CRUD under /api/v1/webhooks, and the shared
delivery client that both the test-ping route and the upcoming worker
will use. No worker yet; this commit is API + model + client only.

Simple-mode enum (AttackerDetail / DeckyStatus / SystemStatus) expands
to bus-topic patterns at the router layer; storage is always the raw
pattern list. Advanced mode lets admins supply raw NATS-style patterns
directly. Filter-at-subscribe: the worker (next commit) will subscribe
to the union of patterns across enabled subscriptions.

Delivery client handles HMAC-SHA256 signing (X-DECNET-Signature),
retry on 429/5xx/network errors with jittered backoff, no-retry on
4xx. Secrets never leave the server on GET/LIST — only the create
response carries the secret for copy-out.

CRUD routes publish WEBHOOK_SUBSCRIPTIONS_CHANGED on the bus after
every mutation so the (future) worker can hot-reload.

Opens DEBT-037 for the deferred items (circuit breaker, dead-letter,
batch delivery, payload templates, secret-at-rest).
2026-04-24 15:30:05 -04:00
162f7c1194 feat(api/sse): per-user connection cap + viewer-safe invariant
New decnet/web/sse_limits.py provides sse_connection_slot, an async
context manager that counts live SSE connections per user UUID and
raises 429 when a per-user cap is exceeded (default 5, override via
DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their
first async with, so the cap check fires before any stream data is
yielded.

The cap must sit inside the generator — StreamingResponse returns
before the generator body runs, so a handler-level wrapper would
release the slot immediately. Put prefetch + slot + loop all under
the one async with.

Also documents F6/I (role leakage) as mitigated-by-construction via
handler docstrings: every event type on both streams wraps data
already reachable via viewer-gated REST, so no per-event filter is
needed until a new event family is introduced. The invariant is
written into the handler docstrings so a future PR can't silently
add admin-only events.

Resolves THREAT_MODEL F6/I and F6/D.
2026-04-24 15:01:20 -04:00
e53b580767 test(api): RBAC contract test — viewer JWT on every classified route
New test walks app.routes, classifies each APIRoute as admin/viewer/open
by identity-matching require_admin / require_viewer closures inside the
route's dependency tree, then asserts:
  - admin routes return 403 to a viewer JWT
  - viewer routes return neither 401 nor 403 to a viewer JWT
SSE routes skipped (separate scope under F6). Role hints deliberately
NOT encoded in the OpenAPI spec — classification stays server-side so
/openapi.json can't be used to enumerate admin routes.

Resolves THREAT_MODEL F2/I + F5/E; paired with the existing
test_schemathesis.py::test_auth_enforcement (401-half coverage).
2026-04-24 14:00:12 -04:00
99ccd41bb5 feat(api/artifacts): explicit Content-Disposition + X-Content-Type-Options
Harden the attacker-controlled artifact download path (F7) with explicit
response headers instead of relying on Starlette's defaults (which only
emit attachment for non-ASCII filenames and never set nosniff). Also
resolves the THREAT_MODEL F7 path-traversal row (containment check was
already in _resolve_artifact_path) and the fleet-deploy detail=str(e)
audit (all four sites are admin-gated deliberate validator UX or
structured worker-response fields).
2026-04-24 13:24:34 -04:00
ec1079e78b feat(profiler): wire p0f-v2 matcher into sniffer_rollup priority chain
The ~30-signature hand-rolled p0f-lite table in decnet/sniffer/p0f.py
misses most real-world attackers (yesterday's SLOW SCAN being a
textbook case — 9 hours of events, 19 hits, os_guess = NULL). The
375-sig vendored p0f v2 DB was already there; this commit actually
calls it.

New resolution chain in sniffer_rollup:

  1. Enabled OS-fingerprint providers (p0f-v2 default, via
     DECNET_OSFP_PROVIDERS) tried in declared order. Provider with
     highest-confidence match across all enabled sources wins.
  2. Modal os_guess label from the sniffer's hand-rolled p0f.py.
     Kept as fallback because v2's DB predates post-2006 kernels.
  3. TTL bucket (linux / windows / embedded). Coarse but never wrong.

Wiring details:

- _match_via_osfp_providers: never raises — factory / provider
  failures collapse to None and the chain falls through to the
  old modal-label / TTL path. A corrupt .fp file or misconfigured
  DECNET_OSFP_PROVIDERS must never wedge a profile rebuild.
- tcp_fp_context tracks whether the LATEST tcp_fp snapshot came
  from a passive SYN ('syn' → p0f.fp) or an active prober probe
  ('synack' → p0fa.fp). Routes to the right sig list.
- initial-TTL normalisation via decnet.sniffer.p0f.initial_ttl.
  Observation's TTL may be N hops below the OS's initial; v2
  signatures match on the canonical bucket.

Soft-field semantics on Signature.score(): df and total_len are now
skip-checked when the observation is missing them. Sniffer doesn't
currently emit either SD field; a literal-constraint sig
shouldn't hard-reject a match solely because of upstream
incompleteness. Hard fields (window, ttl, options_sig, quirks)
still hard-reject on absent/mismatched input — those are the real
discriminators. Promote df / total_len back to hard the moment the
sniffer starts emitting them.

+2 integration tests on TestSnifferRollup, +2 soft-field tests on
test_signature. Full regression: 166 tests across tests/prober/osfp
+ tests/profiler all green.
2026-04-24 11:56:50 -04:00
8a430bf725 feat(prober/osfp): P0fV2Provider + factory dispatch
- decnet/prober/osfp/p0f/provider.py: P0fV2Provider loads the four
  vendored .fp files into per-context signature lists (syn / synack /
  rst / stray) and matches via highest-specificity score across the
  relevant list. Also auto-picks up p0f-decnet.fp if present (GPL-3.0
  additions land there later, empty for now).
- decnet/prober/osfp/factory.py: get_provider / get_all_providers /
  reset_cache, mirrors decnet/geoip/factory exactly. Env-dispatched
  via DECNET_OSFP_PROVIDERS (default "p0f-v2"). Reserved names
  "nmap-osdb" (pending Fyodor's grant) and "decnet-observed" (our
  future curated DB) raise NotImplementedError — visible on the
  factory surface so a typo doesn't silently fall through.
- decnet/prober/osfp/__init__.py now re-exports the public API so
  callers use `from decnet.prober.osfp import get_provider` without
  reaching into submodules (upholds the provider-subpackage rule).

15 new provider+factory tests covering:
- All four DB contexts load (262/61/46/6 sigs per inventory).
- Known-good Linux 2.6 SYN + Linux 2.2 SYN-ACK match end-to-end.
- Unknown observations / contexts return None, not raise.
- Factory memoises, env override honoured, unsupported names raise.
- Reserved names raise NotImplementedError (not silent None).

`sniffer_rollup` wiring lands in the next commit.
2026-04-24 11:50:46 -04:00
41ff6b4b03 feat(prober/osfp): p0f v2 .fp parser + Signature scoring
First code layer of the OS-fingerprinting work on top of yesterday's
vendored p0f v2 database. Three new modules, all pure (no I/O outside
of the parser's file read):

- decnet/prober/osfp/base.py — Provider protocol + OsMatch dataclass
  matching the established Provider convention in decnet/geoip and
  decnet/bus. Docstring spells out the never-raise invariant: malformed
  input returns None, so a single bad event can't wedge a whole
  attacker-profile rebuild.

- decnet/prober/osfp/p0f/signature.py — Signature dataclass + three
  predicate helpers (WindowSpec / IntSpec / OptionToken) encoding the
  p0f v2 DSL's wildcard / modulo / MSS-multiple / MTU-multiple
  semantics. Scoring is our extension on top of upstream p0f's
  first-match-wins policy: each signature carries a precomputed
  specificity in [0, 1] so the factory can pick the most-specific
  match when multiple signatures fire against one observation.

- decnet/prober/osfp/p0f/format.py — .fp line parser. Every shipped
  field variant from the DSL spec at the top of p0f.fp is covered
  (Snn / Tnn / %nnn / * for window; T0 vs T; -/@/* os-genre prefixes;
  quirks as concatenated single-letter flags; '.' sentinels for
  no-options / no-quirks). Malformed lines log a warning and skip
  instead of aborting the whole file — 1 bad row must not cost the
  other 374.

20 parser tests + 14 scoring tests. Full vendored-DB smoke tests
confirm all 375 signatures parse round-trip (262 SYN + 61 SYN-ACK +
46 RST + 6 stray) and every computed specificity lands in [0, 1].
2026-04-24 11:47:54 -04:00
9232031ec7 feat(db): extend SessionProfile schema with DEBT-036 keystroke features
Adds the three signal columns motivated by the manual keystroke
analysis in DEBT-036 directly to the SessionProfile table. Pre-v1 so
we modify the schema in place — Alembic arrives at v1.

Columns:
- kd_top_bigrams (TEXT) — JSON of top-N most-common digraphs with
  mean IAT per bigram. Complements kd_digraph_simhash ("same typist?")
  with "same typist in same mental state?" (tired / rested / distracted
  shifts bigram-specific IATs measurably).
- kd_start_of_action_latency (REAL/DOUBLE) — median IAT of the first
  keystroke after an idle gap > 1s. Separates "initiating a command"
  from "executing a remembered one"; real humans have measurable
  start-of-action latency, bots don't.
- kd_pause_hist_burst / _think / _distracted (INT) — three-bucket
  histogram (counts, <0.2s / 0.2-1.5s / >1.5s). More discriminating
  than the existing flat burst_ratio / think_ratio pair: C2 operators
  concentrate in burst with a thin tail; opportunistic humans have a
  fat think bucket and a long distracted tail.

Both backends get an idempotent ADD COLUMN migration
(_migrate_session_profile_table) wired into initialize() alongside
the existing _migrate_attackers_table path — guards on PRAGMA
table_info (SQLite) / information_schema.COLUMNS (MySQL) so reruns
are safe.

PII discipline comment on kd_digraph_simhash and kd_top_bigrams:
both operate on bigram CHARACTERS, never on raw input stream content.
Attacker passwords typed over SSH must not land here.

Test updated for the MySQL initialize() migration-order contract.
2026-04-24 10:45:48 -04:00
323077b383 fix(web/transcripts): fall back to shard-scan when Log row has no shard_path
sessrec.c emits the session_recorded SD blob with sid/service/src_ip/
duration_s/bytes/truncated — it never emitted shard_path. The web
handler still asked for fields.shard_path, got "", tripped the
sessions-YYYY-MM-DD.jsonl basename regex and returned
400 "invalid shard name" for every legitimate transcript request.

Handler now:
- Fast-paths when fields.shard_path IS present and validates
  (for any future emitter or ingester that backfills it).
- Otherwise enumerates sessions-YYYY-MM-DD.jsonl shards under
  ARTIFACTS_ROOT/{decky}/{service}/transcripts/ (newest first) and
  returns the first one whose per-sid index contains our sid.
- Security invariant preserved: only files whose basename matches the
  _SHARD_BASENAME_RE are ever opened, and they always resolve inside
  ARTIFACTS_ROOT. A forged fields.shard_path is silently ignored.
- Soft-fails OSError/PermissionError on the transcripts dir (decky
  containers often write it with a uid the API can't read) — returns
  404 instead of a 500 traceback.

test_forged_shard_path_blocked updated to match the new semantics:
forgery is ignored, the real shard is served via fallback. The
invariant (no /etc/passwd access) is still asserted by the fact
that status is 200 with data from the test shard.
2026-04-24 01:18:40 -04:00
e4ccf30133 fix(init): template the polkit rule on --group too
polkit rule 50-decnet-workers.rules hardcoded isInGroup("decnet"),
so when 'decnet init --group anti' installed systemd units as
User=anti / Group=anti, the API (running as anti) could no longer
systemctl start/stop decnet-*.service — polkit fell back to
'interactive authentication required', which in a daemon context is
a hard fail:

  START FAILED · COLLECTOR — Failed to start decnet-collector.service:
  Access denied as the requested operation requires interactive
  authentication.

Rename the rule to .j2, parameterise the group on {{ group }}, and
route _install_polkit through _render_template /
_write_rendered_if_changed. Now the polkit rule matches whatever
group was passed to 'decnet init'.

Test fixture updated to seed the .j2 variant.
2026-04-24 01:07:16 -04:00
edc8297af3 fix(init): gate userdel/groupdel on --purge to avoid nuking the operator
Every plain `decnet deinit` ran userdel + groupdel unconditionally. In
dev the operator may pass `--user $USER --group $USER` to avoid file
ownership churn against a source checkout — at which point deinit
would cheerfully delete their own login account.

Move user/group removal behind --purge, matching the existing
behaviour for /var/lib/decnet + /var/log/decnet. Help text updated:
--purge now clearly advertises that it also wipes the service
user/group, with an explicit warning to only run it when `decnet init`
created the account in the first place.

Test updated: plain --deinit must NOT invoke userdel/groupdel;
--deinit --purge must.
2026-04-24 00:38:51 -04:00
d61e143b71 fix(stress): unblock Locust runs from login rate-limit self-DoS
Locust spawns N virtual users (default 1000), all from 127.0.0.1 as
admin. /auth/login is rate-limited 10/5min per-IP AND per-username, so
the 11th on_start() got 429 and a RuntimeError. A @task(2) login in
the task weights turned the whole run into a 429 factory even after
ramp-up. And _login_with_retry treated 429 as non-retryable, so there
was no graceful degradation path.

Three changes, one root cause:

- decnet/web/limiter.py: read DECNET_LIMITER_ENABLED (default true).
  When false, slowapi's Limiter(enabled=False) makes @limiter.limit a
  no-op. Default ships unchanged; nobody should ever release with this
  off.
- tests/stress/conftest.py: set DECNET_LIMITER_ENABLED=false in the
  uvicorn subprocess env. Stress tests measure throughput, not rate
  limiting.
- tests/stress/locustfile.py: drop the @task(2) login — it added zero
  coverage (every user already logs in at on_start) and only generated
  contention. Teach _login_with_retry to honour 429 + Retry-After so a
  Locust pointed at a limiter-enabled server degrades gracefully
  instead of crashing on_start.
2026-04-24 00:13:15 -04:00
ae92948e22 test(live): align mqtt/postgres/mysql live tests with honeypot + loop realities
Three unrelated test-correctness fixes exposed by running tests/live:

- test_mqtt_live: honeypot defaults to auth-required (post-2018
  realistic broker). Anonymous CONNECT is rejected with CONNACK rc=5,
  which the "accept" / "subscribe" tests misread as a failure. Pass
  MQTT_ACCEPT_ALL=1 via a new env= override on the live_service factory
  so only those two tests opt into accept-all.
- test_postgres_live::test_auth_hash_logged: connected with
  dbname='prod', which isn't in the honeypot's per-instance DB list, so
  Postgres (correctly) rejected at startup before asking for a
  password — blowing past the auth event the test asserts on. Target
  'postgres' (always in _BASE_DBS) to reach the auth stage.
- test_mysql_backend_live: the module-scoped mysql_test_db_url fixture
  is bound to the module loop, but function-scoped tests default to
  their own per-function loops. Any reuse of the asyncmy pool then
  tripped "Future attached to a different loop". Pin the whole module
  with pytest.mark.asyncio(loop_scope='module').
2026-04-23 22:06:55 -04:00
ea95a009df refactor(tests): move flat tests/*.py into per-subsystem subfolders
Groups every flat test_*.py under the module it exercises, matching the
existing tests/{profiler,sniffer,prober,collector,correlation,cli,web,
topology,swarm,bus,updater,api,docker,geoip,...} layout. New folders:
services/, fleet/, config/, logging/, db/ (+ db/mysql/), telemetry/,
mutator/, core/.

Path-dependent __file__ references bumped an extra .parent in three
files that moved one level deeper:
- tests/sniffer/test_sniffer_ja3.py   (template path)
- tests/services/test_ssh_capture_emit.py (template path)
- tests/cli/test_mode_gating.py  (REPO root)
- tests/web/test_env_lazy_jwt.py (repo var)

Also drops two SQLite runtime artifacts (test_decnet.db-{shm,wal}) that
were leaking into the repo from a previous test run.

Fixes two test_service_isolation cases that patched asyncio.sleep (no
longer on the profiler main-loop hot path — same pre-existing bug I
fixed earlier in test_attacker_worker.py) by patching asyncio.wait_for
and passing interval=0.
2026-04-23 21:34:25 -04:00
1854f9de28 fix(tests): profiler worker tests patched asyncio.sleep, but main loop uses wait_for
Since the event-driven shutdown refactor (0fbb07c), the profiler main
loop is asyncio.wait_for(shutdown.wait(), timeout=interval) — no sleep
on the hot path. The four worker tests that patched asyncio.sleep to
raise CancelledError on the Nth call were silently no-op'ing and
hanging on the real 30 s wait_for timeout.

Replace the sleep patches with a shared _cancel_after helper that
patches wait_for itself. Pass interval=0 so the loop ticks without
delay between iterations.
2026-04-23 21:14:45 -04:00
ffc275f051 feat(geoip): country-code enrichment via RIR delegated-stats
Populates Attacker.country_code + country_source (MVP) using the five
RIR delegated-stats files (ARIN/RIPE/APNIC/LACNIC/AFRINIC). Offline,
license-free, no outbound traffic that could burn honeypot stealth.

- decnet.geoip package with factory/base/lookup + rir/ subpackage
  (fetch/parse/provider) mirroring the db + bus factory convention
- Profiler._build_record calls enrich_ip on every upsert
- Idempotent ALTER TABLE migrations for both SQLite and MySQL
- decnet geoip refresh/lookup CLI (master-only)
- /var/lib/decnet/geoip seeded by decnet init
- DECNET_GEOIP_ENABLED=false kill-switch; set in tests/conftest.py so
  unit tests never trigger the first-access fetch
2026-04-23 21:12:38 -04:00
07bf3dc8cb feat(config): promote /etc/decnet/decnet.ini to real config with domain sections
The config file `decnet init` dropped at /etc/decnet/config.ini was a
stub with a single [decnet] header saying 'reserved for future structured
settings.' Admins who wanted to tune DECNET_API_HOST, DECNET_DB_URL,
DECNET_BATCH_SIZE, etc. had to hunt env.py for the exact variable name
and drop it in .env.local.

Changes:
- decnet/config_ini.py — adds a _DOMAIN_MAP translation table covering
  [api], [web], [database], [bus], [swarm], [logging], [ingester],
  [tracing]. Loads regardless of mode; unknown keys inside a known
  section log a WARNING (operator typos shouldn't be silent).
  Explicit key map (not auto kebab-to-snake) so [web] admin-user lands
  in DECNET_ADMIN_USER without silently renaming the env-var contract
  consumers import from decnet.env.
- decnet/cli/init.py — renames the placeholder target config.ini →
  decnet.ini (unifies with the name already used by load_ini_config and
  the enroll bundle's _render_decnet_ini). Placeholder body now shows
  every domain section as a commented example so admins learn the
  shape by reading. Deinit removes both decnet.ini and the legacy
  config.ini so upgrading hosts leave no orphan file.

Precedence is unchanged: real env > INI > built-in default in env.py.
os.environ.setdefault means systemd EnvironmentFile= and one-off
DECNET_FOO=bar decnet ... invocations always win.

Secrets explicitly NOT moved to the INI:
 - DECNET_JWT_SECRET
 - DECNET_ADMIN_PASSWORD
 - DECNET_DB_PASSWORD
They stay in .env.local / EnvironmentFile= — never in a group-readable
INI, never in a diff, never on the dashboard.

Dev/profiling flags (DECNET_DEVELOPER, DECNET_EMBED_*, DECNET_PROFILE_*)
also stay env-only per maintainer direction — dev knobs shouldn't
be one 'I'll flip this for tonight' away.

Tests: +5 in test_config_ini.py (domain sections load regardless of mode,
env beats INI for domain keys, unknown key warns, absent section is
no-op, role section beats domain section via setdefault precedence). +1
in test_init.py (placeholder writes decnet.ini with every section
header present as commented guidance).

31 tests pass across the two files (was 26).
2026-04-23 18:21:00 -04:00
1753eca198 feat(deploy): templatize systemd services on install_dir via Jinja2
Distros reserve /opt for different things (some package managers own it
outright), and a DECNET install that wants to live at /srv/decnet or
/usr/local/decnet had to hand-edit 13 service files post-install.

Converts every deploy/decnet-*.service to a .j2 template keyed on
{{ install_dir }}, rendered by `decnet init` at install time. All other
paths (log_dir, state_dir, runtime_dir, user, group) stay standard —
only install_dir varies.

Changes:
- deploy/decnet-*.service → deploy/decnet-*.service.j2 (13 files).
- decnet init gains --install-dir (default /opt/decnet, preserves
  existing behaviour byte-for-byte). Validates absolute-path at the
  CLI boundary. Threads through useradd --home-dir and the dir-creation
  list so the filesystem layout matches the rendered templates.
- _install_units renders via Jinja2 with StrictUndefined (typo → loud
  error, not a silent broken unit). SHA over rendered output so
  operators with a custom install_dir get idempotent re-runs.
- decnet.target, tmpfiles.d, polkit rule stay static — they don't
  reference install paths.
- 4 new tests: custom install_dir renders into units, default remains
  /opt/decnet, relative paths rejected, second run with same custom
  dir is idempotent.
2026-04-23 18:08:26 -04:00
4418608a54 fix(bus): silently drop publishes on closed bus instead of raising
Worker bus instances (collector, ingester) close their private buses
in finally blocks on shutdown, but stream threads holding closure
references kept calling publish after close — one `RuntimeError:
publish on closed bus` per stream line, caught by publish_safely
and logged per call, flooding server logs.

Changes:
- `UnixSocketBus.publish()` now drops post-close calls. First drop
  WARNs loudly (bus is critical infra — silent drops would hide real
  problems); subsequent drops on the same instance log at DEBUG to
  prevent the flood. Sticky `_closed_publish_warned` flag, reset
  naturally per new bus instance.
- `make_thread_safe_publisher` short-circuits on a closed bus before
  marshalling a coroutine onto the loop. Avoids the wasted scheduling
  work in the hot shutdown path.

Degradation is safe: callers go through `publish_safely`, which
already treats exceptions as 'dropped notification, DB is source of
truth.' We just stop manufacturing the exception in the first place
for a known-benign condition.
2026-04-23 18:00:47 -04:00
eb2308d9e1 fix(bus): retry app-bus connect with backoff instead of one-shot veto
A startup race between `decnet bus` being ready and the API's lifespan
hitting `get_app_bus()` at api.py:135 would set `_tried = True`
permanently, poisoning the singleton for the rest of the process: the
dashboard shows BUS OFFLINE, topology SSE falls into the bus-is-None
snapshot-only branch, mutator publish calls no-op. Only an API
restart recovered.

Replaces the one-shot veto with a time-gated retry keyed on a
`_last_failure_ts` monotonic timestamp plus a 2 s backoff. Publishers
on the hot path still pay at most one connect attempt every 2 s when
the bus is down, but the singleton auto-recovers within 5 s (one
dashboard poll) once the bus comes up.

The asyncio lock still serialises concurrent callers so the bus server
doesn't get stampeded with parallel connect attempts on startup.
2026-04-23 17:59:17 -04:00
ef4179ea1f feat(api): opaque 500 handler + error_id correlation for unhandled exceptions
Registers a generic @app.exception_handler(Exception) that catches anything
uncaught in route handlers / dependencies. Prod response is opaque:
{detail: 'Internal Server Error', error_id: <uuid4 hex>}. Dev mode
(DECNET_DEVELOPER=True) adds exception_type and traceback fields so
failures are debuggable without tailing server logs.

The error_id is logged alongside the full traceback server-side, letting
operators correlate a user's 500 report with the exact exception via
`grep <error_id> /var/log/decnet.log`.

FastAPI's own HTTPException routing and the existing
RequestValidationError / ValidationError / RateLimitExceeded handlers
still take precedence — this handler only fires on genuinely-uncaught
exceptions.

Flips threat model F1/I 'traceback / stack trace leakage' from ? to M
and logs a follow-up checklist entry for 4 detail=str(e) sites in the
fleet deploy router (admin-gated, different threat class, separate
audit).
2026-04-23 14:07:32 -04:00
2f4f81e5de feat(api): rate-limit /auth/login + scaffold threat model
Adds slowapi two-bucket rate limit on /auth/login — 10 attempts per
5 minutes per-IP AND per-username, tripping either → 429. Per-IP
catches botnets hitting one account; per-username catches distributed
credential stuffing against one account. In-memory storage: dashboard
API is single-process, Redis is disproportionate for v1.

X-Forwarded-For is deliberately NOT trusted (spoofable); reverse-proxy
deployments get one shared bucket per proxy IP. Logged in the threat
model as accepted risk DA-08, to be revisited when a verified-proxy
config lands.

Also scaffolds development/THREAT_MODEL.md with STRIDE-per-element
methodology, system-context DFD, and Dashboard↔API as the first fully
worked component (7 sub-flows, ~50 threat entries). F1 Authn ships
with 3 threats mitigated: rate limit (new), uniform 401 (verified
already in place), bcrypt length clamp (verified already in place via
Pydantic max_length=72).
2026-04-23 13:25:28 -04:00
8cbb7834ef feat(web): SMTP victim-domain + stored-mail panels on attacker detail
Adds GET /attackers/{uuid}/smtp-targets (viewer) and GET /attackers/{uuid}/mail
(admin) endpoints, plus two new sections on the attacker detail page:
VICTIM DOMAINS rollup (aggregate-only, federation-gossip-safe) and STORED MAIL
with a drawer that decodes headers, lists attachments, and downloads the raw
.eml via the existing artifact endpoint (?service=smtp).
2026-04-22 22:33:53 -04:00
d43303251d feat(profiler): track SMTP victim domains per attacker
New SmtpTarget table records each (attacker, domain) pair observed via
the SMTP honeypots. Only the domain is stored — local-parts are dropped
at ingestion, so this table holds no user-identifying data beyond the
target organisation's identity.

The profiler worker extracts domains from rcpt_to / rcpt_denied /
message_accepted events, normalizes them (lowercase, strip local-part,
drop blocked TLDs), and upserts one row per pair with a running count +
first_seen / last_seen.

Three repo methods shipped:
  * increment_smtp_target(attacker, domain) — upsert + bump
  * list_smtp_targets(attacker) — per-attacker view
  * smtp_target_seen(domain) — cross-attacker aggregate, shaped as the
    federation-gossip RPC that V2 will expose.

The gossip-query shape is load-bearing: each operator can answer
"have any of your attackers targeted corp1.com?" without leaking
which attackers or when — the aggregate returns a bool + total count
+ first/last seen, nothing else.
2026-04-22 22:23:27 -04:00
c50448995b feat(smtp): capture full messages + attachments to disk
SMTP template now writes each accepted DATA body as a .eml file into a
bind-mounted per-decky quarantine dir and emits a `message_stored` log
with sha256, size, decoded headers, and an attachment manifest
(filename + sha256 + size + content-type). Attachment hashing uses the
*decoded* payload so operators can match against VT / MalwareBazaar
directly. Body accumulator is capped at SMTP_MAX_BODY_BYTES (default
10 MB, matching the EHLO SIZE advert) so a streaming client can't OOM
the container.

The existing /api/v1/artifacts/{decky}/{stored_as} endpoint now takes
an optional ?service= query param (defaults to ssh for back-compat)
and can serve .eml files out of the smtp subdir. Forensic metadata
rides the normal log pipeline, same as SSH file_captured.
2026-04-22 22:17:50 -04:00
119b4e8724 feat(db): add session_profile table for keystroke-dynamics fingerprints
New purpose-built table with schema_version column committed from day one
so V2 federation gossip can cluster sessions across operators without
retrofitting. Ships with the empty write path (upsert_session_profile);
ingestion of keystroke features (IKI moments, control-char rates, digraph
SimHash) is tracked as V2 work.

Closes gap #2 from SIGNAL_CAPTURE_AUDIT.md.
2026-04-22 21:39:17 -04:00
d3321324eb feat(sniffer): capture SSH client banner from TCP stream
Parse RFC 4253 §4.2 identification strings from the first attacker→decky
data segment on TCP/22; emit ssh_client_banner syslog events and bus
fan-out. Profiler's sniffer_rollup dedupes observed banners into a new
AttackerBehavior.ssh_client_banners JSON column.

Closes gap #3 from SIGNAL_CAPTURE_AUDIT.md.
2026-04-22 21:37:01 -04:00
8181f39ae2 feat(profiler): persist raw SSH KEX algorithm ordering
Prober already emits kex_algorithms in hassh_fingerprint syslog events, but
the raw ordered list was only queryable via the generic bounty store. Add a
dedicated AttackerBehavior.kex_order_raw column (TEXT, JSON list) so
post-v1 KEX-order fingerprinting has a typed, indexable home.

Pipeline:
  - sniffer_rollup() now consumes hassh_fingerprint events and collects
    distinct kex_algorithms strings across ports.
  - build_behavior_record() JSON-encodes the list (NULL when empty).
  - sqlmodel_repo._deserialize_behavior() parses it back into a list.

Closes pre-v1 gap #1 from SIGNAL_CAPTURE_AUDIT.md.
2026-04-22 21:29:46 -04:00
5704e8fcce fix(topology): delete topology_mutations in delete-cascade
delete_topology_cascade manually deletes status_events, edges, deckies
and lans but overlooked topology_mutations, so deleting any topology
that ever had a mutation enqueued (i.e. edits while active|degraded)
failed with an FK IntegrityError. Add the missing DELETE and extend
the cascade test to seed a mutation row.
2026-04-22 17:50:30 -04:00
91111ea7ee feat(cli): add decnet init --deinit to undo a previous bootstrap
Reverse of init, step-by-step: systemctl disable --now decnet.target,
remove every decnet-*.service + decnet.target unit file, drop the
polkit rule, drop the tmpfiles.d entry, daemon-reload, remove
/etc/decnet + /etc/decnet/config.ini, /run/decnet, /opt/decnet, and
userdel/groupdel the decnet identity.

Preserves /var/lib/decnet and /var/log/decnet by default — those
hold operator data. Pass `--deinit --purge` to rm -rf them too.
Idempotent on a clean host (every step prints [SKIP]). Honours
--dry-run.

5 new tests cover the full-undo path, --purge, idempotent clean-host
deinit, dry-run side-effect-free behaviour, and the --purge without
--deinit guard.
2026-04-22 14:31:56 -04:00
3dae44c652 feat(cli): add decnet init one-shot master-host bootstrap
Creates the decnet system user/group, installs every unit file from
deploy/ into /etc/systemd/system, drops the polkit rule, seeds
/opt/decnet + /var/{lib,log}/decnet + /etc/decnet + /run/decnet,
writes a placeholder /etc/decnet/config.ini, applies the new
tmpfiles.d entry so /run/decnet survives reboots, daemon-reloads,
and `systemctl enable --now decnet.target`.

Idempotent (re-runs print [SKIP] on already-configured items),
--dry-run previews the plan without touching anything, --no-start
defers the target start, --force overwrites even matching unit
files. Master-only (added to MASTER_ONLY_COMMANDS).

9 orchestration tests cover the non-root gate, dry-run, useradd/
groupadd argv, SKIP on present user/group, unit-file idempotency,
--force overwrite, --no-start suppression, happy path, and the
"deploy/ not found" error message.
2026-04-22 14:28:11 -04:00
13ea916943 feat(workers): add start + start-all endpoints (systemd supervisor)
POST /api/v1/workers/{name}/start — 202 on acceptance, 404 unknown
worker, 503 if the unit file is not installed, 502 if systemctl
returns non-zero (stderr snippet in detail, full stack logged).
Admin only.

POST /api/v1/workers/start-all — best-effort: walks the worker list
in dependency order (bus → api → data-plane), skips already-active
and uninstalled units, aggregates outcomes into
{started, already_running, failed[]}. Returns 200 even on partial
failure; the caller reads the three lists.

Both endpoints delegate to the systemd_control helper, so the attack
surface for "what gets executed" is locked to `decnet-<validated-name>
.service` at two layers (router KNOWN_WORKERS + helper regex).
2026-04-22 14:12:29 -04:00
0fbb07c2ec feat(workers): bus-backed Workers panel (registry, control, installed flag)
Ships the backend half of Config → Workers:

* Worker registry aggregates `system.*.health` + `system.bus.health`
  heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop
  out of a 90s window (3× the 30s heartbeat interval).
* `GET /api/v1/workers` returns the snapshot plus `bus_connected`
  (so the UI can explain "all UNKNOWN" when the bus socket is down)
  and a per-row `installed` flag populated from
  `systemctl list-unit-files decnet-*.service` (cached 30s).
* `POST /api/v1/workers/{name}/stop` publishes a stop intent on
  `system.<name>.control`; workers listen via the shared control
  listener in `bus/publish.py`.
* Heartbeat + control listener wired into collector / profiler /
  sniffer / prober / mutator worker loops. API self-heartbeats too
  so the panel always has one ground-truth row.
* Topic helper `system_control(name)` + tests covering builder
  validation, control listener shutdown path, and the API surface
  (auth gating, bus-connected field, unknown-name 404).

Adds `StartFailure` / `StartAllResponse` models in anticipation of
the upcoming start endpoints (DEBT-034).
2026-04-22 14:10:39 -04:00
fcaac648a4 feat(web): add systemd_control helper for worker unit management
Thin async wrapper over `systemctl` — never shell=True, always
create_subprocess_exec. Unit names are built from
`decnet-<validated-name>.service`; the regex check is defence in depth
on top of the router-level KNOWN_WORKERS validation.

Exposes start / stop / is_active / list_installed; last is cached for
30s to keep the Workers panel cheap under REFRESH spam. On non-systemd
hosts list_installed returns an empty set, so the UI renders with
every row marked not-installed instead of 500-ing.
2026-04-22 14:08:35 -04:00
a63708a3d1 test(templates): cover instance_seed helper and update service tests
Add tests/service_testing/test_instance_seed.py — pins NODE_NAME to assert
determinism of seeded functions and sweeps NODE_NAMEs to assert cross-fleet
divergence. Conftest gains load_real_instance_seed() so template tests see
the real seeding behavior instead of a stub. Existing template tests updated
to pin NODE_NAME and match seeded outputs.
2026-04-22 09:24:28 -04:00
6725197d58 test(web): transcripts API + attacker-transcripts router coverage
Paging, truncation surfacing, admin gate, path traversal, sid-regex and
decky-mismatch rejection for /transcripts; mirror coverage for
/attackers/{uuid}/transcripts. Flips the Session Recording box in the
roadmap (sessrec pty relay now shipping end-to-end).
2026-04-21 23:11:40 -04:00
6e522c5a55 feat(web): transcripts API + repository lookups
Adds get_attacker_transcripts (mirror of artifacts for session_recorded
logs) and get_session_log for sid→shard resolution. New
/api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events
out of the shared JSONL day-shard via an mtime-keyed byte-offset index
— never scans the whole shard per request. New
/api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both
endpoints admin-gated.
2026-04-21 23:06:39 -04:00