Commit Graph

8 Commits

Author SHA1 Message Date
2c876b4d86 fix(bounties): strip per-request fields from fingerprint payloads
add_bounty dedups on (attacker_ip, bounty_type, full payload JSON).
Three fingerprint-family bounties (http_useragent, ip_leak,
http_quirks) were including method/path / header_count in their
payloads — fields that vary per request — so a scanner hitting 100
paths produced 100 rows instead of 1, which is what was swelling
AttackerDetail.

Payloads now carry identity-only fields:

- http_useragent: {fingerprint_type, value}. UA + path combinations
  no longer collide; one row per distinct User-Agent string.
- ip_leak: {source_ip, real_ip_claim, source_header, headers_seen}.
  One row per distinct (proxy source, leaked IP, leaking header)
  triple; repeat hits with the same header on different paths dedup.
- http_quirks: {fingerprint_type, order_hash, order, casing_hash,
  casing_category, stable_count, tool_guess}. No more header_count
  (included volatile headers; Cookie-presence variance broke dedup).

Per-request context (path, method, etc.) was never load-bearing for
analysts — the logs table already answers "when + where" at
per-event resolution. The bounty table is for stable identity.

UI:
- FpHttpQuirks renderer drops the method/path footer line and the
  header_count/duplicates tags; shows stable_count instead.
- LEAKED-IPs tooltip on AttackerDetail swaps "X on GET /path" for
  "Leaked via X; source 203.0.113.42" — same information, stable.

Tests add a "payload stable across paths and methods" assertion on
http_quirks — locks the contract so a future regression that sneaks
a per-request field back in fails loudly.

Existing duplicate bounty rows don't retroactively collapse.
Dev: `decnet db-reset --i-know-what-im-doing drop-tables` and
restart. Prod: one SQL pass to dedup by (attacker_ip, bounty_type,
payload) — trivial but not automated.
2026-04-24 17:58:54 -04:00
dccb410bb3 feat(http): header-quirks fingerprint — order + casing + tool guess
Per-request HTTP fingerprint derived from the header dict we already
log. Captures:

- order_hash: SHA-256 prefix (16 hex) over the lowercased header-name
  sequence, minus volatile/per-request headers (Content-Length,
  Cookie, Authorization, XFF family, trace IDs). Stable identity for
  a given client stack regardless of which target / path is hit.
- casing_hash: same shape but over the per-header casing category
  (Title-Case / lower / UPPER / mixed). Attackers frequently spoof
  User-Agent but forget their stack sends `user-agent` while browsers
  send `User-Agent`.
- tool_guess: prefix match against curl / python-requests /
  Go-http-client / nmap-nse signatures. Cheap, best-effort — the
  hash is the hard signal.
- duplicates: reserved for when the HTTP template switches from
  dict(request.headers) to a list form; today it always fires empty
  because dict() collapses duplicates.

Payload is a fingerprint bounty (bounty_type="fingerprint",
fingerprint_type="http_quirks"). Bounty dedup collapses identical
hashes per attacker — one row per distinct fingerprint — so a chatty
scanner doesn't spam the vault, but a tool-chain change from the
same IP surfaces as a new row.

UI renderer (FpHttpQuirks) shows the two hashes, tool guess badge in
violet, casing/count tags, and a collapsible header-order list.
Added to the passiveTypes group so it nests with JA3/JA4L/etc. in
the AttackerDetail fingerprints panel.

One library note: the naive "title-case" classifier failed on tokens
like `X-Forwarded-For` because Python's "".islower() returns False
so `p[1:].islower()` rejects single-letter tokens like the `X`.
Fix: explicitly accept single-char tokens when uppercase.
2026-04-24 17:51:40 -04:00
2a0c5ca410 feat(attackers): XFF mismatch detection — attacker IP leak bounties
Attackers routinely front their scanners with VPNs/proxies, so the
TCP source we log is the proxy egress, not the real host. But a
surprising number of attacker setups are misconfigured: the proxy
forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP
/ CDN-variant) header. From our side that's a free attribution leak.

New _detect_ip_leak extractor in decnet/web/ingester.py fires at
ingest time per HTTP request. Logic:

1. Require service=http, source_ip present, headers present.
2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or
   CIDRs) → legitimate reverse-proxy forwarding, skip.
3. Walk proxy-family headers in priority order: Forwarded (RFC 7239)
   → X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP.
4. Extract the left-most parseable IP from the winning header.
5. If that IP differs from the TCP source → emit a bounty with
   bounty_type="ip_leak" carrying {source_ip, real_ip_claim,
   source_header, headers_seen, path, method}.

Storage is the existing Bounty table — no schema change; de-dup is
handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so
repeat requests with the same leaked IP don't spam.

AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN
listing distinct real_ip_claim values; hover tooltip shows the source
header + path of the most recent leak. Only shown when at least one
ip_leak bounty exists.

RFC 7239 Forwarded parser handles the full vocabulary — bare IPv4,
IPv4:port, quoted, IPv6 in brackets, IPv6 with port — returning only
IPs that actually parse.

Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For
mismatches". Phase 3 of the three-phase Attacker Intelligence series
(phases 1: scanned-vs-interacted, 2: PTR records already shipped).

DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's
"revisit when verified-proxy config lands" note — same token set
future rate-limit work will consume.
2026-04-24 17:39:03 -04:00
351a8939c3 feat(attackers): scanned vs. interacted service bucketing on detail page
Adds a new card on AttackerDetail: SCANNED · N services | INTERACTED
WITH · M services. Distinguishes port-scanners (N high, M=0) from
actual engagement (M>0) at a glance — the analyst's first question
when triaging a new attacker row.

Classifier lives in decnet/correlation/event_kinds.py, a single
source of truth for the event-type vocabulary:

- INTERACTION_EVENT_TYPES — command-family (command/exec/query/...),
  SMTP engagement (mail_from/rcpt_to/message_accepted), file/payload
  activity (file_captured/upload/download_attempt/retr), pub/sub
  (publish/subscribe), recorded TTY sessions.
- NOISE_EVENT_TYPES — DECNET-internal (startup/shutdown/parse_error/
  unknown_*).
- Everything else defaults to scan. Conservative by design: new
  template verbs show up as "scanned" until explicitly promoted.

Bucket logic: a service is "interacted" if ≥1 of its events
classifies as interaction; otherwise "scanned" if ≥1 scan event;
noise-only services drop. Disjoint by construction.

Deliberate no-schema path: compute on-the-fly in the detail endpoint
via SELECT DISTINCT service, event_type FROM logs. Small result set
(tens of pairs per attacker), cost is trivial vs. the existing
behavior/commands queries. Trade-off: one more DB round-trip per
detail view in exchange for zero ALTER TABLE migration pain and
immediate classifier-change feedback loop.

Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of
interactions that carry executable text), with a comment pointing at
the new canonical module.

Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral
Profiling — Services actively interacted with".
2026-04-24 17:12:20 -04:00
ce6b4a4174 fix(web/api): scope DB-retry sleep so tests don't starve background tasks
test_lifespan_db_retry patched decnet.web.api.asyncio.sleep to skip the
DB-retry backoff. Problem: asyncio is a shared module — the patch leaks
to every caller that looked up asyncio.sleep via `import asyncio`,
including run_health_heartbeat's own sleep loop. That heartbeat task
spawns inside the same lifespan; with its sleep mocked, the while-loop
spins tight, starves cancellation, and leaves an orphan task that
pytest-timeout eventually signals — surfacing as the 'Task exception
was never retrieved' warnings the user saw when running the suite.

Fix: give decnet.web.api a local binding `_retry_sleep = asyncio.sleep`
for the DB-retry wait, and have the test patch that instead. Narrowly
scoped, no impact on asyncio.sleep callers elsewhere.

Test timing before: 12s with --timeout=10 (interrupted by signal).
Test timing after: 0.58s. Full tests/web slice: 27s → 7.1s with the
spurious warnings gone.
2026-04-24 17:11:44 -04:00
ea95a009df refactor(tests): move flat tests/*.py into per-subsystem subfolders
Groups every flat test_*.py under the module it exercises, matching the
existing tests/{profiler,sniffer,prober,collector,correlation,cli,web,
topology,swarm,bus,updater,api,docker,geoip,...} layout. New folders:
services/, fleet/, config/, logging/, db/ (+ db/mysql/), telemetry/,
mutator/, core/.

Path-dependent __file__ references bumped an extra .parent in three
files that moved one level deeper:
- tests/sniffer/test_sniffer_ja3.py   (template path)
- tests/services/test_ssh_capture_emit.py (template path)
- tests/cli/test_mode_gating.py  (REPO root)
- tests/web/test_env_lazy_jwt.py (repo var)

Also drops two SQLite runtime artifacts (test_decnet.db-{shm,wal}) that
were leaking into the repo from a previous test run.

Fixes two test_service_isolation cases that patched asyncio.sleep (no
longer on the profiler main-loop hot path — same pre-existing bug I
fixed earlier in test_attacker_worker.py) by patching asyncio.wait_for
and passing interval=0.
2026-04-23 21:34:25 -04:00
fcaac648a4 feat(web): add systemd_control helper for worker unit management
Thin async wrapper over `systemctl` — never shell=True, always
create_subprocess_exec. Unit names are built from
`decnet-<validated-name>.service`; the regex check is defence in depth
on top of the router-level KNOWN_WORKERS validation.

Exposes start / stop / is_active / list_installed; last is cached for
30s to keep the Workers panel cheap under REFRESH spam. On non-systemd
hosts list_installed returns an empty set, so the UI renders with
every row marked not-installed instead of 500-ing.
2026-04-22 14:08:35 -04:00
cbb394a160 feat(ingester): publish system.log per committed batch (DEBT-031 worker 6)
Ingester connects the bus at startup, emits a batch-committed summary
(component/flushed/position) after each successful _flush_batch.  Zero-
row flushes are suppressed so the topic stays meaningful.

Complements the collector's per-line system.log publishes: collector
signals ingress, ingester signals DB-persisted progress.  Federation
forwarder (worker 8) will subscribe to the batch-committed leaf to
trigger its upstream push.

Bus stays optional: publish_safely swallows failures, get_bus() can
return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully
functional.
2026-04-21 16:58:49 -04:00