DECNET

Author	SHA1	Message	Date
anti	590c2b0fac	feat(correlation): credential-reuse engine + reuse-correlate worker Adds CorrelationEngine.correlate_credential_reuse + the `decnet reuse-correlate` long-running worker. The worker mirrors the mutator's bus-wake + slow-tick pattern: wakes on credential.captured and attacker.observed for sub-second latency, falls back to a 60s poll if the bus is unavailable, and publishes credential.reuse.detected once per new or grown CredentialReuse row (group-deduped so a 5-cred reuse doesn't emit 5 partial events). The web ingester now publishes credential.captured after every successful Credential upsert; that bus event and the new repo helper find_credential_reuse_candidates together feed the engine pass.	2026-04-26 03:37:49 -04:00
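The group-dedup rule can be sketched as follows. The sketch assumes the engine groups candidate rows by `secret_sha256` across services. `CredRow`, `ReuseTracker`, and the event dict shape are hypothetical stand-ins and are not the real CorrelationEngine or CredentialReuse API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class CredRow:
    """Hypothetical slice of a Credential row: only hoisted columns matter here."""
    secret_sha256: str
    service: str
    attacker_ip: str


class ReuseTracker:
    """Emit one event per new or grown reuse group, never one per credential."""

    def __init__(self) -> None:
        # secret_sha256 -> (service, attacker_ip) members already reported
        self._reported: Dict[str, FrozenSet[Tuple[str, str]]] = {}

    def correlate(self, rows: List[CredRow]) -> List[dict]:
        groups: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
        for row in rows:
            groups[row.secret_sha256].add((row.service, row.attacker_ip))

        events = []
        for digest, members in groups.items():
            services = {service for service, _ in members}
            if len(services) < 2:
                continue  # one secret on one service is not cross-service reuse
            previous = self._reported.get(digest, frozenset())
            if members <= previous:
                continue  # group is neither new nor grown since the last pass
            self._reported[digest] = previous | frozenset(members)
            events.append({
                "topic": "credential.reuse.detected",
                "secret_sha256": digest,
                "services": sorted(services),
                "members": len(members),
            })
        return events
```

The tracker remembers each group's members. A second pass over the same rows therefore emits nothing. When a third service joins the group, the tracker emits exactly one updated event rather than one per credential.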
anti	e696c2beb3	refactor(ingester): drop legacy cred adapter — DEBT-039 closed Phase 3/3 of DEBT-039. Now that all eight cred-emitting services (SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP) emit the universal `secret_b64`-bearing SD shape, the ingester's legacy fork has no live emitters to handle. Deletes: - `_ingest_credential_legacy()` — synthesized native fields from username+password - The `elif _fields.get("username") and _fields.get("password")` branch in `_extract_bounty` - `_printable_filter()` — only the legacy adapter called it; the native branch trusts the emitter (encode_secret() in Python or sd_escape() in C) to have already sanitized - The legacy-adapter test cases in tests/web/test_ingester.py; their coverage moved to tests/services/test_cred_emitters.py per-service in Phase 2. The cred path is now single-shape end-to-end. A pre-migration log row carrying only username+password silently produces no Credential write — by design, since no current emitter writes that shape and keeping a code path alive for theoretical legacy data risks masking emitter regressions. Pre-v1: any historical Bounty cred rows from before commit `2f47f67` stay untouched. DEBT-039 marked resolved with summary of the three commits and the silent-loss bug fix for Redis + LDAP that fell out of execution.	2026-04-25 06:04:09 -04:00
anti	2f47f67eef	feat(creds): future-proof Credential storage model Replaces the opaque Bounty.bounty_type='credential' path with a dedicated `credentials` table whose schema is forward-compatible across every auth-bearing service in the fleet. Hoisted indexed columns (secret_sha256, principal, service, attacker_ip) carry the universal reuse-analytics signal; service-specific JSON keys ride in `fields`. Cross-service reuse queries become an indexed lookup on secret_sha256 instead of JSON_EXTRACT scans. Schema decisions baked in (per ANTI): - New `Credential` table, not extension to Bounty - Hoisted `principal` column for cross-service principal-reuse - Standardized JSON keys: every payload carries secret_b64 + secret_printable + principal universally; service-specific extras (user, domain, dn, mech, …) ride alongside The auth-helper SD-block emits the new shape natively. The ingester forks at _extract_bounty: - Native shape (SSH/Telnet, future emitters): secret_b64 present → direct upsert_credential - Legacy shape (FTP/POP3/IMAP/SMTP today): username + password → adapter synthesizes secret_{b64,sha256,printable} on the fly, upserts into the same Credential table. Tracked as DEBT-039; one-shot bridge until those service templates migrate. 
Defense-in-depth across five layers (input validation): - C helper: bytes outside [0x20, 0x7f) collapse to '?', RFC 5424 escape rules for \\, ", ]; b64 preserves exact bytes - Ingester native branch: rejects malformed secret_b64 (regex), drops the credential row but keeps the underlying Log - Ingester legacy adapter: same printable-ASCII filter as the C code; sha256 + b64 over the original utf-8 bytes (lossless, even when secret_printable is sanitized) - DB column caps with truncation warning; sha256 always over the full pre-truncation bytes so reuse queries match across truncation - JSON serialized with ensure_ascii=True so utf8mb4 columns stay safe even with non-ASCII service-specific keys Bounty.bounty_type='credential' is no longer written. Pre-v1: no historical backfill; existing rows stay untouched but unused. 595 tests pass; new tests cover the model + repo (upsert dedup, null-principal independence, cross-service reuse, filters), both ingester branches, b64 validation, sanitization preserving the fingerprinting signal in b64.	2026-04-25 05:29:26 -04:00
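Below is a minimal Python sketch of the per-secret derivation the layers above describe. It assumes the secret arrives as raw bytes. `PRINTABLE_CAP`, `secret_fields`, and `decode_secret_b64` are hypothetical names. The real column caps and validation regex live in the model and the ingester.

```python
import base64
import hashlib
import logging
import re
from typing import Optional

log = logging.getLogger(__name__)

PRINTABLE_CAP = 256  # hypothetical column cap; the real limit lives in the DB model
# Standard-alphabet base64 with correct padding; anything else is malformed.
_B64_RE = re.compile(r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?")


def to_printable(raw: bytes) -> str:
    """Same rule as the C helper: bytes outside [0x20, 0x7f) collapse to '?'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


def secret_fields(raw: bytes) -> dict:
    """Derive the universal secret_* keys from the original secret bytes."""
    shown = to_printable(raw)
    if len(shown) > PRINTABLE_CAP:
        log.warning("secret_printable truncated from %d chars", len(shown))
        shown = shown[:PRINTABLE_CAP]
    return {
        "secret_b64": base64.b64encode(raw).decode("ascii"),  # lossless, exact bytes
        "secret_sha256": hashlib.sha256(raw).hexdigest(),      # full, pre-truncation bytes
        "secret_printable": shown,                             # sanitized display copy
    }


def decode_secret_b64(value: str) -> Optional[bytes]:
    """Return the secret bytes, or None so the caller drops the Credential but keeps the Log."""
    if not _B64_RE.fullmatch(value):
        return None
    return base64.b64decode(value, validate=True)
```

The split exists for one reason. `secret_printable` may lose bytes to sanitization or truncation, but `secret_sha256` and `secret_b64` never do. Reuse lookups on the hash therefore still match across services.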
anti	c78ab032bd	fix(xff): truncate LEAKED IPs + ROTATION badge for rotation attacks `for i in $(seq 1 100); do curl -H "X-Forwarded-For: 191.100.20.$i" ...` was dumping 100 distinct IPs into AttackerDetail's LEAKED IPs row, drowning the rest of the ORIGIN section. The 100-IP wall is itself a signal (WAF-bypass-list probing) that deserves a short badge, not a flood. Backend: - get_attacker_ip_leaks gains a `limit: int = 10` parameter — caller only ever needs a sample, not the full set. - New count_attacker_ip_leaks() returns the unbounded COUNT(*) via one cheap SQL aggregate. - Detail endpoint returns {ip_leaks: [first 10], ip_leaks_total: N} so the UI can render a rotation badge independent of list length. UI: - New LeakedIPsRow component. First 5 distinct IPs rendered inline with hover tooltips (unchanged). When > 5, a `+ N more` expand button reveals the rest of the sample; when total exceeds the 10-row cap, a subtle `(+M beyond sample)` note appears. - When total ≥ 20, a red `ROTATION · N` tag renders at the start of the row with a tooltip explaining the semantic: "almost certainly XFF-rotation / WAF-bypass probing, not a real attribution leak." DB churn is deliberately not capped — 100k rows × ~500 B is tolerable. If it becomes a problem we can add an ingester-side count-and-skip; for now the UX fix is the whole story. Added test_ip_leaks_total_reported_separately_from_list asserting the endpoint shape matches what the UI consumes.	2026-04-24 18:25:46 -04:00
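The thresholds above reduce to a small pure function of the endpoint's `{ip_leaks, ip_leaks_total}` shape. The sketch below is written in Python for consistency with the backend, even though the real LeakedIPsRow is a UI component. `LeakRowView` and `leak_row_view` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

INLINE_COUNT = 5    # distinct IPs rendered inline
SAMPLE_LIMIT = 10   # rows returned by get_attacker_ip_leaks(limit=10)
ROTATION_AT = 20    # ip_leaks_total at which the ROTATION badge appears


@dataclass
class LeakRowView:
    inline: List[str]
    expandable: List[str]          # revealed by the "+ N more" button
    beyond_sample: int             # M in the "(+M beyond sample)" note
    rotation_badge: Optional[str]


def leak_row_view(ip_leaks: List[str], ip_leaks_total: int) -> LeakRowView:
    sample = ip_leaks[:SAMPLE_LIMIT]
    badge = f"ROTATION · {ip_leaks_total}" if ip_leaks_total >= ROTATION_AT else None
    return LeakRowView(
        inline=sample[:INLINE_COUNT],
        expandable=sample[INLINE_COUNT:],
        beyond_sample=max(0, ip_leaks_total - len(sample)),
        rotation_badge=badge,
    )
```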
anti	ca39552692	feat(ua): classify User-Agent into scanner/cli/library/bot/nonstandard Every http_useragent bounty now carries a `category` label plus an optional tool name and a signals list. The main analytic win is the `nonstandard` bucket — UAs like "FUCKYOU/1.0" or custom one-off scanner labels that don't match any known pattern, which today silently blend into the generic fingerprint list. Buckets (priority order): - scanner: nmap, nuclei, sqlmap, gobuster, nikto, masscan, zgrab, ffuf, wpscan, katana, burp, acunetix, nessus, openvas, arachni, whatweb, wappalyzer, etc. - cli: curl, wget, httpie, xh, fetch. - library: python-requests, aiohttp, httpx, urllib, Go stdlib, Java, okhttp, Apache HttpClient, axios, node-fetch, got, undici, PHP, Guzzle, Ruby stdlib, Faraday, .NET, PostmanRuntime, Insomnia, etc. - bot: anything containing bot / crawler / spider / slurp / monitor (catches Googlebot, bingbot, Baiduspider — many of which ship a Mozilla/5.0 prefix, so the bot check runs BEFORE the browser regex). - browser: Mozilla/5.0-prefixed UAs that aren't bots. - nonstandard: anything else. The interesting bucket. - empty: literal empty User-Agent header. Side signals computed regardless of category: suspicious_short (<8 chars), suspicious_long (>512 chars), nonprintable (control chars), injection_like (SQLi / XSS / path-traversal / Log4Shell markers). A sqlmap UA with a literal SQL-injection payload embedded fires category=scanner + injection_like — the combination tells the analyst the tool is being operated manually vs. on default config. Classification is deterministic (same UA string → same tuple) so add_bounty's payload-hash dedup continues to collapse repeat rows. UI renderer upgraded from FpGeneric to a dedicated FpUserAgent that colours the category tag by risk (scanner=alert-red, nonstandard=warn-yellow, browser=accent-green, etc.) and renders each signal as its own chip. Makes the interesting rows pop in the fingerprints panel. 
Also fixed: the ingester was using `_headers.get("User-Agent") or _headers.get("user-agent")`, which short-circuits away empty-string UAs. An explicit empty UA is itself a signal (real clients always send something) — now captured.	2026-04-24 18:17:18 -04:00
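The priority order above can be sketched as a first-match cascade. The pattern tables here are abbreviated and hypothetical. The real lists are longer, and the real matcher may use stricter rules than substring and prefix checks. What the sketch does show is the ordering: in particular, the bot check runs before the Mozilla/5.0 browser check. It also shows the determinism that keeps add_bounty's payload-hash dedup effective.

```python
import re
from typing import Optional, Tuple

# Abbreviated, hypothetical pattern tables.
SCANNERS = ("nmap", "nuclei", "sqlmap", "gobuster", "nikto", "masscan", "zgrab", "ffuf", "wpscan")
CLI_PREFIXES = ("curl/", "wget/", "httpie/", "xh/")
LIBRARIES = ("python-requests", "aiohttp", "python-httpx", "python-urllib", "go-http-client",
             "okhttp", "axios", "node-fetch", "postmanruntime")
BOT_RE = re.compile(r"bot|crawler|spider|slurp|monitor", re.IGNORECASE)
INJECTION_RE = re.compile(r"union\s+select|<script|\.\./|\$\{jndi:", re.IGNORECASE)


def _signals(ua: str) -> Tuple[str, ...]:
    """Side signals computed regardless of category."""
    out = []
    if ua and len(ua) < 8:
        out.append("suspicious_short")
    if len(ua) > 512:
        out.append("suspicious_long")
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in ua):
        out.append("nonprintable")
    if INJECTION_RE.search(ua):
        out.append("injection_like")
    return tuple(out)


def classify_ua(ua: str) -> Tuple[str, Optional[str], Tuple[str, ...]]:
    """Deterministic (category, tool, signals) triple for one User-Agent string."""
    low = ua.lower()
    signals = _signals(ua)
    if ua == "":
        return "empty", None, signals
    for name in SCANNERS:
        if name in low:
            return "scanner", name, signals
    for prefix in CLI_PREFIXES:
        if low.startswith(prefix):
            return "cli", prefix.rstrip("/"), signals
    for name in LIBRARIES:
        if name in low:
            return "library", name, signals
    # Bots run before the browser check: many ship a Mozilla/5.0 prefix.
    if BOT_RE.search(ua):
        return "bot", None, signals
    if low.startswith("mozilla/5.0"):
        return "browser", None, signals
    return "nonstandard", None, signals
```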
anti	6d1d69443a	fix(xff): split leak from spoof — loopback/private claims aren't leaks An attacker hitting /admin with `X-Forwarded-For: 127.0.0.1` was previously flagged as an IP leak. It isn't — that's the classic IP-allowlist / WAF-bypass payload ("treat me as localhost and skip your auth checks"). Misclassifying it as "LEAKED IPs" in the UI confuses analysts and burns trust in the signal. Split by claim category. After pulling the left-most claimed IP from the proxy header, classify: - public (routable) → bounty_type=ip_leak (real attribution leak; the attacker's upstream proxy forwarded their real IP). - loopback / private / link-local / multicast / reserved / unspecified → bounty_type=fingerprint, fingerprint_type=spoofed_source (WAF-bypass / allowlist-probing attempt; the attacker is telling us they know what XFF does). - unparseable → dropped. Same extraction pipeline; diverges only at the last step. A new shared _classify_proxy_header_claim returns (kind, payload); _detect_ip_leak keeps its public-only contract for backward compatibility; _detect_spoofed_source is the new sibling. UI renderer FpSpoofedSource shows the claimed IP in warn color with the claim_category tag (LOOPBACK / PRIVATE / ...) and a WAF-BYPASS ATTEMPT badge — distinct visual from the "LEAKED IPs" row which stays reserved for genuine public-IP leaks. Test addresses updated: RFC 5737 doc ranges (198.51.100.0/24, 203.0.113.0/24) are flagged `is_private` (not globally reachable) in Python's ipaddress module, so they now correctly belong to the spoof bucket — tests that meant to exercise real public IPs now use 8.8.8.8 / 1.1.1.1 / Cloudflare DNS. Added eleven new tests locking the classifier + the two detectors' mutual exclusion.	2026-04-24 18:06:29 -04:00
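Here is a sketch of the final classification step, using only the flags in Python's standard `ipaddress` module. `classify_proxy_claim` is a hypothetical stand-in for `_classify_proxy_header_claim`. It returns (bounty kind, claim category) rather than the real (kind, payload) pair.

```python
import ipaddress
from typing import Optional, Tuple


def classify_proxy_claim(claim: str) -> Optional[Tuple[str, str]]:
    """Return (kind, claim_category) for a claimed client IP, or None if it does not parse."""
    try:
        ip = ipaddress.ip_address(claim.strip())
    except ValueError:
        return None  # unparseable -> dropped
    # Narrow flags first: loopback, link-local, unspecified and 240/4 are also is_private.
    for category, flag in (
        ("loopback", ip.is_loopback),
        ("link_local", ip.is_link_local),
        ("multicast", ip.is_multicast),
        ("unspecified", ip.is_unspecified),
        ("reserved", ip.is_reserved),
        ("private", ip.is_private),
    ):
        if flag:
            return "spoofed_source", category
    return "ip_leak", "public"
```

The order of the checks matters. `127.0.0.1`, `169.254.x.x`, `0.0.0.0`, and `240.0.0.0/4` all report `is_private` as well, so the narrower flags are tested first to keep the claim_category label specific.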
anti	2c876b4d86	fix(bounties): strip per-request fields from fingerprint payloads add_bounty dedups on (attacker_ip, bounty_type, full payload JSON). Three fingerprint-family bounties (http_useragent, ip_leak, http_quirks) were including method, path, or header_count in their payloads — fields that vary per request — so a scanner hitting 100 paths produced 100 rows instead of 1, which is what was swelling AttackerDetail. Payloads now carry identity-only fields: - http_useragent: {fingerprint_type, value}. UA + path combinations no longer fan out into separate rows; one row per distinct User-Agent string. - ip_leak: {source_ip, real_ip_claim, source_header, headers_seen}. One row per distinct (proxy source, leaked IP, leaking header) triple; repeat hits with the same header on different paths dedup. - http_quirks: {fingerprint_type, order_hash, order, casing_hash, casing_category, stable_count, tool_guess}. No more header_count (included volatile headers; Cookie-presence variance broke dedup). Per-request context (path, method, etc.) was never load-bearing for analysts — the logs table already answers "when + where" at per-event resolution. The bounty table is for stable identity. UI: - FpHttpQuirks renderer drops the method/path footer line and the header_count/duplicates tags; shows stable_count instead. - LEAKED-IPs tooltip on AttackerDetail swaps "X on GET /path" for "Leaked via X; source 203.0.113.42" — same information, stable. Tests add a "payload stable across paths and methods" assertion on http_quirks — locks the contract so a future regression that sneaks a per-request field back in fails loudly. Existing duplicate bounty rows don't retroactively collapse. Dev: `decnet db-reset --i-know-what-im-doing drop-tables` and restart. Prod: one SQL pass to dedup by (attacker_ip, bounty_type, payload) — trivial but not automated.	2026-04-24 17:58:54 -04:00
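This sketch shows why dropping per-request keys collapses rows. It assumes the dedup key hashes a canonical JSON encoding of the payload. The source says only that the key covers the full payload JSON, so the sorted-keys canonicalization and the `VOLATILE_KEYS` set here are assumptions. `stable_payload` and `bounty_dedup_key` are hypothetical.

```python
import hashlib
import json
from typing import Tuple

# Hypothetical list of per-request keys that must never reach a fingerprint payload.
VOLATILE_KEYS = frozenset({"path", "method", "header_count"})


def stable_payload(raw: dict) -> dict:
    """Keep identity-only fields; per-request context stays in the logs table."""
    return {k: v for k, v in raw.items() if k not in VOLATILE_KEYS}


def bounty_dedup_key(attacker_ip: str, bounty_type: str, payload: dict) -> Tuple[str, str, str]:
    # sort_keys + fixed separators: two dicts with equal content hash identically.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return attacker_ip, bounty_type, hashlib.sha256(blob.encode("ascii")).hexdigest()
```

Consider a scanner sending one User-Agent to 100 paths. Before stripping, it yields 100 distinct keys; after `stable_payload`, it yields one.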
anti	dccb410bb3	feat(http): header-quirks fingerprint — order + casing + tool guess Per-request HTTP fingerprint derived from the header dict we already log. Captures: - order_hash: SHA-256 prefix (16 hex) over the lowercased header-name sequence, minus volatile/per-request headers (Content-Length, Cookie, Authorization, XFF family, trace IDs). Stable identity for a given client stack regardless of which target / path is hit. - casing_hash: same shape but over the per-header casing category (Title-Case / lower / UPPER / mixed). Attackers frequently spoof User-Agent but forget their stack sends `user-agent` while browsers send `User-Agent`. - tool_guess: prefix match against curl / python-requests / Go-http-client / nmap-nse signatures. Cheap, best-effort — the hash is the hard signal. - duplicates: reserved for when the HTTP template switches from dict(request.headers) to a list form; today it is always empty because dict() collapses duplicates. Payload is a fingerprint bounty (bounty_type="fingerprint", fingerprint_type="http_quirks"). Bounty dedup collapses identical hashes per attacker — one row per distinct fingerprint — so a chatty scanner doesn't spam the vault, but a tool-chain change from the same IP surfaces as a new row. UI renderer (FpHttpQuirks) shows the two hashes, tool guess badge in violet, casing/count tags, and a collapsible header-order list. Added to the passiveTypes group so it nests with JA3/JA4L/etc. in the AttackerDetail fingerprints panel. One implementation note: the naive "title-case" classifier failed on tokens like `X-Forwarded-For`. For a single-letter token such as the `X`, `p[1:]` is the empty string, and `"".islower()` returns False in Python, so `p[1:].islower()` rejected it. Fix: explicitly accept single-char tokens when uppercase.	2026-04-24 17:51:40 -04:00
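The two hashes can be sketched as follows. The volatile-header set is an assumption, since the commit names the header families but not the exact list. `header_quirks` is hypothetical, and tool_guess and duplicates are left out. The casing classifier includes the single-letter-token fix described above.

```python
import hashlib
from typing import Iterable, List

# Assumed volatile set: the commit names these families, the exact list is a guess.
VOLATILE = {"content-length", "cookie", "authorization", "x-forwarded-for", "x-real-ip",
            "forwarded", "traceparent", "x-request-id"}


def casing_category(name: str) -> str:
    if name.islower():
        return "lower"
    if name.isupper():
        return "UPPER"
    parts = name.split("-")
    # Accept single uppercase letters (the "X" in X-Forwarded-For) as Title-Case tokens.
    if all(p[:1].isupper() and (len(p) == 1 or p[1:].islower()) for p in parts):
        return "Title-Case"
    return "mixed"


def _short_hash(items: Iterable[str]) -> str:
    # SHA-256 prefix, 16 hex chars, over a newline-joined sequence.
    return hashlib.sha256("\n".join(items).encode("utf-8")).hexdigest()[:16]


def header_quirks(names: List[str]) -> dict:
    """names: header names in the order the client sent them."""
    stable = [n for n in names if n.lower() not in VOLATILE]
    return {
        "order_hash": _short_hash(n.lower() for n in stable),
        "casing_hash": _short_hash(casing_category(n) for n in stable),
        "order": [n.lower() for n in stable],
        "stable_count": len(stable),
    }
```

Dropping volatile headers before hashing keeps `order_hash` identical whether or not a request carries a Cookie. `casing_hash` still separates a `user-agent`-style stack from a `User-Agent`-style one.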
anti	2a0c5ca410	feat(attackers): XFF mismatch detection — attacker IP leak bounties Attackers routinely front their scanners with VPNs/proxies, so the TCP source we log is the proxy egress, not the real host. But a surprising number of attacker setups are misconfigured: the proxy forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP / CDN-variant) header. From our side that's a free attribution leak. New _detect_ip_leak extractor in decnet/web/ingester.py fires at ingest time per HTTP request. Logic: 1. Require service=http, source_ip present, headers present. 2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or CIDRs) → legitimate reverse-proxy forwarding, skip. 3. Walk proxy-family headers in priority order: Forwarded (RFC 7239) → X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP. 4. Extract the left-most parseable IP from the winning header. 5. If that IP differs from the TCP source → emit a bounty with bounty_type="ip_leak" carrying {source_ip, real_ip_claim, source_header, headers_seen, path, method}. Storage is the existing Bounty table — no schema change; de-dup is handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so repeat requests with the same leaked IP don't spam. AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN listing distinct real_ip_claim values; hover tooltip shows the source header + path of the most recent leak. Only shown when at least one ip_leak bounty exists. RFC 7239 Forwarded parser handles the full vocabulary — bare IPv4, IPv4:port, quoted, IPv6 in brackets, IPv6 with port — returning only IPs that actually parse. Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For mismatches". Phase 3 of the three-phase Attacker Intelligence series (phase 1, scanned-vs-interacted, and phase 2, PTR records, have already shipped).
DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's "revisit when verified-proxy config lands" note — same token set future rate-limit work will consume.	2026-04-24 17:39:03 -04:00
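The steps above, together with the RFC 7239 node parsing, can be sketched as below. The function names are hypothetical, and the service=http check from step 1 is omitted. The header priority order and the `DECNET_TRUSTED_PROXIES` format come from the commit. In this version, the first header that yields a parseable IP wins. Unlike the later spoof/leak split, it does not look at what kind of address was claimed.

```python
import ipaddress
import os
from typing import Dict, List, Optional

# Priority order from the commit: Forwarded, XFF, X-Real-IP, True-Client-IP, CF-Connecting-IP.
PROXY_HEADERS = ("forwarded", "x-forwarded-for", "x-real-ip", "true-client-ip", "cf-connecting-ip")


def parse_node(token: str) -> Optional[str]:
    """One node: bare IPv4, IPv4:port, quoted, [IPv6], [IPv6]:port, or bare IPv6 (XFF)."""
    token = token.strip().strip('"')
    if token.startswith("["):
        end = token.find("]")
        token = token[1:end] if end != -1 else ""
    elif token.count(":") == 1:
        token = token.split(":", 1)[0]  # drop the port from IPv4:port
    try:
        return str(ipaddress.ip_address(token))
    except ValueError:
        return None  # "unknown", obfuscated "_hidden" identifiers, garbage


def forwarded_for(value: str) -> List[str]:
    """Every parseable for= value in an RFC 7239 Forwarded header, left to right."""
    ips = []
    for element in value.split(","):
        for pair in element.split(";"):
            key, _, val = pair.partition("=")
            if key.strip().lower() == "for":
                ip = parse_node(val)
                if ip:
                    ips.append(ip)
    return ips


def is_trusted(source_ip: str, spec: str) -> bool:
    """spec: comma-separated IPs or CIDRs, as in DECNET_TRUSTED_PROXIES."""
    src = ipaddress.ip_address(source_ip)
    tokens = [t.strip() for t in spec.split(",") if t.strip()]
    return any(src in ipaddress.ip_network(t, strict=False) for t in tokens)


def detect_ip_leak(source_ip: str, headers: Dict[str, str],
                   trusted_spec: Optional[str] = None) -> Optional[dict]:
    if trusted_spec is None:
        trusted_spec = os.environ.get("DECNET_TRUSTED_PROXIES", "")
    if not source_ip or not headers or is_trusted(source_ip, trusted_spec):
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in PROXY_HEADERS:
        raw = lowered.get(name)
        if raw is None:
            continue
        if name == "forwarded":
            claims = forwarded_for(raw)
        else:
            claims = [ip for ip in (parse_node(t) for t in raw.split(",")) if ip]
        if not claims:
            continue  # header present but nothing parseable: try the next one
        if claims[0] == str(ipaddress.ip_address(source_ip)):
            return None  # left-most claim matches the TCP source: no leak
        return {"source_ip": source_ip, "real_ip_claim": claims[0], "source_header": name}
    return None
```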
anti	351a8939c3	feat(attackers): scanned vs. interacted service bucketing on detail page Adds a new card on AttackerDetail: SCANNED · N services | INTERACTED WITH · M services. Distinguishes port-scanners (N high, M=0) from actual engagement (M>0) at a glance — the analyst's first question when triaging a new attacker row. Classifier lives in decnet/correlation/event_kinds.py, a single source of truth for the event-type vocabulary: - INTERACTION_EVENT_TYPES — command-family (command/exec/query/...), SMTP engagement (mail_from/rcpt_to/message_accepted), file/payload activity (file_captured/upload/download_attempt/retr), pub/sub (publish/subscribe), recorded TTY sessions. - NOISE_EVENT_TYPES — DECNET-internal (startup/shutdown/parse_error/ unknown_*). - Everything else defaults to scan. Conservative by design: new template verbs show up as "scanned" until explicitly promoted. Bucket logic: a service is "interacted" if ≥1 of its events classifies as interaction; otherwise "scanned" if ≥1 scan event; noise-only services drop. Disjoint by construction. Deliberate no-schema path: compute on-the-fly in the detail endpoint via SELECT DISTINCT service, event_type FROM logs. Small result set (tens of pairs per attacker), cost is trivial vs. the existing behavior/commands queries. Trade-off: one more DB round-trip per detail view in exchange for zero ALTER TABLE migration pain and immediate classifier-change feedback loop. Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of interactions that carry executable text), with a comment pointing at the new canonical module. Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral Profiling — Services actively interacted with".	2026-04-24 17:12:20 -04:00
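The bucket rule above is small enough to sketch in full. The vocabularies are abbreviated copies; the canonical sets live in decnet/correlation/event_kinds.py. `kind` and `bucket_services` are hypothetical helpers.

```python
from typing import Dict, Iterable, Set, Tuple

# Abbreviated copies of the vocabularies.
INTERACTION_EVENT_TYPES = {"command", "exec", "query", "mail_from", "rcpt_to",
                           "message_accepted", "file_captured", "upload", "publish", "subscribe"}
NOISE_EVENT_TYPES = {"startup", "shutdown", "parse_error"}


def kind(event_type: str) -> str:
    if event_type in INTERACTION_EVENT_TYPES:
        return "interaction"
    if event_type in NOISE_EVENT_TYPES or event_type.startswith("unknown_"):
        return "noise"
    return "scan"  # conservative default for verbs not yet promoted


def bucket_services(pairs: Iterable[Tuple[str, str]]) -> Tuple[Set[str], Set[str]]:
    """pairs: rows of SELECT DISTINCT service, event_type FROM logs for one attacker."""
    kinds: Dict[str, Set[str]] = {}
    for service, event_type in pairs:
        kinds.setdefault(service, set()).add(kind(event_type))
    interacted = {s for s, k in kinds.items() if "interaction" in k}
    scanned = {s for s, k in kinds.items() if "scan" in k and s not in interacted}
    return scanned, interacted  # disjoint; noise-only services appear in neither
```

Take an attacker whose rows are `("ssh", "command")`, `("http", "request")`, and `("ftp", "startup")`, where `request` is a hypothetical scan-class verb. That attacker lands as INTERACTED WITH `{ssh}` and SCANNED `{http}`, and the noise-only ftp service is dropped.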
anti	ce6b4a4174	fix(web/api): scope DB-retry sleep so tests don't starve background tasks test_lifespan_db_retry patched decnet.web.api.asyncio.sleep to skip the DB-retry backoff. Problem: asyncio is a shared module — the patch leaks to every caller that looked up asyncio.sleep via `import asyncio`, including run_health_heartbeat's own sleep loop. That heartbeat task spawns inside the same lifespan; with its sleep mocked, the while-loop spins tight, starves cancellation, and leaves an orphan task that pytest-timeout eventually signals — surfacing as the 'Task exception was never retrieved' warnings the user saw when running the suite. Fix: give decnet.web.api a local binding `_retry_sleep = asyncio.sleep` for the DB-retry wait, and have the test patch that instead. Narrowly scoped, no impact on asyncio.sleep callers elsewhere. Test timing before: 12s with --timeout=10 (interrupted by signal). Test timing after: 0.58s. Full tests/web slice: 27s → 7.1s with the spurious warnings gone.	2026-04-24 17:11:44 -04:00
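Below is a sketch of the scoping trick. `connect_with_retry` is a hypothetical stand-in for the lifespan's retry loop, and `ConnectionError` is an assumed failure type.

```python
import asyncio
from typing import Awaitable, Callable

# Module-local alias for the DB-retry wait. Tests patch this name instead of
# asyncio.sleep, so other coroutines (such as a heartbeat loop) keep real sleeps.
_retry_sleep = asyncio.sleep


async def connect_with_retry(connect: Callable[[], Awaitable[None]],
                             attempts: int = 5, delay: float = 2.0) -> int:
    """Retry connect() with a fixed backoff; return the attempt that succeeded."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            await connect()
            return attempt
        except ConnectionError:
            if attempt == attempts:
                raise
            await _retry_sleep(delay)  # global lookup at call time, so a patch applies
    raise RuntimeError("unreachable")
```

A test then patches `decnet.web.api._retry_sleep` (for instance with `unittest.mock.patch(..., new=AsyncMock())`) and leaves `asyncio.sleep` alone. Patching `decnet.web.api.asyncio.sleep` instead sets the attribute on the shared asyncio module object. That is why the original patch leaked into every other caller.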
anti	ea95a009df	refactor(tests): move flat tests/*.py into per-subsystem subfolders Groups every flat test_*.py under the module it exercises, matching the existing tests/{profiler,sniffer,prober,collector,correlation,cli,web,topology,swarm,bus,updater,api,docker,geoip,...} layout. New folders: services/, fleet/, config/, logging/, db/ (+ db/mysql/), telemetry/, mutator/, core/. Path-dependent __file__ references bumped an extra .parent in four files that moved one level deeper: - tests/sniffer/test_sniffer_ja3.py (template path) - tests/services/test_ssh_capture_emit.py (template path) - tests/cli/test_mode_gating.py (REPO root) - tests/web/test_env_lazy_jwt.py (repo var) Also drops two SQLite runtime artifacts (test_decnet.db-{shm,wal}) that were leaking into the repo from a previous test run. Fixes two test_service_isolation cases that patched asyncio.sleep (no longer on the profiler main-loop hot path — same pre-existing bug I fixed earlier in test_attacker_worker.py) by patching asyncio.wait_for and passing interval=0.	2026-04-23 21:34:25 -04:00
anti	fcaac648a4	feat(web): add systemd_control helper for worker unit management Thin async wrapper over `systemctl` — never shell=True, always create_subprocess_exec. Unit names are built from `decnet-<validated-name>.service`; the regex check is defence in depth on top of the router-level KNOWN_WORKERS validation. Exposes start / stop / is_active / list_installed; the last is cached for 30s to keep the Workers panel cheap under REFRESH spam. On non-systemd hosts list_installed returns an empty set, so the UI renders with every row marked not-installed instead of 500-ing.	2026-04-22 14:08:35 -04:00
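Here is a minimal sketch of such a wrapper. It assumes a lowercase-and-hyphen naming rule for workers, because the commit does not show the real regex, and it uses module-level caching. The function names mirror the commit's start / stop / is_active / list_installed, but the signatures and return types here are assumptions.

```python
import asyncio
import re
import time
from typing import FrozenSet, Optional, Tuple

_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,31}")  # assumed naming rule
_CACHE_TTL = 30.0
_installed_cache: Optional[Tuple[float, FrozenSet[str]]] = None


def unit_name(name: str) -> str:
    # Defence in depth on top of the router's KNOWN_WORKERS check.
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid worker name: {name!r}")
    return f"decnet-{name}.service"


async def _systemctl(*args: str) -> Tuple[int, str]:
    # Argument vector via create_subprocess_exec: no shell, no string interpolation.
    proc = await asyncio.create_subprocess_exec(
        "systemctl", *args,
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.DEVNULL)
    out, _ = await proc.communicate()
    return proc.returncode, out.decode(errors="replace")


async def start(name: str) -> bool:
    code, _ = await _systemctl("start", unit_name(name))
    return code == 0


async def stop(name: str) -> bool:
    code, _ = await _systemctl("stop", unit_name(name))
    return code == 0


async def is_active(name: str) -> bool:
    code, _ = await _systemctl("is-active", "--quiet", unit_name(name))
    return code == 0


async def list_installed() -> FrozenSet[str]:
    """Installed decnet-*.service unit files, cached for 30 s; empty off systemd."""
    global _installed_cache
    now = time.monotonic()
    if _installed_cache and now - _installed_cache[0] < _CACHE_TTL:
        return _installed_cache[1]
    try:
        code, out = await _systemctl("list-unit-files", "--no-legend", "decnet-*.service")
    except OSError:  # no systemctl binary (FileNotFoundError) or not executable
        code, out = 1, ""
    units = frozenset(line.split()[0] for line in out.splitlines() if line.strip())
    _installed_cache = (now, units if code == 0 else frozenset())
    return _installed_cache[1]
```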
anti	cbb394a160	feat(ingester): publish system.log per committed batch (DEBT-031 worker 6) The ingester connects to the bus at startup and emits a batch-committed summary (component/flushed/position) after each successful _flush_batch. Zero-row flushes are suppressed so the topic stays meaningful. Complements the collector's per-line system.log publishes: collector signals ingress, ingester signals DB-persisted progress. Federation forwarder (worker 8) will subscribe to the batch-committed leaf to trigger its upstream push. Bus stays optional: publish_safely swallows failures, get_bus() can return None, DECNET_BUS_ENABLED=false leaves the ingestion loop fully functional.	2026-04-21 16:58:49 -04:00
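The optional-bus contract can be sketched as below. The `Bus` protocol, the `publish_safely(bus, topic, payload)` signature, `after_flush`, and the exact `system.log` topic string are all assumptions. The commit states only that failures are swallowed, that `get_bus()` may return None, and that zero-row flushes are not published.

```python
import logging
from typing import Optional, Protocol

log = logging.getLogger("ingester")


class Bus(Protocol):
    async def publish(self, topic: str, payload: dict) -> None: ...


async def publish_safely(bus: Optional[Bus], topic: str, payload: dict) -> bool:
    """Best-effort publish: a missing or failing bus never breaks ingestion."""
    if bus is None:  # bus disabled (e.g. DECNET_BUS_ENABLED=false) or not connected
        return False
    try:
        await bus.publish(topic, payload)
        return True
    except Exception:
        log.warning("bus publish to %s failed", topic, exc_info=True)
        return False


async def after_flush(bus: Optional[Bus], flushed: int, position: int) -> bool:
    """Called after a successful batch flush; returns whether a summary went out."""
    if flushed == 0:
        return False  # zero-row flushes stay off the topic
    summary = {"component": "ingester", "flushed": flushed, "position": position}
    return await publish_safely(bus, "system.log", summary)
```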

14 Commits