Commit Graph

9 Commits

Author SHA1 Message Date
afe02af5c2 feat(creds): NTLMSSP Type 3 parser + DEBT-040 for SMB/RDP/NLA framers
Ships the load-bearing primitive both Phase 5 (SMB) and Phase 7
(RDP NLA) need: a standalone NTLMSSP Type 3 (AUTHENTICATE_MESSAGE)
parser per MS-NLMP §2.2.1.3.

Surface:
  parse_type3(blob) -> dict | None
  find_ntlmssp(buf) -> int   # locate NTLMSSP\\0 inside SPNEGO outer

Returns the universal Credential SD shape:
  username + domain (decoded UTF-16-LE or ASCII per NEGOTIATE_UNICODE)
  principal = "DOMAIN\\\\username"
  secret_kind = "ntlmssp_v1" (24-byte fixed) or "ntlmssp_v2" (variable)
  secret_b64 = base64 of NtChallengeResponse — canonical hashcat input
               (-m 5500 v1, -m 5600 v2)

Bounds-checked for untrusted-input safety. Anonymous binds (empty NT
response) return None — no credential to record.

7 unit tests cover NTLMv1/v2 distinction, ASCII vs Unicode strings,
empty-domain shape, malformed signature/type rejection, and SPNEGO-
wrapped find_ntlmssp() lookup.

DEBT-040 opens to track the three remaining protocol framers that
will consume this parser:
  - SMB: hand-rolled SMB2 + Session Setup framer (~200 LoC) replacing
    Impacket's opaque SimpleSMBServer
  - RDP basic auth: TPKT/X.224/MCS framer for legacy plaintext path
    (~150 LoC)
  - RDP NLA: TLS upgrade + CredSSP TSRequest parser, reuses parse_type3
    via the SPNEGO inner blob (~250 LoC)

These are substantial protocol implementations each — landing them
inline with Phase 1-3+6's cred coverage rollout would have inflated
the session beyond reasonable scope. Cred-reuse analytics already work
across the 12 services covered in this session; the deferred three
just round out the fleet.
2026-04-25 07:19:30 -04:00
9777aa7677 feat(creds): Phase 6 — MongoDB SCRAM credential capture
Plugs the cred-coverage gap for MongoDB. The template previously
parsed only the wire opcode + length and discarded the BSON body
entirely, so SCRAM-SHA-{1,256} client-proofs flowed straight through
without ever landing in the Credential table.

Adds an inline minimal BSON walker (~100 LoC) covering the 7 type
codes auth commands actually use: string, doc, array, binary, bool,
int32, int64. Hand-rolled rather than pulling pymongo as a runtime
dep — the parser is bounds-checked for untrusted-input safety
(won't loop on malformed length fields).

Wire flow MongoDB clients use for auth:
- OP_MSG body section (kind=0) → BSON doc with `saslStart` field
  carrying mechanism + payload (SCRAM client-first-message:
  "n,,n=<user>,r=<nonce>"). Username extracted, pinned to the
  per-connection _sasl_username + _sasl_mechanism state.
- Subsequent OP_MSG with `saslContinue` → SCRAM client-final-message
  ("c=biws,r=<combined>,p=<base64 client-proof>"). The `p=` value is
  the credential — emitted as secret_kind=scram_sha256 (or _sha1 /
  _unknown depending on the prior saslStart's mechanism), principal
  = the pinned username, secret_b64 = base64 of the decoded proof.

Reuse semantics: same client-proof across two auth attempts only
matches when both server salt and password were identical (proofs
include the salt). So cross-session reuse correlates only on
credential reuse against the same MongoDB account on the same decky
— honest, non-misleading signal.

680 tests pass across services, service_testing, db, web/ingester,
and core/fingerprinting (the broader scope my recent commits
touched). Phases 4, 5, 7 still pending (RDP basic-auth, SMB
NTLMSSP, RDP NLA).
2026-04-25 07:15:44 -04:00
e4bf8fa012 feat(creds): Phase 3 — HTTP/HTTPS POST form body cred extraction
Login forms (wp-login.php, phpMyAdmin, Joomla, etc.) ship a
`Content-Type: application/x-www-form-urlencoded` body with field
names like username/user/email/log/pwd/password. The HTTP/HTTPS
templates already captured the body as opaque bytes; now they parse
common login-form shapes into the universal credential SD shape.

Adds canonical templates/syslog_bridge.py:
extract_form_credentials(body, content_type) -> dict | None.

Field-name matching is case-insensitive and covers:
  Principal: username, user, email, login, userid, account, log,
             user_login (WordPress), uname / pma_username (phpMyAdmin)
  Secret:    password, pass, pwd, passwd, passwort, mot_de_passe,
             user_password (WordPress), pma_password (phpMyAdmin)

The HTTP/HTTPS log_request handlers now call:
  cred = classify_authorization(...) or extract_form_credentials(...)
— Authorization wins when present (current session credential beats
a follow-up form change), but POSTs to /wp-login.php with no Auth
header still surface their cleartext creds.

Secret-without-principal is intentional: a reset-confirm or auto-
fill abuse may carry a password without any field that maps to our
principal list. The cred row writes with principal=None — the
sha256 still correlates across services for reuse analytics.

The body capture cap bumped from 512 → 4096 chars so reasonable
form bodies aren't truncated before the cred extractor sees them;
the body stored in fields.body stays at 512 chars (display-friendly).

36 helper + emitter tests pass. Phases 4-7 still pending.
2026-04-25 07:10:05 -04:00
0c1316f74c feat(creds): Phase 2 — MySQL handshake hash + MSSQL Login7 plaintext
Closes the cred-coverage gap for two database services that had been
capturing only the username:

- MySQL — extends _handle_packet to read the auth-response after the
  null-terminated username. mysql_native_password puts a 1-byte
  length followed by 20 bytes: SHA1(password) XOR SHA1(salt +
  SHA1(SHA1(password))). Plaintext irrecoverable, lands as
  secret_kind="mysql_native_password" with the 20 hash bytes in
  secret_b64. Hash is canonical for "hashcat -m 11200" if an operator
  ever wants to crack offline.

- MSSQL — fixes a pre-existing bug AND adds password capture. The
  prior _parse_login7_username read offsets 36/38, which is actually
  ibHostName/cchHostName in the Login7 layout — username sat at
  40/42 and was never touched. Replaced with _parse_login7_creds()
  reading the correct offsets (40 username, 44 password). Login7
  password is XOR-then-nibble-swap obfuscated against 0xa5;
  _deobfuscate_login7_password reverses it. Plaintext-recoverable,
  lands as secret_kind="plaintext".

The pre-existing test_login7_auth_logged_and_closes only verified the
error response ships and the connection closes; it didn't validate
the parsed username, so the hostname-as-username bug was silent. New
tests cover both the deobfuscation algorithm directly and the full
ingester round-trip for both services.

Sync: copies the canonical syslog_bridge.py into mysql/ and mssql/
template build contexts so service_testing tests load the version
with classify_authorization + encode_secret available.

37 tests pass in the touched scope. Phases 3-7 still pending.
2026-04-25 07:07:33 -04:00
3404e3b3a6 feat(creds): Phase 1 — Authorization header + SNMP community capture
Closes the cred-coverage gap for 7 services that already had the data
on the wire but never landed it in the Credential table:

- SNMP — community string lands as secret_kind="snmp_community",
  principal=None (v1/v2c has no per-user identity, the community IS
  the auth).
- SIP — Digest response hash, previously buried in the auth= header
  dump, now classify_authorization()-extracted.
- HTTP / HTTPS — Authorization header was in the headers JSON but
  never extracted. Now Basic decodes to plaintext, Bearer →
  http_bearer (principal=None), Digest → http_digest_md5.
- K8s — already extracted Authorization but didn't normalize. Service-
  account JWTs flow through as Bearer.
- Docker API — headers absent entirely. Adds the headers JSON dump
  and runs Authorization through the classifier.
- Elasticsearch — five distinct request handlers; each gains a
  per-handler _cred_fields() helper.

Adds canonical templates/syslog_bridge.py:classify_authorization().
Recognised: Basic / Bearer / Token / Digest. Unknown schemes (NTLM,
AWS4-HMAC, Negotiate) return None; the header still rides in the
ambient SD-block but isn't normalized as a credential. The SD shape
on the wire collapses sip_digest_md5 into http_digest_md5 — same
algorithm, so cross-protocol reuse correlates correctly when (rare)
nonce collisions allow.

Drive-by repair of tests/core/test_fingerprinting.py:

- The pre-existing `test_http_useragent_extracted` asserted both that
  add_bounty was called exactly once AND that the UA payload carried
  `path` and `method` fields. Both wrong since this session opened:
  the http_quirks fingerprint added later fires too, and the UA
  payload never actually included path/method despite the assertion.
- Adds `path`/`method` to the UA fingerprint payload (real operator
  value: "Nikto hit /admin" beats "Nikto seen on this decky").
- Replaces `assert_awaited_once` with a `_find_ua_bounty()` helper
  that filters add_bounty calls by `fingerprint_type`. New fingerprint
  families landing later won't retroactively break old tests.
- Updates the two credential-bearing tests to use the post-DEBT-039
  native shape (`secret_b64` / `principal`) and `upsert_credential`,
  not the deleted legacy `username+password` adapter.

Also rebuilds the per-service fake `syslog_bridge` modules in
tests/service_testing/{conftest,test_imap,test_pop3,test_snmp,test_mqtt,test_smtp}.py
to expose `encode_secret` + `classify_authorization`. Service templates
that import either now no longer fail at test collection.

173 tests pass in the touched scope. Phases 2-7 still pending.
2026-04-25 07:04:10 -04:00
6b16c844b6 fix(creds): MQTT regression + secret_kind for hash credentials
Honest correction to the "every cred-emitting service" claim. Audit
of templates/* found three gaps:

1. MQTT — was working through the legacy adapter, silently dropped
   when Phase 3 (e696c2b) deleted it. Now migrated to encode_secret()
   alongside the others.
2. Postgres — `auth, pw_hash=…` event captures the MD5
   challenge-response the attacker sent. Plaintext irrecoverable, so
   it never fit the (principal, secret_b64=raw_bytes) shape. Lands
   in Credential as secret_kind="postgres_md5_challenge".
3. VNC — `auth_response, response=…hex` event captures the 16-byte
   DES-encrypted challenge. Same situation as Postgres: plaintext
   irrecoverable. Lands as secret_kind="vnc_des_response".

Adds a `secret_kind` discriminator column to Credential (default
"plaintext", indexed). The dedup tuple gains secret_kind so two
credentials with the same sha256 but different kinds are
fundamentally different rows — different challenges produce
different bytes for the same plaintext password, so cross-kind
reuse matches are meaningless and would only confuse analytics.

The model now genuinely covers every cred-emitting service in the
fleet:

  plaintext        SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP,
                   MQTT
  postgres_md5_*   Postgres
  vnc_des_response VNC

Username-only services (MySQL/MSSQL — TDS pre-encryption captures
the user but never sees the password byte) intentionally don't feed
Credential — they're recon signals, not cred attempts.

40 tests pass in the touched scope. New cases: secret_kind dedups
independently in the repo; Postgres MD5 + VNC DES emitters thread
through; MQTT round-trips through the native branch.
2026-04-25 06:16:57 -04:00
abb4dd9fc0 feat(templates): migrate six cred emitters to native shape
Phase 2/3 of DEBT-039. Switches FTP, POP3, IMAP, SMTP, Redis, and
LDAP from the legacy `username=` + `password=` SD-block shape to the
universal credential shape (`principal=` + `secret_printable=` +
`secret_b64=`) the new Credential storage model expects.

Pattern is uniform across all six services:

    _log("auth_attempt", username=u, principal=u, **encode_secret(pw))

Each service emits the canonical SD keys. The ingester's native-shape
branch (introduced in 2f47f67) now writes their cred attempts
directly without going through the legacy adapter. Once Phase 3
removes the adapter the contract becomes single-shape.

Per-service notes:
- POP3 / IMAP — `status="success"|"failed"` renamed to
  `outcome="success"|"failure"` to match Credential.outcome's
  vocabulary; the ingester reads outcome directly.
- SMTP — AUTH path migrated; in addition the existing mail_from
  event now exposes a parsed `domain=` field alongside the original
  `value=` so future "what domains do attackers spoof from" analytics
  have an indexed field. Not stored in Credential — regular Log row.
- Redis — was silently dropped by the legacy adapter (no `username`
  field). Native branch handles `principal=None` correctly. BONUS
  FIX: the Redis 6+ ACL syntax `AUTH <user> <pw>` now captures the
  ACL username as principal (was previously discarded).
- LDAP — was silently dropped by the legacy adapter (no `password`
  recognition for the `bind` event). Now lands as
  `principal=<dn>`. BONUS FIX.

Tests (tests/services/test_cred_emitters.py, 9 cases):
- per-service native-shape ingest path produces correct Credential
  rows; outcome maps for POP3/IMAP; principal=None for legacy Redis
  AUTH; principal=dn for LDAP.
- mail_from event does NOT trigger a credential write (it's a
  Log-only observation, not auth).
- 0xff/NUL/ANSI bytes in passwords survive losslessly through
  secret_b64 even when secret_printable is sanitized.

Phase 3 deletes the legacy adapter once all migrations land — the
adapter has no live emitters to handle anymore.
2026-04-25 05:43:51 -04:00
aebb9f81c6 feat(templates): encode_secret() helper in canonical syslog_bridge
Phase 1/3 of DEBT-039. Adds the Python emitter-side counterpart to
auth-helper.c's sd_escape + base64 logic so service templates can
emit the universal credential SD shape with a single spread:

    _log("auth_attempt", principal=user, **encode_secret(password))

secret_printable mirrors the C helper's [0x20, 0x7f) → '?' contract;
secret_b64 preserves the ORIGINAL utf-8 bytes losslessly so non-ASCII
or control-byte payloads survive as fingerprinting signal even when
the printable form sanitizes them.

The canonical syslog_bridge.py is what _sync_logging_helper()
propagates into per-template build contexts at deploy time, so any
service that imports its local syslog_bridge picks this up
automatically on next rebuild.

Phase 2 migrates the six cred-emitting service templates (FTP, POP3,
IMAP, SMTP, Redis, LDAP) onto this helper. Phase 3 deletes the
ingester's legacy adapter once nothing emits the old shape.
2026-04-25 05:37:44 -04:00
ea95a009df refactor(tests): move flat tests/*.py into per-subsystem subfolders
Groups every flat test_*.py under the module it exercises, matching the
existing tests/{profiler,sniffer,prober,collector,correlation,cli,web,
topology,swarm,bus,updater,api,docker,geoip,...} layout. New folders:
services/, fleet/, config/, logging/, db/ (+ db/mysql/), telemetry/,
mutator/, core/.

Path-dependent __file__ references bumped an extra .parent in three
files that moved one level deeper:
- tests/sniffer/test_sniffer_ja3.py   (template path)
- tests/services/test_ssh_capture_emit.py (template path)
- tests/cli/test_mode_gating.py  (REPO root)
- tests/web/test_env_lazy_jwt.py (repo var)

Also drops two SQLite runtime artifacts (test_decnet.db-{shm,wal}) that
were leaking into the repo from a previous test run.

Fixes two test_service_isolation cases that patched asyncio.sleep (no
longer on the profiler main-loop hot path — same pre-existing bug I
fixed earlier in test_attacker_worker.py) by patching asyncio.wait_for
and passing interval=0.
2026-04-23 21:34:25 -04:00