feat(creds): future-proof Credential storage model

Replaces the opaque Bounty.bounty_type='credential' path with a
dedicated `credentials` table whose schema is forward-compatible
across every auth-bearing service in the fleet. Hoisted indexed
columns (secret_sha256, principal, service, attacker_ip) carry the
universal reuse-analytics signal; service-specific JSON keys ride
in `fields`. Cross-service reuse queries become an indexed lookup
on secret_sha256 instead of JSON_EXTRACT scans.
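
A rough illustration of the lookup this enables, using Python's stdlib
sqlite3 as a stand-in for the real database. Only the column names come
from this commit; the DDL, the index name, and the sample rows are
assumptions made for the sketch.

```python
import hashlib
import sqlite3

# Stand-in schema: column names follow this commit; the types, index
# name, and sample data below are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE credentials (
        id            INTEGER PRIMARY KEY,
        service       TEXT NOT NULL,
        principal     TEXT,
        attacker_ip   TEXT NOT NULL,
        secret_sha256 TEXT NOT NULL,
        fields        TEXT NOT NULL
    );
    CREATE INDEX ix_credentials_secret_sha256
        ON credentials (secret_sha256);
    """
)


def sha256_hex(secret: bytes) -> str:
    """Hex digest used as the reuse key."""
    return hashlib.sha256(secret).hexdigest()


rows = [
    ("ssh", "root", "203.0.113.7", sha256_hex(b"hunter2"), "{}"),
    ("ftp", "admin", "203.0.113.7", sha256_hex(b"hunter2"), "{}"),
    ("telnet", "root", "198.51.100.9", sha256_hex(b"toor"), "{}"),
]
conn.executemany(
    "INSERT INTO credentials"
    " (service, principal, attacker_ip, secret_sha256, fields)"
    " VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Secrets seen on more than one service: grouped on the hoisted column,
# with no JSON extraction involved.
reused = conn.execute(
    """
    SELECT secret_sha256, COUNT(DISTINCT service)
    FROM credentials
    GROUP BY secret_sha256
    HAVING COUNT(DISTINCT service) > 1
    """
).fetchall()
```

Here `reused` holds one row for the secret shared by the SSH and FTP
attempts; the Telnet-only secret is filtered out.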

Schema decisions baked in (per ANTI):
- New `Credential` table, not an extension of Bounty
- Hoisted `principal` column for cross-service principal-reuse queries
- Standardized JSON keys: every payload carries secret_b64 +
  secret_printable + principal universally; service-specific extras
  (user, domain, dn, mech, …) ride alongside

The auth-helper SD-block emits the new shape natively. The ingester
forks at _extract_bounty:
- Native shape (SSH/Telnet, future emitters): secret_b64 present →
  direct upsert_credential
- Legacy shape (FTP/POP3/IMAP/SMTP today): username + password →
  adapter synthesizes secret_{b64,sha256,printable} on the fly,
  upserts into the same Credential table. Tracked as DEBT-039;
  one-shot bridge until those service templates migrate.
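
The fork above can be sketched roughly as follows. This is not the
code in _extract_bounty: `normalize_credential` and `printable` are
hypothetical names, and computing secret_sha256 from the decoded b64 in
the native branch is an assumption.

```python
import base64
import binascii
import hashlib
from typing import Optional


def printable(raw: bytes) -> str:
    """Collapse bytes outside [0x20, 0x7f) to '?', as the C helper does."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


def normalize_credential(sd: dict) -> Optional[dict]:
    """Hypothetical helper: map either SD-block shape to one record."""
    if "secret_b64" in sd:
        # Native shape: the emitter already sent the lossless bytes.
        try:
            raw = base64.b64decode(sd["secret_b64"], validate=True)
        except (binascii.Error, ValueError):
            return None  # malformed b64: no credential row
        secret_b64 = sd["secret_b64"]
        principal = sd.get("principal")
        shown = sd.get("secret_printable", printable(raw))
    elif "username" in sd and "password" in sd:
        # Legacy shape: synthesize the secret_* keys on the fly from
        # the original UTF-8 bytes, so b64 and sha256 stay lossless.
        raw = sd["password"].encode("utf-8")
        secret_b64 = base64.b64encode(raw).decode("ascii")
        principal = sd["username"]
        shown = printable(raw)
    else:
        return None  # not a credential-bearing event
    return {
        "principal": principal,
        "secret_b64": secret_b64,
        "secret_printable": shown,
        "secret_sha256": hashlib.sha256(raw).hexdigest(),
    }
```

Both branches return the same record shape, so a single upsert path
can serve native and legacy emitters alike.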

Defense-in-depth across five layers (input validation):
- C helper: bytes outside [0x20, 0x7f) collapse to '?'; RFC 5424
  escaping applies to \, ", and ]; b64 preserves the exact bytes
- Ingester native branch: rejects malformed secret_b64 (regex), drops
  the credential row but keeps the underlying Log
- Ingester legacy adapter: same printable-ASCII filter as the C
  code; sha256 + b64 over the original utf-8 bytes (lossless, even
  when secret_printable is sanitized)
- DB column caps with truncation warning; sha256 always over the
  full pre-truncation bytes so reuse queries match across truncation
- JSON serialized with ensure_ascii=True so utf8mb4 columns stay
  safe even with non-ASCII service-specific keys
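
A minimal sketch of the ingester-side layers listed above (b64
validation, hash before truncation, ASCII-only JSON). The regex, the
column cap, and the `prepare_row` name are assumptions; the commit does
not state the actual values.

```python
import base64
import hashlib
import json
import re
from typing import Optional

# Assumed pattern and cap, chosen for the sketch only.
B64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)
SECRET_B64_MAX = 64


def prepare_row(secret_b64: str, fields: dict) -> Optional[dict]:
    """Hypothetical helper: validate, hash, then truncate for storage."""
    if not B64_RE.fullmatch(secret_b64):
        # Malformed b64: the caller drops the credential row but
        # keeps the underlying Log.
        return None
    raw = base64.b64decode(secret_b64)
    return {
        # Hash the full pre-truncation bytes so reuse queries still
        # match when the stored b64 had to be cut.
        "secret_sha256": hashlib.sha256(raw).hexdigest(),
        "secret_b64": secret_b64[:SECRET_B64_MAX],
        "truncated": len(secret_b64) > SECRET_B64_MAX,
        # ensure_ascii=True escapes non-ASCII as \uXXXX sequences.
        "fields": json.dumps(fields, ensure_ascii=True),
    }
```

Because the digest is taken before the cap is applied, two rows with
the same long secret share a secret_sha256 even though their stored
secret_b64 values are cut short.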

Bounty.bounty_type='credential' is no longer written. Pre-v1: no
historical backfill; existing rows stay untouched but unused.

595 tests pass; new tests cover the model + repo (upsert dedup,
null-principal independence, cross-service reuse, filters), both
ingester branches, b64 validation, and sanitization that preserves
the fingerprinting signal in b64.
2026-04-25 05:29:26 -04:00
parent 50c12d9e16
commit 2f47f67eef
12 changed files with 760 additions and 63 deletions

@@ -13,12 +13,23 @@
  * 55555, MSGID `auth_attempt` (matches FTP's existing event type so
  * the parser + dashboard pick it up with zero changes).
  *
- * Two password fields ride in the SD-block:
- *   password      RFC 5424-escaped ASCII-printable, '?' for non-printables.
- *                 FTP-compatible; consumed by existing dashboard rendering.
- *   password_b64  base64 of the exact PAM_AUTHTOK bytes. Lossless.
- *                 Preserves NUL/0xff/control bytes that the plain field
- *                 would silently drop — useful fingerprinting signal.
+ * SD-block carries the standardized credential shape (matches
+ * decnet/web/db/models/logs.py:Credential). Universal keys consumed
+ * directly by the ingester's native-shape branch:
+ *   principal         the human-meaningful identity the attacker sent
+ *                     (username for SSH/Telnet; would be a domain for
+ *                     SMTP, a DN for LDAP, etc.)
+ *   secret_printable  RFC 5424-escaped ASCII-printable, '?' for non-
+ *                     printables. Best-effort display form; may be
+ *                     lossy on non-UTF8 bytes.
+ *   secret_b64        base64 of the exact PAM_AUTHTOK bytes. Lossless.
+ *                     Preserves NUL/0xff/control bytes that the plain
+ *                     field would silently drop — useful fingerprinting
+ *                     signal that survives display sanitization.
+ *
+ * `username` rides alongside as a service-specific identity field for
+ * SSH/Telnet (mirrors `principal`); future emitters (SMTP, LDAP, …)
+ * drop `username` in favor of their service-native identity field.
  *
  * Fail-open: every error path silently exits 0. The PAM line is `optional`
  * so a malfunctioning helper must never break sshd auth.
@@ -150,13 +161,19 @@ pw_done:;
  b64_encode(pw_raw, pw_len, pw_b64, sizeof(pw_b64));
  /* Priority: facility=local0(16), severity=INFO(6) → <16*8+6> = <134>.
- * Matches the syslog_bridge.py default exactly. */
+ * Matches the syslog_bridge.py default exactly.
+ *
+ * SD-block keys match the Credential storage model: principal +
+ * secret_printable + secret_b64 are the universal keys the ingester
+ * keys off; username is emitted alongside principal so existing
+ * dashboards that read SSH/Telnet `username=` keep working until
+ * the cred-reuse UI lands. */
  char line[LINE_BUF];
  int n = snprintf(line, sizeof(line),
  "<134>1 %s %s auth-helper - auth_attempt "
- "[relay@55555 username=\"%s\" password=\"%s\" "
- "password_b64=\"%s\" src_ip=\"%s\"]\n",
- tsbuf, host, user_esc, pw_esc, pw_b64, rhost_esc);
+ "[relay@55555 username=\"%s\" principal=\"%s\" "
+ "secret_printable=\"%s\" secret_b64=\"%s\" src_ip=\"%s\"]\n",
+ tsbuf, host, user_esc, user_esc, pw_esc, pw_b64, rhost_esc);
  if (n <= 0 || (size_t)n >= sizeof(line)) return 0;
  /* /proc/1/fd/1 is the entrypoint's stdout — the fd Docker captures