feat(creds): future-proof Credential storage model

Replaces the opaque Bounty.bounty_type='credential' path with a
dedicated `credentials` table whose schema is forward-compatible
across every auth-bearing service in the fleet. Hoisted indexed
columns (secret_sha256, principal, service, attacker_ip) carry the
universal reuse-analytics signal; service-specific JSON keys ride
in `fields`. Cross-service reuse queries become an indexed lookup
on secret_sha256 instead of JSON_EXTRACT scans.

Schema decisions baked in (per ANTI):
- New `Credential` table, not extension to Bounty
- Hoisted `principal` column for cross-service principal-reuse
- Standardized JSON keys: every payload carries secret_b64 +
  secret_printable + principal universally; service-specific extras
  (user, domain, dn, mech, …) ride alongside

The auth-helper SD-block emits the new shape natively. The ingester
forks at _extract_bounty:
- Native shape (SSH/Telnet, future emitters): secret_b64 present →
  direct upsert_credential
- Legacy shape (FTP/POP3/IMAP/SMTP today): username + password →
  adapter synthesizes secret_{b64,sha256,printable} on the fly,
  upserts into the same Credential table. Tracked as DEBT-039;
  one-shot bridge until those service templates migrate.

Defense-in-depth across five layers (input validation):
- C helper: bytes outside [0x20, 0x7f) collapse to '?', RFC 5424
  escape rules for \\, ", ]; b64 preserves exact bytes
- Ingester native branch: rejects malformed secret_b64 (regex), drops
  the credential row but keeps the underlying Log
- Ingester legacy adapter: same printable-ASCII filter as the C
  code; sha256 + b64 over the original utf-8 bytes (lossless, even
  when secret_printable is sanitized)
- DB column caps with truncation warning; sha256 always over the
  full pre-truncation bytes so reuse queries match across truncation
- JSON serialized with ensure_ascii=True so utf8mb4 columns stay
  safe even with non-ASCII service-specific keys

Bounty.bounty_type='credential' is no longer written. Pre-v1: no
historical backfill; existing rows stay untouched but unused.

595 tests pass; new tests cover the model + repo (upsert dedup,
null-principal independence, cross-service reuse, filters), both
ingester branches, b64 validation, sanitization preserving the
fingerprinting signal in b64.
This commit is contained in:
2026-04-25 05:29:26 -04:00
parent 50c12d9e16
commit 2f47f67eef
12 changed files with 760 additions and 63 deletions

View File

@@ -110,6 +110,55 @@ class BaseRepository(ABC):
"""Retrieve the total count of bounties, optionally filtered."""
pass
# ---- credentials ---------------------------------------------------
@abstractmethod
async def upsert_credential(self, data: dict[str, Any]) -> int:
"""Insert or upsert a credential attempt; returns the row id.
Dedup tuple: (attacker_ip, decky_name, service, secret_sha256,
principal_or_None). On dedup match, ``attempt_count`` is bumped
and ``last_seen`` updated; the originally-seen ``first_seen``
and ``fields`` JSON are preserved.
"""
pass
@abstractmethod
async def get_credentials(
self,
limit: int = 50,
offset: int = 0,
search: Optional[str] = None,
service: Optional[str] = None,
attacker_ip: Optional[str] = None,
) -> list[dict[str, Any]]:
"""Paginated credential rows, with optional filters."""
pass
@abstractmethod
async def get_total_credentials(
self,
search: Optional[str] = None,
service: Optional[str] = None,
attacker_ip: Optional[str] = None,
) -> int:
"""Total credential count under the same filters as get_credentials."""
pass
@abstractmethod
async def get_credentials_for_attacker(
self, attacker_ip: str
) -> list[dict[str, Any]]:
"""Every credential row from the given attacker IP."""
pass
@abstractmethod
async def get_credential_reuse(
self, secret_sha256: str
) -> list[dict[str, Any]]:
"""Every (attacker, decky, service, principal) row sharing this secret hash."""
pass
@abstractmethod
async def get_state(self, key: str) -> Optional[dict[str, Any]]:
"""Retrieve a specific state entry by key."""