feat(creds): Phase 3 — HTTP/HTTPS POST form body cred extraction

Login forms (wp-login.php, phpMyAdmin, Joomla, etc.) ship a
`Content-Type: application/x-www-form-urlencoded` body with field
names like username/user/email/log/pwd/password. The HTTP/HTTPS
templates already captured the body as opaque bytes; now they parse
common login-form shapes into the universal credential SD shape.

Adds canonical templates/syslog_bridge.py:
extract_form_credentials(body, content_type) -> dict | None.

Field-name matching is case-insensitive and covers:
  Principal: username, user, email, login, userid, account, log,
             user_login (WordPress), uname / pma_username (phpMyAdmin)
  Secret:    password, pass, pwd, passwd, passwort, mot_de_passe,
             user_password (WordPress), pma_password (phpMyAdmin)

The HTTP/HTTPS log_request handlers now call:
  cred = classify_authorization(...) or extract_form_credentials(...)
— Authorization wins when present (current session credential beats
a follow-up form change), but POSTs to /wp-login.php with no Auth
header still surface their cleartext creds.

Secret-without-principal is intentional: a reset-confirm or auto-
fill abuse may carry a password without any field that maps to our
principal list. The cred row writes with principal=None — the
sha256 still correlates across services for reuse analytics.

The body capture cap bumped from 512 → 4096 chars so reasonable
form bodies aren't truncated before the cred extractor sees them;
the body stored in fields.body stays at 512 chars (display-friendly).

36 helper + emitter tests pass. Phases 4-7 still pending.
This commit is contained in:
2026-04-25 07:10:05 -04:00
parent 0c1316f74c
commit e4bf8fa012
30 changed files with 1972 additions and 8 deletions

View File

@@ -18,6 +18,7 @@ from werkzeug.serving import make_server, WSGIRequestHandler
import instance_seed as _seed
from syslog_bridge import (
classify_authorization,
extract_form_credentials,
forward_syslog,
syslog_line,
write_syslog_file,
@@ -99,14 +100,18 @@ def _log(event_type: str, severity: int = 6, **kwargs) -> None:
@app.before_request
def log_request():
cred = classify_authorization(request.headers.get("Authorization"))
body = request.get_data(as_text=True)[:4096]
cred = (
classify_authorization(request.headers.get("Authorization"))
or extract_form_credentials(body, request.headers.get("Content-Type"))
)
_log(
"request",
method=request.method,
path=request.path,
remote_addr=request.remote_addr,
headers=dict(request.headers),
body=request.get_data(as_text=True)[:512],
body=body[:512],
**(cred or {}),
)