fix(collector): unwrap double-wrapped RFC5424 around bash PROMPT_COMMAND

Honeypot SSH containers set a `PROMPT_COMMAND` that calls
`logger --rfc5424 --msgid command -t bash "CMD …"`. The Docker-stdout
reader then wraps that inner syslog line in an outer RFC5424 envelope
(HOSTNAME=<decky>, APP-NAME=1, MSGID=NIL). Both the collector parser
(`parse_rfc5424`) and the correlation parser (`parse_line`) read the
outer NIL MSGID and emitted `event_type="-"` for every shell command,
which:
  - produced `Attacker.commands` rows with no `command_text`
  - left R0001–R0030 (the pattern rule pack that matches shell
    commands) with no haystack
  - made `decnet.collector.log` show `event written … type=-`
    for the very lines that should be `type=command`
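
A minimal sketch of what an outer-only parse sees. It assumes a simplified outer header regex (`OUTER_RE` is hypothetical, not the real `_RFC5424_RE`) and an illustrative line; the `CMD` payload fields are made up for the example:

```python
import re

# Hypothetical outer RFC5424 header regex (an assumption, not the real
# _RFC5424_RE): <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID REST
OUTER_RE = re.compile(
    r"^<\d+>\d+\s+"
    r"(\S+)\s+"  # 1: TIMESTAMP
    r"(\S+)\s+"  # 2: HOSTNAME
    r"(\S+)\s+"  # 3: APP-NAME
    r"\S+\s+"    # PROCID
    r"(\S+)\s+"  # 4: MSGID
    r"(.+)$"     # 5: SD element + optional MSG
)

# Illustrative double-wrapped line: the outer MSGID and SD are both NIL
# ("-"), and the body is the inner `logger --rfc5424` line.
line = (
    "<14>1 2026-05-02T06:32:21Z decky-01 1 - - - "
    "2026-05-02T06:32:21.000000+00:00 decky-01 bash - command "
    '[timeQuality tzKnown="1" isSynced="0"] CMD uid=0 src_ip=203.0.113.7 cmd=id'
)

ts, host, app, msgid, rest = OUTER_RE.match(line).groups()
print(msgid)     # -   (the event_type an outer-only parser emits)
print(rest[:1])  # -   (the outer SD field is NIL as well)
```

The real MSGID (`command`) is only reachable by parsing `rest` a second time.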

Both parsers now detect the inner-RFC5424 shape (`<TS> <HOST> <APP>
<PROCID> <MSGID> <rest>`) when the outer MSGID and the outer SD field
are both NIL, and re-extract HOSTNAME / APP-NAME / MSGID / remainder
from the body. The collector parser also recovers the MSG tail that
follows the SD block when that block isn't `relay@55555` (the bash CMD
line carries a `[timeQuality …]` block instead), so the kv-fallback
can find `src_ip`.
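
The MSG-tail recovery can be sketched as follows. This assumes the remainder after MSGID has the RFC5424 `SD [MSG]` shape; `split_sd_and_msg`, `kv_fallback` and `KV_RE` are hypothetical names rather than the collector's actual helpers, and the scanner ignores escaped `]` inside SD-PARAM values:

```python
import re

# Hypothetical key=value fallback pattern: bare or double-quoted values.
KV_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def split_sd_and_msg(rest: str) -> tuple[str, str]:
    """Split an RFC5424 'SD [MSG]' remainder into (sd, msg).

    Sketch only: '-' is the NIL SD; otherwise consecutive [...] elements
    are consumed. Escaped ']' inside SD-PARAM values is not handled.
    """
    if rest.startswith("-"):
        return "-", rest[1:].lstrip()
    i = 0
    while i < len(rest) and rest[i] == "[":
        end = rest.find("]", i)
        if end == -1:  # unterminated SD element: no MSG to recover
            return rest, ""
        i = end + 1
    return rest[:i], rest[i:].lstrip()


def kv_fallback(msg: str) -> dict[str, str]:
    """Pull key=value pairs out of free-form MSG text, stripping quotes."""
    return {k: v.strip('"') for k, v in KV_RE.findall(msg)}


sd, msg = split_sd_and_msg(
    '[timeQuality tzKnown="1" isSynced="0"] CMD uid=0 src_ip=203.0.113.7 cmd=id'
)
print(sd)                        # [timeQuality tzKnown="1" isSynced="0"]
print(kv_fallback(msg)["src_ip"])  # 203.0.113.7
```

Keeping the SD split separate from the kv scan means a non-`relay@55555` block is skipped rather than mistaken for the message.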

Mirrored tests in tests/collector and tests/correlation pin both the
unwrap and a regression guard for lines that are not double-wrapped.
2026-05-02 02:32:21 -04:00
parent e08bfc4a73
commit d9d2a80573
4 changed files with 234 additions and 4 deletions


@@ -32,6 +32,21 @@ _RFC5424_RE = re.compile(
    r"(.+)$",                        # 5: SD element + optional MSG
)
# Honeypot SSH PROMPT_COMMAND lines arrive double-wrapped: the
# Docker-stdout collector envelope wraps the inner ``logger
# --rfc5424 --msgid command -t bash …`` line. Outer MSGID is NIL,
# real MSGID lives in the body. Mirrors the unwrap logic in
# ``decnet.collector.worker._INNER_RFC5424_RE`` — the two parsers
# read the same on-wire format.
_INNER_RFC5424_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\S+)\s+"  # 1: inner TIMESTAMP
    r"(\S+)\s+"                      # 2: inner HOSTNAME
    r"(\S+)\s+"                      # 3: inner APP-NAME
    r"\S+\s+"                        # PROCID (NIL or PID)
    r"(\S+)\s+"                      # 4: inner MSGID
    r"(.+)$",                        # 5: inner SD/MSG remainder
)
# Structured data block: [relay@55555 k="v" ...]
_SD_BLOCK_RE = re.compile(r'\[relay@55555\s+(.*?)\]', re.DOTALL)
@@ -121,6 +136,21 @@ def parse_line(line: str) -> LogEvent | None:
    ts_raw, decky, service, event_type, sd_rest = m.groups()
    # Unwrap double-wrapped Docker-stdout envelopes around bash
    # PROMPT_COMMAND lines. See ``_INNER_RFC5424_RE`` and the matching
    # logic in ``decnet.collector.worker.parse_rfc5424``. Must run
    # before the decky/service NIL-guard below — the OUTER decky is
    # the docker host, the inner header carries the real source.
    if event_type == "-" and sd_rest.startswith("-"):
        body = sd_rest[1:].lstrip()
        inner = _INNER_RFC5424_RE.match(body)
        if inner is not None:
            _i_ts, i_host, i_app, i_msgid, i_rest = inner.groups()
            decky = i_host
            service = i_app
            event_type = i_msgid
            sd_rest = i_rest
    if decky == "-" or service == "-":
        return None
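
To see the unwrap and the regression guard side by side, the step above can be lifted into a standalone function. `unwrap_double_wrapped` is a hypothetical name and the sample field values are illustrative; only `_INNER_RFC5424_RE` is taken from the diff:

```python
import re

# Copied from the diff above.
_INNER_RFC5424_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\S+)\s+"  # 1: inner TIMESTAMP
    r"(\S+)\s+"                      # 2: inner HOSTNAME
    r"(\S+)\s+"                      # 3: inner APP-NAME
    r"\S+\s+"                        # PROCID (NIL or PID)
    r"(\S+)\s+"                      # 4: inner MSGID
    r"(.+)$",                        # 5: inner SD/MSG remainder
)


def unwrap_double_wrapped(
    decky: str, service: str, event_type: str, sd_rest: str
) -> tuple[str, str, str, str]:
    """Hypothetical standalone version of the unwrap step in parse_line.

    Fields are replaced by the inner header only when the outer MSGID and
    SD are both NIL and the body looks like an inner RFC5424 header;
    every other line is returned unchanged.
    """
    if event_type == "-" and sd_rest.startswith("-"):
        inner = _INNER_RFC5424_RE.match(sd_rest[1:].lstrip())
        if inner is not None:
            _ts, decky, service, event_type, sd_rest = inner.groups()
    return decky, service, event_type, sd_rest


# Double-wrapped: the inner header supplies the real APP-NAME and MSGID.
print(unwrap_double_wrapped(
    "decky-01", "1", "-",
    '- 2026-05-02T06:32:21.000000+00:00 decky-01 bash - command '
    '[timeQuality tzKnown="1"] CMD cmd=id',
))
# ('decky-01', 'bash', 'command', '[timeQuality tzKnown="1"] CMD cmd=id')

# Regression guards: a real MSGID, or a NIL MSGID whose body is not an
# inner header, passes through untouched.
print(unwrap_double_wrapped("decky-01", "sshd", "login", '[relay@55555 src_ip="203.0.113.7"]'))
# ('decky-01', 'sshd', 'login', '[relay@55555 src_ip="203.0.113.7"]')
print(unwrap_double_wrapped("decky-01", "1", "-", "- plain stdout text"))
# ('decky-01', '1', '-', '- plain stdout text')
```

The last case still reaches the `decky == "-" or service == "-"` guard with its outer values, so only lines that genuinely carry an inner header change shape.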