fix(profiler/behave_shell): tolerate non-UTF-8 bytes in shard reads
Real-world bug surfaced on the first live decky run: sessrec.c's json_escape (decnet/templates/_shared/sessrec/sessrec.c:111-141) only escapes bytes < 0x20 and DEL; bytes >= 0x80 pass through raw. An attacker pasting Latin-1 / GB18030 / any non-UTF-8 8-bit text yields a shard line that chokes Python's default UTF-8 text-mode read with "'utf-8' codec can't decode byte 0xac".

Three changes:

1. _events_for_sid now opens with errors='surrogateescape', preserving byte fidelity through the JSON parse. Surrogate-half chars correctly fail isascii() / isalpha(), so the typed-letter histograms filter them out automatically. Tightening sessrec.c to escape >= 0x80 is filed for v0.2; that's the proper forensic-data fix, and the surrogateescape read makes the engine robust meanwhile.

2. Regression test (test_handler_tolerates_non_utf8_bytes_in_shard) builds a shard with raw 0xAC bytes inside a JSON 'data' string and asserts the handler still persists observations.

3. Collector's _emit_session now logs at WARNING (was DEBUG) when find_shard_with_sid returns None, citing the three usual causes (ARTIFACTS_ROOT perms, _SERVICE_RE whitelist, sessrec/collector race). This surfaces the silent-skip class of bug in seconds instead of hours: the first live run hid a perm mismatch (User=anti without SupplementaryGroups=decnet) for an entire session window before the symptom was traced upstream.
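For illustration only, a minimal sketch of the read behaviour described in change 1. The shard line, the file name, and the `typed_letters` filter are assumptions made up for this example; they are not the actual `_events_for_sid` code or the real shard schema.

```python
import json
import os
import tempfile

# Hypothetical shard line: valid JSON framing, but the "data" string carries
# a raw 0xAC byte, which is not valid UTF-8 on its own.
raw_line = b'{"sid": "s1", "data": "ab\xacc"}\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shard.jsonl")  # assumed file name
    with open(path, "wb") as fh:
        fh.write(raw_line)

    # A strict UTF-8 text-mode read raises on the stray byte.
    try:
        with open(path, encoding="utf-8") as fh:
            fh.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate
    # U+DCNN, so the line decodes and still parses as JSON.
    with open(path, encoding="utf-8", errors="surrogateescape") as fh:
        events = [json.loads(line) for line in fh]

data = events[0]["data"]

# Lone surrogates are neither ASCII nor alphabetic, so a letter filter of
# this shape drops them without special-casing.
typed_letters = [ch for ch in data if ch.isascii() and ch.isalpha()]

# The original bytes are recoverable by encoding with the same handler.
round_trip = data.encode("utf-8", errors="surrogateescape")
```

The round trip at the end is what "preserving byte fidelity" amounts to in practice: nothing is replaced or dropped at read time, so the raw bytes remain available for later forensic use.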
@@ -313,16 +313,34 @@ class _SessionAggregator:
        # consumer skips honestly. Additive field; existing TTP consumers
        # ignore it.
        shard_path: str | None = None
        resolve_error: str | None = None
        if sid and decky and service:
            try:
                resolved = find_shard_with_sid(decky, service, sid)
            except (ValueError, OSError, PermissionError) as exc:
                logger.debug(
                    "collector: shard resolve failed for sid=%s: %s", sid, exc,
                )
                resolve_error = f"{type(exc).__name__}: {exc}"
                resolved = None
            if resolved is not None:
                shard_path = str(resolved)
        if shard_path is None and sid:
            # Loud-by-default — the BEHAVE-SHELL handler will skip
            # session.ended events with shard_path=None, so a silent
            # miss here means the profiler panel never hydrates. Surface
            # the most common failure modes inline so the operator can
            # diagnose without grepping decnet/artifacts/shards.py.
            #
            # 1. ARTIFACTS_ROOT not readable by the collector's user
            #    (perm 0750 decnet:decnet vs. User=anti without
            #    SupplementaryGroups=decnet).
            # 2. service whitelist (_SERVICE_RE accepts ssh|telnet only).
            # 3. sessrec hasn't flushed the shard for this sid yet
            #    (collector tick won the race; next tick recovers).
            logger.warning(
                "collector: shard_path=None decky=%s service=%s sid=%s "
                "(error=%s) — profiler will skip this session.ended; "
                "check ARTIFACTS_ROOT perms / service whitelist",
                decky, service, sid, resolve_error or "shard not found",
            )

        payload: dict[str, Any] = {
            "session_id": sid or None,
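A self-contained sketch of the resolve-then-warn pattern in the hunk above, useful for checking that a failed lookup surfaces at WARNING with the exception text attached. Everything here is a stand-in: `resolve_shard`, `_denied`, `_ListHandler`, and the logger name are hypothetical and only mirror the shape of the diff, not the real collector or `find_shard_with_sid`.

```python
import logging

logger = logging.getLogger("collector_sketch")  # assumed logger name


def resolve_shard(find_shard, decky, service, sid):
    """Return the shard path as a str, or None after logging a WARNING."""
    shard_path = None
    resolve_error = None
    if sid and decky and service:
        try:
            resolved = find_shard(decky, service, sid)
        except (ValueError, OSError) as exc:  # PermissionError is an OSError
            resolve_error = f"{type(exc).__name__}: {exc}"
            resolved = None
        if resolved is not None:
            shard_path = str(resolved)
    if shard_path is None and sid:
        logger.warning(
            "collector: shard_path=None decky=%s service=%s sid=%s (error=%s)",
            decky, service, sid, resolve_error or "shard not found",
        )
    return shard_path


class _ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def _denied(decky, service, sid):
    # Simulates an unreadable artifacts root (failure mode 1 in the comment).
    raise PermissionError("artifacts root not readable")


handler = _ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.WARNING)
logger.propagate = False

result = resolve_shard(_denied, "decky-01", "ssh", "sid-123")
messages = [record.getMessage() for record in handler.records]
```

Carrying the exception text into the warning is what lets an operator tell a permissions failure apart from a plain "shard not found" without raising the log level to DEBUG.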