feat(profiler/behave_shell): asciinema parser + paste-burst detection
BEHAVE-EXTRACTOR.md Phase A Step 1. Lays the shared primitives that Steps 2-3 (motor.input_modality, motor.paste_burst_rate) will consume:

* parse_shard_line / parse_shard turn a shard JSONL line/file into AsciinemaEvents, skipping headers and malformed records.
* PasteBurst dataclass + _detect_paste_bursts group consecutive paste-class input events (len(d) >= 4 chars per the prototype's empirical floor) into contiguous bursts, splitting on IAT gaps larger than PASTE_BURST_MAX_IAT_S (200ms).
* SessionContext now carries iats and paste_bursts derivations.
* Threshold constants harvested from BEHAVE/prototype_extractors/shell/extract.py, calibrated against the five 2026-05-02 shards.

Tests cover pure-typed, pure-pasted, mixed streams; close vs far paste events; typed events breaking a burst; PasteBurst immutability; and the JSON parser's junk handling.
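The burst-grouping rule described above can be sketched as follows. This is a minimal illustration, not the committed `_detect_paste_bursts` (which this diff does not show): the function name `detect_paste_bursts_sketch` is hypothetical, the IAT is assumed to be measured between consecutive paste-class events, and output events are assumed neither to extend nor to break a burst. The two thresholds are the values quoted in the message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Literal, Tuple

EventKind = Literal["i", "o"]
AsciinemaEvent = Tuple[float, EventKind, str]

# Values quoted in the commit message (4-char floor, 200 ms gap).
PASTE_MIN_CHARS_PER_EVENT = 4
PASTE_BURST_MAX_IAT_S = 0.200


@dataclass(frozen=True)
class PasteBurst:
    start_ts: float
    end_ts: float
    char_count: int
    event_count: int


def detect_paste_bursts_sketch(events: Iterable[AsciinemaEvent]) -> List[PasteBurst]:
    """Hypothetical stand-in for _detect_paste_bursts; illustration only."""
    bursts: List[PasteBurst] = []
    run: List[Tuple[float, int]] = []  # (timestamp, char count) of the open burst

    def flush() -> None:
        # Close the open burst, if any, and record it.
        if run:
            total = sum(n for _, n in run)
            bursts.append(PasteBurst(run[0][0], run[-1][0], total, len(run)))
            run.clear()

    for t, kind, data in events:
        if kind != "i":
            continue  # assumption: output events are ignored
        if len(data) < PASTE_MIN_CHARS_PER_EVENT:
            flush()  # a short (typed) input event breaks the burst
            continue
        if run and t - run[-1][0] > PASTE_BURST_MAX_IAT_S:
            flush()  # gap above the IAT ceiling: start a new burst
        run.append((t, len(data)))
    flush()
    return bursts


# Invented sample stream: two close pastes, a far paste, a typed key, a paste.
sample = [
    (0.00, "i", "ls -la /tmp"),
    (0.05, "i", "echo hello"),
    (0.06, "o", "hello\r\n"),
    (1.00, "i", "cat /etc/passwd"),
    (1.10, "i", "x"),
    (1.15, "i", "whoami"),
]
bursts = detect_paste_bursts_sketch(sample)
```

With this sample, the first two pastes collapse into one burst, the far paste stands alone, and the typed `x` separates it from the final paste even though the two are only 150 ms apart.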
@@ -1,14 +1,76 @@
-"""Asciinema event types.
+"""Asciinema event types + shard-line parsing helpers.
 
-The on-disk shard format is a list of 3-tuples ``(t, kind, data)`` where
-``t`` is seconds since session start (float), ``kind`` is ``'i'`` (input)
-or ``'o'`` (output), and ``data`` is the captured bytes decoded as a
-Python ``str``. Step 0 ships only the type aliases — Step 1 fills the
-parsing helpers and paste-burst detector.
+Shard lines are JSON objects ``{"sid": ..., "t": float, "ch": "i"|"o",
+"d": str}`` produced by the DECNET PTY-recording wrapper and held in
+sensor-side blob storage. The first line of each file is a header
+(``{"sid": ..., "hdr": {...}}``) which carries no event payload — the
+parser skips it.
+
+The on-wire engine input is the simpler 3-tuple ``(t, kind, data)``
+:data:`AsciinemaEvent`. Workers (``BEHAVE-INTEGRATION.md`` Phase 4)
+either feed the 3-tuple directly or use :func:`parse_shard_line` to
+turn a raw JSON string into one.
 """
 from __future__ import annotations
 
-from typing import Literal, Tuple
+import json
+from dataclasses import dataclass
+from typing import Iterable, Iterator, Literal, Tuple
 
 EventKind = Literal["i", "o"]
 AsciinemaEvent = Tuple[float, EventKind, str]
+
+
+@dataclass(frozen=True, slots=True)
+class PasteBurst:
+    """Contiguous run of paste-class input events.
+
+    A paste-class event is a single input event whose ``data`` length
+    is at least ``PASTE_MIN_CHARS_PER_EVENT`` — terminal pastes from
+    xterm/kitty/iTerm arrive as one bulk write, so checking event size
+    is the cheap-and-correct proxy for the bracketed-paste signal we
+    don't get to see.
+
+    Multiple consecutive paste-class events with low IATs collapse
+    into one ``PasteBurst`` for higher-level reasoning (paste-rate /
+    paste-style classification later).
+    """
+
+    start_ts: float
+    end_ts: float
+    char_count: int
+    event_count: int
+
+
+def parse_shard_line(line: str) -> AsciinemaEvent | None:
+    """Turn one shard JSONL line into an :data:`AsciinemaEvent`.
+
+    Returns ``None`` for the header line and for any line that is not
+    a well-formed event record. Workers must filter ``None``s out
+    before passing to :func:`extract_session`.
+    """
+    line = line.strip()
+    if not line:
+        return None
+    try:
+        rec = json.loads(line)
+    except (json.JSONDecodeError, ValueError):
+        return None
+    if not isinstance(rec, dict):
+        return None
+    if "hdr" in rec or "t" not in rec or "ch" not in rec:
+        return None
+    t = rec.get("t")
+    ch = rec.get("ch")
+    d = rec.get("d", "")
+    if not isinstance(t, (int, float)) or ch not in ("i", "o") or not isinstance(d, str):
+        return None
+    return (float(t), ch, d)
+
+
+def parse_shard(lines: Iterable[str]) -> Iterator[AsciinemaEvent]:
+    """Stream-parse a shard file's lines into events, skipping junk."""
+    for line in lines:
+        ev = parse_shard_line(line)
+        if ev is not None:
+            yield ev
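
To see the skip rules in action, the sketch below runs a condensed copy of the two parsing functions over a handful of shard lines. The parser logic follows the diff; the sample records, the `sid` value `"s1"`, and the header contents are invented for illustration and are not taken from a real shard.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

AsciinemaEvent = Tuple[float, str, str]


def parse_shard_line(line: str) -> Optional[AsciinemaEvent]:
    # Condensed copy of the function in this commit, same skip rules.
    line = line.strip()
    if not line:
        return None
    try:
        rec = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(rec, dict) or "hdr" in rec or "t" not in rec or "ch" not in rec:
        return None
    t, ch, d = rec["t"], rec["ch"], rec.get("d", "")
    if not isinstance(t, (int, float)) or ch not in ("i", "o") or not isinstance(d, str):
        return None
    return (float(t), ch, d)


def parse_shard(lines: Iterable[str]) -> Iterator[AsciinemaEvent]:
    for line in lines:
        ev = parse_shard_line(line)
        if ev is not None:
            yield ev


# Hypothetical shard content: one header, two valid events, four junk lines.
SAMPLE = [
    '{"sid": "s1", "hdr": {"cols": 80}}',               # header: skipped
    '{"sid": "s1", "t": 0.5, "ch": "i", "d": "ls\\r"}',  # valid input event
    'not json',                                          # malformed: skipped
    '{"sid": "s1", "t": "1.0", "ch": "i", "d": "x"}',    # t is a string: skipped
    '{"sid": "s1", "t": 2, "ch": "x", "d": "y"}',        # unknown channel: skipped
    '',                                                  # blank line: skipped
    '{"sid": "s1", "t": 3, "ch": "o"}',                  # missing d defaults to ""
]
events = list(parse_shard(SAMPLE))
```

Only the second and last lines survive, as `(0.5, "i", "ls\r")` and `(3.0, "o", "")`. Integer timestamps are coerced to `float`, and a record without `"d"` yields an empty data string rather than being dropped.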