feat(profiler/behave_shell): asciinema parser + paste-burst detection
BEHAVE-EXTRACTOR.md Phase A Step 1. Lays the shared primitives that Steps 2-3 (motor.input_modality, motor.paste_burst_rate) will consume: * parse_shard_line / parse_shard turn a shard JSONL line/file into AsciinemaEvents, skipping headers and malformed records. * PasteBurst dataclass + _detect_paste_bursts group consecutive paste-class input events (len(d) >= 4 chars per the prototype's empirical floor) into contiguous bursts, splitting on IAT gaps larger than PASTE_BURST_MAX_IAT_S (200ms). * SessionContext now carries iats and paste_bursts derivations. * Threshold constants harvested from BEHAVE/prototype_extractors/shell/extract.py — calibrated against the five 2026-05-02 shards. Tests cover pure-typed, pure-pasted, mixed streams; close vs far paste events; typed events breaking a burst; PasteBurst immutability; and the JSON parser's junk handling.
This commit is contained in:
@@ -1,11 +1,37 @@
|
||||
"""Numeric thresholds for BEHAVE-SHELL primitive classification.
|
||||
|
||||
Each constant added here cites its calibration source. When the
|
||||
registry's ``notes:`` field disagrees with a constant in this file the
|
||||
registry is authoritative — fix the constant and re-run the
|
||||
calibration grid.
|
||||
Each constant cites its calibration source. When the registry's
|
||||
``notes:`` field disagrees with a constant here, the registry is
|
||||
authoritative — fix the constant, re-run the calibration grid.
|
||||
|
||||
Step 0 ships this file empty by design; thresholds land alongside the
|
||||
feature functions that consume them (Steps 1+).
|
||||
Empirical thresholds inherited from the BEHAVE prototype extractor
|
||||
(``BEHAVE/prototype_extractors/shell/extract.py``); see lines 40-90 of
|
||||
that file for the calibration history. Any change here must keep the
|
||||
five-class grid green.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
# ── paste-burst detection (Step 1) ──────────────────────────────────────────
|
||||
# A single input event with ≥ PASTE_MIN_CHARS_PER_EVENT chars is the
|
||||
# paste-class proxy used by the prototype; xterm-kitty / iTerm / VS Code
|
||||
# pastes arrive as one bulk write.
|
||||
PASTE_MIN_CHARS_PER_EVENT: int = 4
|
||||
|
||||
# Consecutive paste-class events arriving within this IAT collapse into
|
||||
# one PasteBurst record. 200ms is the prototype's IKI burst cap.
|
||||
PASTE_BURST_MAX_IAT_S: float = 0.20
|
||||
|
||||
# ── motor.input_modality (Step 2) ───────────────────────────────────────────
|
||||
# Paste-event ratio thresholds. ≥ 40% paste events → "pasted" (LLM-driven);
|
||||
# ≤ 5% → "typed" (human at the keyboard); in between → "mixed".
|
||||
# Lowered from 0.5 after the 47.6% case in sessions-2026-05-02-with-llm.jsonl
|
||||
# was clearly LLM-driven but missed the 0.5 floor.
|
||||
MODALITY_PASTED_MIN: float = 0.40
|
||||
MODALITY_TYPED_MAX: float = 0.05
|
||||
|
||||
# ── motor.paste_burst_rate (Step 3) ─────────────────────────────────────────
|
||||
# Same paste-event ratio re-bucketed for the "how often does the operator
|
||||
# paste" axis. Coarser than input_modality on purpose: this primitive is the
|
||||
# habit signal, input_modality is the dominant-channel signal.
|
||||
PASTE_RATE_HABITUAL_MIN: float = 0.50
|
||||
PASTE_RATE_OCCASIONAL_MIN: float = 0.10
|
||||
|
||||
Reference in New Issue
Block a user