Commit Graph

1 Commits

Author SHA1 Message Date
289a64014c feat(profiler/behave_shell): G.0 intent lexicon + lexical counter pass
Phase G shared infrastructure (no primitive yet emitted):

* New `_intent.py` — five precomputed first-token-hash sets (recon /
  exfil / persistence / lateral / destructive) with documented
  precedence, plus opsec-history and three lexeme sets (positive /
  negative / obscenity) for the typed-text counter pass. Stop words
  that collide with registry value vocabulary (`no`, `hell`, `ok`)
  are deliberately excluded — the PII regression test catches such
  collisions.

* `_typed_char_histograms()` extended with five integer counters
  populated in the same single-pass walk: `obscenity_hits`,
  `positive_lex_hits`, `negative_lex_hits`, `caps_run_max`,
  `bang_run_max`. Longest-suffix match against bounded lexicon
  (`LEXEME_MAX_LEN`); paste-class events excluded.

* `SessionContext` widened by the same five fields. Drives G.5
  (valence), G.6 (arousal), G.8 (frustration_venting) without retaining
  raw operator text.

* Bump twisted >= 26.4.0rc2 to clear CVE-2026-42304 (pre-existing,
  caught by pre-commit pip-audit). Adjust ftp template type-ignore
  code from attr-defined to misc to match the new Twisted typing.

PII discipline: same shape as F.4 — fixed-vocabulary integer counters
on ctx, never on observations.
2026-05-08 16:27:25 -04:00