feat(profiler/behave_shell): G.0 intent lexicon + lexical counter pass

Phase G shared infrastructure (no primitives emitted yet):

* New `_intent.py` — five precomputed first-token-hash sets (recon /
  exfil / persistence / lateral / destructive) with documented
  precedence, plus opsec-history and three lexeme sets (positive /
  negative / obscenity) for the typed-text counter pass. Stop words
  that collide with registry value vocabulary (`no`, `hell`, `ok`)
  are deliberately excluded — the PII regression test catches such
  collisions.

* `_typed_char_histograms()` extended with five integer counters
  populated in the same single-pass walk: `obscenity_hits`,
  `positive_lex_hits`, `negative_lex_hits`, `caps_run_max`,
  `bang_run_max`. Lexemes are matched by longest suffix against a
  bounded lexicon (`LEXEME_MAX_LEN`); paste-class events are excluded.

* `SessionContext` widened by the same five fields. These drive G.5
  (valence), G.6 (arousal), and G.8 (frustration_venting) without
  retaining raw operator text.

* Bump twisted to >=26.4.0rc2 to clear CVE-2026-42304 (pre-existing,
  caught by pre-commit pip-audit). Adjust the ftp template type-ignore
  code from attr-defined to misc to match the new Twisted typing.
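
A minimal sketch of the first-token intent lookup described in the first
bullet, for illustration only. The hash function (truncated SHA-256), the
sample vocabulary, the precedence order, and the names `_h`, `INTENT_SETS`,
`PRECEDENCE`, and `classify_first_token` are all assumptions; the real sets
and their documented precedence live in `_intent.py`.

```python
import hashlib
from typing import Optional


def _h(token: str) -> str:
    """Hash a token so the lookup sets hold digests, not plaintext."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


# Hypothetical vocabulary. Some tokens appear in two sets on purpose,
# to show why a precedence order is needed.
INTENT_SETS = {
    "destructive": frozenset(_h(t) for t in ("shred", "mkfs", "dd")),
    "exfil": frozenset(_h(t) for t in ("scp", "curl", "dd")),
    "persistence": frozenset(_h(t) for t in ("crontab", "systemctl")),
    "lateral": frozenset(_h(t) for t in ("ssh", "scp")),
    "recon": frozenset(_h(t) for t in ("whoami", "uname", "curl")),
}

# Assumed precedence: the first set containing the digest wins.
PRECEDENCE = ("destructive", "exfil", "persistence", "lateral", "recon")


def classify_first_token(command_line: str) -> Optional[str]:
    """Return the intent class of the first token, or None if unknown."""
    parts = command_line.split()
    if not parts:
        return None
    digest = _h(parts[0])
    for intent in PRECEDENCE:
        if digest in INTENT_SETS[intent]:
            return intent
    return None
```

With this assumed ordering, `dd` resolves to `destructive` rather than
`exfil`, and `scp` resolves to `exfil` rather than `lateral`.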

PII discipline: same shape as F.4 — fixed-vocabulary integer counters
on ctx, never on observations.
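
One possible reading of the counter pass, sketched under stated
assumptions: events are hypothetical `(kind, char)` pairs, the lexicons are
sample words, `LEXEME_MAX_LEN` is set to 8, and lexemes are checked at word
boundaries. `LexCounters` and `count_typed` are illustrative names, not the
actual `SessionContext` or `_typed_char_histograms()` code. Only integers
leave the function, which matches the PII discipline above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

LEXEME_MAX_LEN = 8  # assumed bound on the rolling tail buffer

# Hypothetical sample lexicons.
POSITIVE = frozenset({"nice", "great", "thanks"})
NEGATIVE = frozenset({"broken", "fail", "ugh"})
OBSCENITY = frozenset({"damn", "crap"})


@dataclass
class LexCounters:
    """Integer-only counters; no typed text is retained."""
    obscenity_hits: int = 0
    positive_lex_hits: int = 0
    negative_lex_hits: int = 0
    caps_run_max: int = 0
    bang_run_max: int = 0


def _match(tail: str, c: LexCounters) -> None:
    """Count the longest suffix of `tail` found in any lexicon."""
    for start in range(len(tail)):  # start=0 is the longest suffix
        suffix = tail[start:]
        if suffix in OBSCENITY:
            c.obscenity_hits += 1
            return
        if suffix in POSITIVE:
            c.positive_lex_hits += 1
            return
        if suffix in NEGATIVE:
            c.negative_lex_hits += 1
            return


def count_typed(events: Iterable[Tuple[str, str]]) -> LexCounters:
    """Single pass over (kind, char) events; 'paste' events are skipped."""
    c = LexCounters()
    tail = ""
    caps_run = bang_run = 0
    for kind, ch in events:
        if kind == "paste":
            # Pasted text is never inspected; it only ends the current word.
            _match(tail, c)
            tail = ""
            caps_run = bang_run = 0
            continue
        caps_run = caps_run + 1 if ch.isupper() else 0
        bang_run = bang_run + 1 if ch == "!" else 0
        c.caps_run_max = max(c.caps_run_max, caps_run)
        c.bang_run_max = max(c.bang_run_max, bang_run)
        if ch.isalpha():
            tail = (tail + ch.lower())[-LEXEME_MAX_LEN:]
        else:
            _match(tail, c)
            tail = ""
    _match(tail, c)  # flush the final word
    return c
```

For example, typing `DAMN this is broken!!! nice` followed by a paste of
`crap crap` yields one hit in each lexicon, `caps_run_max == 4`, and
`bang_run_max == 3`; the pasted words are not counted.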
2026-05-08 16:27:25 -04:00
parent a25f4a890d
commit 289a64014c
5 changed files with 387 additions and 13 deletions


@@ -79,7 +79,7 @@ dev = [
 "pytest-xdist>=3.8.0",
 "pytest-timeout>=2.4.0",
 "flask>=3.1.3",
-"twisted>=25.5.0",
+"twisted>=26.4.0rc2",
 "requests>=2.33.1",
 "redis>=7.4.0",
 "pymysql>=1.1.2",