feat(profiler/behave_shell): emit environmental.keyboard_layout

ANTI authorised dropping the PII boundary for this primitive. ctx
gains typed_unigram_counts / typed_bigram_counts / typed_letter_count
populated during the existing single-pass input walk (paste-class
events excluded).

Two-axis classifier:
* layout-artefact unigrams take priority — q rate above floor with
  low English saturation → azerty; z above floor with y below → qwertz
* fallback to English-bigram saturation: ≥ floor → qwerty, else other

Sample-size floor 200 typed letters; bigram histogram capped at
top-64 to bound memory. Confidence cap stays moderate (0.40-0.55) —
heuristic discriminator.
This commit is contained in:
2026-05-04 00:38:24 -04:00
parent b7ff5d2cc1
commit cd7c7ea5a2
5 changed files with 227 additions and 1 deletions

View File

@@ -238,6 +238,31 @@ SHELL_TYPE_MIN_PROMPTS: int = 3
# noise and skip emission (a single 'C' or 'en' is too thin).
LOCALE_MIN_VALUE_LENGTH: int = 2
# ── environmental.keyboard_layout (Step F.4) ───────────────────────────────
# ANTI authorised dropping the PII boundary for this primitive — typed
# bigram/unigram histograms ride on SessionContext to feed two
# independent layout signals:
#
# 1. English-bigram saturation (presumed-QWERTY signal)
# 2. Layout-artefact unigram rates (q for AZERTY, z/y swap for QWERTZ)
#
# Sample-size floor; below this typed-letter-count we skip emission.
LAYOUT_MIN_TYPED_LETTERS: int = 200
# Cap on bigram histogram size — bound memory while keeping the top
# bigrams that drive the saturation signal.
LAYOUT_BIGRAM_TOP_N: int = 64
# Top-10 English bigrams. Their summed frequency floor presumes QWERTY
# (the dominant English-typing layout).
LAYOUT_TOP_ENG_BIGRAMS: frozenset[str] = frozenset({
"th", "he", "in", "er", "an", "re", "on", "at", "nd", "ha",
})
# Layout-artefact thresholds. Fractions are over total ASCII-letter typed.
LAYOUT_AZERTY_Q_MIN: float = 0.020 # high `q` rate (mistyping AZERTY's `a`)
LAYOUT_AZERTY_ENG_MAX: float = 0.050 # AND low English saturation
LAYOUT_QWERTZ_Z_MIN: float = 0.030 # high `z` rate (German content / QWERTZ)
LAYOUT_QWERTZ_Y_MAX: float = 0.010 # AND `y` swap signature
LAYOUT_QWERTY_ENG_MIN: float = 0.080 # English-bigram saturation floor
# ── motor.keystroke_cadence (Step B.1) ──────────────────────────────────────
# Typing bursts split at gaps > IKI_THINK_MAX_S so think-pauses between
# commands don't inflate the within-burst CV. Mirrors the prototype's