Per-primitive state badge rendered next to each value in the
Behavioural Primitives panel. Five-state vocabulary, frozen, mirrors
decnet/correlation/attribution/aggregate.py:
* STABLE — green, low-key
* DRIFTING — amber, draws the eye
* CONFLICTED — red
* MULTI-ACTOR — purple, loudest (cross-primitive escalation lives
in attribution.multi_actor_suspected, not the
per-primitive badge)
* UNKNOWN — neutral border, no fill
Wiring:
* GET /api/v1/attackers/{id}/attribution on mount + on id change.
Failures swallowed silently (the worker may be off in dev).
* useAttackerStream gains attribution.state_changed +
attribution.multi_actor_suspected named events. The state-changed
handler merges by primitive and locks last_change_ts when the
state did not actually flip (defensive — backend already gates
these on transition, but a future relaxation shouldn't lie about
"stable since X" on the badge tooltip).
* multi_actor_suspected is wired but unused by the badges; the
per-primitive multi_actor signal already shows on each contributing
primitive. The handler is in place so a future "two operators
detected" banner has a live source.
Vitest: 4 new tests (badge renders only for mapped primitives, all
five states render with distinct labels, no badge when prop omitted)
on top of the existing 4. 7 of 7 pass; tsc + vite build clean.
Four synthetic operator-behaviour scenarios at the merger level
(aggregate_observations) that pin v0's calibration:
* Stable HUMAN over 7 sessions -> all primitives stable
* HUMAN switches to LLM mid-week -> primitives flip stable -> drifting
* Two operators alternating -> primitives flag multi_actor
(per-primitive; the cross-
primitive multi_actor_suspected
correlator is exercised by Phase 5)
* Single short session -> all primitives unknown
Plus a threshold-lockdown test that asserts every named constant in
_thresholds.py against its v0 ship value. Anyone adjusting a
threshold without updating the scenarios fails this file.
This closes DEBT-051 at v0 — the attribution engine has a calibrated,
test-locked answer to "is this attacker stable / drifting / showing
multiple operators?" without crossing the persona-attribution bright
line. v1 (cross-attacker clustering, KD simhash linkage signal) is
gated on this v0 surface being stable in production for >= 1 month.
GET /api/v1/attackers/{uuid}/attribution
Returns the merger output for an attacker's identity:
{
"identity_uuid": "abc..." | null,
"primitives": [
{primitive, current_value, state, confidence,
observation_count, last_change_ts, last_observation_ts},
...
]
}
Pre-attribution-worker: identity_uuid=null, primitives=[]. Surfacing
identity_uuid keeps the cross-attacker rollup story visible to the
frontend ahead of v1's clusterer landing.
api_events SSE relay also subscribes to attribution.> and forwards
to the AttackerDetail page filtered on payload.identity_uuid (the
identity is resolved at stream open from the URL's attacker_uuid;
attribution payloads are identity-keyed, not attacker-keyed). New
SSE event names: attribution.state_changed,
attribution.multi_actor_suspected.
Frontend (AttackerDetail.tsx badge rendering, useAttackerStream
consumer) deferred — there's already WIP on AttackerDetail.tsx in
the working tree; merging the badge logic is a separate commit
once that lands.
Tests: 4 endpoint scenarios — 401 unauth, 404 unknown attacker,
200 empty (no stub), 200 with primitive-ordered rows.
Add tick_multi_actor() — periodic walk of attribution_state firing
attribution.profile.multi_actor_suspected when an identity carries
>= MULTI_ACTOR_MIN_PRIMITIVES rows in multi_actor state.
* Repo's list_multi_actor_identities() already filters to >= 2
primitives; the correlator just dispatches.
* In-memory dedup keyed on identity_uuid -> frozenset(primitives):
same set as last fire -> no re-emit. Set grows -> re-emit.
Set shrinks below threshold -> evict so a future re-flap re-fires.
Restart-resets are honest because attribution_state persists; a
v1 multi_actor_suspect_log table can replace this if needed.
* run_attribution_loop() now supervises three concurrent tasks:
observation handler, multi_actor tick loop, health/control. Tick
interval comes from _thresholds.MULTI_ACTOR_TICK_SECS (60s) with
test override.
Tests: 6 scenarios — single-primitive doesn't fire, two-primitive
co-flag fires, dedup blocks unchanged set, set growth re-fires,
threshold drop re-arms, multiple identities fire independently.
attribution_worker.handle_observation_event now executes the full
end-to-end path:
* ensure stub identity (Phase 1)
* observations_for_identity_primitive() — new repo helper joining
observations through attackers.identity_id, so v1's clusterer
gets cross-attacker rollup for free
* aggregate_observations() with ValueKind dispatched off the BEHAVE
PRIMITIVE_REGISTRY; unknown primitives default to categorical
* upsert_attribution_state() — last_change_ts locked when state is
unchanged so the dashboard can render "stable since X"
* publish attribution.profile.state_changed only on transition;
idempotent re-runs over the same observation set fire nothing
(loop-prevention invariant matching ttp.tagged)
Tests:
* 5 end-to-end attribution scenarios over in-memory SQLite + FakeBus.
* test_base_repo's DummyRepo + coverage body now stub every abstract
surface BaseRepository declares — the 6 added by this branch plus
the 12 left un-stubbed by earlier work (BEHAVE Phase 1, TTP
rollups, iter helpers). The coverage test could not previously
even instantiate.
* test_aggregate_categorical's dispatcher rejection updated for the
Phase 3 + 4 contract — ValueError on unknown kinds, not
NotImplementedError.
aggregate_numeric(): EWMA + dispersion (CV) over numeric primitive
values. Stable when CV < 20% AND mean shift < 30%; drifting on >= 30%
mean shift; conflicted on CV > 100%. Confidence is 1 - min(CV, 1).
multi_actor is intentionally NOT a numeric state — bimodal
distributions belong to the categorical detector once the value space
is bucketed.
aggregate_hash(): counts distinct hash values within
HASH_DRIFT_WINDOW_SECS of the most recent observation. 0 rotations =
stable, 1..HASH_DRIFT_MAX = drifting, > HASH_DRIFT_MAX = conflicted.
Reads rotation events; never recomputes hashes (DEBT-032 already
produces them via decnet.correlation.fingerprint_rotation).
aggregate_observations() dispatcher now routes "categorical" |
"numeric" | "hash" | None and rejects unknown kinds with ValueError
(louder than NotImplementedError now that all three v0 mergers
exist). 17 synthetic-input tests cover both new mergers and the
dispatcher.
aggregate_categorical(): pure function over a per-(identity, primitive)
observation list. Five-state vocabulary, last-N=5 window comparison
with one-outlier-tolerant majority threshold:
* unknown — < 3 observations
* stable — recent 5 agree (≥ 4 of 5 share top value), older 5 same
* drifting — recent 5 stable but disagrees with older 5, or older
was conflicted and recent stabilised
* conflicted — recent 5 split, no two-value alternation pattern
* multi_actor — recent 5 split + alternation between exactly two
values (operator A↔B handoff). Confidence capped at 0.6 per
_thresholds.MULTI_ACTOR_MAX_CONFIDENCE; flapping primitives on
flaky networks would otherwise look like two operators.
aggregate_observations() dispatcher honours value_kind="categorical"
(or None) and raises NotImplementedError for "numeric" / "hash" so
Phase 3 lands cleanly. 14 synthetic-input tests cover every state
+ boundary condition.
v0 Phase 1 of ATTRIBUTION-ENGINE.md:
* AttributionStateRow SQLModel keyed on (identity_uuid, primitive)
per ANTI direction — re-keying state rows when the v1 clusterer
merges attackers is the migration debt v0 should not bake in.
ATTRIBUTION-ENGINE.md updated with the deviation note.
* AttributionMixin: ensure_stub_identity_for_attacker, idempotent
upsert_attribution_state, get_attribution_state[_for_identity],
list_multi_actor_identities (the Phase 5 correlator's read).
* attribution.profile.{state_changed,multi_actor_suspected} bus
topics + builder; wiki Service-Bus.md updated separately.
* attribution_worker.py: subscribes to attacker.observation.>,
ensures stub identity per event, logs and continues. No merger,
no state writes, no derived events — Phase 4 wires those.
* attribution/{aggregate.py,_thresholds.py} skeletons: Phase 2
fills _aggregate_categorical, Phase 3 adds numeric+hash+dispatcher.
Real-world bug surfaced on the first live decky run: sessrec.c's
json_escape (decnet/templates/_shared/sessrec/sessrec.c:111-141)
only escapes bytes < 0x20 + DEL — bytes >= 0x80 pass through raw.
An attacker pasting Latin-1 / GB18030 / any non-UTF-8 8-bit text
yields a shard line that chokes Python's default UTF-8 text-mode
read with 'utf-8 codec can't decode byte 0xac'.
Three changes:
1. _events_for_sid now opens with errors='surrogateescape', preserving
byte fidelity through the JSON parse. Surrogate-half chars
correctly fail isascii() / isalpha() so the typed-letter
histograms filter them out automatically. Tightening sessrec.c to
escape >= 0x80 is filed for v0.2 — that's the proper forensic-data
fix; the surrogateescape read makes the engine robust meanwhile.
2. Regression test
(test_handler_tolerates_non_utf8_bytes_in_shard) builds a shard
with raw 0xAC bytes inside a JSON 'data' string and asserts the
handler still persists observations.
3. Collector's _emit_session now logs at WARNING (was DEBUG) when
find_shard_with_sid returns None, citing the three usual causes
(ARTIFACTS_ROOT perms, _SERVICE_RE whitelist, sessrec/collector
race). Surfaces the silent-skip class of bug in seconds instead of
hours — the first live run hid a perm mismatch
(User=anti without SupplementaryGroups=decnet) for an entire
session window before the symptom was traced upstream.
Two-half deliverable per BEHAVE-INTEGRATION.md §587-594:
* scripts/behave_shell/replay_calibration.py — Python helper that
drives the production handler against one asciinema shard, mints
a temp SQLite repo + an Attacker per session, captures bus
emissions in-process. Exits non-zero on zero-observation sessions.
* scripts/behave_shell/smoke.sh — bash entry that replays all five
2026-05-02 calibration shards (HUMAN / YOU-sim / LW-sim /
CLAUDE-FF / CLAUDE-CL). Auto-activates .311 venv, forces
DECNET_DB_TYPE=sqlite, prints per-class summary. Suitable for CI.
* scripts/behave_shell/README.md — runbook covering both halves.
Pins the manual live-decky procedure (one SSH session per class
against a deployed smoke-decky, expected dominant primitives table,
SQL verification query, AttackerDetail panel check, pass criteria).
* BEHAVE-INTEGRATION.md — Phase 6 completion log appended with
current corpus results table (15 sessions, 424 observations across
the five classes) and a note that the v0 tag (drop -pre) is gated
on the manual live-decky round-trip and lands as a separate
commit.
Live-decky run is intentionally NOT scripted — the integration doc
calls for manual SSH sessions per class so an operator confirms the
bus / collector / disk-reach plumbing under real PTY conditions.
Four tests pin the panel surface:
* Empty-state placeholder renders when no observations.
* Day-one priority primitives sort to the top of their group:
motor.input_modality first in motor; the three cognitive priority
primitives in documented order at the top of cognitive.
* Each row renders primitive leaf, value, and confidence-percent
badge.
* Groups follow the canonical domain order
(motor / cognitive / temporal / operational / environmental /
emotional_valence); unknown domains alphabetise at the end.
Mirrors the Orchestrator.test.tsx harness shape (DEBT-043). Live
update path (useAttackerStream → setObservations) is exercised
indirectly via the static render — the hook is dumb glue and the
state mutation is React-side.
Adds the AttackerDetail.tsx panel that surfaces BEHAVE-SHELL
behavioural primitives. Hydrates from the existing
GET /api/v1/attackers/{uuid} response field 'observations',
live-updates via the new useAttackerStream hook (replace-by-primitive
on every 'observation' SSE event).
* New BehaviouralPrimitivesPanel component, exported for vitest.
* Day-one render priority per BEHAVE-INTEGRATION.md §441-454:
motor.input_modality, cognitive.feedback_loop_engagement,
cognitive.command_branch_diversity,
cognitive.inter_command_latency_class — these four sort to the top
of their respective groups; everything else alphabetises.
* Grouped by top-level domain (motor / cognitive / temporal /
operational / environmental / emotional_valence) with the canonical
domain order; unknown domains alphabetise at the end.
* AttackerData interface gains an 'observations' field.
* Empty-state placeholder when the panel has nothing yet.
* Section collapse state extends to 'behavioural', defaults open.
tsc --noEmit clean. Vitest coverage ships in P5.4.
Per-attacker SSE consumer hook. Mirrors useIdentityStream's shape:
* Connects to /api/v1/attackers/{uuid}/events with ?token= auth.
* Per-event-name dispatch via addEventListener for snapshot,
observation, fingerprint.rotated, attacker.scored.
* Reconnect-on-error backoff (3s).
* Callback refs so consumer rerenders don't tear down the connection.
The 'observation' event handler receives every primitive's update
through one event name; the primitive rides in payload.primitive
(matches the backend's _sse_name_for collapse decision).
Hook coverage rides on P5.4's panel test.
GET /api/v1/attackers/{uuid}/events streams behavioural events for
one attacker. Mirrors decnet/web/router/topology/api_events.py
end-to-end: ?token= auth, require_stream_viewer gate,
sse_connection_slot per-user cap, snapshot-on-connect, three bus
subscriptions (attacker.observation.>, attacker.fingerprint_rotated,
attacker.scored) merged through asyncio.Queue, 15s keepalive,
request.is_disconnected() exit, finally task cancellation.
Per-attacker filter keys on payload['attacker_uuid'] which the
profiler worker stamps onto every published payload (Phase 5 P5.0
amendment) — O(1) drop without a repo round-trip per event.
_sse_name_for derives SSE event names:
attacker.observation.<primitive> → observation.<primitive>
attacker.fingerprint_rotated → fingerprint.rotated
attacker.scored → attacker.scored
10 tests cover snapshot, live forward, per-attacker filter (drops
other attackers' events), fingerprint.rotated forward, 404, 401, and
the sse-name derivation across all four cases. Topology events
regression green.
The profiler worker's per-observation publish now re-merges
attacker_uuid into the bus payload alongside id/ts/v. Same shape as
the existing DECNET-side deviation from BEHAVE's wire-format
docstring (BEHAVE-INTEGRATION.md §339-366) — widens the deviation
by one DECNET denorm field.
Phase 5's per-attacker SSE route can now filter
attacker.observation.* events to one attacker in O(1) without a repo
round-trip per event. identity_ref stays None today (until the
attribution engine ships); attacker_uuid is independent.
Two test changes:
* test_happy_path_persists_and_publishes asserts attacker_uuid is in
every published payload.
* New test_attacker_uuid_in_payload_for_filter pins the field
explicitly and confirms it doesn't conflate with identity_ref.
The profiler worker now consumes attacker.session.ended on the bus
AND walks unprofiled session_recorded log rows on every tick. Both
paths converge on a single handler that:
1. Validates required payload fields (session_id, decky_id, service,
attacker_ip, shard_path).
2. Builds evidence_ref shard:{decky}/{service}/{shard_basename}#{sid}
and skips when has_observations_for_evidence is True (idempotent
re-runs).
3. Resolves attacker_uuid via get_attacker_uuid_by_ip; defers if the
profiler tick hasn't materialised the row yet.
4. Reads the asciinema shard, slices events for the sid, calls
extract_session, persists each Observation via upsert_observation
(per-row; batch transaction filed as follow-up), then publishes
each on the bus best-effort (fire-and-forget per DEBT-029 §6).
Architecture:
* Handler lives in decnet/profiler/behave_shell/_handler.py — pure
function, unit-tested in isolation.
* Worker.py adds _behave_pump (queue feed), _drain_behave_queue
(per-tick drain), _behave_poll_tick (cursor scan over
session_recorded logs), and _payload_from_log_row (Log → bus-shape
payload projection).
* Poll cursor uses a separate state key
(attacker_worker_session_cursor) so the correlation tick's cursor
doesn't conflate.
* has_observations_for_evidence promoted to BaseRepository abstract.
22 new tests across handler / drain / poll layers covering happy
path, all skip paths, isolation against handler exceptions,
idempotency on re-run, and cursor key separation. TTP worker bus
tests still green — payload field is purely additive.
Closes BEHAVE-INTEGRATION.md Phase 4.
Lock the BEHAVE library versions per BEHAVE-INTEGRATION.md
§Versioning. The profiler worker (Phase 4 wiring) imports
`Observation`/`Window` from `decnet_behave_core.spec.envelope` and
`event_topic_for`/`to_event_payload` from
`decnet_behave_shell.spec.event_adapter`; without the pin a broken
wheel or missing install would only show up on first publish.
Four-test smoke pins the public surface: envelope construction,
registry import non-empty, event-adapter topic shape, and the
adapter's id/ts/v exclusion contract.
The collector's _SessionAggregator now resolves the asciinema shard
via find_shard_with_sid and stamps it onto every emitted
attacker.session.ended payload as `shard_path`. None when the shard
isn't on disk yet (collector race with sessrec flush) — consumers
treat that as "skip until next tick".
Additive field; existing TTP worker consumes the same topic and
ignores unknown keys, so no payload-version bump needed. Two new
tests pin the shard-found and shard-missing cases.
Unblocks BEHAVE-INTEGRATION Phase 4: the profiler worker reads
shard_path directly from the payload instead of disk-reaching.
Move `_find_shard_with_sid`, `_resolve_shard`, `_validate_names`,
`_get_index`, and the index cache from
`decnet/web/router/transcripts/api_get_transcript.py` into
`decnet/artifacts/shards.py`. The shared module speaks
`ValueError`; the router keeps thin wrappers that translate to
`HTTPException(400)` so the route's error UX is unchanged.
This unblocks the BEHAVE-INTEGRATION Phase 4 worker wiring — the
profiler worker (and the collector's session aggregator) need to
disk-reach asciinema shards but must not import from a FastAPI
router.
11 new unit tests for the shared helper. Existing transcript router
tests pass (the shard fixture's monkeypatch points at the shared
module's ARTIFACTS_ROOT now).
decnet.profiler.behave_shell.__version__ = '0.1.0-pre'.
The -pre suffix is honest: the extractor is feature-complete (37/37
Tier-A primitives emit, calibration grid honest), but the engine
package — worker wiring, observations writes, AttackerDetail panel —
still rides BEHAVE-INTEGRATION.md Phase 4. The actual 0.1.0 tag
lands when Phase 4 lands.
The marker version-tracks the engine, not the spec library
(decnet-behave-shell already at 0.1.0); they version independently.
Run the five-class calibration grid (HUMAN / YOU-sim / LW-sim /
CLAUDE-FF / CLAUDE-CL) against the 2026-05-02 shards.
* Hard gate green for 27 primitives across all 5 shards.
* environmental.keyboard_layout moved from hard gate to
PHASE_F_CONDITIONAL_PRIMITIVES — short SSH-recon corpus maxes at
~90 typed letters per session, well below the LAYOUT_MIN_TYPED_LETTERS
(200) floor. The 200-floor stays per the per-phase "v0 ships when
honest" rule; longer-text corpora will surface the layout signal.
* Three primitives never fire on the 2026-05-02 corpus, all already
conditional and all expected:
- cognitive.error_resilience.frustration_typing
- environmental.locale
- environmental.keyboard_layout
No D / F / G threshold re-tunes needed; only the keyboard_layout
binding-set move. Phase H step log appended to BEHAVE-EXTRACTOR.md
with per-class observation counts.
Static assertion that every Tier-A primitive in PRIMITIVE_REGISTRY
has a slot in the calibration grid (hard gate or conditional set).
Excludes Tier B (8 cross-session primitives) and Tier C (toolchain.*)
by explicit allow-list and prefix filter.
Three checks:
* every Tier-A primitive is covered (forward direction)
* no extractor set drifts from the registry (reverse, catches typos)
* Tier-A count == 37 (design doc invariant)
CI now fails before a registry addition ships without a feature
function.
Widen calibration binding from PHASE_ABCDEF_PRIMITIVES (25) to
PHASE_ABCDEFG_PRIMITIVES (28 hard). Three Phase G primitives that
emit on any session-with-commands ride the hard gate:
* operational.opsec_discipline
* operational.cleanup_behavior
* emotional_valence.stress_response
The remaining five Phase G primitives ride a new
PHASE_G_CONDITIONAL_PRIMITIVES because their sample-size floors make
them legitimately absent from short shards:
* operational.objective (≥ 3 classified commands)
* operational.multi_actor_indicators (≥ 8 commands)
* emotional_valence.arousal (typing bursts)
* emotional_valence.valence (≥ 80 typed letters)
* emotional_valence.frustration_venting (≥ 30 typed letters)
Backwards-compat alias PHASE_ABCDEF_PRIMITIVES kept. Phase G
completion log + checkbox flips in BEHAVE-EXTRACTOR.md.
Tier-A corpus delta: all 37 Tier-A primitives now emit. Phase H
(full-corpus lockdown + v0 release) is next.
Compare median post-error intra-command IATs against baseline
(commands not immediately following an errored command):
* ratio ≥ STRESS_EUSTRESS_RATIO_MIN (1.20) → eustress_positive
* ratio ≤ 1/STRESS_DISTRESS_RATIO_MIN → distress_negative
* otherwise → none
Confidence hard-capped at 0.5; 0.30 below
STRESS_MIN_ERRORED_WITH_IATS (2).
high_agitated when any of:
* caps_run_max ≥ 5
* bang_run_max ≥ 3
* fastest typing burst median IAT < 0.06s with ≥ 30 IATs total
low_calm when slowest qualifying burst median IAT > 0.30s with ≥ 30
IATs. Else medium_engaged. Confidence hard-capped at 0.5; 0.30 below
AROUSAL_MIN_IATS.
Compare median intra-command IATs of the two temporal halves of the
session. ≥ MULTI_ACTOR_HALF_MIN_COMMANDS (4) per half required;
relative delta > MULTI_ACTOR_HANDOFF_DELTA (0.5) → handoff_detected.
team_coordinated is Tier B (cross-session); never emitted from a
single session. Confidence 0.55 with both halves ≥ 8 commands; 0.40
otherwise.
* careful — operator hits OPSEC_HISTORY_TOKENS AND tail-K commands
include _CLEANUP_TOKEN_HASHES (re-imported from temporal.py).
* learning — history hit without cleanup-tail follow-through.
* careless — no history-clearing vocabulary at all.
Confidence 0.45 (small lexicon, soft); 0.30 below
MIN_COMMANDS_FOR_FULL_CONFIDENCE.
Phase G shared infrastructure (no primitive yet emitted):
* New `_intent.py` — five precomputed first-token-hash sets (recon /
exfil / persistence / lateral / destructive) with documented
precedence, plus opsec-history and three lexeme sets (positive /
negative / obscenity) for the typed-text counter pass. Stop words
that collide with registry value vocabulary (`no`, `hell`, `ok`)
are deliberately excluded — the PII regression test catches such
collisions.
* `_typed_char_histograms()` extended with five integer counters
populated in the same single-pass walk: `obscenity_hits`,
`positive_lex_hits`, `negative_lex_hits`, `caps_run_max`,
`bang_run_max`. Longest-suffix match against bounded lexicon
(`LEXEME_MAX_LEN`); paste-class events excluded.
* `SessionContext` widened by the same five fields. Drives G.5
(valence), G.6 (arousal), G.8 (frustration_venting) without retaining
raw operator text.
* Bump twisted >= 26.4.0rc2 to clear CVE-2026-42304 (pre-existing,
caught by pre-commit pip-audit). Adjust ftp template type-ignore
code from attr-defined to misc to match the new Twisted typing.
PII discipline: same shape as F.4 — fixed-vocabulary integer counters
on ctx, never on observations.
Widens the binding calibration set from PHASE_ABCDE_PRIMITIVES (20)
to PHASE_ABCDEF_PRIMITIVES (25). The five new entries:
* environmental.shell_type (per-shard hard gate)
* environmental.terminal_multiplexer (per-shard hard gate)
* environmental.keyboard_layout (per-shard hard gate; PII boundary
lifted by ANTI; emits all 4 registry values)
* environmental.numpad_usage (per-shard hard gate)
* temporal.lifecycle_markers.exit_behavior (resolution of the E.4
hold; uses Command.followed_by_prompt from F.0)
environmental.locale joins a new PHASE_F_CONDITIONAL_PRIMITIVES set
(only fires on shards with an env / locale dump in the output).
Phase F completion log appended to BEHAVE-EXTRACTOR.md. The original
F.0 row hinted at D.0 subsumption; reversed in the log — D.0 is
enriched, not subsumed (regex catches errors when PS1 is suppressed).
Tier-A corpus delta: 25 of 37 primitives now emit. Phase G is next.
Resolves the E.4 hold from Phase E. F.0's Command.followed_by_prompt
gives us the exit-code proxy (prompt-after-last-command) we couldn't
get in Phase E.
Logic: last command without trailing prompt → abrupt; first_token_hash
in {exit, logout, quit, logoff} → graceful; any of the last K=3
commands' first_token_hash in {history, unset, rm, shred, clear, kill}
→ cleanup; else → graceful (clean Ctrl-D / window close).
Sliding-window scan over single-char digit input events. A run of
NUMPAD_RUN_MIN (4) consecutive digit events whose pairwise IATs are
all ≤ NUMPAD_FAST_IAT_S (50ms) → detected. Otherwise → not_detected.
Skips below NUMPAD_MIN_TYPED_CHARS (50) typed chars. Confidence cap
0.50 per the registry's weak-signal flag.
ANTI authorised dropping the PII boundary for this primitive. ctx
gains typed_unigram_counts / typed_bigram_counts / typed_letter_count
populated during the existing single-pass input walk (paste-class
events excluded).
Two-axis classifier:
* layout-artefact unigrams take priority — q rate above floor with
low English saturation → azerty; z above floor with y below → qwertz
* fallback to English-bigram saturation: ≥ floor → qwerty, else other
Sample-size floor 200 typed letters; bigram histogram capped at
top-64 to bound memory. Confidence cap stays moderate (0.40-0.55) —
heuristic discriminator.
Searches ANSI-stripped output for LANG / LC_ALL / LC_CTYPE envvar
substrings emitted by env / locale / printenv. Highest-priority key
wins (LC_ALL > LANG > LC_CTYPE); POSIX value normalised to BCP-47:
en_US.UTF-8 → en-US, pt_BR.UTF-8 → pt-BR, C/POSIX → und. Free-string
registry value emitted directly.
PII discipline: only the parsed locale value enters observations;
surrounding output is read once for matching and dropped.
Scans RAW output (multiplexer escapes are themselves ANSI; never
strip first) for tmux markers (DCS passthrough, focus-reporting,
window-title with tmux marker) and screen markers (DCS, screen-OSC).
Detected → tmux/screen at 0.85; otherwise → none at 0.55. Skips
emission entirely when no commands — silence on a pure-echo or
empty session, per the smoke gates.
When both detected (nested mux), prefer tmux.
Adds PromptLine dataclass + extract_prompt_lines() helper. PromptLine
carries ts, suffix_char ($/#/%/>), raw_line (ANSI-stripped, capped),
is_root flag. Populated during the existing single-pass output-window
walk; SessionContext gains prompt_lines, Command gains
followed_by_prompt.
PII trade-off (ANTI-authorised at Phase F): PS1 text retained on ctx
so F.1 / F.3 / E.4 can read it. Capped at PROMPT_LINE_MAX_CHARS=256.
Observations still only carry derived primitive values.
D.0's regex error helpers stay alongside (NOT subsumed) — they fire
even when PS1 echo is suppressed. F.0 enriches D.0 rather than
replacing it.
F.0's row in BEHAVE-EXTRACTOR.md was forward-only — readers landing
on Phase F couldn't tell that F.0 also has a backlog (E.4 held, D.0
subsumption). Add a 'Carry-overs F.0 must unblock' section to the
Phase F prelude and a back-reference on the F.0 checkbox in the
implementation order checklist.
Widens the binding calibration set from PHASE_ABCD_PRIMITIVES (17) to
PHASE_ABCDE_PRIMITIVES (20). The three shipped Phase E primitives
(session_duration, escalation_pattern, landing_ritual) join the
per-shard hard gate.
E.4 (temporal.lifecycle_markers.exit_behavior) is held at ANTI's
direction pending Phase F.0's prompt parser — abrupt-vs-cleanup
needs exit-code visibility to be honest, and first-token membership
alone over-fires on benign rm / clear mid-session. E.4 picks up at
the tail of Phase F.
Phase E completion log appended to BEHAVE-EXTRACTOR.md; E.1-E.3
checkboxes flipped, E.4 left unchecked with a held note.
Inspect the first N commands; if at least K of their first_token_hashes
match the recon-survey vocabulary (uname/id/whoami/pwd/hostname/w/who),
emit present, else absent. Hashes precomputed at module load; PII-safe.
v0.1 N=5, K=2.
Bin commands into non-overlapping windows of width
max(ESCALATION_WINDOW_MIN_S, duration_s / ESCALATION_WINDOW_TARGET).
CV of per-window counts + zero-window fraction classify bursty /
sustained / erratic. v0.1; corpus re-tune deferred.
Bucket ctx.duration_s against SESSION_DURATION_SHORT_MAX (60s) /
MEDIUM_MAX (600s) / LONG_MAX (3600s); else marathon. Direct
measurement, confidence 0.85. Skip emission only when no commands
and zero duration. New _features/temporal.py module opens Phase E.
Widens the binding calibration set from PHASE_ABC_PRIMITIVES (13) to
PHASE_ABCD_PRIMITIVES (17). The four unconditional Phase D primitives
(cognitive_load, exploration_style, planning_depth, tool_vocabulary)
join the per-shard hard gate. The three error_resilience.* primitives
are conditional on at least one errored command in the shard and
tracked in PHASE_D_CONDITIONAL_PRIMITIVES — excluded from the
per-shard required-emission set, included in the cross-class
discrimination check.
cognitive_load empirical re-tune deferred to the next
BEHAVE_CALIBRATION_DIR run; v0.1 thresholds ship.
Phase D completion log appended to BEHAVE-EXTRACTOR.md; Phase D
checkboxes flipped to [x].
For each errored command, check whether the next command's
first_token_hash is in {man, help, info} (precomputed at module
load). At least one match → present, else absent. The --help / -h
flag forms aren't first tokens; v0.2 will reconsider once arg-token
hashing is justified by corpus.
Compares median within-command IAT for commands following an errored
command vs commands following a successful one. Relative absolute delta
buckets to low / moderate / high. Skips when either group is empty
(no errors, or no clean baseline). v0.1; D.8 re-tunes.