diff --git a/development/ATTRIBUTION-ENGINE.md b/development/ATTRIBUTION-ENGINE.md new file mode 100644 index 00000000..4902d51c --- /dev/null +++ b/development/ATTRIBUTION-ENGINE.md @@ -0,0 +1,572 @@ +# Attribution Engine — Design + +**Status:** pre-implementation. This doc is the spec; code follows. +**Tracks:** DEBT-051 (cross-session BEHAVE primitive aggregation — +named in `BEHAVE-INTEGRATION.md`). +**Depends on:** `IDENTITY_RESOLUTION.md` (substrate shipped — table, +FK, lifecycle topics), `BEHAVE-INTEGRATION.md` (observation +producer), `DEBT-032` (fingerprint rotation, shipped). +**Engine home:** this repo, `decnet/correlation/attribution/` +(sublibrary inside the existing correlation worker — no new daemon). + +## Premise + +DECNET has three layers stacked above raw events. After +`BEHAVE-INTEGRATION.md` ships, we have: + +| Layer | What it stores | What it knows | +|---|---|---| +| **Observation** | `observations` table, one row per (sid, primitive) | "I saw value V for primitive P, sourced from session S, at time T, with confidence C." | +| **Attacker** | `attackers` table, one row per source IP | "These observations all came from IP X." | +| **Identity** | `attacker_identities` table (empty today — `IDENTITY_RESOLUTION.md`) | "These N attacker rows are the same hands." | + +BEHAVE *emits*. Attackers are *observed*. The attribution engine is +the layer that **concludes** — it links observations into identities +and surfaces a per-identity primitive map with explicit merge +semantics. This doc specifies it. + +## The bright line — lifted from BEHAVE, binding here + +The BEHAVE envelope module docstring +(`core/decnet_behave_core/spec/envelope.py:20-26`) draws an explicit +bright line: + +> Explicitly NOT for: identity attribution to named natural persons; +> access or admission decisions; biometric login; ML-driven user +> identification. Those framings push into legal/ethics territory the +> project will not walk into by accident. 
+ +That binding statement carries forward. The attribution engine: + +- **Links observations to opaque identity UUIDs**, never to named + persons. +- **Emits probabilistic linkage**, never certainty. +- **Does not gate access** to anything — it's an analytics surface. +- **Does not output classifier verdicts** about "good" vs "bad" + operators; it surfaces *behavioural coherence* (these observations + cluster) and *behavioural drift* (this identity's primitives are + changing), and stops there. + +Crossing this line is grounds for ripping the engine out and +starting over. + +## What the engine IS, what it IS NOT + +| IS | IS NOT | +|---|---| +| A clusterer + state machine over BEHAVE observations | A keystroke-dynamics extractor (that's the engine in `BEHAVE-EXTRACTOR.md`) | +| The thing that writes `attacker_identities` rows | The thing that decides whether to block/alert/page on an attacker | +| The producer of `attribution.profile.*` events | The producer of `attacker.observation.*` events | +| Honest about uncertainty (every claim carries a confidence) | A binary classifier with an arbitrary threshold | +| Replayable / deterministic given the same observation sequence | A black-box ML model | + +## Architectural placement + +``` +/home/anti/Tools/DECNET/ +├── decnet/correlation/ EXISTING worker — gains a sublibrary + a new trigger +│ ├── worker.py gains attacker.observation.* subscription +│ ├── fingerprint_rotation.py UNCHANGED — already shipped (DEBT-032) +│ └── attribution/ NEW — pure attribution library +│ ├── __init__.py exposes link_observation(), aggregate_identity() +│ ├── linkage.py "which identity does this observation belong to?" 
+│ ├── aggregate.py per-(identity, primitive) merge state machine +│ ├── _signals/ per-signal scorers (jarm, hassh, kd, c2, ip) +│ └── _thresholds.py named constants, calibration-cited +└── decnet/web/db/models/ + ├── attacker_identities.py EXISTING (IDENTITY_RESOLUTION.md substrate) + └── attribution_state.py NEW — per-(identity, primitive) state rows +``` + +**No new worker.** The existing `decnet-correlation.service` +supervises this codepath. The correlation worker already owns +cross-attacker reasoning (DEBT-032 fingerprint rotation lives there). +Attribution is a natural peer. + +**Audit finding (correlation vs profiler).** Profiler emits +observations per-session (BEHAVE-SHELL extraction). Correlation +consumes observations across sessions and decides identity. Two +roles, two workers, clean cut. **Don't mix them.** + +## Two responsibilities, kept separate + +The engine has **two axes of work**, often confused: + +### Axis 1 — Linkage + +> "This new observation arrived. Which identity does it belong to?" + +Inputs: one observation (just arrived) + the existing identity table. +Output: one of {`assign-to-existing(uuid)`, `create-new()`, +`defer(reason)`}. + +Lives in `attribution/linkage.py`. Reads +`attacker.observation.*` events; writes `attacker_identities` rows +and `attackers.identity_id` FK; emits `identity.formed` / +`identity.observation.linked` (existing topics from +`IDENTITY_RESOLUTION.md`). + +### Axis 2 — Aggregation + +> "Given an identity's full observation history, what's the +> per-primitive summary I should surface to AttackerDetail / +> IdentityDetail?" + +Inputs: all observations linked to one identity. Output: a +per-primitive state map: `{primitive: (current_value, state, confidence, dispersion)}` +where `state ∈ {stable, drifting, conflicted, multi_actor, unknown}`. + +Lives in `attribution/aggregate.py`. Pure function — given the same +observation set, returns the same state map (replayability is +non-negotiable). 
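The two output shapes described above (the three-way linkage decision and the per-primitive state map) can be sketched as plain data types. This is a minimal illustration, not the shipped API: the class names `State`, `LinkageDecision`, `PrimitiveState` and the alias `StateMap` are hypothetical, the `[0.0, 1.0]` confidence range is assumed from the BEHAVE convention, and the doc does not define the units of `dispersion`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional
from uuid import UUID, uuid4


class State(str, Enum):
    """The five-state vocabulary named in this doc."""
    UNKNOWN = "unknown"
    STABLE = "stable"
    DRIFTING = "drifting"
    CONFLICTED = "conflicted"
    MULTI_ACTOR = "multi_actor"


@dataclass(frozen=True)
class LinkageDecision:
    """Axis 1 output: exactly one of assign-to-existing / create-new / defer."""
    kind: str
    identity_uuid: Optional[UUID] = None
    reason: Optional[str] = None

    @classmethod
    def assign_to_existing(cls, identity_uuid: UUID) -> "LinkageDecision":
        return cls("assign-to-existing", identity_uuid=identity_uuid)

    @classmethod
    def create_new(cls) -> "LinkageDecision":
        return cls("create-new")

    @classmethod
    def defer(cls, reason: str) -> "LinkageDecision":
        return cls("defer", reason=reason)


@dataclass(frozen=True)
class PrimitiveState:
    """Axis 2 output for one primitive."""
    current_value: Any
    state: State
    confidence: float   # assumed to live in [0.0, 1.0]
    dispersion: float   # units not specified by the doc; placeholder

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")


# {primitive: PrimitiveState}, mirroring the state map described above.
StateMap = Dict[str, PrimitiveState]

example_map: StateMap = {
    "motor.input_modality": PrimitiveState("pasted", State.STABLE, 0.91, 0.0),
}
example_decision = LinkageDecision.assign_to_existing(uuid4())
```

Frozen dataclasses fit the replayability requirement: an aggregation result cannot be mutated after it is computed, so two runs over the same observation set can be compared by equality.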
+ +**These two axes are separable.** v0 ships **aggregation only** (over +single-`attacker_uuid` proto-identities), solves DEBT-051. v1 adds +linkage (real clustering across attacker_uuids). v2 adds federation. +This ordering is deliberate — aggregation has narrower failure modes +and doesn't require the linkage signals to be calibrated yet. + +## v0 / v1 / v2 ladder + +### v0 — Aggregation over per-attacker proto-identities + +The substrate of `IDENTITY_RESOLUTION.md` ships empty: every +`attackers` row has `identity_id = NULL`. No clusterer means no +identity rows. v0 sidesteps this honestly: **treat each +`attacker_uuid` as its own proto-identity** and aggregate +observations over it. + +What v0 delivers: +- Per-(attacker_uuid, primitive) merge state machine. +- New `attribution_state` table holding the derived state. +- New `attribution.profile.*` bus topics emitting state transitions. +- AttackerDetail's "current state" panel gains state badges + (`stable / drifting / conflicted`) replacing today's naïve + latest-wins surface from `BEHAVE-INTEGRATION.md` Q3. + +What v0 does NOT do: +- No clustering across IPs. +- No identity rows ever populated. +- `IdentityDetail.tsx` (already built per `IDENTITY_RESOLUTION.md`) + stays unreached — there are no identities yet. + +**v0 closes DEBT-051.** That's the explicit scope. + +### v1 — Linkage (real clustering) + +What changes: +- Clusterer subscribes to high-confidence rotation-resistant signals + (HASSH, payload simhashes, keystroke-dynamics simhash, + C2 callbacks) and groups `attacker_uuid`s under + `attacker_identities.uuid`. +- v0's aggregation engine retargets from `attacker_uuid` to + `identity_uuid` once a cluster forms. +- `identity.formed` / `identity.observation.linked` / + `identity.merged` (existing topics) start firing. +- IdentityDetail.tsx starts seeing rows. + +What v1 does NOT do: +- No federation. Cluster decisions are master-local. 
+- No retroactive observation re-linking once an identity is committed + (that's a v1.5 problem, "stable" identities should be hard to + un-link silently). + +### v2 — Federation gossip + +What changes: +- Identities + their primitive-state maps gossip over the existing + swarm mTLS infra to peer masters. +- `schema_version` field on `attacker_identities` + (`IDENTITY_RESOLUTION.md` Risk #3) becomes load-bearing. +- Trust model is **social**, not cryptographic + (memory rule: federation trust is invite-based/human). + +Out of scope for this doc beyond noting it exists. Federation gets +its own design pass. + +--- + +## v0 design — Aggregation state machine + +The whole reason DEBT-051 was filed. This is the load-bearing piece. + +### State definitions + +For each `(attacker_uuid, primitive)` pair, the engine maintains a +state from this set: + +| State | Meaning | When to assert | +|---|---|---| +| `unknown` | Insufficient observations to classify | Default; < 3 observations OR all-`unknown` values | +| `stable` | Recent observations agree | Last N observations all share the same value | +| `drifting` | Recent observations disagree with older | Recent N != older N, but recent N is internally consistent | +| `conflicted` | Recent observations disagree with each other | Recent N is split (no majority) | +| `multi_actor` | Strong signal that two operators share access | Conflicted + alternation pattern (operator A → B → A → B), not random flip | + +### Per-primitive merge logic + +The engine carries a per-`ValueKind` merge function. Categorical +primitives dominate the calibration grid; numeric and hash primitives +need different math: + +#### Categorical (`motor.input_modality`, `cognitive.feedback_loop_engagement`, etc.) + +Last-N window comparison. 
With `N = 5` (configurable in +`_thresholds.py`): + +``` +recent_5 = observations[-5:] +older_5 = observations[-10:-5] # if available + +if all(o.value == recent_5[0].value for o in recent_5): + if older_5 and all(o.value == older_5[0].value for o in older_5): + if recent_5[0].value != older_5[0].value: + state = drifting + else: + state = stable + else: + state = stable # consistent with no older comparison +elif majority_value(recent_5): + state = stable # tolerant — one outlier in five is fine +else: + state = conflicted +``` + +`multi_actor` triggers on conflicted + temporal alternation +(operator A and B observations interleave on a session-level granularity, +not just within one session). Lower-confidence detection; +v0 emits at confidence ≤ 0.6 by design. + +#### Numeric (`toolchain.c2.beacon_interval_ms`, etc.) + +EWMA + dispersion. State = `stable` if dispersion < 20% of mean, +`drifting` if mean shifts > 30% over recent window, `conflicted` +if dispersion > 100%. + +#### Hash (`toolchain.tls.jarm_server`, `toolchain.ssh.hassh_client`) + +Already handled by DEBT-032 fingerprint rotation. Attribution engine +*reads* `attacker.fingerprint_rotated` events, doesn't recompute. +State = `stable` if no rotation, `drifting` if 1-2 rotations, +`conflicted` if > 2 rotations in a tight window. + +### Storage — the `attribution_state` table + +Materialised view of the state machine. Re-derivable from +`observations` + DEBT-032's rotation log; this table is a cache for +cheap reads, not a source of truth. 
+ +```python +# decnet/web/db/models/attribution_state.py + +class AttributionStateRow(SQLModel, table=True): + __tablename__ = "attribution_state" + + # ── key ──────────────────────────────────────────────── + attacker_uuid: UUID = Field(foreign_key="attackers.uuid", primary_key=True) + primitive: str = Field(primary_key=True) + + # ── derived state ────────────────────────────────────── + current_value: dict[str, Any] | str | int | float | bool | list = \ + Field(sa_column=Column(JSON, nullable=False)) + state: str # 'stable' | 'drifting' | 'conflicted' | 'multi_actor' | 'unknown' + confidence: float # engine's confidence in the state assertion (not in any verdict) + observation_count: int # how many observations underlie this state + last_change_ts: float # when state last flipped + last_observation_ts: float # most recent observation that fed this row + + # ── audit ────────────────────────────────────────────── + schema_version: int = 1 # for federation, mirrors AttackerIdentity convention + updated_at: float + + __table_args__ = ( + Index("ix_attribution_state_state", "state"), + Index("ix_attribution_state_last_change", "last_change_ts"), + ) +``` + +`(attacker_uuid, primitive)` is the composite PK — at most one state +row per pair. v1 will rename `attacker_uuid` to a polymorphic +`subject_uuid` keyed on either attackers or identities (deferred — +don't pre-build the polymorphism before clustering ships). + +### Bus topics + +New, distinct from `IDENTITY_RESOLUTION.md`'s `identity.*` lifecycle +topics: + +| Topic | Payload | When | +|---|---|---| +| `attribution.profile.state_changed` | `{attacker_uuid, primitive, old_state, new_state, current_value, confidence, ts}` | State transitions (e.g. 
`stable` → `drifting`) | +| `attribution.profile.multi_actor_suspected` | `{attacker_uuid, primitives: [], evidence_summary, confidence, ts}` | When ≥ 2 primitives independently signal `multi_actor`; correlation is the trigger, not any single primitive | + +`identity.*` topics from `IDENTITY_RESOLUTION.md` stay reserved for +v1 (clusterer-emitted lifecycle events). v0 doesn't touch them. + +**Wiki:** `Service-Bus.md` documents these in the same commit that +adds the constants (`feedback_wiki_bus_signals`). + +### API surface + +``` +GET /api/v1/attackers/{uuid}/attribution + → { + "primitives": [ + { + "primitive": "motor.input_modality", + "current_value": "pasted", + "state": "stable", + "confidence": 0.91, + "observation_count": 7, + "last_change_ts": 1714521660.456 + }, + ... + ] + } +``` + +AttackerDetail.tsx merges this with the latest-per-primitive query +from `BEHAVE-INTEGRATION.md`. The state badge is the new bit. + +The SSE route from `BEHAVE-INTEGRATION.md` +(`GET /api/v1/attackers/{uuid}/events`) gains forwarded +`attribution.profile.state_changed` events so the badge updates live. + +--- + +## Linkage signals (v1 — not v0) + +For when v0 is stable and we promote attacker_uuid → identity_uuid. +Documented here so v0 doesn't paint into a corner. + +### Signal weights + +Each signal contributes to a linkage score. Two `attacker_uuid`s +with combined score above the threshold get clustered. 
+ +| Signal | Strength | Why | Cost | +|---|---|---|---| +| Same `kd_digraph_simhash` (Hamming distance < 8) | **STRONG** | Keystroke rhythm is hard to fake without effort | Computed at session-end by BEHAVE engine | +| Same C2 callback endpoint | **STRONG** | Operator infra is sticky | Already extracted | +| Same `hassh_client` | MEDIUM | Tools change less than IPs | Already in `attacker_behavior` | +| Same `jarm_server` (if attacker exposes services) | MEDIUM | Probed-attacker substrate (DEBT-032) | Already shipped | +| Same `tcp_fingerprint` cluster | WEAK | OS info, easily collided | Already in `attacker_behavior` | +| Same source IP | **REJECT** | Triggers naïvely on NAT collisions; never use IP alone | n/a | + +### Threshold + +Single combined score, calibrated against: +- **False merges**: two distinct attackers collapsed into one (silent + miscount). HARD failure — engine refuses to merge below ~0.85. +- **Missed merges**: two observations from the same operator + treated as unrelated. Soft failure — operator can review unmerged + candidates in IdentityDetail's "candidate links" panel and merge + manually. + +The threshold lives in `_thresholds.py` like the BEHAVE-SHELL +engine's; calibration cycle ships with the linkage code. + +### Soft-merge audit trail + +`attacker_identities.merged_into_uuid` already exists from +`IDENTITY_RESOLUTION.md`. v1 uses it. When the clusterer reverses an +earlier merge (rare but real), the loser row's `merged_into_uuid` is +NULLed and an `attribution.profile.split_proposed` event surfaces in +the operator's review queue. + +--- + +## Phase plan + +Per the "commit per task" + "tests per task" memory rules. Each +phase is one commit. + +### Phase 1 — Schema + topics + empty handler + +- New `attribution_state` SQLModel + migration (none needed pre-v1, + per the memory rule — just edit the model). +- `decnet/bus/topics.py` registers `attribution.profile.*` prefix. 
+- `decnet/correlation/worker.py` gains an + `attacker.observation.*` subscription handler that does + **nothing yet** — just logs. Proves the wiring. +- Wiki `Service-Bus.md` update co-commits. +- Tests: SQLModel CRUD on `attribution_state`, bus subscription + handler is exercised by FakeBus. + +Commit: `feat(correlation/attribution): substrate + idle handler`. + +### Phase 2 — Categorical merge function + +- `attribution/aggregate.py:_aggregate_categorical(observations) → (value, state, confidence)`. +- Implements the last-N comparison logic above. +- Pure function. Synthetic-input tests covering each state transition + (unknown → stable → drifting → stable, conflicted, multi_actor). +- No DB, no bus, no I/O. + +Commit: `feat(correlation/attribution): categorical merge state machine`. + +### Phase 3 — Hash + numeric merge functions + +- `_aggregate_hash` reads `attacker.fingerprint_rotated` events + (DEBT-032 already produces them). +- `_aggregate_numeric` does EWMA + dispersion. +- Per-`ValueKind` dispatcher in `aggregate.py` picks the right + function. +- Tests for each value-kind path. + +Commit: `feat(correlation/attribution): hash + numeric merge functions`. + +### Phase 4 — Wire into the worker + +- Subscription handler reads each `attacker.observation.*` event, + loads the prior `AttributionStateRow` (if any), runs the merger, + upserts the new state, emits `attribution.profile.state_changed` + on transition. +- Trigger isolation: handler exceptions are logged and do not affect + fingerprint-rotation or any other correlator path. +- Tests: end-to-end with FakeBus + in-memory DB, observation-in → + state-row-out + transition-event-out. + +Commit: `feat(correlation/attribution): wire bus handler, persist state`. 
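Phases 2 and 4 together can be sketched in a few dozen lines: a pure categorical merger plus a handler that persists state and emits only on transition. This is an illustrative sketch under stated assumptions, not the planned implementation. `InMemoryAttribution` and its dict-backed storage are hypothetical stand-ins for the worker, the DB and the bus; the confidence value (share of the recent window agreeing with the current value) is a placeholder heuristic the doc does not specify; `multi_actor` detection is deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass, field

WINDOW_N = 5          # last-N window from the categorical section
MIN_OBSERVATIONS = 3  # below this the state stays "unknown"


def aggregate_categorical(values):
    """Pure last-N merge. Returns (current_value, state, confidence)."""
    if len(values) < MIN_OBSERVATIONS or all(v == "unknown" for v in values):
        return (values[-1] if values else None, "unknown", 0.0)

    recent = values[-WINDOW_N:]
    older = values[-2 * WINDOW_N:-WINDOW_N]   # empty when history is short
    top_value, top_count = Counter(recent).most_common(1)[0]
    confidence = top_count / len(recent)      # placeholder heuristic (assumption)

    if top_count == len(recent):
        # Recent window is unanimous; compare against a unanimous older window.
        older_uniform = bool(older) and all(v == older[0] for v in older)
        if older_uniform and older[0] != top_value:
            return (top_value, "drifting", confidence)
        return (top_value, "stable", confidence)
    if top_count > len(recent) / 2:
        # Tolerant branch: a clear majority still counts as stable.
        return (top_value, "stable", confidence)
    return (top_value, "conflicted", confidence)


@dataclass
class InMemoryAttribution:
    """Hypothetical stand-in for the worker handler, DB rows and bus."""
    histories: dict = field(default_factory=dict)  # (uuid, primitive) -> values
    states: dict = field(default_factory=dict)     # (uuid, primitive) -> state
    emitted: list = field(default_factory=list)    # (topic, payload) tuples

    def on_observation(self, attacker_uuid, primitive, value, ts):
        key = (attacker_uuid, primitive)
        history = self.histories.setdefault(key, [])
        history.append(value)

        current_value, new_state, confidence = aggregate_categorical(history)
        old_state = self.states.get(key, "unknown")
        self.states[key] = new_state               # the "upsert"

        if new_state != old_state:                 # emit on transition only
            self.emitted.append(("attribution.profile.state_changed", {
                "attacker_uuid": attacker_uuid,
                "primitive": primitive,
                "old_state": old_state,
                "new_state": new_state,
                "current_value": current_value,
                "confidence": confidence,
                "ts": ts,
            }))


engine = InMemoryAttribution()
for i, v in enumerate(["typed"] * 5 + ["pasted"] * 5):
    engine.on_observation("attacker-1", "motor.input_modality", v, ts=float(i))
transitions = [(p["old_state"], p["new_state"]) for _, p in engine.emitted]
```

Because the merger takes the full value history and touches no I/O, replaying the same ten observations always yields the same two transitions. A real handler would also wrap the call in exception logging to satisfy the trigger-isolation bullet.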
+ +### Phase 5 — `multi_actor_suspected` cross-primitive correlator + +- Periodic tick (every 60s default — configurable) walks + `attribution_state` rows where `state = 'multi_actor'`, groups by + `attacker_uuid`, fires + `attribution.profile.multi_actor_suspected` if ≥ 2 primitives flag + the same attacker_uuid concurrently. +- Tests: synthetic state rows, assert event fires only on co-flag. + +Commit: `feat(correlation/attribution): cross-primitive multi-actor detection`. + +### Phase 6 — API surface + +- `GET /api/v1/attackers/{uuid}/attribution` route + Pydantic model. +- AttackerDetail.tsx renders state badges per primitive in the + Behavioural Primitives panel. +- SSE route forwarding `attribution.profile.state_changed` events + filtered by attacker_uuid. +- Frontend Vitest coverage. + +Commit: `feat(web): expose attribution state on AttackerDetail`. + +### Phase 7 — v0 lockdown + +- Synthetic calibration scenarios (extending the BEHAVE-SHELL + calibration grid concept): + - "Stable HUMAN over 7 sessions" → all primitives `stable` + - "HUMAN switches to LLM mid-week" → primitives flip + `stable` → `drifting` + - "Two operators alternating on shared creds" → ≥ 2 primitives + flag `multi_actor` + - "Single short session" → all primitives `unknown` +- All four scenarios green in CI. + +Commit: `test(correlation/attribution): v0 calibration lockdown`. + +--- + +## Out of scope + +Filed for future paydown when they bite. Do not let them creep into +v0. + +- **Linkage / clustering across attacker_uuids.** That's v1. +- **Federation gossip of identities.** That's v2. +- **Identity-level intel** (`attacker_identity_intel` from + `IDENTITY_RESOLUTION.md`). Different lifecycle, ships with v1. +- **Manual operator merge UI.** Operators can't fix clusterer + mistakes from the dashboard — the read-only API stays read-only + in v0. Editable identity rows are a v1 concern. +- **Retroactive re-aggregation** when thresholds change. 
v0 + recomputes lazily on next observation per attacker; no batch + re-walk. +- **Confidence calibration against ground truth.** No ground-truth + data exists yet. v0 confidence values are heuristic; calibration + ships when red-team exercises produce labelled trace data. +- **Persona-classification** (e.g. "this identity behaves like a + bot"). The bright line forbids this. State machine emits + *coherence* and *drift*, not classifier labels. + +## Resolved decisions + +- **Where the engine lives.** RESOLVED: + `decnet/correlation/attribution/`, sublibrary inside the existing + correlation worker. No new daemon. Symmetric with BEHAVE-SHELL's + placement under `decnet/profiler/behave_shell/`. +- **Linkage vs aggregation separation.** RESOLVED: two axes, two + modules (`linkage.py` / `aggregate.py`). v0 ships aggregation + only. +- **Topic namespace.** RESOLVED: `attribution.profile.*` for + derived state, distinct from `IDENTITY_RESOLUTION.md`'s + `identity.*` lifecycle topics. The two namespaces compose; they + don't overlap. +- **State machine vocabulary.** RESOLVED: + `unknown / stable / drifting / conflicted / multi_actor`. + Five states, no more (resist the urge to grow the enum). +- **Subject of attribution in v0.** RESOLVED: `attacker_uuid`, + not `identity_uuid`. v1 widens. + +## Real open questions + +These are not stoppers for v0 but need answers before the engine +ships beyond v0. + +1. **`multi_actor` false-positive cost.** A flapping primitive can + look like multi-actor when it's really an operator on a flaky + network or split between phone/laptop. v0's confidence ≤ 0.6 cap + helps but doesn't eliminate it. Open: what's the operator-facing + UX for a `multi_actor` claim that's wrong? +2. **Window size `N`.** v0 hardcodes `N=5` for last-N comparison. + This is calibrated against typical session counts (most attackers + are observed < 10 times before they go quiet). 
Operators with + long-running attackers (resident threats) may want a wider + window; needs config knob in v1. +3. **Primitive-weight asymmetry.** Today every primitive contributes + equally to the implicit "is this attacker behaviourally stable?" + summary. But `motor.input_modality` is far more discriminative + than `cultural.weekend_cadence`. Open: do we expose primitive + weights in the API, or just sort by confidence? +4. **Observation-to-row contention.** A burst of observations for + the same `(attacker_uuid, primitive)` pair (e.g. a long session + with 50 sub-observations) hits the same row 50 times. v0 reads + the row, runs the merger, writes back — under load this is a + serialised hot path. Open: should the merger batch-process within + one tick, or is per-observation latency cheap enough? +5. **What happens to `attribution_state` rows when an + `attacker_uuid` is deleted?** No `attackers` deletion path + exists today, but if/when one ships (GDPR purge, federation + resync), `ON DELETE CASCADE` is the obvious choice. File when it + matters. + +--- + +## Implementation order checklist + +A single page you can paste into a TODO and tick off: + +- [ ] Phase 1 — Schema + topics + idle handler +- [ ] Phase 2 — Categorical merge function (pure, no I/O) +- [ ] Phase 3 — Hash + numeric merge functions +- [ ] Phase 4 — Wire bus handler, persist state +- [ ] Phase 5 — `multi_actor_suspected` cross-primitive correlator +- [ ] Phase 6 — API + AttackerDetail badges + SSE forwarding +- [ ] Phase 7 — v0 calibration scenarios lockdown + +Seven commits, seven test sets. v0 closes DEBT-051 and gives +operators an honest "is this attacker behaviourally stable, drifting, +or showing multiple operators?" surface — without crossing the +attribution-of-natural-persons bright line. 
+ +After v0, v1 (linkage / clustering) is gated on: +- v0 stable in production for ≥ 1 month +- ≥ 1 high-discrimination linkage signal calibrated + (keystroke-dynamics simhash from BEHAVE-SHELL is the obvious + candidate; v1 of the BEHAVE engine adds it post-step-10) + +--- + +**Owner:** ANTI. +**Implementation gate:** this doc reviewed → Phase 1 starts after +`BEHAVE-INTEGRATION.md` v0 is live (observation table populated + +worker emitting `attacker.observation.*` events). diff --git a/development/BEHAVE-EXTRACTOR.md b/development/BEHAVE-EXTRACTOR.md new file mode 100644 index 00000000..2f5bcc6c --- /dev/null +++ b/development/BEHAVE-EXTRACTOR.md @@ -0,0 +1,702 @@ +# BEHAVE-SHELL Extraction Engine — Implementation Route + +**Status:** pre-implementation. Sibling to `BEHAVE-INTEGRATION.md`. +**Scope:** the inside of `decnet/profiler/behave_shell/`. Nothing else. +**Acceptance gate:** the five-class calibration grid in +`BEHAVE-INTEGRATION.md` §"Calibration grid IS the regression test." + +This doc is the **construction manual** for the engine. The +integration doc says *what* the engine plugs into; this doc says +*how to build it from zero to v0 in a deterministic sequence*. + +--- + +## Mission + +Take an asciinema-style PTY event stream for one session, return an +`Iterable[Observation]` of BEHAVE-SHELL primitives. Pure library: +no I/O, no bus, no DB. Worker owns those. + +```python +def extract_session( + events: Iterable[AsciinemaEvent], # [t_float, kind: 'i'|'o', data: str] + *, + sid: str, + source: str = "decnet/profiler/behave_shell/extract.py", +) -> Iterable[Observation]: +``` + +`AsciinemaEvent` is a 3-tuple `(t, kind, data)` matching the on-disk +shard line format. No fancy class — a tuple is honest about what it is. + +## Single-pass discipline + +A naïve engine re-walks the event stream once per primitive, paying +O(n × primitives) for nothing. We don't do that. 
+ +Single pass over events builds a `SessionContext` — a precomputed +bundle of indexes that every feature module reads from. Cheap; one +walk; reproducible. + +```python +@dataclass(frozen=True, slots=True) +class SessionContext: + sid: str + source: str + evidence_ref: str + t_start: float + t_end: float + duration_s: float + + # Raw event slices (already filtered by kind) + input_events: tuple[InputEvent, ...] # ('i', t, data) + output_events: tuple[OutputEvent, ...] # ('o', t, data) + + # Derived once, used everywhere + iats: tuple[float, ...] # IATs between input events + paste_bursts: tuple[PasteBurst, ...] # detected paste regions + commands: tuple[Command, ...] # split on \r / \n + inter_cmd_iats: tuple[float, ...] # IATs between command boundaries + output_per_cmd: tuple[int, ...] # output bytes between cmd_i and cmd_{i+1} +``` + +All feature modules take `ctx: SessionContext` and yield 0 or more +Observations. Single source of truth, single parse cost. + +## Engine layout + +``` +decnet/profiler/behave_shell/ +├── __init__.py re-exports extract_session +├── extract.py extract_session() + SessionContext build +├── _parse.py asciinema event types + parsing helpers +├── _ctx.py SessionContext dataclass + builders +├── _thresholds.py all numeric thresholds, one place, named constants +└── _features/ + ├── __init__.py FEATURES tuple — registered list of feature funcs + ├── motor.py + ├── cognitive.py + └── temporal.py (later) +``` + +`extract.py` is short: + +```python +def extract_session(events, *, sid, source="..."): + ctx = build_session_context(events, sid=sid, source=source) + for feature_fn in FEATURES: + yield from feature_fn(ctx) +``` + +That's the whole orchestration. Adding a primitive = adding a function +to `_features/.py` and registering it in `FEATURES`. + +## Threshold table convention + +Every numeric threshold lives in `_thresholds.py` as a named constant +with a docstring citing the registry's `notes:` field. 
**Never inline +magic numbers in feature code.** When calibration drifts, you change +one file. + +```python +# decnet/profiler/behave_shell/_thresholds.py +"""Numeric thresholds for BEHAVE-SHELL primitive classification. + +Each constant cites its calibration source. When the registry's +`notes:` field disagrees with a constant here, the registry is +authoritative — fix the constant, re-run the grid. +""" + +# motor.paste_burst_rate buckets — events per minute of session +PASTE_RATE_OCCASIONAL_MIN = 0.5 # at least one paste every two minutes +PASTE_RATE_HABITUAL_MIN = 3.0 # paste-driven workflow + +# cognitive.inter_command_latency_class — seconds (median IAT between commands) +ICL_TYPING_SPEED_MAX = 2.0 +ICL_DELIBERATE_MAX = 8.0 +ICL_LLM_LIGHTWEIGHT_MAX = 8.0 # 2-8s band; lower bound = ICL_TYPING_SPEED_MAX +ICL_LLM_HEAVYWEIGHT_MAX = 30.0 # 8-30s band — registry primitives.py:140-149 +# > 30s = "long" +``` + +## Full registry scope — what the engine owns, what it doesn't + +Before the route: a sober count. The BEHAVE-SHELL registry today +contains roughly **53 primitives** across 8 top-level domains. Not +all of them are extractable from a single PTY session; some need +observation history; some belong to a different sensor entirely. + +Three tiers: + +### Tier A — Per-session shell-extractable (37 primitives) + +Computable from one `(decky, service, sid)` shard. The extractor +owns these end-to-end. 
+ +| Domain | Primitive | Source signal | +|---|---|---| +| motor | `motor.input_modality` | paste-burst detector | +| motor | `motor.paste_burst_rate` | paste-burst counter | +| motor | `motor.keystroke_cadence` | IAT histogram shape | +| motor | `motor.motor_stability` | IAT outlier rate | +| motor | `motor.error_correction` | backspace-relative-to-error timing | +| motor | `motor.command_chunking` | intra-command IAT variance | +| motor | `motor.shell_mastery.tab_completion` | `\t` rate per command | +| motor | `motor.shell_mastery.shortcut_usage` | ^A/^E/^W/^U/^R/^B/^F rate | +| motor | `motor.shell_mastery.pipe_chaining_depth` | `\|` count per command | +| cognitive | `cognitive.inter_command_latency_class` | median inter-command IAT bucketed | +| cognitive | `cognitive.inter_command_consistency` | CV of inter-command IATs | +| cognitive | `cognitive.command_branch_diversity` | unique-first-token / total-commands | +| cognitive | `cognitive.feedback_loop_engagement` | Pearson r(output_bytes, next_pause) | +| cognitive | `cognitive.cognitive_load` | composite (IAT entropy + error rate + chunking) | +| cognitive | `cognitive.exploration_style` | command-graph branching shape | +| cognitive | `cognitive.planning_depth` | think-pause-length distribution | +| cognitive | `cognitive.tool_vocabulary` | distinct first-tokens normalised | +| cognitive | `cognitive.error_resilience.retry_tactic` | post-error command relation | +| cognitive | `cognitive.error_resilience.frustration_typing` | error-vs-success keystroke speed delta | +| cognitive | `cognitive.error_resilience.fallback_to_man` | `man`/`--help` invocation post-error | +| temporal | `temporal.session_duration` | `duration_s` bucketed | +| temporal | `temporal.escalation_pattern` | command-rate over rolling windows | +| temporal | `temporal.lifecycle_markers.landing_ritual` | first-N-commands signature | +| temporal | `temporal.lifecycle_markers.exit_behavior` | last-command + exit-code analysis | +| 
operational | `operational.objective` | command-intent classifier (recon / exfil / persistence / lateral / destructive) | +| operational | `operational.opsec_discipline` | history-clearing, log-tampering, .bash_history rm | +| operational | `operational.cleanup_behavior` | exit-time cleanup commands | +| operational | `operational.multi_actor_indicators` | mid-session pace/style shift detection | +| environmental | `environmental.shell_type` | prompt-string sniff from `'o'` events | +| environmental | `environmental.terminal_multiplexer` | tmux/screen escape sequences | +| environmental | `environmental.keyboard_layout` | bigram-frequency layout fingerprint | +| environmental | `environmental.locale` | `LANG`/`LC_*` envvar dump if `env` runs; output language sniff | +| environmental | `environmental.numpad_usage` | numeric input arrival pattern (weak) | +| emotional_valence | `emotional_valence.valence` | obscenity / praise / neutral lexicon | +| emotional_valence | `emotional_valence.arousal` | typing-speed delta + capslock + repeated bangs | +| emotional_valence | `emotional_valence.stress_response` | post-error speed-up vs slow-down | +| emotional_valence | `emotional_valence.frustration_venting` | `fuck`/`shit`/etc. detection (registry value is binary) | + +The emotional_valence primitives are SOFT and will produce false +positives. Documented as such; emit at confidence ≤ 0.5 per the +confidence convention. + +### Tier B — Cross-session (computed by attribution engine, not extractor) + +8 primitives that **cannot honestly be computed from one session**. +The extractor does not emit these. The attribution engine +(`ATTRIBUTION-ENGINE.md`) computes them during aggregation, reading +the per-attacker observation history. Cross-reference: a TODO in +`ATTRIBUTION-ENGINE.md` notes that aggregation may include +*derivation*, not just *merging*. 
+ +| Domain | Primitive | Why cross-session | +|---|---|---| +| temporal | `temporal.session_timing` | diurnal/nocturnal/irregular requires multiple sessions | +| temporal | `temporal.persistence` | hit_and_run/return_visitor/resident is intrinsically multi-session | +| temporal | `temporal.lifecycle_markers.idle_periodicity` | periodicity needs a long enough sample | +| cultural | `cultural.meal_break_gaps` | gap pattern over days | +| cultural | `cultural.periodic_micro_pauses` | needs many sessions to find regular intervals | +| cultural | `cultural.dst_behavior` | needs sessions spanning a DST transition | +| cultural | `cultural.weekend_cadence` | needs a week+ of sessions | +| cultural | `cultural.holiday_gaps` | needs ≥ a year for honest claim | + +If you find yourself implementing one of these in the extractor, +**stop**. It's an attribution-engine concern. + +### Tier C — Network domain (out of scope for this engine entirely) + +The full `toolchain.*` subtree — +TLS / transport / SSH / HTTP / C2 / protocol_abuse / payload +fingerprints. Roughly 25 primitives. These come from the sniffer / +prober / correlation pipeline, not from PTY session extraction. + +Two paths to populate them, both NOT this doc: + +1. **Wrap existing DECNET workers** (sniffer, prober, correlation, + intel) to emit `attacker.observation.toolchain.*` from their + existing outputs. Pragmatic, ships sooner. Filed as a future + "wire existing producers to BEHAVE" track (mentioned in + `BEHAVE-INTEGRATION.md` Out of Scope, around the + `toolchain.c2.beacon_*` overlap with profiler's existing + `behavioral.py`). +2. **Future BEHAVE-NETWORK extractor** parallel to BEHAVE-SHELL, + eating PCAP / netflow / TLS-handshake records. Cleaner long-term + architecture; substantial effort. + +Either way, **not extractor work for this doc.** + +## Confidence convention + +Every emitted Observation must carry a `confidence` in `[0.0, 1.0]`. +Three rules: + +1. 
**Sample-size honesty.** A primitive computed from < 5 samples + gets `confidence ≤ 0.5`. A bucket-classification with no IATs + should emit `unknown` (where the registry permits) at + `confidence = 1.0` — the *fact* of insufficient data is itself a + high-confidence observation. +2. **Threshold proximity.** If the measured value is within 10% of a + bucket boundary, drop confidence by 0.2. Sitting on the fence is a + real signal; pretending you know is dishonest. +3. **Output-stream availability.** Primitives that need `[t,"o",d]` + events drop confidence to 0.0 and skip emission entirely if the + shard contains no output events. Don't fabricate. + +Confidence is **the sensor's confidence in its measurement**, not in +any downstream verdict — same line BEHAVE draws. + +--- + +## The route to v0 — every Tier-A primitive emits + +**v0 ships the entire BEHAVE-SHELL Tier-A corpus.** All 37 +shell-extractable primitives in the registry must have a feature +function emitting them before the engine tags v0. Anything less is +v0-pre. + +The route is broken into **eight phases (A–H)** that each ship a +coherent slice with its own tests. With the architecture locked +(`SessionContext`, `_features/`, `_thresholds.py` already designed), +each primitive is a small, well-bounded chunk — most are dozens of +lines plus tests. The two real cost centres are Phase F (prompt +parser) and Phase G (command-intent lexicon); both bounded by the +calibration notes already in the registry. Phase A establishes the +6-primitive calibration floor (the discriminative grid). Phases B–G +expand horizontally across the registry. Phase H is the full-corpus +lockdown + v0 release. + +Each step within a phase is one commit (per the "commit per task" +memory rule), with its own tests in the same commit (per "tests per +task"). No step is allowed to land red against the calibration grid +once Phase A locks it in. 
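Rules 1 and 2 of the confidence convention above can be sketched as one adjustment helper. This is a minimal illustration, not the engine's code: `adjust_confidence`, its constant names, and the reading of "within 10%" as relative to the boundary value are all assumptions. Rule 3 concerns whether to emit at all, so it is left to the caller.

```python
from typing import Sequence

# Hypothetical constant names; the numbers come from the confidence convention.
MIN_SAMPLES = 5          # rule 1: fewer samples than this caps confidence
SMALL_SAMPLE_CAP = 0.5   # rule 1: the cap itself
BOUNDARY_MARGIN = 0.10   # rule 2: "within 10%" (assumed relative to the boundary)
BOUNDARY_PENALTY = 0.2   # rule 2: confidence drop when sitting on the fence


def adjust_confidence(
    base: float,
    n_samples: int,
    value: float,
    boundaries: Sequence[float],
) -> float:
    """Apply sample-size honesty, then threshold proximity, then clamp to [0, 1]."""
    conf = base
    if n_samples < MIN_SAMPLES:
        conf = min(conf, SMALL_SAMPLE_CAP)
    # Assumption: proximity is measured against each bucket boundary in turn.
    if any(abs(value - b) <= BOUNDARY_MARGIN * abs(b) for b in boundaries):
        conf -= BOUNDARY_PENALTY
    return max(0.0, min(1.0, conf))
```

Applying the cap before the penalty is a design choice made for this sketch; the convention does not state an order, and the two compose differently for small samples near a boundary.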
+ +### Phase A — Calibration floor (Steps 0–10) + +**Goal:** establish the 6-primitive set that discriminates the +five-class calibration grid. Lock the gate. + +This is the foundation. Phases B–G cannot start until Phase A green. + +### Step 0 — Scaffold + smoke + +**Goal:** prove the wiring before any logic. + +- Create `decnet/profiler/behave_shell/{__init__,extract,_parse,_ctx,_thresholds}.py`. +- `extract_session()` parses events into a minimal `SessionContext`, + registers an empty `FEATURES = ()`, returns no observations. +- `tests/profiler/behave_shell/test_extract_smoke.py` asserts: + - empty events → empty iterable + - one input event → SessionContext built, t_start/t_end/duration_s correct + - import path works + +Commit message: `feat(profiler/behave_shell): scaffold extract_session entry point`. + +### Step 1 — Asciinema parser + paste-burst detector + +**Goal:** the shared primitives that two feature modules will consume. + +- `_parse.py`: types (`InputEvent`, `OutputEvent`, `PasteBurst`, + `Command`) + `parse_event(line: str | dict) -> AsciinemaEvent`. +- `_ctx.py`: `build_session_context()` populates `iats`, + `paste_bursts` (chunks where consecutive IATs < `PASTE_IAT_MAX_S` + AND chunk size > `PASTE_MIN_CHARS`). +- Tests: synthetic streams covering pure-typed, pure-pasted, mixed. + +Commit: `feat(profiler/behave_shell): asciinema parser + paste-burst detection`. + +### Step 2 — `motor.input_modality` (FIRST PRIMITIVE) + +**Goal:** prove the end-to-end pipeline emits a single registry-valid +Observation. + +Why first: highest discriminative value (HUMAN vs everyone), simplest +implementation (just count paste-burst chars vs typed chars). + +- `_features/motor.py:input_modality(ctx)` yields one Observation + with value in `{"typed", "pasted", "mixed"}`. +- Register in `FEATURES`. 
+- Tests: + - synthetic typed stream → `typed` + - synthetic pasted stream → `pasted` + - HUMAN calibration shard → `typed` + - YOU-sim calibration shard → `pasted` + +After this step, the calibration grid passes for **one column** and +the integration is end-to-end live (Phase 4 of the integration plan +becomes wireable, not just blocked on theory). + +Commit: `feat(profiler/behave_shell): emit motor.input_modality`. + +### Step 3 — `motor.paste_burst_rate` + +**Goal:** second primitive, builds on the paste-burst index from +step 1. Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL. + +- `_features/motor.py:paste_burst_rate(ctx)` → `none / occasional / habitual`. +- Threshold constants in `_thresholds.py`. +- Tests + grid extension. + +Commit: `feat(profiler/behave_shell): emit motor.paste_burst_rate`. + +### Step 4 — Command segmentation (no primitive) + +**Goal:** shared utility for the three cognitive primitives next in +line. Pure refactor inside `_ctx.py`. + +- `commands` populated: split input stream on `\r` (and `\n`) into + `Command(start_ts, end_ts, first_token_hash)` records. +- **PII discipline:** store only the *first token* (or its hash) plus + timing. Never the full command body. Branch-diversity needs the + first token; nothing needs the rest. +- `inter_cmd_iats` and `output_per_cmd` populated. +- Tests for segmentation edge cases (no trailing newline, multiple + newlines in a paste, etc). + +Commit: `feat(profiler/behave_shell): command segmentation in SessionContext`. + +### Step 5 — `cognitive.inter_command_latency_class` + +**Goal:** classify the operator's *thinking pace* between commands. +Splits LW-sim / CLAUDE-FF / CLAUDE-CL. + +- `_features/cognitive.py:inter_command_latency_class(ctx)` → + `instant / typing_speed / deliberate / llm_lightweight / llm_heavyweight / long`. +- Median of `inter_cmd_iats`, bucketed against `_thresholds.py`. +- Confidence drops if < 5 commands. +- Tests + grid extension. 
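The Step 5 bullets above (median of `inter_cmd_iats`, bucketed, with a confidence drop under 5 commands) could look roughly like the sketch below. The cut-off values are placeholders, the ascending order of the six labels is an assumption, and `classify_inter_command_latency` is a hypothetical name; real thresholds belong in `_thresholds.py` and should follow the registry notes.

```python
from bisect import bisect_right
from statistics import median
from typing import Optional, Sequence, Tuple

# Placeholder cut-offs in seconds (exclusive upper bounds). NOT calibrated values.
LATENCY_UPPER_BOUNDS_S = (0.05, 1.0, 5.0, 15.0, 60.0)
# Assumed ordering by ascending latency; the registry is the authority here.
LATENCY_LABELS = (
    "instant",
    "typing_speed",
    "deliberate",
    "llm_lightweight",
    "llm_heavyweight",
    "long",
)
MIN_COMMANDS = 5


def classify_inter_command_latency(
    inter_cmd_iats: Sequence[float],
) -> Optional[Tuple[str, float]]:
    """Return (label, confidence), or None when there is nothing to measure."""
    if not inter_cmd_iats:
        return None  # assumption: no gaps between commands means no emission
    med = median(inter_cmd_iats)
    label = LATENCY_LABELS[bisect_right(LATENCY_UPPER_BOUNDS_S, med)]
    # N gaps imply N + 1 commands; fewer than 5 commands caps confidence.
    n_commands = len(inter_cmd_iats) + 1
    confidence = 0.9 if n_commands >= MIN_COMMANDS else 0.5
    return label, confidence
```

Using the median rather than the mean keeps one long think-pause from dragging a fast session into a slower bucket.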
+ +Commit: `feat(profiler/behave_shell): emit cognitive.inter_command_latency_class`. + +### Step 6 — `cognitive.command_branch_diversity` + +**Goal:** content-based playbook-vs-adaptive split. Splits CLAUDE-FF +from CLAUDE-CL. + +- `_features/cognitive.py:command_branch_diversity(ctx)` → + `linear_playbook / adaptive_branching / unknown`. +- `unique_first_tokens / total_commands` ratio against threshold. +- `unknown` when total_commands < 5 (registry-allowed). +- Tests + grid extension. + +Commit: `feat(profiler/behave_shell): emit cognitive.command_branch_diversity`. + +### Step 7 — `cognitive.feedback_loop_engagement` + +**Goal:** the orthogonal axis — does the operator's pause-after-command +correlate with output bytes? Splits HUMAN/CLAUDE-CL (closed) from +LW-sim/CLAUDE-FF (fire-and-forget). + +- Requires `output_per_cmd[i]` paired with `inter_cmd_iats[i+1]`. +- Pearson correlation; bucket on r > 0.3 / r ≈ 0 / insufficient. +- `_features/cognitive.py:feedback_loop_engagement(ctx)` → + `closed_loop / fire_and_forget / unknown`. +- **First primitive that depends on output events.** If the shard + carries no `'o'` events (rare but possible — minimal recorders), + emit `unknown` at confidence 1.0. +- Tests + grid extension. + +Commit: `feat(profiler/behave_shell): emit cognitive.feedback_loop_engagement`. + +### Step 8 — `cognitive.inter_command_consistency` + +**Goal:** dispersion/bimodality of command IATs. +HUMAN-bimodal vs LLM-metronomic. + +- CV of `inter_cmd_iats` → `metronomic` (CV < 0.2) / + `variable` (0.2 ≤ CV < 1.0) / `bimodal` (CV ≥ 1.0 OR Hartigan dip + significant — v0.1 is CV-only, registry note flags v0.2 work). +- Tests + grid extension. + +Commit: `feat(profiler/behave_shell): emit cognitive.inter_command_consistency`. + +### Step 9 — Calibration grid lockdown + +**Goal:** the gate. After this step lands, no engine PR is allowed +to drop a primitive from any of the five classes. 
+ +- `tests/profiler/behave_shell/test_calibration_grid.py` parametrised + over the five shards from `BEHAVE/prototype_extractors/shell/`. +- For each shard, assert the **required primitive set** from the + integration doc's grid table is present in the output (subset + check, not exact match — engine is allowed to emit *more* than + the table requires). +- Skip with `pytest.importorskip` style if `BEHAVE_CALIBRATION_DIR` + unset — CI provides it, dev doesn't have to. +- This is the v0 gate. + +Commit: `test(profiler/behave_shell): five-class calibration grid lockdown`. + +### Step 10 — Phase A complete: calibration floor locked + +**Goal:** Phase A done. **NOT v0 release** — v0 requires the full +Tier-A corpus (Phases B–H below). Phase A delivers the 6-primitive +discriminative floor + the gate that future phases must not break. + +- 6 primitives emitting (`motor.input_modality`, + `motor.paste_burst_rate`, + `cognitive.inter_command_latency_class`, + `cognitive.command_branch_diversity`, + `cognitive.feedback_loop_engagement`, + `cognitive.inter_command_consistency`). +- Calibration grid green across all five class shards. +- Worker can be wired against Phase A safely + (BEHAVE-INTEGRATION.md Phase 4 unblocks here, *not* at v0). + +Commit: `feat(profiler/behave_shell): Phase A — calibration floor green`. + +--- + +### Phase B — `motor.*` completion (4 primitives) + +**Goal:** finish the motor family minus shell-mastery. All four +read existing `SessionContext` derived data; no new parsing. 
+ +| Step | Primitive | Source | Notes | +|---|---|---|---| +| B.1 | `motor.keystroke_cadence` | `ctx.iats` histogram shape | steady (uniform) / bursty (heavy-tailed) / hunt_and_peck (bimodal slow+fast) / machine (sub-typing-floor) | +| B.2 | `motor.motor_stability` | `ctx.iats` outlier rate | tremor = high-frequency outliers above CV-of-IATs threshold | +| B.3 | `motor.error_correction` | backspace events relative to preceding key | immediate (<500ms) / deferred (next word boundary) / absent / route_around (no backspaces, but command later replaced) | +| B.4 | `motor.command_chunking` | per-command IAT variance + word-boundary timing | fluent (low intra-cmd variance + tight word boundaries) / fragmented (high variance) / single_command (one-shot session) | + +Per-step deliverable: feature function in `_features/motor.py`, +threshold constants in `_thresholds.py`, unit tests against +synthetic streams, calibration grid still green. + +Commits (4): `feat(profiler/behave_shell): emit motor.{keystroke_cadence,motor_stability,error_correction,command_chunking}`. + +### Phase C — `motor.shell_mastery.*` (3 primitives) + +**Goal:** the shell-fluency block. Per-command counters; trivial +implementations once command segmentation is in place (Step 4). + +| Step | Primitive | Source | +|---|---|---| +| C.1 | `motor.shell_mastery.tab_completion` | `\t` rate per command (none / occasional <30% / habitual ≥50%) | +| C.2 | `motor.shell_mastery.shortcut_usage` | ^A/^E/^W/^U/^R/^B/^F rate (none / moderate / heavy) | +| C.3 | `motor.shell_mastery.pipe_chaining_depth` | `\|` count per command, median (shallow / moderate / deep) | + +Commits (3): `feat(profiler/behave_shell): emit motor.shell_mastery.*`. + +### Phase D — `cognitive.*` completion (7 primitives) + +**Goal:** finish the cognitive family. Mix of cheap and expensive; +`cognitive_load` is a composite over earlier primitives. 
+ +| Step | Primitive | Source | Cost | +|---|---|---|---| +| D.1 | `cognitive.cognitive_load` | composite: IAT entropy + error rate + chunking variance | MEDIUM | +| D.2 | `cognitive.exploration_style` | command-graph branching shape (revisits, backtracks) | MEDIUM | +| D.3 | `cognitive.planning_depth` | think-pause-length distribution; deep = many >1.5s gaps before commands | LOW | +| D.4 | `cognitive.tool_vocabulary` | distinct first-tokens normalised by session length | LOW | +| D.5 | `cognitive.error_resilience.retry_tactic` | post-error command relation: rerun (same), modify (edit-and-retry), switch (different tool), abort (exit) | MEDIUM | +| D.6 | `cognitive.error_resilience.frustration_typing` | error-vs-success keystroke speed delta | LOW | +| D.7 | `cognitive.error_resilience.fallback_to_man` | `man`/`--help`/`-h` invocation post-error | LOW | +| D.8 | `cognitive.cognitive_load` re-tune (gate) | re-run calibration once D.1-D.7 stable | — | + +Commits (7): one per primitive, plus a re-tune commit if needed. + +### Phase E — `temporal.*` per-session subset (4 primitives) + +**Goal:** the four temporal primitives that don't need observation +history. The other three temporal primitives (session_timing, +persistence, idle_periodicity) are **Tier B** and are filed in +`ATTRIBUTION-ENGINE.md` — do not implement here. + +| Step | Primitive | Source | Cost | +|---|---|---|---| +| E.1 | `temporal.session_duration` | `ctx.duration_s` bucketed (short <60s / medium <600s / long <3600s / marathon ≥3600s) | TRIVIAL | +| E.2 | `temporal.escalation_pattern` | command-rate over rolling windows (sustained / erratic / bursty) | LOW | +| E.3 | `temporal.lifecycle_markers.landing_ritual` | first-N-commands signature match (`uname` / `id` / `whoami` / `pwd`) | LOW | +| E.4 | `temporal.lifecycle_markers.exit_behavior` | last command + exit timing (graceful `exit`/`logout` / abrupt session-cut / cleanup `history -c` etc.) | LOW | + +Commits (4): per primitive. 
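E.1 is the simplest of the four, and its cut-offs are stated in the table above (short <60s / medium <600s / long <3600s / marathon ≥3600s). A minimal sketch of that bucketing follows; the function and constant names are hypothetical, and wrapping the result in an `Observation` is left out.

```python
from bisect import bisect_right

# Exclusive upper bounds in seconds, taken from the E.1 row.
DURATION_UPPER_BOUNDS_S = (60.0, 600.0, 3600.0)
DURATION_LABELS = ("short", "medium", "long", "marathon")


def session_duration_bucket(duration_s: float) -> str:
    """Map a session length in seconds onto the four E.1 buckets."""
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    # bisect_right sends a value equal to a bound into the next bucket,
    # which matches the strict "<" in the table (60 s is medium, not short).
    return DURATION_LABELS[bisect_right(DURATION_UPPER_BOUNDS_S, duration_s)]
```

Keeping the bounds in a single tuple means a recalibration touches one constant, which fits the convention of holding thresholds in `_thresholds.py`.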
+ +### Phase F — `environmental.*` output-stream block (5 primitives) + +**Goal:** the output-stream-dependent cluster. Lands a shared +prompt-string parser once, then five primitives consume it. **This +is the most expensive single phase** — the prompt parser has to +handle ANSI escape sequences, multi-line continuation, and +custom prompts. + +| Step | Primitive | Source | Cost | +|---|---|---|---| +| F.0 | Prompt-string parser (`_parse.py`) | shared utility, no primitive | HIGH | +| F.1 | `environmental.shell_type` | prompt suffix sniff (`$`/`#`/`%`/`>`) + command syntax (bash / zsh / fish / cmd / powershell) | MEDIUM | +| F.2 | `environmental.terminal_multiplexer` | tmux/screen-specific escape sequences in output stream | LOW | +| F.3 | `environmental.locale` | `LANG`/`LC_*` envvars if attacker dumps env; output language sniff fallback (free string, BCP-47) | MEDIUM | +| F.4 | `environmental.keyboard_layout` | bigram-frequency fingerprint against known layouts (qwerty / azerty / qwertz / other) | HIGH | +| F.5 | `environmental.numpad_usage` | numeric input arrival pattern; weak signal — confidence cap | LOW | + +Commits (6): F.0 prepares; F.1-F.5 ship one per primitive. + +### Phase G — `operational.*` + `emotional_valence.*` (8 primitives) + +**Goal:** the two soft families. Both want a small command-intent / +sentiment lexicon; combine into one phase to share the lexical +infrastructure. 
+ +| Step | Primitive | Source | Cost / Confidence | +|---|---|---|---| +| G.0 | Command-intent lexicon (`_features/_intent.py`) | shared first-token → category mapping (recon / exfil / persistence / lateral / destructive) | HIGH (corpus building) | +| G.1 | `operational.objective` | majority-category over session commands | MEDIUM | +| G.2 | `operational.opsec_discipline` | history-clearing / log-tampering / `.bash_history` removal patterns | MEDIUM | +| G.3 | `operational.cleanup_behavior` | exit-time cleanup commands (`rm`-of-touched-files, `unset HISTFILE`) | MEDIUM | +| G.4 | `operational.multi_actor_indicators` | mid-session pace/style shift detection (only `solo` and `handoff_detected` honest single-session; `team_coordinated` is Tier B) | HIGH | +| G.5 | `emotional_valence.valence` | lexical sentiment; positive / neutral / negative — **CONFIDENCE CAP 0.5** | LOW (soft) | +| G.6 | `emotional_valence.arousal` | typing-speed delta + capslock + repeated bangs — **CAP 0.5** | LOW (soft) | +| G.7 | `emotional_valence.stress_response` | post-error speed-up (distress) vs slow-down (eustress) — **CAP 0.5** | LOW (soft) | +| G.8 | `emotional_valence.frustration_venting` | obscenity detection (`fuck`/`shit`/`damn`); registry value is binary — **CAP 0.5** | LOW (soft) | + +Commits (9). All four `emotional_valence.*` primitives ship under a +**hard 0.5 confidence cap** by convention — these are the most +likely primitives to embarrass the project, and operators must not +act on them without corroboration. + +### Phase H — Full-corpus lockdown + v0 release + +**Goal:** prove every Tier-A primitive in the registry has a feature +function, tag v0. + +| Step | Action | +|---|---| +| H.1 | **Registry-coverage test**: `tests/profiler/behave_shell/test_registry_coverage.py` walks `PRIMITIVE_REGISTRY`, filters out Tier-B and Tier-C primitives (explicit allow-list), asserts every remaining primitive appears in the output of at least one calibration shard. 
CI fails if the registry adds a primitive DECNET hasn't implemented yet. | +| H.2 | **Calibration grid full sweep**: re-run the five-class grid against the full primitive set; no regressions. | +| H.3 | **Live smoke**: ship a decky, run a real session from each calibration class, observe full primitive output in `observations` table + bus + AttackerDetail panel (mirrors integration-doc Phase 6). | +| H.4 | **Worker wired** (BEHAVE-INTEGRATION.md Phase 4 unblocks here). Pin `decnet-behave-core` / `decnet-behave-shell` in `pyproject.toml`. | +| H.5 | Tag v0; add `__version__ = "0.1.0"` to `behave_shell/__init__.py`. | + +Commit: `feat(profiler/behave_shell): v0 — full Tier-A corpus, all 37 primitives emitting`. + +### Per-phase rules (binding for all of B–H) + +1. **Calibration-grid gate is binding.** Every commit in B–G runs + the grid; any drop in expected primitive sets fails CI. +2. **Registry-coverage test is binding from H onward.** New Tier-A + primitives added to BEHAVE's registry without a corresponding + DECNET feature function fail CI. +3. **Adding a primitive = adding a feature func + registering it + + threshold constants + tests in the same commit.** No sneaking + implementation in without tests, no sneaking tests in without the + calibration assertion. +4. **Phases B–G can ship in any order**, but finish a phase before + starting another. Phase F is the hardest and should be sequenced + by reader stamina, not enthusiasm. +5. **Don't rush Phase G.** The soft primitives are the most likely + to embarrass the project. Calibrate against real-attacker shards + before tagging — and even then, hold the 0.5 confidence cap. +6. **Tier-B and Tier-C scope creep is forbidden.** The moment you + feel tempted to read a SECOND session inside `extract_session()`, + stop. That observation belongs to the attribution engine. + +Don't promise a delivery date for any phase. Each lands when it's +honest. 
v0 ships when **every Tier-A primitive emits + every test +green** — not before. + +--- + +## Out of scope for the engine + +- **Attribution.** Per the integration doc's bright line. Engine + emits observations; some other thing decides what they mean. See + `ATTRIBUTION-ENGINE.md`. +- **Cross-session merge logic.** That's DEBT-051 / Tier-B + primitives. Engine sees one session at a time, period. +- **Tier-C `toolchain.*` primitives.** Network-domain sensors + (sniffer, prober, correlator) own these. Either via existing + workers wrapping their outputs as BEHAVE observations, or a future + BEHAVE-NETWORK extractor. Not this doc. +- **Persistence / bus.** Worker concerns. Engine is pure. +- **Dynamic primitive registration.** The `FEATURES` tuple is + hand-edited; no plugin loaders. New primitive = new feature func + + one-line registry edit + tests in the same commit. +- **Streaming / partial extraction.** Engine assumes a complete + session. Live mid-session inference is a v2 concern; needs a + separate state-keeping design. +- **`primitives.py` registry edits.** The engine consumes the + registry; never mutates it. If a primitive is missing, file a + BEHAVE-side commit per the integration doc's "BEHAVE-side commits" + rule. +- **Confidence calibration against ground truth.** The calibration + grid is a *discrimination* test, not a *correctness* test. True + ground-truth labels would require red-team exercises with logged + intent. Filed when that data exists. + +--- + +## Implementation order checklist + +A single page you can paste into a TODO and tick off. 
**Every box +unchecked = no v0 tag.** + +### Phase A — Calibration floor (Steps 0–10) +- [ ] Step 0 — Scaffold + smoke test +- [ ] Step 1 — Asciinema parser + paste-burst detector +- [ ] Step 2 — `motor.input_modality` (FIRST PRIMITIVE) +- [ ] Step 3 — `motor.paste_burst_rate` +- [ ] Step 4 — Command segmentation in `SessionContext` +- [ ] Step 5 — `cognitive.inter_command_latency_class` +- [ ] Step 6 — `cognitive.command_branch_diversity` +- [ ] Step 7 — `cognitive.feedback_loop_engagement` +- [ ] Step 8 — `cognitive.inter_command_consistency` +- [ ] Step 9 — Calibration grid lockdown (the gate) +- [ ] Step 10 — Phase A complete: floor green + +### Phase B — `motor.*` completion +- [ ] B.1 `motor.keystroke_cadence` +- [ ] B.2 `motor.motor_stability` +- [ ] B.3 `motor.error_correction` +- [ ] B.4 `motor.command_chunking` + +### Phase C — `motor.shell_mastery.*` +- [ ] C.1 `motor.shell_mastery.tab_completion` +- [ ] C.2 `motor.shell_mastery.shortcut_usage` +- [ ] C.3 `motor.shell_mastery.pipe_chaining_depth` + +### Phase D — `cognitive.*` completion +- [ ] D.1 `cognitive.cognitive_load` +- [ ] D.2 `cognitive.exploration_style` +- [ ] D.3 `cognitive.planning_depth` +- [ ] D.4 `cognitive.tool_vocabulary` +- [ ] D.5 `cognitive.error_resilience.retry_tactic` +- [ ] D.6 `cognitive.error_resilience.frustration_typing` +- [ ] D.7 `cognitive.error_resilience.fallback_to_man` +- [ ] D.8 cognitive.cognitive_load re-tune (gate) + +### Phase E — `temporal.*` per-session +- [ ] E.1 `temporal.session_duration` +- [ ] E.2 `temporal.escalation_pattern` +- [ ] E.3 `temporal.lifecycle_markers.landing_ritual` +- [ ] E.4 `temporal.lifecycle_markers.exit_behavior` + +### Phase F — `environmental.*` (output-stream block) +- [ ] F.0 Prompt-string parser (shared utility) +- [ ] F.1 `environmental.shell_type` +- [ ] F.2 `environmental.terminal_multiplexer` +- [ ] F.3 `environmental.locale` +- [ ] F.4 `environmental.keyboard_layout` +- [ ] F.5 `environmental.numpad_usage` + +### Phase G — 
`operational.*` + `emotional_valence.*` (soft block) +- [ ] G.0 Command-intent lexicon (`_features/_intent.py`) +- [ ] G.1 `operational.objective` +- [ ] G.2 `operational.opsec_discipline` +- [ ] G.3 `operational.cleanup_behavior` +- [ ] G.4 `operational.multi_actor_indicators` +- [ ] G.5 `emotional_valence.valence` (cap 0.5) +- [ ] G.6 `emotional_valence.arousal` (cap 0.5) +- [ ] G.7 `emotional_valence.stress_response` (cap 0.5) +- [ ] G.8 `emotional_valence.frustration_venting` (cap 0.5) + +### Phase H — Full-corpus lockdown + v0 release +- [ ] H.1 Registry-coverage test +- [ ] H.2 Calibration grid full sweep, no regressions +- [ ] H.3 Live smoke across all five calibration classes +- [ ] H.4 Worker wired + `pyproject.toml` pin +- [ ] H.5 Tag v0 (`__version__ = "0.1.0"`) + +**50 boxes. 37 primitives. 1 v0.** Each box is a commit + tests in +the same commit. + +--- + +**Owner:** ANTI. +**Implementation gate:** Step 0 starts after this doc is reviewed + +Phase 1 of `BEHAVE-INTEGRATION.md` lands (storage table exists). diff --git a/development/BEHAVE-INTEGRATION.md b/development/BEHAVE-INTEGRATION.md new file mode 100644 index 00000000..04ffe6c4 --- /dev/null +++ b/development/BEHAVE-INTEGRATION.md @@ -0,0 +1,680 @@ +# BEHAVE Integration — Design + +**Status:** pre-implementation. This doc is the spec; code follows. +**Tracks:** DEBT-050 (replaces stale DEBT-036). +**Spec source:** `/home/anti/Tools/BEHAVE` (sibling, never vendored). +**Engine home:** this repo, `decnet/profiler/behave_shell/` (sublibrary inside the existing `profiler` worker — no new daemon). + +## Premise + +ANTI built BEHAVE — an out-of-tree behavioural-observation framework +with a primitive registry, a registry-validated `Observation` +envelope, a DECNET-bus event adapter, and a five-class calibration +grid (HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL). It is the +right substrate for keystroke-dynamics extraction. 
+ +The original DEBT-036 plan (hand-rolled `kd_*` columns on +`SessionProfile`) is obsolete. This doc replaces it with a +BEHAVE-aligned ingester that emits registry-validated observations on +the bus and persists them in a single generic table. + +**Bright line, lifted from BEHAVE itself:** *BEHAVE emits +observations. It does not conclude.* DECNET is a consumer of +`attacker.observation.*` events; attribution / linkage / verdicts are +out-of-scope for this integration and live in their own (future) +attribution engine. + +## Architectural placement + +``` +/home/anti/Tools/ +├── BEHAVE/ sibling repo, separate git history +│ ├── core/ decnet-behave-core (envelope) +│ ├── BEHAVE-SHELL/ decnet-behave-shell (registry + adapter) +│ └── prototype_extractors/shell/ extract.py — JSONL → Observation stream +│ +└── DECNET/ THIS repo + ├── pyproject.toml pins decnet-behave-{core,shell} + ├── decnet/profiler/ EXISTING worker — gains a sublibrary + a new trigger + │ ├── worker.py gains attacker.session.ended subscription + │ ├── behavioral.py UNCHANGED — networking-domain (LogEvent IATs, beacon detection) + │ ├── timing.py UNCHANGED — networking-domain + │ └── behave_shell/ NEW — pure extraction library + │ ├── __init__.py + │ ├── extract.py orchestration: parse → dispatch → assemble Observations + │ └── _features/ per-primitive-family modules + └── decnet/web/db/models/observations.py NEW — generic Observation table +``` + +**No new worker.** The existing `decnet-profiler.service` already +supervises this codepath. No new systemd unit, no new polkit rule, no +new heartbeat. The session-ended handler is a peer to the existing +scoring tick inside the same async loop. + +**Audit finding (network vs PTY domains).** `behavioral.py` and +`timing.py` operate on `LogEvent` (network-level connection events +from `decnet.correlation.parser`), feeding the existing +`attacker_behavior` table — TCP fingerprint, OS guess, beacon +interval, behavior class. 
**Zero overlap with BEHAVE-SHELL**, which +operates on `AsciinemaEvent` (PTY input) and persists to the new +`observations` table. The two coexist; no rewrite, no migration, no +shared state. + +Two repos, two commits, no vendoring. `pip install -e +../BEHAVE/core ../BEHAVE/BEHAVE-SHELL` for local dev; pinned wheels in +CI. + +## BEHAVE is the spec. DECNET is the engine. + +This is a *load-bearing* architectural fact, called out explicitly so +nobody (including future me) misreads the layout. + +- **BEHAVE ships:** the primitive registry, the registry-validated + `Observation` envelope, the bus event adapter, the JSON schema. + Reference prototype extractor for spec validation only. BEHAVE will + **not** ship a production engine — that's not what the BEHAVE repo + is for. +- **DECNET ships:** the production extraction engine. It lives in + `decnet/profiler/behave_shell/`, written from scratch against the + BEHAVE spec, called from the existing profiler worker on + `attacker.session.ended`. + +DECNET-side BEHAVE imports are spec-only: + +```python +from decnet_behave_core.spec.envelope import Observation as ObservationEnvelope, Window +from decnet_behave_shell.spec.primitives import PRIMITIVE_REGISTRY, get as get_primitive_spec +from decnet_behave_shell.spec.event_adapter import event_topic_for, to_event_payload +``` + +`Observation` is aliased to `ObservationEnvelope` so the storage +SQLModel can keep the `Observation`-flavoured class name where it's +useful, and the BEHAVE primitive-spec accessor is aliased away from +the bare name `get` to avoid shadowing in feature-extractor modules +that read dicts heavily. + +That's it. No imports from `BEHAVE/prototype_extractors/`. The +prototype is read as **design notes** during the engine build, then +ignored. If the prototype yields a primitive the production engine +doesn't, that's a calibration delta to investigate, not a regression +in either direction. 
+ +### The extraction engine — DECNET-side + +``` +decnet/profiler/behave_shell/ +├── __init__.py exposes extract_session() +├── extract.py orchestration: parse → dispatch → assemble Observations +└── _features/ feature-extractor modules, one per primitive family + ├── motor.py cadence, paste burst, modality, shell mastery + ├── cognitive.py latency class, consistency, branch diversity, feedback loop + ├── temporal.py session timing, escalation pattern + └── ... others added as primitives are productionised + +tests/profiler/behave_shell/ +└── _features/ one test module per feature family, against synthetic streams +``` + +The library is **pure** — no I/O, no bus calls, no DB writes. Events +in → `Iterable[Observation]` out. The split between `extract.py` +(orchestration) and `_features/` (per-family implementations) keeps +each primitive's logic auditable in isolation — including the +threshold tables, which are the part most likely to drift across +calibration cycles. The worker (in `decnet/profiler/worker.py`) owns +all I/O: disk-reach, bus publish, DB upsert. + +**The engine is its own first-class effort, not a side-effect of +this integration doc.** The five-class calibration grid is the +acceptance test. Beyond that, it has its own design surface +(threshold calibration methodology, per-primitive confidence scoring, +feature-family precedence rules) that this doc does not attempt to +fully specify — that belongs in a sibling `BEHAVE-EXTRACTOR.md` once +Phase 1 lands and we have the storage shape to write into. + +**Calibration knowledge does leak across the repo boundary.** BEHAVE's +`primitives.py` carries empirical calibration notes (e.g. CLAUDE-FF +vs CLAUDE-CL on 2026-05-02) inline in the registry. The clean +separation "BEHAVE = pure spec, DECNET = pure engine" is leakier +than this doc would prefer; both repos must agree on what a primitive +*means* before the engine threshold tables are tuned. 
Treat the +registry's `notes:` field as ground truth and tune DECNET to match. + +### BEHAVE-side commits (rare, for spec changes only) + +The only reasons to touch the BEHAVE repo during this integration: + +1. The DECNET engine discovers a primitive the registry needs and the + spec doesn't yet define → registry edit in BEHAVE → version bump + → DECNET pin update. +2. The envelope schema needs a field DECNET can populate honestly + (e.g. a structured `evidence_ref` schema) → envelope edit → schema + `v` bump → `observations.envelope_v` column already tracks it. + +These are not blockers for Phase 1. They land iteratively as the +engine matures. + +## Versioning + +| Axis | Current | DECNET pin | +|---|---|---| +| Envelope schema (`Observation.v`) | `1` | column `observations.envelope_v` tracks it | +| Schema URL | `https://behave.local/schema/observation/v1.json` | — | +| `decnet-behave-core` | `0.1.0` | `>=0.1.0,<0.2` | +| `decnet-behave-shell` | `0.1.0` | `>=0.1.0,<0.2` | + +A future `v=2` envelope coexists in the same table without a +destructive migration — query by `envelope_v` when shape diverges. +Bump the cap in `pyproject.toml` when BEHAVE cuts `0.2.0`. + +## Data flow + +``` + asciinema shard on disk + /var/lib/decnet/artifacts/{decky}/sessrec/sessions-YYYY-MM-DD.jsonl + │ + │ disk-reach (host-local, never on bus) + ▼ + bus: attacker.session.ended ─► decnet-profiler worker (existing) + (or poll fallback) │ → handler in worker.py + │ → calls behave_shell.extract_session(events) → Iterable[Observation] + │ (registry-validated by BEHAVE) + ▼ + bus.publish(event_topic_for(obs.primitive), + to_event_payload(obs)) + │ + ┌─────────────────────┼──────────────────────┐ + ▼ ▼ ▼ + observations table AttackerDetail UI future: attribution engine, + (DECNET storage) (live SSE consumer) federation gossip, webhook export +``` + +Raw `[t,"i",d]` events never cross the worker→bus boundary. Bus +carries observation envelopes only. 
Disk-reach for the input stream +mirrors DEBT-047's pattern (filesystem-group-readable artifacts via +DEBT-035). + +## Storage — the `observations` table + +Generic table holding every BEHAVE envelope field, plus a single +DECNET-side denormalization (`attacker_uuid`) for cheap joins. +**Not a strict 1:1 mirror** — the envelope has no `attacker_uuid`; +DECNET adds it so AttackerDetail doesn't have to chase +`identity_ref → AttackerIdentity → attacker_uuid` on every read. + +The SQLModel class is named `ObservationRow` to avoid colliding +with the BEHAVE `Observation` Pydantic class imported into the +same module. + +```python +# decnet/web/db/models/observations.py +from decnet_behave_core.spec.envelope import Observation as ObservationEnvelope + +class ObservationRow(SQLModel, table=True): + __tablename__ = "observations" + + # ── envelope fields (types match BEHAVE exactly) ───────────── + id: str = Field(primary_key=True) # envelope.id (uuid4().hex string) + identity_ref: str | None = None # envelope.identity_ref (str, not UUID) + primitive: str = Field(index=True) # 'motor.keystroke_cadence' + value: dict[str, Any] | str | int | float | bool | list = \ + Field(sa_column=Column(JSON, nullable=False)) + confidence: float + window_start_ts: float # flattened from envelope.window + window_end_ts: float + source: str + evidence_ref: str = Field(nullable=False) # NOT NULL for DECNET emissions; see "Idempotency" + envelope_v: int # envelope.v + ts: float = Field(index=True) # emission ts + + # ── DECNET-side denormalization (NOT in BEHAVE envelope) ───── + attacker_uuid: UUID = Field(foreign_key="attackers.uuid", index=True) + + __table_args__ = ( + Index("ix_observations_attacker_primitive_ts", + "attacker_uuid", "primitive", "ts"), + Index("ix_observations_primitive_ts", "primitive", "ts"), + UniqueConstraint("evidence_ref", "primitive", + name="uq_observations_evidence_primitive"), + ) +``` + +**SQLAlchemy `JSON` not `JSONB`** per the typed-evidence-dicts memory 
+rule (dual-backend MySQL + SQLite). + +**`evidence_ref` is NOT NULL** for DECNET-emitted observations, even +though BEHAVE's envelope makes it `Optional[str]`. The worker's +"have we already profiled this session?" check (see Idempotency +below) keys on `evidence_ref`; if it's NULL the check breaks. The +shape `shard:{decky}/{service}/{date}.jsonl#sid` is mandatory at the +worker layer. If a future BEHAVE consumer needs nullable +evidence_ref, that's a separate observation source with its own +worker — not this one. + +**`UniqueConstraint(evidence_ref, primitive)`** enforces idempotency +at the schema level, so a re-run of the worker on the same shard+sid +produces a DB-side conflict, not silent duplicate rows. SQLite and +MySQL both treat distinct (non-NULL) tuples as distinct in unique +indexes — safe across both backends since `evidence_ref` is +NOT NULL. + +**No `_migrate_*` helper.** Pre-v1; `SessionProfile` and its `kd_*` +columns are deleted from `decnet/web/db/models/attackers.py` +outright. DEBT-011 (Alembic) remains deferred. + +### Canonical queries + +**Latest observation per primitive, for one attacker** (AttackerDetail +"current state" panel): + +```sql +SELECT primitive, value, confidence, ts +FROM observations +WHERE attacker_uuid = :uuid + AND ts = (SELECT MAX(ts) FROM observations o2 + WHERE o2.attacker_uuid = observations.attacker_uuid + AND o2.primitive = observations.primitive) +ORDER BY primitive; +``` + +(SQLite — no `DISTINCT ON`; window-function rewrite available if the +correlated subquery hot-spots.) 
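The window-function rewrite mentioned above could look roughly like the sketch below. It uses Python's stdlib `sqlite3` against a cut-down stand-in for the `observations` table that models only the columns the query reads. The names `SCHEMA`, `make_demo_db`, and `latest_per_primitive` are illustrative and not part of DECNET. One behavioural difference is worth knowing before swapping the queries: on an exact `ts` tie the correlated subquery returns every tied row, while `ROW_NUMBER()` keeps a single arbitrary one unless a tiebreaker is added to its `ORDER BY`. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Cut-down stand-in for the observations table (assumption: only the
# columns the "latest per primitive" query reads are modelled here).
SCHEMA = """
CREATE TABLE observations (
    id            TEXT PRIMARY KEY,
    attacker_uuid TEXT NOT NULL,
    primitive     TEXT NOT NULL,
    value         TEXT NOT NULL,
    confidence    REAL NOT NULL,
    ts            REAL NOT NULL
)
"""

# Correlated-subquery form, as written in the doc.
CORRELATED_SQL = """
SELECT primitive, value, confidence, ts
FROM observations
WHERE attacker_uuid = :uuid
  AND ts = (SELECT MAX(ts) FROM observations o2
            WHERE o2.attacker_uuid = observations.attacker_uuid
              AND o2.primitive = observations.primitive)
ORDER BY primitive
"""

# Window-function form: rank rows per primitive, newest first, keep rank 1.
WINDOWED_SQL = """
SELECT primitive, value, confidence, ts
FROM (
    SELECT primitive, value, confidence, ts,
           ROW_NUMBER() OVER (PARTITION BY primitive ORDER BY ts DESC) AS rn
    FROM observations
    WHERE attacker_uuid = :uuid
) AS ranked
WHERE rn = 1
ORDER BY primitive
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a handful of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("o1", "a1", "motor.input_modality", "typed", 0.92, 10.0),
        ("o2", "a1", "motor.input_modality", "pasted", 0.88, 30.0),
        ("o3", "a1", "cognitive.feedback_loop_engagement", "closed_loop", 0.80, 20.0),
        ("o4", "b2", "motor.input_modality", "typed", 0.70, 40.0),
    ]
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def latest_per_primitive(conn: sqlite3.Connection, attacker_uuid: str, sql: str) -> list:
    """Run one of the two query variants for a single attacker."""
    return conn.execute(sql, {"uuid": attacker_uuid}).fetchall()
```

With no `ts` ties in the data, both variants should return the same rows for a given attacker, so either can back the "current state" panel.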
+ +**Time-series for one primitive across all sessions of one attacker** +(for "is this typist drifting" charts, future): + +```sql +SELECT ts, value, confidence +FROM observations +WHERE attacker_uuid = :uuid AND primitive = :primitive +ORDER BY ts; +``` + +## The session-ended handler — riding the existing profiler worker + +``` +decnet/profiler/ +├── worker.py EXISTING — gains attacker.session.ended subscription +└── behave_shell/ NEW — pure extraction library (no I/O) + ├── __init__.py + └── extract.py wraps the engine + disk-reach call site + +tests/profiler/behave_shell/ +├── __init__.py +├── test_extract.py unit tests against synthetic event streams +├── test_calibration_grid.py the five-class regression suite (Phase 5) +├── test_worker_session_ended_bus.py FakeBus path +└── test_worker_session_ended_poll.py DECNET_BUS_ENABLED=false path +``` + +(All tests live under `tests/`, mirroring the source tree per repo +convention. Existing `tests/profiler/test_session_profile.py` is +deleted alongside the `SessionProfile` model in Phase 1.) + +**Trigger.** Subscribe to `attacker.session.ended` on the bus. Poll +fallback walks `Log` rows where `event_type='session_recorded'` and +no `observations` row carries the matching `evidence_ref`. Bus path +ships first; poll fallback ships in the same commit so +`DECNET_BUS_ENABLED=false` is supported from day one (DEBT-031 +pattern). + +**Disk-reach.** For each `(decky, service, sid)`, resolve the shard +via `_find_shard_with_sid` (already shipped, `323077b`). Open the +JSONL via `decnet/artifacts/paths.py:resolve_artifact_path` +(DEBT-047 — symlink-escape check, regex validation, +`ARTIFACTS_ROOT` env override). Slice the per-sid event list. Pass +to BEHAVE. + +**Extraction.** Call +`decnet.profiler.behave_shell.extract_session(events, sid=..., source=...)`. +Receive `Iterable[Observation]`. Each is registry-validated at +construction by BEHAVE's `Observation` subclass; DECNET does not +re-validate. 
+ +**Resolve `attacker_uuid`.** Sessrec carries `(decky_name, service, +sid, src_ip, src_port)` per shard line. Resolve src_ip → attacker +via the existing `attackers.ip` index; create-if-missing per the +existing observe path. Stamp `identity_ref=NULL` until attribution +exists. + +**Bus emission.** For each observation, **DECNET overrides BEHAVE's +adapter** to preserve sensor-side identifiers across the bus: + +```python +# BEHAVE's to_event_payload() excludes id/ts/v because BEHAVE assumes +# the bus envelope carries them at the Event level. DECNET's bus +# (DEBT-029) auto-generates fresh id/ts/v on publish — there's no +# bus.publish overload that accepts envelope-level overrides. Without +# this merge, BEHAVE's id/ts/v would be silently lost, breaking +# cross-host dedup and federation gossip. +payload = to_event_payload(obs) | {"id": obs.id, "ts": obs.ts, "v": obs.v} + +bus.publish( + topic = event_topic_for(obs.primitive), # 'attacker.observation.motor.keystroke_cadence' + payload = payload, +) +``` + +Subscribers reconstructing the envelope via +`from_event_payload(primitive, payload)` see the original BEHAVE id / +ts / v because they ride along in `payload`. The DECNET-bus Event +envelope's *own* id/ts/v (auto-generated) are bus-routing concerns, +distinct from observation identity. + +**This is a known deviation from BEHAVE's wire-format docstring** +(`core/decnet_behave_core/spec/envelope.py:77-84`). If DECNET's bus +later grows envelope-level overrides on `publish()`, revert to the +upstream contract. Filed as a low-priority follow-up — not blocking. + +Adapter import path is pure-stdlib — no DECNET imports inside BEHAVE. +DECNET is the consumer of BEHAVE's contract, never the other way +around. + +**Persistence.** All observations from one session — i.e. one +`(decky, service, sid)` triple — commit as **a single transaction**. 
+Either the entire session lands in `observations` or none of it +does; partial-failure mid-session never leaves a half-profiled +attacker row. + +Persist **first**, then publish to the bus best-effort. Bus is +fire-and-forget (DEBT-029 §6) — a publish failure does **not** roll +back the persisted rows, and a persist failure means nothing is +published. DB is the source of truth; the bus is the notification +layer only. Order matters: a downstream subscriber receiving an +`attacker.observation.*` event can immediately query the table and +find it; the inverse (publish-then-persist) would create a window +where subscribers chase rows that don't exist yet. + +**Idempotency.** Enforced at the schema level by +`UniqueConstraint(evidence_ref, primitive)`. Re-running the worker +on the same shard+sid produces a DB-side conflict per row, which the +worker handles via `INSERT … ON CONFLICT DO UPDATE` (SQLAlchemy +upsert). Worker marks a session "profiled" by the existence of any +row matching its `evidence_ref` — no separate marker column. Because +the unique index makes accidental duplicates structurally +impossible, the marker check is honest. + +## Bus topics + +Add to `decnet/bus/topics.py`: + +```python +ATTACKER_OBSERVATION_PREFIX = "attacker.observation" +# Wildcard patterns: +# attacker.observation.motor.* +# attacker.observation.cognitive.* +# attacker.observation.> (everything BEHAVE-SHELL emits) +``` + +Topic shape locked by BEHAVE's `event_topic_for()`; DECNET registers +the prefix for documentation and pattern-matching only. **Bus auth +is not topic-level** — per DEBT-029 §2 the bus uses +kernel-authenticated peer delivery (UNIX socket file permissions), +not topic ACLs. `bus/topics.py` change co-commits with a +wiki-checkout `Service-Bus.md` update (memory rule: "Document new +bus signals in the wiki"). 
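The persist-first, publish-second ordering and the `(evidence_ref, primitive)` upsert described under Persistence and Idempotency can be sketched as follows. This is a minimal illustration using stdlib `sqlite3` rather than SQLAlchemy, with a cut-down table. `persist_then_publish` and the `publish` callable are hypothetical stand-ins for the worker write loop and `bus.publish`, and the topic is built by plain string concatenation where the real code would call `event_topic_for()`. The `ON CONFLICT ... DO UPDATE` spelling shown is SQLite's (3.24 or newer); MySQL expresses the same idea as `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import logging
import sqlite3
from typing import Any, Callable

log = logging.getLogger("decnet.profiler.sketch")

# Cut-down observations table (assumption: only the columns needed to
# demonstrate the unique constraint and the upsert).
SCHEMA = """
CREATE TABLE observations (
    id           TEXT PRIMARY KEY,
    primitive    TEXT NOT NULL,
    value        TEXT NOT NULL,
    confidence   REAL NOT NULL,
    evidence_ref TEXT NOT NULL,
    ts           REAL NOT NULL,
    UNIQUE (evidence_ref, primitive)
)
"""

# A re-run on the same shard+sid hits the unique index and updates in place.
UPSERT_SQL = """
INSERT INTO observations (id, primitive, value, confidence, evidence_ref, ts)
VALUES (:id, :primitive, :value, :confidence, :evidence_ref, :ts)
ON CONFLICT (evidence_ref, primitive) DO UPDATE SET
    value      = excluded.value,
    confidence = excluded.confidence,
    ts         = excluded.ts
"""


def persist_then_publish(
    conn: sqlite3.Connection,
    rows: list[dict[str, Any]],
    publish: Callable[[str, dict[str, Any]], None],
) -> int:
    """Commit one session's rows atomically, then publish best-effort.

    Returns the number of rows that were published successfully.
    """
    # `with conn` commits on success and rolls back if any statement
    # raises, so the session lands whole or not at all.
    with conn:
        conn.executemany(UPSERT_SQL, rows)

    published = 0
    for row in rows:
        try:
            # Stand-in for event_topic_for(obs.primitive).
            publish("attacker.observation." + row["primitive"], row)
            published += 1
        except Exception:
            # Fire-and-forget: a bus failure never undoes persisted rows.
            log.exception("publish failed for %s", row["primitive"])
    return published
```

Because the publish loop only starts after the commit, a subscriber that reacts to an event by querying the table should always find the row it was told about.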
+ +## AttackerDetail consumer + +### REST surface + +`decnet/web/router/attackers/api_get_attacker_detail.py` swaps the +`SessionProfile` join for the latest-per-primitive query above. +Response shape gains: + +```jsonc +{ + // ... existing attacker fields ... + "observations": [ + { + "primitive": "motor.input_modality", + "value": "pasted", + "confidence": 0.91, + "ts": 1714521660.456, + "source": "decnet/profiler/behave_shell/extract.py" + }, + // ... one row per primitive observed for this attacker ... + ] +} +``` + +Frontend (`AttackerDetail.tsx`) renders a "Behavioural primitives" +panel grouped by the registry's top-level domain (`motor.*`, +`cognitive.*`, `temporal.*`, `operational.*`, `environmental.*`, +`cultural.*`, `emotional_valence.*`, `toolchain.*`). Day-one render +priorities for the panel: + +1. `motor.input_modality` — pasted vs typed vs mixed +2. `cognitive.feedback_loop_engagement` — closed_loop vs fire_and_forget +3. `cognitive.command_branch_diversity` — linear_playbook vs adaptive_branching +4. `cognitive.inter_command_latency_class` — typing_speed / llm_lightweight / llm_heavyweight / long +5. Everything else, alphabetised by primitive path. + +These four are the highest-discriminative-value primitives in the +calibration grid; surfacing them first is what unblocks the "is this +the same operator class" hover story. + +### Live-update SSE route + +`GET /api/v1/attackers/{uuid}/events` — per-attacker SSE stream, +mirrors the per-topology pattern shipped in DEBT-030. +The route subscribes to `attacker.observation.*` filtered by +`identity_ref` / resolved `attacker_uuid`, plus +`attacker.fingerprint_rotated` / `attacker.scored` for the same +attacker. + +Envelope identical to topology events: +`{v, type, ts, payload}`. Day-one event types: +`observation.`, `fingerprint.rotated`, `attacker.scored`. + +Auth: `?token=` query-param matching the existing per-topology and +`/stream` pattern. 
Snapshot-on-connect serves the latest-per-primitive +query result so the panel hydrates immediately, then live-forwards +bus events. 15s keepalive, mirrors the topology route. + +The global `/stream` is **not** the right fit here — it fans out +every attacker's events to every subscriber, and the AttackerDetail +page only cares about one. Per-attacker route, like +per-topology. + +## PII discipline + +Binds at the BEHAVE layer; DECNET does not get to "improve" the +envelope by reading raw bodies into payloads. + +- Raw `[t,"i",d]` keystroke events stay on disk. Worker reads, + extracts, discards. +- `evidence_ref` is a *pointer* (`shard:path#sid`), never the + evidence itself. +- `value` JSON is bounded by the registry's `ValueTypeSpec` — no + free-form blobs that could smuggle keystrokes. +- Bigram simhashes (when emitted via `cognitive.*` digraph + primitives) are *characters*, not *content* — already documented in + BEHAVE's primitives module. + +**Canonical PII binding.** The authoritative statement is the module +docstring at `core/decnet_behave_core/spec/envelope.py:3-19` — it +forbids raw keystrokes, command bodies, credentials, and payload +bytes in observation values; `evidence_ref` is a pointer, never the +evidence. That docstring is binding on this DECNET integration. +*Not* `BEHAVE-SHELL/scratchpad.md` — scratchpads, by definition, +aren't binding policy surfaces. + +## Calibration grid IS the regression test + +`tests/profiler/behave_shell/test_calibration_grid.py` runs the +**pure engine** (`behave_shell.extract_session()` called directly, +no worker, no bus, no DB) against each of the five +`BEHAVE/prototype_extractors/shell/sessions-2026-05-02-*.jsonl` +shards (gitignored — fixture path resolved via +`BEHAVE_CALIBRATION_DIR` env var, skipped if unset). 
Asserts the +expected primitive set fires per class: + +| Shard | Class | Required primitives in output | +|---|---|---| +| `sessions-2026-05-02.jsonl` | HUMAN | `motor.input_modality=typed`, `cognitive.inter_command_consistency=bimodal`, `cognitive.feedback_loop_engagement=closed_loop`, `cognitive.command_branch_diversity=adaptive_branching` | +| `sessions-2026-05-02-with-llm.jsonl` | YOU-sim | `motor.input_modality=pasted`, `motor.paste_burst_rate=occasional`, `cognitive.inter_command_latency_class=typing_speed`, `cognitive.command_branch_diversity=linear_playbook` | +| `sessions-2026-05-02-new.jsonl` | LW-sim | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=llm_lightweight`, `cognitive.command_branch_diversity=linear_playbook` | +| `sessions-2026-05-02-with-claude.jsonl` | CLAUDE-FF | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=llm_heavyweight`, `cognitive.command_branch_diversity=linear_playbook`, `cognitive.feedback_loop_engagement=fire_and_forget` | +| `sessions-2026-05-02-closed-loop.jsonl` | CLAUDE-CL | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=long`, `cognitive.command_branch_diversity=adaptive_branching`, `cognitive.feedback_loop_engagement=closed_loop` | + +Any extractor change that breaks one of these classifications fails +CI. The grid is the discriminative-power floor — calibration +refinement can *add* primitives, never silently *drop* them. + +## Phase plan + +Per the "commit per task" memory rule, each phase ships as one commit +with its own tests. + +### Phase 1 — DECNET-side storage (no BEHAVE coupling yet) + +- New `observations` table + SQLModel + repository methods. +- Drop `SessionProfile` + `kd_*` columns from + `decnet/web/db/models/attackers.py`. +- AttackerDetail API switches to the latest-per-primitive query. 
+ Returns empty `observations: []` since nothing populates the table. +- `decnet/bus/topics.py` registers `attacker.observation.*` prefix. +- Tests: SQLModel CRUD, latest-per-primitive query against fixture + rows, empty-attacker contract. + +### Phase 2 — DECNET extraction engine (`decnet/profiler/behave_shell/`) + +- Production extractor written against the BEHAVE spec, pure library + (no I/O). +- One feature-family module per `_features/{motor,cognitive,temporal,...}.py`. +- Public entry: `extract_session(events, *, sid, source) -> Iterable[Observation]`. +- Tests in `tests/profiler/behave_shell/_features/`: per-feature unit + tests against synthetic event streams. The calibration-grid suite + (Phase 5) is the integration test. +- This phase has its own design surface — see `BEHAVE-EXTRACTOR.md` + (filed as a sibling doc when Phase 1 lands). Phases 1 and 2 are + largely independent; can run in parallel. + +### Phase 3 — BEHAVE pin + +- `pyproject.toml` pins `decnet-behave-core` and `decnet-behave-shell` + at whatever versions the engine settles on. +- CI install-time smoke: registry imports cleanly, envelope validates + a known-good observation. + +### Phase 4 — Wire the trigger into the existing profiler worker + +- `decnet/profiler/worker.py` gains an `attacker.session.ended` + subscription handler. +- Handler does: resolve shard via disk-reach → call + `behave_shell.extract_session()` → upsert into `observations` table + → publish each observation on the bus. +- Poll fallback for `DECNET_BUS_ENABLED=false`. +- Trigger isolation: handler exceptions logged, do not affect the + existing scoring tick. +- Tests in `tests/profiler/behave_shell/`: FakeBus path, poll-only + path, disk-reach error paths, idempotency on re-run. +- **No new systemd unit.** The existing `decnet-profiler.service` + already supervises this code. 
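The trigger-isolation bullet above amounts to wrapping the new handler so that nothing it raises can reach the rest of the worker. A minimal sketch, assuming the handler is a plain callable taking one event argument; `isolate` and the logger name are hypothetical, and the real worker may well be async or structured differently:

```python
import logging
from typing import Any, Callable

log = logging.getLogger("decnet.profiler.sketch")


def isolate(handler: Callable[[Any], None]) -> Callable[[Any], bool]:
    """Wrap a bus handler so its exceptions are logged, never propagated."""

    def wrapper(event: Any) -> bool:
        try:
            handler(event)
        except Exception:
            # Log with traceback and carry on; the caller (and anything
            # else running in the same worker) is unaffected.
            log.exception("session-ended handler failed")
            return False
        return True

    return wrapper
```

The boolean return is only there to make the behaviour easy to assert on in tests; a production wrapper could just as well return nothing.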
+ +### Phase 5 — Calibration regression suite + UI surface + +- `tests/profiler/behave_shell/test_calibration_grid.py` against all + five BEHAVE shards. +- New `GET /api/v1/attackers/{uuid}/events` SSE route (mirrors the + per-topology pattern from DEBT-030); snapshot-on-connect + + bus-forwarded `attacker.observation.*` events. Tests in + `tests/api/attackers/test_events_stream.py`. +- AttackerDetail.tsx renders the Behavioural primitives panel and + consumes the SSE route for live updates. +- Frontend Vitest coverage for the panel (DEBT-043 harness, shipped). + +### Phase 6 — Live smoke + +- Ship a decky, run a real SSH session from each calibration class + manually, disconnect, observe `observations` rows + bus events + + AttackerDetail panel. +- Document the smoke procedure in + `scripts/behave_shell/smoke.sh` (parallel to + `scripts/bus/smoke-mutator.sh` — per-feature dirs). + +## Out of scope + +Filed for future paydown when they bite. Do not let them creep into +this integration. + +- **Attribution engine.** Consumes `attacker.observation.*`, emits + `attribution.profile.candidate.*`. BEHAVE explicitly separates + observation from attribution. +- **Federation gossip** of observations across swarm hosts. +- **Backfill** over historical shards (one-shot script when the + table lands; not a worker feature). +- **Webhook export** of observation streams (rides DEBT-037). +- **Observation retention / vacuum.** Pre-v1, no users to mislead; + filed when storage actually pressures. +- **`SessionProfile` data migration.** None — table ships empty + today, drop is destructive but lossless. +- **Cross-domain BEHAVE** (BEHAVE-TEXT integration for stylometric + analysis of attacker-typed messages, e.g. captured emails). Same + `observations` table will accept those envelopes when their primitive + registry is registered, but the wiring is a separate paydown. 
+ +## Resolved decisions (formerly open questions) + +- **Q1 — engine location.** RESOLVED: BEHAVE's prototype is reference + code only, never imported by DECNET. The production extraction + engine lives in `decnet/profiler/behave_shell/` as a sublibrary of + the existing profiler worker — no new daemon, no new systemd unit. + (See "BEHAVE is the spec. DECNET is the engine.") +- **Q2 — emission granularity.** RESOLVED: **per-(sid, primitive).** + Every session emits its full primitive set; every emission + persists. The schema already supports it; this just locks in the + worker write loop. *More detail the better.* +- **Q3 — cross-session aggregation, day one.** RESOLVED: latest wins + per primitive in the AttackerDetail "current state" query. Simple, + honest, easy to reason about. + +## Real open question — Cross-session aggregation, the right way + +Q3's "latest wins" is a stopgap. The actual question is harder and +deserves its own design pass before AttackerDetail starts surfacing +attribution-flavoured claims: + +> **When two sessions from the same attacker (or identity) emit +> conflicting values for the same primitive, what does the +> attacker-level view say?** + +Concrete cases: + +- Session A: `motor.input_modality = typed` (conf 0.92). + Session B (next day): `motor.input_modality = pasted` (conf 0.88). + Is this attacker `mixed`? Or did they switch tooling? Or did a + *different operator* take over the same credentialed access? +- `cognitive.feedback_loop_engagement` flips from `closed_loop` to + `fire_and_forget` between two sessions. Is this fatigue, a + handoff (`operational.multi_actor_indicators=handoff_detected`?), + or a script taking over from a human? +- `cognitive.command_branch_diversity = unknown` in a short session + vs `adaptive_branching` in a long session. Latest-wins would + collapse this to `unknown` if the short session lands second — + exactly the wrong answer. 
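The third case can be made concrete in a few lines of Python. This is a toy illustration of why latest-wins is naive, not a proposal for the eventual merge semantics. `Obs`, `latest_wins`, `values_seen`, and `demo_observations` are hypothetical names, and the confidences and timestamps are invented for the example.

```python
from typing import NamedTuple


class Obs(NamedTuple):
    """Just enough of an observation to show the aggregation problem."""
    primitive: str
    value: str
    confidence: float
    ts: float


def latest_wins(observations: list[Obs]) -> dict[str, Obs]:
    """Day-one aggregation: the newest observation per primitive wins."""
    current: dict[str, Obs] = {}
    for obs in sorted(observations, key=lambda o: o.ts):
        current[obs.primitive] = obs  # later ts overwrites earlier
    return current


def values_seen(observations: list[Obs]) -> dict[str, set[str]]:
    """Every distinct value per primitive: what latest-wins throws away."""
    seen: dict[str, set[str]] = {}
    for obs in observations:
        seen.setdefault(obs.primitive, set()).add(obs.value)
    return seen


def demo_observations() -> list[Obs]:
    """A long, informative session followed by a short, uninformative one."""
    return [
        Obs("cognitive.command_branch_diversity", "adaptive_branching", 0.85, 100.0),
        Obs("cognitive.command_branch_diversity", "unknown", 0.30, 200.0),
    ]
```

On the demo data, `latest_wins` reports `unknown` because the short session landed second, even though the earlier, higher-confidence observation carried the real signal. `values_seen` shows that both values are still in storage, which is what a later aggregation layer would have to work from.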
+ +**This is genuinely an attribution-engine concern**, not an +extraction concern. BEHAVE is firm on that bright line. The clean +answer is: + +1. **DECNET stores all observations** (per-sid, per-primitive — Q2). +2. **AttackerDetail's day-one "current state" query is latest-wins** + (Q3) — not because it's right, but because it's *honestly + transparent* about being naïve. +3. **The right answer ships with the attribution engine** as a + separate paydown — likely as new `attribution.profile.*` topics + that emit a *derived* per-attacker primitive map with explicit + merge semantics (`stable` / `drifting` / `conflicted` / + `multi_actor`). Day-zero, that engine doesn't exist; day-one, + AttackerDetail just shows raw latest values + a "N + observations" hover. + +Filed as **DEBT-051 — Cross-session BEHAVE primitive aggregation +(attribution engine)** when this doc is reviewed. Out of scope for +this integration; explicitly listed under "Out of scope" above. + +--- + +**Owner:** ANTI. +**Implementation gate:** this doc reviewed → Phase 1 starts. diff --git a/development/DEBT.md b/development/DEBT.md index f3e01633..05fdbc81 100644 --- a/development/DEBT.md +++ b/development/DEBT.md @@ -277,7 +277,17 @@ The Workers panel (Config → Workers) landed with bus-based STOP but every STAR **Status:** Open. Depends on the Workers panel (shipped) and `deploy/decnet-bus.service` pattern being extended to the other workers. -### DEBT-036 — Session-profile ingester (keystroke-dynamics extraction from transcript shards) +### DEBT-036 — Session-profile ingester (keystroke-dynamics extraction from transcript shards) — **STALE 2026-05-03, SUPERSEDED BY DEBT-050** + +> **Stale.** This entry was drafted before BEHAVE-SHELL existed. 
It bakes the +> feature schema into hand-rolled `SessionProfile` columns (`kd_iki_mean`, +> `kd_burst_ratio`, …), which duplicates the registry in +> `BEHAVE/BEHAVE-SHELL/decnet_behave_shell/spec/primitives.py`, bypasses the +> registry-validated `Observation` envelope, and skips the bus event adapter +> (`event_topic_for` / `to_event_payload`) that already speaks DECNET's +> `attacker.observation.*` topic shape. The replacement plan is **DEBT-050** +> below. Original text preserved unchanged for context. + **Files:** `decnet/web/ingester.py` (or new sibling under `decnet/session_profiler/`), `decnet/web/db/models/attackers.py:SessionProfile` (table already exists, ships empty), `decnet/templates/_shared/sessrec/sessrec.c` (emitter side — already done), `decnet/web/router/attackers/api_get_attacker_detail.py` (consumer — already joins SessionProfile when present). The `SessionProfile` SQLModel table has been committed to storage since session recording v1 landed (see `decnet/web/db/models/attackers.py:97-143`). Every column — `kd_iki_mean`, `kd_iki_stdev`, `kd_iki_p50`, `kd_iki_p95`, `kd_enter_latency_p50/p95`, `kd_burst_ratio`, `kd_think_ratio`, `kd_ctrl_backspace/wkill/ukill/abort/eof`, `kd_arrow_rate`, `kd_tab_rate`, `kd_digraph_simhash`, `total_keystrokes`, `session_duration_s` — is nullable by design because the **ingester that populates them does not exist yet** (documented as gap #2 in `SIGNAL_CAPTURE_AUDIT.md`). Every session that gets recorded lands an empty row (or, today, no row at all) while the `[t, "i", d]` event stream in the shard carries every signal those columns exist to capture. @@ -317,7 +327,83 @@ All four signals fall out of the schema for free. CoV from `kd_iki_mean` + `kd_i - The motivating-case wget session produces CoV ≈ 0.74 ± 0.05 when the ingester processes it — sanity check against the manual analysis. 
- The AttackerDetail page surfaces at least `kd_iki_mean` + `kd_burst_ratio` somewhere in the keystroke-dynamics section, unblocking the "is this the same typist" hover story. -**Status:** Open. Depends on the shard-scan fallback (shipped in `323077b`) and `SessionProfile` schema (shipped with session recording v1). The bus-trigger path depends on DEBT-031's deferred `attacker.session.started/ended` topics, but poll-driven ingestion works today and can ship first. +**Status:** ⚠️ Stale — superseded by DEBT-050. Do not implement against this entry; the column-zoo design is the wrong shape now that BEHAVE-SHELL exists. + +### DEBT-050 — BEHAVE-SHELL session-profile ingester worker (replaces DEBT-036) +**Files:** `decnet/session_profiler/worker.py` (**new**), `decnet/web/db/models/observations.py` (**new** — generic Observation table, see Storage), `decnet/web/db/models/attackers.py` (drop `SessionProfile` and its `kd_*` columns), `decnet/web/router/attackers/api_get_attacker_detail.py` (consumer surface — switch from SessionProfile join to per-primitive Observation latest-state query), `decnet/bus/topics.py` (admit `attacker.observation.*` prefix), `decnet/web/db/sqlmodel_repo/observations.py` (**new** — repository methods), `packaging/systemd/decnet-session-profiler.service` (**new**), `pyproject.toml` (pin `decnet-behave-core`, `decnet-behave-shell`), **BEHAVE repo (separate commit):** `BEHAVE/prototype_extractors/shell/extract.py` (refactor `__main__` into importable `extract_session()`). + +**Context.** ANTI built BEHAVE — an out-of-tree behavioural-observation framework with its own primitive registry, registry-validated `Observation` envelope, DECNET-bus event adapter, and a five-class calibration grid (HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL). It is the right substrate for keystroke-dynamics extraction; the original DEBT-036 entry predates it and got the schema wrong by inventing parallel columns. 
BEHAVE is a **separate repo** (mirrors `wiki-checkout` discipline — two repos, two commits per change). + +**Design:** + +1. **New worker** `decnet/session_profiler/worker.py`. Sibling of `decnet/ingester/`, supervised by a new `packaging/systemd/decnet-session-profiler.service` unit (mirrors DEBT-034's pattern). One process per host, agent-or-master-agnostic. +2. **Trigger.** Subscribe on the bus to `attacker.session.ended`; poll-fallback over `Log.event_type='session_recorded'` rows lacking a "profiled" marker (see Storage). Bus-optional per DEBT-031: `try get_bus(); except: warn-and-degrade-to-poll`. +3. **Disk-reach** (per DEBT-047 precedent). For each `(decky, service, sid)`, resolve the shard via `_find_shard_with_sid` (already shipped in `323077b`), open the JSONL, walk the per-sid event slice. **No raw `d` values cross the worker→bus boundary** — BEHAVE's envelope rules prohibit it, and disk-reach keeps the input stream host-local. +4. **Extraction.** Refactor `BEHAVE/prototype_extractors/shell/extract.py`'s `__main__` into an importable `extract_session(events: Iterable[AsciinemaEvent]) -> Iterable[Observation]`. Feed it the per-sid `[t,"i",d]` slice. Output is a stream of registry-validated `Observation`s, one per primitive that fired for the session. **Refactor lands in the BEHAVE repo as a separate commit** (two repos, two commits). +5. **Bus emission.** For each `obs`: `bus.publish(event_topic_for(obs.primitive), to_event_payload(obs))`. The adapter is pure-stdlib, no DECNET imports — DECNET is the consumer of *its* contract, not the other way around. Topic prefix `attacker.observation.*` registered in `decnet/bus/topics.py`. +6. 
**Storage — drop `SessionProfile`, new generic `Observation` table.** Schema mirrors the BEHAVE envelope 1:1 so persistence cannot drift from the wire format: + + ``` + observations ( + id UUID PRIMARY KEY, -- BEHAVE Observation.id + attacker_uuid UUID NOT NULL FK, -- denormalised from identity_ref or join-resolved + identity_ref UUID NULL, -- raw envelope field, may be null pre-attribution + primitive TEXT NOT NULL, -- 'motor.keystroke_cadence' etc. + value JSON NOT NULL, -- envelope shape; SQLAlchemy JSON not JSONB (memory rule) + confidence REAL NOT NULL, + window_start_ts REAL NOT NULL, + window_end_ts REAL NOT NULL, + source TEXT NOT NULL, + evidence_ref TEXT NULL, -- shard:sid pointer for disk-reach audit, never evidence itself + envelope_v INTEGER NOT NULL, -- BEHAVE Observation.v (currently 1) + ts REAL NOT NULL, -- emission ts + INDEX (attacker_uuid, primitive, ts DESC), + INDEX (primitive, ts DESC) + ) + ``` + + AttackerDetail's "current state per primitive" view = `SELECT DISTINCT ON (primitive) … ORDER BY primitive, ts DESC` (or the SQLite equivalent via window function). `SessionProfile` and its `kd_*` columns are dropped outright — pre-v1, no users to mislead, no migration ceremony (DEBT-011 still deferred; just edit the SQLModel). +7. **Packaging.** Pin `decnet-behave-core>=0.1.0,<0.2` and `decnet-behave-shell>=0.1.0,<0.2` in DECNET's `pyproject.toml`. Envelope schema is currently `v=1` (`https://behave.local/schema/observation/v1.json`); the `observations.envelope_v` column tracks it so a future `v=2` envelope can land alongside without a destructive migration. Local dev: `pip install -e ../BEHAVE/core ../BEHAVE/BEHAVE-SHELL`. CI installs the pinned wheels from a BEHAVE release tag — bump the cap when BEHAVE cuts `0.2.0`. + +**Non-negotiables:** +- Registry validation is enforced at construction time by BEHAVE's `Observation` subclass — no DECNET-side primitive whitelist, no drift. 
+- Extractor refactor must keep `extract.py --summary` and the calibration-grid CLI flow working; the library entry-point is *additive*. +- `DECNET_BUS_ENABLED=false` keeps the worker functional in poll-only mode (mirrors DEBT-031). +- Idempotent on re-run: same shard + same sid → same observation set (sort+dedupe by primitive before emitting). +- PII discipline binds at the BEHAVE layer; DECNET does not get to "improve" the envelope by reading raw bodies into payloads. + +**Acceptance:** +- Replay each of the five `BEHAVE/prototype_extractors/shell/sessions-2026-05-02-*.jsonl` calibration shards through the worker. Each session produces the BEHAVE-SHELL primitives that the README's class-signature column predicts (e.g. CLAUDE-FF: `motor.input_modality=pasted` + `motor.paste_burst_rate=habitual` + `cognitive.inter_command_latency_class=llm_heavyweight` + `cognitive.command_branch_diversity=linear_playbook` + `cognitive.feedback_loop_engagement=fire_and_forget`). +- AttackerDetail surfaces at least `motor.input_modality`, `cognitive.feedback_loop_engagement`, and `cognitive.command_branch_diversity` for any attacker with a profiled session. +- The five-class grid IS the regression test — any extractor change must keep all five sessions classifying within their expected primitive sets. + +**Out of scope (defer to DEBT-051+ as they bite):** +- Attribution engine (consumes `attacker.observation.*`, emits `attribution.profile.candidate.*`). BEHAVE deliberately separates observation from attribution. +- Federation gossip of observations across swarm hosts. +- Backfill over historical shards. +- Webhook export of observation streams (rides DEBT-037). + +**Status:** Open. Replaces DEBT-036. Depends on (a) BEHAVE-SHELL spec frozen at v0.x, (b) `extract.py` library refactor in the BEHAVE repo, (c) shard-scan fallback (shipped `323077b`). 
+ +### DEBT-051 — Cross-session BEHAVE primitive aggregation (attribution engine) +**Files:** `decnet/correlation/attribution/` (**new**), `decnet/web/db/models/attribution_state.py` (**new**), `decnet/bus/topics.py` (`attribution.profile.*` prefix), `decnet/web/router/attackers/api_get_attacker_detail.py` (state-badge wiring). + +`BEHAVE-INTEGRATION.md`'s Q3 settled the AttackerDetail "current state" surface as **latest-wins per primitive** for v0 — honest about being naïve. The harder question — *how do conflicting observations across sessions of the same attacker resolve into a stable view?* — is filed here. + +Concrete cases: +- Session A says `motor.input_modality = typed`, session B says `pasted`. Mixed? Operator switched tooling? Different operator on shared creds? +- `cognitive.feedback_loop_engagement` flips closed_loop ↔ fire_and_forget across sessions. Fatigue, handoff (`operational.multi_actor_indicators=handoff_detected`), or scripted takeover? +- A short session emits `cognitive.command_branch_diversity=unknown`; a long one emits `adaptive_branching`. Latest-wins would collapse to `unknown` if the short one lands second — exactly the wrong answer. + +**This is genuinely an attribution-engine concern**, not an extraction concern (BEHAVE's bright line is firm on the split). The clean answer: + +1. DECNET stores all observations per-(sid, primitive). ✅ Substrate ships in DEBT-050. +2. AttackerDetail's day-one query is latest-wins (Q3 above). ✅ Substrate ships in DEBT-050. +3. The right answer ships as a derived per-(attacker, primitive) state machine emitting `attribution.profile.state_changed` events with explicit merge semantics: `stable / drifting / conflicted / multi_actor / unknown`. + +Full design in `development/ATTRIBUTION-ENGINE.md`. v0 scope: aggregation only over per-`attacker_uuid` proto-identities (sidesteps the still-deferred clusterer from `IDENTITY_RESOLUTION.md`); v1 widens to identity_uuid clustering; v2 federation gossip. 
+ +**Status:** Open. Depends on DEBT-050 v0 in production for ≥ 1 month (so the engine has observation data to merge against) + a calibration corpus that exercises drift / multi-actor scenarios end-to-end. ### ~~DEBT-035 — Artifacts written as the container uid, not the API's~~ ✅ RESOLVED 2026-05-02 **Files:** `decnet/cli/init.py`, `decnet/web/router/transcripts/api_get_transcript.py` (soft-fail kept as defence-in-depth). @@ -717,7 +803,9 @@ user who needs it. | ~~DEBT-032~~ | ✅ | Correlation / Prober | resolved 2026-05-03 | | DEBT-033 | 🟡 Medium | Storage / Session recording | open | | ~~DEBT-035~~ | ✅ | Artifacts / Filesystem perms | resolved 2026-05-02 | -| DEBT-036 | 🟡 Medium | Correlation / Keystroke dynamics | open | +| DEBT-036 | ⚠️ Stale | Correlation / Keystroke dynamics | superseded by DEBT-050 | +| DEBT-050 | 🟡 Medium | BEHAVE-SHELL session-profile ingester | open (replaces DEBT-036) | +| DEBT-051 | 🟡 Medium | Attribution engine / cross-session aggregation | open (depends on DEBT-050) | | DEBT-037 | 🟡 Medium | Integration / Webhooks | open (tracks MVP follow-ups) | | DEBT-038 | 🟡 Medium | Honeypot / SSH cred capture | open (document-only) | | ~~DEBT-039~~ | ✅ | Honeypot / Cred emitters | resolved | @@ -732,5 +820,5 @@ user who needs it. | DEBT-048 | 🟡 Medium | TTP / Intel provider mapping review (recurring) | open / recurring | | DEBT-049 | 🟡 Medium | TTP / Sigma adapter (post-v1) | open | -**Remaining open:** DEBT-011 (Alembic), DEBT-027 (Dynamic bait store), DEBT-028 (deploy endpoint tests), DEBT-033 (transcript shard rotation), DEBT-036 (session-profile ingester), DEBT-037 (webhook delivery hardening), DEBT-038 (SSH PAM cred-capture limitations — document-only), DEBT-045 (EmailLifter heavyweight — partial paid; carved-out follow-ups remain), DEBT-048 (TTP intel provider mapping review — recurring quarterly), DEBT-049 (TTP Sigma adapter — post-v1). 
+**Remaining open:** DEBT-011 (Alembic), DEBT-027 (Dynamic bait store), DEBT-028 (deploy endpoint tests), DEBT-033 (transcript shard rotation), DEBT-037 (webhook delivery hardening), DEBT-038 (SSH PAM cred-capture limitations — document-only), DEBT-045 (EmailLifter heavyweight — partial paid; carved-out follow-ups remain), DEBT-048 (TTP intel provider mapping review — recurring quarterly), DEBT-049 (TTP Sigma adapter — post-v1), DEBT-050 (BEHAVE-SHELL session-profile ingester — replaces DEBT-036), DEBT-051 (attribution engine / cross-session aggregation). DEBT-036 is stale. **Estimated remaining effort:** ~21 hours plus the new EmailLifter / TTP follow-ups. DEBT-030 Phase B (optimistic staged-buffer editor) is a follow-up, not debt.