docs(behave): integration + extractor + attribution design (DEBT-050 / 051)

Three sibling design docs plus DEBT.md updates that supersede the
stale DEBT-036 with a BEHAVE-aligned plan.

development/BEHAVE-INTEGRATION.md — five-phase rollout: storage
(observations table mirroring the BEHAVE Observation envelope plus
one DECNET-side denorm; UniqueConstraint(evidence_ref, primitive)
enforcing idempotency); engine (in decnet/profiler/behave_shell/
sublibrary, no new daemon, not in BEHAVE — DECNET is the engine);
BEHAVE pin; worker wire; UI panel + per-attacker SSE route; live
smoke. Bus payload merges id/ts/v back in to preserve sensor
identifiers across the bus envelope.

development/BEHAVE-EXTRACTOR.md — engine route in eight phases
(A–H). Phase A locks the 6-primitive calibration grid; Phases B–G
expand horizontally; Phase H is the full Tier-A corpus + v0
release. v0 ships every shell-extractable primitive (37 of them);
Tier B is cross-session and lives in the attribution engine; Tier
C is network-domain (toolchain.*) and lives elsewhere.

development/ATTRIBUTION-ENGINE.md — sublibrary inside
decnet/correlation/ that consumes attacker.observation.* events
and emits attribution.profile.* derived state. Five-state machine
(unknown / stable / drifting / conflicted / multi_actor) with per-
ValueKind merge functions. v0 closes DEBT-051; v1 adds the real
clusterer; v2 federation gossip. The bright line forbidding
attribution to natural persons is lifted directly from BEHAVE's
envelope docstring.

development/DEBT.md — DEBT-036 marked STALE; DEBT-050 and
DEBT-051 entries added; summary table + open list updated.
2026-05-03 07:24:19 -04:00
parent 3f080f601d
commit 11f474556c
4 changed files with 2046 additions and 4 deletions


@@ -0,0 +1,572 @@
# Attribution Engine — Design
**Status:** pre-implementation. This doc is the spec; code follows.
**Tracks:** DEBT-051 (cross-session BEHAVE primitive aggregation —
named in `BEHAVE-INTEGRATION.md`).
**Depends on:** `IDENTITY_RESOLUTION.md` (substrate shipped — table,
FK, lifecycle topics), `BEHAVE-INTEGRATION.md` (observation
producer), `DEBT-032` (fingerprint rotation, shipped).
**Engine home:** this repo, `decnet/correlation/attribution/`
(sublibrary inside the existing correlation worker — no new daemon).
## Premise
DECNET has three layers stacked above raw events. After
`BEHAVE-INTEGRATION.md` ships, we have:
| Layer | What it stores | What it knows |
|---|---|---|
| **Observation** | `observations` table, one row per (sid, primitive) | "I saw value V for primitive P, sourced from session S, at time T, with confidence C." |
| **Attacker** | `attackers` table, one row per source IP | "These observations all came from IP X." |
| **Identity** | `attacker_identities` table (empty today — `IDENTITY_RESOLUTION.md`) | "These N attacker rows are the same hands." |
BEHAVE *emits*. Attackers are *observed*. The attribution engine is
the layer that **concludes** — it links observations into identities
and surfaces a per-identity primitive map with explicit merge
semantics. This doc specifies it.
## The bright line — lifted from BEHAVE, binding here
The BEHAVE envelope module docstring
(`core/decnet_behave_core/spec/envelope.py:20-26`) draws an explicit
bright line:
> Explicitly NOT for: identity attribution to named natural persons;
> access or admission decisions; biometric login; ML-driven user
> identification. Those framings push into legal/ethics territory the
> project will not walk into by accident.
That binding statement carries forward. The attribution engine:
- **Links observations to opaque identity UUIDs**, never to named
persons.
- **Emits probabilistic linkage**, never certainty.
- **Does not gate access** to anything — it's an analytics surface.
- **Does not output classifier verdicts** about "good" vs "bad"
operators; it surfaces *behavioural coherence* (these observations
cluster) and *behavioural drift* (this identity's primitives are
changing), and stops there.
Crossing this line is grounds for ripping the engine out and
starting over.
## What the engine IS, what it IS NOT
| IS | IS NOT |
|---|---|
| A clusterer + state machine over BEHAVE observations | A keystroke-dynamics extractor (that's the engine in `BEHAVE-EXTRACTOR.md`) |
| The thing that writes `attacker_identities` rows | The thing that decides whether to block/alert/page on an attacker |
| The producer of `attribution.profile.*` events | The producer of `attacker.observation.*` events |
| Honest about uncertainty (every claim carries a confidence) | A binary classifier with an arbitrary threshold |
| Replayable / deterministic given the same observation sequence | A black-box ML model |
## Architectural placement
```
/home/anti/Tools/DECNET/
├── decnet/correlation/ EXISTING worker — gains a sublibrary + a new trigger
│ ├── worker.py gains attacker.observation.* subscription
│ ├── fingerprint_rotation.py UNCHANGED — already shipped (DEBT-032)
│ └── attribution/ NEW — pure attribution library
│ ├── __init__.py exposes link_observation(), aggregate_identity()
│ ├── linkage.py "which identity does this observation belong to?"
│ ├── aggregate.py per-(identity, primitive) merge state machine
│ ├── _signals/ per-signal scorers (jarm, hassh, kd, c2, ip)
│ └── _thresholds.py named constants, calibration-cited
└── decnet/web/db/models/
    ├── attacker_identities.py      EXISTING (IDENTITY_RESOLUTION.md substrate)
    └── attribution_state.py        NEW — per-(identity, primitive) state rows
```
**No new worker.** The existing `decnet-correlation.service`
supervises this codepath. The correlation worker already owns
cross-attacker reasoning (DEBT-032 fingerprint rotation lives there).
Attribution is a natural peer.
**Audit finding (correlation vs profiler).** Profiler emits
observations per-session (BEHAVE-SHELL extraction). Correlation
consumes observations across sessions and decides identity. Two
roles, two workers, clean cut. **Don't mix them.**
## Two responsibilities, kept separate
The engine has **two axes of work**, often confused:
### Axis 1 — Linkage
> "This new observation arrived. Which identity does it belong to?"
Inputs: one observation (just arrived) + the existing identity table.
Output: one of {`assign-to-existing(uuid)`, `create-new()`,
`defer(reason)`}.
Lives in `attribution/linkage.py`. Reads
`attacker.observation.*` events; writes `attacker_identities` rows
and `attackers.identity_id` FK; emits `identity.formed` /
`identity.observation.linked` (existing topics from
`IDENTITY_RESOLUTION.md`).
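The three linkage outcomes named above can be modelled as a small tagged union. This is a minimal sketch only: the class names `AssignToExisting`, `CreateNew`, `Defer` and the `describe` helper are hypothetical and do not come from the codebase.

```python
from dataclasses import dataclass
from typing import Union
from uuid import UUID, uuid4


@dataclass(frozen=True)
class AssignToExisting:
    identity_uuid: UUID  # opaque identity UUID, never a named person


@dataclass(frozen=True)
class CreateNew:
    pass


@dataclass(frozen=True)
class Defer:
    reason: str  # free-text reason, e.g. "insufficient signal"


# Hypothetical union type for the return value of a linkage function.
LinkageDecision = Union[AssignToExisting, CreateNew, Defer]


def describe(decision: LinkageDecision) -> str:
    """Render a decision in the notation used in this doc."""
    if isinstance(decision, AssignToExisting):
        return f"assign-to-existing({decision.identity_uuid})"
    if isinstance(decision, CreateNew):
        return "create-new()"
    return f"defer({decision.reason})"
```

Keeping `Defer` as an explicit outcome, rather than a `None` return, makes "not enough evidence yet" a first-class result that callers have to handle.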
### Axis 2 — Aggregation
> "Given an identity's full observation history, what's the
> per-primitive summary I should surface to AttackerDetail /
> IdentityDetail?"
Inputs: all observations linked to one identity. Output: a
per-primitive state map: `{primitive: (current_value, state, confidence, dispersion)}`
where `state ∈ {stable, drifting, conflicted, multi_actor, unknown}`.
Lives in `attribution/aggregate.py`. Pure function — given the same
observation set, returns the same state map (replayability is
non-negotiable).
**These two axes are separable.** v0 ships **aggregation only** (over
single-`attacker_uuid` proto-identities), which closes DEBT-051. v1 adds
linkage (real clustering across attacker_uuids). v2 adds federation.
This ordering is deliberate — aggregation has narrower failure modes
and doesn't require the linkage signals to be calibrated yet.
## v0 / v1 / v2 ladder
### v0 — Aggregation over per-attacker proto-identities
The substrate of `IDENTITY_RESOLUTION.md` ships empty: every
`attackers` row has `identity_id = NULL`. No clusterer means no
identity rows. v0 sidesteps this honestly: **treat each
`attacker_uuid` as its own proto-identity** and aggregate
observations over it.
What v0 delivers:
- Per-(attacker_uuid, primitive) merge state machine.
- New `attribution_state` table holding the derived state.
- New `attribution.profile.*` bus topics emitting state transitions.
- AttackerDetail's "current state" panel gains state badges
(`stable / drifting / conflicted`) replacing today's naïve
latest-wins surface from `BEHAVE-INTEGRATION.md` Q3.
What v0 does NOT do:
- No clustering across IPs.
- No identity rows ever populated.
- `IdentityDetail.tsx` (already built per `IDENTITY_RESOLUTION.md`)
stays unreached — there are no identities yet.
**v0 closes DEBT-051.** That's the explicit scope.
### v1 — Linkage (real clustering)
What changes:
- Clusterer subscribes to high-confidence rotation-resistant signals
(HASSH, payload simhashes, keystroke-dynamics simhash,
C2 callbacks) and groups `attacker_uuid`s under
`attacker_identities.uuid`.
- v0's aggregation engine retargets from `attacker_uuid` to
`identity_uuid` once a cluster forms.
- `identity.formed` / `identity.observation.linked` /
`identity.merged` (existing topics) start firing.
- IdentityDetail.tsx starts seeing rows.
What v1 does NOT do:
- No federation. Cluster decisions are master-local.
- No retroactive observation re-linking once an identity is committed
(that's a v1.5 problem; "stable" identities should be hard to
un-link silently).
### v2 — Federation gossip
What changes:
- Identities + their primitive-state maps gossip over the existing
swarm mTLS infra to peer masters.
- `schema_version` field on `attacker_identities`
(`IDENTITY_RESOLUTION.md` Risk #3) becomes load-bearing.
- Trust model is **social**, not cryptographic
(memory rule: federation trust is invite-based/human).
Out of scope for this doc beyond noting it exists. Federation gets
its own design pass.
---
## v0 design — Aggregation state machine
The whole reason DEBT-051 was filed. This is the load-bearing piece.
### State definitions
For each `(attacker_uuid, primitive)` pair, the engine maintains a
state from this set:
| State | Meaning | When to assert |
|---|---|---|
| `unknown` | Insufficient observations to classify | Default; < 3 observations OR all-`unknown` values |
| `stable` | Recent observations agree | Last N observations all share the same value |
| `drifting` | Recent observations disagree with older | Recent N != older N, but recent N is internally consistent |
| `conflicted` | Recent observations disagree with each other | Recent N is split (no majority) |
| `multi_actor` | Strong signal that two operators share access | Conflicted + alternation pattern (operator A → B → A → B), not random flip |
### Per-primitive merge logic
The engine carries a per-`ValueKind` merge function. Categorical
primitives dominate the calibration grid; numeric and hash primitives
need different math:
#### Categorical (`motor.input_modality`, `cognitive.feedback_loop_engagement`, etc.)
Last-N window comparison. With `N = 5` (configurable in
`_thresholds.py`):
```
recent_5 = observations[-5:]
older_5 = observations[-10:-5]   # if available
if all(o.value == recent_5[0].value for o in recent_5):
    if older_5 and all(o.value == older_5[0].value for o in older_5):
        if recent_5[0].value != older_5[0].value:
            state = drifting
        else:
            state = stable
    else:
        state = stable   # consistent with no older comparison
elif majority_value(recent_5):
    state = stable   # tolerant — one outlier in five is fine
else:
    state = conflicted
```
`multi_actor` triggers on conflicted + temporal alternation
(operator A and operator B observations interleave at session-level granularity,
not just within one session). Lower-confidence detection;
v0 emits at confidence ≤ 0.6 by design.
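The last-N comparison can be sketched as a runnable pure function. The version below is an illustration under stated assumptions, not the shipped implementation: it works on a plain time-ordered list of hashable values, treats "majority" as a strict majority, applies the `< 3 observations` rule from the state table, and leaves out `multi_actor` detection and the all-`unknown` case. The names `aggregate_categorical`, `N`, and `MIN_OBSERVATIONS` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

N = 5                 # window size; the doc keeps this in _thresholds.py
MIN_OBSERVATIONS = 3  # below this the state table says `unknown`


def majority_value(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value held by a strict majority, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else None


def aggregate_categorical(
    values: Sequence[Hashable],
) -> Tuple[Optional[Hashable], str]:
    """Map a time-ordered list of observed values to (current_value, state)."""
    if len(values) < MIN_OBSERVATIONS:
        return None, "unknown"
    recent = values[-N:]
    older = values[-2 * N:-N]  # empty when there is no older window
    if len(set(recent)) == 1:
        # Recent window is unanimous; it only counts as drift when a
        # unanimous older window disagrees with it.
        if older and len(set(older)) == 1 and older[0] != recent[0]:
            return recent[0], "drifting"
        return recent[0], "stable"
    winner = majority_value(recent)
    if winner is not None:
        return winner, "stable"  # tolerant of outliers, as in the pseudocode
    return None, "conflicted"
```

Because the function takes only the observation values and returns a tuple, replaying the same observation sequence always yields the same state, which is the replayability property the design asks for.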
#### Numeric (`toolchain.c2.beacon_interval_ms`, etc.)
EWMA + dispersion. State = `stable` if dispersion < 20% of mean,
`drifting` if mean shifts > 30% over recent window, `conflicted`
if dispersion > 100%.
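One way to read the numeric rule is sketched below. Several choices are assumptions, not taken from the design: dispersion is measured as the coefficient of variation of the recent window, the mean shift compares the recent window against all older values, the smoothing factor is 0.3, and the band the text leaves unspecified (dispersion between 20% and 100% with no drift) falls back to `unknown`.

```python
import statistics
from typing import Optional, Sequence, Tuple

WINDOW = 5              # assumed: same window as the categorical path
EWMA_ALPHA = 0.3        # assumed smoothing factor
STABLE_MAX_CV = 0.20    # dispersion < 20% of mean
DRIFT_MIN_SHIFT = 0.30  # mean shift > 30%
CONFLICT_MIN_CV = 1.00  # dispersion > 100%


def ewma(values: Sequence[float], alpha: float = EWMA_ALPHA) -> float:
    """Exponentially weighted moving average, oldest value first."""
    acc = values[0]
    for v in values[1:]:
        acc = alpha * v + (1 - alpha) * acc
    return acc


def aggregate_numeric(values: Sequence[float]) -> Tuple[Optional[float], str]:
    """Map a time-ordered list of numbers to (current_value, state)."""
    if len(values) < 3:
        return None, "unknown"
    recent, older = values[-WINDOW:], values[:-WINDOW]
    mean_recent = statistics.fmean(recent)
    if mean_recent == 0:
        return None, "unknown"  # relative dispersion is undefined
    cv = statistics.pstdev(recent) / abs(mean_recent)
    current = ewma(values)
    if cv > CONFLICT_MIN_CV:
        return current, "conflicted"
    if older:
        baseline = statistics.fmean(older)
        if baseline and abs(mean_recent - baseline) / abs(baseline) > DRIFT_MIN_SHIFT:
            return current, "drifting"
    if cv < STABLE_MAX_CV:
        return current, "stable"
    return current, "unknown"  # assumed fallback for the unspecified band
```

The check order matters: a large mean shift is tested before the stability check so that a tight new cluster at a different level reports `drifting` rather than `stable`.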
#### Hash (`toolchain.tls.jarm_server`, `toolchain.ssh.hassh_client`)
Already handled by DEBT-032 fingerprint rotation. Attribution engine
*reads* `attacker.fingerprint_rotated` events, doesn't recompute.
State = `stable` if no rotation, `drifting` if 1-2 rotations,
`conflicted` if > 2 rotations in a tight window.
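The hash rule reduces to counting rotations. A minimal sketch, assuming the engine receives the timestamps of past rotation events for one primitive; the function name `hash_state` and the 24-hour "tight window" are hypothetical, since the text does not fix the window length.

```python
from typing import Sequence

ROTATION_WINDOW_S = 24 * 3600.0  # assumed "tight window"; not specified in the text


def hash_state(
    rotation_ts: Sequence[float],
    now: float,
    window_s: float = ROTATION_WINDOW_S,
) -> str:
    """Classify a hash primitive from the timestamps of its observed rotations."""
    if not rotation_ts:
        return "stable"
    in_window = sum(1 for t in rotation_ts if 0 <= now - t <= window_s)
    if in_window > 2:
        return "conflicted"
    # Any other non-empty history is treated as drift in this sketch,
    # including more than two rotations spread over a long period.
    return "drifting"
```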
### Storage — the `attribution_state` table
Materialised view of the state machine. Re-derivable from
`observations` + DEBT-032's rotation log; this table is a cache for
cheap reads, not a source of truth.
```python
# decnet/web/db/models/attribution_state.py
from typing import Any
from uuid import UUID

from sqlalchemy import JSON, Column, Index
from sqlmodel import Field, SQLModel


class AttributionStateRow(SQLModel, table=True):
    __tablename__ = "attribution_state"
    # ── key ────────────────────────────────────────────────
    attacker_uuid: UUID = Field(foreign_key="attackers.uuid", primary_key=True)
    primitive: str = Field(primary_key=True)
    # ── derived state ──────────────────────────────────────
    current_value: dict[str, Any] | str | int | float | bool | list = \
        Field(sa_column=Column(JSON, nullable=False))
    state: str                  # 'stable' | 'drifting' | 'conflicted' | 'multi_actor' | 'unknown'
    confidence: float           # engine's confidence in the state assertion (not in any verdict)
    observation_count: int      # how many observations underlie this state
    last_change_ts: float       # when state last flipped
    last_observation_ts: float  # most recent observation that fed this row
    # ── audit ──────────────────────────────────────────────
    schema_version: int = 1     # for federation, mirrors AttackerIdentity convention
    updated_at: float
    __table_args__ = (
        Index("ix_attribution_state_state", "state"),
        Index("ix_attribution_state_last_change", "last_change_ts"),
    )
```
`(attacker_uuid, primitive)` is the composite PK — at most one state
row per pair. v1 will rename `attacker_uuid` to a polymorphic
`subject_uuid` keyed on either attackers or identities (deferred —
don't pre-build the polymorphism before clustering ships).
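The "at most one row per pair" property is what makes an upsert the natural write path. The sketch below demonstrates it with the standard-library `sqlite3` module and a reduced column set; it is an illustration only, since the real table is a SQLModel class with more columns, and the `upsert_state` helper is hypothetical. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3
import time

# Reduced, illustrative schema: only the composite key and a few derived columns.
DDL = """
CREATE TABLE attribution_state (
    attacker_uuid TEXT NOT NULL,
    primitive     TEXT NOT NULL,
    state         TEXT NOT NULL,
    confidence    REAL NOT NULL,
    updated_at    REAL NOT NULL,
    PRIMARY KEY (attacker_uuid, primitive)
)
"""

UPSERT = """
INSERT INTO attribution_state (attacker_uuid, primitive, state, confidence, updated_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (attacker_uuid, primitive) DO UPDATE SET
    state = excluded.state,
    confidence = excluded.confidence,
    updated_at = excluded.updated_at
"""


def upsert_state(conn, attacker_uuid, primitive, state, confidence):
    """Insert the row, or overwrite the existing row for the same pair."""
    conn.execute(UPSERT, (attacker_uuid, primitive, state, confidence, time.time()))


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
upsert_state(conn, "a-1", "motor.input_modality", "unknown", 0.2)
upsert_state(conn, "a-1", "motor.input_modality", "stable", 0.9)
rows = conn.execute("SELECT state, confidence FROM attribution_state").fetchall()
# Two writes for the same pair leave a single row holding the latest state.
```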
### Bus topics
New, distinct from `IDENTITY_RESOLUTION.md`'s `identity.*` lifecycle
topics:
| Topic | Payload | When |
|---|---|---|
| `attribution.profile.state_changed` | `{attacker_uuid, primitive, old_state, new_state, current_value, confidence, ts}` | State transitions (e.g. `stable` → `drifting`) |
| `attribution.profile.multi_actor_suspected` | `{attacker_uuid, primitives: [], evidence_summary, confidence, ts}` | When ≥ 2 primitives independently signal `multi_actor`; correlation is the trigger, not any single primitive |
`identity.*` topics from `IDENTITY_RESOLUTION.md` stay reserved for
v1 (clusterer-emitted lifecycle events). v0 doesn't touch them.
**Wiki:** `Service-Bus.md` documents these in the same commit that
adds the constants (`feedback_wiki_bus_signals`).
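The `state_changed` payload from the topic table can be built by a small helper that refuses to emit when nothing changed. This is a sketch: the function name `state_changed_payload` and the constant name are hypothetical, and how the payload is published on the bus is left out.

```python
import time
from typing import Any, Optional

TOPIC_STATE_CHANGED = "attribution.profile.state_changed"


def state_changed_payload(
    attacker_uuid: str,
    primitive: str,
    old_state: str,
    new_state: str,
    current_value: Any,
    confidence: float,
    ts: Optional[float] = None,
) -> Optional[dict]:
    """Build the event payload, or return None when the state did not change."""
    if old_state == new_state:
        return None  # only transitions are events
    return {
        "attacker_uuid": attacker_uuid,
        "primitive": primitive,
        "old_state": old_state,
        "new_state": new_state,
        "current_value": current_value,
        "confidence": confidence,
        "ts": time.time() if ts is None else ts,
    }
```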
### API surface
```
GET /api/v1/attackers/{uuid}/attribution
→ {
    "primitives": [
      {
        "primitive": "motor.input_modality",
        "current_value": "pasted",
        "state": "stable",
        "confidence": 0.91,
        "observation_count": 7,
        "last_change_ts": 1714521660.456
      },
      ...
    ]
  }
```
AttackerDetail.tsx merges this with the latest-per-primitive query
from `BEHAVE-INTEGRATION.md`. The state badge is the new bit.
The SSE route from `BEHAVE-INTEGRATION.md`
(`GET /api/v1/attackers/{uuid}/events`) gains forwarded
`attribution.profile.state_changed` events so the badge updates live.
---
## Linkage signals (v1 — not v0)
For when v0 is stable and we promote attacker_uuid → identity_uuid.
Documented here so v0 doesn't paint into a corner.
### Signal weights
Each signal contributes to a linkage score. Two `attacker_uuid`s
with combined score above the threshold get clustered.
| Signal | Strength | Why | Cost |
|---|---|---|---|
| Same `kd_digraph_simhash` (Hamming distance < 8) | **STRONG** | Keystroke rhythm is hard to fake without effort | Computed at session-end by BEHAVE engine |
| Same C2 callback endpoint | **STRONG** | Operator infra is sticky | Already extracted |
| Same `hassh_client` | MEDIUM | Tools change less than IPs | Already in `attacker_behavior` |
| Same `jarm_server` (if attacker exposes services) | MEDIUM | Probed-attacker substrate (DEBT-032) | Already shipped |
| Same `tcp_fingerprint` cluster | WEAK | OS info, easily collided | Already in `attacker_behavior` |
| Same source IP | **REJECT** | Triggers naïvely on NAT collisions; never use IP alone | n/a |
### Threshold
Single combined score, calibrated against:
- **False merges**: two distinct attackers collapsed into one (silent
miscount). HARD failure — engine refuses to merge below ~0.85.
- **Missed merges**: two observations from the same operator left
unrelated. Soft failure — operator can review unmerged candidates
in IdentityDetail's "candidate links" panel and merge manually.
The threshold lives in `_thresholds.py` like the BEHAVE-SHELL
engine's; calibration cycle ships with the linkage code.
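To make the scoring idea concrete, here is a sketch of a combined score with a merge threshold. Nearly everything numeric is an assumption: the table only ranks signals as STRONG, MEDIUM, or WEAK, so the weights below are hypothetical, and noisy-OR is just one plausible way to combine them. Only the `~0.85` floor, the Hamming distance bound of 8, and the rule that source IP never contributes come from the text.

```python
from math import prod

# Hypothetical numeric weights; the table gives ranks, not numbers.
WEIGHTS = {
    "kd_digraph_simhash": 0.6,  # STRONG
    "c2_endpoint": 0.6,         # STRONG
    "hassh_client": 0.3,        # MEDIUM
    "jarm_server": 0.3,         # MEDIUM
    "tcp_fingerprint": 0.1,     # WEAK
}
MERGE_THRESHOLD = 0.85   # "refuses to merge below ~0.85"
SIMHASH_MAX_HAMMING = 8  # a match requires Hamming distance < 8


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer simhashes."""
    return bin(a ^ b).count("1")


def simhash_matches(a: int, b: int) -> bool:
    return hamming(a, b) < SIMHASH_MAX_HAMMING


def linkage_score(matched_signals: set) -> float:
    """Noisy-OR over matched signals; unlisted names such as source IP add nothing."""
    return 1.0 - prod(1.0 - WEIGHTS.get(name, 0.0) for name in matched_signals)


def should_merge(matched_signals: set) -> bool:
    return linkage_score(matched_signals) >= MERGE_THRESHOLD
```

With these illustrative weights, two STRONG signals alone score 0.84 and stay just under the floor, so a merge needs at least one more corroborating signal. Whether that is the right trade-off is exactly what the calibration cycle would decide.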
### Soft-merge audit trail
`attacker_identities.merged_into_uuid` already exists from
`IDENTITY_RESOLUTION.md`. v1 uses it. When the clusterer reverses an
earlier merge (rare but real), the loser row's `merged_into_uuid` is
NULLed and an `attribution.profile.split_proposed` event surfaces in
the operator's review queue.
---
## Phase plan
Per the "commit per task" + "tests per task" memory rules. Each
phase is one commit.
### Phase 1 — Schema + topics + empty handler
- New `attribution_state` SQLModel + migration (none needed pre-v1,
per the memory rule — just edit the model).
- `decnet/bus/topics.py` registers `attribution.profile.*` prefix.
- `decnet/correlation/worker.py` gains an
`attacker.observation.*` subscription handler that does
**nothing yet** — just logs. Proves the wiring.
- Wiki `Service-Bus.md` update co-commits.
- Tests: SQLModel CRUD on `attribution_state`, bus subscription
handler is exercised by FakeBus.
Commit: `feat(correlation/attribution): substrate + idle handler`.
### Phase 2 — Categorical merge function
- `attribution/aggregate.py:_aggregate_categorical(observations) → (value, state, confidence)`.
- Implements the last-N comparison logic above.
- Pure function. Synthetic-input tests covering each state transition
(unknown → stable → drifting → stable, conflicted, multi_actor).
- No DB, no bus, no I/O.
Commit: `feat(correlation/attribution): categorical merge state machine`.
### Phase 3 — Hash + numeric merge functions
- `_aggregate_hash` reads `attacker_fingerprint_rotation` events
(DEBT-032 already produces them).
- `_aggregate_numeric` does EWMA + dispersion.
- Per-`ValueKind` dispatcher in `aggregate.py` picks the right
function.
- Tests for each value-kind path.
Commit: `feat(correlation/attribution): hash + numeric merge functions`.
### Phase 4 — Wire into the worker
- Subscription handler reads each `attacker.observation.*` event,
loads the prior `AttributionStateRow` (if any), runs the merger,
upserts the new state, emits `attribution.profile.state_changed`
on transition.
- Trigger isolation: handler exceptions logged, do not affect
fingerprint-rotation or any other correlator path.
- Tests: end-to-end with FakeBus + in-memory DB, observation-in →
state-row-out + transition-event-out.
Commit: `feat(correlation/attribution): wire bus handler, persist state`.
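Trigger isolation in Phase 4 can be sketched as a wrapper around the handler. This assumes a synchronous handler that takes one event dict; the real worker's handler signature is not shown in this doc, and the logger name and `isolated` helper are hypothetical.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("decnet.correlation.attribution")  # hypothetical logger name


def isolated(handler: Callable[[dict], Any]) -> Callable[[dict], None]:
    """Wrap a bus handler so its exceptions are logged and never propagate."""
    def wrapper(event: dict) -> None:
        try:
            handler(event)
        except Exception:
            # Log with traceback, then swallow: other correlator paths keep running.
            log.exception("attribution handler failed for event %r", event)
    return wrapper


def broken_handler(event: dict) -> None:
    raise ValueError("bad observation")


safe_handler = isolated(broken_handler)
safe_handler({"primitive": "motor.input_modality"})  # logs the error, does not raise
```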
### Phase 5 — `multi_actor_suspected` cross-primitive correlator
- Periodic tick (every 60s default — configurable) walks
`attribution_state` rows where `state = 'multi_actor'`, groups by
`attacker_uuid`, fires
`attribution.profile.multi_actor_suspected` if ≥ 2 primitives flag
the same attacker_uuid concurrently.
- Tests: synthetic state rows, assert event fires only on co-flag.
Commit: `feat(correlation/attribution): cross-primitive multi-actor detection`.
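The co-flag rule in Phase 5 is a group-by over state rows. A minimal sketch, assuming rows arrive as `(attacker_uuid, primitive, state)` tuples; the function name `multi_actor_candidates` is hypothetical, and the periodic scheduling and event emission are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_CO_FLAGGED = 2  # at least two primitives must flag the same attacker_uuid


def multi_actor_candidates(
    rows: Iterable[Tuple[str, str, str]],
) -> Dict[str, List[str]]:
    """Return {attacker_uuid: [primitives]} for attackers co-flagged as multi_actor."""
    flagged: Dict[str, List[str]] = defaultdict(list)
    for attacker_uuid, primitive, state in rows:
        if state == "multi_actor":
            flagged[attacker_uuid].append(primitive)
    return {
        attacker: sorted(primitives)
        for attacker, primitives in flagged.items()
        if len(primitives) >= MIN_CO_FLAGGED
    }
```

Each key in the result corresponds to one `attribution.profile.multi_actor_suspected` event, with the list filling the `primitives` field of the payload.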
### Phase 6 — API surface
- `GET /api/v1/attackers/{uuid}/attribution` route + Pydantic model.
- AttackerDetail.tsx renders state badges per primitive in the
Behavioural Primitives panel.
- SSE route forwarding `attribution.profile.state_changed` events
filtered by attacker_uuid.
- Frontend Vitest coverage.
Commit: `feat(web): expose attribution state on AttackerDetail`.
### Phase 7 — v0 lockdown
- Synthetic calibration scenarios (extending the BEHAVE-SHELL
calibration grid concept):
  - "Stable HUMAN over 7 sessions" → all primitives `stable`
  - "HUMAN switches to LLM mid-week" → primitives flip
    `stable` → `drifting`
  - "Two operators alternating on shared creds" → ≥ 2 primitives
    flag `multi_actor`
  - "Single short session" → all primitives `unknown`
- All four scenarios green in CI.
Commit: `test(correlation/attribution): v0 calibration lockdown`.
---
## Out of scope
Filed for future paydown when they bite. Do not let them creep into
v0.
- **Linkage / clustering across attacker_uuids.** That's v1.
- **Federation gossip of identities.** That's v2.
- **Identity-level intel** (`attacker_identity_intel` from
`IDENTITY_RESOLUTION.md`). Different lifecycle, ships with v1.
- **Manual operator merge UI.** Operators can't fix clusterer
mistakes from the dashboard — the read-only API stays read-only
in v0. Editable identity rows are a v1 concern.
- **Retroactive re-aggregation** when thresholds change. v0
recomputes lazily on next observation per attacker; no batch
re-walk.
- **Confidence calibration against ground truth.** No ground-truth
data exists yet. v0 confidence values are heuristic; calibration
ships when red-team exercises produce labelled trace data.
- **Persona-classification** (e.g. "this identity behaves like a
bot"). The bright line forbids this. State machine emits
*coherence* and *drift*, not classifier labels.
## Resolved decisions
- **Where the engine lives.** RESOLVED:
`decnet/correlation/attribution/`, sublibrary inside the existing
correlation worker. No new daemon. Symmetric with BEHAVE-SHELL's
placement under `decnet/profiler/behave_shell/`.
- **Linkage vs aggregation separation.** RESOLVED: two axes, two
modules (`linkage.py` / `aggregate.py`). v0 ships aggregation
only.
- **Topic namespace.** RESOLVED: `attribution.profile.*` for
derived state, distinct from `IDENTITY_RESOLUTION.md`'s
`identity.*` lifecycle topics. The two namespaces compose; they
don't overlap.
- **State machine vocabulary.** RESOLVED:
`unknown / stable / drifting / conflicted / multi_actor`.
Five states, no more (resist the urge to grow the enum).
- **Subject of attribution in v0.** RESOLVED: `attacker_uuid`,
not `identity_uuid`. v1 widens.
## Real open questions
These are not stoppers for v0 but need answers before the engine
ships beyond v0.
1. **`multi_actor` false-positive cost.** A flapping primitive can
look like multi-actor when it's really an operator on a flaky
network or split between phone/laptop. v0's confidence ≤ 0.6 cap
helps but doesn't eliminate it. Open: what's the operator-facing
UX for a `multi_actor` claim that's wrong?
2. **Window size `N`.** v0 hardcodes `N=5` for last-N comparison.
This is calibrated against typical session counts (most attackers
are observed < 10 times before they go quiet). Operators with
long-running attackers (resident threats) may want a wider
window; needs config knob in v1.
3. **Primitive-weight asymmetry.** Today every primitive contributes
equally to the implicit "is this attacker behavioural-stable?"
summary. But `motor.input_modality` is far more discriminative
than `cultural.weekend_cadence`. Open: do we expose primitive
weights in the API, or just sort by confidence?
4. **Observation-to-row contention.** A burst of observations for
the same `(attacker_uuid, primitive)` pair (e.g. a long session
with 50 sub-observations) hits the same row 50 times. v0 reads
the row, runs the merger, writes back — under load this is a
serialised hot path. Open: should the merger batch-process within
one tick, or is per-observation latency cheap enough?
5. **What happens to `attribution_state` rows when an
`attacker_uuid` is deleted?** No `attackers` deletion path
exists today, but if/when one ships (GDPR purge, federation
resync), `ON DELETE CASCADE` is the obvious choice. File when it
matters.
---
## Implementation order checklist
A single page you can paste into a TODO and tick off:
- [ ] Phase 1 — Schema + topics + idle handler
- [ ] Phase 2 — Categorical merge function (pure, no I/O)
- [ ] Phase 3 — Hash + numeric merge functions
- [ ] Phase 4 — Wire bus handler, persist state
- [ ] Phase 5 — `multi_actor_suspected` cross-primitive correlator
- [ ] Phase 6 — API + AttackerDetail badges + SSE forwarding
- [ ] Phase 7 — v0 calibration scenarios lockdown
Seven commits, seven test sets. v0 closes DEBT-051 and gives
operators an honest "is this attacker behaviourally stable, drifting,
or showing multiple operators?" surface — without crossing the
attribution-of-natural-persons bright line.
After v0, v1 (linkage / clustering) is gated on:
- v0 stable in production for ≥ 1 month
- ≥ 1 high-discrimination linkage signal calibrated
(keystroke-dynamics simhash from BEHAVE-SHELL is the obvious
candidate; v1 of the BEHAVE engine adds it post-step-10)
---
**Owner:** ANTI.
**Implementation gate:** this doc reviewed → Phase 1 starts after
`BEHAVE-INTEGRATION.md` v0 is live (observation table populated +
worker emitting `attacker.observation.*` events).


@@ -0,0 +1,702 @@
# BEHAVE-SHELL Extraction Engine — Implementation Route
**Status:** pre-implementation. Sibling to `BEHAVE-INTEGRATION.md`.
**Scope:** the inside of `decnet/profiler/behave_shell/`. Nothing else.
**Acceptance gate:** the five-class calibration grid in
`BEHAVE-INTEGRATION.md` §"Calibration grid IS the regression test."
This doc is the **construction manual** for the engine. The
integration doc says *what* the engine plugs into; this doc says
*how to build it from zero to v0 in a deterministic sequence*.
---
## Mission
Take an asciinema-style PTY event stream for one session, return an
`Iterable[Observation]` of BEHAVE-SHELL primitives. Pure library:
no I/O, no bus, no DB. Worker owns those.
```python
def extract_session(
    events: Iterable[AsciinemaEvent],   # [t_float, kind: 'i'|'o', data: str]
    *,
    sid: str,
    source: str = "decnet/profiler/behave_shell/extract.py",
) -> Iterable[Observation]:
    ...
`AsciinemaEvent` is a 3-tuple `(t, kind, data)` matching the on-disk
shard line format. No fancy class — a tuple is honest about what it is.
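Reading shard lines into these tuples might look like the sketch below. It assumes each event line is a JSON array `[t, kind, data]`, as in the asciicast v2 format; the actual shard format is not shown in this doc, and `parse_shard_lines` is a hypothetical helper that would live on the worker side, since the engine itself does no I/O.

```python
import json
from typing import Iterable, Iterator, Tuple

AsciinemaEvent = Tuple[float, str, str]  # (t, kind, data)


def parse_shard_lines(lines: Iterable[str]) -> Iterator[AsciinemaEvent]:
    """Yield (t, kind, data) tuples, skipping blanks and non-'i'/'o' records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, list) or len(record) != 3:
            continue  # e.g. a header object rather than an event
        t, kind, data = record
        if kind in ("i", "o"):
            yield float(t), kind, str(data)
```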
## Single-pass discipline
A naïve engine re-walks the event stream once per primitive, paying
O(n × primitives) for nothing. We don't do that.
Single pass over events builds a `SessionContext` — a precomputed
bundle of indexes that every feature module reads from. Cheap; one
walk; reproducible.
```python
@dataclass(frozen=True, slots=True)
class SessionContext:
    sid: str
    source: str
    evidence_ref: str
    t_start: float
    t_end: float
    duration_s: float
    # Raw event slices (already filtered by kind)
    input_events: tuple[InputEvent, ...]    # ('i', t, data)
    output_events: tuple[OutputEvent, ...]  # ('o', t, data)
    # Derived once, used everywhere
    iats: tuple[float, ...]                 # IATs between input events
    paste_bursts: tuple[PasteBurst, ...]    # detected paste regions
    commands: tuple[Command, ...]           # split on \r / \n
    inter_cmd_iats: tuple[float, ...]       # IATs between command boundaries
    output_per_cmd: tuple[int, ...]         # output bytes between cmd_i and cmd_{i+1}
All feature modules take `ctx: SessionContext` and yield 0 or more
Observations. Single source of truth, single parse cost.
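A single walk that derives several of these indexes at once could look like the sketch below. It is deliberately reduced: it returns plain tuples instead of a `SessionContext`, represents a command as its raw text, and ignores backspace editing and paste detection. The function name `derive_indexes` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[float, str, str]  # (t, kind, data)


def derive_indexes(events: Iterable[Event]):
    """One walk over the stream: input IATs, command texts, inter-command IATs."""
    iats: List[float] = []
    commands: List[str] = []
    cmd_times: List[float] = []
    prev_input_t: Optional[float] = None
    buf: List[str] = []
    for t, kind, data in events:
        if kind != "i":
            continue  # output events are indexed elsewhere
        if prev_input_t is not None:
            iats.append(t - prev_input_t)
        prev_input_t = t
        for ch in data:
            if ch in "\r\n":
                if buf:  # skip empty lines and the \n of a \r\n pair
                    commands.append("".join(buf))
                    cmd_times.append(t)
                    buf = []
            else:
                buf.append(ch)
    inter_cmd_iats = [b - a for a, b in zip(cmd_times, cmd_times[1:])]
    return tuple(iats), tuple(commands), tuple(inter_cmd_iats)
```

Every feature that needs timing or command boundaries then reads from these precomputed tuples instead of walking the event stream again.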
## Engine layout
```
decnet/profiler/behave_shell/
├── __init__.py re-exports extract_session
├── extract.py extract_session() + SessionContext build
├── _parse.py asciinema event types + parsing helpers
├── _ctx.py SessionContext dataclass + builders
├── _thresholds.py all numeric thresholds, one place, named constants
└── _features/
    ├── __init__.py   FEATURES tuple — registered list of feature funcs
    ├── motor.py
    ├── cognitive.py
    └── temporal.py   (later)
```
`extract.py` is short:
```python
def extract_session(events, *, sid, source="..."):
    ctx = build_session_context(events, sid=sid, source=source)
    for feature_fn in FEATURES:
        yield from feature_fn(ctx)
```
That's the whole orchestration. Adding a primitive = adding a function
to `_features/<family>.py` and registering it in `FEATURES`.
## Threshold table convention
Every numeric threshold lives in `_thresholds.py` as a named constant
with a docstring citing the registry's `notes:` field. **Never inline
magic numbers in feature code.** When calibration drifts, you change
one file.
```python
# decnet/profiler/behave_shell/_thresholds.py
"""Numeric thresholds for BEHAVE-SHELL primitive classification.
Each constant cites its calibration source. When the registry's
`notes:` field disagrees with a constant here, the registry is
authoritative — fix the constant, re-run the grid.
"""
# motor.paste_burst_rate buckets — events per minute of session
PASTE_RATE_OCCASIONAL_MIN = 0.5 # at least one paste every two minutes
PASTE_RATE_HABITUAL_MIN = 3.0 # paste-driven workflow
# cognitive.inter_command_latency_class — seconds (median IAT between commands)
ICL_TYPING_SPEED_MAX = 2.0
ICL_DELIBERATE_MAX = 8.0
ICL_LLM_LIGHTWEIGHT_MAX = 8.0 # 2-8s band; lower bound = ICL_TYPING_SPEED_MAX
ICL_LLM_HEAVYWEIGHT_MAX = 30.0 # 8-30s band — registry primitives.py:140-149
# > 30s = "long"
```
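Feature code would then compare against the named constants rather than literals. A sketch for the paste-rate buckets follows; the bucket labels `rare`, `occasional`, and `habitual` are illustrative guesses based on the constant names, not confirmed registry values, and `paste_rate_bucket` is a hypothetical function.

```python
# Constants repeated here so the sketch is self-contained; in the engine
# they would be imported from _thresholds.py.
PASTE_RATE_OCCASIONAL_MIN = 0.5  # paste bursts per minute
PASTE_RATE_HABITUAL_MIN = 3.0


def paste_rate_bucket(paste_bursts: int, duration_s: float) -> str:
    """Bucket paste bursts per minute of session; labels are illustrative."""
    if duration_s <= 0:
        return "unknown"  # no usable session length
    per_minute = paste_bursts / (duration_s / 60.0)
    if per_minute >= PASTE_RATE_HABITUAL_MIN:
        return "habitual"
    if per_minute >= PASTE_RATE_OCCASIONAL_MIN:
        return "occasional"
    return "rare"
```

If calibration moves a boundary, only the constant changes and the function body stays untouched.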
## Full registry scope — what the engine owns, what it doesn't
Before the route: a sober count. The BEHAVE-SHELL registry today
contains roughly **53 primitives** across 8 top-level domains. Not
all of them are extractable from a single PTY session; some need
observation history; some belong to a different sensor entirely.
Three tiers:
### Tier A — Per-session shell-extractable (37 primitives)
Computable from one `(decky, service, sid)` shard. The extractor
owns these end-to-end.
| Domain | Primitive | Source signal |
|---|---|---|
| motor | `motor.input_modality` | paste-burst detector |
| motor | `motor.paste_burst_rate` | paste-burst counter |
| motor | `motor.keystroke_cadence` | IAT histogram shape |
| motor | `motor.motor_stability` | IAT outlier rate |
| motor | `motor.error_correction` | backspace-relative-to-error timing |
| motor | `motor.command_chunking` | intra-command IAT variance |
| motor | `motor.shell_mastery.tab_completion` | `\t` rate per command |
| motor | `motor.shell_mastery.shortcut_usage` | ^A/^E/^W/^U/^R/^B/^F rate |
| motor | `motor.shell_mastery.pipe_chaining_depth` | `\|` count per command |
| cognitive | `cognitive.inter_command_latency_class` | median inter-command IAT bucketed |
| cognitive | `cognitive.inter_command_consistency` | CV of inter-command IATs |
| cognitive | `cognitive.command_branch_diversity` | unique-first-token / total-commands |
| cognitive | `cognitive.feedback_loop_engagement` | Pearson r(output_bytes, next_pause) |
| cognitive | `cognitive.cognitive_load` | composite (IAT entropy + error rate + chunking) |
| cognitive | `cognitive.exploration_style` | command-graph branching shape |
| cognitive | `cognitive.planning_depth` | think-pause-length distribution |
| cognitive | `cognitive.tool_vocabulary` | distinct first-tokens normalised |
| cognitive | `cognitive.error_resilience.retry_tactic` | post-error command relation |
| cognitive | `cognitive.error_resilience.frustration_typing` | error-vs-success keystroke speed delta |
| cognitive | `cognitive.error_resilience.fallback_to_man` | `man`/`--help` invocation post-error |
| temporal | `temporal.session_duration` | `duration_s` bucketed |
| temporal | `temporal.escalation_pattern` | command-rate over rolling windows |
| temporal | `temporal.lifecycle_markers.landing_ritual` | first-N-commands signature |
| temporal | `temporal.lifecycle_markers.exit_behavior` | last-command + exit-code analysis |
| operational | `operational.objective` | command-intent classifier (recon / exfil / persistence / lateral / destructive) |
| operational | `operational.opsec_discipline` | history-clearing, log-tampering, .bash_history rm |
| operational | `operational.cleanup_behavior` | exit-time cleanup commands |
| operational | `operational.multi_actor_indicators` | mid-session pace/style shift detection |
| environmental | `environmental.shell_type` | prompt-string sniff from `'o'` events |
| environmental | `environmental.terminal_multiplexer` | tmux/screen escape sequences |
| environmental | `environmental.keyboard_layout` | bigram-frequency layout fingerprint |
| environmental | `environmental.locale` | `LANG`/`LC_*` envvar dump if `env` runs; output language sniff |
| environmental | `environmental.numpad_usage` | numeric input arrival pattern (weak) |
| emotional_valence | `emotional_valence.valence` | obscenity / praise / neutral lexicon |
| emotional_valence | `emotional_valence.arousal` | typing-speed delta + capslock + repeated bangs |
| emotional_valence | `emotional_valence.stress_response` | post-error speed-up vs slow-down |
| emotional_valence | `emotional_valence.frustration_venting` | `fuck`/`shit`/etc. detection (registry value is binary) |
The emotional_valence primitives are SOFT and will produce false
positives. Documented as such; emit at confidence ≤ 0.5 per the
confidence convention.
### Tier B — Cross-session (computed by attribution engine, not extractor)
8 primitives that **cannot honestly be computed from one session**.
The extractor does not emit these. The attribution engine
(`ATTRIBUTION-ENGINE.md`) computes them during aggregation, reading
the per-attacker observation history. Cross-reference: a TODO in
`ATTRIBUTION-ENGINE.md` notes that aggregation may include
*derivation*, not just *merging*.
| Domain | Primitive | Why cross-session |
|---|---|---|
| temporal | `temporal.session_timing` | diurnal/nocturnal/irregular requires multiple sessions |
| temporal | `temporal.persistence` | hit_and_run/return_visitor/resident is intrinsically multi-session |
| temporal | `temporal.lifecycle_markers.idle_periodicity` | periodicity needs a long enough sample |
| cultural | `cultural.meal_break_gaps` | gap pattern over days |
| cultural | `cultural.periodic_micro_pauses` | needs many sessions to find regular intervals |
| cultural | `cultural.dst_behavior` | needs sessions spanning a DST transition |
| cultural | `cultural.weekend_cadence` | needs a week+ of sessions |
| cultural | `cultural.holiday_gaps` | needs ≥ a year for honest claim |
If you find yourself implementing one of these in the extractor,
**stop**. It's an attribution-engine concern.
### Tier C — Network domain (out of scope for this engine entirely)
The full `toolchain.*` subtree —
TLS / transport / SSH / HTTP / C2 / protocol_abuse / payload
fingerprints. Roughly 25 primitives. These come from the sniffer /
prober / correlation pipeline, not from PTY session extraction.
Two paths to populate them, both NOT this doc:
1. **Wrap existing DECNET workers** (sniffer, prober, correlation,
intel) to emit `attacker.observation.toolchain.*` from their
existing outputs. Pragmatic, ships sooner. Filed as a future
"wire existing producers to BEHAVE" track (mentioned in
`BEHAVE-INTEGRATION.md` Out of Scope, around the
`toolchain.c2.beacon_*` overlap with profiler's existing
`behavioral.py`).
2. **Future BEHAVE-NETWORK extractor** parallel to BEHAVE-SHELL,
eating PCAP / netflow / TLS-handshake records. Cleaner long-term
architecture; substantial effort.
Either way, **not extractor work for this doc.**
## Confidence convention
Every emitted Observation must carry a `confidence` in `[0.0, 1.0]`.
Three rules:
1. **Sample-size honesty.** A primitive computed from < 5 samples
gets `confidence ≤ 0.5`. A bucket-classification with no IATs
should emit `unknown` (where the registry permits) at
`confidence = 1.0` — the *fact* of insufficient data is itself a
high-confidence observation.
2. **Threshold proximity.** If the measured value is within 10% of a
bucket boundary, drop confidence by 0.2. Sitting on the fence is a
real signal; pretending you know is dishonest.
3. **Output-stream availability.** Primitives that need `[t,"o",d]`
events skip emission entirely if the shard contains no output
events, unless the registry permits `unknown`, in which case rule 1
applies (`unknown` at `confidence = 1.0`). Don't fabricate.
Confidence is **the sensor's confidence in its measurement**, not in
any downstream verdict — same line BEHAVE draws.
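Rules 1 and 2 compose into a small helper. The sketch below is illustrative only: the name `adjust_confidence`, its parameters, and the reading of "within 10%" as relative to the boundary value are assumptions, not part of the BEHAVE spec or the planned engine API.

```python
from __future__ import annotations


def adjust_confidence(
    base: float,
    n_samples: int,
    value: float | None = None,
    boundaries: tuple[float, ...] = (),
) -> float:
    """Apply rules 1 and 2 to a base confidence (illustrative only)."""
    conf = base
    # Rule 2: within 10% of a bucket boundary -> drop confidence by 0.2.
    # "Within 10%" is read here as relative to the boundary value (assumption).
    if value is not None and any(
        b != 0 and abs(value - b) <= 0.10 * abs(b) for b in boundaries
    ):
        conf -= 0.2
    # Rule 1: fewer than 5 samples caps confidence at 0.5.
    if n_samples < 5:
        conf = min(conf, 0.5)
    # Keep the result inside the [0.0, 1.0] range the envelope requires.
    return max(0.0, min(1.0, conf))
```

Rule 3 is deliberately left out: whether to skip emission or emit `unknown` is a per-primitive decision made before any confidence is computed.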
---
## The route to v0 — every Tier-A primitive emits
**v0 ships the entire BEHAVE-SHELL Tier-A corpus.** All 37
shell-extractable primitives in the registry must have a feature
function emitting them before the engine tags v0. Anything less is
v0-pre.
The route is broken into **eight phases (A–H)** that each ship a
coherent slice with its own tests. With the architecture locked
(`SessionContext`, `_features/`, `_thresholds.py` already designed),
each primitive is a small, well-bounded chunk — most are dozens of
lines plus tests. The two real cost centres are Phase F (prompt
parser) and Phase G (command-intent lexicon); both bounded by the
calibration notes already in the registry. Phase A establishes the
6-primitive calibration floor (the discriminative grid). Phases B–G
expand horizontally across the registry. Phase H is the full-corpus
lockdown + v0 release.
Each step within a phase is one commit (per the "commit per task"
memory rule), with its own tests in the same commit (per "tests per
task"). No step is allowed to land red against the calibration grid
once Phase A locks it in.
### Phase A — Calibration floor (Steps 0–10)
**Goal:** establish the 6-primitive set that discriminates the
five-class calibration grid. Lock the gate.
This is the foundation. Phases B–G cannot start until Phase A is green.
### Step 0 — Scaffold + smoke
**Goal:** prove the wiring before any logic.
- Create `decnet/profiler/behave_shell/{__init__,extract,_parse,_ctx,_thresholds}.py`.
- `extract_session()` parses events into a minimal `SessionContext`,
registers an empty `FEATURES = ()`, returns no observations.
- `tests/profiler/behave_shell/test_extract_smoke.py` asserts:
- empty events → empty iterable
- one input event → SessionContext built, t_start/t_end/duration_s correct
- import path works
Commit message: `feat(profiler/behave_shell): scaffold extract_session entry point`.
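The Step 0 scaffold could look roughly like the sketch below. It assumes events arrive as `(t, kind, data)` triples and collapses what the plan splits across `_ctx.py` and `extract.py` into one block. The `FeatureFn` alias and the decision to yield nothing for an empty stream are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


@dataclass(frozen=True)
class SessionContext:
    t_start: float
    t_end: float
    duration_s: float


# A feature function takes the context and yields zero or more observations.
FeatureFn = Callable[[SessionContext], Iterable[Any]]

# Step 0: nothing registered yet; later steps append feature functions here.
FEATURES: tuple[FeatureFn, ...] = ()


def build_session_context(events: list[tuple[float, str, str]]) -> SessionContext | None:
    """Derive the minimal timing fields from (t, kind, data) triples."""
    times = [float(e[0]) for e in events]
    if not times:
        return None
    return SessionContext(min(times), max(times), max(times) - min(times))


def extract_session(events: Iterable[tuple[float, str, str]]) -> Iterator[Any]:
    """Pure entry point: events in, observations out, no I/O."""
    ctx = build_session_context(list(events))
    if ctx is None:
        return
    for feature in FEATURES:
        yield from feature(ctx)
```

With `FEATURES` empty, every input produces an empty iterable, which is exactly what the smoke test asserts.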
### Step 1 — Asciinema parser + paste-burst detector
**Goal:** the shared primitives that two feature modules will consume.
- `_parse.py`: types (`InputEvent`, `OutputEvent`, `PasteBurst`,
`Command`) + `parse_event(line: str | dict) -> AsciinemaEvent`.
- `_ctx.py`: `build_session_context()` populates `iats`,
`paste_bursts` (chunks where consecutive IATs < `PASTE_IAT_MAX_S`
AND chunk size > `PASTE_MIN_CHARS`).
- Tests: synthetic streams covering pure-typed, pure-pasted, mixed.
Commit: `feat(profiler/behave_shell): asciinema parser + paste-burst detection`.
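A minimal sketch of the paste-burst rule described above, assuming input events have already been reduced to `(timestamp, data)` pairs. The two threshold values are placeholders, not calibrated constants, and the function name `detect_paste_bursts` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Placeholder thresholds; the real values belong in _thresholds.py.
PASTE_IAT_MAX_S = 0.005
PASTE_MIN_CHARS = 8


@dataclass(frozen=True)
class PasteBurst:
    start_ts: float
    end_ts: float
    n_chars: int


def detect_paste_bursts(inputs: list[tuple[float, str]]) -> list[PasteBurst]:
    """Group consecutive input events whose IAT stays below PASTE_IAT_MAX_S."""
    bursts: list[PasteBurst] = []
    run_start = 0
    for i in range(1, len(inputs) + 1):
        # A run ends at the end of the stream or at the first slow gap.
        end_of_run = (
            i == len(inputs)
            or inputs[i][0] - inputs[i - 1][0] >= PASTE_IAT_MAX_S
        )
        if end_of_run:
            n_chars = sum(len(data) for _, data in inputs[run_start:i])
            if n_chars > PASTE_MIN_CHARS:
                bursts.append(
                    PasteBurst(inputs[run_start][0], inputs[i - 1][0], n_chars)
                )
            run_start = i
    return bursts
```

A single event carrying many characters counts as a burst on its own, which covers recorders that deliver a paste as one chunk.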
### Step 2 — `motor.input_modality` (FIRST PRIMITIVE)
**Goal:** prove the end-to-end pipeline emits a single registry-valid
Observation.
Why first: highest discriminative value (HUMAN vs everyone), simplest
implementation (just count paste-burst chars vs typed chars).
- `_features/motor.py:input_modality(ctx)` yields one Observation
with value in `{"typed", "pasted", "mixed"}`.
- Register in `FEATURES`.
- Tests:
- synthetic typed stream → `typed`
- synthetic pasted stream → `pasted`
- HUMAN calibration shard → `typed`
- YOU-sim calibration shard → `pasted`
After this step, the calibration grid passes for **one column** and
the integration is end-to-end live (Phase 4 of the integration plan
becomes wireable, not just blocked on theory).
Commit: `feat(profiler/behave_shell): emit motor.input_modality`.
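The counting logic for this first primitive is small enough to sketch in full. The `pure_fraction` cut-off is a hypothetical parameter: the doc fixes the three output values but not where the boundaries between them sit.

```python
def classify_input_modality(
    total_chars: int,
    pasted_chars: int,
    pure_fraction: float = 0.9,  # assumed cut-off, not a calibrated value
) -> str:
    """Return 'typed', 'pasted' or 'mixed' from paste-burst character counts."""
    if total_chars <= 0:
        raise ValueError("need at least one input character")
    pasted_share = pasted_chars / total_chars
    if pasted_share >= pure_fraction:
        return "pasted"
    if pasted_share <= 1.0 - pure_fraction:
        return "typed"
    return "mixed"
```

In the engine the two counts would come from `ctx.paste_bursts` and the raw input stream; wrapping the result in a registry-valid Observation is left out here.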
### Step 3 — `motor.paste_burst_rate`
**Goal:** second primitive, builds on the paste-burst index from
step 1. Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL.
- `_features/motor.py:paste_burst_rate(ctx)` → `none / occasional / habitual`.
- Threshold constants in `_thresholds.py`.
- Tests + grid extension.
Commit: `feat(profiler/behave_shell): emit motor.paste_burst_rate`.
### Step 4 — Command segmentation (no primitive)
**Goal:** shared utility for the three cognitive primitives next in
line. Pure refactor inside `_ctx.py`.
- `commands` populated: split input stream on `\r` (and `\n`) into
`Command(start_ts, end_ts, first_token_hash)` records.
- **PII discipline:** store only the *first token* (or its hash) plus
timing. Never the full command body. Branch-diversity needs the
first token; nothing needs the rest.
- `inter_cmd_iats` and `output_per_cmd` populated.
- Tests for segmentation edge cases (no trailing newline, multiple
newlines in a paste, etc).
Commit: `feat(profiler/behave_shell): command segmentation in SessionContext`.
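The segmentation and PII rule above can be sketched as follows, assuming `(timestamp, data)` input pairs. The truncated SHA-256 digest and the choice to drop a trailing unterminated command are assumptions; the doc only requires that nothing beyond the first token (or its hash) is retained.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    start_ts: float
    end_ts: float
    first_token_hash: str


def segment_commands(inputs: list[tuple[float, str]]) -> list[Command]:
    """Split the input stream on CR/LF, keeping only timing + first-token hash."""
    commands: list[Command] = []
    buf: list[tuple[float, str]] = []  # (timestamp, char) for the current line
    for ts, data in inputs:
        for ch in data:
            if ch in "\r\n":
                text = "".join(c for _, c in buf).strip()
                if text:
                    first_token = text.split()[0]
                    digest = hashlib.sha256(first_token.encode()).hexdigest()[:16]
                    commands.append(Command(buf[0][0], ts, digest))
                # The command body is discarded here and never stored.
                buf = []
            else:
                buf.append((ts, ch))
    # Assumption: input with no trailing newline is not counted as a command.
    return commands
```

A `\r\n` pair produces one command, not two, because the second terminator sees an empty buffer.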
### Step 5 — `cognitive.inter_command_latency_class`
**Goal:** classify the operator's *thinking pace* between commands.
Splits LW-sim / CLAUDE-FF / CLAUDE-CL.
- `_features/cognitive.py:inter_command_latency_class(ctx)` →
`instant / typing_speed / deliberate / llm_lightweight / llm_heavyweight / long`.
- Median of `inter_cmd_iats`, bucketed against `_thresholds.py`.
- Confidence drops if < 5 commands.
- Tests + grid extension.
Commit: `feat(profiler/behave_shell): emit cognitive.inter_command_latency_class`.
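Median-then-bucket is a natural fit for `bisect`. In the sketch below the bucket edges and the base confidence of 0.9 are placeholders invented for illustration; only the six labels and the "fewer than 5 commands" rule come from this doc.

```python
from __future__ import annotations

import bisect
import statistics

# Placeholder bucket edges in seconds. These are NOT calibrated values.
LATENCY_BOUNDS_S = (0.05, 1.0, 5.0, 15.0, 60.0)
LATENCY_LABELS = (
    "instant",
    "typing_speed",
    "deliberate",
    "llm_lightweight",
    "llm_heavyweight",
    "long",
)


def inter_command_latency_class(
    inter_cmd_iats: list[float],
) -> tuple[str, float] | None:
    """Bucket the median inter-command gap; returns (label, confidence)."""
    if not inter_cmd_iats:
        return None
    median = statistics.median(inter_cmd_iats)
    label = LATENCY_LABELS[bisect.bisect_right(LATENCY_BOUNDS_S, median)]
    # n gaps separate n + 1 commands; under 5 commands the cap applies.
    n_commands = len(inter_cmd_iats) + 1
    confidence = 0.5 if n_commands < 5 else 0.9  # 0.9 is an assumed base
    return label, confidence
```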
### Step 6 — `cognitive.command_branch_diversity`
**Goal:** content-based playbook-vs-adaptive split. Splits CLAUDE-FF
from CLAUDE-CL.
- `_features/cognitive.py:command_branch_diversity(ctx)` →
`linear_playbook / adaptive_branching / unknown`.
- `unique_first_tokens / total_commands` ratio against threshold.
- `unknown` when total_commands < 5 (registry-allowed).
- Tests + grid extension.
Commit: `feat(profiler/behave_shell): emit cognitive.command_branch_diversity`.
### Step 7 — `cognitive.feedback_loop_engagement`
**Goal:** the orthogonal axis — does the operator's pause-after-command
correlate with output bytes? Splits HUMAN/CLAUDE-CL (closed) from
LW-sim/CLAUDE-FF (fire-and-forget).
- Requires `output_per_cmd[i]` paired with `inter_cmd_iats[i+1]`.
- Pearson correlation; bucket on r > 0.3 / r ≈ 0 / insufficient.
- `_features/cognitive.py:feedback_loop_engagement(ctx)` →
`closed_loop / fire_and_forget / unknown`.
- **First primitive that depends on output events.** If the shard
carries no `'o'` events (rare but possible — minimal recorders),
emit `unknown` at confidence 1.0.
- Tests + grid extension.
Commit: `feat(profiler/behave_shell): emit cognitive.feedback_loop_engagement`.
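The correlation step can be sketched with a hand-rolled Pearson coefficient so it carries no dependencies. The `min_pairs` floor and the treatment of negative correlations as `fire_and_forget` are assumptions; the `r > 0.3` cut-off is the one named above. The caller is expected to have aligned the two lists already.

```python
from __future__ import annotations

import math


def pearson_r(xs: list[float], ys: list[float]) -> float | None:
    """Pearson correlation, or None when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def feedback_loop_engagement(
    output_bytes: list[float],
    following_pauses: list[float],
    r_closed: float = 0.3,
    min_pairs: int = 5,  # assumed floor for "insufficient"
) -> str:
    """Classify whether pauses track the amount of output just received."""
    n = min(len(output_bytes), len(following_pauses))
    if n < min_pairs:
        return "unknown"
    r = pearson_r(output_bytes[:n], following_pauses[:n])
    if r is None:
        return "unknown"
    return "closed_loop" if r > r_closed else "fire_and_forget"
```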
### Step 8 — `cognitive.inter_command_consistency`
**Goal:** dispersion/bimodality of command IATs.
HUMAN-bimodal vs LLM-metronomic.
- CV of `inter_cmd_iats` → `metronomic` (CV < 0.2) /
`variable` (0.2 ≤ CV < 1.0) / `bimodal` (CV ≥ 1.0 OR Hartigan dip
significant — v0.1 is CV-only, registry note flags v0.2 work).
- Tests + grid extension.
Commit: `feat(profiler/behave_shell): emit cognitive.inter_command_consistency`.
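The CV-only variant uses the two cut-offs quoted above directly. This sketch assumes the population standard deviation and returns `None` when the coefficient of variation is undefined; both choices are illustrative, not the spec's.

```python
from __future__ import annotations

import statistics


def inter_command_consistency(inter_cmd_iats: list[float]) -> str | None:
    """CV-only bucketing: metronomic / variable / bimodal."""
    if len(inter_cmd_iats) < 2:
        return None
    mean = statistics.fmean(inter_cmd_iats)
    if mean <= 0:
        return None  # CV is undefined for a zero mean
    cv = statistics.pstdev(inter_cmd_iats) / mean
    if cv < 0.2:
        return "metronomic"
    if cv < 1.0:
        return "variable"
    # CV >= 1.0; the Hartigan dip test is deferred, as noted above.
    return "bimodal"
```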
### Step 9 — Calibration grid lockdown
**Goal:** the gate. After this step lands, no engine PR is allowed
to drop a primitive from any of the five classes.
- `tests/profiler/behave_shell/test_calibration_grid.py` parametrised
over the five shards from `BEHAVE/prototype_extractors/shell/`.
- For each shard, assert the **required primitive set** from the
integration doc's grid table is present in the output (subset
check, not exact match — engine is allowed to emit *more* than
the table requires).
- Skip the module (`pytest.skip`, in the same spirit as
`pytest.importorskip`) if `BEHAVE_CALIBRATION_DIR` is unset: CI
provides it, dev doesn't have to.
- This is the v0 gate.
Commit: `test(profiler/behave_shell): five-class calibration grid lockdown`.
### Step 10 — Phase A complete: calibration floor locked
**Goal:** Phase A done. **NOT v0 release** — v0 requires the full
Tier-A corpus (Phases B–H below). Phase A delivers the 6-primitive
discriminative floor + the gate that future phases must not break.
- 6 primitives emitting (`motor.input_modality`,
`motor.paste_burst_rate`,
`cognitive.inter_command_latency_class`,
`cognitive.command_branch_diversity`,
`cognitive.feedback_loop_engagement`,
`cognitive.inter_command_consistency`).
- Calibration grid green across all five class shards.
- Worker can be wired against Phase A safely
(BEHAVE-INTEGRATION.md Phase 4 unblocks here, *not* at v0).
Commit: `feat(profiler/behave_shell): Phase A — calibration floor green`.
---
### Phase B — `motor.*` completion (4 primitives)
**Goal:** finish the motor family minus shell-mastery. All four
read existing `SessionContext` derived data; no new parsing.
| Step | Primitive | Source | Notes |
|---|---|---|---|
| B.1 | `motor.keystroke_cadence` | `ctx.iats` histogram shape | steady (uniform) / bursty (heavy-tailed) / hunt_and_peck (bimodal slow+fast) / machine (sub-typing-floor) |
| B.2 | `motor.motor_stability` | `ctx.iats` outlier rate | tremor = high-frequency outliers above CV-of-IATs threshold |
| B.3 | `motor.error_correction` | backspace events relative to preceding key | immediate (<500ms) / deferred (next word boundary) / absent / route_around (no backspaces, but command later replaced) |
| B.4 | `motor.command_chunking` | per-command IAT variance + word-boundary timing | fluent (low intra-cmd variance + tight word boundaries) / fragmented (high variance) / single_command (one-shot session) |
Per-step deliverable: feature function in `_features/motor.py`,
threshold constants in `_thresholds.py`, unit tests against
synthetic streams, calibration grid still green.
Commits (4): `feat(profiler/behave_shell): emit motor.{keystroke_cadence,motor_stability,error_correction,command_chunking}`.
### Phase C — `motor.shell_mastery.*` (3 primitives)
**Goal:** the shell-fluency block. Per-command counters; trivial
implementations once command segmentation is in place (Step 4).
| Step | Primitive | Source |
|---|---|---|
| C.1 | `motor.shell_mastery.tab_completion` | `\t` rate per command (none / occasional <30% / habitual ≥50%) |
| C.2 | `motor.shell_mastery.shortcut_usage` | ^A/^E/^W/^U/^R/^B/^F rate (none / moderate / heavy) |
| C.3 | `motor.shell_mastery.pipe_chaining_depth` | `\|` count per command, median (shallow / moderate / deep) |
Commits (3): `feat(profiler/behave_shell): emit motor.shell_mastery.*`.
### Phase D — `cognitive.*` completion (7 primitives + re-tune gate)
**Goal:** finish the cognitive family. Mix of cheap and expensive;
`cognitive_load` is a composite over earlier primitives.
| Step | Primitive | Source | Cost |
|---|---|---|---|
| D.1 | `cognitive.cognitive_load` | composite: IAT entropy + error rate + chunking variance | MEDIUM |
| D.2 | `cognitive.exploration_style` | command-graph branching shape (revisits, backtracks) | MEDIUM |
| D.3 | `cognitive.planning_depth` | think-pause-length distribution; deep = many >1.5s gaps before commands | LOW |
| D.4 | `cognitive.tool_vocabulary` | distinct first-tokens normalised by session length | LOW |
| D.5 | `cognitive.error_resilience.retry_tactic` | post-error command relation: rerun (same), modify (edit-and-retry), switch (different tool), abort (exit) | MEDIUM |
| D.6 | `cognitive.error_resilience.frustration_typing` | error-vs-success keystroke speed delta | LOW |
| D.7 | `cognitive.error_resilience.fallback_to_man` | `man`/`--help`/`-h` invocation post-error | LOW |
| D.8 | `cognitive.cognitive_load` re-tune (gate) | re-run calibration once D.1-D.7 stable | — |
Commits (7): one per primitive, plus a re-tune commit if needed.
### Phase E — `temporal.*` per-session subset (4 primitives)
**Goal:** the four temporal primitives that don't need observation
history. The other three temporal primitives (session_timing,
persistence, idle_periodicity) are **Tier B** and are filed in
`ATTRIBUTION-ENGINE.md` — do not implement here.
| Step | Primitive | Source | Cost |
|---|---|---|---|
| E.1 | `temporal.session_duration` | `ctx.duration_s` bucketed (short <60s / medium <600s / long <3600s / marathon ≥3600s) | TRIVIAL |
| E.2 | `temporal.escalation_pattern` | command-rate over rolling windows (sustained / erratic / bursty) | LOW |
| E.3 | `temporal.lifecycle_markers.landing_ritual` | first-N-commands signature match (`uname` / `id` / `whoami` / `pwd`) | LOW |
| E.4 | `temporal.lifecycle_markers.exit_behavior` | last command + exit timing (graceful `exit`/`logout` / abrupt session-cut / cleanup `history -c` etc.) | LOW |
Commits (4): per primitive.
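E.1 is the simplest case of threshold bucketing in the plan and shows the pattern the other bucketed primitives can reuse. The sketch uses the four cut-offs from the table; the function and constant names are hypothetical.

```python
import bisect

# Upper bounds in seconds, taken from the E.1 row of the table.
DURATION_BOUNDS_S = (60.0, 600.0, 3600.0)
DURATION_LABELS = ("short", "medium", "long", "marathon")


def session_duration_bucket(duration_s: float) -> str:
    """Map a session duration onto short / medium / long / marathon."""
    # bisect_right places a value equal to a bound in the higher bucket,
    # which matches the strict "<" bounds and the ">= 3600" marathon rule.
    return DURATION_LABELS[bisect.bisect_right(DURATION_BOUNDS_S, duration_s)]
```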
### Phase F — `environmental.*` output-stream block (5 primitives)
**Goal:** the output-stream-dependent cluster. Lands a shared
prompt-string parser once, then five primitives consume it. **This
is the most expensive single phase** — the prompt parser has to
handle ANSI escape sequences, multi-line continuation, and
custom prompts.
| Step | Primitive | Source | Cost |
|---|---|---|---|
| F.0 | Prompt-string parser (`_parse.py`) | shared utility, no primitive | HIGH |
| F.1 | `environmental.shell_type` | prompt suffix sniff (`$`/`#`/`%`/`>`) + command syntax (bash / zsh / fish / cmd / powershell) | MEDIUM |
| F.2 | `environmental.terminal_multiplexer` | tmux/screen-specific escape sequences in output stream | LOW |
| F.3 | `environmental.locale` | `LANG`/`LC_*` envvars if attacker dumps env; output language sniff fallback (free string, BCP-47) | MEDIUM |
| F.4 | `environmental.keyboard_layout` | bigram-frequency fingerprint against known layouts (qwerty / azerty / qwertz / other) | HIGH |
| F.5 | `environmental.numpad_usage` | numeric input arrival pattern; weak signal — confidence cap | LOW |
Commits (6): F.0 prepares; F.1-F.5 ship one per primitive.
### Phase G — `operational.*` + `emotional_valence.*` (8 primitives)
**Goal:** the two soft families. Both want a small command-intent /
sentiment lexicon; combine into one phase to share the lexical
infrastructure.
| Step | Primitive | Source | Cost / Confidence |
|---|---|---|---|
| G.0 | Command-intent lexicon (`_features/_intent.py`) | shared first-token → category mapping (recon / exfil / persistence / lateral / destructive) | HIGH (corpus building) |
| G.1 | `operational.objective` | majority-category over session commands | MEDIUM |
| G.2 | `operational.opsec_discipline` | history-clearing / log-tampering / `.bash_history` removal patterns | MEDIUM |
| G.3 | `operational.cleanup_behavior` | exit-time cleanup commands (`rm`-of-touched-files, `unset HISTFILE`) | MEDIUM |
| G.4 | `operational.multi_actor_indicators` | mid-session pace/style shift detection (only `solo` and `handoff_detected` honest single-session; `team_coordinated` is Tier B) | HIGH |
| G.5 | `emotional_valence.valence` | lexical sentiment; positive / neutral / negative — **CONFIDENCE CAP 0.5** | LOW (soft) |
| G.6 | `emotional_valence.arousal` | typing-speed delta + capslock + repeated bangs — **CAP 0.5** | LOW (soft) |
| G.7 | `emotional_valence.stress_response` | post-error speed-up (distress) vs slow-down (eustress) — **CAP 0.5** | LOW (soft) |
| G.8 | `emotional_valence.frustration_venting` | obscenity detection (`fuck`/`shit`/`damn`); registry value is binary — **CAP 0.5** | LOW (soft) |
Commits (9). All four `emotional_valence.*` primitives ship under a
**hard 0.5 confidence cap** by convention — these are the most
likely primitives to embarrass the project, and operators must not
act on them without corroboration.
### Phase H — Full-corpus lockdown + v0 release
**Goal:** prove every Tier-A primitive in the registry has a feature
function, tag v0.
| Step | Action |
|---|---|
| H.1 | **Registry-coverage test**: `tests/profiler/behave_shell/test_registry_coverage.py` walks `PRIMITIVE_REGISTRY`, filters out Tier-B and Tier-C primitives (explicit allow-list), asserts every remaining primitive appears in the output of at least one calibration shard. CI fails if the registry adds a primitive DECNET hasn't implemented yet. |
| H.2 | **Calibration grid full sweep**: re-run the five-class grid against the full primitive set; no regressions. |
| H.3 | **Live smoke**: ship a decky, run a real session from each calibration class, observe full primitive output in `observations` table + bus + AttackerDetail panel (mirrors integration-doc Phase 6). |
| H.4 | **Worker wired** (BEHAVE-INTEGRATION.md Phase 4 unblocks here). Pin `decnet-behave-core` / `decnet-behave-shell` in `pyproject.toml`. |
| H.5 | Tag v0; add `__version__ = "0.1.0"` to `behave_shell/__init__.py`. |
Commit: `feat(profiler/behave_shell): v0 — full Tier-A corpus, all 37 primitives emitting`.
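The core of the H.1 coverage check is a set difference. The sketch below takes plain collections of primitive names so it does not depend on the real shape of `PRIMITIVE_REGISTRY`, which is not shown in this doc. The helper name and the use of the `toolchain.` prefix to identify Tier C are assumptions.

```python
from __future__ import annotations

from typing import Iterable


def uncovered_tier_a(
    registry_names: Iterable[str],
    tier_b: set[str],
    emitted: set[str],
    tier_c_prefix: str = "toolchain.",
) -> list[str]:
    """Registry primitives that are Tier A but were never emitted."""
    return sorted(
        name
        for name in registry_names
        if name not in tier_b
        and not name.startswith(tier_c_prefix)
        and name not in emitted
    )
```

The test would assert that the returned list is empty, where `emitted` is the union of primitive names seen across all calibration shards. Printing the list in the failure message tells the implementer exactly which feature functions are missing.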
### Per-phase rules (binding for all of B–H)
1. **Calibration-grid gate is binding.** Every commit in B–G runs
the grid; any drop in expected primitive sets fails CI.
2. **Registry-coverage test is binding from H onward.** New Tier-A
primitives added to BEHAVE's registry without a corresponding
DECNET feature function fail CI.
3. **Adding a primitive = adding a feature func + registering it +
threshold constants + tests in the same commit.** No sneaking
implementation in without tests, no sneaking tests in without the
calibration assertion.
4. **Phases B–G can ship in any order**, but finish a phase before
starting another. Phase F is the hardest and should be sequenced
by reader stamina, not enthusiasm.
5. **Don't rush Phase G.** The soft primitives are the most likely
to embarrass the project. Calibrate against real-attacker shards
before tagging — and even then, hold the 0.5 confidence cap.
6. **Tier-B and Tier-C scope creep is forbidden.** The moment you
feel tempted to read a SECOND session inside `extract_session()`,
stop. That observation belongs to the attribution engine.
Don't promise a delivery date for any phase. Each lands when it's
honest. v0 ships when **every Tier-A primitive emits + every test
green** — not before.
---
## Out of scope for the engine
- **Attribution.** Per the integration doc's bright line. Engine
emits observations; some other thing decides what they mean. See
`ATTRIBUTION-ENGINE.md`.
- **Cross-session merge logic.** That's DEBT-051 / Tier-B
primitives. Engine sees one session at a time, period.
- **Tier-C `toolchain.*` primitives.** Network-domain sensors
(sniffer, prober, correlator) own these. Either via existing
workers wrapping their outputs as BEHAVE observations, or a future
BEHAVE-NETWORK extractor. Not this doc.
- **Persistence / bus.** Worker concerns. Engine is pure.
- **Dynamic primitive registration.** The `FEATURES` tuple is
hand-edited; no plugin loaders. New primitive = new feature func +
one-line registry edit + tests in the same commit.
- **Streaming / partial extraction.** Engine assumes a complete
session. Live mid-session inference is a v2 concern; needs a
separate state-keeping design.
- **`primitives.py` registry edits.** The engine consumes the
registry; never mutates it. If a primitive is missing, file a
BEHAVE-side commit per the integration doc's "BEHAVE-side commits"
rule.
- **Confidence calibration against ground truth.** The calibration
grid is a *discrimination* test, not a *correctness* test. True
ground-truth labels would require red-team exercises with logged
intent. Filed when that data exists.
---
## Implementation order checklist
A single page you can paste into a TODO and tick off. **Every box
unchecked = no v0 tag.**
### Phase A — Calibration floor (Steps 0–10)
- [ ] Step 0 — Scaffold + smoke test
- [ ] Step 1 — Asciinema parser + paste-burst detector
- [ ] Step 2 — `motor.input_modality` (FIRST PRIMITIVE)
- [ ] Step 3 — `motor.paste_burst_rate`
- [ ] Step 4 — Command segmentation in `SessionContext`
- [ ] Step 5 — `cognitive.inter_command_latency_class`
- [ ] Step 6 — `cognitive.command_branch_diversity`
- [ ] Step 7 — `cognitive.feedback_loop_engagement`
- [ ] Step 8 — `cognitive.inter_command_consistency`
- [ ] Step 9 — Calibration grid lockdown (the gate)
- [ ] Step 10 — Phase A complete: floor green
### Phase B — `motor.*` completion
- [ ] B.1 `motor.keystroke_cadence`
- [ ] B.2 `motor.motor_stability`
- [ ] B.3 `motor.error_correction`
- [ ] B.4 `motor.command_chunking`
### Phase C — `motor.shell_mastery.*`
- [ ] C.1 `motor.shell_mastery.tab_completion`
- [ ] C.2 `motor.shell_mastery.shortcut_usage`
- [ ] C.3 `motor.shell_mastery.pipe_chaining_depth`
### Phase D — `cognitive.*` completion
- [ ] D.1 `cognitive.cognitive_load`
- [ ] D.2 `cognitive.exploration_style`
- [ ] D.3 `cognitive.planning_depth`
- [ ] D.4 `cognitive.tool_vocabulary`
- [ ] D.5 `cognitive.error_resilience.retry_tactic`
- [ ] D.6 `cognitive.error_resilience.frustration_typing`
- [ ] D.7 `cognitive.error_resilience.fallback_to_man`
- [ ] D.8 `cognitive.cognitive_load` re-tune (gate)
### Phase E — `temporal.*` per-session
- [ ] E.1 `temporal.session_duration`
- [ ] E.2 `temporal.escalation_pattern`
- [ ] E.3 `temporal.lifecycle_markers.landing_ritual`
- [ ] E.4 `temporal.lifecycle_markers.exit_behavior`
### Phase F — `environmental.*` (output-stream block)
- [ ] F.0 Prompt-string parser (shared utility)
- [ ] F.1 `environmental.shell_type`
- [ ] F.2 `environmental.terminal_multiplexer`
- [ ] F.3 `environmental.locale`
- [ ] F.4 `environmental.keyboard_layout`
- [ ] F.5 `environmental.numpad_usage`
### Phase G — `operational.*` + `emotional_valence.*` (soft block)
- [ ] G.0 Command-intent lexicon (`_features/_intent.py`)
- [ ] G.1 `operational.objective`
- [ ] G.2 `operational.opsec_discipline`
- [ ] G.3 `operational.cleanup_behavior`
- [ ] G.4 `operational.multi_actor_indicators`
- [ ] G.5 `emotional_valence.valence` (cap 0.5)
- [ ] G.6 `emotional_valence.arousal` (cap 0.5)
- [ ] G.7 `emotional_valence.stress_response` (cap 0.5)
- [ ] G.8 `emotional_valence.frustration_venting` (cap 0.5)
### Phase H — Full-corpus lockdown + v0 release
- [ ] H.1 Registry-coverage test
- [ ] H.2 Calibration grid full sweep, no regressions
- [ ] H.3 Live smoke across all five calibration classes
- [ ] H.4 Worker wired + `pyproject.toml` pin
- [ ] H.5 Tag v0 (`__version__ = "0.1.0"`)
**50 boxes. 37 primitives. 1 v0.** Each box is a commit + tests in
the same commit.
---
**Owner:** ANTI.
**Implementation gate:** Step 0 starts after this doc is reviewed +
Phase 1 of `BEHAVE-INTEGRATION.md` lands (storage table exists).

@@ -0,0 +1,680 @@
# BEHAVE Integration — Design
**Status:** pre-implementation. This doc is the spec; code follows.
**Tracks:** DEBT-050 (replaces stale DEBT-036).
**Spec source:** `/home/anti/Tools/BEHAVE` (sibling, never vendored).
**Engine home:** this repo, `decnet/profiler/behave_shell/` (sublibrary inside the existing `profiler` worker — no new daemon).
## Premise
ANTI built BEHAVE — an out-of-tree behavioural-observation framework
with a primitive registry, a registry-validated `Observation`
envelope, a DECNET-bus event adapter, and a five-class calibration
grid (HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL). It is the
right substrate for keystroke-dynamics extraction.
The original DEBT-036 plan (hand-rolled `kd_*` columns on
`SessionProfile`) is obsolete. This doc replaces it with a
BEHAVE-aligned ingester that emits registry-validated observations on
the bus and persists them in a single generic table.
**Bright line, lifted from BEHAVE itself:** *BEHAVE emits
observations. It does not conclude.* DECNET is a consumer of
`attacker.observation.*` events; attribution / linkage / verdicts are
out-of-scope for this integration and live in their own (future)
attribution engine.
## Architectural placement
```
/home/anti/Tools/
├── BEHAVE/                           sibling repo, separate git history
│   ├── core/                         decnet-behave-core (envelope)
│   ├── BEHAVE-SHELL/                 decnet-behave-shell (registry + adapter)
│   └── prototype_extractors/shell/   extract.py — JSONL → Observation stream
└── DECNET/                           THIS repo
    ├── pyproject.toml                pins decnet-behave-{core,shell}
    ├── decnet/profiler/              EXISTING worker — gains a sublibrary + a new trigger
    │   ├── worker.py                 gains attacker.session.ended subscription
    │   ├── behavioral.py             UNCHANGED — networking-domain (LogEvent IATs, beacon detection)
    │   ├── timing.py                 UNCHANGED — networking-domain
    │   └── behave_shell/             NEW — pure extraction library
    │       ├── __init__.py
    │       ├── extract.py            orchestration: parse → dispatch → assemble Observations
    │       └── _features/            per-primitive-family modules
    └── decnet/web/db/models/observations.py   NEW — generic Observation table
```
**No new worker.** The existing `decnet-profiler.service` already
supervises this codepath. No new systemd unit, no new polkit rule, no
new heartbeat. The session-ended handler is a peer to the existing
scoring tick inside the same async loop.
**Audit finding (network vs PTY domains).** `behavioral.py` and
`timing.py` operate on `LogEvent` (network-level connection events
from `decnet.correlation.parser`), feeding the existing
`attacker_behavior` table — TCP fingerprint, OS guess, beacon
interval, behavior class. **Zero overlap with BEHAVE-SHELL**, which
operates on `AsciinemaEvent` (PTY input) and persists to the new
`observations` table. The two coexist; no rewrite, no migration, no
shared state.
Two repos, two commits, no vendoring.
`pip install -e ../BEHAVE/core -e ../BEHAVE/BEHAVE-SHELL` for local
dev (`-e` applies to one path at a time, so it is repeated); pinned
wheels in CI.
## BEHAVE is the spec. DECNET is the engine.
This is a *load-bearing* architectural fact, called out explicitly so
nobody (including future me) misreads the layout.
- **BEHAVE ships:** the primitive registry, the registry-validated
`Observation` envelope, the bus event adapter, the JSON schema.
Reference prototype extractor for spec validation only. BEHAVE will
**not** ship a production engine — that's not what the BEHAVE repo
is for.
- **DECNET ships:** the production extraction engine. It lives in
`decnet/profiler/behave_shell/`, written from scratch against the
BEHAVE spec, called from the existing profiler worker on
`attacker.session.ended`.
DECNET-side BEHAVE imports are spec-only:
```python
from decnet_behave_core.spec.envelope import Observation as ObservationEnvelope, Window
from decnet_behave_shell.spec.primitives import PRIMITIVE_REGISTRY, get as get_primitive_spec
from decnet_behave_shell.spec.event_adapter import event_topic_for, to_event_payload
```
`Observation` is aliased to `ObservationEnvelope` so the storage
SQLModel can keep the `Observation`-flavoured class name where it's
useful, and the BEHAVE primitive-spec accessor is aliased away from
the bare name `get` to avoid shadowing in feature-extractor modules
that read dicts heavily.
That's it. No imports from `BEHAVE/prototype_extractors/`. The
prototype is read as **design notes** during the engine build, then
ignored. If the prototype yields a primitive the production engine
doesn't, that's a calibration delta to investigate, not a regression
in either direction.
### The extraction engine — DECNET-side
```
decnet/profiler/behave_shell/
├── __init__.py          exposes extract_session()
├── extract.py           orchestration: parse → dispatch → assemble Observations
└── _features/           feature-extractor modules, one per primitive family
    ├── motor.py         cadence, paste burst, modality, shell mastery
    ├── cognitive.py     latency class, consistency, branch diversity, feedback loop
    ├── temporal.py      session timing, escalation pattern
    └── ...              others added as primitives are productionised
tests/profiler/behave_shell/
└── _features/           one test module per feature family, against synthetic streams
```
The library is **pure** — no I/O, no bus calls, no DB writes. Events
in → `Iterable[Observation]` out. The split between `extract.py`
(orchestration) and `_features/` (per-family implementations) keeps
each primitive's logic auditable in isolation — including the
threshold tables, which are the part most likely to drift across
calibration cycles. The worker (in `decnet/profiler/worker.py`) owns
all I/O: disk-reach, bus publish, DB upsert.
**The engine is its own first-class effort, not a side-effect of
this integration doc.** The five-class calibration grid is the
acceptance test. Beyond that, it has its own design surface
(threshold calibration methodology, per-primitive confidence scoring,
feature-family precedence rules) that this doc does not attempt to
fully specify — that belongs in a sibling `BEHAVE-EXTRACTOR.md` once
Phase 1 lands and we have the storage shape to write into.
**Calibration knowledge does leak across the repo boundary.** BEHAVE's
`primitives.py` carries empirical calibration notes (e.g. CLAUDE-FF
vs CLAUDE-CL on 2026-05-02) inline in the registry. The clean
separation "BEHAVE = pure spec, DECNET = pure engine" is leakier
than this doc would prefer; both repos must agree on what a primitive
*means* before the engine threshold tables are tuned. Treat the
registry's `notes:` field as ground truth and tune DECNET to match.
### BEHAVE-side commits (rare, for spec changes only)
The only reasons to touch the BEHAVE repo during this integration:
1. The DECNET engine discovers a primitive the registry needs and the
spec doesn't yet define → registry edit in BEHAVE → version bump
→ DECNET pin update.
2. The envelope schema needs a field DECNET can populate honestly
(e.g. a structured `evidence_ref` schema) → envelope edit → schema
`v` bump → `observations.envelope_v` column already tracks it.
These are not blockers for Phase 1. They land iteratively as the
engine matures.
## Versioning
| Axis | Current | DECNET pin |
|---|---|---|
| Envelope schema (`Observation.v`) | `1` | column `observations.envelope_v` tracks it |
| Schema URL | `https://behave.local/schema/observation/v1.json` | — |
| `decnet-behave-core` | `0.1.0` | `>=0.1.0,<0.2` |
| `decnet-behave-shell` | `0.1.0` | `>=0.1.0,<0.2` |
A future `v=2` envelope coexists in the same table without a
destructive migration — query by `envelope_v` when shape diverges.
Bump the cap in `pyproject.toml` when BEHAVE cuts `0.2.0`.
## Data flow
```
asciinema shard on disk
/var/lib/decnet/artifacts/{decky}/sessrec/sessions-YYYY-MM-DD.jsonl
│ disk-reach (host-local, never on bus)
bus: attacker.session.ended ─► decnet-profiler worker (existing)
(or poll fallback) │ → handler in worker.py
│ → calls behave_shell.extract_session(events) → Iterable[Observation]
│ (registry-validated by BEHAVE)
bus.publish(event_topic_for(obs.primitive),
to_event_payload(obs))
┌─────────────────────┼──────────────────────┐
▼ ▼ ▼
observations table AttackerDetail UI future: attribution engine,
(DECNET storage) (live SSE consumer) federation gossip, webhook export
```
Raw `[t,"i",d]` events never cross the worker→bus boundary. Bus
carries observation envelopes only. Disk-reach for the input stream
mirrors DEBT-047's pattern (filesystem-group-readable artifacts via
DEBT-035).
## Storage — the `observations` table
Generic table holding every BEHAVE envelope field, plus a single
DECNET-side denormalization (`attacker_uuid`) for cheap joins.
**Not a strict 1:1 mirror** — the envelope has no `attacker_uuid`;
DECNET adds it so AttackerDetail doesn't have to chase
`identity_ref → AttackerIdentity → attacker_uuid` on every read.
The SQLModel class is named `ObservationRow` to avoid colliding
with the BEHAVE `Observation` Pydantic class imported into the
same module.
```python
# decnet/web/db/models/observations.py
from typing import Any
from uuid import UUID

from sqlalchemy import JSON, Column, Index, UniqueConstraint
from sqlmodel import Field, SQLModel

from decnet_behave_core.spec.envelope import Observation as ObservationEnvelope


class ObservationRow(SQLModel, table=True):
    __tablename__ = "observations"

    # ── envelope fields (types match BEHAVE exactly) ─────────────
    id: str = Field(primary_key=True)    # envelope.id (uuid4().hex string)
    identity_ref: str | None = None      # envelope.identity_ref (str, not UUID)
    primitive: str = Field(index=True)   # 'motor.keystroke_cadence'
    value: dict[str, Any] | str | int | float | bool | list = Field(
        sa_column=Column(JSON, nullable=False)
    )
    confidence: float
    window_start_ts: float               # flattened from envelope.window
    window_end_ts: float
    source: str
    evidence_ref: str = Field(nullable=False)  # NOT NULL for DECNET emissions; see "Idempotency"
    envelope_v: int                      # envelope.v
    ts: float = Field(index=True)        # emission ts

    # ── DECNET-side denormalization (NOT in BEHAVE envelope) ─────
    attacker_uuid: UUID = Field(foreign_key="attackers.uuid", index=True)

    __table_args__ = (
        Index("ix_observations_attacker_primitive_ts",
              "attacker_uuid", "primitive", "ts"),
        Index("ix_observations_primitive_ts", "primitive", "ts"),
        UniqueConstraint("evidence_ref", "primitive",
                         name="uq_observations_evidence_primitive"),
    )
```
**SQLAlchemy `JSON` not `JSONB`** per the typed-evidence-dicts memory
rule (dual-backend MySQL + SQLite).
**`evidence_ref` is NOT NULL** for DECNET-emitted observations, even
though BEHAVE's envelope makes it `Optional[str]`. The worker's
"have we already profiled this session?" check (see Idempotency
below) keys on `evidence_ref`; if it's NULL the check breaks. The
shape `shard:{decky}/{service}/{date}.jsonl#sid` is mandatory at the
worker layer. If a future BEHAVE consumer needs nullable
evidence_ref, that's a separate observation source with its own
worker — not this one.
**`UniqueConstraint(evidence_ref, primitive)`** enforces idempotency
at the schema level, so a re-run of the worker on the same shard+sid
produces a DB-side conflict, not silent duplicate rows. SQLite and
MySQL both treat NULLs as distinct from one another in unique
indexes, so a nullable `evidence_ref` would let duplicates slip
through; because the column is NOT NULL, the constraint holds on
both backends.
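A minimal sketch of why the NOT NULL matters, using the standard-library `sqlite3` module. The two-column `obs_demo` table and the `try_insert` helper are hypothetical stand-ins, not the real schema; `evidence_ref` is left nullable here on purpose to show the failure mode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the real table: only the two constrained columns.
conn.execute(
    "CREATE TABLE obs_demo ("
    " evidence_ref TEXT,"           # deliberately nullable for the demo
    " primitive TEXT NOT NULL,"
    " UNIQUE (evidence_ref, primitive))"
)


def try_insert(evidence_ref, primitive):
    """Return True if the row was accepted, False on a unique violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO obs_demo (evidence_ref, primitive) VALUES (?, ?)",
                (evidence_ref, primitive),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ref = "shard:decky-01/ssh/2026-05-02.jsonl#sid-1"  # hypothetical pointer
first = try_insert(ref, "motor.input_modality")
replay = try_insert(ref, "motor.input_modality")    # same session, same primitive
null_a = try_insert(None, "motor.input_modality")
null_b = try_insert(None, "motor.input_modality")   # NULLs never collide
```

The replay is rejected, while both NULL rows are accepted. That second outcome is the duplicate the NOT NULL column rules out.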
**No `_migrate_*` helper.** Pre-v1; `SessionProfile` and its `kd_*`
columns are deleted from `decnet/web/db/models/attackers.py`
outright. DEBT-011 (Alembic) remains deferred.
### Canonical queries
**Latest observation per primitive, for one attacker** (AttackerDetail
"current state" panel):
```sql
SELECT primitive, value, confidence, ts
FROM observations
WHERE attacker_uuid = :uuid
AND ts = (SELECT MAX(ts) FROM observations o2
WHERE o2.attacker_uuid = observations.attacker_uuid
AND o2.primitive = observations.primitive)
ORDER BY primitive;
```
(Neither SQLite nor MySQL supports PostgreSQL's `DISTINCT ON`; a
window-function rewrite is available if the correlated subquery
becomes a hot spot.)
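A possible shape for the window-function rewrite, sketched against an in-memory SQLite table with a reduced column set and made-up rows. It assumes SQLite 3.25+ (or MySQL 8.0+) for `ROW_NUMBER()`; the constant name `LATEST_PER_PRIMITIVE` is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    " attacker_uuid TEXT, primitive TEXT, value TEXT, confidence REAL, ts REAL)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
    [
        ("a1", "motor.input_modality", '"typed"', 0.92, 100.0),
        ("a1", "motor.input_modality", '"pasted"', 0.88, 200.0),
        ("a1", "cognitive.feedback_loop_engagement", '"closed_loop"', 0.80, 150.0),
        ("a2", "motor.input_modality", '"typed"', 0.70, 300.0),
    ],
)

# Rank each attacker's rows per primitive, newest first, and keep rank 1.
LATEST_PER_PRIMITIVE = """
SELECT primitive, value, confidence, ts
FROM (
    SELECT primitive, value, confidence, ts,
           ROW_NUMBER() OVER (PARTITION BY primitive ORDER BY ts DESC) AS rn
    FROM observations
    WHERE attacker_uuid = :uuid
) AS ranked
WHERE rn = 1
ORDER BY primitive
"""

rows = conn.execute(LATEST_PER_PRIMITIVE, {"uuid": "a1"}).fetchall()
```

One behavioural difference worth knowing: when two rows share the same maximum `ts`, the correlated subquery returns both, whereas `ROW_NUMBER()` keeps exactly one of them.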
**Time-series for one primitive across all sessions of one attacker**
(for "is this typist drifting" charts, future):
```sql
SELECT ts, value, confidence
FROM observations
WHERE attacker_uuid = :uuid AND primitive = :primitive
ORDER BY ts;
```
## The session-ended handler — riding the existing profiler worker
```
decnet/profiler/
├── worker.py            EXISTING: gains attacker.session.ended subscription
└── behave_shell/        NEW: pure extraction library (no I/O)
    ├── __init__.py
    └── extract.py       wraps the engine + disk-reach call site
tests/profiler/behave_shell/
├── __init__.py
├── test_extract.py                      unit tests against synthetic event streams
├── test_calibration_grid.py             the five-class regression suite (Phase 5)
├── test_worker_session_ended_bus.py     FakeBus path
└── test_worker_session_ended_poll.py    DECNET_BUS_ENABLED=false path
```
(All tests live under `tests/`, mirroring the source tree per repo
convention. Existing `tests/profiler/test_session_profile.py` is
deleted alongside the `SessionProfile` model in Phase 1.)
**Trigger.** Subscribe to `attacker.session.ended` on the bus. Poll
fallback walks `Log` rows where `event_type='session_recorded'` and
no `observations` row carries the matching `evidence_ref`. Bus path
ships first; poll fallback ships in the same commit so
`DECNET_BUS_ENABLED=false` is supported from day one (DEBT-031
pattern).
**Disk-reach.** For each `(decky, service, sid)`, resolve the shard
via `_find_shard_with_sid` (already shipped, `323077b`). Open the
JSONL via `decnet/artifacts/paths.py:resolve_artifact_path`
(DEBT-047 — symlink-escape check, regex validation,
`ARTIFACTS_ROOT` env override). Slice the per-sid event list and pass
it to the extraction engine.
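For orientation, a rough sketch of the kind of check a symlink-escape guard performs. This is not the code in `decnet/artifacts/paths.py`: the function name `resolve_under_root` and the component regex are assumptions made for illustration. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import re
from pathlib import Path

# Hypothetical validation pattern; the real regex lives in paths.py.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9._-]+")


def resolve_under_root(root: Path, *parts: str) -> Path:
    """Join parts under root, rejecting anything that escapes it."""
    for part in parts:
        if part in {".", ".."} or not _SAFE_COMPONENT.fullmatch(part):
            raise ValueError(f"unsafe path component: {part!r}")
    real_root = root.resolve()
    candidate = real_root.joinpath(*parts).resolve()
    # resolve() follows symlinks, so a link pointing outside the root
    # yields a path that is no longer relative to it.
    if not candidate.is_relative_to(real_root):
        raise ValueError(f"path escapes artifacts root: {candidate}")
    return candidate
```

Validating each component before touching the filesystem keeps traversal strings out of the join; the post-resolve containment check then catches links planted inside the tree.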
**Extraction.** Call
`decnet.profiler.behave_shell.extract_session(events, sid=..., source=...)`.
Receive `Iterable[Observation]`. Each is registry-validated at
construction by BEHAVE's `Observation` subclass; DECNET does not
re-validate.
**Resolve `attacker_uuid`.** Sessrec carries `(decky_name, service,
sid, src_ip, src_port)` per shard line. Resolve src_ip → attacker
via the existing `attackers.ip` index; create-if-missing per the
existing observe path. Stamp `identity_ref=NULL` until attribution
exists.
**Bus emission.** For each observation, **DECNET overrides BEHAVE's
adapter** to preserve sensor-side identifiers across the bus:
```python
# BEHAVE's to_event_payload() excludes id/ts/v because BEHAVE assumes
# the bus envelope carries them at the Event level. DECNET's bus
# (DEBT-029) auto-generates fresh id/ts/v on publish — there's no
# bus.publish overload that accepts envelope-level overrides. Without
# this merge, BEHAVE's id/ts/v would be silently lost, breaking
# cross-host dedup and federation gossip.
payload = to_event_payload(obs) | {"id": obs.id, "ts": obs.ts, "v": obs.v}
bus.publish(
    topic=event_topic_for(obs.primitive),  # 'attacker.observation.motor.keystroke_cadence'
    payload=payload,
)
```
Subscribers reconstructing the envelope via
`from_event_payload(primitive, payload)` see the original BEHAVE id /
ts / v because they ride along in `payload`. The DECNET-bus Event
envelope's *own* id/ts/v (auto-generated) are bus-routing concerns,
distinct from observation identity.
**This is a known deviation from BEHAVE's wire-format docstring**
(`core/decnet_behave_core/spec/envelope.py:77-84`). If DECNET's bus
later grows envelope-level overrides on `publish()`, revert to the
upstream contract. Filed as a low-priority follow-up — not blocking.
Adapter import path is pure-stdlib — no DECNET imports inside BEHAVE.
DECNET is the consumer of BEHAVE's contract, never the other way
around.
**Persistence.** All observations from one session — i.e. one
`(decky, service, sid)` triple — commit as **a single transaction**.
Either the entire session lands in `observations` or none of it
does; partial-failure mid-session never leaves a half-profiled
attacker row.
Persist **first**, then publish to the bus best-effort. Bus is
fire-and-forget (DEBT-029 §6) — a publish failure does **not** roll
back the persisted rows, and a persist failure means nothing is
published. DB is the source of truth; the bus is the notification
layer only. Order matters: a downstream subscriber receiving an
`attacker.observation.*` event can immediately query the table and
find it; the inverse (publish-then-persist) would create a window
where subscribers chase rows that don't exist yet.
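The ordering can be sketched with `sqlite3` and a deliberately broken bus. `FlakyBus`, `persist_then_publish`, and the three-column table are hypothetical stand-ins; the real worker goes through the repository layer and `event_topic_for()`.

```python
import sqlite3


class FlakyBus:
    """Stand-in bus whose publish() always fails (demo only)."""

    def publish(self, topic, payload):
        raise ConnectionError("bus unavailable")


def persist_then_publish(conn, bus, rows):
    """Commit every row of one session atomically, then publish best-effort.

    Returns the number of observations successfully published.
    """
    # `with conn` commits on success and rolls back if any insert raises,
    # so a session is never half-written.
    with conn:
        conn.executemany(
            "INSERT INTO observations (evidence_ref, primitive, value)"
            " VALUES (?, ?, ?)",
            rows,
        )
    published = 0
    for _evidence_ref, primitive, value in rows:
        try:
            bus.publish(f"attacker.observation.{primitive}", {"value": value})
            published += 1
        except Exception:
            # Fire-and-forget: a bus failure never undoes committed rows.
            pass
    return published


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    " evidence_ref TEXT NOT NULL, primitive TEXT NOT NULL, value TEXT NOT NULL,"
    " UNIQUE (evidence_ref, primitive))"
)
```

With the bus down the rows still land; with a failing insert nothing lands and nothing is published, because the exception leaves the function before the publish loop.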
**Idempotency.** Enforced at the schema level by
`UniqueConstraint(evidence_ref, primitive)`. Re-running the worker
on the same shard+sid produces a DB-side conflict per row, which the
worker handles via `INSERT … ON CONFLICT DO UPDATE` (SQLAlchemy
upsert). Worker marks a session "profiled" by the existence of any
row matching its `evidence_ref` — no separate marker column. Because
the unique index makes accidental duplicates structurally
impossible, the marker check is honest.
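A sketch of the upsert and the marker check in raw SQL, assuming SQLite 3.24+ for `ON CONFLICT ... DO UPDATE`. The reduced table and the helper names `upsert_session` / `is_profiled` are illustrative; on MySQL the equivalent statement is `INSERT ... ON DUPLICATE KEY UPDATE`, so the real worker would need a dialect-aware construct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    " evidence_ref TEXT NOT NULL, primitive TEXT NOT NULL,"
    " value TEXT NOT NULL, confidence REAL NOT NULL,"
    " UNIQUE (evidence_ref, primitive))"
)

# On a (evidence_ref, primitive) collision, overwrite with the new values.
UPSERT = """
INSERT INTO observations (evidence_ref, primitive, value, confidence)
VALUES (?, ?, ?, ?)
ON CONFLICT (evidence_ref, primitive)
DO UPDATE SET value = excluded.value, confidence = excluded.confidence
"""


def upsert_session(rows):
    """Write one session's observations in a single transaction."""
    with conn:
        conn.executemany(UPSERT, rows)


def is_profiled(evidence_ref):
    """A session counts as profiled once any row carries its evidence_ref."""
    cur = conn.execute(
        "SELECT 1 FROM observations WHERE evidence_ref = ? LIMIT 1",
        (evidence_ref,),
    )
    return cur.fetchone() is not None
```

Re-running the same session replaces values in place rather than growing the table, which is what lets row existence double as the "profiled" marker.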
## Bus topics
Add to `decnet/bus/topics.py`:
```python
ATTACKER_OBSERVATION_PREFIX = "attacker.observation"
# Wildcard patterns:
# attacker.observation.motor.*
# attacker.observation.cognitive.*
# attacker.observation.> (everything BEHAVE-SHELL emits)
```
Topic shape locked by BEHAVE's `event_topic_for()`; DECNET registers
the prefix for documentation and pattern-matching only. **Bus auth
is not topic-level** — per DEBT-029 §2 the bus uses
kernel-authenticated peer delivery (UNIX socket file permissions),
not topic ACLs. `bus/topics.py` change co-commits with a
wiki-checkout `Service-Bus.md` update (memory rule: "Document new
bus signals in the wiki").
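To make the wildcard comments concrete, here is a small matcher under assumed semantics: `*` matches exactly one dot-separated token and a trailing `>` matches one or more. Whether DECNET's bus implements precisely these rules is an assumption; `topic_matches` is a hypothetical helper, not part of `decnet/bus/`.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Match dot-separated topics; `*` is one token, trailing `>` is one or more."""
    p_tokens = pattern.split(".")
    t_tokens = topic.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # `>` must be the last pattern token and consume at least one token.
            return i == len(p_tokens) - 1 and len(t_tokens) > i
        if i >= len(t_tokens):
            return False
        if p != "*" and p != t_tokens[i]:
            return False
    # Without `>`, pattern and topic must have the same number of tokens.
    return len(p_tokens) == len(t_tokens)
```

Under these assumed rules a bare `attacker.observation.*` matches only a single trailing token, so a subscriber wanting every primitive would use the `>` form.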
## AttackerDetail consumer
### REST surface
`decnet/web/router/attackers/api_get_attacker_detail.py` swaps the
`SessionProfile` join for the latest-per-primitive query above.
Response shape gains:
```jsonc
{
// ... existing attacker fields ...
"observations": [
{
"primitive": "motor.input_modality",
"value": "pasted",
"confidence": 0.91,
"ts": 1714521660.456,
"source": "decnet/profiler/behave_shell/extract.py"
},
// ... one row per primitive observed for this attacker ...
]
}
```
Frontend (`AttackerDetail.tsx`) renders a "Behavioural primitives"
panel grouped by the registry's top-level domain (`motor.*`,
`cognitive.*`, `temporal.*`, `operational.*`, `environmental.*`,
`cultural.*`, `emotional_valence.*`, `toolchain.*`). Day-one render
priorities for the panel:
1. `motor.input_modality` — pasted vs typed vs mixed
2. `cognitive.feedback_loop_engagement` — closed_loop vs fire_and_forget
3. `cognitive.command_branch_diversity` — linear_playbook vs adaptive_branching
4. `cognitive.inter_command_latency_class` — typing_speed / llm_lightweight / llm_heavyweight / long
5. Everything else, alphabetised by primitive path.
The first four are the most discriminative primitives in the
calibration grid; surfacing them first is what unblocks the "is this
the same operator class" hover story.
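The ordering rule is simple enough to pin down in a few lines. The sketch below is in Python for consistency with the rest of this doc (the panel itself is TSX); `DAY_ONE_PRIORITY` and `render_order` are illustrative names.

```python
# Priority primitives, in the order listed above.
DAY_ONE_PRIORITY = [
    "motor.input_modality",
    "cognitive.feedback_loop_engagement",
    "cognitive.command_branch_diversity",
    "cognitive.inter_command_latency_class",
]


def render_order(primitives):
    """Priority primitives first (in list order), the rest alphabetised."""
    rank = {name: i for i, name in enumerate(DAY_ONE_PRIORITY)}
    # Unranked primitives share one bucket after the priority block and
    # fall back to alphabetical order within it.
    return sorted(primitives, key=lambda p: (rank.get(p, len(rank)), p))
```

Primitives missing from an attacker's observations simply do not appear; the function never pads the list.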
### Live-update SSE route
`GET /api/v1/attackers/{uuid}/events` — per-attacker SSE stream,
mirrors the per-topology pattern shipped in DEBT-030.
The route subscribes to `attacker.observation.*` filtered by
`identity_ref` / resolved `attacker_uuid`, plus
`attacker.fingerprint_rotated` / `attacker.scored` for the same
attacker.
Envelope identical to topology events:
`{v, type, ts, payload}`. Day-one event types:
`observation.<primitive>`, `fingerprint.rotated`, `attacker.scored`.
Auth: `?token=` query-param matching the existing per-topology and
`/stream` pattern. Snapshot-on-connect serves the latest-per-primitive
query result so the panel hydrates immediately, then live-forwards
bus events. 15s keepalive, mirrors the topology route.
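A sketch of what the frames on the wire could look like. The `{v, type, ts, payload}` shape comes from this section; the helper names, the compact JSON separators, and the use of an SSE comment line for the keepalive are assumptions.

```python
import json


def sse_frame(event_type: str, ts: float, payload: dict, v: int = 1) -> str:
    """Serialise one {v, type, ts, payload} envelope as an SSE data frame."""
    body = json.dumps(
        {"v": v, "type": event_type, "ts": ts, "payload": payload},
        separators=(",", ":"),
    )
    # A blank line terminates the event.
    return f"data: {body}\n\n"


def sse_keepalive() -> str:
    """Comment frame; SSE clients ignore lines that start with a colon."""
    return ": keepalive\n\n"
```

Snapshot-on-connect would emit one such frame per row of the latest-per-primitive result before switching to live forwarding.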
The global `/stream` is **not** the right fit here — it fans out
every attacker's events to every subscriber, and the AttackerDetail
page only cares about one. Per-attacker route, like
per-topology.
## PII discipline
Binds at the BEHAVE layer; DECNET does not get to "improve" the
envelope by reading raw bodies into payloads.
- Raw `[t,"i",d]` keystroke events stay on disk. Worker reads,
extracts, discards.
- `evidence_ref` is a *pointer* (`shard:path#sid`), never the
evidence itself.
- `value` JSON is bounded by the registry's `ValueTypeSpec` — no
free-form blobs that could smuggle keystrokes.
- Bigram simhashes (when emitted via `cognitive.*` digraph
primitives) are *characters*, not *content* — already documented in
BEHAVE's primitives module.
**Canonical PII binding.** The authoritative statement is the module
docstring at `core/decnet_behave_core/spec/envelope.py:3-19` — it
forbids raw keystrokes, command bodies, credentials, and payload
bytes in observation values; `evidence_ref` is a pointer, never the
evidence. That docstring is binding on this DECNET integration.
*Not* `BEHAVE-SHELL/scratchpad.md` — scratchpads, by definition,
aren't binding policy surfaces.
## Calibration grid IS the regression test
`tests/profiler/behave_shell/test_calibration_grid.py` runs the
**pure engine** (`behave_shell.extract_session()` called directly,
no worker, no bus, no DB) against each of the five
`BEHAVE/prototype_extractors/shell/sessions-2026-05-02*.jsonl`
shards (gitignored — fixture path resolved via
`BEHAVE_CALIBRATION_DIR` env var, skipped if unset). Asserts the
expected primitive set fires per class:
| Shard | Class | Required primitives in output |
|---|---|---|
| `sessions-2026-05-02.jsonl` | HUMAN | `motor.input_modality=typed`, `cognitive.inter_command_consistency=bimodal`, `cognitive.feedback_loop_engagement=closed_loop`, `cognitive.command_branch_diversity=adaptive_branching` |
| `sessions-2026-05-02-with-llm.jsonl` | YOU-sim | `motor.input_modality=pasted`, `motor.paste_burst_rate=occasional`, `cognitive.inter_command_latency_class=typing_speed`, `cognitive.command_branch_diversity=linear_playbook` |
| `sessions-2026-05-02-new.jsonl` | LW-sim | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=llm_lightweight`, `cognitive.command_branch_diversity=linear_playbook` |
| `sessions-2026-05-02-with-claude.jsonl` | CLAUDE-FF | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=llm_heavyweight`, `cognitive.command_branch_diversity=linear_playbook`, `cognitive.feedback_loop_engagement=fire_and_forget` |
| `sessions-2026-05-02-closed-loop.jsonl` | CLAUDE-CL | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=long`, `cognitive.command_branch_diversity=adaptive_branching`, `cognitive.feedback_loop_engagement=closed_loop` |
Any extractor change that breaks one of these classifications fails
CI. The grid is the discriminative-power floor — calibration
refinement can *add* primitives, never silently *drop* them.
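The "add, never drop" rule amounts to a subset check per class. A minimal sketch, with `missing_primitives` as a hypothetical helper and the CLAUDE-FF row copied from the table above:

```python
def missing_primitives(required: dict, observed: dict) -> dict:
    """Return required primitive/value pairs that are absent or differ."""
    return {
        primitive: expected
        for primitive, expected in required.items()
        if observed.get(primitive) != expected
    }


# Required set for the CLAUDE-FF class.
CLAUDE_FF = {
    "motor.input_modality": "pasted",
    "motor.paste_burst_rate": "habitual",
    "cognitive.inter_command_latency_class": "llm_heavyweight",
    "cognitive.command_branch_diversity": "linear_playbook",
    "cognitive.feedback_loop_engagement": "fire_and_forget",
}
```

A test would assert `missing_primitives(CLAUDE_FF, observed) == {}`; extra primitives in `observed` pass untouched, while a dropped or reclassified one shows up by name in the failure message.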
## Phase plan
Per the "commit per task" memory rule, each phase ships as one commit
with its own tests.
### Phase 1 — DECNET-side storage (no BEHAVE coupling yet)
- New `observations` table + SQLModel + repository methods.
- Drop `SessionProfile` + `kd_*` columns from
`decnet/web/db/models/attackers.py`.
- AttackerDetail API switches to the latest-per-primitive query.
Returns empty `observations: []` since nothing populates the table.
- `decnet/bus/topics.py` registers `attacker.observation.*` prefix.
- Tests: SQLModel CRUD, latest-per-primitive query against fixture
rows, empty-attacker contract.
### Phase 2 — DECNET extraction engine (`decnet/profiler/behave_shell/`)
- Production extractor written against the BEHAVE spec, pure library
(no I/O).
- One feature-family module per `_features/{motor,cognitive,temporal,...}.py`.
- Public entry: `extract_session(events, *, sid, source) -> Iterable[Observation]`.
- Tests in `tests/profiler/behave_shell/_features/`: per-feature unit
tests against synthetic event streams. The calibration-grid suite
(Phase 5) is the integration test.
- This phase has its own design surface — see `BEHAVE-EXTRACTOR.md`
(filed as a sibling doc when Phase 1 lands). Phases 1 and 2 are
largely independent and can run in parallel.
### Phase 3 — BEHAVE pin
- `pyproject.toml` pins `decnet-behave-core` and `decnet-behave-shell`
at whatever versions the engine settles on.
- CI install-time smoke: registry imports cleanly, envelope validates
a known-good observation.
### Phase 4 — Wire the trigger into the existing profiler worker
- `decnet/profiler/worker.py` gains an `attacker.session.ended`
subscription handler.
- Handler does: resolve shard via disk-reach → call
`behave_shell.extract_session()` → upsert into `observations` table
→ publish each observation on the bus.
- Poll fallback for `DECNET_BUS_ENABLED=false`.
- Trigger isolation: handler exceptions logged, do not affect the
existing scoring tick.
- Tests in `tests/profiler/behave_shell/`: FakeBus path, poll-only
path, disk-reach error paths, idempotency on re-run.
- **No new systemd unit.** The existing `decnet-profiler.service`
already supervises this code.
### Phase 5 — Calibration regression suite + UI surface
- `tests/profiler/behave_shell/test_calibration_grid.py` against all
five BEHAVE shards.
- New `GET /api/v1/attackers/{uuid}/events` SSE route (mirrors the
per-topology pattern from DEBT-030); snapshot-on-connect +
bus-forwarded `attacker.observation.*` events. Tests in
`tests/api/attackers/test_events_stream.py`.
- AttackerDetail.tsx renders the Behavioural primitives panel and
consumes the SSE route for live updates.
- Frontend Vitest coverage for the panel (DEBT-043 harness, shipped).
### Phase 6 — Live smoke
- Ship a decky, run a real SSH session from each calibration class
manually, disconnect, observe `observations` rows + bus events +
AttackerDetail panel.
- Document the smoke procedure in
`scripts/behave_shell/smoke.sh` (parallel to
`scripts/bus/smoke-mutator.sh` — per-feature dirs).
## Out of scope
Filed for future paydown when they bite. Do not let them creep into
this integration.
- **Attribution engine.** Consumes `attacker.observation.*`, emits
`attribution.profile.candidate.*`. BEHAVE explicitly separates
observation from attribution.
- **Federation gossip** of observations across swarm hosts.
- **Backfill** over historical shards (one-shot script when the
table lands; not a worker feature).
- **Webhook export** of observation streams (rides DEBT-037).
- **Observation retention / vacuum.** Pre-v1, no users to mislead;
filed when storage actually pressures.
- **`SessionProfile` data migration.** None — table ships empty
today, drop is destructive but lossless.
- **Cross-domain BEHAVE** (BEHAVE-TEXT integration for stylometric
analysis of attacker-typed messages, e.g. captured emails). Same
`observations` table will accept those envelopes when their primitive
registry is registered, but the wiring is a separate paydown.
## Resolved decisions (formerly open questions)
- **Q1 — engine location.** RESOLVED: BEHAVE's prototype is reference
code only, never imported by DECNET. The production extraction
engine lives in `decnet/profiler/behave_shell/` as a sublibrary of
the existing profiler worker — no new daemon, no new systemd unit.
(See "BEHAVE is the spec. DECNET is the engine.")
- **Q2 — emission granularity.** RESOLVED: **per-(sid, primitive).**
Every session emits its full primitive set; every emission
persists. The schema already supports it; this just locks in the
worker write loop. *The more detail, the better.*
- **Q3 — cross-session aggregation, day one.** RESOLVED: latest wins
per primitive in the AttackerDetail "current state" query. Simple,
honest, easy to reason about.
## Real open question — Cross-session aggregation, the right way
Q3's "latest wins" is a stopgap. The actual question is harder and
deserves its own design pass before AttackerDetail starts surfacing
attribution-flavoured claims:
> **When two sessions from the same attacker (or identity) emit
> conflicting values for the same primitive, what does the
> attacker-level view say?**
Concrete cases:
- Session A: `motor.input_modality = typed` (conf 0.92).
Session B (next day): `motor.input_modality = pasted` (conf 0.88).
Is this attacker `mixed`? Or did they switch tooling? Or did a
*different operator* take over the same credentialed access?
- `cognitive.feedback_loop_engagement` flips from `closed_loop` to
`fire_and_forget` between two sessions. Is this fatigue, a
handoff (`operational.multi_actor_indicators=handoff_detected`?),
or a script taking over from a human?
- `cognitive.command_branch_diversity = unknown` in a short session
vs `adaptive_branching` in a long session. Latest-wins would
collapse this to `unknown` if the short session lands second —
exactly the wrong answer.
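The third case is easy to reproduce with a toy reducer. `latest_wins` below is a hypothetical stand-in for the day-one query, operating on `(ts, primitive, value)` tuples with made-up timestamps.

```python
def latest_wins(observations):
    """Reduce (ts, primitive, value) tuples to the newest value per primitive."""
    current = {}
    # Sorting by ts first means later observations overwrite earlier ones.
    for _ts, primitive, value in sorted(observations):
        current[primitive] = value
    return current


history = [
    (100.0, "cognitive.command_branch_diversity", "adaptive_branching"),  # long session
    (200.0, "cognitive.command_branch_diversity", "unknown"),             # short session
]
state = latest_wins(history)
```

Here `state` reports `unknown` even though a stronger observation exists, which is the collapse described in the last bullet.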
**This is genuinely an attribution-engine concern**, not an
extraction concern. BEHAVE is firm on that bright line. The clean
answer is:
1. **DECNET stores all observations** (per-sid, per-primitive — Q2).
2. **AttackerDetail's day-one "current state" query is latest-wins**
(Q3) — not because it's right, but because it's *honestly
transparent* about being naïve.
3. **The right answer ships with the attribution engine** as a
separate paydown — likely as new `attribution.profile.*` topics
that emit a *derived* per-attacker primitive map with explicit
merge semantics (`stable` / `drifting` / `conflicted` /
`multi_actor`). Day-zero, that engine doesn't exist; day-one,
AttackerDetail just shows raw latest values + a "N
observations" hover.
Filed as **DEBT-051 — Cross-session BEHAVE primitive aggregation
(attribution engine)** when this doc is reviewed. Out of scope for
this integration; explicitly listed under "Out of scope" above.
---
**Owner:** ANTI.
**Implementation gate:** this doc reviewed → Phase 1 starts.


@@ -277,7 +277,17 @@ The Workers panel (Config → Workers) landed with bus-based STOP but every STAR
**Status:** Open. Depends on the Workers panel (shipped) and `deploy/decnet-bus.service` pattern being extended to the other workers.
### DEBT-036 — Session-profile ingester (keystroke-dynamics extraction from transcript shards)
### DEBT-036 — Session-profile ingester (keystroke-dynamics extraction from transcript shards) — **STALE 2026-05-03, SUPERSEDED BY DEBT-050**
> **Stale.** This entry was drafted before BEHAVE-SHELL existed. It bakes the
> feature schema into hand-rolled `SessionProfile` columns (`kd_iki_mean`,
> `kd_burst_ratio`, …), which duplicates the registry in
> `BEHAVE/BEHAVE-SHELL/decnet_behave_shell/spec/primitives.py`, bypasses the
> registry-validated `Observation` envelope, and skips the bus event adapter
> (`event_topic_for` / `to_event_payload`) that already speaks DECNET's
> `attacker.observation.*` topic shape. The replacement plan is **DEBT-050**
> below. Original text preserved unchanged for context.
**Files:** `decnet/web/ingester.py` (or new sibling under `decnet/session_profiler/`), `decnet/web/db/models/attackers.py:SessionProfile` (table already exists, ships empty), `decnet/templates/_shared/sessrec/sessrec.c` (emitter side — already done), `decnet/web/router/attackers/api_get_attacker_detail.py` (consumer — already joins SessionProfile when present).
The `SessionProfile` SQLModel table has been committed to storage since session recording v1 landed (see `decnet/web/db/models/attackers.py:97-143`). Every column — `kd_iki_mean`, `kd_iki_stdev`, `kd_iki_p50`, `kd_iki_p95`, `kd_enter_latency_p50/p95`, `kd_burst_ratio`, `kd_think_ratio`, `kd_ctrl_backspace/wkill/ukill/abort/eof`, `kd_arrow_rate`, `kd_tab_rate`, `kd_digraph_simhash`, `total_keystrokes`, `session_duration_s` — is nullable by design because the **ingester that populates them does not exist yet** (documented as gap #2 in `SIGNAL_CAPTURE_AUDIT.md`). Every session that gets recorded lands an empty row (or, today, no row at all) while the `[t, "i", d]` event stream in the shard carries every signal those columns exist to capture.
@@ -317,7 +327,83 @@ All four signals fall out of the schema for free. CoV from `kd_iki_mean` + `kd_i
- The motivating-case wget session produces CoV ≈ 0.74 ± 0.05 when the ingester processes it — sanity check against the manual analysis.
- The AttackerDetail page surfaces at least `kd_iki_mean` + `kd_burst_ratio` somewhere in the keystroke-dynamics section, unblocking the "is this the same typist" hover story.
**Status:** Open. Depends on the shard-scan fallback (shipped in `323077b`) and `SessionProfile` schema (shipped with session recording v1). The bus-trigger path depends on DEBT-031's deferred `attacker.session.started/ended` topics, but poll-driven ingestion works today and can ship first.
**Status:** ⚠️ Stale — superseded by DEBT-050. Do not implement against this entry; the column-zoo design is the wrong shape now that BEHAVE-SHELL exists.
### DEBT-050 — BEHAVE-SHELL session-profile ingester worker (replaces DEBT-036)
**Files:** `decnet/session_profiler/worker.py` (**new**), `decnet/web/db/models/observations.py` (**new** — generic Observation table, see Storage), `decnet/web/db/models/attackers.py` (drop `SessionProfile` and its `kd_*` columns), `decnet/web/router/attackers/api_get_attacker_detail.py` (consumer surface — switch from SessionProfile join to per-primitive Observation latest-state query), `decnet/bus/topics.py` (admit `attacker.observation.*` prefix), `decnet/web/db/sqlmodel_repo/observations.py` (**new** — repository methods), `packaging/systemd/decnet-session-profiler.service` (**new**), `pyproject.toml` (pin `decnet-behave-core`, `decnet-behave-shell`), **BEHAVE repo (separate commit):** `BEHAVE/prototype_extractors/shell/extract.py` (refactor `__main__` into importable `extract_session()`).
**Context.** ANTI built BEHAVE — an out-of-tree behavioural-observation framework with its own primitive registry, registry-validated `Observation` envelope, DECNET-bus event adapter, and a five-class calibration grid (HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL). It is the right substrate for keystroke-dynamics extraction; the original DEBT-036 entry predates it and got the schema wrong by inventing parallel columns. BEHAVE is a **separate repo** (mirrors `wiki-checkout` discipline — two repos, two commits per change).
**Design:**
1. **New worker** `decnet/session_profiler/worker.py`. Sibling of `decnet/ingester/`, supervised by a new `packaging/systemd/decnet-session-profiler.service` unit (mirrors DEBT-034's pattern). One process per host, agent-or-master-agnostic.
2. **Trigger.** Subscribe on the bus to `attacker.session.ended`; poll-fallback over `Log.event_type='session_recorded'` rows lacking a "profiled" marker (see Storage). Bus-optional per DEBT-031: `try get_bus(); except: warn-and-degrade-to-poll`.
3. **Disk-reach** (per DEBT-047 precedent). For each `(decky, service, sid)`, resolve the shard via `_find_shard_with_sid` (already shipped in `323077b`), open the JSONL, walk the per-sid event slice. **No raw `d` values cross the worker→bus boundary** — BEHAVE's envelope rules prohibit it, and disk-reach keeps the input stream host-local.
4. **Extraction.** Refactor `BEHAVE/prototype_extractors/shell/extract.py`'s `__main__` into an importable `extract_session(events: Iterable[AsciinemaEvent]) -> Iterable[Observation]`. Feed it the per-sid `[t,"i",d]` slice. Output is a stream of registry-validated `Observation`s, one per primitive that fired for the session. **Refactor lands in the BEHAVE repo as a separate commit** (two repos, two commits).
5. **Bus emission.** For each `obs`: `bus.publish(event_topic_for(obs.primitive), to_event_payload(obs))`. The adapter is pure-stdlib, no DECNET imports — DECNET is the consumer of *its* contract, not the other way around. Topic prefix `attacker.observation.*` registered in `decnet/bus/topics.py`.
6. **Storage — drop `SessionProfile`, new generic `Observation` table.** Schema mirrors the BEHAVE envelope 1:1 so persistence cannot drift from the wire format:
```
observations (
id UUID PRIMARY KEY, -- BEHAVE Observation.id
attacker_uuid UUID NOT NULL FK, -- denormalised from identity_ref or join-resolved
identity_ref UUID NULL, -- raw envelope field, may be null pre-attribution
primitive TEXT NOT NULL, -- 'motor.keystroke_cadence' etc.
value JSON NOT NULL, -- envelope shape; SQLAlchemy JSON not JSONB (memory rule)
confidence REAL NOT NULL,
window_start_ts REAL NOT NULL,
window_end_ts REAL NOT NULL,
source TEXT NOT NULL,
evidence_ref TEXT NULL, -- shard:sid pointer for disk-reach audit, never evidence itself
envelope_v INTEGER NOT NULL, -- BEHAVE Observation.v (currently 1)
ts REAL NOT NULL, -- emission ts
INDEX (attacker_uuid, primitive, ts DESC),
INDEX (primitive, ts DESC)
)
```
AttackerDetail's "current state per primitive" view = `SELECT DISTINCT ON (primitive) … ORDER BY primitive, ts DESC` (or the SQLite equivalent via window function). `SessionProfile` and its `kd_*` columns are dropped outright — pre-v1, no users to mislead, no migration ceremony (DEBT-011 still deferred; just edit the SQLModel).
7. **Packaging.** Pin `decnet-behave-core>=0.1.0,<0.2` and `decnet-behave-shell>=0.1.0,<0.2` in DECNET's `pyproject.toml`. Envelope schema is currently `v=1` (`https://behave.local/schema/observation/v1.json`); the `observations.envelope_v` column tracks it so a future `v=2` envelope can land alongside without a destructive migration. Local dev: `pip install -e ../BEHAVE/core ../BEHAVE/BEHAVE-SHELL`. CI installs the pinned wheels from a BEHAVE release tag — bump the cap when BEHAVE cuts `0.2.0`.
**Non-negotiables:**
- Registry validation is enforced at construction time by BEHAVE's `Observation` subclass — no DECNET-side primitive whitelist, no drift.
- Extractor refactor must keep `extract.py --summary` and the calibration-grid CLI flow working; the library entry-point is *additive*.
- `DECNET_BUS_ENABLED=false` keeps the worker functional in poll-only mode (mirrors DEBT-031).
- Idempotent on re-run: same shard + same sid → same observation set (sort+dedupe by primitive before emitting).
- PII discipline binds at the BEHAVE layer; DECNET does not get to "improve" the envelope by reading raw bodies into payloads.
**Acceptance:**
- Replay each of the five `BEHAVE/prototype_extractors/shell/sessions-2026-05-02*.jsonl` calibration shards through the worker. Each session produces the BEHAVE-SHELL primitives that the README's class-signature column predicts (e.g. CLAUDE-FF: `motor.input_modality=pasted` + `motor.paste_burst_rate=habitual` + `cognitive.inter_command_latency_class=llm_heavyweight` + `cognitive.command_branch_diversity=linear_playbook` + `cognitive.feedback_loop_engagement=fire_and_forget`).
- AttackerDetail surfaces at least `motor.input_modality`, `cognitive.feedback_loop_engagement`, and `cognitive.command_branch_diversity` for any attacker with a profiled session.
- The five-class grid IS the regression test — any extractor change must keep all five sessions classifying within their expected primitive sets.
**Out of scope (defer to DEBT-051+ as they bite):**
- Attribution engine (consumes `attacker.observation.*`, emits `attribution.profile.candidate.*`). BEHAVE deliberately separates observation from attribution.
- Federation gossip of observations across swarm hosts.
- Backfill over historical shards.
- Webhook export of observation streams (rides DEBT-037).
**Status:** Open. Replaces DEBT-036. Depends on (a) BEHAVE-SHELL spec frozen at v0.x, (b) `extract.py` library refactor in the BEHAVE repo, (c) shard-scan fallback (shipped `323077b`).
### DEBT-051 — Cross-session BEHAVE primitive aggregation (attribution engine)
**Files:** `decnet/correlation/attribution/` (**new**), `decnet/web/db/models/attribution_state.py` (**new**), `decnet/bus/topics.py` (`attribution.profile.*` prefix), `decnet/web/router/attackers/api_get_attacker_detail.py` (state-badge wiring).
`BEHAVE-INTEGRATION.md`'s Q3 settled the AttackerDetail "current state" surface as **latest-wins per primitive** for v0 — honest about being naïve. The harder question — *how do conflicting observations across sessions of the same attacker resolve into a stable view?* — is filed here.
Concrete cases:
- Session A says `motor.input_modality = typed`, session B says `pasted`. Mixed? Operator switched tooling? Different operator on shared creds?
- `cognitive.feedback_loop_engagement` flips closed_loop ↔ fire_and_forget across sessions. Fatigue, handoff (`operational.multi_actor_indicators=handoff_detected`), or scripted takeover?
- A short session emits `cognitive.command_branch_diversity=unknown`; a long one emits `adaptive_branching`. Latest-wins would collapse to `unknown` if the short one lands second — exactly the wrong answer.
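
The third case can be reproduced in a few lines. This is only an illustration of the failure mode: `latest_wins` models the v0 query, while `latest_informative` is a hypothetical alternative shown for contrast, not the merge function the attribution engine specifies.

```python
# Each observation is (timestamp, value) for one (attacker, primitive) pair.
Obs = tuple[int, str]


def latest_wins(observations: list[Obs]) -> str:
    """v0 behaviour: the most recent observation decides, whatever it says."""
    return max(observations, key=lambda o: o[0])[1]


def latest_informative(observations: list[Obs]) -> str:
    """Hypothetical alternative: ignore 'unknown' unless nothing else exists."""
    informative = [o for o in observations if o[1] != "unknown"]
    return latest_wins(informative) if informative else "unknown"


history = [
    (1, "adaptive_branching"),  # long session, enough data to classify
    (2, "unknown"),             # short session that lands second
]
```

With this history, `latest_wins` reports `unknown` and discards the earlier, better-supported value, whereas `latest_informative` keeps `adaptive_branching`. Which rule is right depends on the primitive's value kind, which is why the resolution is deferred to the engine.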
**This is genuinely an attribution-engine concern**, not an extraction concern (BEHAVE's bright line is firm on the split). The clean answer:
1. DECNET stores all observations per-(sid, primitive). ✅ Substrate ships in DEBT-050.
2. AttackerDetail's day-one query is latest-wins (Q3 above). ✅ Substrate ships in DEBT-050.
3. The right answer ships as a derived per-(attacker, primitive) state machine emitting `attribution.profile.state_changed` events with explicit merge semantics: `stable / drifting / conflicted / multi_actor / unknown`.
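
A rough sketch of what such a state machine might look like follows. The five state names and the `attribution.profile.state_changed` topic come from the text; everything else is an assumption made for illustration, including the `classify` rules (a single switch counts as drift, repeated switching as conflict), the `handoff_detected` flag, and the event payload shape. The authoritative semantics live in `development/ATTRIBUTION-ENGINE.md`.

```python
from enum import Enum
from typing import Optional


class ProfileState(str, Enum):
    UNKNOWN = "unknown"
    STABLE = "stable"
    DRIFTING = "drifting"
    CONFLICTED = "conflicted"
    MULTI_ACTOR = "multi_actor"


def classify(values: list[str], handoff_detected: bool = False) -> ProfileState:
    """Classify per-session values for one (attacker, primitive), oldest first."""
    informative = [v for v in values if v != "unknown"]
    if not informative:
        return ProfileState.UNKNOWN
    if handoff_detected:  # assumption: a handoff indicator overrides the rest
        return ProfileState.MULTI_ACTOR
    if len(set(informative)) == 1:
        return ProfileState.STABLE
    # Assumption: one clean switch is drift; back-and-forth is conflict.
    changes = sum(1 for a, b in zip(informative, informative[1:]) if a != b)
    return ProfileState.DRIFTING if changes == 1 else ProfileState.CONFLICTED


def state_change_event(
    attacker_uuid: str, primitive: str, old: ProfileState, new: ProfileState
) -> Optional[dict]:
    """Build a bus event only when the derived state actually changes."""
    if old == new:
        return None
    return {
        "topic": "attribution.profile.state_changed",
        "attacker_uuid": attacker_uuid,
        "primitive": primitive,
        "from": old.value,
        "to": new.value,
    }
```

Keeping `classify` a pure function of the stored observations means the derived state can be recomputed from the DEBT-050 substrate at any time, and the event is just a notification that the recomputed value differs from the previous one.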
Full design in `development/ATTRIBUTION-ENGINE.md`. v0 scope: aggregation only over per-`attacker_uuid` proto-identities (sidesteps the still-deferred clusterer from `IDENTITY_RESOLUTION.md`); v1 widens to identity_uuid clustering; v2 federation gossip.
**Status:** Open. Depends on DEBT-050 v0 running in production for ≥ 1 month (so the engine has observation data to merge against) and on a calibration corpus that exercises drift and multi-actor scenarios end-to-end.
### ~~DEBT-035 — Artifacts written as the container uid, not the API's~~ ✅ RESOLVED 2026-05-02
**Files:** `decnet/cli/init.py`, `decnet/web/router/transcripts/api_get_transcript.py` (soft-fail kept as defence-in-depth).
@@ -717,7 +803,9 @@ user who needs it.
| ~~DEBT-032~~ | ✅ | Correlation / Prober | resolved 2026-05-03 |
| DEBT-033 | 🟡 Medium | Storage / Session recording | open |
| ~~DEBT-035~~ | ✅ | Artifacts / Filesystem perms | resolved 2026-05-02 |
| DEBT-036 | ⚠️ Stale | Correlation / Keystroke dynamics | superseded by DEBT-050 |
| DEBT-050 | 🟡 Medium | BEHAVE-SHELL session-profile ingester | open (replaces DEBT-036) |
| DEBT-051 | 🟡 Medium | Attribution engine / cross-session aggregation | open (depends on DEBT-050) |
| DEBT-037 | 🟡 Medium | Integration / Webhooks | open (tracks MVP follow-ups) |
| DEBT-038 | 🟡 Medium | Honeypot / SSH cred capture | open (document-only) |
| ~~DEBT-039~~ | ✅ | Honeypot / Cred emitters | resolved |
@@ -732,5 +820,5 @@ user who needs it.
| DEBT-048 | 🟡 Medium | TTP / Intel provider mapping review (recurring) | open / recurring |
| DEBT-049 | 🟡 Medium | TTP / Sigma adapter (post-v1) | open |
**Remaining open:** DEBT-011 (Alembic), DEBT-027 (Dynamic bait store), DEBT-028 (deploy endpoint tests), DEBT-033 (transcript shard rotation), DEBT-037 (webhook delivery hardening), DEBT-038 (SSH PAM cred-capture limitations — document-only), DEBT-045 (EmailLifter heavyweight — partial paid; carved-out follow-ups remain), DEBT-048 (TTP intel provider mapping review — recurring quarterly), DEBT-049 (TTP Sigma adapter — post-v1), DEBT-050 (BEHAVE-SHELL session-profile ingester — replaces DEBT-036), DEBT-051 (attribution engine / cross-session aggregation). DEBT-036 is stale.
**Estimated remaining effort:** ~21 hours plus the new EmailLifter / TTP follow-ups. DEBT-030 Phase B (optimistic staged-buffer editor) is a follow-up, not debt.