docs(behave): integration + extractor + attribution design (DEBT-050 / 051)
Three sibling design docs plus DEBT.md updates that supersede the stale DEBT-036 with a BEHAVE-aligned plan.

development/BEHAVE-INTEGRATION.md — five-phase rollout: storage (observations table mirroring the BEHAVE Observation envelope plus one DECNET-side denorm; UniqueConstraint(evidence_ref, primitive) enforcing idempotency); engine (in decnet/profiler/behave_shell/ sublibrary, no new daemon, not in BEHAVE — DECNET is the engine); BEHAVE pin; worker wire; UI panel + per-attacker SSE route; live smoke. Bus payload merges id/ts/v back in to preserve sensor identifiers across the bus envelope.

development/BEHAVE-EXTRACTOR.md — engine route in eight phases (A–H). Phase A locks the 6-primitive calibration grid; Phases B–G expand horizontally; Phase H is the full Tier-A corpus + v0 release. v0 ships every shell-extractable primitive (37 of them); Tier B is cross-session and lives in the attribution engine; Tier C is network-domain (toolchain.*) and lives elsewhere.

development/ATTRIBUTION-ENGINE.md — sublibrary inside decnet/correlation/ that consumes attacker.observation.* events and emits attribution.profile.* derived state. Five-state machine (unknown / stable / drifting / conflicted / multi_actor) with per-ValueKind merge functions. v0 closes DEBT-051; v1 adds the real clusterer; v2 federation gossip. The bright line forbidding attribution to natural persons is lifted directly from BEHAVE's envelope docstring.

development/DEBT.md — DEBT-036 marked STALE; DEBT-050 and DEBT-051 entries added; summary table + open list updated.
572
development/ATTRIBUTION-ENGINE.md
Normal file
@@ -0,0 +1,572 @@
# Attribution Engine — Design

**Status:** pre-implementation. This doc is the spec; code follows.
**Tracks:** DEBT-051 (cross-session BEHAVE primitive aggregation —
named in `BEHAVE-INTEGRATION.md`).
**Depends on:** `IDENTITY_RESOLUTION.md` (substrate shipped — table,
FK, lifecycle topics), `BEHAVE-INTEGRATION.md` (observation
producer), `DEBT-032` (fingerprint rotation, shipped).
**Engine home:** this repo, `decnet/correlation/attribution/`
(sublibrary inside the existing correlation worker — no new daemon).

## Premise

DECNET has three layers stacked above raw events. After
`BEHAVE-INTEGRATION.md` ships, we have:

| Layer | What it stores | What it knows |
|---|---|---|
| **Observation** | `observations` table, one row per (sid, primitive) | "I saw value V for primitive P, sourced from session S, at time T, with confidence C." |
| **Attacker** | `attackers` table, one row per source IP | "These observations all came from IP X." |
| **Identity** | `attacker_identities` table (empty today — `IDENTITY_RESOLUTION.md`) | "These N attacker rows are the same hands." |

BEHAVE *emits*. Attackers are *observed*. The attribution engine is
the layer that **concludes** — it links observations into identities
and surfaces a per-identity primitive map with explicit merge
semantics. This doc specifies it.

## The bright line — lifted from BEHAVE, binding here

The BEHAVE envelope module docstring
(`core/decnet_behave_core/spec/envelope.py:20-26`) draws an explicit
bright line:

> Explicitly NOT for: identity attribution to named natural persons;
> access or admission decisions; biometric login; ML-driven user
> identification. Those framings push into legal/ethics territory the
> project will not walk into by accident.

That binding statement carries forward. The attribution engine:

- **Links observations to opaque identity UUIDs**, never to named
  persons.
- **Emits probabilistic linkage**, never certainty.
- **Does not gate access** to anything — it's an analytics surface.
- **Does not output classifier verdicts** about "good" vs "bad"
  operators; it surfaces *behavioural coherence* (these observations
  cluster) and *behavioural drift* (this identity's primitives are
  changing), and stops there.

Crossing this line is grounds for ripping the engine out and
starting over.

## What the engine IS, what it IS NOT

| IS | IS NOT |
|---|---|
| A clusterer + state machine over BEHAVE observations | A keystroke-dynamics extractor (that's the engine in `BEHAVE-EXTRACTOR.md`) |
| The thing that writes `attacker_identities` rows | The thing that decides whether to block/alert/page on an attacker |
| The producer of `attribution.profile.*` events | The producer of `attacker.observation.*` events |
| Honest about uncertainty (every claim carries a confidence) | A binary classifier with an arbitrary threshold |
| Replayable / deterministic given the same observation sequence | A black-box ML model |

## Architectural placement

```
/home/anti/Tools/DECNET/
├── decnet/correlation/              EXISTING worker — gains a sublibrary + a new trigger
│   ├── worker.py                    gains attacker.observation.* subscription
│   ├── fingerprint_rotation.py      UNCHANGED — already shipped (DEBT-032)
│   └── attribution/                 NEW — pure attribution library
│       ├── __init__.py              exposes link_observation(), aggregate_identity()
│       ├── linkage.py               "which identity does this observation belong to?"
│       ├── aggregate.py             per-(identity, primitive) merge state machine
│       ├── _signals/                per-signal scorers (jarm, hassh, kd, c2, ip)
│       └── _thresholds.py           named constants, calibration-cited
└── decnet/web/db/models/
    ├── attacker_identities.py       EXISTING (IDENTITY_RESOLUTION.md substrate)
    └── attribution_state.py         NEW — per-(identity, primitive) state rows
```

**No new worker.** The existing `decnet-correlation.service`
supervises this codepath. The correlation worker already owns
cross-attacker reasoning (DEBT-032 fingerprint rotation lives there).
Attribution is a natural peer.

**Audit finding (correlation vs profiler).** Profiler emits
observations per-session (BEHAVE-SHELL extraction). Correlation
consumes observations across sessions and decides identity. Two
roles, two workers, clean cut. **Don't mix them.**

## Two responsibilities, kept separate

The engine has **two axes of work**, often confused:

### Axis 1 — Linkage

> "This new observation arrived. Which identity does it belong to?"

Inputs: one observation (just arrived) + the existing identity table.
Output: one of {`assign-to-existing(uuid)`, `create-new()`,
`defer(reason)`}.

Lives in `attribution/linkage.py`. Reads
`attacker.observation.*` events; writes `attacker_identities` rows
and `attackers.identity_id` FK; emits `identity.formed` /
`identity.observation.linked` (existing topics from
`IDENTITY_RESOLUTION.md`).

### Axis 2 — Aggregation

> "Given an identity's full observation history, what's the
> per-primitive summary I should surface to AttackerDetail /
> IdentityDetail?"

Inputs: all observations linked to one identity. Output: a
per-primitive state map: `{primitive: (current_value, state, confidence, dispersion)}`
where `state ∈ {stable, drifting, conflicted, multi_actor, unknown}`.

Lives in `attribution/aggregate.py`. Pure function — given the same
observation set, returns the same state map (replayability is
non-negotiable).

**These two axes are separable.** v0 ships **aggregation only** (over
single-`attacker_uuid` proto-identities), which closes DEBT-051. v1 adds
linkage (real clustering across attacker_uuids). v2 adds federation.
This ordering is deliberate — aggregation has narrower failure modes
and doesn't require the linkage signals to be calibrated yet.

## v0 / v1 / v2 ladder

### v0 — Aggregation over per-attacker proto-identities

The substrate of `IDENTITY_RESOLUTION.md` ships empty: every
`attackers` row has `identity_id = NULL`. No clusterer means no
identity rows. v0 sidesteps this honestly: **treat each
`attacker_uuid` as its own proto-identity** and aggregate
observations over it.

What v0 delivers:
- Per-(attacker_uuid, primitive) merge state machine.
- New `attribution_state` table holding the derived state.
- New `attribution.profile.*` bus topics emitting state transitions.
- AttackerDetail's "current state" panel gains state badges
  (`stable / drifting / conflicted`) replacing today's naïve
  latest-wins surface from `BEHAVE-INTEGRATION.md` Q3.

What v0 does NOT do:
- No clustering across IPs.
- No identity rows ever populated.
- `IdentityDetail.tsx` (already built per `IDENTITY_RESOLUTION.md`)
  stays unreached — there are no identities yet.

**v0 closes DEBT-051.** That's the explicit scope.

### v1 — Linkage (real clustering)

What changes:
- Clusterer subscribes to high-confidence rotation-resistant signals
  (HASSH, payload simhashes, keystroke-dynamics simhash,
  C2 callbacks) and groups `attacker_uuid`s under
  `attacker_identities.uuid`.
- v0's aggregation engine retargets from `attacker_uuid` to
  `identity_uuid` once a cluster forms.
- `identity.formed` / `identity.observation.linked` /
  `identity.merged` (existing topics) start firing.
- IdentityDetail.tsx starts seeing rows.

What v1 does NOT do:
- No federation. Cluster decisions are master-local.
- No retroactive observation re-linking once an identity is committed
  (that's a v1.5 problem: "stable" identities should be hard to
  un-link silently).

### v2 — Federation gossip

What changes:
- Identities + their primitive-state maps gossip over the existing
  swarm mTLS infra to peer masters.
- `schema_version` field on `attacker_identities`
  (`IDENTITY_RESOLUTION.md` Risk #3) becomes load-bearing.
- Trust model is **social**, not cryptographic
  (memory rule: federation trust is invite-based/human).

Out of scope for this doc beyond noting it exists. Federation gets
its own design pass.

---

## v0 design — Aggregation state machine

The whole reason DEBT-051 was filed. This is the load-bearing piece.

### State definitions

For each `(attacker_uuid, primitive)` pair, the engine maintains a
state from this set:

| State | Meaning | When to assert |
|---|---|---|
| `unknown` | Insufficient observations to classify | Default; < 3 observations OR all-`unknown` values |
| `stable` | Recent observations agree | Last N observations all share the same value |
| `drifting` | Recent observations disagree with older | Recent N != older N, but recent N is internally consistent |
| `conflicted` | Recent observations disagree with each other | Recent N is split (no majority) |
| `multi_actor` | Strong signal that two operators share access | Conflicted + alternation pattern (operator A → B → A → B), not random flip |

### Per-primitive merge logic

The engine carries a per-`ValueKind` merge function. Categorical
primitives dominate the calibration grid; numeric and hash primitives
need different math:

#### Categorical (`motor.input_modality`, `cognitive.feedback_loop_engagement`, etc.)

Last-N window comparison. With `N = 5` (configurable in
`_thresholds.py`):

```python
from collections import Counter

N = 5        # window size, from _thresholds.py
MIN_OBS = 3  # fewer observations than this is `unknown` (state table)


def categorical_state(values: list[str]) -> str:
    """Classify one (attacker_uuid, primitive) history, oldest value first."""
    if len(values) < MIN_OBS:
        return "unknown"
    recent, older = values[-N:], values[-2 * N:-N]  # older may be empty or partial
    top_value, top_count = Counter(recent).most_common(1)[0]
    if top_count == len(recent):  # recent window is unanimous
        if older and len(set(older)) == 1 and older[0] != top_value:
            return "drifting"
        return "stable"  # agrees with older, or no consistent older window to compare
    if 2 * top_count > len(recent):
        return "stable"  # tolerant, e.g. one outlier in five is fine
    return "conflicted"
```

`multi_actor` triggers on conflicted + temporal alternation
(operator A and B observations interleave at session-level granularity,
not just within one session). Lower-confidence detection;
v0 emits at confidence ≤ 0.6 by design.

#### Numeric (`toolchain.c2.beacon_interval_ms`, etc.)

EWMA + dispersion. State = `stable` if dispersion < 20% of mean,
`drifting` if mean shifts > 30% over recent window, `conflicted`
if dispersion > 100%.

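The numeric rule above can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the names `ewma` and `numeric_state`, the smoothing factor `alpha`, the use of the coefficient of variation as "dispersion", the older-half versus recent-half comparison for "mean shift", and the fallback to `unknown` for the band the rule leaves unspecified are all hypothetical.

```python
from statistics import mean, pstdev

# The three percentages come from the rule above; the rest is assumed.
STABLE_MAX_DISPERSION = 0.20
DRIFT_MIN_MEAN_SHIFT = 0.30
CONFLICT_MIN_DISPERSION = 1.00


def ewma(values: list[float], alpha: float = 0.3) -> float:
    """Exponentially weighted moving average, oldest value first."""
    acc = values[0]
    for v in values[1:]:
        acc = alpha * v + (1 - alpha) * acc
    return acc


def numeric_state(values: list[float]) -> tuple[float, str]:
    """Return (smoothed current value, state) for one numeric history."""
    if len(values) < 3:
        return (values[-1] if values else 0.0), "unknown"
    m = mean(values)
    dispersion = pstdev(values) / abs(m) if m else float("inf")
    half = len(values) // 2
    older_mean, recent_mean = mean(values[:half]), mean(values[half:])
    shift = (abs(recent_mean - older_mean) / abs(older_mean)
             if older_mean else float("inf"))
    current = ewma(values)
    if dispersion > CONFLICT_MIN_DISPERSION:
        return current, "conflicted"
    if shift > DRIFT_MIN_MEAN_SHIFT:
        return current, "drifting"
    if dispersion < STABLE_MAX_DISPERSION:
        return current, "stable"
    return current, "unknown"  # 20-100% dispersion is not covered by the rule
```

The order of the checks matters: a wildly dispersed series is reported as `conflicted` before any mean shift is considered, which is one possible reading of the rule rather than a settled decision.
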
#### Hash (`toolchain.tls.jarm_server`, `toolchain.ssh.hassh_client`)

Already handled by DEBT-032 fingerprint rotation. Attribution engine
*reads* `attacker.fingerprint_rotated` events, doesn't recompute.
State = `stable` if no rotation, `drifting` if 1-2 rotations,
`conflicted` if > 2 rotations in a tight window.

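A minimal sketch of the rotation-count mapping described above. The window length `TIGHT_WINDOW_S` and the function name `hash_state` are hypothetical, since the doc does not define "tight". Treating more than two rotations spread outside the window as `drifting` is also an assumption.

```python
TIGHT_WINDOW_S = 3600.0  # hypothetical; "tight window" is not defined in this doc


def hash_state(rotation_ts: list[float], now: float) -> str:
    """Map rotation timestamps for one hash primitive to a state."""
    if not rotation_ts:
        return "stable"  # no rotation observed
    in_window = [t for t in rotation_ts if now - t <= TIGHT_WINDOW_S]
    if len(in_window) > 2:
        return "conflicted"  # more than two rotations inside the window
    return "drifting"  # assumption: any other rotation history counts as drift
```

The function only counts timestamps; it never recomputes a fingerprint, which matches the "reads, doesn't recompute" constraint.
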
### Storage — the `attribution_state` table

Materialised view of the state machine. Re-derivable from
`observations` + DEBT-032's rotation log; this table is a cache for
cheap reads, not a source of truth.

```python
# decnet/web/db/models/attribution_state.py
from typing import Any
from uuid import UUID

from sqlalchemy import JSON, Column, Index
from sqlmodel import Field, SQLModel


class AttributionStateRow(SQLModel, table=True):
    __tablename__ = "attribution_state"

    # ── key ────────────────────────────────────────────────
    attacker_uuid: UUID = Field(foreign_key="attackers.uuid", primary_key=True)
    primitive: str = Field(primary_key=True)

    # ── derived state ──────────────────────────────────────
    current_value: dict[str, Any] | str | int | float | bool | list = \
        Field(sa_column=Column(JSON, nullable=False))
    state: str  # 'stable' | 'drifting' | 'conflicted' | 'multi_actor' | 'unknown'
    confidence: float  # engine's confidence in the state assertion (not in any verdict)
    observation_count: int  # how many observations underlie this state
    last_change_ts: float  # when state last flipped
    last_observation_ts: float  # most recent observation that fed this row

    # ── audit ──────────────────────────────────────────────
    schema_version: int = 1  # for federation, mirrors AttackerIdentity convention
    updated_at: float

    __table_args__ = (
        Index("ix_attribution_state_state", "state"),
        Index("ix_attribution_state_last_change", "last_change_ts"),
    )
```

`(attacker_uuid, primitive)` is the composite PK — at most one state
row per pair. v1 will rename `attacker_uuid` to a polymorphic
`subject_uuid` keyed on either attackers or identities (deferred —
don't pre-build the polymorphism before clustering ships).

### Bus topics

New, distinct from `IDENTITY_RESOLUTION.md`'s `identity.*` lifecycle
topics:

| Topic | Payload | When |
|---|---|---|
| `attribution.profile.state_changed` | `{attacker_uuid, primitive, old_state, new_state, current_value, confidence, ts}` | State transitions (e.g. `stable` → `drifting`) |
| `attribution.profile.multi_actor_suspected` | `{attacker_uuid, primitives: [], evidence_summary, confidence, ts}` | When ≥ 2 primitives independently signal `multi_actor`; correlation is the trigger, not any single primitive |

`identity.*` topics from `IDENTITY_RESOLUTION.md` stay reserved for
v1 (clusterer-emitted lifecycle events). v0 doesn't touch them.

**Wiki:** `Service-Bus.md` documents these in the same commit that
adds the constants (`feedback_wiki_bus_signals`).

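One way to keep `attribution.profile.state_changed` honest about "transitions only" is to build the payload through a helper that returns nothing when the state did not move. The sketch below assumes hypothetical names (`StateChanged`, `transition_event`, `TOPIC_STATE_CHANGED`); only the topic string and the payload field names come from the table above, and the actual bus API is not shown.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional

TOPIC_STATE_CHANGED = "attribution.profile.state_changed"


@dataclass(frozen=True)
class StateChanged:
    """Payload fields as listed in the bus topics table."""
    attacker_uuid: str
    primitive: str
    old_state: str
    new_state: str
    current_value: Any
    confidence: float
    ts: float


def transition_event(attacker_uuid: str, primitive: str, old_state: str,
                     new_state: str, current_value: Any, confidence: float,
                     ts: float) -> Optional[tuple[str, dict]]:
    """Return (topic, payload) on a real transition, None otherwise."""
    if old_state == new_state:
        return None  # same state: nothing to publish
    payload = StateChanged(attacker_uuid, primitive, old_state, new_state,
                           current_value, confidence, ts)
    return TOPIC_STATE_CHANGED, asdict(payload)
```

A caller would publish the returned pair on whatever bus client the worker already holds, and skip publishing when the helper returns `None`.
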
### API surface

```
GET /api/v1/attackers/{uuid}/attribution
→ {
    "primitives": [
      {
        "primitive": "motor.input_modality",
        "current_value": "pasted",
        "state": "stable",
        "confidence": 0.91,
        "observation_count": 7,
        "last_change_ts": 1714521660.456
      },
      ...
    ]
  }
```

AttackerDetail.tsx merges this with the latest-per-primitive query
from `BEHAVE-INTEGRATION.md`. The state badge is the new bit.

The SSE route from `BEHAVE-INTEGRATION.md`
(`GET /api/v1/attackers/{uuid}/events`) gains forwarded
`attribution.profile.state_changed` events so the badge updates live.

---

## Linkage signals (v1 — not v0)

For when v0 is stable and we promote attacker_uuid → identity_uuid.
Documented here so v0 doesn't paint into a corner.

### Signal weights

Each signal contributes to a linkage score. Two `attacker_uuid`s
with combined score above the threshold get clustered.

| Signal | Strength | Why | Cost |
|---|---|---|---|
| Same `kd_digraph_simhash` (Hamming distance < 8) | **STRONG** | Keystroke rhythm is hard to fake without effort | Computed at session-end by BEHAVE engine |
| Same C2 callback endpoint | **STRONG** | Operator infra is sticky | Already extracted |
| Same `hassh_client` | MEDIUM | Tools change less than IPs | Already in `attacker_behavior` |
| Same `jarm_server` (if attacker exposes services) | MEDIUM | Probed-attacker substrate (DEBT-032) | Already shipped |
| Same `tcp_fingerprint` cluster | WEAK | OS info, easily collided | Already in `attacker_behavior` |
| Same source IP | **REJECT** | Triggers naïvely on NAT collisions; never use IP alone | n/a |

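The table gives qualitative strengths, not numbers. The sketch below shows how a Hamming-distance check and a combined score could fit together. Only the `< 8` Hamming bound and the ~0.85 merge floor (from the Threshold section) come from this doc; the numeric weights, the signal keys, and the function names are hypothetical placeholders that the calibration cycle would replace.

```python
HAMMING_MAX = 8     # "Hamming distance < 8" from the table
MERGE_FLOOR = 0.85  # approximate floor named in the Threshold section

# Hypothetical weights: STRONG = 0.5, MEDIUM = 0.25, WEAK = 0.1.
# Source IP is deliberately absent, so it can never contribute.
SIGNAL_WEIGHTS = {
    "kd_digraph_simhash": 0.5,
    "c2_endpoint": 0.5,
    "hassh_client": 0.25,
    "jarm_server": 0.25,
    "tcp_fingerprint": 0.1,
}


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer simhashes."""
    return bin(a ^ b).count("1")


def simhash_match(a: int, b: int) -> bool:
    return hamming(a, b) < HAMMING_MAX


def linkage_score(matched_signals: set[str]) -> float:
    """Sum the weights of matched signals, capped at 1.0; unknown keys score 0."""
    return min(1.0, sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in matched_signals))


def should_merge(matched_signals: set[str]) -> bool:
    return linkage_score(matched_signals) >= MERGE_FLOOR
```

With these placeholder weights, two STRONG signals clear the floor while one STRONG plus one MEDIUM does not, and a bare source-IP match scores zero.
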
### Threshold

Single combined score, calibrated against:
- **False merges**: two distinct attackers collapsed into one (silent
  miscount). HARD failure — engine refuses to merge below ~0.85.
- **Missed merges**: observations from the same operator left
  unlinked. Soft failure — operator can review unmerged candidates
  in IdentityDetail's "candidate links" panel and merge manually.

The threshold lives in `_thresholds.py` like the BEHAVE-SHELL
engine's; calibration cycle ships with the linkage code.

### Soft-merge audit trail

`attacker_identities.merged_into_uuid` already exists from
`IDENTITY_RESOLUTION.md`. v1 uses it. When the clusterer reverses an
earlier merge (rare but real), the loser row's `merged_into_uuid` is
NULLed and an `attribution.profile.split_proposed` event surfaces in
the operator's review queue.

---

## Phase plan

Per the "commit per task" + "tests per task" memory rules. Each
phase is one commit.

### Phase 1 — Schema + topics + empty handler

- New `attribution_state` SQLModel + migration (none needed pre-v1,
  per the memory rule — just edit the model).
- `decnet/bus/topics.py` registers `attribution.profile.*` prefix.
- `decnet/correlation/worker.py` gains an
  `attacker.observation.*` subscription handler that does
  **nothing yet** — just logs. Proves the wiring.
- Wiki `Service-Bus.md` update co-commits.
- Tests: SQLModel CRUD on `attribution_state`, bus subscription
  handler is exercised by FakeBus.

Commit: `feat(correlation/attribution): substrate + idle handler`.

### Phase 2 — Categorical merge function

- `attribution/aggregate.py:_aggregate_categorical(observations) → (value, state, confidence)`.
- Implements the last-N comparison logic above.
- Pure function. Synthetic-input tests covering each state transition
  (unknown → stable → drifting → stable, conflicted, multi_actor).
- No DB, no bus, no I/O.

Commit: `feat(correlation/attribution): categorical merge state machine`.

### Phase 3 — Hash + numeric merge functions

- `_aggregate_hash` reads `attacker_fingerprint_rotation` events
  (DEBT-032 already produces them).
- `_aggregate_numeric` does EWMA + dispersion.
- Per-`ValueKind` dispatcher in `aggregate.py` picks the right
  function.
- Tests for each value-kind path.

Commit: `feat(correlation/attribution): hash + numeric merge functions`.

### Phase 4 — Wire into the worker

- Subscription handler reads each `attacker.observation.*` event,
  loads the prior `AttributionStateRow` (if any), runs the merger,
  upserts the new state, emits `attribution.profile.state_changed`
  on transition.
- Trigger isolation: handler exceptions logged, do not affect
  fingerprint-rotation or any other correlator path.
- Tests: end-to-end with FakeBus + in-memory DB, observation-in →
  state-row-out + transition-event-out.

Commit: `feat(correlation/attribution): wire bus handler, persist state`.

### Phase 5 — `multi_actor_suspected` cross-primitive correlator

- Periodic tick (every 60s default — configurable) walks
  `attribution_state` rows where `state = 'multi_actor'`, groups by
  `attacker_uuid`, fires
  `attribution.profile.multi_actor_suspected` if ≥ 2 primitives flag
  the same attacker_uuid concurrently.
- Tests: synthetic state rows, assert event fires only on co-flag.

Commit: `feat(correlation/attribution): cross-primitive multi-actor detection`.

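The co-flag rule in Phase 5 reduces to a group-by over state rows. A minimal sketch, assuming rows arrive as plain `(attacker_uuid, primitive, state)` tuples and using the hypothetical name `multi_actor_suspects`; the periodic tick and the bus publish are left out.

```python
from collections import defaultdict


def multi_actor_suspects(rows: list[tuple[str, str, str]],
                         min_primitives: int = 2) -> dict[str, list[str]]:
    """Return {attacker_uuid: [primitives]} for attackers flagged on enough primitives."""
    flagged: dict[str, list[str]] = defaultdict(list)
    for attacker_uuid, primitive, state in rows:
        if state == "multi_actor":
            flagged[attacker_uuid].append(primitive)
    return {
        uuid: sorted(primitives)
        for uuid, primitives in flagged.items()
        if len(primitives) >= min_primitives
    }
```

An attacker flagged on a single primitive is dropped from the result, which is the "fires only on co-flag" behaviour the Phase 5 test is meant to pin down.
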
### Phase 6 — API surface

- `GET /api/v1/attackers/{uuid}/attribution` route + Pydantic model.
- AttackerDetail.tsx renders state badges per primitive in the
  Behavioural Primitives panel.
- SSE route forwarding `attribution.profile.state_changed` events
  filtered by attacker_uuid.
- Frontend Vitest coverage.

Commit: `feat(web): expose attribution state on AttackerDetail`.

### Phase 7 — v0 lockdown

- Synthetic calibration scenarios (extending the BEHAVE-SHELL
  calibration grid concept):
  - "Stable HUMAN over 7 sessions" → all primitives `stable`
  - "HUMAN switches to LLM mid-week" → primitives flip
    `stable` → `drifting`
  - "Two operators alternating on shared creds" → ≥ 2 primitives
    flag `multi_actor`
  - "Single short session" → all primitives `unknown`
- All four scenarios green in CI.

Commit: `test(correlation/attribution): v0 calibration lockdown`.

---

## Out of scope

Filed for future paydown when they bite. Do not let them creep into
v0.

- **Linkage / clustering across attacker_uuids.** That's v1.
- **Federation gossip of identities.** That's v2.
- **Identity-level intel** (`attacker_identity_intel` from
  `IDENTITY_RESOLUTION.md`). Different lifecycle, ships with v1.
- **Manual operator merge UI.** Operators can't fix clusterer
  mistakes from the dashboard — the read-only API stays read-only
  in v0. Editable identity rows are a v1 concern.
- **Retroactive re-aggregation** when thresholds change. v0
  recomputes lazily on next observation per attacker; no batch
  re-walk.
- **Confidence calibration against ground truth.** No ground-truth
  data exists yet. v0 confidence values are heuristic; calibration
  ships when red-team exercises produce labelled trace data.
- **Persona-classification** (e.g. "this identity behaves like a
  bot"). The bright line forbids this. State machine emits
  *coherence* and *drift*, not classifier labels.

## Resolved decisions

- **Where the engine lives.** RESOLVED:
  `decnet/correlation/attribution/`, sublibrary inside the existing
  correlation worker. No new daemon. Symmetric with BEHAVE-SHELL's
  placement under `decnet/profiler/behave_shell/`.
- **Linkage vs aggregation separation.** RESOLVED: two axes, two
  modules (`linkage.py` / `aggregate.py`). v0 ships aggregation
  only.
- **Topic namespace.** RESOLVED: `attribution.profile.*` for
  derived state, distinct from `IDENTITY_RESOLUTION.md`'s
  `identity.*` lifecycle topics. The two namespaces compose; they
  don't overlap.
- **State machine vocabulary.** RESOLVED:
  `unknown / stable / drifting / conflicted / multi_actor`.
  Five states, no more (resist the urge to grow the enum).
- **Subject of attribution in v0.** RESOLVED: `attacker_uuid`,
  not `identity_uuid`. v1 widens.

## Real open questions

These are not stoppers for v0 but need answers before the engine
ships beyond v0.

1. **`multi_actor` false-positive cost.** A flapping primitive can
   look like multi-actor when it's really an operator on a flaky
   network or split between phone/laptop. v0's confidence ≤ 0.6 cap
   helps but doesn't eliminate it. Open: what's the operator-facing
   UX for a `multi_actor` claim that's wrong?
2. **Window size `N`.** v0 fixes `N=5` as a constant in
   `_thresholds.py` for last-N comparison.
   This is calibrated against typical session counts (most attackers
   are observed < 10 times before they go quiet). Operators with
   long-running attackers (resident threats) may want a wider
   window; needs an operator-facing config knob in v1.
3. **Primitive-weight asymmetry.** Today every primitive contributes
   equally to the implicit "is this attacker behaviourally stable?"
   summary. But `motor.input_modality` is far more discriminative
   than `temporal.weekend_cadence`. Open: do we expose primitive
   weights in the API, or just sort by confidence?
4. **Observation-to-row contention.** A burst of observations for
   the same `(attacker_uuid, primitive)` pair (e.g. a long session
   with 50 sub-observations) hits the same row 50 times. v0 reads
   the row, runs the merger, writes back — under load this is a
   serialised hot path. Open: should the merger batch-process within
   one tick, or is per-observation latency cheap enough?
5. **What happens to `attribution_state` rows when an
   `attacker_uuid` is deleted?** No `attackers` deletion path
   exists today, but if/when one ships (GDPR purge, federation
   resync), `ON DELETE CASCADE` is the obvious choice. File when it
   matters.

---

## Implementation order checklist

A single page you can paste into a TODO and tick off:

- [ ] Phase 1 — Schema + topics + idle handler
- [ ] Phase 2 — Categorical merge function (pure, no I/O)
- [ ] Phase 3 — Hash + numeric merge functions
- [ ] Phase 4 — Wire bus handler, persist state
- [ ] Phase 5 — `multi_actor_suspected` cross-primitive correlator
- [ ] Phase 6 — API + AttackerDetail badges + SSE forwarding
- [ ] Phase 7 — v0 calibration scenarios lockdown

Seven commits, seven test sets. v0 closes DEBT-051 and gives
operators an honest "is this attacker behaviourally stable, drifting,
or showing multiple operators?" surface — without crossing the
attribution-of-natural-persons bright line.

After v0, v1 (linkage / clustering) is gated on:
- v0 stable in production for ≥ 1 month
- ≥ 1 high-discrimination linkage signal calibrated
  (keystroke-dynamics simhash from BEHAVE-SHELL is the obvious
  candidate; v1 of the BEHAVE engine adds it post-step-10)

---

**Owner:** ANTI.
**Implementation gate:** this doc reviewed → Phase 1 starts after
`BEHAVE-INTEGRATION.md` v0 is live (observation table populated +
worker emitting `attacker.observation.*` events).

702
development/BEHAVE-EXTRACTOR.md
Normal file
@@ -0,0 +1,702 @@
# BEHAVE-SHELL Extraction Engine — Implementation Route

**Status:** pre-implementation. Sibling to `BEHAVE-INTEGRATION.md`.
**Scope:** the inside of `decnet/profiler/behave_shell/`. Nothing else.
**Acceptance gate:** the five-class calibration grid in
`BEHAVE-INTEGRATION.md` §"Calibration grid IS the regression test."

This doc is the **construction manual** for the engine. The
integration doc says *what* the engine plugs into; this doc says
*how to build it from zero to v0 in a deterministic sequence*.

---

## Mission

Take an asciinema-style PTY event stream for one session, return an
`Iterable[Observation]` of BEHAVE-SHELL primitives. Pure library:
no I/O, no bus, no DB. Worker owns those.

```python
def extract_session(
    events: Iterable[AsciinemaEvent],  # [t_float, kind: 'i'|'o', data: str]
    *,
    sid: str,
    source: str = "decnet/profiler/behave_shell/extract.py",
) -> Iterable[Observation]:
    ...
```

`AsciinemaEvent` is a 3-tuple `(t, kind, data)` matching the on-disk
shard line format. No fancy class — a tuple is honest about what it is.

## Single-pass discipline

A naïve engine re-walks the event stream once per primitive, paying
O(n × primitives) for nothing. We don't do that.

Single pass over events builds a `SessionContext` — a precomputed
bundle of indexes that every feature module reads from. Cheap; one
walk; reproducible.

```python
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class SessionContext:
    sid: str
    source: str
    evidence_ref: str
    t_start: float
    t_end: float
    duration_s: float

    # Raw event slices (already filtered by kind)
    input_events: tuple[InputEvent, ...]    # ('i', t, data)
    output_events: tuple[OutputEvent, ...]  # ('o', t, data)

    # Derived once, used everywhere
    iats: tuple[float, ...]                 # IATs between input events
    paste_bursts: tuple[PasteBurst, ...]    # detected paste regions
    commands: tuple[Command, ...]           # split on \r / \n
    inter_cmd_iats: tuple[float, ...]       # IATs between command boundaries
    output_per_cmd: tuple[int, ...]         # output bytes between cmd_i and cmd_{i+1}
```

All feature modules take `ctx: SessionContext` and yield 0 or more
Observations. Single source of truth, single parse cost.

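To make the single-walk idea concrete, here is a reduced sketch that derives a few of the `SessionContext` fields in one loop over the events. `MiniContext` and `build_mini_context` are hypothetical stand-ins, not the real `_ctx.py` builders. Paste-burst detection is left out because its thresholds are not given here, and counting output after the last command up to session end is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

Event = tuple[float, str, str]  # (t, kind, data); kind is 'i' or 'o'


@dataclass(frozen=True)
class MiniContext:
    iats: tuple[float, ...]            # gaps between consecutive input events
    command_ts: tuple[float, ...]      # timestamps of command boundaries
    inter_cmd_iats: tuple[float, ...]  # gaps between command boundaries
    output_per_cmd: tuple[int, ...]    # output bytes following each command


def build_mini_context(events: Iterable[Event]) -> MiniContext:
    iats: list[float] = []
    command_ts: list[float] = []
    output_per_cmd: list[int] = []
    last_input_t = None
    for t, kind, data in events:  # the only walk over the raw stream
        if kind == "i":
            if last_input_t is not None:
                iats.append(t - last_input_t)
            last_input_t = t
            if "\r" in data or "\n" in data:  # command boundary
                command_ts.append(t)
                output_per_cmd.append(0)
        elif kind == "o" and output_per_cmd:
            # Output seen before the first command is ignored in this sketch.
            output_per_cmd[-1] += len(data.encode("utf-8"))
    inter = tuple(b - a for a, b in zip(command_ts, command_ts[1:]))
    return MiniContext(tuple(iats), tuple(command_ts), inter, tuple(output_per_cmd))
```

Feature functions would then read these tuples instead of touching the event stream again, which is where the O(n) total cost comes from.
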
## Engine layout
|
||||
|
||||
```
|
||||
decnet/profiler/behave_shell/
|
||||
├── __init__.py re-exports extract_session
|
||||
├── extract.py extract_session() + SessionContext build
|
||||
├── _parse.py asciinema event types + parsing helpers
|
||||
├── _ctx.py SessionContext dataclass + builders
|
||||
├── _thresholds.py all numeric thresholds, one place, named constants
|
||||
└── _features/
|
||||
├── __init__.py FEATURES tuple — registered list of feature funcs
|
||||
├── motor.py
|
||||
├── cognitive.py
|
||||
└── temporal.py (later)
|
||||
```
|
||||
|
||||
`extract.py` is short:

```python
from ._ctx import build_session_context
from ._features import FEATURES


def extract_session(events, *, sid, source="..."):
    ctx = build_session_context(events, sid=sid, source=source)
    for feature_fn in FEATURES:
        yield from feature_fn(ctx)
```

That's the whole orchestration. Adding a primitive = adding a function
to `_features/<family>.py` and registering it in `FEATURES`.

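As a rough illustration of the registration pattern, the sketch below uses plain `dict` stand-ins for `Observation` and `SessionContext` and two placeholder feature functions. All of these are assumptions for the example; the real types come from the BEHAVE envelope and `_ctx.py`.

```python
from typing import Callable, Iterator

# Stand-ins for illustration only.
Observation = dict
SessionContext = dict

FeatureFn = Callable[[SessionContext], Iterator[Observation]]


def input_modality(ctx: SessionContext) -> Iterator[Observation]:
    """Placeholder feature: yields exactly one observation."""
    yield {"primitive": "motor.input_modality", "value": "typed"}


def paste_burst_rate(ctx: SessionContext) -> Iterator[Observation]:
    """Placeholder feature: yields nothing when it has no evidence."""
    return
    yield  # unreachable, but makes this function a generator


# Hand-edited registry: adding a primitive means adding one entry here.
FEATURES: tuple[FeatureFn, ...] = (input_modality, paste_burst_rate)


def run_features(ctx: SessionContext) -> Iterator[Observation]:
    """Same loop shape as extract_session, minus the context build."""
    for feature_fn in FEATURES:
        yield from feature_fn(ctx)
```

A plain tuple keeps registration order explicit and reviewable in a diff, which fits the "no plugin loaders" stance taken later in this doc.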
## Threshold table convention

Every numeric threshold lives in `_thresholds.py` as a named constant
with a docstring citing the registry's `notes:` field. **Never inline
magic numbers in feature code.** When calibration drifts, you change
one file.

```python
# decnet/profiler/behave_shell/_thresholds.py
"""Numeric thresholds for BEHAVE-SHELL primitive classification.

Each constant cites its calibration source. When the registry's
`notes:` field disagrees with a constant here, the registry is
authoritative — fix the constant, re-run the grid.
"""

# motor.paste_burst_rate buckets — events per minute of session
PASTE_RATE_OCCASIONAL_MIN = 0.5   # at least one paste every two minutes
PASTE_RATE_HABITUAL_MIN = 3.0     # paste-driven workflow

# cognitive.inter_command_latency_class — seconds (median IAT between commands)
ICL_TYPING_SPEED_MAX = 2.0
ICL_DELIBERATE_MAX = 8.0
ICL_LLM_LIGHTWEIGHT_MAX = 8.0     # 2-8s band; lower bound = ICL_TYPING_SPEED_MAX
ICL_LLM_HEAVYWEIGHT_MAX = 30.0    # 8-30s band — registry primitives.py:140-149
# > 30s = "long"
```

## Full registry scope — what the engine owns, what it doesn't

Before the route: a sober count. The BEHAVE-SHELL registry today
contains roughly **53 primitives** across 8 top-level domains. Not
all of them are extractable from a single PTY session; some need
observation history; some belong to a different sensor entirely.

Three tiers:

### Tier A — Per-session shell-extractable (37 primitives)

Computable from one `(decky, service, sid)` shard. The extractor
owns these end-to-end.

| Domain | Primitive | Source signal |
|---|---|---|
| motor | `motor.input_modality` | paste-burst detector |
| motor | `motor.paste_burst_rate` | paste-burst counter |
| motor | `motor.keystroke_cadence` | IAT histogram shape |
| motor | `motor.motor_stability` | IAT outlier rate |
| motor | `motor.error_correction` | backspace-relative-to-error timing |
| motor | `motor.command_chunking` | intra-command IAT variance |
| motor | `motor.shell_mastery.tab_completion` | `\t` rate per command |
| motor | `motor.shell_mastery.shortcut_usage` | ^A/^E/^W/^U/^R/^B/^F rate |
| motor | `motor.shell_mastery.pipe_chaining_depth` | `\|` count per command |
| cognitive | `cognitive.inter_command_latency_class` | median inter-command IAT bucketed |
| cognitive | `cognitive.inter_command_consistency` | CV of inter-command IATs |
| cognitive | `cognitive.command_branch_diversity` | unique-first-token / total-commands |
| cognitive | `cognitive.feedback_loop_engagement` | Pearson r(output_bytes, next_pause) |
| cognitive | `cognitive.cognitive_load` | composite (IAT entropy + error rate + chunking) |
| cognitive | `cognitive.exploration_style` | command-graph branching shape |
| cognitive | `cognitive.planning_depth` | think-pause-length distribution |
| cognitive | `cognitive.tool_vocabulary` | distinct first-tokens normalised |
| cognitive | `cognitive.error_resilience.retry_tactic` | post-error command relation |
| cognitive | `cognitive.error_resilience.frustration_typing` | error-vs-success keystroke speed delta |
| cognitive | `cognitive.error_resilience.fallback_to_man` | `man`/`--help` invocation post-error |
| temporal | `temporal.session_duration` | `duration_s` bucketed |
| temporal | `temporal.escalation_pattern` | command-rate over rolling windows |
| temporal | `temporal.lifecycle_markers.landing_ritual` | first-N-commands signature |
| temporal | `temporal.lifecycle_markers.exit_behavior` | last-command + exit-code analysis |
| operational | `operational.objective` | command-intent classifier (recon / exfil / persistence / lateral / destructive) |
| operational | `operational.opsec_discipline` | history-clearing, log-tampering, .bash_history rm |
| operational | `operational.cleanup_behavior` | exit-time cleanup commands |
| operational | `operational.multi_actor_indicators` | mid-session pace/style shift detection |
| environmental | `environmental.shell_type` | prompt-string sniff from `'o'` events |
| environmental | `environmental.terminal_multiplexer` | tmux/screen escape sequences |
| environmental | `environmental.keyboard_layout` | bigram-frequency layout fingerprint |
| environmental | `environmental.locale` | `LANG`/`LC_*` envvar dump if `env` runs; output language sniff |
| environmental | `environmental.numpad_usage` | numeric input arrival pattern (weak) |
| emotional_valence | `emotional_valence.valence` | obscenity / praise / neutral lexicon |
| emotional_valence | `emotional_valence.arousal` | typing-speed delta + capslock + repeated bangs |
| emotional_valence | `emotional_valence.stress_response` | post-error speed-up vs slow-down |
| emotional_valence | `emotional_valence.frustration_venting` | `fuck`/`shit`/etc. detection (registry value is binary) |

The emotional_valence primitives are SOFT and will produce false
positives. Documented as such; emit at confidence ≤ 0.5 per the
confidence convention.

### Tier B — Cross-session (computed by attribution engine, not extractor)

8 primitives that **cannot honestly be computed from one session**.
The extractor does not emit these. The attribution engine
(`ATTRIBUTION-ENGINE.md`) computes them during aggregation, reading
the per-attacker observation history. Cross-reference: a TODO in
`ATTRIBUTION-ENGINE.md` notes that aggregation may include
*derivation*, not just *merging*.

| Domain | Primitive | Why cross-session |
|---|---|---|
| temporal | `temporal.session_timing` | diurnal/nocturnal/irregular requires multiple sessions |
| temporal | `temporal.persistence` | hit_and_run/return_visitor/resident is intrinsically multi-session |
| temporal | `temporal.lifecycle_markers.idle_periodicity` | periodicity needs a long enough sample |
| cultural | `cultural.meal_break_gaps` | gap pattern over days |
| cultural | `cultural.periodic_micro_pauses` | needs many sessions to find regular intervals |
| cultural | `cultural.dst_behavior` | needs sessions spanning a DST transition |
| cultural | `cultural.weekend_cadence` | needs a week+ of sessions |
| cultural | `cultural.holiday_gaps` | needs ≥ a year for honest claim |

If you find yourself implementing one of these in the extractor,
**stop**. It's an attribution-engine concern.

### Tier C — Network domain (out of scope for this engine entirely)

The full `toolchain.*` subtree —
TLS / transport / SSH / HTTP / C2 / protocol_abuse / payload
fingerprints. Roughly 25 primitives. These come from the sniffer /
prober / correlation pipeline, not from PTY session extraction.

Two paths to populate them, both NOT this doc:

1. **Wrap existing DECNET workers** (sniffer, prober, correlation,
   intel) to emit `attacker.observation.toolchain.*` from their
   existing outputs. Pragmatic, ships sooner. Filed as a future
   "wire existing producers to BEHAVE" track (mentioned in
   `BEHAVE-INTEGRATION.md` Out of Scope, around the
   `toolchain.c2.beacon_*` overlap with profiler's existing
   `behavioral.py`).
2. **Future BEHAVE-NETWORK extractor** parallel to BEHAVE-SHELL,
   eating PCAP / netflow / TLS-handshake records. Cleaner long-term
   architecture; substantial effort.

Either way, **not extractor work for this doc.**

## Confidence convention

Every emitted Observation must carry a `confidence` in `[0.0, 1.0]`.
Three rules:

1. **Sample-size honesty.** A primitive computed from < 5 samples
   gets `confidence ≤ 0.5`. A bucket-classification with no IATs
   should emit `unknown` (where the registry permits) at
   `confidence = 1.0` — the *fact* of insufficient data is itself a
   high-confidence observation.
2. **Threshold proximity.** If the measured value is within 10% of a
   bucket boundary, drop confidence by 0.2. Sitting on the fence is a
   real signal; pretending you know is dishonest.
3. **Output-stream availability.** Primitives that need `[t,"o",d]`
   events drop confidence to 0.0 and skip emission entirely if the
   shard contains no output events. Don't fabricate.

Confidence is **the sensor's confidence in its measurement**, not in
any downstream verdict — same line BEHAVE draws.

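Rules 1 and 2 could be applied by a small shared helper along these lines. This is a sketch under stated assumptions: the helper name `adjust_confidence`, its parameters, and the reading of "within 10%" as relative to the boundary value are all illustrative choices, not part of the spec.

```python
def adjust_confidence(
    base: float,
    *,
    n_samples: int,
    value: float,
    boundaries: tuple[float, ...],
    min_samples: int = 5,
    small_sample_cap: float = 0.5,
    proximity_frac: float = 0.10,
    proximity_penalty: float = 0.2,
) -> float:
    """Apply the sample-size cap and the threshold-proximity penalty."""
    conf = base
    # Rule 1: fewer than min_samples caps confidence at 0.5.
    if n_samples < min_samples:
        conf = min(conf, small_sample_cap)
    # Rule 2: within 10% of any bucket boundary costs 0.2
    # (assumed here to mean 10% of the boundary's own magnitude).
    if any(abs(value - b) <= proximity_frac * abs(b) for b in boundaries):
        conf -= proximity_penalty
    # Keep the result inside [0.0, 1.0].
    return max(0.0, min(1.0, conf))
```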
---

## The route to v0 — every Tier-A primitive emits

**v0 ships the entire BEHAVE-SHELL Tier-A corpus.** All 37
shell-extractable primitives in the registry must have a feature
function emitting them before the engine tags v0. Anything less is
v0-pre.

The route is broken into **eight phases (A–H)** that each ship a
coherent slice with its own tests. With the architecture locked
(`SessionContext`, `_features/`, `_thresholds.py` already designed),
each primitive is a small, well-bounded chunk — most are dozens of
lines plus tests. The two real cost centres are Phase F (prompt
parser) and Phase G (command-intent lexicon); both bounded by the
calibration notes already in the registry. Phase A establishes the
6-primitive calibration floor (the discriminative grid). Phases B–G
expand horizontally across the registry. Phase H is the full-corpus
lockdown + v0 release.

Each step within a phase is one commit (per the "commit per task"
memory rule), with its own tests in the same commit (per "tests per
task"). No step is allowed to land red against the calibration grid
once Phase A locks it in.

### Phase A — Calibration floor (Steps 0–10)

**Goal:** establish the 6-primitive set that discriminates the
five-class calibration grid. Lock the gate.

This is the foundation. Phases B–G cannot start until Phase A is green.

### Step 0 — Scaffold + smoke

**Goal:** prove the wiring before any logic.

- Create `decnet/profiler/behave_shell/{__init__,extract,_parse,_ctx,_thresholds}.py`.
- `extract_session()` parses events into a minimal `SessionContext`,
  registers an empty `FEATURES = ()`, returns no observations.
- `tests/profiler/behave_shell/test_extract_smoke.py` asserts:
  - empty events → empty iterable
  - one input event → SessionContext built, t_start/t_end/duration_s correct
  - import path works

Commit message: `feat(profiler/behave_shell): scaffold extract_session entry point`.

### Step 1 — Asciinema parser + paste-burst detector

**Goal:** the shared primitives that two feature modules will consume.

- `_parse.py`: types (`InputEvent`, `OutputEvent`, `PasteBurst`,
  `Command`) + `parse_event(line: str | dict) -> AsciinemaEvent`.
- `_ctx.py`: `build_session_context()` populates `iats`,
  `paste_bursts` (chunks where consecutive IATs < `PASTE_IAT_MAX_S`
  AND chunk size > `PASTE_MIN_CHARS`).
- Tests: synthetic streams covering pure-typed, pure-pasted, mixed.

Commit: `feat(profiler/behave_shell): asciinema parser + paste-burst detection`.

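The paste-burst rule above (consecutive IATs below `PASTE_IAT_MAX_S`, chunk larger than `PASTE_MIN_CHARS`) might look roughly like this. The numeric values, the `PasteBurst` fields, and the `(t, data)` input shape are placeholders assumed for the example; the real constants belong in `_thresholds.py`.

```python
from dataclasses import dataclass

PASTE_IAT_MAX_S = 0.010  # assumed placeholder value, not calibrated
PASTE_MIN_CHARS = 8      # assumed placeholder value, not calibrated


@dataclass(frozen=True)
class PasteBurst:
    # Hypothetical fields; the spec names the type but not its shape.
    start_ts: float
    end_ts: float
    n_chars: int


def detect_paste_bursts(inputs: list[tuple[float, str]]) -> tuple[PasteBurst, ...]:
    """Group (t, data) input events into runs of sub-threshold IATs."""
    bursts: list[PasteBurst] = []
    run: list[tuple[float, str]] = []

    def flush() -> None:
        # A run only counts as a paste burst if it carries enough characters.
        n_chars = sum(len(data) for _, data in run)
        if n_chars > PASTE_MIN_CHARS:
            bursts.append(PasteBurst(run[0][0], run[-1][0], n_chars))

    for ev in inputs:
        # A gap at or above the cutoff closes the current run.
        if run and ev[0] - run[-1][0] >= PASTE_IAT_MAX_S:
            flush()
            run.clear()
        run.append(ev)
    if run:
        flush()
    return tuple(bursts)
```

Note that a single event carrying many characters already qualifies as a burst here, which matches how terminals often deliver a paste as one write.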
### Step 2 — `motor.input_modality` (FIRST PRIMITIVE)

**Goal:** prove the end-to-end pipeline emits a single registry-valid
Observation.

Why first: highest discriminative value (HUMAN vs everyone), simplest
implementation (just count paste-burst chars vs typed chars).

- `_features/motor.py:input_modality(ctx)` yields one Observation
  with value in `{"typed", "pasted", "mixed"}`.
- Register in `FEATURES`.
- Tests:
  - synthetic typed stream → `typed`
  - synthetic pasted stream → `pasted`
  - HUMAN calibration shard → `typed`
  - YOU-sim calibration shard → `pasted`

After this step, the calibration grid passes for **one column** and
the integration is end-to-end live (Phase 4 of the integration plan
becomes wireable, not just blocked on theory).

Commit: `feat(profiler/behave_shell): emit motor.input_modality`.

### Step 3 — `motor.paste_burst_rate`

**Goal:** second primitive, builds on the paste-burst index from
step 1. Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL.

- `_features/motor.py:paste_burst_rate(ctx)` → `none / occasional / habitual`.
- Threshold constants in `_thresholds.py`.
- Tests + grid extension.

Commit: `feat(profiler/behave_shell): emit motor.paste_burst_rate`.

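A possible shape for the bucketing, using the two constants from the threshold table. The function name `paste_burst_rate_bucket` and the choice to report `none` for a zero-length session are assumptions made for this sketch.

```python
# Values copied from the _thresholds.py example in this doc.
PASTE_RATE_OCCASIONAL_MIN = 0.5  # paste bursts per minute
PASTE_RATE_HABITUAL_MIN = 3.0    # paste bursts per minute


def paste_burst_rate_bucket(n_bursts: int, duration_s: float) -> str:
    """Bucket paste bursts per minute of session into none/occasional/habitual."""
    if duration_s <= 0:
        # Assumption: a zero-length session carries no usable rate.
        return "none"
    per_minute = n_bursts / (duration_s / 60.0)
    if per_minute >= PASTE_RATE_HABITUAL_MIN:
        return "habitual"
    if per_minute >= PASTE_RATE_OCCASIONAL_MIN:
        return "occasional"
    return "none"
```

Checking the higher bucket first keeps the comparisons to one constant each, so a recalibration only touches the constants.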
### Step 4 — Command segmentation (no primitive)

**Goal:** shared utility for the three cognitive primitives next in
line. Pure refactor inside `_ctx.py`.

- `commands` populated: split input stream on `\r` (and `\n`) into
  `Command(start_ts, end_ts, first_token_hash)` records.
- **PII discipline:** store only the *first token* (or its hash) plus
  timing. Never the full command body. Branch-diversity needs the
  first token; nothing needs the rest.
- `inter_cmd_iats` and `output_per_cmd` populated.
- Tests for segmentation edge cases (no trailing newline, multiple
  newlines in a paste, etc).

Commit: `feat(profiler/behave_shell): command segmentation in SessionContext`.

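The segmentation and PII rule above can be sketched as follows. The choice of SHA-256 for the token hash, the `(t, data)` input shape, and dropping unterminated trailing input are assumptions of this example; the spec only fixes the `Command(start_ts, end_ts, first_token_hash)` record.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    start_ts: float
    end_ts: float
    first_token_hash: str


def segment_commands(inputs: list[tuple[float, str]]) -> tuple[Command, ...]:
    """Split (t, data) input events into commands on \\r or \\n."""
    commands: list[Command] = []
    buf: list[str] = []
    start_ts: float | None = None
    for t, data in inputs:
        for ch in data:
            if ch in "\r\n":
                line = "".join(buf).strip()
                if line:  # consecutive newlines produce no empty commands
                    token = line.split()[0]
                    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
                    # Only timing and the hashed first token are kept.
                    commands.append(Command(start_ts, t, digest))
                buf.clear()
                start_ts = None
            else:
                if start_ts is None:
                    start_ts = t
                buf.append(ch)
    # Assumption: input with no trailing newline is not counted as a command.
    return tuple(commands)
```

The command body lives only in the local buffer and is discarded at each boundary, so nothing beyond the hash leaves the function.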
### Step 5 — `cognitive.inter_command_latency_class`

**Goal:** classify the operator's *thinking pace* between commands.
Splits LW-sim / CLAUDE-FF / CLAUDE-CL.

- `_features/cognitive.py:inter_command_latency_class(ctx)` →
  `instant / typing_speed / deliberate / llm_lightweight / llm_heavyweight / long`.
- Median of `inter_cmd_iats`, bucketed against `_thresholds.py`.
- Confidence drops if < 5 commands.
- Tests + grid extension.

Commit: `feat(profiler/behave_shell): emit cognitive.inter_command_latency_class`.

### Step 6 — `cognitive.command_branch_diversity`

**Goal:** content-based playbook-vs-adaptive split. Splits CLAUDE-FF
from CLAUDE-CL.

- `_features/cognitive.py:command_branch_diversity(ctx)` →
  `linear_playbook / adaptive_branching / unknown`.
- `unique_first_tokens / total_commands` ratio against threshold.
- `unknown` when total_commands < 5 (registry-allowed).
- Tests + grid extension.

Commit: `feat(profiler/behave_shell): emit cognitive.command_branch_diversity`.

### Step 7 — `cognitive.feedback_loop_engagement`

**Goal:** the orthogonal axis — does the operator's pause-after-command
correlate with output bytes? Splits HUMAN/CLAUDE-CL (closed) from
LW-sim/CLAUDE-FF (fire-and-forget).

- Requires `output_per_cmd[i]` paired with `inter_cmd_iats[i+1]`.
- Pearson correlation; bucket on r > 0.3 / r ≈ 0 / insufficient.
- `_features/cognitive.py:feedback_loop_engagement(ctx)` →
  `closed_loop / fire_and_forget / unknown`.
- **First primitive that depends on output events.** If the shard
  carries no `'o'` events (rare but possible — minimal recorders),
  emit `unknown` at confidence 1.0.
- Tests + grid extension.

Commit: `feat(profiler/behave_shell): emit cognitive.feedback_loop_engagement`.

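The correlation and bucketing in this step could be sketched like this, taking the already-paired sequences as input. The helper names, the minimum of 5 pairs, and treating undefined or negative correlation as `unknown` and `fire_and_forget` respectively are assumptions for illustration; only the `r > 0.3` cut comes from the text above.

```python
import math
from typing import Optional, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None if undefined (too few points or zero variance)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def feedback_loop_bucket(
    output_bytes: Sequence[float],
    next_pauses: Sequence[float],
    *,
    r_closed_min: float = 0.3,  # from the step description
    min_pairs: int = 5,         # assumed, mirrors the sample-size rule
) -> str:
    """Bucket the output-size vs. following-pause correlation."""
    if len(output_bytes) < min_pairs:
        return "unknown"
    r = pearson_r(output_bytes, next_pauses)
    if r is None:
        return "unknown"
    return "closed_loop" if r > r_closed_min else "fire_and_forget"
```

The caller is expected to do the index pairing described above before calling `feedback_loop_bucket`, so the function stays independent of how `SessionContext` stores its tuples.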
### Step 8 — `cognitive.inter_command_consistency`

**Goal:** dispersion/bimodality of command IATs.
HUMAN-bimodal vs LLM-metronomic.

- CV of `inter_cmd_iats` → `metronomic` (CV < 0.2) /
  `variable` (0.2 ≤ CV < 1.0) / `bimodal` (CV ≥ 1.0 OR Hartigan dip
  significant — v0.1 is CV-only, registry note flags v0.2 work).
- Tests + grid extension.

Commit: `feat(profiler/behave_shell): emit cognitive.inter_command_consistency`.

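A CV-only version of this bucketing might look like the sketch below. The constant names and the use of the population standard deviation are assumptions; the 0.2 and 1.0 cut points come from the step description.

```python
import statistics
from typing import Optional, Sequence

CV_METRONOMIC_MAX = 0.2  # hypothetical constant name; value from this step
CV_VARIABLE_MAX = 1.0    # hypothetical constant name; value from this step


def consistency_bucket(inter_cmd_iats: Sequence[float]) -> Optional[str]:
    """Bucket the coefficient of variation of inter-command IATs (CV-only)."""
    if len(inter_cmd_iats) < 2:
        return None  # not enough data to measure dispersion
    mean = statistics.fmean(inter_cmd_iats)
    if mean <= 0:
        return None  # CV is undefined without a positive mean
    # Assumption: population standard deviation; the spec does not say which.
    cv = statistics.pstdev(inter_cmd_iats) / mean
    if cv < CV_METRONOMIC_MAX:
        return "metronomic"
    if cv < CV_VARIABLE_MAX:
        return "variable"
    return "bimodal"
```

Returning `None` rather than a bucket leaves the decision about what to emit for thin data to the calling feature function.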
### Step 9 — Calibration grid lockdown

**Goal:** the gate. After this step lands, no engine PR is allowed
to drop a primitive from any of the five classes.

- `tests/profiler/behave_shell/test_calibration_grid.py` parametrised
  over the five shards from `BEHAVE/prototype_extractors/shell/`.
- For each shard, assert the **required primitive set** from the
  integration doc's grid table is present in the output (subset
  check, not exact match — engine is allowed to emit *more* than
  the table requires).
- Skip with `pytest.importorskip` style if `BEHAVE_CALIBRATION_DIR`
  unset — CI provides it, dev doesn't have to.
- This is the v0 gate.

Commit: `test(profiler/behave_shell): five-class calibration grid lockdown`.

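The "subset, not exact match" assertion can be expressed with plain set arithmetic. In this sketch observations are assumed to be dicts with a `primitive` key, and `missing_primitives` is a hypothetical helper name; the real test would read the required sets from the integration doc's grid table.

```python
def missing_primitives(required: set[str], observations: list[dict]) -> set[str]:
    """Return required primitives that the engine failed to emit for a shard."""
    emitted = {obs["primitive"] for obs in observations}
    # Extra emitted primitives are allowed; only gaps are reported.
    return required - emitted
```

A grid test would then assert that the returned set is empty for every shard, which produces a readable failure message naming exactly the primitives that regressed.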
### Step 10 — Phase A complete: calibration floor locked

**Goal:** Phase A done. **NOT v0 release** — v0 requires the full
Tier-A corpus (Phases B–H below). Phase A delivers the 6-primitive
discriminative floor + the gate that future phases must not break.

- 6 primitives emitting (`motor.input_modality`,
  `motor.paste_burst_rate`,
  `cognitive.inter_command_latency_class`,
  `cognitive.command_branch_diversity`,
  `cognitive.feedback_loop_engagement`,
  `cognitive.inter_command_consistency`).
- Calibration grid green across all five class shards.
- Worker can be wired against Phase A safely
  (BEHAVE-INTEGRATION.md Phase 4 unblocks here, *not* at v0).

Commit: `feat(profiler/behave_shell): Phase A — calibration floor green`.

---

### Phase B — `motor.*` completion (4 primitives)

**Goal:** finish the motor family minus shell-mastery. All four
read existing `SessionContext` derived data; no new parsing.

| Step | Primitive | Source | Notes |
|---|---|---|---|
| B.1 | `motor.keystroke_cadence` | `ctx.iats` histogram shape | steady (uniform) / bursty (heavy-tailed) / hunt_and_peck (bimodal slow+fast) / machine (sub-typing-floor) |
| B.2 | `motor.motor_stability` | `ctx.iats` outlier rate | tremor = high-frequency outliers above CV-of-IATs threshold |
| B.3 | `motor.error_correction` | backspace events relative to preceding key | immediate (<500ms) / deferred (next word boundary) / absent / route_around (no backspaces, but command later replaced) |
| B.4 | `motor.command_chunking` | per-command IAT variance + word-boundary timing | fluent (low intra-cmd variance + tight word boundaries) / fragmented (high variance) / single_command (one-shot session) |

Per-step deliverable: feature function in `_features/motor.py`,
threshold constants in `_thresholds.py`, unit tests against
synthetic streams, calibration grid still green.

Commits (4): `feat(profiler/behave_shell): emit motor.{keystroke_cadence,motor_stability,error_correction,command_chunking}`.

### Phase C — `motor.shell_mastery.*` (3 primitives)

**Goal:** the shell-fluency block. Per-command counters; trivial
implementations once command segmentation is in place (Step 4).

| Step | Primitive | Source |
|---|---|---|
| C.1 | `motor.shell_mastery.tab_completion` | `\t` rate per command (none / occasional <30% / habitual ≥50%) |
| C.2 | `motor.shell_mastery.shortcut_usage` | ^A/^E/^W/^U/^R/^B/^F rate (none / moderate / heavy) |
| C.3 | `motor.shell_mastery.pipe_chaining_depth` | `\|` count per command, median (shallow / moderate / deep) |

Commits (3): `feat(profiler/behave_shell): emit motor.shell_mastery.*`.

### Phase D — `cognitive.*` completion (7 primitives)

**Goal:** finish the cognitive family. Mix of cheap and expensive;
`cognitive_load` is a composite over earlier primitives.

| Step | Primitive | Source | Cost |
|---|---|---|---|
| D.1 | `cognitive.cognitive_load` | composite: IAT entropy + error rate + chunking variance | MEDIUM |
| D.2 | `cognitive.exploration_style` | command-graph branching shape (revisits, backtracks) | MEDIUM |
| D.3 | `cognitive.planning_depth` | think-pause-length distribution; deep = many >1.5s gaps before commands | LOW |
| D.4 | `cognitive.tool_vocabulary` | distinct first-tokens normalised by session length | LOW |
| D.5 | `cognitive.error_resilience.retry_tactic` | post-error command relation: rerun (same), modify (edit-and-retry), switch (different tool), abort (exit) | MEDIUM |
| D.6 | `cognitive.error_resilience.frustration_typing` | error-vs-success keystroke speed delta | LOW |
| D.7 | `cognitive.error_resilience.fallback_to_man` | `man`/`--help`/`-h` invocation post-error | LOW |
| D.8 | `cognitive.cognitive_load` re-tune (gate) | re-run calibration once D.1-D.7 stable | — |

Commits (7): one per primitive, plus a re-tune commit if needed.

### Phase E — `temporal.*` per-session subset (4 primitives)

**Goal:** the four temporal primitives that don't need observation
history. The other three temporal primitives (session_timing,
persistence, idle_periodicity) are **Tier B** and are filed in
`ATTRIBUTION-ENGINE.md` — do not implement here.

| Step | Primitive | Source | Cost |
|---|---|---|---|
| E.1 | `temporal.session_duration` | `ctx.duration_s` bucketed (short <60s / medium <600s / long <3600s / marathon ≥3600s) | TRIVIAL |
| E.2 | `temporal.escalation_pattern` | command-rate over rolling windows (sustained / erratic / bursty) | LOW |
| E.3 | `temporal.lifecycle_markers.landing_ritual` | first-N-commands signature match (`uname` / `id` / `whoami` / `pwd`) | LOW |
| E.4 | `temporal.lifecycle_markers.exit_behavior` | last command + exit timing (graceful `exit`/`logout` / abrupt session-cut / cleanup `history -c` etc.) | LOW |

Commits (4): per primitive.

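The E.1 bucketing is simple enough to show in full. This sketch uses the edges from the table above; the function and constant names are illustrative, and in the real engine the edges would live in `_thresholds.py`.

```python
import bisect

# Upper-exclusive edges in seconds, taken from the E.1 row.
_DURATION_EDGES_S = (60.0, 600.0, 3600.0)
_DURATION_LABELS = ("short", "medium", "long", "marathon")


def session_duration_bucket(duration_s: float) -> str:
    """Map a session duration to short / medium / long / marathon."""
    # bisect_right puts a value equal to an edge into the next bucket,
    # matching the strict "<" bounds in the table.
    return _DURATION_LABELS[bisect.bisect_right(_DURATION_EDGES_S, duration_s)]
```

Keeping edges and labels as parallel tuples means adding a bucket is a data change rather than another `if` branch.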
### Phase F — `environmental.*` output-stream block (5 primitives)

**Goal:** the output-stream-dependent cluster. Lands a shared
prompt-string parser once, then five primitives consume it. **This
is the most expensive single phase** — the prompt parser has to
handle ANSI escape sequences, multi-line continuation, and
custom prompts.

| Step | Primitive | Source | Cost |
|---|---|---|---|
| F.0 | Prompt-string parser (`_parse.py`) | shared utility, no primitive | HIGH |
| F.1 | `environmental.shell_type` | prompt suffix sniff (`$`/`#`/`%`/`>`) + command syntax (bash / zsh / fish / cmd / powershell) | MEDIUM |
| F.2 | `environmental.terminal_multiplexer` | tmux/screen-specific escape sequences in output stream | LOW |
| F.3 | `environmental.locale` | `LANG`/`LC_*` envvars if attacker dumps env; output language sniff fallback (free string, BCP-47) | MEDIUM |
| F.4 | `environmental.keyboard_layout` | bigram-frequency fingerprint against known layouts (qwerty / azerty / qwertz / other) | HIGH |
| F.5 | `environmental.numpad_usage` | numeric input arrival pattern; weak signal — confidence cap | LOW |

Commits (6): F.0 prepares; F.1-F.5 ship one per primitive.

### Phase G — `operational.*` + `emotional_valence.*` (8 primitives)

**Goal:** the two soft families. Both want a small command-intent /
sentiment lexicon; combine into one phase to share the lexical
infrastructure.

| Step | Primitive | Source | Cost / Confidence |
|---|---|---|---|
| G.0 | Command-intent lexicon (`_features/_intent.py`) | shared first-token → category mapping (recon / exfil / persistence / lateral / destructive) | HIGH (corpus building) |
| G.1 | `operational.objective` | majority-category over session commands | MEDIUM |
| G.2 | `operational.opsec_discipline` | history-clearing / log-tampering / `.bash_history` removal patterns | MEDIUM |
| G.3 | `operational.cleanup_behavior` | exit-time cleanup commands (`rm`-of-touched-files, `unset HISTFILE`) | MEDIUM |
| G.4 | `operational.multi_actor_indicators` | mid-session pace/style shift detection (only `solo` and `handoff_detected` honest single-session; `team_coordinated` is Tier B) | HIGH |
| G.5 | `emotional_valence.valence` | lexical sentiment; positive / neutral / negative — **CONFIDENCE CAP 0.5** | LOW (soft) |
| G.6 | `emotional_valence.arousal` | typing-speed delta + capslock + repeated bangs — **CAP 0.5** | LOW (soft) |
| G.7 | `emotional_valence.stress_response` | post-error speed-up (distress) vs slow-down (eustress) — **CAP 0.5** | LOW (soft) |
| G.8 | `emotional_valence.frustration_venting` | obscenity detection (`fuck`/`shit`/`damn`); registry value is binary — **CAP 0.5** | LOW (soft) |

Commits (9). All four `emotional_valence.*` primitives ship under a
**hard 0.5 confidence cap** by convention — these are the most
likely primitives to embarrass the project, and operators must not
act on them without corroboration.

### Phase H — Full-corpus lockdown + v0 release

**Goal:** prove every Tier-A primitive in the registry has a feature
function, tag v0.

| Step | Action |
|---|---|
| H.1 | **Registry-coverage test**: `tests/profiler/behave_shell/test_registry_coverage.py` walks `PRIMITIVE_REGISTRY`, filters out Tier-B and Tier-C primitives (explicit allow-list), asserts every remaining primitive appears in the output of at least one calibration shard. CI fails if the registry adds a primitive DECNET hasn't implemented yet. |
| H.2 | **Calibration grid full sweep**: re-run the five-class grid against the full primitive set; no regressions. |
| H.3 | **Live smoke**: ship a decky, run a real session from each calibration class, observe full primitive output in `observations` table + bus + AttackerDetail panel (mirrors integration-doc Phase 6). |
| H.4 | **Worker wired** (BEHAVE-INTEGRATION.md Phase 4 unblocks here). Pin `decnet-behave-core` / `decnet-behave-shell` in `pyproject.toml`. |
| H.5 | Tag v0; add `__version__ = "0.1.0"` to `behave_shell/__init__.py`. |

Commit: `feat(profiler/behave_shell): v0 — full Tier-A corpus, all 37 primitives emitting`.

### Per-phase rules (binding for all of B–H)

1. **Calibration-grid gate is binding.** Every commit in B–G runs
   the grid; any drop in expected primitive sets fails CI.
2. **Registry-coverage test is binding from H onward.** New Tier-A
   primitives added to BEHAVE's registry without a corresponding
   DECNET feature function fail CI.
3. **Adding a primitive = adding a feature func + registering it +
   threshold constants + tests in the same commit.** No sneaking
   implementation in without tests, no sneaking tests in without the
   calibration assertion.
4. **Phases B–G can ship in any order**, but finish a phase before
   starting another. Phase F is the hardest and should be sequenced
   by reader stamina, not enthusiasm.
5. **Don't rush Phase G.** The soft primitives are the most likely
   to embarrass the project. Calibrate against real-attacker shards
   before tagging — and even then, hold the 0.5 confidence cap.
6. **Tier-B and Tier-C scope creep is forbidden.** The moment you
   feel tempted to read a SECOND session inside `extract_session()`,
   stop. That observation belongs to the attribution engine.

Don't promise a delivery date for any phase. Each lands when it's
honest. v0 ships when **every Tier-A primitive emits + every test
green** — not before.

---

## Out of scope for the engine

- **Attribution.** Per the integration doc's bright line. Engine
  emits observations; some other thing decides what they mean. See
  `ATTRIBUTION-ENGINE.md`.
- **Cross-session merge logic.** That's DEBT-051 / Tier-B
  primitives. Engine sees one session at a time, period.
- **Tier-C `toolchain.*` primitives.** Network-domain sensors
  (sniffer, prober, correlator) own these. Either via existing
  workers wrapping their outputs as BEHAVE observations, or a future
  BEHAVE-NETWORK extractor. Not this doc.
- **Persistence / bus.** Worker concerns. Engine is pure.
- **Dynamic primitive registration.** The `FEATURES` tuple is
  hand-edited; no plugin loaders. New primitive = new feature func +
  one-line registry edit + tests in the same commit.
- **Streaming / partial extraction.** Engine assumes a complete
  session. Live mid-session inference is a v2 concern; needs a
  separate state-keeping design.
- **`primitives.py` registry edits.** The engine consumes the
  registry; never mutates it. If a primitive is missing, file a
  BEHAVE-side commit per the integration doc's "BEHAVE-side commits"
  rule.
- **Confidence calibration against ground truth.** The calibration
  grid is a *discrimination* test, not a *correctness* test. True
  ground-truth labels would require red-team exercises with logged
  intent. Filed when that data exists.

---

## Implementation order checklist

A single page you can paste into a TODO and tick off. **Every box
unchecked = no v0 tag.**

### Phase A — Calibration floor (Steps 0–10)
- [ ] Step 0 — Scaffold + smoke test
- [ ] Step 1 — Asciinema parser + paste-burst detector
- [ ] Step 2 — `motor.input_modality` (FIRST PRIMITIVE)
- [ ] Step 3 — `motor.paste_burst_rate`
- [ ] Step 4 — Command segmentation in `SessionContext`
- [ ] Step 5 — `cognitive.inter_command_latency_class`
- [ ] Step 6 — `cognitive.command_branch_diversity`
- [ ] Step 7 — `cognitive.feedback_loop_engagement`
- [ ] Step 8 — `cognitive.inter_command_consistency`
- [ ] Step 9 — Calibration grid lockdown (the gate)
- [ ] Step 10 — Phase A complete: floor green

### Phase B — `motor.*` completion
|
||||
- [ ] B.1 `motor.keystroke_cadence`
|
||||
- [ ] B.2 `motor.motor_stability`
|
||||
- [ ] B.3 `motor.error_correction`
|
||||
- [ ] B.4 `motor.command_chunking`
|
||||
|
||||
### Phase C — `motor.shell_mastery.*`
|
||||
- [ ] C.1 `motor.shell_mastery.tab_completion`
|
||||
- [ ] C.2 `motor.shell_mastery.shortcut_usage`
|
||||
- [ ] C.3 `motor.shell_mastery.pipe_chaining_depth`
|
||||
|
||||
### Phase D — `cognitive.*` completion
|
||||
- [ ] D.1 `cognitive.cognitive_load`
|
||||
- [ ] D.2 `cognitive.exploration_style`
|
||||
- [ ] D.3 `cognitive.planning_depth`
|
||||
- [ ] D.4 `cognitive.tool_vocabulary`
|
||||
- [ ] D.5 `cognitive.error_resilience.retry_tactic`
|
||||
- [ ] D.6 `cognitive.error_resilience.frustration_typing`
|
||||
- [ ] D.7 `cognitive.error_resilience.fallback_to_man`
|
||||
- [ ] D.8 `cognitive.cognitive_load` re-tune (gate)
|
||||
|
||||
### Phase E — `temporal.*` per-session
|
||||
- [ ] E.1 `temporal.session_duration`
|
||||
- [ ] E.2 `temporal.escalation_pattern`
|
||||
- [ ] E.3 `temporal.lifecycle_markers.landing_ritual`
|
||||
- [ ] E.4 `temporal.lifecycle_markers.exit_behavior`
|
||||
|
||||
### Phase F — `environmental.*` (output-stream block)
|
||||
- [ ] F.0 Prompt-string parser (shared utility)
|
||||
- [ ] F.1 `environmental.shell_type`
|
||||
- [ ] F.2 `environmental.terminal_multiplexer`
|
||||
- [ ] F.3 `environmental.locale`
|
||||
- [ ] F.4 `environmental.keyboard_layout`
|
||||
- [ ] F.5 `environmental.numpad_usage`
|
||||
|
||||
### Phase G — `operational.*` + `emotional_valence.*` (soft block)
|
||||
- [ ] G.0 Command-intent lexicon (`_features/_intent.py`)
|
||||
- [ ] G.1 `operational.objective`
|
||||
- [ ] G.2 `operational.opsec_discipline`
|
||||
- [ ] G.3 `operational.cleanup_behavior`
|
||||
- [ ] G.4 `operational.multi_actor_indicators`
|
||||
- [ ] G.5 `emotional_valence.valence` (cap 0.5)
|
||||
- [ ] G.6 `emotional_valence.arousal` (cap 0.5)
|
||||
- [ ] G.7 `emotional_valence.stress_response` (cap 0.5)
|
||||
- [ ] G.8 `emotional_valence.frustration_venting` (cap 0.5)
|
||||
|
||||
### Phase H — Full-corpus lockdown + v0 release
|
||||
- [ ] H.1 Registry-coverage test
|
||||
- [ ] H.2 Calibration grid full sweep, no regressions
|
||||
- [ ] H.3 Live smoke across all five calibration classes
|
||||
- [ ] H.4 Worker wired + `pyproject.toml` pin
|
||||
- [ ] H.5 Tag v0 (`__version__ = "0.1.0"`)
|
||||
|
||||
**50 boxes. 37 primitives. 1 v0.** Each box is a commit + tests in
|
||||
the same commit.
|
||||
|
||||
---
|
||||
|
||||
**Owner:** ANTI.
|
||||
**Implementation gate:** Step 0 starts after this doc is reviewed +
|
||||
Phase 1 of `BEHAVE-INTEGRATION.md` lands (storage table exists).
|
||||
680
development/BEHAVE-INTEGRATION.md
Normal file
@@ -0,0 +1,680 @@
|
||||
# BEHAVE Integration — Design
|
||||
|
||||
**Status:** pre-implementation. This doc is the spec; code follows.
|
||||
**Tracks:** DEBT-050 (replaces stale DEBT-036).
|
||||
**Spec source:** `/home/anti/Tools/BEHAVE` (sibling, never vendored).
|
||||
**Engine home:** this repo, `decnet/profiler/behave_shell/` (sublibrary inside the existing `profiler` worker — no new daemon).
|
||||
|
||||
## Premise
|
||||
|
||||
ANTI built BEHAVE — an out-of-tree behavioural-observation framework
|
||||
with a primitive registry, a registry-validated `Observation`
|
||||
envelope, a DECNET-bus event adapter, and a five-class calibration
|
||||
grid (HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL). It is the
|
||||
right substrate for keystroke-dynamics extraction.
|
||||
|
||||
The original DEBT-036 plan (hand-rolled `kd_*` columns on
|
||||
`SessionProfile`) is obsolete. This doc replaces it with a
|
||||
BEHAVE-aligned ingester that emits registry-validated observations on
|
||||
the bus and persists them in a single generic table.
|
||||
|
||||
**Bright line, lifted from BEHAVE itself:** *BEHAVE emits
|
||||
observations. It does not conclude.* DECNET is a consumer of
|
||||
`attacker.observation.*` events; attribution / linkage / verdicts are
|
||||
out-of-scope for this integration and live in their own (future)
|
||||
attribution engine.
|
||||
|
||||
## Architectural placement
|
||||
|
||||
```
|
||||
/home/anti/Tools/
|
||||
├── BEHAVE/ sibling repo, separate git history
|
||||
│ ├── core/ decnet-behave-core (envelope)
|
||||
│ ├── BEHAVE-SHELL/ decnet-behave-shell (registry + adapter)
|
||||
│ └── prototype_extractors/shell/ extract.py — JSONL → Observation stream
|
||||
│
|
||||
└── DECNET/ THIS repo
|
||||
├── pyproject.toml pins decnet-behave-{core,shell}
|
||||
├── decnet/profiler/ EXISTING worker — gains a sublibrary + a new trigger
|
||||
│ ├── worker.py gains attacker.session.ended subscription
|
||||
│ ├── behavioral.py UNCHANGED — networking-domain (LogEvent IATs, beacon detection)
|
||||
│ ├── timing.py UNCHANGED — networking-domain
|
||||
│ └── behave_shell/ NEW — pure extraction library
|
||||
│ ├── __init__.py
|
||||
│ ├── extract.py orchestration: parse → dispatch → assemble Observations
|
||||
│ └── _features/ per-primitive-family modules
|
||||
└── decnet/web/db/models/observations.py NEW — generic Observation table
|
||||
```
|
||||
|
||||
**No new worker.** The existing `decnet-profiler.service` already
|
||||
supervises this codepath. No new systemd unit, no new polkit rule, no
|
||||
new heartbeat. The session-ended handler is a peer to the existing
|
||||
scoring tick inside the same async loop.
|
||||
|
||||
**Audit finding (network vs PTY domains).** `behavioral.py` and
|
||||
`timing.py` operate on `LogEvent` (network-level connection events
|
||||
from `decnet.correlation.parser`), feeding the existing
|
||||
`attacker_behavior` table — TCP fingerprint, OS guess, beacon
|
||||
interval, behavior class. **Zero overlap with BEHAVE-SHELL**, which
|
||||
operates on `AsciinemaEvent` (PTY input) and persists to the new
|
||||
`observations` table. The two coexist; no rewrite, no migration, no
|
||||
shared state.
|
||||
|
||||
Two repos, two commits, no vendoring. `pip install -e
|
||||
../BEHAVE/core ../BEHAVE/BEHAVE-SHELL` for local dev; pinned wheels in
|
||||
CI.
|
||||
|
||||
## BEHAVE is the spec. DECNET is the engine.
|
||||
|
||||
This is a *load-bearing* architectural fact, called out explicitly so
|
||||
nobody (including future me) misreads the layout.
|
||||
|
||||
- **BEHAVE ships:** the primitive registry, the registry-validated
|
||||
`Observation` envelope, the bus event adapter, the JSON schema.
|
||||
Reference prototype extractor for spec validation only. BEHAVE will
|
||||
**not** ship a production engine — that's not what the BEHAVE repo
|
||||
is for.
|
||||
- **DECNET ships:** the production extraction engine. It lives in
|
||||
`decnet/profiler/behave_shell/`, written from scratch against the
|
||||
BEHAVE spec, called from the existing profiler worker on
|
||||
`attacker.session.ended`.
|
||||
|
||||
DECNET-side BEHAVE imports are spec-only:
|
||||
|
||||
```python
|
||||
from decnet_behave_core.spec.envelope import Observation as ObservationEnvelope, Window
|
||||
from decnet_behave_shell.spec.primitives import PRIMITIVE_REGISTRY, get as get_primitive_spec
|
||||
from decnet_behave_shell.spec.event_adapter import event_topic_for, to_event_payload
|
||||
```
|
||||
|
||||
`Observation` is aliased to `ObservationEnvelope` so the storage
|
||||
SQLModel can keep the `Observation`-flavoured class name where it's
|
||||
useful, and the BEHAVE primitive-spec accessor is aliased away from
|
||||
the bare name `get` to avoid shadowing in feature-extractor modules
|
||||
that read dicts heavily.
|
||||
|
||||
That's it. No imports from `BEHAVE/prototype_extractors/`. The
|
||||
prototype is read as **design notes** during the engine build, then
|
||||
ignored. If the prototype yields a primitive the production engine
|
||||
doesn't, that's a calibration delta to investigate, not a regression
|
||||
in either direction.
|
||||
|
||||
### The extraction engine — DECNET-side
|
||||
|
||||
```
|
||||
decnet/profiler/behave_shell/
|
||||
├── __init__.py exposes extract_session()
|
||||
├── extract.py orchestration: parse → dispatch → assemble Observations
|
||||
└── _features/ feature-extractor modules, one per primitive family
|
||||
├── motor.py cadence, paste burst, modality, shell mastery
|
||||
├── cognitive.py latency class, consistency, branch diversity, feedback loop
|
||||
├── temporal.py session timing, escalation pattern
|
||||
└── ... others added as primitives are productionised
|
||||
|
||||
tests/profiler/behave_shell/
|
||||
└── _features/ one test module per feature family, against synthetic streams
|
||||
```
|
||||
|
||||
The library is **pure** — no I/O, no bus calls, no DB writes. Events
|
||||
in → `Iterable[Observation]` out. The split between `extract.py`
|
||||
(orchestration) and `_features/` (per-family implementations) keeps
|
||||
each primitive's logic auditable in isolation — including the
|
||||
threshold tables, which are the part most likely to drift across
|
||||
calibration cycles. The worker (in `decnet/profiler/worker.py`) owns
|
||||
all I/O: disk-reach, bus publish, DB upsert.
|
||||
|
||||
**The engine is its own first-class effort, not a side-effect of
|
||||
this integration doc.** The five-class calibration grid is the
|
||||
acceptance test. Beyond that, it has its own design surface
|
||||
(threshold calibration methodology, per-primitive confidence scoring,
|
||||
feature-family precedence rules) that this doc does not attempt to
|
||||
fully specify — that belongs in a sibling `BEHAVE-EXTRACTOR.md` once
|
||||
Phase 1 lands and we have the storage shape to write into.
|
||||
|
||||
**Calibration knowledge does leak across the repo boundary.** BEHAVE's
|
||||
`primitives.py` carries empirical calibration notes (e.g. CLAUDE-FF
|
||||
vs CLAUDE-CL on 2026-05-02) inline in the registry. The clean
|
||||
separation "BEHAVE = pure spec, DECNET = pure engine" is leakier
|
||||
than this doc would prefer; both repos must agree on what a primitive
|
||||
*means* before the engine threshold tables are tuned. Treat the
|
||||
registry's `notes:` field as ground truth and tune DECNET to match.
|
||||
|
||||
### BEHAVE-side commits (rare, for spec changes only)
|
||||
|
||||
The only reasons to touch the BEHAVE repo during this integration:
|
||||
|
||||
1. The DECNET engine discovers a primitive the registry needs and the
|
||||
spec doesn't yet define → registry edit in BEHAVE → version bump
|
||||
→ DECNET pin update.
|
||||
2. The envelope schema needs a field DECNET can populate honestly
|
||||
(e.g. a structured `evidence_ref` schema) → envelope edit → schema
|
||||
`v` bump → `observations.envelope_v` column already tracks it.
|
||||
|
||||
These are not blockers for Phase 1. They land iteratively as the
|
||||
engine matures.
|
||||
|
||||
## Versioning
|
||||
|
||||
| Axis | Current | DECNET pin |
|
||||
|---|---|---|
|
||||
| Envelope schema (`Observation.v`) | `1` | column `observations.envelope_v` tracks it |
|
||||
| Schema URL | `https://behave.local/schema/observation/v1.json` | — |
|
||||
| `decnet-behave-core` | `0.1.0` | `>=0.1.0,<0.2` |
|
||||
| `decnet-behave-shell` | `0.1.0` | `>=0.1.0,<0.2` |
|
||||
|
||||
A future `v=2` envelope coexists in the same table without a
|
||||
destructive migration — query by `envelope_v` when shape diverges.
|
||||
Bump the cap in `pyproject.toml` when BEHAVE cuts `0.2.0`.
|
||||
|
||||
## Data flow
|
||||
|
||||
```
|
||||
asciinema shard on disk
|
||||
/var/lib/decnet/artifacts/{decky}/sessrec/sessions-YYYY-MM-DD.jsonl
|
||||
│
|
||||
│ disk-reach (host-local, never on bus)
|
||||
▼
|
||||
bus: attacker.session.ended ─► decnet-profiler worker (existing)
|
||||
(or poll fallback) │ → handler in worker.py
|
||||
│ → calls behave_shell.extract_session(events) → Iterable[Observation]
|
||||
│ (registry-validated by BEHAVE)
|
||||
▼
|
||||
bus.publish(event_topic_for(obs.primitive),
|
||||
to_event_payload(obs))
|
||||
│
|
||||
┌─────────────────────┼──────────────────────┐
|
||||
▼ ▼ ▼
|
||||
observations table AttackerDetail UI future: attribution engine,
|
||||
(DECNET storage) (live SSE consumer) federation gossip, webhook export
|
||||
```
|
||||
|
||||
Raw `[t,"i",d]` events never cross the worker→bus boundary. Bus
|
||||
carries observation envelopes only. Disk-reach for the input stream
|
||||
mirrors DEBT-047's pattern (filesystem-group-readable artifacts via
|
||||
DEBT-035).
|
||||
|
||||
## Storage — the `observations` table
|
||||
|
||||
Generic table holding every BEHAVE envelope field, plus a single
|
||||
DECNET-side denormalization (`attacker_uuid`) for cheap joins.
|
||||
**Not a strict 1:1 mirror** — the envelope has no `attacker_uuid`;
|
||||
DECNET adds it so AttackerDetail doesn't have to chase
|
||||
`identity_ref → AttackerIdentity → attacker_uuid` on every read.
|
||||
|
||||
The SQLModel class is named `ObservationRow` to avoid colliding
|
||||
with the BEHAVE `Observation` Pydantic class imported into the
|
||||
same module.
|
||||
|
||||
```python
|
||||
# decnet/web/db/models/observations.py
|
||||
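from typing import Any
from uuid import UUID

from sqlalchemy import JSON, Column, Index, UniqueConstraint
from sqlmodel import Field, SQLModel

from decnet_behave_core.spec.envelope import Observation as ObservationEnvelope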
|
||||
|
||||
class ObservationRow(SQLModel, table=True):
|
||||
__tablename__ = "observations"
|
||||
|
||||
# ── envelope fields (types match BEHAVE exactly) ─────────────
|
||||
id: str = Field(primary_key=True) # envelope.id (uuid4().hex string)
|
||||
identity_ref: str | None = None # envelope.identity_ref (str, not UUID)
|
||||
primitive: str = Field(index=True) # 'motor.keystroke_cadence'
|
||||
value: dict[str, Any] | str | int | float | bool | list = \
|
||||
Field(sa_column=Column(JSON, nullable=False))
|
||||
confidence: float
|
||||
window_start_ts: float # flattened from envelope.window
|
||||
window_end_ts: float
|
||||
source: str
|
||||
evidence_ref: str = Field(nullable=False) # NOT NULL for DECNET emissions; see "Idempotency"
|
||||
envelope_v: int # envelope.v
|
||||
ts: float = Field(index=True) # emission ts
|
||||
|
||||
# ── DECNET-side denormalization (NOT in BEHAVE envelope) ─────
|
||||
attacker_uuid: UUID = Field(foreign_key="attackers.uuid", index=True)
|
||||
|
||||
__table_args__ = (
|
||||
Index("ix_observations_attacker_primitive_ts",
|
||||
"attacker_uuid", "primitive", "ts"),
|
||||
Index("ix_observations_primitive_ts", "primitive", "ts"),
|
||||
UniqueConstraint("evidence_ref", "primitive",
|
||||
name="uq_observations_evidence_primitive"),
|
||||
)
|
||||
```
|
||||
|
||||
**SQLAlchemy `JSON` not `JSONB`** per the typed-evidence-dicts memory
|
||||
rule (dual-backend MySQL + SQLite).
|
||||
|
||||
**`evidence_ref` is NOT NULL** for DECNET-emitted observations, even
|
||||
though BEHAVE's envelope makes it `Optional[str]`. The worker's
|
||||
"have we already profiled this session?" check (see Idempotency
|
||||
below) keys on `evidence_ref`; if it's NULL the check breaks. The
|
||||
shape `shard:{decky}/{service}/{date}.jsonl#sid` is mandatory at the
|
||||
worker layer. If a future BEHAVE consumer needs nullable
|
||||
evidence_ref, that's a separate observation source with its own
|
||||
worker — not this one.
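
A minimal sketch of how the worker might build and parse that pointer, assuming the `shard:{decky}/{service}/{date}.jsonl#sid` shape above. The names `EvidenceRef`, `make_evidence_ref`, and `parse_evidence_ref` are hypothetical; neither DECNET nor BEHAVE is known to ship them.

```python
from typing import NamedTuple


class EvidenceRef(NamedTuple):
    decky: str
    service: str
    date: str
    sid: str


def make_evidence_ref(decky: str, service: str, date: str, sid: str) -> str:
    """Build the pointer string; refuse empty parts so the column is never blank."""
    parts = (decky, service, date, sid)
    if not all(parts):
        raise ValueError("evidence_ref parts must be non-empty")
    if any("/" in p or "#" in p for p in parts):
        raise ValueError("evidence_ref parts must not contain '/' or '#'")
    return f"shard:{decky}/{service}/{date}.jsonl#{sid}"


def parse_evidence_ref(ref: str) -> EvidenceRef:
    """Inverse of make_evidence_ref; raises ValueError on a malformed pointer."""
    prefix, _, rest = ref.partition(":")
    path, _, sid = rest.partition("#")
    if prefix != "shard" or not sid or not path.endswith(".jsonl"):
        raise ValueError(f"malformed evidence_ref: {ref!r}")
    # Unpacking raises ValueError if the path does not have exactly three segments.
    decky, service, filename = path.split("/")
    return EvidenceRef(decky, service, filename[: -len(".jsonl")], sid)
```

Rejecting empty parts at construction keeps the NOT NULL guarantee meaningful: a pointer that is present but blank would defeat the "already profiled?" check just as a NULL would.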
|
||||
|
||||
**`UniqueConstraint(evidence_ref, primitive)`** enforces idempotency
|
||||
at the schema level, so a re-run of the worker on the same shard+sid
|
||||
produces a DB-side conflict, not silent duplicate rows. SQLite and
MySQL both allow multiple NULLs in a unique index, so a nullable
`evidence_ref` would let duplicates slip through; because the column
is NOT NULL, the constraint holds on both backends.
|
||||
|
||||
**No `_migrate_*` helper.** Pre-v1; `SessionProfile` and its `kd_*`
|
||||
columns are deleted from `decnet/web/db/models/attackers.py`
|
||||
outright. DEBT-011 (Alembic) remains deferred.
|
||||
|
||||
### Canonical queries
|
||||
|
||||
**Latest observation per primitive, for one attacker** (AttackerDetail
|
||||
"current state" panel):
|
||||
|
||||
```sql
|
||||
SELECT primitive, value, confidence, ts
|
||||
FROM observations
|
||||
WHERE attacker_uuid = :uuid
|
||||
AND ts = (SELECT MAX(ts) FROM observations o2
|
||||
WHERE o2.attacker_uuid = observations.attacker_uuid
|
||||
AND o2.primitive = observations.primitive)
|
||||
ORDER BY primitive;
|
||||
```
|
||||
|
||||
(Neither SQLite nor MySQL supports `DISTINCT ON`; a window-function rewrite is available if the
|
||||
correlated subquery hot-spots.)
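
To make the latest-per-primitive query concrete, here is a sketch that runs it against an in-memory SQLite database. The table is cut down to the columns the query touches, and the sample rows are invented for illustration.

```python
import sqlite3

LATEST_PER_PRIMITIVE = """
SELECT primitive, value, confidence, ts
FROM observations
WHERE attacker_uuid = :uuid
  AND ts = (SELECT MAX(ts) FROM observations o2
            WHERE o2.attacker_uuid = observations.attacker_uuid
              AND o2.primitive = observations.primitive)
ORDER BY primitive;
"""


def latest_per_primitive(conn, attacker_uuid):
    """Return one (primitive, value, confidence, ts) row per primitive."""
    return conn.execute(LATEST_PER_PRIMITIVE, {"uuid": attacker_uuid}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    " attacker_uuid TEXT, primitive TEXT, value TEXT, confidence REAL, ts REAL)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
    [
        # Two sessions for attacker a1 disagree on input modality; the later one wins.
        ("a1", "motor.input_modality", '"typed"', 0.80, 100.0),
        ("a1", "motor.input_modality", '"pasted"', 0.91, 200.0),
        ("a1", "cognitive.feedback_loop_engagement", '"closed_loop"', 0.70, 150.0),
        # A different attacker must not leak into a1's panel.
        ("a2", "motor.input_modality", '"typed"', 0.60, 300.0),
    ],
)
current = latest_per_primitive(conn, "a1")
```

If two rows for the same attacker and primitive share an identical `ts`, this query returns both; a tie-break on `id` would be needed if that ever matters.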
|
||||
|
||||
**Time-series for one primitive across all sessions of one attacker**
|
||||
(for "is this typist drifting" charts, future):
|
||||
|
||||
```sql
|
||||
SELECT ts, value, confidence
|
||||
FROM observations
|
||||
WHERE attacker_uuid = :uuid AND primitive = :primitive
|
||||
ORDER BY ts;
|
||||
```
|
||||
|
||||
## The session-ended handler — riding the existing profiler worker
|
||||
|
||||
```
|
||||
decnet/profiler/
|
||||
├── worker.py EXISTING — gains attacker.session.ended subscription
|
||||
└── behave_shell/ NEW — pure extraction library (no I/O)
|
||||
├── __init__.py
|
||||
└── extract.py orchestration only (pure); disk-reach stays in worker.py
|
||||
|
||||
tests/profiler/behave_shell/
|
||||
├── __init__.py
|
||||
├── test_extract.py unit tests against synthetic event streams
|
||||
├── test_calibration_grid.py the five-class regression suite (Phase 5)
|
||||
├── test_worker_session_ended_bus.py FakeBus path
|
||||
└── test_worker_session_ended_poll.py DECNET_BUS_ENABLED=false path
|
||||
```
|
||||
|
||||
(All tests live under `tests/`, mirroring the source tree per repo
|
||||
convention. Existing `tests/profiler/test_session_profile.py` is
|
||||
deleted alongside the `SessionProfile` model in Phase 1.)
|
||||
|
||||
**Trigger.** Subscribe to `attacker.session.ended` on the bus. Poll
|
||||
fallback walks `Log` rows where `event_type='session_recorded'` and
|
||||
no `observations` row carries the matching `evidence_ref`. Bus path
|
||||
ships first; poll fallback ships in the same commit so
|
||||
`DECNET_BUS_ENABLED=false` is supported from day one (DEBT-031
|
||||
pattern).
|
||||
|
||||
**Disk-reach.** For each `(decky, service, sid)`, resolve the shard
|
||||
via `_find_shard_with_sid` (already shipped, `323077b`). Open the
|
||||
JSONL via `decnet/artifacts/paths.py:resolve_artifact_path`
|
||||
(DEBT-047 — symlink-escape check, regex validation,
|
||||
`ARTIFACTS_ROOT` env override). Slice the per-sid event list and pass
it to the `behave_shell` engine.
|
||||
|
||||
**Extraction.** Call
|
||||
`decnet.profiler.behave_shell.extract_session(events, sid=..., source=...)`.
|
||||
Receive `Iterable[Observation]`. Each is registry-validated at
|
||||
construction by BEHAVE's `Observation` subclass; DECNET does not
|
||||
re-validate.
|
||||
|
||||
**Resolve `attacker_uuid`.** Sessrec carries `(decky_name, service,
|
||||
sid, src_ip, src_port)` per shard line. Resolve src_ip → attacker
|
||||
via the existing `attackers.ip` index; create-if-missing per the
|
||||
existing observe path. Stamp `identity_ref=NULL` until attribution
|
||||
exists.
|
||||
|
||||
**Bus emission.** For each observation, **DECNET overrides BEHAVE's
|
||||
adapter** to preserve sensor-side identifiers across the bus:
|
||||
|
||||
```python
|
||||
# BEHAVE's to_event_payload() excludes id/ts/v because BEHAVE assumes
|
||||
# the bus envelope carries them at the Event level. DECNET's bus
|
||||
# (DEBT-029) auto-generates fresh id/ts/v on publish — there's no
|
||||
# bus.publish overload that accepts envelope-level overrides. Without
|
||||
# this merge, BEHAVE's id/ts/v would be silently lost, breaking
|
||||
# cross-host dedup and federation gossip.
|
||||
payload = to_event_payload(obs) | {"id": obs.id, "ts": obs.ts, "v": obs.v}
|
||||
|
||||
bus.publish(
|
||||
topic = event_topic_for(obs.primitive), # 'attacker.observation.motor.keystroke_cadence'
|
||||
payload = payload,
|
||||
)
|
||||
```
|
||||
|
||||
Subscribers reconstructing the envelope via
|
||||
`from_event_payload(primitive, payload)` see the original BEHAVE id /
|
||||
ts / v because they ride along in `payload`. The DECNET-bus Event
|
||||
envelope's *own* id/ts/v (auto-generated) are bus-routing concerns,
|
||||
distinct from observation identity.
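
The merge can be exercised without BEHAVE installed. The sketch below uses stand-ins: `StubObservation` and `stub_to_event_payload` are hypothetical and only mimic the documented behaviour (payload excludes `id` / `ts` / `v`); the real envelope has more fields.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StubObservation:
    # Stand-in for BEHAVE's Observation; only the fields this sketch needs.
    id: str
    ts: float
    v: int
    primitive: str
    value: str
    confidence: float


def stub_to_event_payload(obs):
    """Mimic the documented adapter behaviour: everything except id/ts/v."""
    return {k: val for k, val in asdict(obs).items() if k not in {"id", "ts", "v"}}


def build_bus_payload(obs):
    """Merge the sensor-side identifiers back in, as the handler does."""
    return stub_to_event_payload(obs) | {"id": obs.id, "ts": obs.ts, "v": obs.v}


obs = StubObservation(
    id="ab12", ts=1714521660.456, v=1,
    primitive="motor.input_modality", value="pasted", confidence=0.91,
)
payload = build_bus_payload(obs)
```

The dict-union operator requires Python 3.9 or later; the right-hand operand wins on key collisions, so the observation's own identifiers always take precedence.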
|
||||
|
||||
**This is a known deviation from BEHAVE's wire-format docstring**
|
||||
(`core/decnet_behave_core/spec/envelope.py:77-84`). If DECNET's bus
|
||||
later grows envelope-level overrides on `publish()`, revert to the
|
||||
upstream contract. Filed as a low-priority follow-up — not blocking.
|
||||
|
||||
Adapter import path is pure-stdlib — no DECNET imports inside BEHAVE.
|
||||
DECNET is the consumer of BEHAVE's contract, never the other way
|
||||
around.
|
||||
|
||||
**Persistence.** All observations from one session — i.e. one
|
||||
`(decky, service, sid)` triple — commit as **a single transaction**.
|
||||
Either the entire session lands in `observations` or none of it
|
||||
does; partial-failure mid-session never leaves a half-profiled
|
||||
attacker row.
|
||||
|
||||
Persist **first**, then publish to the bus best-effort. Bus is
|
||||
fire-and-forget (DEBT-029 §6) — a publish failure does **not** roll
|
||||
back the persisted rows, and a persist failure means nothing is
|
||||
published. DB is the source of truth; the bus is the notification
|
||||
layer only. Order matters: a downstream subscriber receiving an
|
||||
`attacker.observation.*` event can immediately query the table and
|
||||
find it; the inverse (publish-then-persist) would create a window
|
||||
where subscribers chase rows that don't exist yet.
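
A sketch of the persist-first ordering, using `sqlite3` and a fake bus. `RecordingBus`, `make_db`, and `persist_then_publish` are hypothetical names, and the table is reduced to three columns; the real worker would go through the repository layer.

```python
import sqlite3


class RecordingBus:
    """Hypothetical stand-in for the DECNET bus; records or fails on demand."""

    def __init__(self, fail=False):
        self.fail = fail
        self.published = []

    def publish(self, topic, payload):
        if self.fail:
            raise ConnectionError("bus unavailable")
        self.published.append((topic, payload))


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations ("
        " evidence_ref TEXT NOT NULL, primitive TEXT NOT NULL, value TEXT NOT NULL,"
        " UNIQUE (evidence_ref, primitive))"
    )
    return conn


def persist_then_publish(conn, bus, rows):
    """rows: iterable of (evidence_ref, primitive, value). Returns count published."""
    rows = list(rows)
    # One transaction per session: the context manager commits on success
    # and rolls back if any insert raises, so nothing is published on failure.
    with conn:
        conn.executemany(
            "INSERT INTO observations (evidence_ref, primitive, value) VALUES (?, ?, ?)",
            rows,
        )
    published = 0
    for evidence_ref, primitive, value in rows:
        try:
            bus.publish(
                f"attacker.observation.{primitive}",
                {"value": value, "evidence_ref": evidence_ref},
            )
            published += 1
        except Exception:
            # Best-effort: a bus failure never undoes the committed rows.
            continue
    return published
```

Because the publish loop only starts after the `with conn:` block exits cleanly, a subscriber that reacts to an event can always find the corresponding row.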
|
||||
|
||||
**Idempotency.** Enforced at the schema level by
|
||||
`UniqueConstraint(evidence_ref, primitive)`. Re-running the worker
|
||||
on the same shard+sid produces a DB-side conflict per row, which the
|
||||
worker handles via a SQLAlchemy upsert (`INSERT … ON CONFLICT DO UPDATE`
on SQLite, `INSERT … ON DUPLICATE KEY UPDATE` on MySQL). The worker
marks a session "profiled" by the existence of any
|
||||
row matching its `evidence_ref` — no separate marker column. Because
|
||||
the unique index makes accidental duplicates structurally
|
||||
impossible, the marker check is honest.
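
A sketch of the re-run behaviour on the SQLite side only, assuming SQLite 3.24 or later for `ON CONFLICT … DO UPDATE`. The helper names (`open_db`, `upsert_session`, `is_profiled`) are hypothetical and the table is trimmed to the columns involved. This version keeps the original row's `id` on conflict; whether the real worker should do that is a design choice this sketch does not settle.

```python
import sqlite3

UPSERT = """
INSERT INTO observations (id, evidence_ref, primitive, value, confidence, ts)
VALUES (:id, :evidence_ref, :primitive, :value, :confidence, :ts)
ON CONFLICT (evidence_ref, primitive) DO UPDATE SET
    value = excluded.value,
    confidence = excluded.confidence,
    ts = excluded.ts
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations ("
        " id TEXT PRIMARY KEY,"
        " evidence_ref TEXT NOT NULL,"
        " primitive TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " confidence REAL NOT NULL,"
        " ts REAL NOT NULL,"
        " UNIQUE (evidence_ref, primitive))"
    )
    return conn


def upsert_session(conn, rows):
    """Write all observations for one session in a single transaction."""
    with conn:
        conn.executemany(UPSERT, rows)


def is_profiled(conn, evidence_ref):
    """A session counts as profiled if any row carries its evidence_ref."""
    cur = conn.execute(
        "SELECT 1 FROM observations WHERE evidence_ref = ? LIMIT 1", (evidence_ref,)
    )
    return cur.fetchone() is not None
```

Running `upsert_session` twice for the same `(evidence_ref, primitive)` leaves one row carrying the second run's value, confidence, and timestamp.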
|
||||
|
||||
## Bus topics
|
||||
|
||||
Add to `decnet/bus/topics.py`:
|
||||
|
||||
```python
|
||||
ATTACKER_OBSERVATION_PREFIX = "attacker.observation"
|
||||
# Wildcard patterns:
|
||||
# attacker.observation.motor.*
|
||||
# attacker.observation.cognitive.*
|
||||
# attacker.observation.> (everything BEHAVE-SHELL emits)
|
||||
```
|
||||
|
||||
Topic shape locked by BEHAVE's `event_topic_for()`; DECNET registers
|
||||
the prefix for documentation and pattern-matching only. **Bus auth
|
||||
is not topic-level** — per DEBT-029 §2 the bus uses
|
||||
kernel-authenticated peer delivery (UNIX socket file permissions),
|
||||
not topic ACLs. `bus/topics.py` change co-commits with a
|
||||
wiki-checkout `Service-Bus.md` update (memory rule: "Document new
|
||||
bus signals in the wiki").
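
For pattern-matching against the prefix, a small matcher can be sketched. **Assumption:** the wildcard semantics here are NATS-style (`*` matches exactly one token, `>` matches one or more trailing tokens), inferred only from the `>` notation in the comment above; the DECNET bus's actual matching rules are not stated in this doc. `topic_matches` is a hypothetical helper.

```python
ATTACKER_OBSERVATION_PREFIX = "attacker.observation"


def topic_matches(pattern: str, topic: str) -> bool:
    """ASSUMED semantics: '*' = exactly one token, '>' = one or more trailing tokens."""
    p_tokens = pattern.split(".")
    t_tokens = topic.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one topic token left.
            return i == len(p_tokens) - 1 and len(t_tokens) > i
        if i >= len(t_tokens):
            return False
        if p != "*" and p != t_tokens[i]:
            return False
    # No '>' seen: the pattern must consume the topic exactly.
    return len(p_tokens) == len(t_tokens)
```

If these semantics do hold, a nested primitive such as `motor.shell_mastery.tab_completion` would not match `attacker.observation.motor.*` and would need the `>` form; worth confirming against the bus implementation before relying on either pattern.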
|
||||
|
||||
## AttackerDetail consumer
|
||||
|
||||
### REST surface
|
||||
|
||||
`decnet/web/router/attackers/api_get_attacker_detail.py` swaps the
|
||||
`SessionProfile` join for the latest-per-primitive query above.
|
||||
Response shape gains:
|
||||
|
||||
```jsonc
|
||||
{
|
||||
// ... existing attacker fields ...
|
||||
"observations": [
|
||||
{
|
||||
"primitive": "motor.input_modality",
|
||||
"value": "pasted",
|
||||
"confidence": 0.91,
|
||||
"ts": 1714521660.456,
|
||||
"source": "decnet/profiler/behave_shell/extract.py"
|
||||
},
|
||||
// ... one row per primitive observed for this attacker ...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Frontend (`AttackerDetail.tsx`) renders a "Behavioural primitives"
|
||||
panel grouped by the registry's top-level domain (`motor.*`,
|
||||
`cognitive.*`, `temporal.*`, `operational.*`, `environmental.*`,
|
||||
`cultural.*`, `emotional_valence.*`, `toolchain.*`). Day-one render
|
||||
priorities for the panel:
|
||||
|
||||
1. `motor.input_modality` — pasted vs typed vs mixed
|
||||
2. `cognitive.feedback_loop_engagement` — closed_loop vs fire_and_forget
|
||||
3. `cognitive.command_branch_diversity` — linear_playbook vs adaptive_branching
|
||||
4. `cognitive.inter_command_latency_class` — typing_speed / llm_lightweight / llm_heavyweight / long
|
||||
5. Everything else, alphabetised by primitive path.
|
||||
|
||||
These four are the highest-discriminative-value primitives in the
|
||||
calibration grid; surfacing them first is what unblocks the "is this
|
||||
the same operator class" hover story.
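
The ordering rule is small enough to pin down in a few lines. This is a sketch of the sort logic only, written in Python for consistency with the rest of this doc; the panel itself lives in `AttackerDetail.tsx`, and `panel_order` is a hypothetical name.

```python
PRIORITY = (
    "motor.input_modality",
    "cognitive.feedback_loop_engagement",
    "cognitive.command_branch_diversity",
    "cognitive.inter_command_latency_class",
)
_RANK = {name: i for i, name in enumerate(PRIORITY)}


def panel_order(primitives):
    """Priority primitives first, in their listed order; the rest alphabetically."""
    return sorted(primitives, key=lambda p: (_RANK.get(p, len(PRIORITY)), p))
```

Sorting on a `(rank, path)` tuple keeps the rule declarative: adding a fifth priority primitive is a one-line change to `PRIORITY`.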
|
||||
|
||||
### Live-update SSE route
|
||||
|
||||
`GET /api/v1/attackers/{uuid}/events` — per-attacker SSE stream,
|
||||
mirrors the per-topology pattern shipped in DEBT-030.
|
||||
The route subscribes to `attacker.observation.*` filtered by
|
||||
`identity_ref` / resolved `attacker_uuid`, plus
|
||||
`attacker.fingerprint_rotated` / `attacker.scored` for the same
|
||||
attacker.
|
||||
|
||||
Envelope identical to topology events:
|
||||
`{v, type, ts, payload}`. Day-one event types:
|
||||
`observation.<primitive>`, `fingerprint.rotated`, `attacker.scored`.
|
||||
|
||||
Auth: `?token=` query-param matching the existing per-topology and
|
||||
`/stream` pattern. Snapshot-on-connect serves the latest-per-primitive
|
||||
query result so the panel hydrates immediately, then live-forwards
|
||||
bus events. 15s keepalive, mirrors the topology route.
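
A sketch of what one frame on this stream could look like on the wire, assuming plain `data:` frames and an SSE comment line for the keepalive. The helper names, the default `v=1`, and the choice not to use the SSE `event:` field are all assumptions; the topology route is the reference for the real format.

```python
import json

KEEPALIVE_SECONDS = 15  # from the route description above


def sse_frame(event_type, ts, payload, v=1):
    """Serialise one {v, type, ts, payload} envelope as a single SSE data frame."""
    envelope = {"v": v, "type": event_type, "ts": ts, "payload": payload}
    # Compact JSON on one line: an embedded newline would split the SSE frame.
    return f"data: {json.dumps(envelope, separators=(',', ':'))}\n\n"


def sse_keepalive():
    """SSE comment line; clients ignore it but the connection stays warm."""
    return ": keepalive\n\n"
```

Snapshot-on-connect would emit one `sse_frame` per row of the latest-per-primitive result before switching to live bus events.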
|
||||
|
||||
The global `/stream` is **not** the right fit here — it fans out
|
||||
every attacker's events to every subscriber, and the AttackerDetail
|
||||
page only cares about one. Per-attacker route, like
|
||||
per-topology.
|
||||
|
||||
## PII discipline
|
||||
|
||||
Binds at the BEHAVE layer; DECNET does not get to "improve" the
|
||||
envelope by reading raw bodies into payloads.
|
||||
|
||||
- Raw `[t,"i",d]` keystroke events stay on disk. Worker reads,
|
||||
extracts, discards.
|
||||
- `evidence_ref` is a *pointer* (`shard:path#sid`), never the
|
||||
evidence itself.
|
||||
- `value` JSON is bounded by the registry's `ValueTypeSpec` — no
|
||||
free-form blobs that could smuggle keystrokes.
|
||||
- Bigram simhashes (when emitted via `cognitive.*` digraph
|
||||
primitives) are *characters*, not *content* — already documented in
|
||||
BEHAVE's primitives module.
|
||||
|
||||
**Canonical PII binding.** The authoritative statement is the module
|
||||
docstring at `core/decnet_behave_core/spec/envelope.py:3-19` — it
|
||||
forbids raw keystrokes, command bodies, credentials, and payload
|
||||
bytes in observation values; `evidence_ref` is a pointer, never the
|
||||
evidence. That docstring is binding on this DECNET integration.
|
||||
*Not* `BEHAVE-SHELL/scratchpad.md` — scratchpads, by definition,
|
||||
aren't binding policy surfaces.
|
||||
|
||||
## Calibration grid IS the regression test
|
||||
|
||||
`tests/profiler/behave_shell/test_calibration_grid.py` runs the
|
||||
**pure engine** (`behave_shell.extract_session()` called directly,
|
||||
no worker, no bus, no DB) against each of the five
|
||||
`BEHAVE/prototype_extractors/shell/sessions-2026-05-02*.jsonl`
|
||||
shards (gitignored — fixture path resolved via
|
||||
`BEHAVE_CALIBRATION_DIR` env var, skipped if unset). Asserts the
|
||||
expected primitive set fires per class:
|
||||
|
||||
| Shard | Class | Required primitives in output |
|
||||
|---|---|---|
|
||||
| `sessions-2026-05-02.jsonl` | HUMAN | `motor.input_modality=typed`, `cognitive.inter_command_consistency=bimodal`, `cognitive.feedback_loop_engagement=closed_loop`, `cognitive.command_branch_diversity=adaptive_branching` |
|
||||
| `sessions-2026-05-02-with-llm.jsonl` | YOU-sim | `motor.input_modality=pasted`, `motor.paste_burst_rate=occasional`, `cognitive.inter_command_latency_class=typing_speed`, `cognitive.command_branch_diversity=linear_playbook` |
|
||||
| `sessions-2026-05-02-new.jsonl` | LW-sim | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=llm_lightweight`, `cognitive.command_branch_diversity=linear_playbook` |
|
||||
| `sessions-2026-05-02-with-claude.jsonl` | CLAUDE-FF | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=llm_heavyweight`, `cognitive.command_branch_diversity=linear_playbook`, `cognitive.feedback_loop_engagement=fire_and_forget` |
|
||||
| `sessions-2026-05-02-closed-loop.jsonl` | CLAUDE-CL | `motor.input_modality=pasted`, `motor.paste_burst_rate=habitual`, `cognitive.inter_command_latency_class=long`, `cognitive.command_branch_diversity=adaptive_branching`, `cognitive.feedback_loop_engagement=closed_loop` |
|
||||
|
||||
Any extractor change that breaks one of these classifications fails
|
||||
CI. The grid is the discriminative-power floor — calibration
|
||||
refinement can *add* primitives, never silently *drop* them.
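
The "add, never drop" rule translates to a subset check. A sketch of the assertion helper, with two of the five classes copied from the table above; `EXPECTED` and `missing_or_wrong` are hypothetical names, and the real suite would feed it the output of `extract_session()` on each shard.

```python
EXPECTED = {
    "HUMAN": {
        "motor.input_modality": "typed",
        "cognitive.inter_command_consistency": "bimodal",
        "cognitive.feedback_loop_engagement": "closed_loop",
        "cognitive.command_branch_diversity": "adaptive_branching",
    },
    "CLAUDE-CL": {
        "motor.input_modality": "pasted",
        "motor.paste_burst_rate": "habitual",
        "cognitive.inter_command_latency_class": "long",
        "cognitive.command_branch_diversity": "adaptive_branching",
        "cognitive.feedback_loop_engagement": "closed_loop",
    },
}


def missing_or_wrong(expected, observed):
    """Map each required primitive that is absent or different to (want, got).

    Extra primitives in `observed` are ignored: the grid is a floor, not a ceiling.
    """
    return {
        primitive: (want, observed.get(primitive))
        for primitive, want in expected.items()
        if observed.get(primitive) != want
    }
```

A test per class then reduces to `assert missing_or_wrong(EXPECTED[cls], observed) == {}`, and a failure message shows exactly which classification regressed.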
|
||||
|
||||
## Phase plan
|
||||
|
||||
Per the "commit per task" memory rule, each phase ships as one commit
|
||||
with its own tests.
|
||||
|
||||
### Phase 1 — DECNET-side storage (no BEHAVE coupling yet)
|
||||
|
||||
- New `observations` table + SQLModel + repository methods.
|
||||
- Drop `SessionProfile` + `kd_*` columns from
|
||||
`decnet/web/db/models/attackers.py`.
|
||||
- AttackerDetail API switches to the latest-per-primitive query.
|
||||
Returns empty `observations: []` since nothing populates the table.
|
||||
- `decnet/bus/topics.py` registers `attacker.observation.*` prefix.
|
||||
- Tests: SQLModel CRUD, latest-per-primitive query against fixture
|
||||
rows, empty-attacker contract.
|
||||
|
||||
### Phase 2 — DECNET extraction engine (`decnet/profiler/behave_shell/`)
|
||||
|
||||
- Production extractor written against the BEHAVE spec, pure library
|
||||
(no I/O).
|
||||
- One feature-family module per `_features/{motor,cognitive,temporal,...}.py`.
|
||||
- Public entry: `extract_session(events, *, sid, source) -> Iterable[Observation]`.
|
||||
- Tests in `tests/profiler/behave_shell/_features/`: per-feature unit
|
||||
tests against synthetic event streams. The calibration-grid suite
|
||||
(Phase 5) is the integration test.
|
||||
- This phase has its own design surface — see `BEHAVE-EXTRACTOR.md`
|
||||
(filed as a sibling doc when Phase 1 lands). Phases 1 and 2 are
|
||||
largely independent; can run in parallel.
|
||||
|
||||
### Phase 3 — BEHAVE pin
|
||||
|
||||
- `pyproject.toml` pins `decnet-behave-core` and `decnet-behave-shell`
|
||||
at whatever versions the engine settles on.
|
||||
- CI install-time smoke: registry imports cleanly, envelope validates
|
||||
a known-good observation.
|
||||
|
||||
### Phase 4 — Wire the trigger into the existing profiler worker
|
||||
|
||||
- `decnet/profiler/worker.py` gains an `attacker.session.ended`
|
||||
subscription handler.
|
||||
- Handler does: resolve shard via disk-reach → call
|
||||
`behave_shell.extract_session()` → upsert into `observations` table
|
||||
→ publish each observation on the bus.
|
||||
- Poll fallback for `DECNET_BUS_ENABLED=false`.
|
||||
- Trigger isolation: handler exceptions logged, do not affect the
|
||||
existing scoring tick.
|
||||
- Tests in `tests/profiler/behave_shell/`: FakeBus path, poll-only
|
||||
path, disk-reach error paths, idempotency on re-run.
|
||||
- **No new systemd unit.** The existing `decnet-profiler.service`
|
||||
already supervises this code.
|
||||
|
||||
### Phase 5 — Calibration regression suite + UI surface
|
||||
|
||||
- `tests/profiler/behave_shell/test_calibration_grid.py` against all
|
||||
five BEHAVE shards.
|
||||
- New `GET /api/v1/attackers/{uuid}/events` SSE route (mirrors the
|
||||
per-topology pattern from DEBT-030); snapshot-on-connect +
|
||||
bus-forwarded `attacker.observation.*` events. Tests in
|
||||
`tests/api/attackers/test_events_stream.py`.
|
||||
- AttackerDetail.tsx renders the Behavioural primitives panel and
|
||||
consumes the SSE route for live updates.
|
||||
- Frontend Vitest coverage for the panel (DEBT-043 harness, shipped).
|
||||
|
||||
### Phase 6 — Live smoke
|
||||
|
||||
- Ship a decky, run a real SSH session from each calibration class
|
||||
manually, disconnect, observe `observations` rows + bus events +
|
||||
AttackerDetail panel.
|
||||
- Document the smoke procedure in
|
||||
`scripts/behave_shell/smoke.sh` (parallel to
|
||||
`scripts/bus/smoke-mutator.sh` — per-feature dirs).
|
||||
|
||||
## Out of scope

Filed for future paydown when they bite. Do not let them creep into
this integration.

- **Attribution engine.** Consumes `attacker.observation.*`, emits
  `attribution.profile.candidate.*`. BEHAVE explicitly separates
  observation from attribution.
- **Federation gossip** of observations across swarm hosts.
- **Backfill** over historical shards (one-shot script when the
  table lands; not a worker feature).
- **Webhook export** of observation streams (rides DEBT-037).
- **Observation retention / vacuum.** Pre-v1, no users to mislead;
  filed when storage actually pressures.
- **`SessionProfile` data migration.** None — table ships empty
  today, drop is destructive but lossless.
- **Cross-domain BEHAVE** (BEHAVE-TEXT integration for stylometric
  analysis of attacker-typed messages, e.g. captured emails). Same
  `observations` table will accept those envelopes when their primitive
  registry is registered, but the wiring is a separate paydown.

## Resolved decisions (formerly open questions)

- **Q1 — engine location.** RESOLVED: BEHAVE's prototype is reference
  code only, never imported by DECNET. The production extraction
  engine lives in `decnet/profiler/behave_shell/` as a sublibrary of
  the existing profiler worker — no new daemon, no new systemd unit.
  (See "BEHAVE is the spec. DECNET is the engine.")
- **Q2 — emission granularity.** RESOLVED: **per-(sid, primitive).**
  Every session emits its full primitive set; every emission
  persists. The schema already supports it; this just locks in the
  worker write loop. *More detail the better.*
- **Q3 — cross-session aggregation, day one.** RESOLVED: latest wins
  per primitive in the AttackerDetail "current state" query. Simple,
  honest, easy to reason about.

## Real open question — Cross-session aggregation, the right way

Q3's "latest wins" is a stopgap. The actual question is harder and
deserves its own design pass before AttackerDetail starts surfacing
attribution-flavoured claims:

> **When two sessions from the same attacker (or identity) emit
> conflicting values for the same primitive, what does the
> attacker-level view say?**

Concrete cases:

- Session A: `motor.input_modality = typed` (conf 0.92).
  Session B (next day): `motor.input_modality = pasted` (conf 0.88).
  Is this attacker `mixed`? Or did they switch tooling? Or did a
  *different operator* take over the same credentialed access?
- `cognitive.feedback_loop_engagement` flips from `closed_loop` to
  `fire_and_forget` between two sessions. Is this fatigue, a
  handoff (`operational.multi_actor_indicators=handoff_detected`?),
  or a script taking over from a human?
- `cognitive.command_branch_diversity = unknown` in a short session
  vs `adaptive_branching` in a long session. Latest-wins would
  collapse this to `unknown` if the short session lands second —
  exactly the wrong answer.

**This is genuinely an attribution-engine concern**, not an
extraction concern. BEHAVE is firm on that bright line. The clean
answer is:

1. **DECNET stores all observations** (per-sid, per-primitive — Q2).
2. **AttackerDetail's day-one "current state" query is latest-wins**
   (Q3) — not because it's right, but because it's *honestly
   transparent* about being naïve.
3. **The right answer ships with the attribution engine** as a
   separate paydown — likely as new `attribution.profile.*` topics
   that emit a *derived* per-attacker primitive map with explicit
   merge semantics (`stable` / `drifting` / `conflicted` /
   `multi_actor`). Day-zero, that engine doesn't exist; day-one,
   AttackerDetail just shows raw latest values + a "N
   observations" hover.

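To make the third concrete case tangible, here is a small sketch of the day-one latest-wins view next to the "N observations" count. The `(ts, primitive, value)` row shape and both function names are illustrative assumptions, not DECNET code.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (emission ts, primitive, value); a simplified stand-in for an observations row.
Row = Tuple[float, str, str]


def latest_wins(rows: List[Row]) -> Dict[str, str]:
    """Day-one view: the most recent value per primitive, nothing else."""
    current: Dict[str, str] = {}
    for _ts, primitive, value in sorted(rows, key=lambda r: r[0]):
        current[primitive] = value  # later rows overwrite earlier ones
    return current


def observation_counts(rows: List[Row]) -> Dict[str, int]:
    """How many observations back each primitive (the hover text)."""
    return dict(Counter(primitive for _ts, primitive, _value in rows))


rows = [
    (100.0, "cognitive.command_branch_diversity", "adaptive_branching"),  # long session
    (200.0, "cognitive.command_branch_diversity", "unknown"),             # short session, later
]
print(latest_wins(rows))         # the informative value is overwritten by "unknown"
print(observation_counts(rows))
```

The count is what keeps the naive view honest: a reader who sees `unknown` backed by two observations knows there is history worth inspecting.
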
Filed as **DEBT-051 — Cross-session BEHAVE primitive aggregation
(attribution engine)** when this doc is reviewed. Out of scope for
this integration; explicitly listed under "Out of scope" above.

---

**Owner:** ANTI.
**Implementation gate:** this doc reviewed → Phase 1 starts.
@@ -277,7 +277,17 @@ The Workers panel (Config → Workers) landed with bus-based STOP but every STAR

**Status:** Open. Depends on the Workers panel (shipped) and `deploy/decnet-bus.service` pattern being extended to the other workers.

### DEBT-036 — Session-profile ingester (keystroke-dynamics extraction from transcript shards) — **STALE 2026-05-03, SUPERSEDED BY DEBT-050**

> **Stale.** This entry was drafted before BEHAVE-SHELL existed. It bakes the
> feature schema into hand-rolled `SessionProfile` columns (`kd_iki_mean`,
> `kd_burst_ratio`, …), which duplicates the registry in
> `BEHAVE/BEHAVE-SHELL/decnet_behave_shell/spec/primitives.py`, bypasses the
> registry-validated `Observation` envelope, and skips the bus event adapter
> (`event_topic_for` / `to_event_payload`) that already speaks DECNET's
> `attacker.observation.*` topic shape. The replacement plan is **DEBT-050**
> below. Original text preserved unchanged for context.

**Files:** `decnet/web/ingester.py` (or new sibling under `decnet/session_profiler/`), `decnet/web/db/models/attackers.py:SessionProfile` (table already exists, ships empty), `decnet/templates/_shared/sessrec/sessrec.c` (emitter side — already done), `decnet/web/router/attackers/api_get_attacker_detail.py` (consumer — already joins SessionProfile when present).

The `SessionProfile` SQLModel table has been committed to storage since session recording v1 landed (see `decnet/web/db/models/attackers.py:97-143`). Every column — `kd_iki_mean`, `kd_iki_stdev`, `kd_iki_p50`, `kd_iki_p95`, `kd_enter_latency_p50/p95`, `kd_burst_ratio`, `kd_think_ratio`, `kd_ctrl_backspace/wkill/ukill/abort/eof`, `kd_arrow_rate`, `kd_tab_rate`, `kd_digraph_simhash`, `total_keystrokes`, `session_duration_s` — is nullable by design because the **ingester that populates them does not exist yet** (documented as gap #2 in `SIGNAL_CAPTURE_AUDIT.md`). Every session that gets recorded lands an empty row (or, today, no row at all) while the `[t, "i", d]` event stream in the shard carries every signal those columns exist to capture.
@@ -317,7 +327,83 @@ All four signals fall out of the schema for free. CoV from `kd_iki_mean` + `kd_i
- The motivating-case wget session produces CoV ≈ 0.74 ± 0.05 when the ingester processes it — sanity check against the manual analysis.
- The AttackerDetail page surfaces at least `kd_iki_mean` + `kd_burst_ratio` somewhere in the keystroke-dynamics section, unblocking the "is this the same typist" hover story.

**Status:** Open. Depends on the shard-scan fallback (shipped in `323077b`) and `SessionProfile` schema (shipped with session recording v1). The bus-trigger path depends on DEBT-031's deferred `attacker.session.started/ended` topics, but poll-driven ingestion works today and can ship first.
**Status:** ⚠️ Stale — superseded by DEBT-050. Do not implement against this entry; the column-zoo design is the wrong shape now that BEHAVE-SHELL exists.

### DEBT-050 — BEHAVE-SHELL session-profile ingester worker (replaces DEBT-036)
**Files:** `decnet/session_profiler/worker.py` (**new**), `decnet/web/db/models/observations.py` (**new** — generic Observation table, see Storage), `decnet/web/db/models/attackers.py` (drop `SessionProfile` and its `kd_*` columns), `decnet/web/router/attackers/api_get_attacker_detail.py` (consumer surface — switch from SessionProfile join to per-primitive Observation latest-state query), `decnet/bus/topics.py` (admit `attacker.observation.*` prefix), `decnet/web/db/sqlmodel_repo/observations.py` (**new** — repository methods), `packaging/systemd/decnet-session-profiler.service` (**new**), `pyproject.toml` (pin `decnet-behave-core`, `decnet-behave-shell`), **BEHAVE repo (separate commit):** `BEHAVE/prototype_extractors/shell/extract.py` (refactor `__main__` into importable `extract_session()`).

**Context.** ANTI built BEHAVE — an out-of-tree behavioural-observation framework with its own primitive registry, registry-validated `Observation` envelope, DECNET-bus event adapter, and a five-class calibration grid (HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL). It is the right substrate for keystroke-dynamics extraction; the original DEBT-036 entry predates it and got the schema wrong by inventing parallel columns. BEHAVE is a **separate repo** (mirrors `wiki-checkout` discipline — two repos, two commits per change).

**Design:**

1. **New worker** `decnet/session_profiler/worker.py`. Sibling of `decnet/ingester/`, supervised by a new `packaging/systemd/decnet-session-profiler.service` unit (mirrors DEBT-034's pattern). One process per host, agent-or-master-agnostic.
2. **Trigger.** Subscribe on the bus to `attacker.session.ended`; poll-fallback over `Log.event_type='session_recorded'` rows lacking a "profiled" marker (see Storage). Bus-optional per DEBT-031: `try get_bus(); except: warn-and-degrade-to-poll`.
3. **Disk-reach** (per DEBT-047 precedent). For each `(decky, service, sid)`, resolve the shard via `_find_shard_with_sid` (already shipped in `323077b`), open the JSONL, walk the per-sid event slice. **No raw `d` values cross the worker→bus boundary** — BEHAVE's envelope rules prohibit it, and disk-reach keeps the input stream host-local.
4. **Extraction.** Refactor `BEHAVE/prototype_extractors/shell/extract.py`'s `__main__` into an importable `extract_session(events: Iterable[AsciinemaEvent]) -> Iterable[Observation]`. Feed it the per-sid `[t,"i",d]` slice. Output is a stream of registry-validated `Observation`s, one per primitive that fired for the session. **Refactor lands in the BEHAVE repo as a separate commit** (two repos, two commits).
5. **Bus emission.** For each `obs`: `bus.publish(event_topic_for(obs.primitive), to_event_payload(obs))`. The adapter is pure-stdlib, no DECNET imports — DECNET is the consumer of *its* contract, not the other way around. Topic prefix `attacker.observation.*` registered in `decnet/bus/topics.py`.
6. **Storage — drop `SessionProfile`, new generic `Observation` table.** Schema mirrors the BEHAVE envelope 1:1 so persistence cannot drift from the wire format:

```
observations (
  id              UUID PRIMARY KEY,   -- BEHAVE Observation.id
  attacker_uuid   UUID NOT NULL FK,   -- denormalised from identity_ref or join-resolved
  identity_ref    UUID NULL,          -- raw envelope field, may be null pre-attribution
  primitive       TEXT NOT NULL,      -- 'motor.keystroke_cadence' etc.
  value           JSON NOT NULL,      -- envelope shape; SQLAlchemy JSON not JSONB (memory rule)
  confidence      REAL NOT NULL,
  window_start_ts REAL NOT NULL,
  window_end_ts   REAL NOT NULL,
  source          TEXT NOT NULL,
  evidence_ref    TEXT NULL,          -- shard:sid pointer for disk-reach audit, never evidence itself
  envelope_v      INTEGER NOT NULL,   -- BEHAVE Observation.v (currently 1)
  ts              REAL NOT NULL,      -- emission ts
  INDEX (attacker_uuid, primitive, ts DESC),
  INDEX (primitive, ts DESC)
)
```

AttackerDetail's "current state per primitive" view = `SELECT DISTINCT ON (primitive) … ORDER BY primitive, ts DESC` (or the SQLite equivalent via window function). `SessionProfile` and its `kd_*` columns are dropped outright — pre-v1, no users to mislead, no migration ceremony (DEBT-011 still deferred; just edit the SQLModel).
7. **Packaging.** Pin `decnet-behave-core>=0.1.0,<0.2` and `decnet-behave-shell>=0.1.0,<0.2` in DECNET's `pyproject.toml`. Envelope schema is currently `v=1` (`https://behave.local/schema/observation/v1.json`); the `observations.envelope_v` column tracks it so a future `v=2` envelope can land alongside without a destructive migration. Local dev: `pip install -e ../BEHAVE/core ../BEHAVE/BEHAVE-SHELL`. CI installs the pinned wheels from a BEHAVE release tag — bump the cap when BEHAVE cuts `0.2.0`.

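For the SQLite side of the "current state per primitive" view in item 6, a window-function query is one way to stand in for `DISTINCT ON`. The sketch below uses the stdlib `sqlite3` module and a deliberately reduced column set; the table subset, index name and `current_state` helper are assumptions for illustration, not the real repository methods.

```python
import json
import sqlite3

# Reduced schema: only the columns the latest-state query needs.
SCHEMA = """
CREATE TABLE observations (
    id            TEXT PRIMARY KEY,
    attacker_uuid TEXT NOT NULL,
    primitive     TEXT NOT NULL,
    value         TEXT NOT NULL,   -- JSON-encoded envelope value
    confidence    REAL NOT NULL,
    ts            REAL NOT NULL
);
CREATE INDEX ix_obs_attacker_primitive_ts
    ON observations (attacker_uuid, primitive, ts DESC);
"""

# ROW_NUMBER() picks the newest row per primitive (SQLite 3.25+).
LATEST_PER_PRIMITIVE = """
SELECT primitive, value, confidence, ts
FROM (
    SELECT primitive, value, confidence, ts,
           ROW_NUMBER() OVER (PARTITION BY primitive ORDER BY ts DESC) AS rn
    FROM observations
    WHERE attacker_uuid = ?
)
WHERE rn = 1
ORDER BY primitive
"""


def current_state(conn: sqlite3.Connection, attacker_uuid: str) -> dict:
    """Return {primitive: decoded value} using the newest row per primitive."""
    rows = conn.execute(LATEST_PER_PRIMITIVE, (attacker_uuid,)).fetchall()
    return {primitive: json.loads(value) for primitive, value, _conf, _ts in rows}
```

On PostgreSQL the same result comes from `DISTINCT ON (primitive)`; the window-function form has the advantage of running unchanged on both backends.
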
**Non-negotiables:**
- Registry validation is enforced at construction time by BEHAVE's `Observation` subclass — no DECNET-side primitive whitelist, no drift.
- Extractor refactor must keep `extract.py --summary` and the calibration-grid CLI flow working; the library entry-point is *additive*.
- `DECNET_BUS_ENABLED=false` keeps the worker functional in poll-only mode (mirrors DEBT-031).
- Idempotent on re-run: same shard + same sid → same observation set (sort+dedupe by primitive before emitting).
- PII discipline binds at the BEHAVE layer; DECNET does not get to "improve" the envelope by reading raw bodies into payloads.

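The sort+dedupe step behind the idempotency requirement might be as small as the sketch below. Observations are modelled as plain dicts with a `primitive` key, and keeping the first occurrence of a duplicate is an assumption: the entry does not say which duplicate should survive.

```python
from typing import Dict, Iterable, List


def canonical_observations(observations: Iterable[dict]) -> List[dict]:
    """Keep one observation per primitive, ordered by primitive name."""
    by_primitive: Dict[str, dict] = {}
    for obs in observations:
        # Assumption: first occurrence wins when a primitive repeats.
        by_primitive.setdefault(obs["primitive"], obs)
    return [by_primitive[name] for name in sorted(by_primitive)]
```

Because the output order depends only on primitive names, re-running the worker over the same shard and sid yields the same sequence of writes and bus events, provided the extractor itself is deterministic.
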
**Acceptance:**
- Replay each of the five `BEHAVE/prototype_extractors/shell/sessions-2026-05-02-*.jsonl` calibration shards through the worker. Each session produces the BEHAVE-SHELL primitives that the README's class-signature column predicts (e.g. CLAUDE-FF: `motor.input_modality=pasted` + `motor.paste_burst_rate=habitual` + `cognitive.inter_command_latency_class=llm_heavyweight` + `cognitive.command_branch_diversity=linear_playbook` + `cognitive.feedback_loop_engagement=fire_and_forget`).
- AttackerDetail surfaces at least `motor.input_modality`, `cognitive.feedback_loop_engagement`, and `cognitive.command_branch_diversity` for any attacker with a profiled session.
- The five-class grid IS the regression test — any extractor change must keep all five sessions classifying within their expected primitive sets.

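A regression check over the calibration grid could compare observed primitives against the expected class signature, roughly as sketched here. Only the CLAUDE-FF signature is filled in because it is the only one spelled out above; the helper name and the flat `{primitive: value}` shape are assumptions.

```python
from typing import Dict, Optional, Tuple

# Expected signature per calibration class. Only CLAUDE-FF is listed in the
# acceptance criteria; the other four classes would come from the BEHAVE README.
EXPECTED: Dict[str, Dict[str, str]] = {
    "CLAUDE-FF": {
        "motor.input_modality": "pasted",
        "motor.paste_burst_rate": "habitual",
        "cognitive.inter_command_latency_class": "llm_heavyweight",
        "cognitive.command_branch_diversity": "linear_playbook",
        "cognitive.feedback_loop_engagement": "fire_and_forget",
    },
}


def signature_mismatches(
    label: str, observed: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {primitive: (expected, observed)} for every primitive that differs."""
    return {
        primitive: (want, observed.get(primitive))
        for primitive, want in EXPECTED[label].items()
        if observed.get(primitive) != want
    }
```

A test then asserts that `signature_mismatches(label, observed)` is empty for each shard, and the returned mapping doubles as a readable failure message.
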
**Out of scope (defer to DEBT-051+ as they bite):**
- Attribution engine (consumes `attacker.observation.*`, emits `attribution.profile.candidate.*`). BEHAVE deliberately separates observation from attribution.
- Federation gossip of observations across swarm hosts.
- Backfill over historical shards.
- Webhook export of observation streams (rides DEBT-037).

**Status:** Open. Replaces DEBT-036. Depends on (a) BEHAVE-SHELL spec frozen at v0.x, (b) `extract.py` library refactor in the BEHAVE repo, (c) shard-scan fallback (shipped `323077b`).

### DEBT-051 — Cross-session BEHAVE primitive aggregation (attribution engine)
**Files:** `decnet/correlation/attribution/` (**new**), `decnet/web/db/models/attribution_state.py` (**new**), `decnet/bus/topics.py` (`attribution.profile.*` prefix), `decnet/web/router/attackers/api_get_attacker_detail.py` (state-badge wiring).

`BEHAVE-INTEGRATION.md`'s Q3 settled the AttackerDetail "current state" surface as **latest-wins per primitive** for v0 — honest about being naïve. The harder question — *how do conflicting observations across sessions of the same attacker resolve into a stable view?* — is filed here.

Concrete cases:
- Session A says `motor.input_modality = typed`, session B says `pasted`. Mixed? Operator switched tooling? Different operator on shared creds?
- `cognitive.feedback_loop_engagement` flips closed_loop ↔ fire_and_forget across sessions. Fatigue, handoff (`operational.multi_actor_indicators=handoff_detected`), or scripted takeover?
- A short session emits `cognitive.command_branch_diversity=unknown`; a long one emits `adaptive_branching`. Latest-wins would collapse to `unknown` if the short one lands second — exactly the wrong answer.

**This is genuinely an attribution-engine concern**, not an extraction concern (BEHAVE's bright line is firm on the split). The clean answer:

1. DECNET stores all observations per-(sid, primitive). ✅ Substrate ships in DEBT-050.
2. AttackerDetail's day-one query is latest-wins (Q3 above). ✅ Substrate ships in DEBT-050.
3. The right answer ships as a derived per-(attacker, primitive) state machine emitting `attribution.profile.state_changed` events with explicit merge semantics: `stable / drifting / conflicted / multi_actor / unknown`.

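The transition-tracking half of the state machine in item 3 can be sketched without committing to any merge rule. In the sketch below the five state names and the `attribution.profile.state_changed` topic come from the text; the class, its method and the payload fields are hypothetical, and how a state is derived from observations is deliberately left out.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class MergeState(str, Enum):
    UNKNOWN = "unknown"
    STABLE = "stable"
    DRIFTING = "drifting"
    CONFLICTED = "conflicted"
    MULTI_ACTOR = "multi_actor"


class ProfileStateTracker:
    """Remember the last state per (attacker, primitive) and report transitions."""

    def __init__(self) -> None:
        self._states: Dict[Tuple[str, str], MergeState] = {}

    def update(
        self, attacker_uuid: str, primitive: str, state: MergeState
    ) -> Optional[dict]:
        key = (attacker_uuid, primitive)
        previous = self._states.get(key, MergeState.UNKNOWN)
        if state == previous:
            return None  # no transition, nothing to emit
        self._states[key] = state
        # Hypothetical payload shape; the real envelope is defined elsewhere.
        return {
            "topic": "attribution.profile.state_changed",
            "attacker_uuid": attacker_uuid,
            "primitive": primitive,
            "from": previous.value,
            "to": state.value,
        }
```

Emitting only on transitions keeps the derived topic quiet: a primitive that stays `stable` across many sessions produces a single event.
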
Full design in `development/ATTRIBUTION-ENGINE.md`. v0 scope: aggregation only over per-`attacker_uuid` proto-identities (sidesteps the still-deferred clusterer from `IDENTITY_RESOLUTION.md`); v1 widens to identity_uuid clustering; v2 federation gossip.

**Status:** Open. Depends on DEBT-050 v0 in production for ≥ 1 month (so the engine has observation data to merge against) + a calibration corpus that exercises drift / multi-actor scenarios end-to-end.

### ~~DEBT-035 — Artifacts written as the container uid, not the API's~~ ✅ RESOLVED 2026-05-02
**Files:** `decnet/cli/init.py`, `decnet/web/router/transcripts/api_get_transcript.py` (soft-fail kept as defence-in-depth).
@@ -717,7 +803,9 @@ user who needs it.
| ~~DEBT-032~~ | ✅ | Correlation / Prober | resolved 2026-05-03 |
| DEBT-033 | 🟡 Medium | Storage / Session recording | open |
| ~~DEBT-035~~ | ✅ | Artifacts / Filesystem perms | resolved 2026-05-02 |
| DEBT-036 | ⚠️ Stale | Correlation / Keystroke dynamics | superseded by DEBT-050 |
| DEBT-050 | 🟡 Medium | BEHAVE-SHELL session-profile ingester | open (replaces DEBT-036) |
| DEBT-051 | 🟡 Medium | Attribution engine / cross-session aggregation | open (depends on DEBT-050) |
| DEBT-037 | 🟡 Medium | Integration / Webhooks | open (tracks MVP follow-ups) |
| DEBT-038 | 🟡 Medium | Honeypot / SSH cred capture | open (document-only) |
| ~~DEBT-039~~ | ✅ | Honeypot / Cred emitters | resolved |
@@ -732,5 +820,5 @@ user who needs it.
| DEBT-048 | 🟡 Medium | TTP / Intel provider mapping review (recurring) | open / recurring |
| DEBT-049 | 🟡 Medium | TTP / Sigma adapter (post-v1) | open |

**Remaining open:** DEBT-011 (Alembic), DEBT-027 (Dynamic bait store), DEBT-028 (deploy endpoint tests), DEBT-033 (transcript shard rotation), DEBT-037 (webhook delivery hardening), DEBT-038 (SSH PAM cred-capture limitations — document-only), DEBT-045 (EmailLifter heavyweight — partial paid; carved-out follow-ups remain), DEBT-048 (TTP intel provider mapping review — recurring quarterly), DEBT-049 (TTP Sigma adapter — post-v1), DEBT-050 (BEHAVE-SHELL session-profile ingester — replaces DEBT-036), DEBT-051 (attribution engine / cross-session aggregation). DEBT-036 is stale.

**Estimated remaining effort:** ~21 hours plus the new EmailLifter / TTP follow-ups. DEBT-030 Phase B (optimistic staged-buffer editor) is a follow-up, not debt.