feat(correlation/attribution): substrate + idle handler (Phase 1)

v0 Phase 1 of ATTRIBUTION-ENGINE.md:

* AttributionStateRow SQLModel keyed on (identity_uuid, primitive) per ANTI
  direction. Keying on the identity means the v1 clusterer can merge
  attackers without re-keying state rows, which is the migration debt v0
  should not bake in. ATTRIBUTION-ENGINE.md is updated with the deviation
  note.
* AttributionMixin: ensure_stub_identity_for_attacker, idempotent
  upsert_attribution_state, get_attribution_state[_for_identity], and
  list_multi_actor_identities (the read used by the Phase 5 correlator).
* attribution.profile.{state_changed,multi_actor_suspected} bus topics plus
  their builder; wiki Service-Bus.md is updated separately.
* attribution_worker.py: subscribes to attacker.observation.>, ensures a
  stub identity per event, logs, and continues. No merger, no state writes,
  no derived events; Phase 4 wires those in.
* attribution/{aggregate.py,_thresholds.py} skeletons: Phase 2 fills in
  _aggregate_categorical; Phase 3 adds the numeric and hash mergers plus the
  dispatcher.
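The idempotency promised for upsert_attribution_state follows from the (identity_uuid, primitive) natural key. Below is a minimal sketch at the SQL level, with several assumptions. It uses a SQLite table whose columns mirror the AttributionState fields; the real AttributionStateRow is a SQLModel, and its types may differ. `current_value` is stored as TEXT here purely for illustration. `upsert_attribution_state_sketch` is a hypothetical stand-in, not the AttributionMixin method. SQLite's `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or later.

```python
import sqlite3
from typing import Any

# Hypothetical table: the natural key is (identity_uuid, primitive), so there is
# at most one state row per pair no matter how often the worker replays events.
SCHEMA = """
CREATE TABLE IF NOT EXISTS attribution_state (
    identity_uuid       TEXT NOT NULL,
    primitive           TEXT NOT NULL,
    current_value       TEXT,
    state               TEXT NOT NULL,
    confidence          REAL NOT NULL,
    observation_count   INTEGER NOT NULL,
    last_observation_ts REAL NOT NULL,
    PRIMARY KEY (identity_uuid, primitive)
)
"""

# Insert a new row, or overwrite the derived columns of the existing one.
UPSERT = """
INSERT INTO attribution_state (
    identity_uuid, primitive, current_value, state,
    confidence, observation_count, last_observation_ts
) VALUES (
    :identity_uuid, :primitive, :current_value, :state,
    :confidence, :observation_count, :last_observation_ts
)
ON CONFLICT (identity_uuid, primitive) DO UPDATE SET
    current_value       = excluded.current_value,
    state               = excluded.state,
    confidence          = excluded.confidence,
    observation_count   = excluded.observation_count,
    last_observation_ts = excluded.last_observation_ts
"""


def upsert_attribution_state_sketch(
    conn: sqlite3.Connection, row: dict[str, Any]
) -> None:
    """Write the single state row for ``(identity_uuid, primitive)``."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(UPSERT, row)
```

Running the same upsert twice leaves one row. A later upsert with a new state overwrites that row in place instead of adding a second one.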
2026-05-08 23:16:13 -04:00
parent e94ab608d9
commit c2891d6cca
15 changed files with 1203 additions and 0 deletions
--- a/decnet/correlation/attribution/__init__.py
+++ b/decnet/correlation/attribution/__init__.py
@@ -0,0 +1,21 @@
+"""DECNET attribution engine — v0 aggregation library.
+
+Pure library: per-(identity, primitive) state machine over BEHAVE-SHELL
+observations. No I/O, no bus, no DB. The bus subscriber and DB writes
+live in :mod:`decnet.correlation.attribution_worker` so this package
+stays trivially testable with synthetic observation lists.
+
+See ``development/ATTRIBUTION-ENGINE.md`` for the full design and the
+explicit bright line: this engine does NOT do persona classification
+(HUMAN/LLM/SCRIPTED), does NOT gate access, does NOT attribute to
+named persons. It surfaces *behavioural coherence* and *behavioural
+drift*, and stops there.
+"""
+from __future__ import annotations
+
+from decnet.correlation.attribution.aggregate import (
+    AttributionState,
+    aggregate_observations,
+)
+
+__all__ = ["AttributionState", "aggregate_observations"]
--- a/decnet/correlation/attribution/_thresholds.py
+++ b/decnet/correlation/attribution/_thresholds.py
@@ -0,0 +1,62 @@
+"""Calibration thresholds for the attribution engine — every magic
+number lives here, named, with the calibration source cited.
+
+v0 values are heuristic. Real calibration ships when red-team
+exercises produce labelled trace data
+(``ATTRIBUTION-ENGINE.md`` §"Out of scope"). Until then these constants
+are the engine's only knobs; aggregate.py never embeds a literal.
+"""
+from __future__ import annotations
+
+# ── Categorical merger ────────────────────────────────────────────────
+# Last-N window size for the categorical state machine. N=5 is sized
+# against typical session counts (most attackers are observed < 10
+# times before they go quiet — ATTRIBUTION-ENGINE.md §"Open question
+# 2"). Operators with long-running attackers will want a wider window
+# in v1.
+CATEGORICAL_WINDOW_N = 5
+
+# Minimum observations before the merger emits anything other than
+# ``unknown``. Below this floor the state machine has no signal.
+MIN_OBSERVATIONS_FOR_STATE = 3
+
+# Categorical merger is one-outlier-tolerant: in a window of N=5, the
+# state is ``stable`` if at least ``CATEGORICAL_MAJORITY_THRESHOLD`` agree.
+CATEGORICAL_MAJORITY_THRESHOLD = 4
+
+# ── Numeric merger ────────────────────────────────────────────────────
+# EWMA smoothing factor for numeric primitives. 0.3 weights recent
+# observations enough to surface drift quickly without flapping on
+# single outliers.
+NUMERIC_EWMA_ALPHA = 0.3
+
+# Fractions of |mean|: dispersion / |mean| (CV) or relative mean shift.
+NUMERIC_STABLE_DISPERSION_PCT = 0.20    # < 20% of mean → stable
+NUMERIC_DRIFT_MEAN_SHIFT_PCT = 0.30     # mean moved > 30% → drifting
+NUMERIC_CONFLICT_DISPERSION_PCT = 1.0   # > 100% of mean → conflicted
+
+# ── Hash merger ───────────────────────────────────────────────────────
+# Rotations within HASH_DRIFT_WINDOW_SECS count toward state transitions.
+# Below HASH_DRIFT_MAX → drifting; above → conflicted. The values mirror
+# the DEBT-032 fingerprint-rotation calibration, with HASH_DRIFT_MAX bumped
+# by one because the attribution engine takes one rotation as
+# evidence-of-life, not yet evidence-of-drift.
+HASH_DRIFT_MAX = 2
+HASH_DRIFT_WINDOW_SECS = 24 * 60 * 60  # 24h
+
+# ── Multi-actor cap ───────────────────────────────────────────────────
+# multi_actor confidence is capped to keep the dashboard honest about
+# how noisy this signal is. ATTRIBUTION-ENGINE.md §"Open question 1":
+# flapping primitives on flaky networks look like two operators.
+MULTI_ACTOR_MAX_CONFIDENCE = 0.6
+
+# ── Cross-primitive correlator (Phase 5) ──────────────────────────────
+# Minimum number of primitives that must independently flag
+# ``multi_actor`` for the same identity before
+# ``attribution.profile.multi_actor_suspected`` fires.
+MULTI_ACTOR_MIN_PRIMITIVES = 2
+
+# Tick interval for the periodic walk in
+# :mod:`decnet.correlation.attribution_worker`. Configurable via env
+# var in v1; hardcoded in v0.
+MULTI_ACTOR_TICK_SECS = 60.0
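The Phase 5 constants above could combine roughly as follows. This is a minimal sketch, and `StateRow` and `multi_actor_suspects` are hypothetical names, not the `list_multi_actor_identities` API. It assumes the correlator reads rows carrying identity, primitive, state, and confidence. It also assumes the confidence cap applies per primitive before the primitives are counted. The periodic `MULTI_ACTOR_TICK_SECS` walk and the bus emit are left out.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Values copied from _thresholds.py in this commit.
MULTI_ACTOR_MAX_CONFIDENCE = 0.6
MULTI_ACTOR_MIN_PRIMITIVES = 2


@dataclass(frozen=True)
class StateRow:
    """Hypothetical stand-in for the state-row fields the correlator reads."""

    identity_uuid: str
    primitive: str
    state: str
    confidence: float


def multi_actor_suspects(rows: list[StateRow]) -> dict[str, dict[str, float]]:
    """Map identity_uuid -> {primitive: capped confidence} for identities with
    at least MULTI_ACTOR_MIN_PRIMITIVES primitives in the multi_actor state."""
    flagged: dict[str, dict[str, float]] = defaultdict(dict)
    for row in rows:
        if row.state == "multi_actor":
            # Assumption: cap each per-primitive confidence, since flapping
            # primitives on flaky networks can mimic two operators.
            flagged[row.identity_uuid][row.primitive] = min(
                row.confidence, MULTI_ACTOR_MAX_CONFIDENCE
            )
    # Only identities flagged by enough independent primitives qualify.
    return {
        identity: primitives
        for identity, primitives in flagged.items()
        if len(primitives) >= MULTI_ACTOR_MIN_PRIMITIVES
    }
```

An identity with a single `multi_actor` primitive never qualifies, however confident that primitive is. This is the protection the two-primitive minimum is meant to provide.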
--- a/decnet/correlation/attribution/aggregate.py
+++ b/decnet/correlation/attribution/aggregate.py
@@ -0,0 +1,87 @@
+"""Per-(identity, primitive) state-machine — the attribution engine's
+core merge logic.
+
+Pure: given a list of BEHAVE observations for one
+``(identity_uuid, primitive)`` pair, returns the derived state and
+metadata that mirrors the state row. No DB, no bus, no I/O. The worker
+(``decnet.correlation.attribution_worker``) is responsible for loading
+the observations and writing the state row.
+
+State vocabulary is frozen at five values (see
+``ATTRIBUTION-ENGINE.md``):
+
+* ``unknown``      — < 3 observations (insufficient signal)
+* ``stable``       — recent N agree
+* ``drifting``     — recent N stable but disagree with older N
+* ``conflicted``   — recent N split
+* ``multi_actor``  — conflicted + cross-session alternation pattern
+
+Phase 2 will add :func:`_aggregate_categorical`. Phase 3 will add
+:func:`_aggregate_numeric`, :func:`_aggregate_hash`, and the
+ValueKind dispatcher.
+"""
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any, Iterable, Sequence
+
+__all__ = ["AttributionState", "aggregate_observations"]
+
+
+@dataclass(frozen=True)
+class AttributionState:
+    """Output of the merger for one ``(identity, primitive)`` pair.
+
+    The fields map 1:1 onto :class:`AttributionStateRow` columns —
+    callers compose the final dict for ``upsert_attribution_state``
+    by adding ``identity_uuid`` and ``primitive`` (the merger does not
+    own the natural key).
+    """
+
+    current_value: Any
+    state: str
+    confidence: float
+    observation_count: int
+    last_observation_ts: float
+
+
+def aggregate_observations(
+    observations: Sequence[dict[str, Any]],
+) -> AttributionState:
+    """Run the merger over *observations* and return the derived state.
+
+    *observations* is a list of dicts with at minimum ``value``,
+    ``ts``, and ``confidence`` fields (matching the BEHAVE
+    ``Observation`` envelope shape that
+    ``ObservationRow.observations_time_series`` returns). They MUST
+    arrive ordered by ``ts`` ascending; the merger assumes that.
+
+    Non-empty input raises :class:`NotImplementedError` until Phase 2
+    (categorical); Phase 3 dispatches on the BEHAVE ``ValueKind``.
+    """
+    if not observations:
+        return AttributionState(
+            current_value=None,
+            state="unknown",
+            confidence=0.0,
+            observation_count=0,
+            last_observation_ts=0.0,
+        )
+    # Phase 1 stub. Phase 2 replaces this raise with a call to
+    # ``_aggregate_categorical``; Phase 3 will take ``primitive``
+    # alongside the observations so it can pick the right merger by
+    # ValueKind.
+    raise NotImplementedError(
+        "aggregate_observations is implemented in Phase 2 (categorical) "
+        "and Phase 3 (numeric + hash). v0 Phase 1 ships the substrate "
+        "only; the worker logs without invoking the merger.",
+    )
+
+
+def _coerce_obs_iter(
+    observations: Iterable[dict[str, Any]],
+) -> list[dict[str, Any]]:
+    """Defensive: accept any iterable, return a list. Used by the
+    worker which pulls observations off the bus + DB into mixed
+    iterables."""
+    return list(observations)
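aggregate.py leaves the categorical merger to Phase 2. The sketch below shows one way the state vocabulary could map onto the `_thresholds.py` constants for categorical values. It is not the Phase 2 implementation, and several of its choices are assumptions, not taken from the source:

* Windows shorter than `CATEGORICAL_WINDOW_N` keep the same one-outlier tolerance.
* "Older N" means the N observations immediately before the recent window.
* Confidence is the share of the recent window that agrees with its majority value.
* Values are hashable.
* `multi_actor` is omitted because it needs session boundaries.

`aggregate_categorical_sketch` is a hypothetical name, and it returns a plain tuple rather than `AttributionState`.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Sequence

# Values copied from _thresholds.py in this commit.
CATEGORICAL_WINDOW_N = 5
MIN_OBSERVATIONS_FOR_STATE = 3
CATEGORICAL_MAJORITY_THRESHOLD = 4


def _window_majority(values: Sequence[Any]) -> tuple[bool, Any, int]:
    """Return (is_stable, majority_value, majority_count) for one window."""
    # Assumption: a window shorter than N keeps the same one-outlier
    # tolerance, i.e. at most N - threshold observations may disagree.
    tolerance = CATEGORICAL_WINDOW_N - CATEGORICAL_MAJORITY_THRESHOLD
    value, count = Counter(values).most_common(1)[0]
    return count >= len(values) - tolerance, value, count


def aggregate_categorical_sketch(
    observations: Sequence[dict[str, Any]],
) -> tuple[str, Any, float]:
    """Return (state, current_value, confidence); input is ts-ascending."""
    values = [obs["value"] for obs in observations]
    if len(values) < MIN_OBSERVATIONS_FOR_STATE:
        return "unknown", None, 0.0

    recent = values[-CATEGORICAL_WINDOW_N:]
    stable, value, count = _window_majority(recent)
    confidence = count / len(recent)
    if not stable:
        # A real merger would check for cross-session alternation here
        # before choosing between conflicted and multi_actor.
        return "conflicted", value, confidence

    # Drifting: the recent window is stable but disagrees with a stable
    # older window made of the N observations just before it.
    older = values[-2 * CATEGORICAL_WINDOW_N:-CATEGORICAL_WINDOW_N]
    if len(older) >= MIN_OBSERVATIONS_FOR_STATE:
        older_stable, older_value, _ = _window_majority(older)
        if older_stable and older_value != value:
            return "drifting", value, confidence
    return "stable", value, confidence
```

For example, five `"a"` observations followed by five `"b"` observations come out as `drifting` with current value `"b"`. A single outlier in a window of five still yields `stable`.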