feat(correlation/attribution): substrate + idle handler (Phase 1)

v0 Phase 1 of ATTRIBUTION-ENGINE.md:

* AttributionStateRow SQLModel keyed on (identity_uuid, primitive) per ANTI direction: re-keying state rows when the v1 clusterer merges attackers is the migration debt v0 should not bake in. ATTRIBUTION-ENGINE.md updated with the deviation note.
* AttributionMixin: ensure_stub_identity_for_attacker, idempotent upsert_attribution_state, get_attribution_state[_for_identity], list_multi_actor_identities (the Phase 5 correlator's read).
* attribution.profile.{state_changed,multi_actor_suspected} bus topics + builder; wiki Service-Bus.md updated separately.
* attribution_worker.py: subscribes to attacker.observation.>, ensures stub identity per event, logs and continues. No merger, no state writes, no derived events; Phase 4 wires those.
* attribution/{aggregate.py,_thresholds.py} skeletons: Phase 2 fills _aggregate_categorical, Phase 3 adds numeric+hash+dispatcher.
2026-05-08 23:16:13 -04:00
parent e94ab608d9
commit c2891d6cca
15 changed files with 1203 additions and 0 deletions
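The idempotent upsert on the (identity_uuid, primitive) natural key mentioned in the commit message can be sketched with the standard-library sqlite3 module. This is an illustration only: the table layout, the reduced column set, and the upsert_state helper are assumptions, not the AttributionStateRow model or the AttributionMixin code from this commit.

```python
import sqlite3

# Hypothetical, reduced schema: only the composite natural key matters here.
SCHEMA = """
CREATE TABLE attribution_state (
    identity_uuid     TEXT NOT NULL,
    primitive         TEXT NOT NULL,
    state             TEXT NOT NULL,
    observation_count INTEGER NOT NULL,
    PRIMARY KEY (identity_uuid, primitive)
)
"""

# Insert a new row, or overwrite the existing row for the same key.
UPSERT = """
INSERT INTO attribution_state (identity_uuid, primitive, state, observation_count)
VALUES (:identity_uuid, :primitive, :state, :observation_count)
ON CONFLICT (identity_uuid, primitive) DO UPDATE SET
    state = excluded.state,
    observation_count = excluded.observation_count
"""


def upsert_state(conn: sqlite3.Connection, row: dict) -> None:
    """Hypothetical helper: write one state row, idempotently."""
    conn.execute(UPSERT, row)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
row = {
    "identity_uuid": "id-1",
    "primitive": "shell",
    "state": "unknown",
    "observation_count": 1,
}
upsert_state(conn, row)
upsert_state(conn, row)  # replaying the same event must not add a second row
row_count = conn.execute("SELECT COUNT(*) FROM attribution_state").fetchone()[0]
```

Because the key is the identity rather than the attacker, a later merge of attackers under one identity would not need these rows to be re-keyed, which is the migration debt the commit message refers to.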
--- a/decnet/correlation/attribution/__init__.py
+++ b/decnet/correlation/attribution/__init__.py
@@ -0,0 +1,21 @@
+"""DECNET attribution engine — v0 aggregation library.
+
+Pure library: per-(identity, primitive) state machine over BEHAVE-SHELL
+observations. No I/O, no bus, no DB. The bus subscriber and DB writes
+live in :mod:`decnet.correlation.attribution_worker` so this package
+stays trivially testable with synthetic observation lists.
+
+See ``development/ATTRIBUTION-ENGINE.md`` for the full design and the
+explicit bright line: this engine does NOT do persona classification
+(HUMAN/LLM/SCRIPTED), does NOT gate access, does NOT attribute to
+named persons. It surfaces *behavioural coherence* and *behavioural
+drift*, and stops there.
+"""
+from __future__ import annotations
+
+from decnet.correlation.attribution.aggregate import (
+    AttributionState,
+    aggregate_observations,
+)
+
+__all__ = ["AttributionState", "aggregate_observations"]
--- a/decnet/correlation/attribution/_thresholds.py
+++ b/decnet/correlation/attribution/_thresholds.py
@@ -0,0 +1,62 @@
+"""Calibration thresholds for the attribution engine — every magic
+number lives here, named, with the calibration source cited.
+
+v0 values are heuristic. Real calibration ships when red-team
+exercises produce labelled trace data
+(``ATTRIBUTION-ENGINE.md`` §"Out of scope"). Until then these constants
+are the engine's only knobs; aggregate.py never embeds a literal.
+"""
+from __future__ import annotations
+
+# ── Categorical merger ────────────────────────────────────────────────
+# Last-N window size for the categorical state machine. 5 calibrates
+# against typical session counts (most attackers are observed < 10
+# times before they go quiet — ATTRIBUTION-ENGINE.md §"Open question
+# 2"). Operators with long-running attackers will want a wider window
+# in v1.
+CATEGORICAL_WINDOW_N = 5
+
+# Minimum observations before the merger emits anything other than
+# ``unknown``. Below this floor the state machine has no signal.
+MIN_OBSERVATIONS_FOR_STATE = 3
+
+# Categorical merger is one-outlier-tolerant: in a window of N=5, the state
+# is ``stable`` if at least ``CATEGORICAL_MAJORITY_THRESHOLD`` values agree.
+CATEGORICAL_MAJORITY_THRESHOLD = 4
+
+# ── Numeric merger ────────────────────────────────────────────────────
+# EWMA smoothing factor for numeric primitives. 0.3 weights recent
+# observations enough to surface drift quickly without flapping on
+# single outliers.
+NUMERIC_EWMA_ALPHA = 0.3
+
+# Thresholds as fractions of |mean|: dispersion (CoV) and relative mean shift.
+NUMERIC_STABLE_DISPERSION_PCT = 0.20    # < 20% of mean → stable
+NUMERIC_DRIFT_MEAN_SHIFT_PCT = 0.30     # mean moved > 30% → drifting
+NUMERIC_CONFLICT_DISPERSION_PCT = 1.0   # > 100% of mean → conflicted
+
+# ── Hash merger ───────────────────────────────────────────────────────
+# Rotations within HASH_DRIFT_WINDOW_SECS count toward state transitions.
+# Below HASH_DRIFT_MAX → drifting; above → conflicted. The values mirror
+# the DEBT-032 fingerprint-rotation calibration, bumped by one because
+# the attribution engine takes one rotation as evidence-of-life, not
+# yet evidence-of-drift.
+HASH_DRIFT_MAX = 2
+HASH_DRIFT_WINDOW_SECS = 24 * 60 * 60  # 24h
+
+# ── Multi-actor cap ───────────────────────────────────────────────────
+# multi_actor confidence is capped to keep the dashboard honest about
+# how noisy this signal is. ATTRIBUTION-ENGINE.md §"Open question 1":
+# flapping primitives on flaky networks look like two operators.
+MULTI_ACTOR_MAX_CONFIDENCE = 0.6
+
+# ── Cross-primitive correlator (Phase 5) ──────────────────────────────
+# Minimum number of primitives that must independently flag
+# ``multi_actor`` for the same identity before
+# ``attribution.profile.multi_actor_suspected`` fires.
+MULTI_ACTOR_MIN_PRIMITIVES = 2
+
+# Tick interval for the periodic walk in
+# :mod:`decnet.correlation.attribution_worker`. Configurable via env
+# var in v1; hardcoded in v0.
+MULTI_ACTOR_TICK_SECS = 60.0
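A rough sketch of how the categorical and EWMA constants above might be consumed. The mergers themselves land in Phase 2 and Phase 3, so `classify_categorical` and `ewma` are hypothetical helpers, and applying the one-outlier tolerance to windows shorter than N is an assumption rather than documented behaviour. The `drifting` and `multi_actor` states are left out on purpose.

```python
from collections import Counter
from typing import Hashable, Sequence

# Values copied from _thresholds.py.
CATEGORICAL_WINDOW_N = 5
MIN_OBSERVATIONS_FOR_STATE = 3
CATEGORICAL_MAJORITY_THRESHOLD = 4
NUMERIC_EWMA_ALPHA = 0.3


def classify_categorical(values: Sequence[Hashable]) -> str:
    """Hypothetical: unknown / stable / conflicted over the last-N window."""
    if len(values) < MIN_OBSERVATIONS_FOR_STATE:
        return "unknown"
    window = values[-CATEGORICAL_WINDOW_N:]
    # Assumption: the tolerance is "N minus threshold" outliers (one for
    # 5 and 4), applied even when the window is not yet full.
    allowed_outliers = CATEGORICAL_WINDOW_N - CATEGORICAL_MAJORITY_THRESHOLD
    top_count = Counter(window).most_common(1)[0][1]
    if len(window) - top_count <= allowed_outliers:
        return "stable"
    return "conflicted"


def ewma(values: Sequence[float], alpha: float = NUMERIC_EWMA_ALPHA) -> float:
    """Exponentially weighted moving average, seeded with the first value."""
    if not values:
        raise ValueError("ewma needs at least one value")
    mean = values[0]
    for value in values[1:]:
        mean = alpha * value + (1 - alpha) * mean
    return mean
```

With these values, four `bash` observations and one `zsh` classify as `stable`, while a window split three ways classifies as `conflicted`.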
--- a/decnet/correlation/attribution/aggregate.py
+++ b/decnet/correlation/attribution/aggregate.py
@@ -0,0 +1,87 @@
+"""Per-(identity, primitive) state-machine — the attribution engine's
+core merge logic.
+
+Pure: given a list of BEHAVE observations for one
+``(identity_uuid, primitive)`` pair, returns the derived state and
+mirror metadata. No DB, no bus, no I/O. The worker
+(``decnet.correlation.attribution_worker``) is responsible for loading
+the observations and writing the state row.
+
+State vocabulary is frozen at five values (see
+``ATTRIBUTION-ENGINE.md``):
+
+* ``unknown``      — < 3 observations (insufficient signal)
+* ``stable``       — recent N agree
+* ``drifting``     — recent N stable but disagree with older N
+* ``conflicted``   — recent N split
+* ``multi_actor``  — conflicted + cross-session alternation pattern
+
+Phase 2 ships :func:`_aggregate_categorical`. Phase 3 will add
+:func:`_aggregate_numeric` and :func:`_aggregate_hash` and the
+ValueKind dispatcher.
+"""
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any, Iterable, Sequence
+
+__all__ = ["AttributionState", "aggregate_observations"]
+
+
+@dataclass(frozen=True)
+class AttributionState:
+    """Output of the merger for one ``(identity, primitive)`` pair.
+
+    The fields map 1:1 onto :class:`AttributionStateRow` columns —
+    callers compose the final dict for ``upsert_attribution_state``
+    by adding ``identity_uuid`` and ``primitive`` (the merger does not
+    own the natural key).
+    """
+
+    current_value: Any
+    state: str
+    confidence: float
+    observation_count: int
+    last_observation_ts: float
+
+
+def aggregate_observations(
+    observations: Sequence[dict[str, Any]],
+) -> AttributionState:
+    """Run the merger over *observations* and return the derived state.
+
+    *observations* is a list of dicts with at minimum ``value``,
+    ``ts``, and ``confidence`` fields (matching the BEHAVE
+    ``Observation`` envelope shape that
+    ``ObservationRow.observations_time_series`` returns). They MUST
+    arrive ordered by ``ts`` ascending; the merger assumes that.
+
+    Phase 2 only supports categorical values. Phase 3 will dispatch
+    on the BEHAVE primitive's ``ValueKind`` and pick the right merger.
+    """
+    if not observations:
+        return AttributionState(
+            current_value=None,
+            state="unknown",
+            confidence=0.0,
+            observation_count=0,
+            last_observation_ts=0.0,
+        )
+    # Phase 1 stub: no merger exists yet. Phase 2 lands
+    # ``_aggregate_categorical``; Phase 3 will dispatch on the
+    # primitive's ``ValueKind`` (``primitive`` will need to be passed
+    # in alongside observations) to pick a merger.
+    raise NotImplementedError(
+        "aggregate_observations is implemented in Phase 2 (categorical) "
+        "and Phase 3 (numeric + hash). v0 Phase 1 ships the substrate "
+        "only; the worker logs without invoking the merger.",
+    )
+
+
+def _coerce_obs_iter(
+    observations: Iterable[dict[str, Any]],
+) -> list[dict[str, Any]]:
+    """Defensive: accept any iterable, return a list. Used by the
+    worker which pulls observations off the bus + DB into mixed
+    iterables."""
+    return list(observations)
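The `AttributionState` docstring says callers build the dict for `upsert_attribution_state` by adding `identity_uuid` and `primitive`. A minimal sketch of that composition follows; the dataclass is a local copy so the snippet runs on its own, and `to_upsert_row` is a hypothetical helper name, not something this commit defines.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class AttributionState:
    """Local copy of the merger output from aggregate.py."""

    current_value: Any
    state: str
    confidence: float
    observation_count: int
    last_observation_ts: float


def to_upsert_row(
    state: AttributionState, identity_uuid: str, primitive: str,
) -> dict:
    """Hypothetical: merge the natural key with the merger's fields."""
    return {
        "identity_uuid": identity_uuid,
        "primitive": primitive,
        **asdict(state),
    }


example_row = to_upsert_row(
    AttributionState(
        current_value="bash",
        state="stable",
        confidence=0.8,
        observation_count=5,
        last_observation_ts=1_700_000_000.0,
    ),
    identity_uuid="id-1",
    primitive="shell",
)
```

Keeping the natural key out of the dataclass means the merger can be tested with synthetic observation lists and no identity at all.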
--- a/decnet/correlation/attribution_worker.py
+++ b/decnet/correlation/attribution_worker.py
@@ -0,0 +1,178 @@
+"""Attribution-engine bus subscriber — v0 Phase 1 skeleton.
+
+Subscribes to ``attacker.observation.>`` and, for each event, ensures
+the source attacker has a stub identity in ``attacker_identities``.
+Phase 1 does **not** invoke the merger or write
+``attribution_state`` rows; that wiring lands in Phase 4 once the
+Phase 2/3 mergers are in.
+
+Pattern mirrors :mod:`decnet.correlation.reuse_worker`: bus-subscribe
+with a wake event, fall back to poll-only if the bus is unavailable,
+publish derived events with :func:`publish_safely`, log per-handler
+exceptions and continue.
+
+Trigger isolation: the per-event handler is wrapped in a single
+try/except. Any exception is logged and the loop continues with the
+next event. This is the same posture BEHAVE-SHELL's
+``_handler.handle_session_ended`` adopts.
+"""
+from __future__ import annotations
+
+import asyncio
+import contextlib
+from typing import Any
+
+from decnet.bus import topics as _topics
+from decnet.bus.base import BaseBus
+from decnet.bus.factory import get_bus
+from decnet.bus.publish import (
+    run_control_listener_signal as _run_control_listener_signal,
+    run_health_heartbeat as _run_health_heartbeat,
+)
+from decnet.logging import get_logger
+from decnet.web.db.repository import BaseRepository
+
+log = get_logger("correlation.attribution_worker")
+
+_WORKER_NAME = "attribution"
+_OBSERVATION_PATTERN = f"{_topics.ATTACKER}.{_topics.ATTACKER_OBSERVATION_PREFIX}.>"
+
+
+async def run_attribution_loop(
+    repo: BaseRepository,
+    *,
+    shutdown: asyncio.Event | None = None,
+) -> None:
+    """Run the attribution worker until cancelled.
+
+    *shutdown* is an optional external stop signal; the loop also
+    exits cleanly on ``CancelledError`` and ``KeyboardInterrupt``.
+    """
+    log.info("attribution worker started pattern=%s", _OBSERVATION_PATTERN)
+
+    bus: BaseBus | None = None
+    sub_task: asyncio.Task | None = None
+    heartbeat_task: asyncio.Task | None = None
+    control_task: asyncio.Task | None = None
+    try:
+        candidate = get_bus(client_name=f"{_WORKER_NAME}-correlator")
+        await candidate.connect()
+        bus = candidate
+        sub_task = asyncio.create_task(
+            _consume_observations(bus, repo),
+        )
+        heartbeat_task = asyncio.create_task(
+            _run_health_heartbeat(bus, _WORKER_NAME),
+        )
+        control_task = asyncio.create_task(
+            _run_control_listener_signal(bus, _WORKER_NAME),
+        )
+    except Exception as exc:  # noqa: BLE001
+        log.warning(
+            "attribution worker: bus unavailable, idling until shutdown: %s",
+            exc,
+        )
+
+    if shutdown is None:
+        shutdown = asyncio.Event()
+
+    try:
+        await shutdown.wait()
+    except (asyncio.CancelledError, KeyboardInterrupt):
+        log.info("attribution worker stopped")
+    finally:
+        for task in (sub_task, heartbeat_task, control_task):
+            if task is None:
+                continue
+            task.cancel()
+            with contextlib.suppress(asyncio.CancelledError, Exception):
+                await task
+        if bus is not None:
+            with contextlib.suppress(Exception):
+                await bus.close()
+
+
+async def _consume_observations(
+    bus: BaseBus, repo: BaseRepository,
+) -> None:
+    """Pull events off ``attacker.observation.>`` and dispatch each
+    to :func:`handle_observation_event`.
+
+    Per-event exceptions are caught and logged; the subscription
+    survives bad payloads. If the subscription itself dies (bus
+    disconnect), the worker idles — the supervisor systemd unit
+    will restart on a clean exit.
+    """
+    try:
+        sub = bus.subscribe(_OBSERVATION_PATTERN)
+        async with sub:
+            async for event in sub:
+                try:
+                    await handle_observation_event(bus, repo, event)
+                except Exception:  # noqa: BLE001
+                    log.exception("attribution worker: handler failed")
+    except asyncio.CancelledError:
+        raise
+    except Exception as exc:  # noqa: BLE001
+        log.warning(
+            "attribution worker: subscriber for %s died (%s)",
+            _OBSERVATION_PATTERN, exc,
+        )
+
+
+async def handle_observation_event(
+    bus: BaseBus | None,
+    repo: BaseRepository,
+    event: Any,
+) -> None:
+    """Handle one ``attacker.observation.<primitive>`` event.
+
+    Phase 1: ensure the source attacker has a stub identity, then log
+    and return. Phase 4 will: load prior state, run merger, upsert
+    new state, emit ``attribution.profile.state_changed`` on
+    transition.
+
+    *event* is whatever shape :class:`BaseBus`'s subscription yields —
+    a ``BusEvent`` with ``payload`` (dict) and ``event_type`` (str)
+    fields. The payload carries the BEHAVE envelope plus DECNET-side
+    ``attacker_uuid`` denorm (see
+    ``decnet.profiler.behave_shell._handler._publish_observation``).
+    """
+    payload = _payload_of(event)
+    attacker_uuid = payload.get("attacker_uuid")
+    primitive = payload.get("primitive")
+    if not attacker_uuid or not primitive:
+        log.debug(
+            "attribution worker: skipping malformed event (uuid=%r primitive=%r)",
+            attacker_uuid, primitive,
+        )
+        return
+    identity_uuid = await repo.ensure_stub_identity_for_attacker(
+        str(attacker_uuid),
+    )
+    if identity_uuid is None:
+        log.info(
+            "attribution worker: no Attacker row for uuid=%s yet; deferring",
+            attacker_uuid,
+        )
+        return
+    # Phase 4 will run the merger here and emit
+    # ``attribution.profile.state_changed`` on transition. Phase 1
+    # ends with stub materialisation only.
+    log.debug(
+        "attribution worker: stub identity=%s for attacker=%s primitive=%s",
+        identity_uuid, attacker_uuid, primitive,
+    )
+
+
+def _payload_of(event: Any) -> dict[str, Any]:
+    """Extract the dict payload from a BusEvent or fall through if
+    *event* is already a dict (test fixtures may pass either)."""
+    payload = getattr(event, "payload", event)
+    return payload if isinstance(payload, dict) else {}
+
+
+__all__ = [
+    "run_attribution_loop",
+    "handle_observation_event",
+]
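The trigger-isolation posture used in `_consume_observations` (one try/except around each event, log, continue) can be exercised without a bus. The sketch below is an assumption-laden stand-in: `consume`, the async generator, and the handler are hypothetical and only mimic the shape of the worker loop.

```python
import asyncio
import logging
from typing import Any, AsyncIterator, Awaitable, Callable

log = logging.getLogger("attribution_sketch")


async def consume(
    events: AsyncIterator[Any],
    handler: Callable[[Any], Awaitable[None]],
) -> int:
    """Dispatch every event; a failing handler never ends the loop."""
    handled = 0
    async for event in events:
        try:
            await handler(event)
            handled += 1
        except Exception:  # isolate the failure to this one event
            log.exception("handler failed; continuing")
    return handled


async def _demo() -> tuple:
    seen = []

    async def events():
        # The middle payload is malformed on purpose.
        for payload in ({"attacker_uuid": "a-1"}, None, {"attacker_uuid": "a-2"}):
            yield payload

    async def handler(payload):
        seen.append(payload["attacker_uuid"])  # raises TypeError on None

    handled = await consume(events(), handler)
    return handled, seen


handled, seen = asyncio.run(_demo())
```

The malformed event is logged and skipped, and the events on either side of it are still processed.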