feat(web/db): observations table + repo + bus prefix (BEHAVE-INTEGRATION Phase 1)

Additive Phase 1 of BEHAVE-INTEGRATION.md. Lays the storage layer
the BEHAVE-SHELL extractor (DEBT-050) will write into. Nothing
breaks; SessionProfile coexists for now and is dropped in the
follow-up commit.

decnet/web/db/models/observations.py — new ObservationRow SQLModel
mirroring the BEHAVE Observation envelope field-for-field
(core/decnet_behave_core/spec/envelope.py). ``id`` is a hex-string
UUID (matching BEHAVE), not a typed UUID column. ``identity_ref``
is str | None — written by the future attribution engine, NULL
until then. ``attacker_uuid`` is the one DECNET-side
denormalisation; FK'd to attackers.uuid for cheap AttackerDetail
joins. ``evidence_ref`` is NOT NULL for DECNET emissions even
though the upstream envelope makes it optional — the worker's
"already profiled?" check keys on it. UniqueConstraint(evidence_ref,
primitive) enforces idempotency at the schema level so re-running
the extractor on the same shard+sid produces a DB-side conflict
the upsert path resolves deterministically. Class is named
``ObservationRow`` (not ``Observation``) to avoid colliding with
the BEHAVE Pydantic envelope at sites that import both.

decnet/web/db/sqlmodel_repo/observations.py — ObservationsMixin.
Three public methods backing the canonical queries from
BEHAVE-INTEGRATION.md §"Storage": ``upsert_observation`` (idempotent
on the natural key), ``latest_observation_per_primitive`` (per-
primitive MAX(ts) subquery, portable across SQLite and MySQL — no
DISTINCT ON), ``observations_time_series`` (asc-by-ts). Plus
``has_observations_for_evidence`` for the worker's session-already-
profiled check.

decnet/bus/topics.py — ATTACKER_OBSERVATION_PREFIX = "observation"
constant + ``attacker_observation(primitive)`` builder. Full topic
shape ``attacker.observation.<primitive>`` matches what BEHAVE's
spec.event_adapter.event_topic_for produces upstream. Documentation
+ pattern matching only — bus auth is socket file perms (DEBT-029
§2), not topic-level.

decnet/web/db/repository.py — abstract ``upsert_observation``,
``latest_observation_per_primitive``, ``observations_time_series``
on BaseRepository.

tests/db/test_observations.py — 11 tests covering upsert round-trip,
idempotency under the unique constraint, latest-per-primitive
ordering across multiple sessions, time-series asc-ordering, empty-
attacker contract, every BEHAVE ValueKind round-tripping through
the JSON column, and the has_observations_for_evidence check.

tests/db/test_base_repo.py — DummyRepo gains the three new abstract
overrides so its coverage suite still instantiates.
This commit is contained in:
2026-05-03 07:25:10 -04:00
parent 11f474556c
commit 0972325527
8 changed files with 683 additions and 0 deletions

View File

@@ -17,6 +17,7 @@ Token structure (NATS-style, dot-separated):
attacker.scored
attacker.session.started
attacker.session.ended
attacker.observation.{primitive}
identity.formed
identity.observation.linked
identity.merged
@@ -129,6 +130,19 @@ ATTACKER_SESSION_ENDED = "session.ended"
# returned a verdict). Payload carries the aggregate verdict + per-
# provider summary so SIEM-bound webhooks don't need to re-query the DB.
ATTACKER_INTEL_ENRICHED = "intel.enriched"
# Per-primitive BEHAVE-SHELL observation. Full topic shape:
# attacker.observation.<primitive>
# e.g. ``attacker.observation.motor.input_modality``. Producer:
# ``decnet/profiler/behave_shell/`` (extractor library called from the
# profiler worker on ``attacker.session.ended``); consumers: dashboard
# SSE relay, attribution engine state machine, federation gossip
# (post-v0). See development/BEHAVE-INTEGRATION.md §"Bus topics" for
# the wire-format contract — the prefix is documentation + pattern
# match only; bus auth is socket file perms (DEBT-029 §2), not
# topic-level. The ``primitive`` segment MAY contain dots
# (``motor.shell_mastery.tab_completion``) — the same dotted-leaf
# rule that ``attacker.session.ended`` uses.
ATTACKER_OBSERVATION_PREFIX = "observation"
# Identity-resolution event types (second/third tokens under ``identity``).
# Published by the (future) clusterer worker — see
@@ -366,6 +380,28 @@ def attacker(event_type: str) -> str:
return f"{ATTACKER}.{event_type}"
def attacker_observation(primitive: str) -> str:
"""Build ``attacker.observation.<primitive>``.
*primitive* is the fully-qualified BEHAVE-SHELL primitive path
(e.g. ``motor.input_modality``,
``cognitive.feedback_loop_engagement``,
``motor.shell_mastery.tab_completion``). Dotted primitives are
permitted — this matches the format
``decnet_behave_shell.spec.event_adapter.event_topic_for`` produces
upstream, and DECNET's bus admits the dotted leaf the same way
:func:`attacker` does for ``session.started``.
Empty string is rejected so a downstream typo doesn't ship as
``attacker.observation.``.
"""
if not primitive:
raise ValueError(
"attacker_observation topic requires a non-empty primitive",
)
return f"{ATTACKER}.{ATTACKER_OBSERVATION_PREFIX}.{primitive}"
def campaign(event_type: str) -> str:
"""Build ``campaign.<event_type>``.

View File

@@ -57,6 +57,9 @@ from .attacker_intel import (
from .attachments import (
ObservedAttachment,
)
from .observations import (
ObservationRow,
)
from .campaigns import (
Campaign,
CampaignsResponse,
@@ -250,6 +253,7 @@ __all__ = [
"AttackerIdentity",
"AttackerIntel",
"AttackersResponse",
"ObservationRow",
"ObservedAttachment",
"SessionProfile",
"SmtpTarget",

View File

@@ -0,0 +1,80 @@
"""BEHAVE-SHELL observation rows — generic table holding every
emitted Observation envelope.
Mirrors the BEHAVE-SHELL ``Observation`` Pydantic envelope
(``decnet_behave_core.spec.envelope.Observation``) field-for-field, plus
one DECNET-side denormalisation (``attacker_uuid``) for cheap joins.
The class is named ``ObservationRow`` to avoid colliding with the
BEHAVE Pydantic class when both are imported into the same module —
the Pydantic envelope is the wire format; this is the storage row.
See ``development/BEHAVE-INTEGRATION.md`` §"Storage" for the full
rationale.
Idempotency is enforced at the schema level by the
``UniqueConstraint(evidence_ref, primitive)`` index — re-running the
extractor on the same shard+sid produces a DB-side conflict that the
repo's upsert path resolves deterministically. ``evidence_ref`` is
NOT NULL for DECNET-emitted observations even though the BEHAVE
envelope makes it ``Optional[str]``: the worker's "have we already
profiled this session?" check keys on it, and the shape
``shard:{decky}/{service}/{date}.jsonl#sid`` is mandatory at the
worker layer.
"""
from __future__ import annotations
from typing import Any
from sqlalchemy import JSON, Column, Index, UniqueConstraint
from sqlmodel import Field, SQLModel
class ObservationRow(SQLModel, table=True):
"""One BEHAVE-SHELL observation persisted to ``observations``.
Re-derivable from the upstream session shard; this row is a cache
for cheap dashboard reads, not the source of truth (which is the
asciinema shard on disk + the BEHAVE-SHELL extractor).
Type alignment with BEHAVE: ``id`` is a hex-string UUID (matching
BEHAVE's ``Observation.id: str = Field(default_factory=lambda:
uuid.uuid4().hex)``), not a typed UUID column. ``identity_ref``
is ``str | None``, ditto.
"""
__tablename__ = "observations"
__table_args__ = (
Index(
"ix_observations_attacker_primitive_ts",
"attacker_uuid", "primitive", "ts",
),
Index("ix_observations_primitive_ts", "primitive", "ts"),
UniqueConstraint(
"evidence_ref", "primitive",
name="uq_observations_evidence_primitive",
),
)
# ── envelope fields (types match BEHAVE exactly) ─────────────────────
id: str = Field(primary_key=True)
identity_ref: str | None = Field(default=None)
primitive: str = Field(index=True)
value: dict[str, Any] | str | int | float | bool | list = Field(
sa_column=Column(JSON, nullable=False),
)
confidence: float
window_start_ts: float
window_end_ts: float
source: str
evidence_ref: str = Field(nullable=False)
envelope_v: int
ts: float = Field(index=True)
# ── DECNET-side denormalisation (NOT in BEHAVE envelope) ─────────────
# The envelope identifies the attacker via ``identity_ref`` once
# attribution exists; pre-attribution, observations carry no
# attacker linkage. DECNET resolves the (decky, service, sid, src_ip)
# tuple to ``attacker_uuid`` at write time so AttackerDetail can
# query without joining through the (still-empty)
# ``attacker_identities`` table.
attacker_uuid: str = Field(foreign_key="attackers.uuid", index=True)

View File

@@ -313,6 +313,44 @@ class BaseRepository(ABC):
"""Retrieve the keystroke-dynamics profile row for a session."""
pass
# ─── BEHAVE-SHELL observations ─────────────────────────────────────
# See development/BEHAVE-INTEGRATION.md §"Storage" for the full
# schema rationale. Every observation envelope emitted by the
# BEHAVE-SHELL extractor lands in the ``observations`` table; this
# is the abstract surface the worker calls.
@abstractmethod
async def upsert_observation(self, data: dict[str, Any]) -> str:
"""Insert or update an ``ObservationRow`` keyed on
``(evidence_ref, primitive)``.
``data`` MUST carry the BEHAVE envelope fields (``primitive``,
``value``, ``confidence``, ``window_start_ts``,
``window_end_ts``, ``source``, ``evidence_ref``,
``envelope_v``, ``ts``) plus the DECNET-side ``attacker_uuid``
denorm. Returns the row ``id``.
"""
raise NotImplementedError
@abstractmethod
async def latest_observation_per_primitive(
self, attacker_uuid: str,
) -> dict[str, dict[str, Any]]:
"""Return the latest observation per primitive for one attacker.
Empty dict when the attacker has no observations. Backs the
AttackerDetail "current state" panel.
"""
raise NotImplementedError
@abstractmethod
async def observations_time_series(
self, attacker_uuid: str, primitive: str,
) -> list[dict[str, Any]]:
"""Every observation of ``primitive`` for ``attacker_uuid``,
ordered by ``ts`` ASC. Empty list when none."""
raise NotImplementedError
async def upsert_observed_attachment(
self,
*,

View File

@@ -44,6 +44,7 @@ from decnet.web.db.sqlmodel_repo.deckies import DeckiesMixin
from decnet.web.db.sqlmodel_repo.fleet import FleetMixin
from decnet.web.db.sqlmodel_repo.identities import IdentitiesMixin
from decnet.web.db.sqlmodel_repo.logs import LogsMixin
from decnet.web.db.sqlmodel_repo.observations import ObservationsMixin
from decnet.web.db.sqlmodel_repo.observed_attachments import ObservedAttachmentsMixin
from decnet.web.db.sqlmodel_repo.orchestrator import OrchestratorMixin
from decnet.web.db.sqlmodel_repo.realism import RealismMixin
@@ -66,6 +67,7 @@ class SQLModelRepository(
FleetMixin,
IdentitiesMixin,
LogsMixin,
ObservationsMixin,
ObservedAttachmentsMixin,
OrchestratorMixin,
RealismMixin,

View File

@@ -0,0 +1,187 @@
"""Repo mixin for the ``observations`` table.
Composed onto :class:`SQLModelRepository`. Three public methods:
* :meth:`upsert_observation` — idempotent on
``(evidence_ref, primitive)``. Caller passes the BEHAVE envelope as
a dict plus the DECNET-side ``attacker_uuid`` denorm.
* :meth:`latest_observation_per_primitive` — backs the AttackerDetail
"current state" panel. Implements the canonical query from
``BEHAVE-INTEGRATION.md`` §"Storage".
* :meth:`observations_time_series` — every observation of one
primitive for one attacker, ordered ASC by ``ts``. Backs future
drift charts.
PII discipline is the BEHAVE envelope's job
(``core/decnet_behave_core/spec/envelope.py:3-19``); this mixin does
not validate values — that happens at construction time by the BEHAVE
``Observation`` subclass before the dict reaches us.
"""
from __future__ import annotations
import uuid as _uuid
from typing import Any, Optional
from sqlalchemy import desc, func, select
from sqlmodel import col
from decnet.web.db.models import ObservationRow
from decnet.web.db.sqlmodel_repo._helpers import _MixinBase
class ObservationsMixin(_MixinBase):
"""Mixin: methods composed onto :class:`SQLModelRepository`."""
async def upsert_observation(self, data: dict[str, Any]) -> str:
"""Insert or update an observation row keyed on
``(evidence_ref, primitive)``.
``data`` MUST carry every non-default ``ObservationRow`` field:
``primitive``, ``value``, ``confidence``, ``window_start_ts``,
``window_end_ts``, ``source``, ``evidence_ref``, ``envelope_v``,
``ts``, ``attacker_uuid``. ``id`` is generated if absent.
``identity_ref`` is optional.
Returns the row's ``id``. Idempotent: a second call with the
same ``(evidence_ref, primitive)`` overwrites the prior row's
mutable fields (value, confidence, ts, etc.) without
violating the unique constraint.
"""
evidence_ref = data["evidence_ref"]
primitive = data["primitive"]
async with self._session() as session:
stmt = select(ObservationRow).where(
ObservationRow.evidence_ref == evidence_ref,
ObservationRow.primitive == primitive,
)
existing = (await session.execute(stmt)).scalar_one_or_none()
if existing:
# Mutable fields from the new envelope; ``id`` and the
# natural key stay locked to the existing row.
for k, v in data.items():
if k in ("id", "evidence_ref", "primitive"):
continue
setattr(existing, k, v)
session.add(existing)
row_id = existing.id
else:
row_id = data.get("id") or _uuid.uuid4().hex
row_data = {**data, "id": row_id}
session.add(ObservationRow(**row_data))
await session.commit()
return row_id
async def latest_observation_per_primitive(
self, attacker_uuid: str,
) -> dict[str, dict[str, Any]]:
"""Return the most recent observation per primitive for one
attacker.
Output shape::
{
"motor.input_modality": {"value": "pasted",
"confidence": 0.91,
"ts": 1714521660.456,
"source": "..."},
"cognitive.feedback_loop_engagement": {...},
...
}
Empty dict when the attacker has zero observations.
Implementation uses a per-primitive MAX(ts) subquery; portable
across SQLite + MySQL (no ``DISTINCT ON``).
"""
async with self._session() as session:
# Subquery: per-primitive max(ts) for this attacker.
max_ts_subq = (
select(
col(ObservationRow.primitive).label("primitive"),
func.max(col(ObservationRow.ts)).label("max_ts"),
)
.where(ObservationRow.attacker_uuid == attacker_uuid)
.group_by(col(ObservationRow.primitive))
.subquery()
)
stmt = (
select(ObservationRow)
.join(
max_ts_subq,
(ObservationRow.primitive == max_ts_subq.c.primitive)
& (ObservationRow.ts == max_ts_subq.c.max_ts),
)
.where(ObservationRow.attacker_uuid == attacker_uuid)
.order_by(ObservationRow.primitive)
)
rows = (await session.execute(stmt)).scalars().all()
return {
row.primitive: {
"value": row.value,
"confidence": row.confidence,
"ts": row.ts,
"source": row.source,
}
for row in rows
}
async def observations_time_series(
self, attacker_uuid: str, primitive: str,
) -> list[dict[str, Any]]:
"""Return every observation of ``primitive`` for ``attacker_uuid``,
ordered by ``ts`` ASC.
Empty list when no rows match.
"""
async with self._session() as session:
stmt = (
select(ObservationRow)
.where(
ObservationRow.attacker_uuid == attacker_uuid,
ObservationRow.primitive == primitive,
)
.order_by(ObservationRow.ts)
)
rows = (await session.execute(stmt)).scalars().all()
return [
{
"ts": row.ts,
"value": row.value,
"confidence": row.confidence,
}
for row in rows
]
async def get_observation_by_id(
self, row_id: str,
) -> Optional[dict[str, Any]]:
"""Single ``ObservationRow`` lookup by ``id``. Used by tests
and a future "fetch one event" surface; not exposed on the
public API today."""
async with self._session() as session:
stmt = select(ObservationRow).where(ObservationRow.id == row_id)
row = (await session.execute(stmt)).scalar_one_or_none()
if not row:
return None
return row.model_dump(mode="json")
async def has_observations_for_evidence(
self, evidence_ref: str,
) -> bool:
"""True iff any observation row carries this ``evidence_ref``.
Worker uses this as the "have we already profiled this session?"
check before kicking the extractor — equivalent to "is this
``(decky, service, sid)`` already in the table?"
"""
async with self._session() as session:
stmt = (
select(col(ObservationRow.id))
.where(ObservationRow.evidence_ref == evidence_ref)
.limit(1)
)
return (await session.execute(stmt)).scalar_one_or_none() is not None
# Order desc(ts) reserved as the most-recent-first listing if a
# paginated UI surface lands later. Not exposed today; named here
# so a future grep finds the canonical desc-ts pattern.
_LATEST_FIRST = staticmethod(desc)