Replaces LICENSE (GPLv3 -> AGPLv3) and prepends
`SPDX-License-Identifier: AGPL-3.0-or-later` to every source file
across decnet/, decnet_web/, tests/, scripts/, and tools/.
Rationale: closes the GPLv3 ASP loophole so any party operating a
modified DECNET as a network service must offer their modified
source. Personal copyright (Samuel Paschuan) + inbound=outbound
contributions make a future unilateral relicense infeasible.
- LICENSE: full AGPL-3.0 text (gnu.org/licenses/agpl-3.0.txt)
- COPYRIGHT: project copyright notice
- tools/add_spdx_headers.py: idempotent header injector
(shebang- and PEP 263-aware)
Touches 1565 source files (.py, .ts, .tsx, .js, .jsx, .css, .sh).
No behavior change; comments only.
- Add "ipv6_leak" to KNOWN_SOURCE_KINDS in ttp/base.py
- Register Ipv6LeakLifter(store) in factory.py get_tagger()
- Subscribe worker to attacker.fingerprinted; route by Event.type
so JARM/HASSH/ipv6_leak share the topic without source_kind collision
- Add bump_attacker_ipv6_leak() to BaseRepository (abstract) +
TTPMixin (implementation): increments ipv6_leak_count, sets last_ipv6_*
denorm fields, appends-with-dedup to AttackerIdentity.ipv6_link_local_iids
- Call bump_attacker_ipv6_leak from _process_event after insert_tags
- Add DummyRepo stub + coverage call in tests/db/test_base_repo.py
Ipv6LeakLifter subscribes to source_kind="ipv6_leak" events from both
the passive sniffer and active prober. Emits T1090 (Proxy) under TA0011
(C2) when fe80:: source address is observed — the attacker's VPN only
tunnels IPv4 so their link-local IID leaks their NIC identity.
Rule R0059 sets base confidence 0.85; iid_kind in the evidence carries
the per-observation strength (eui64 = MAC-derived, deterministic;
stable_privacy = RFC 7217; temporary = RFC 4941).
- test_evidence_shape.py: replace broken (command, BehavioralLifter)
pairing with correct (http_fingerprint, HttpFingerprintLifter) case;
expand _LIFTER_CASES to 5-tuples with per-lifter payloads and rule
factories; wire StubRuleStore + _index.install() per lifter; remove
xfail marker — all 4 parametrized cases now pass
- factory.py: add _span() helper gated on _telemetry._ENABLED; wrap
each per-lifter dispatch in _tag_one() that opens a
ttp.lifter.{name} child span per call
- http_fingerprint_lifter.py: add missing name = "http_fingerprint"
- test_tracing.py: replace pytest.fail() stubs in
test_lifter_child_spans_emitted and test_no_pii_canary_in_span_attributes
with real test bodies; remove xfail markers
On every attacker.session.ended event, the TTP worker now reads the
persisted AttackerIntel row (if any) and synthesizes an intel-source
TaggerEvent so intel-derived tags emit even when attacker.intel.enriched
was dropped or arrived before the worker started.
Key changes:
- AttackerIntel.to_intel_event_payload() — single source of truth for
the intel-row → lifter payload projection; shared by future callers
without importing decnet.intel.* (no-SPOF contract preserved).
- BaseRepository.get_attacker_intel_row_by_uuid() — returns the live
SQLModel instance so the catch-up path can call to_intel_event_payload().
- _build_intel_catchup_event() in ttp/worker.py — looks up the intel row,
builds the TaggerEvent, returns None on absent row (silence, not error).
- _process_event() extended: appends the catch-up event to tagger_events
when topic contains "session.ended". Deterministic source_id keeps
compute_tag_uuid idempotent across replays; INSERT OR IGNORE deduplicates
against any prior attacker.intel.enriched path.
DummyRepo stub + coverage call added per feedback_run_base_repo_test.md.
- TolerantTagger.tag validates evidence keys against EVIDENCE_SCHEMA TypedDicts;
TypeError (programmer error) propagates instead of being swallowed
- IntelEvidence and EmailEvidence expanded from stubs to full per-provider
key sets (total=False); IntelEvidence old stub fields replaced wholesale
- EVIDENCE_SCHEMA map added to models/ttp.py and imported by base.py
- TTPTag __table_args__ gains confidence [0,1] CheckConstraint (DB-enforced)
- xfail removed from test_confidence_outside_range_rejected_at_insert and
test_evidence_shape_violation_propagates_as_typeerror — both now pass
- TypeError removed from _SWALLOWED_EXCS fuzz list; test_intel_evidence_keys
updated to assert the real provider key set
Four-part fix for the collection bottleneck that was blocking the dev loop:
1. Lazy mitreattack.stix20 import in attack_stix.py — deferred to first
_load() call (TYPE_CHECKING guard at top level)
2. Lazy misp_stix_converter import in both MISP export routers — moved
from module level into the route handler body
3. Lazy attack_catalog / attack_stix in ttp.py repo mixin — thin wrapper
functions so the import chain never fires at module load time
4. tests/api/conftest.py — `from decnet.web.api import app` moved inside
the `client()` fixture; `pytest_ignore_collect` broadened to skip all
test_schemathesis*.py variants (not just test_schemathesis.py), which
were launching a subprocess server at module-import time
5. pyproject.toml — `norecursedirs` for tests/live, tests/stress,
tests/service_testing, tests/docker, tests/perf so these directories
are never entered; `-m` filter removed from addopts (now redundant);
`--dist loadscope` → `--dist load` to unblock workers immediately
6. behave_core / behave_shell rename — BEHAVE packages dropped the
`decnet_` prefix; reinstalled editable installs and updated all 14
import sites across profiler, ttp, bus, and correlation modules
Remaining files from the fingerprint-bounties + characterizes-SRO commit:
misp_export, repository, bounties mixin, all 4 router endpoints, and test suite
updates. Prerequisite: previous commit added _extract_fingerprint_bounty_data
and the stix_export changes.
Wire fingerprint bounties (JARM hashes, HTTP header quirks) from the bounties
table into the DecnetActorFingerprintExt.protocol_fingerprints group so the
sniffer/profiler-captured HTTP fingerprinting data surfaces in every STIX export.
Add a stix2.Relationship(relationship_type="characterizes") SRO linking each
x-decnet-behave-profile SDO back to its ThreatActor so graph-traversal tools
can follow the edge without relying on the bare x_decnet_behave_profile_ref
custom string property alone.
New repo surface:
- get_fingerprint_bounties_by_ip(ip) -> list[dict]
- get_all_fingerprint_bounties_for_export() -> dict[str, list[dict]]
All 4 export endpoints (per-attacker + fleet, STIX + MISP) extended with the
new gather slot. 50/50 tests green, mypy clean.
Adds GET /api/v1/attackers/{uuid}/export/misp and
GET /api/v1/attackers/export/misp backed by misp_export.py, which
converts existing STIX bundles to MISP events via misp-stix
ExternalSTIX2toMISPParser. Fleet endpoint emits {response:[...]}
collection (one event per attacker). Frontend: STIX/MISP buttons on
AttackerDetail header and Attackers list. 13 new tests green.
GET /api/v1/attackers/{uuid}/export/stix returns a self-contained STIX
2.1 bundle: ip observation, threat-actor, ATT&CK attack-patterns with
canonical MITRE IDs, uses relationships, per-tag sightings, file SCOs
for artifacts, domain-name SCOs for SMTP targets, and a provider intel
note. Attack-pattern SDOs carry the MITRE bundle IDs so consumers
deduplicating against the public ATT&CK bundle get exact matches.
Phase 2 attached mitre_url to intel-emitted tags' evidence JSON;
Phase 3 promotes it to a real column populated for *every* tag —
intel, credential, behavioral, canary, identity, email, rule-engine —
from one source. Pre-v1, so the SQLModel field is added directly
without an Alembic migration.
- TTPTag gains mitre_url: Optional[str] (not indexed — derived
deeplink, not a query target; technique_id is already indexed).
- _emit.py and rule_engine._evaluate_rules both populate mitre_url
via attack_stix.mitre_url_for(sub_technique_id or technique_id).
Sub-technique URL when present, else parent. The two construction
sites stay separate because the rule_engine path carries per-emit
span instrumentation that emit_tags() can't preserve without
threading a span object through; minimal-change beats forced
refactor here.
- intel_lifter strips mitre_url from evidence_extra in all four
decision functions. The column is canonical now; duplicating in
the JSON column would drift when the bundle moves. The unused
TechniqueEmission import + tracking dicts removed too.
- IdentityTechniqueRow / TechniqueRollupRow / TTPTagDetailRow /
CampaignTechniqueRow gain mitre_url: Optional[str].
- sqlmodel_repo/ttp.py:_mitre_url_for added; the 5 row-builder sites
pass mitre_url=_mitre_url_for(sub_technique_id or technique_id)
alongside the existing technique_name resolution.
- api_get_tag_details.py needs no change — list_tags_by_scope_and
_technique already returns model_dump() rows that flow the new
column through **row spread to TTPTagDetailRow.
- tests/ttp/test_emit_attaches_mitre_url.py covers both construction
paths (top-level, sub-tech, unknown, multi-emit) and a regression
test that intel_lifter evidence dicts no longer contain mitre_url.
Two reusable bundle-derived lookups that the next two commits build
on:
- mitre_url_for(tid) returns the canonical attack.mitre.org URL by
reading external_references on the cached attack-pattern. Backed
by the existing lru-cached _attack_pattern_by_id so per-call cost
is constant. Handles top-level techniques and sub-techniques
(T1059.004 -> .../techniques/T1059/004).
- GroupRef + groups_using_technique(tid) surface the intrusion-set
reverse index from the loaded bundle: given a technique, return
the MITRE-tracked groups documented as using it. Sorted by
group_id for deterministic responses; lru-cached. Sub-technique
semantics match ATT&CK Navigator (do NOT auto-union with parent).
- decnet/ttp/data/intel_loader._mitre_url_for collapses to a thin
re-export of attack_stix.mitre_url_for; the loader keeps mitre_url
on TechniqueEmission for the eventual STIX export.
- tests/ttp/test_attack_url.py covers both helpers: top-level + sub
URLs, unknown -> None / (), GroupRef immutability + hashability,
deterministic ordering, sub-technique distinct from parent.
The four provider→technique tables (AbuseIPDB cat→techniques,
GreyNoise tag→techniques, ThreatFox threat_type→techniques, plus
the Feodo binary-listed signal) used to live as Final[dict] constants
in intel_lifter.py. Two real problems with that:
1. Drift between rules/ttp/R0054.yaml..R0058.yaml (which declare
the full slate per provider) and the Python dicts (which decide
which slate-member fires per signal). The v2 audit comment in
intel_lifter.py documented that they had silently drifted.
2. No ATT&CK provenance on emissions — the loaded STIX bundle has
rich external_references (canonical attack.mitre.org URLs) that
never surfaced because the lifter had no path back to them.
Mappings now live as YAML at decnet/ttp/data/intel/{provider}.yaml,
validated at load against the loaded ATT&CK bundle, with each entry
enriched by attack_stix._attack_pattern_by_id to attach the canonical
MITRE URL to every emission.
- decnet/ttp/data/intel_loader.py: pydantic-validated schema +
ProviderMapping/Signal/TechniqueEmission frozen dataclasses +
load_provider_mapping(provider) lru-cached.
- Per-technique high_score_threshold inlined into YAML
(collapses the separate _ABUSEIPDB_HIGH_SCORE_GATED dict).
- external_reference field follows the STIX 2.1 external-reference
shape (source_name + url + optional external_id) so the future
STIX/MISP exporter is a direct translation.
- intel_lifter.py: dicts deleted, decision functions read from
ProviderMapping accessors. Decision-flow constants (T1071/T1595
bare-classification fallbacks in _greynoise_decisions) stay in
code — they're not table rows.
- Each emit slot's evidence_extra now carries mitre_url for any
technique resolved in the bundle (every one in practice).
- tests/ttp/test_intel_mappings.py: snapshot equivalence vs the
legacy dicts, high-score gate behavior, every-signal-has-an-
external-reference, every-emission-has-a-mitre-url, negative
paths (unknown technique_id raises AttackBundleError, mismatched
provider field rejected, dir listing matches expected providers).
The YAML schema + mitre_url enrichment lays groundwork for the
future STIX exporter; this commit does NOT build that exporter.
MITRE's ATT&CK Terms of Use require reproducing their copyright +
license alongside any cached copy of ATT&CK data. Today we ship the
bundle but not the license — this commit closes that compliance gap.
- attack_version.py pins ATTACK_LICENSE_URL +
ATTACK_LICENSE_SHA256 + ATTACK_LICENSE_FILENAME, sourced from the
same attack-stix-data repo as the bundle.
- attack_stix.py:_fetch_license downloads LICENSE.txt next to the
bundle. License sha mismatch is logged + refreshed (license text
gets occasional formatting tweaks; not a security event), unlike
the bundle which stays fail-closed.
- _ensure_license is the compliance ratchet: resolve_bundle_path
refuses to return without LICENSE.txt on disk. Override-mode
(DECNET_ATTACK_BUNDLE) checks for a sibling LICENSE.txt first,
then DECNET_ATTACK_LICENSE, then the cache dir.
- python -m decnet.ttp.attack_stix license prints the cached license
to stdout for operator audit.
- loaded_license_path() exposes the active license path read-only.
- tests/ttp/test_attack_license.py covers happy paths (sibling +
explicit env), refusal when DECNET_ATTACK_LICENSE points at a
missing file, the CLI subcommand, and the pinned-sha shape.
Drift between the technique/tactic IDs hardcoded in the lifters and
what the loaded ATT&CK STIX bundle actually contains is silent in the
status quo: a renamed-or-retired technique just stops being tagged.
Every emission point now has an explicit validator that asserts its
IDs resolve in the loaded bundle, called once at TTP-worker boot.
- intel_lifter.all_emitted_technique_ids() collects every technique
the four provider tables (AbuseIPDB / GreyNoise / Feodo / ThreatFox)
plus the decision-flow constants in _greynoise_decisions and
_feodo_decisions can emit. validate_against_attack_bundle() runs it
through attack_stix.assert_known_technique_ids().
- ukc.validate_against_attack_bundle() asserts every key in
ATTACK_TACTIC_TO_UKC resolves, with TA0100..TA0106 documented as
_NON_ENTERPRISE_TACTICS (lives in the ICS bundle, not the
enterprise bundle DECNET loads).
- decnet/ttp/worker.py:run_ttp_worker_loop calls both validators
before subscribing to the bus. A bundle-vs-code mismatch refuses
to start the worker rather than silently mistagging events.
- tests/ttp/test_attack_bundle_validation.py covers the happy path
for both validators, the negative path (injected bogus tactic ID
raises AttackBundleError), the ICS exemption, and the lone T1078
reference in credential_lifter.
Replace the hand-maintained TECHNIQUE_NAMES dict (pinned to v15.1) with
a runtime loader that reads the official enterprise-attack-N.json STIX
bundle. Version bumps now require only updating attack_version.py;
sub-technique parents, tactic IDs, and kill-chain phases all come from
MITRE's published data.
- decnet/ttp/attack_version.py pins version 19.0 + sha256 + URL
- decnet/ttp/attack_stix.py is the lazy STIX loader. Resolution order:
DECNET_ATTACK_BUNDLE env -> ~/.cache/decnet/attack/ -> fetch from
the pinned MITRE GitHub URL. SHA-256 verified before parse;
mismatch fails closed.
- decnet/ttp/attack_catalog.py collapses to a shim re-exporting
technique_name() so the ~9 router/repo call sites don't churn.
- python -m decnet.ttp.attack_stix fetch warms the cache and can
print sha256 for version-bump workflows.
- test_attack_catalog.py now asserts every rule-emitted ID resolves
in the loaded bundle (same contract, real source) and exercises
the SHA-256-mismatch fail-closed path.
R0047 (BEC) and the encoded-payload predicate substring-match against
the email body. Shipping raw body text on the abstracted service bus
is the wrong privacy stance — the bus transport may swap from UNIX
socket to networked at any time, and "loopback today" is not a license
to put PII on the wire.
EmailLifter now opens the .eml lazily from
/var/lib/decnet/artifacts/{decky_id}/smtp/{stored_as} when a body-aware
predicate runs and parses the body in-process via stdlib email +
policy.default. The decoded body is memoized into the payload dict so
multiple body-aware predicates on the same event open the file once.
Bus envelope only carries the artifact pointer (decky_id + stored_as);
raw body bytes never cross the host disk boundary on the agent → master
hop. Filesystem access on agents is unblocked by DEBT-035 (setgid +
group-readable artifacts root, paid 2026-05-02).
The legacy inline body_text path is preserved — when the producer ships
body_text on the bus the helper short-circuits without opening the file.
A previous agent (and several of my own commits) wrote to a top-level
DEBT.md without seeing the existing development/DEBT.md — the
canonical register since DEBT-001. Resulted in two parallel files,
inconsistent numbering schemes, and references that resolved to the
wrong place.
Migrate the six entries that landed in the rogue file into the
canonical register as DEBT-044 through DEBT-049, preserving their
status (resolved / partial / open) and cross-references. The
TTP_TAGGING.md references to "DEBT.md" already resolve to
development/DEBT.md by virtue of being in the same directory; only
the comment in decnet/ttp/impl/intel_lifter.py needed disambiguation
to "development/DEBT.md DEBT-048".
* DEBT-044 — `attacker.email.received` producer wiring (✅ RESOLVED 2026-05-02)
* DEBT-045 — EmailLifter heavyweight feature extraction (PARTIAL PAID 2026-05-02)
* DEBT-046 — EmailLifter mal-hash feed integration (open)
* DEBT-047 — EmailLifter R0047 BEC unblock (open, gated on DEBT-035)
* DEBT-048 — TTP intel provider mapping review (recurring quarterly)
* DEBT-049 — TTP Sigma adapter — post-v1 (open)
Summary table extended; "Remaining open" line updated; root file
removed. The DEBT-047 entry now explicitly cross-references DEBT-035
as the gating dependency for the R0047 BEC unblock.
Three bug classes uncovered by the 2026-05-02 ship-time audit:
* AbuseIPDB code/name mismatch in v1: cat 10 was treated as DDoS (it's
Web Spam — DDoS is cat 4, intentionally unmapped per A.10) and cat 17
as VPN IP (it's Spoofing — VPN IP is cat 13). Both typos mirrored in
code AND the design doc Appendix A.10. Code now matches the AbuseIPDB
taxonomy exactly; cat 17 retargets to T1566 (email-spoofing as a
phishing precursor), and cats 7 (Phishing) and 16 (SQL Injection)
pick up T1566 / T1190 emissions that v1 didn't cover.
* ThreatFox dispatch keyed on `ioc_type` in v1, but `ioc_type` is the
indicator format (url / domain / hash variants) and carries no ATT&CK
signal. The canonical taxonomy field per ThreatFox's API is
`threat_type` (botnet_cc / payload_delivery / payload / cc_skimming).
Repoint dispatch through the new `threatfox_threat_types` payload
field; `ioc_type` rides as evidence only. Also adds the missing
cc_skimming -> T1056 (Input Capture) mapping and registers T1056 in
attack_catalog.py.
* GreyNoise bare-malicious lane: a `classification == "malicious"` row
with no recognised tag used to emit nothing. Now lights T1071 at a
half multiplier, suppressed when a tag already fires T1071 to avoid
double-stamping at conflicting confidence levels.
The inspector was dumping the whole `CMD uid=0 user=root src=… pwd=…
cmd=nmap -p- 192.168.1.0/24` syslog body into a single ``command_text``
blob. ANTI: "I'd like to separate the fields." Done — three layers
work together:
1. Collector session aggregator: new `_parse_cmd_msg` splits the bash
PROMPT_COMMAND msg into `{uid, user, src, pwd, command}`. The
session-ended envelope's per-command dict now carries the
structured fields, with `command_text` set to just the cmd= value
(preserving embedded whitespace — `nmap -p- 1.2.3.0/24` etc.).
2. Rule engine: per-source_kind auxiliary evidence list
(`_AUX_EVIDENCE_FIELDS`). For `command` events the engine
automatically promotes uid/user/src/pwd into the persisted
`evidence` dict on top of the rule's explicit `evidence_fields`.
Engine-controlled, not per-rule — adding a new aux field is one
line here, not a 30-rule YAML sweep, and rule authors can't
accidentally drop it.
3. TTPInspector frontend: evidence renders as a structured
`kvs` grid (UID / USER / SRC / PWD / CMD rows) instead of
pretty-printed JSON. Primary-order list keeps shell fields at
the top; everything else falls below alphabetically so unfamiliar
evidence shapes still surface predictably.
Tests:
- session_aggregator pins the structured-fields emit (uid/user/src/
pwd/command_text without "CMD" prefix, embedded whitespace
preserved).
- rule_engine_tagger pins the aux-field auto-promotion + the
no-`None`-leakage path when payload doesn't carry an aux key.
"T1595" alone is opaque; "T1595 — Active Scanning" tells you the
story at a glance. The names come from a backend-side static catalogue
pinned to the same ATT&CK release as the rule engine
(_ATTACK_RELEASE = "v15.1") — names are the canonical MITRE labels,
not author-supplied strings on rules, so a rule author can't typo a
name and the entire fleet sees the typo.
- New `decnet/ttp/attack_catalog.py` with `TECHNIQUE_NAMES` covering
every technique_id + sub_technique_id emitted by `rules/ttp/`
(R0001..R0058 → 69 IDs in the v0 pack).
- `IdentityTechniqueRow` / `TechniqueRollupRow` / `CampaignTechniqueRow`
/ `TTPTagDetailRow` gain optional `technique_name` /
`sub_technique_name` fields. Repo + router populate them from the
catalogue at row-construction time. None when an ID isn't in the
catalogue — UI falls back to the bare ID.
- Coverage test (`tests/ttp/test_attack_catalog.py`) walks every
YAML rule and asserts every emitted ID has a catalogue entry, so
a future rule author who forgets to update the catalogue gets a
loud failure rather than a silent UI fallback.
Frontend:
- `TTPsObservedSection` shows "T1595.002 — Active Scanning:
Vulnerability Scanning" instead of just the ID, with overflow
ellipsis + tooltip for narrow viewports. Inspector header /
TECHNIQUE row also surface the names.
The collector's `attacker.session.ended` envelope carries
`attacker_uuid: null` and `attacker_ip: <ip>` because the collector
doesn't talk to the DB. The TTP worker passed that null straight
through, and `TTPTag.__init__` raised the documented invariant:
ValueError: ttp_tag requires at least one of attacker_uuid /
identity_uuid; both NULL is not a valid anchor.
The worker now resolves `attacker_uuid` from `attacker_ip` via
`BaseRepository.get_attacker_uuid_by_ip` before fanning out the
event. When the IP isn't in the DB yet (profiler hasn't ingested
the row), the event is dropped with one log line — better than
exploding mid-tag.
- New `get_attacker_uuid_by_ip(ip) -> str | None` on the repo
(BaseRepository abstract + AttackersCoreMixin impl).
- `_resolve_attacker_uuid` helper in `decnet/ttp/worker.py` runs
before `_build_events`. Short-circuits when the payload already
has either anchor; drops the event when neither anchor is
resolvable.
- Tests pin: short-circuit on existing uuid/identity, repo lookup,
drop on unknown IP, drop on "Unknown" sentinel, drop on
no-anchor payload, drop on repo failure.
The endpoint was a contract-phase stub returning `[]` even though the
RuleStore loaded all 58 YAML rules at worker startup. UI saw an empty
table; operators couldn't tell whether anything was wired up.
- `api_list_rules` now calls `get_rule_store().load_compiled()` and
serializes each CompiledRule + its operational state into a
RuleCatalogueRow. Sorted by rule_id for stable golden snapshots.
- Add `description: str` to RuleSchema (pydantic) and CompiledRule
(NamedTuple, defaulted) + propagate through `_compile_one` so the
catalogue surfaces the human-readable YAML description, not just
the slug-style `name`.
- Update `tests/ttp/test_rule_engine.py` _fields assertion for the
new column; new `tests/api/ttp/test_rules_catalogue.py` pins the
catalogue contents (R0001/R0014 presence, row shape, sort order).
Worker behaviour is unchanged: it was already loading rules
correctly. This is purely a read-side wiring fix on the operator API.
The canonical rule-based engine from §"Tagging engines, layered §1"
of TTP_TAGGING.md was fully implemented but never instantiated as a
composite child — pure pattern rules (R0014/R0017/R0023/... 23 rules
total) had no tagger to dispatch them.
- Add `RuleEngineTagger(Tagger)` adapter in rule_engine.py wrapping
`RuleEngine.evaluate()`. `HANDLES = {command, http_request,
auth_attempt, payload}` — the source kinds whose rules typically
live outside any per-source lifter.
- Adapter's `watch_store()` filters via `_is_engine_owned` so the
engine's dispatch index excludes lifter-claimed rules
(`match.kind: lifter:*`) and stays disjoint from per-lifter ownership.
- Prepend `RuleEngineTagger` to the `CompositeTagger` lifter list so
generic pattern rules dispatch before per-source cross-event logic.
- Composes with E.3.18a (worker hydrates `watch_store`) and E.3.18b
(worker fans session payloads into per-`command` events) — together
these three commits make R0001–R0030 actually fire at runtime.
R0001–R0030 declare `applies_to: [command]` and match per command, not
per session. The worker now translates one `attacker.session.ended`
payload carrying a `commands: list` into:
- one source_kind="session" event (behavioral / cross-event lifters)
- one source_kind="command" event per command (RuleEngineTagger)
Both string and dict command shapes are accepted; dicts contribute
their `id` / `uuid` / `command_id` as the per-command source_id so
the deterministic `compute_tag_uuid` keeps replays idempotent. Tags
from session + per-command dispatch are aggregated into a single
`ttp.tagged` envelope per upstream session.
Each per-source lifter holds its own RuleIndex and exposes an
`async watch_store()` that loads the corpus and drains store change
events forever. Until this commit nothing called `watch_store()` in
production — every dispatch index stayed empty and no rule fired.
- Add `WatchableTagger` runtime-checkable Protocol in `decnet.ttp.base`.
- `CompositeTagger.iter_watchables()` yields lifters that satisfy it.
- `run_ttp_worker_loop` fans out one task per watchable, cancelled
and awaited alongside pump/heartbeat/control in the existing finally.
- Watch failures log and exit the watch task without taking the
worker down — mirrors the pump-task tolerance contract.
Inner loop drains a per-process asyncio.Queue populated by one pump
task per topic in _TOPICS, dispatches each event through
CompositeTagger, persists via repo.insert_tags(), and publishes
ttp.tagged + per-technique ttp.rule.fired.<id> only when the insert
returned a non-zero rowcount.
CompositeTagger seeded with all six lifters (Behavioral, Intel,
CanaryFingerprint, Email, Identity, Credential).
Loop-prevention invariant from TTP_TAGGING.md §"Bus topics" enforced:
N replays of the same upstream event publish exactly one ttp.tagged
event. test_worker_bus covers both the direct invocation path and
the idempotency replay path.
Intel catch-up via attacker.session.ended is intentionally deferred
to E.3.14b — needs a session→intel join the repo doesn't expose yet.
IdentityLifter owns lifter:identity_* — currently R0003 (password
spraying). CredentialLifter owns lifter:credential_* — R0001 generic
auth brute, R0002 password guessing, R0004 credential reuse, R0005
valid-account use, R0006 default credentials.
YAMLs R0001/R0002/R0003/R0005/R0006 had their match.kind normalised
to fit the lifter prefix scheme — the design doc's promised "YAMLs
normalised in a separate refactor commit" lands here.
Identity-rollup tags null out attacker_uuid on emit so the worked-
example invariant holds (the tag belongs to the Identity, never to
one member IP).
Tests: test_identity_lifter.py + test_credential_lifter.py cover
each predicate's positive/negative path, state modulation
(disabled/clipped/expired), source-kind gating, and idempotent
replay. test_lifter_absence and test_lifters updated for the new
ctor signature.
SMTP message-level technique tagger per Appendix A.6: open relay abuse
(rcpt_count + foreign From), mass phishing (rcpt_count + body simhash),
phishing-kit X-Mailer, IDN/punycode URL, sender masquerade composite
(From/Return-Path/DKIM/SPF), malicious attachment (macro/.lnk/.iso/.img/
hash match), BEC subject+body composite, encoded payload in body.
PII discipline (TTP_TAGGING.md §'Hard parts §6') is enforced at the
lifter layer via _filter_evidence(): emitted TTPTag.evidence is
restricted to the EmailEvidence-allowed allowlist (body_sha256,
matched_headers — names only, rcpt_domain_set — domains only,
attachment_sha256s, rcpt_count) plus PII-safe match discriminators
(matched_kit, matched_trigger, matched_url_host, etc). Raw addresses,
raw body bytes, full URLs, and decoded base64 previews NEVER appear in
evidence — defense-in-depth over the YAML evidence_fields hint.
Tests: tests/ttp/test_email_lifter.py per-rule positive + negative +
PII allowlist guard + state modulation. tests/ttp/rule_precision/
test_email_rules.py xfail flipped to real precision (R0041-R0048
H-band ≥95%). Corpus rows updated to acknowledge that R0045 (masquerade)
co-fires with R0041 / R0047 when the sender-masquerade signals are
present alongside open-relay or BEC patterns — overlap is by design,
not a precision bug.
Browser-payload derivations per Appendix A.9: navigator.webdriver flag,
canvas/audio/WebGL automation hash matches (Puppeteer/Playwright/
Selenium/curl-impersonate), WebRTC IP leak, TZ/language vs source-IP
geo mismatch, navigator.platform vs userAgent vs WebGL renderer
inconsistency.
Evidence shape pinned to CanaryFingerprintEvidence (metric +
matched_signature) — raw fingerprint blobs (canvas hashes, full UAs,
navigator.platform values) explicitly NOT carried into TTPTag.evidence
per TTP_TAGGING.md §'Hard parts §7' (enrichment vs tag boundary). The
identity-merge guard rail is preserved: composite fp.id matches across
IPs are NOT a TTP, so no rule fires on the bare hash.
Tests: tests/ttp/test_canary_fingerprint_lifter.py per-rule positive +
negative + evidence-shape guard + state modulation.
tests/ttp/rule_precision/test_canary_rules.py xfail flipped to real
precision (R0049/R0050/R0051/R0053 H-band ≥95%; R0052 M-band ≥80%).
Per-provider verdict translator for AbuseIPDB, GreyNoise, Feodo Tracker,
and ThreatFox per Appendix A.10. Each rule's predicate inspects payload
fields produced by the enrich worker (no DB I/O, no decnet.intel.*
imports — E.2.7 decoupling guard preserved). AbuseIPDB confidence is
scaled by abuse_confidence_score / 100; categories drive per-technique
fan-out. R0058 aggregate-bump is a no-op in v0 (cross-tag bump deferred
to E.3.14 worker bootstrap).
Per-provider null tolerance is the steady state — a missing provider
column produces zero tags from that rule, never an error.
Tests:
- tests/ttp/test_intel_lifter.py — per-provider positive + negative +
state modulation + decoupling source-import guard.
- tests/ttp/rule_precision/test_intel_rules.py — xfail flipped, real
precision driven over seed_intel.jsonl (R0054-R0057 H-band ≥95%;
R0058 skipped as bump-only).
- tests/ttp/test_lifter_absence.py — IntelLifter all-populated test
flipped from xfail-strict to real assertion with realistic payload.
- tests/ttp/test_lifters.py — partial-null xfail flipped to real
assertion.
Reads pre-shaped session aggregates from TaggerEvent.payload and emits
techniques per Appendix A behavior tables. Per-rule predicates dispatch
on match.kind (lifter:behavioral_<name>); the lifter holds its own
RuleIndex watching the same RuleStore as the engine, so disable / clip /
TTL state reaches lifter-bound rules through the same atomic-swap path.
R0032/R0036/R0037/R0040 YAMLs had over-escaped regex strings (\\
instead of \\) — fixed in place.
Factory wired so default get_tagger() returns CompositeTagger with
BehavioralLifter shipped; remaining three lifters (E.3.10-E.3.12) land
in subsequent commits.
E.2.6 contract preserved via TolerantTagger: empty payload steady-state
yields [] with zero ERROR records. Disabled / clipped / expired state
verified.
E.3.9.0 prerequisite for the per-source lifters (E.3.9-E.3.13). The
dispatch index, install/evict/apply_change atomic-swap protocol, and
state-modulation helpers (is_active / apply_ceiling) move out of
rule_engine.py into _rule_index.py and _state.py. RuleEngine wraps a
RuleIndex; back-compat shims preserve _by_kind / _by_rule / _install
attribute access for tests poking at the dispatch internals.
Lifters in E.3.9-E.3.12 will each hold their own RuleIndex, watching
the same RuleStore via subscribe_changes() fan-out. Hot-reload
semantics (disable / clip / TTL via set_state API) now reach
lifter-bound rules through the same atomic-swap path the engine uses,
not a future composite-rebuild compromise.
Implements the rule engine body left empty at contract phase: evaluate()
dispatches by source_kind through self._by_kind, runs the rule's match
spec against event.payload, and emits one TTPTag per emits entry.
watch_store() loads the initial corpus from RuleStore.load_compiled,
then drains subscribe_changes, applying definition changes via
single-statement dict assignment (atomic swap, GIL-atomic to readers)
and state changes via NamedTuple._replace on the existing CompiledRule.
Why: with the FS + DB stores in place (E.3.5/E.3.6), the engine is the
last piece of the rule plane. Lifters (E.3.9–E.3.13) consume the
engine; the worker bootstrap (E.3.14) wires watch_store into the
asyncio event loop. After this commit a CompositeTagger constructed
with a RuleEngine + a populated rules dir will produce real tags.
Notes:
- CompiledRule.emits extended to 4-tuple
(technique_id, sub_technique_id, tactic, confidence). Tactic + confidence
ride per-emit so a single rule can carry multiple precision targets
(the "one event maps to many techniques" property). Compile helpers in
both backends extract them from the YAML emits dict; missing tactic
or confidence is a deploy-time error.
- v0 match operator is "pattern" (regex). The field defaults per
source_kind (command_text / raw_url / subject / verdict / …) and is
overridable via match.field. Future ops (contains, equals, in_set)
extend _match_event without touching the engine surface.
- Confidence model: rules with state="clipped" + confidence_max set
cap the per-emit confidence downward; clipped is a soft suppress, not
a hard skip. Disabled rules are skipped wholly; expires_at past is
re-checked at evaluate as defense-in-depth (the store auto-reverts,
but a racing read between expiry and revert must not fire the rule).
- _span(name, **attrs) helper in engine + both stores short-circuits on
decnet.telemetry._ENABLED — matches the project's @traced /
wrap_repository zero-overhead-when-disabled pattern instead of relying
solely on the no-op tracer indirection.
- Late-bound tracer (telemetry.get_tracer called per-span, not at
module load) so test_tracing's monkeypatch reaches the production
code path.
xfails flipped: tests/ttp/test_rule_engine.py multi-emit fan-out +
rule_version-collision-via-engine; tests/ttp/test_multi_mapping.py
N×M engine fan-out + idempotent replay; tests/ttp/test_tracing.py
ttp.eval span hierarchy + ttp.rule.fire span attributes.
Tests: 214 passed, 19 xfailed (gated on E.3.8 lifters / rule pack /
worker bootstrap).
mypy: clean on prod code; pre-existing test-stub arg-type warnings
unchanged.
Implements the DB-backed rule store body left empty at contract phase:
load_compiled reads from ttp_rule + ttp_rule_state; get_state /
set_state hit ttp_rule_state with the same expires_at auto-revert and
bus-event semantics as the FS backend; subscribe_changes returns a
per-subscriber queue. State persists across process restarts — the
swarm property the FS backend deliberately doesn't have.
Also lands two swarm-mode helpers:
- sync_from_filesystem(fs_store) — master-side, subscribes to a
FilesystemRuleStore and projects each RuleChange onto a ttp_rule
upsert/delete.
- tail_db(poll_interval) — worker-side, watermark poll over
ttp_rule.updated_at; emits RuleChange("definition", ...) for each
row that moved.
Why: swarm mode needs rule definitions and operator state to
propagate across hosts. The filesystem backend (E.3.5) was the
single-host-dev variant; this one survives restart and serves N
workers from a shared DB.
Notes:
- DatabaseRuleStore() with no args lazy-inits an in-memory SQLite
repo so the conformance fixture works without test plumbing. In
production the worker bootstrap (E.3.14) passes an explicit repo.
- The conftest.py rule_store fixture became async (pytest_asyncio),
per-backend creates/initializes a SQLite repo for the DB run.
- Adds a `seed_rule(store, rule_id, yaml)` helper to bridge backend
semantics: drop a YAML file (FS) vs insert a ttp_rule row (DB).
Used by the parametrized load_compiled conformance test.
- Late-bound _tracer() in both backends (was module-level get_tracer
binding) so test_tracing's monkeypatch of decnet.telemetry.get_tracer
actually affects span output.
xfails flipped: tests/ttp/store/test_database.py set_state-writes-to-
ttp_rule_state + filesystem-to-DB sync; tests/ttp/store/test_conformance.py
DB-side load_compiled / set_state isolation / round-trip / per-rule
fan-out / expired-state revert / set_state failure / get_state default
(was xfail-only-on-DB); tests/ttp/test_tracing.py set_state span
hierarchy.
Tests: 208 passed, 25 xfailed (gated on E.3.7 + lifters).
mypy: clean on all touched files.
Implements the filesystem-backed rule store body left empty at contract
phase: YAML parse + Pydantic validation, asyncinotify watch over
./rules/ttp/, in-process state cache with auto-revert on expires_at,
and a subscribe_changes() async iterator yielding one RuleChange per
per-rule edit. Bus topic builders ttp_rule_reloaded / ttp_rule_state
ship alongside.
Why: the rule plane needed a store before the engine (E.3.7) could
consume RuleChange events and atomically swap compiled rules into its
dispatch index.
Notes:
- Linux-only by construction (asyncinotify wheel gated by sys_platform
marker; FilesystemRuleStore.__init__ raises on non-Linux).
- Filename allowlist is the FIRST check on every inotify event.
- Content-hash dedup so a single write firing IN_CREATE + IN_CLOSE_WRITE
produces exactly one RuleChange.
- All compile work serializes on a single asyncio.Lock.
- Subscribers register their queue eagerly so events fired between
subscribe_changes() and the first __anext__() are buffered.
xfails flipped: per-save-style + filter-ordering + atomic-swap in
test_filesystem.py; load_compiled / set_state isolation / round-trip /
per-rule fan-out / expired-state revert / set_state failure semantics
in test_conformance.py (FS side; DB side stays xfail until E.3.6);
malformed-YAML compile-time check in test_rule_engine.py.
Tests: 197 passed, 35 xfailed (gated on E.3.6 / E.3.7 / lifters).
mypy + bandit: clean on all touched files.
Wiki update for the per-rule reload + state-change topics lands in a
matching wiki-checkout/Service-Bus.md edit (separate repo).
Adds decnet/ttp/store/ subpackage:
- base.py: RuleState frozen dataclass, RuleChange NamedTuple, RuleStore ABC
- factory.py: get_rule_store() reading DECNET_TTP_RULE_STORE_TYPE
- impl/filesystem.py: FilesystemRuleStore with sys.platform=='linux'
fail-fast guard, allowlist filename regex, raw inotify mask bits
(lib import deferred to E.3 so contract phase compiles without the
asyncinotify dep installed)
- impl/database.py: DatabaseRuleStore stub (no platform guard)
TTPRule + TTPRuleState SQLModels were already shipped at E.1.1; this
commit closes the type-only TYPE_CHECKING forward-ref in
rule_engine.py via real runtime imports through the new package.
Third and fourth TTP-tagging contract commits, plus a scoped subset
of the E.2.4 conformance tests covering the contract surface shipped
here (full hypothesis-fuzz suite still lands with E.2.4).
E.1.3 — decnet/ttp/base.py
- TaggerEvent NamedTuple: source_kind, source_id, attacker_uuid,
identity_uuid, session_id, decky_id, opaque payload.
- Tagger(ABC) with abstract async tag(); class-level name and
HANDLES: frozenset[str] (default empty so a misconfigured subclass
is loudly idle, not loudly noisy).
- TolerantTagger(Tagger): concrete tag() wraps abstract _tag_impl()
in try/except Exception (deliberately not BaseException — so
KeyboardInterrupt / SystemExit / asyncio.CancelledError propagate
and the worker can shut down cleanly). Swallowed exceptions log
at WARNING with exc_info, never ERROR — absence is the steady
state, not a bug. Subclasses override _tag_impl, never tag — the
tolerance contract is enforced in the base class, not on trust.
- KNOWN_SOURCE_KINDS: Final[frozenset[str]] enumerating every
source_kind a producer is allowed to emit. Closed-by-enumeration
at the runtime layer; the composite tagger keys its WARNING/INFO
bridge off this constant to surface the silent-drop trap from
the design doc (lines 160–195).
E.1.4 — decnet/ttp/factory.py
- get_tagger() reads DECNET_TTP_TAGGER_TYPE (default 'composite');
unknown values raise ValueError with the known-list. Mirrors
decnet.intel.factory and decnet.clustering.factory.
- _KNOWN = ('composite',). Per-lifter classes (E.1.6) are children
of the composite, not standalone tagger types.
- CompositeTagger(Tagger): pre-computes a dict[str, list[Tagger]]
dispatch index from each lifter's HANDLES; fans events out
concurrently with asyncio.gather and concatenates results.
Empty lifters=[] is the legal contract-phase state — E.1.6
wires the real lifters in.
- Unhandled-event observability: source_kind in KNOWN_SOURCE_KINDS
but no lifter claims it -> WARNING once per kind per process
(missed E.1.6 update). Unknown kind -> INFO once per kind per
process (future-feature telemetry, by design). Per-process dedup
via plain set; E.1.6 may swap in a proper rate-limiter once
production traffic shapes are known.
Tests — tests/ttp/test_base.py, tests/ttp/test_factory.py
- Tagger / TolerantTagger abstractness, missing-tag-impl rejection,
WARNING-not-ERROR log level, propagation of KeyboardInterrupt /
SystemExit / asyncio.CancelledError.
- Factory env-var routing, unknown-name ValueError, dispatch-index
correctness, only-claiming-lifter invocation, WARNING-once for
known-but-unclaimed kinds, INFO-once for unknown kinds, result
concatenation across lifters.
Mypy clean under .311/bin/mypy --ignore-missing-imports.