docs(debt,ttp): retire shipped lanes; file mal-hash-feed and R0047-disk-reach entries
Mark the EmailLifter heavyweight follow-up as PARTIAL PAID — R0042 /
R0046 (macro / password / smuggling lanes) / R0048 fire end-to-end
after commits 291b78c1 (decky extractors) and the ingester producer
projection that follows.
Two narrower DEBT entries replace the lanes that remain gated:
* "EmailLifter mal-hash feed integration" — R0046's mal_hash_match
lane needs a curated bad-hash feed (MalwareBazaar SHA-256 dump as
the v0 candidate, mirroring the FeodoProvider bulk-feed pattern at
decnet/intel/feodo.py). Feed integration, not extraction. Lifter
predicate already reads `payload.get("mal_hash_match")` — silent
today only because the field is absent.
* "EmailLifter R0047 BEC — unblock when artifact disk-reach lands"
cross-references the agent UID/GID DEBT entry that blocks
`decnet ttp` from reading artifacts written by deckies on the
same host. Disk-reach is the intended solution; raw body_text on
the bus is rejected because the bus transport is abstracted (the
UNIX-socket implementation may swap to networked at any time, and
privacy decisions must hold regardless of transport).
Append to TTP_TAGGING.md §"Producer wiring": the email.received
producer pointer (was "none — DEBT"), the full per-message payload
shape with the new heavyweight fields, and an explanatory block on
why the bus is body-text-free + how R0047 / R0048 each handle their
body dependency (R0048 via the precomputed scalar; R0047 deferred).
DEBT.md
@@ -63,42 +63,92 @@ R0043 / R0044 / R0045, and partial for R0046 (extension lane only).
The remaining R0042 / R0046-deep / R0047 / R0048 lanes ride on the
heavyweight extraction follow-up below.

### EmailLifter heavyweight feature extraction — R0042 / R0046 / R0047 / R0048
### EmailLifter heavyweight feature extraction — PARTIAL PAID 2026-05-02

The cheap header / domain / extension extractions landed with the
2026-05-02 producer paydown above. These predicates still need
deeper signal before they fire:
The Layer-2 extractors for R0042 / R0046 (macro / password /
smuggling lanes) / R0048 landed in commits `291b78c1` (decky
`_summarize_message` extension) and the follow-up ingester producer
projection. After paydown the bus payload carries:

- **R0042 (mass phish)** — needs `body_simhash`. A near-duplicate
  hash (simhash / minhash) over the body lets the lifter score
  "same template fanned out to many recipients." The extractor is
  decky-side; the wire field is a single string.
- **R0046 (malicious attachment)** — extension lane fires today.
  The remaining lanes need:
  - `attachment_macros: bool` — Office macro detection (oletools or
    a minimal VBA-stream sniff inside the .ole / .docx zip).
  - `attachment_password_protected: bool` — encrypted-archive
    detection across .zip / .7z / .rar.
  - `html_smuggling: bool` — heuristic over HTML body parts looking
    for the canonical `<a download>` + base64-blob / Blob() pattern.
  - `mal_hash_match: bool` — match against a curated bad-hash feed
    (provider TBD; could ride on the same enrich worker as
    AttackerIntel).
- **R0047 (BEC) / R0048 (encoded payload)** — both predicates read
  `body_text`. We deliberately do NOT ship raw body text on the bus
  today: PII concerns, payload size, and the EmailLifter's evidence
  filter strips it anyway. The wire-up needs either (a) a hashed /
  truncated body projection, (b) the lifter reaching back to fetch
  the .eml off disk on the same host, or (c) a privacy-safe
  intermediate (BEC-keyword presence flags, base64 byte counts)
  that satisfies the predicates without leaking raw text. Pick one
  before the extractor work.
- `body_simhash` — inlined 64-bit Charikar simhash for R0042
- `body_base64_bytes` — largest decoded base64 chunk size for R0048
- `attachment_macros` — OOXML `vbaProject.bin` sniff for R0046
- `attachment_password_protected` — ZIP encryption flag + 7z / RAR
  / CFBF magic-byte match for R0046
- `html_smuggling` — lxml structural parse (with regex fallback) for
  R0046's HTML-smuggling lane
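
As a rough illustration of what a `body_simhash` value represents, a 64-bit Charikar simhash can be sketched as follows. The tokenization (lowercased word tokens), the BLAKE2b token hash, and the names `simhash64` and `hamming_distance` are assumptions made for this example; the decky's actual extractor may differ.

```python
import hashlib
import re


def simhash64(text: str) -> str:
    """Return a 64-bit Charikar simhash of `text` as 16 hex characters."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to 64 bits (hash choice is an assumption).
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each token votes +1 / -1 on every bit position.
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(64):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return f"{fingerprint:016x}"


def hamming_distance(a: str, b: str) -> int:
    """Number of differing bits between two hex-encoded simhashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")
```

Near-duplicate bodies tend to produce fingerprints with a small Hamming distance, which is the property a mass-phish score over many recipients would rely on.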

Field map per rule: `development/TTP_TAGGING.md` §"Bus topics →
Producer wiring" + `decnet/ttp/impl/email_lifter.py` predicates.
R0042 / R0046 (three lanes) / R0048 fire end-to-end after the
2026-05-02 paydown. The remaining lanes are split into two narrower
follow-up entries below: `R0046 mal_hash_match` (needs a curated
bad-hash feed — feed integration, not extraction) and `R0047 BEC`
(needs `body_text`, which is kept off the wire; blocked on the agent
UID/GID DEBT entry that gates artifact disk-reach).

Trigger: any of these rules generates enough signal in production
to justify the extractor cost, OR a bad-hash feed becomes available
and unblocks R0046's mal_hash_match lane in particular.
### EmailLifter mal-hash feed integration — R0046 mal_hash_match

R0046's `mal_hash_match` lane stays gated until DECNET has a
curated bad-hash feed it can look up attachment SHA-256s against.
Until then the producer ships
`attachment_sha256s: list[str]` on the bus (already does as of the
2026-05-02 paydown) but no producer or worker resolves a
`mal_hash_match: bool` against a feed.

Design sketch (mirrors the Feodo bulk-feed pattern at
`decnet/intel/feodo.py`):

- **Feed source**: MalwareBazaar's public SHA-256 dump as the v0
  candidate (free, daily refresh, ~100 MB compressed). Operators
  with paid VT subscriptions can swap the provider behind the same
  factory.
- **Storage**: in-memory set keyed by sha256, TTL-cached on a slow
  refresh loop. Mirror `FeodoProvider`'s `_ensure_fresh` /
  `_refresh` shape exactly — the same trade-offs apply (free at
  call-site, one network round-trip per refresh window).
- **Wiring**: ingester reads each `attachment_sha256` in the
  manifest at `_publish_email_received` time, checks against the
  cached feed, sets `mal_hash_match: bool` on the bus payload.
- **Rule pack**: no rule changes. `_p_malicious_attachment` already
  reads `payload.get("mal_hash_match")` — silent today because the
  field is absent.
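
The storage and wiring bullets above could look roughly like the sketch below. It is a minimal illustration, not the real `FeodoProvider` API: the class name `MalHashFeed`, the injected `fetch` callable, the `clock` parameter, and the one-day default TTL are all hypothetical. Only the `_ensure_fresh` / `_refresh` method names are taken from the design sketch.

```python
import time
from typing import Callable, Iterable, Optional


class MalHashFeed:
    """Hypothetical TTL-cached set of known-bad SHA-256 hashes."""

    def __init__(
        self,
        fetch: Callable[[], Iterable[str]],  # assumed: returns one hash per item
        ttl_seconds: float = 86400.0,        # assumed refresh window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._hashes = frozenset()
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        # One network round-trip per refresh window; normalise to lowercase hex.
        self._hashes = frozenset(
            h.strip().lower() for h in self._fetch() if h.strip()
        )
        self._fetched_at = self._clock()

    def _ensure_fresh(self) -> None:
        stale = (
            self._fetched_at is None
            or self._clock() - self._fetched_at >= self._ttl
        )
        if stale:
            self._refresh()

    def matches_any(self, sha256s: Iterable[str]) -> bool:
        """True if any attachment hash is present in the cached feed."""
        self._ensure_fresh()
        return any(s.lower() in self._hashes for s in sha256s)
```

Under these assumptions the ingester side would reduce to something like `payload["mal_hash_match"] = feed.matches_any(payload["attachment_sha256s"])`, with lookups costing nothing between refreshes.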

Trigger: a curated feed source is selected (MalwareBazaar dump or
better) and the operator has bandwidth / disk for a fresh refresh
loop.
Owner: TBD.
Filed: 2026-05-02 alongside the DEBT #3 paydown.
Filed: 2026-05-02 alongside the heavyweight paydown.

### EmailLifter R0047 BEC — unblock when artifact disk-reach lands

R0047's predicate (`_p_bec` at
`decnet/ttp/impl/email_lifter.py:244`) reads `body_text` and
`subject`, substring-matching them against per-rule keyword lists.
Shipping raw body text on the abstracted service bus is the wrong
privacy stance — the bus transport is abstracted (the UNIX-socket
implementation today may swap to a networked transport tomorrow),
and treating "loopback today" as a license to ship PII would bite
the moment that swap happens.

The right solution is **disk-reach**: at tag time the EmailLifter
opens the `.eml` from the artifact tree at
`/var/lib/decnet/artifacts/{decky}/smtp/{stored_as}` and runs the
predicate against the body parsed in-process. The bus carries only
the artifact pointer; raw body text never leaves the host disk
boundary.
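
A disk-reach helper along these lines might look like the sketch below. It is illustrative only: the function name `load_body_text`, the `root` parameter, and the choice to return `None` on any read failure are assumptions; only the artifact path layout comes from the entry above.

```python
from email import message_from_binary_file, policy
from pathlib import Path
from typing import Optional

ARTIFACT_ROOT = Path("/var/lib/decnet/artifacts")


def _is_basename(name: str) -> bool:
    # Reject empty names, dot entries, and anything containing a path separator.
    return name not in ("", ".", "..") and Path(name).name == name


def load_body_text(
    decky: str, stored_as: str, root: Path = ARTIFACT_ROOT
) -> Optional[str]:
    """Read the .eml from the artifact tree and return its text body, if any."""
    if not (_is_basename(decky) and _is_basename(stored_as)):
        return None
    path = root / decky / "smtp" / stored_as
    try:
        with path.open("rb") as fh:
            msg = message_from_binary_file(fh, policy=policy.default)
    except OSError:
        # Covers missing files and PermissionError (the UID/GID failure mode).
        return None
    body = msg.get_body(preferencelist=("plain", "html"))
    return body.get_content() if body is not None else None
```

Returning `None` rather than raising keeps a body-aware predicate silent when the artifact is unreadable, which matches how the gated lane behaves today.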

This is currently **blocked** by an unresolved UID/GID DEBT entry
— `decnet ttp` will run on agents but cannot read artifact files
written by the SMTP decky even on the same host because of the
permission mismatch. R0047 stays gated until that resolves; the
legacy `_p_bec` body_text path remains in place untouched, so
when disk-reach lands the predicate works without any code
change.

Trigger: the agent UID/GID DEBT entry is paid, allowing
`decnet ttp` to read artifacts written by deckies. Then add a
disk-reach helper to the EmailLifter that opens the `.eml` lazily
when a body-aware predicate runs.
Owner: TBD.
Cross-reference: this entry is gated on the agent UID/GID DEBT
entry. Resolution of that unblocks R0047 BEC immediately.
Filed: 2026-05-02 alongside the heavyweight paydown.

@@ -674,7 +674,7 @@ debugging silent rule-engine output.
| `credential.reuse.detected` | `decnet/correlation/reuse_worker.py` | Per-finding publish; gated on `min_targets ≥ 2`. |
| `attacker.session.ended` | `decnet/collector/worker.py:_SessionAggregator` | Indexes shell `command` events per `attacker_ip` and emits one envelope per `session_recorded` log event. |
| `canary.{token}.triggered` | `decnet/canary/planter.py` | Per-token canary callbacks. |
| `email.received` | **none** | No producer in tree (DEBT — wire when SMTP-receive persistence lands). |
| `email.received` | `decnet/web/ingester.py:_publish_email_received` | Per-message publish after `repo.add_bounty(...)` for `bounty_type="artifact" payload.kind="mail"`. Gated on `repo.get_attacker_uuid_by_ip` resolving — orphans are dropped, never published. |

**`attacker.session.ended` payload shape** (commit-1 of the
collector producer wiring):
@@ -701,6 +701,69 @@ DB; the TTP worker resolves it from `attacker_ip` on the consume
side. `id` per command is `f"{sid}#{idx}"` so the deterministic
`compute_tag_uuid` collapses on replay (loop-prevention).

**`email.received` payload shape** (Layer-2 paydown 2026-05-02 —
both cheap projections from commit `e9324aca` and heavyweight
projections from commit `291b78c1`):

```json
{
  "source_id": "<msg_id or stored_as>",
  "attacker_uuid": "att-7",
  "attacker_ip": "203.0.113.7",
  "decky_id": "mail-decky",
  "service": "smtp",
  "subject": "URGENT: invoice",
  "from_domain": "bigcorp.com",
  "mail_from_domain": "evil.example",
  "return_path_domain": "kit.evil",
  "rcpt_count": 3,
  "rcpt_domains": ["target.tld", "other.tld"],
  "x_mailer": "PHPMailer 6.0.7",
  "dkim_signed": true,
  "spf_pass": false,
  "urls": ["https://xn--80ak6aa92e.example/login"],
  "attachment_count": 2,
  "attachment_sha256s": ["...", "..."],
  "attachment_extensions": [".docm", ".zip"],
  "body_simhash": "deadbeefcafebabe",
  "body_base64_bytes": 8192,
  "attachment_macros": true,
  "attachment_password_protected": true,
  "html_smuggling": false,
  "stored_as": "<.eml basename in artifact tree>",
  "body_sha256": "<full .eml hash>"
}
```

The bus payload is intentionally **body-text-free**. Per the bus
transport's abstract-factory model, decisions about sensitive
content must hold up under any future networked transport, not just
today's UNIX-socket implementation. R0047 BEC and R0048
encoded-payload predicates that need raw body text are handled
differently:

- **R0048** fires from `body_base64_bytes` — precomputed by the
  decky during the same parse pass that builds the attachments
  manifest. The lifter's `_p_encoded_payload:289–298` body_text
  fallback becomes dead in normal operation but stays in place for
  tests and unusual payload shapes.
- **R0047** is **deferred**. The intended solution is disk-reach:
  at tag time the EmailLifter opens the `.eml` from the artifact
  tree using the bus-shipped `stored_as`, parses the body
  in-process, and runs `_p_bec` against it. The bus carries only
  the pointer; raw body text never leaves the host disk boundary.
  Currently blocked by an unresolved agent UID/GID DEBT entry —
  `decnet ttp` running on agents cannot read artifacts written by
  deckies on the same host because of the permission mismatch. See
  DEBT.md "EmailLifter R0047 BEC — unblock when artifact disk-reach
  lands" for the cross-reference.
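
For the R0048 lane, a precomputed scalar in the spirit of `body_base64_bytes` could be derived roughly as below. The regex, the 16-character minimum run length, and the function name `largest_base64_chunk` are assumptions for illustration; the decky's real extractor may use different thresholds and may handle line-wrapped blobs, which this sketch does not.

```python
import base64
import binascii
import re

# Assumed heuristic: a run of 16+ base64 alphabet characters, optional padding.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def largest_base64_chunk(body_text: str) -> int:
    """Return the decoded size in bytes of the largest valid base64 run."""
    best = 0
    for match in _B64_RUN.finditer(body_text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            # Wrong length or padding: not a decodable base64 run.
            continue
        best = max(best, len(decoded))
    return best
```

Only the integer leaves the decky, so the predicate can fire on payload size without any body text crossing the bus. Long plain-text words can occasionally decode as base64, so a consumer would likely compare the value against a threshold rather than treat any non-zero value as a hit.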

Per-attachment booleans (`macro_indicator`, `encrypted`) ride
inside the decky's `attachments_json` manifest and are reduced to
top-level OR-flags by `_publish_email_received` at publish time —
R0046's `_p_malicious_attachment` predicate fires on a single
positive lane, so the OR-reduction matches the rule's semantics
exactly.
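
The OR-reduction described above can be sketched as follows. The manifest shape (a JSON list of per-attachment objects) and the function name `reduce_attachment_flags` are assumptions; only the field names `macro_indicator`, `encrypted`, `attachment_macros`, and `attachment_password_protected` come from the text.

```python
import json
from typing import Dict


def reduce_attachment_flags(attachments_json: str) -> Dict[str, bool]:
    """Collapse per-attachment booleans into top-level OR-flags."""
    # Assumed manifest shape: a JSON list of per-attachment dicts.
    manifest = json.loads(attachments_json) if attachments_json else []
    return {
        "attachment_macros": any(
            bool(a.get("macro_indicator")) for a in manifest
        ),
        "attachment_password_protected": any(
            bool(a.get("encrypted")) for a in manifest
        ),
    }
```

A single positive attachment is enough to set the flag, which mirrors a predicate that fires on any one positive lane.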

### Producer–consumer health checks

Each producer is pinned by a regression test that drives one tick
