From 2ce150a53e772e0ae56ba43df26c3833e35b5871 Mon Sep 17 00:00:00 2001
From: anti
Date: Sat, 2 May 2026 18:24:51 -0400
Subject: [PATCH] docs(debt): mark email.received producer as paid; file heavyweight follow-up

The 2026-05-02 paydown wires the producer at ingester.py after
add_bounty(), with the cheap projections (domains, rcpt_count,
attachment_count, x_mailer, dkim/spf, attachment shas + extensions,
URLs). R0041 / R0043 / R0044 / R0045 fire end-to-end after this PR;
R0046 fires partially (extension lane only).

The remaining lanes (R0042 body_simhash, R0046 macro / smuggling /
password / mal_hash, R0047 / R0048 body_text projection) are filed as
a new entry, "EmailLifter heavyweight feature extraction", with the
field map and the privacy-vs-completeness fork on body_text called out
for the next maintainer to pick a side.
---
 DEBT.md | 70 +++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/DEBT.md b/DEBT.md
index ce96763a..e713fe95 100644
--- a/DEBT.md
+++ b/DEBT.md
@@ -45,24 +45,60 @@ stays YAML-only.
 Trigger: v0 precision targets met + at least one downstream user who
 needs it.
 
-### `attacker.email.received` producer — wire when SMTP-receive
-### persistence lands
+### `attacker.email.received` producer — PAID 2026-05-02
 
-The TTP worker subscribes to `email.received` for the EmailLifter
-(R0041–R0048), but no upstream component publishes the topic today.
-The honeypot SMTP-relay path (`decnet/services/smtp_relay.py`) does
-not persist received emails to a DB table the way ingester /
-collector persist log events, so there is no source row to fan out
-on. See `development/TTP_TAGGING.md` §"Bus topics → Producer
-wiring" for the full producer audit.
+Originally deferred under the premise that "the honeypot SMTP-relay
+path does not persist received emails to a DB table." That was wrong
+— SMTPProtocol persists every received message as a Bounty artifact
+(`bounty_type="artifact"`, `payload.kind="mail"`) at
+`decnet/web/ingester.py:596–615`, and the `_summarize_message` helper
+already extracts the headers + per-attachment metadata.
 
-**STALE PREMISE (2026-05-02):** ANTI noted during the intel audit
-that the SMTP honeypots DO persist all received messages today.
-Re-triage this entry — the gating premise above may no longer
-hold and the producer wiring may be paydown-able directly. Map
-the actual SMTP-receive persistence to `ReceivedEmail` (or its
-extant analogue), then wire the publisher.
+The producer was wired in the same commit that struck this entry.
+The TTP worker subscribes to `email.received` (per
+`decnet/ttp/worker.py:66`) and dispatches to the EmailLifter
+(R0041–R0048). After paydown the channel is live for R0041 /
+R0043 / R0044 / R0045, and partial for R0046 (extension lane only).
 
-Trigger: SMTP-receive persistence model lands (a `ReceivedEmail`
-SQLModel + ingest path). Wire the publisher in the same PR.
+The remaining R0042 / R0046-deep / R0047 / R0048 lanes ride on the
+heavyweight extraction follow-up below.
+
+### EmailLifter heavyweight feature extraction — R0042 / R0046 / R0047 / R0048
+
+The cheap header / domain / extension extractions landed with the
+2026-05-02 producer paydown above. These predicates still need
+deeper signal before they fire:
+
+- **R0042 (mass phish)** — needs `body_simhash`. A near-duplicate
+  hash (simhash / minhash) over the body lets the lifter score
+  "same template fanned out to many recipients." The extractor is
+  decky-side; the wire field is a single string.
+- **R0046 (malicious attachment)** — extension lane fires today.
+  The remaining lanes need:
+  - `attachment_macros: bool` — Office macro detection (oletools, or
+    a minimal VBA-project sniff in the OLE container / OOXML zip).
+  - `attachment_password_protected: bool` — encrypted-archive
+    detection across .zip / .7z / .rar.
+  - `html_smuggling: bool` — heuristic over HTML body parts looking
+    for the canonical `` + base64-blob / Blob() pattern.
+  - `mal_hash_match: bool` — match against a curated bad-hash feed
+    (provider TBD; could ride on the same enrich worker as
+    AttackerIntel).
+- **R0047 (BEC) / R0048 (encoded payload)** — both predicates read
+  `body_text`. We deliberately do NOT ship raw body text on the bus
+  today: PII concerns, payload size, and the EmailLifter's evidence
+  filter strips it anyway. The wire-up needs one of: (a) a hashed /
+  truncated body projection, (b) the lifter reaching back to fetch
+  the .eml off disk on the same host, or (c) a privacy-safe
+  intermediate (BEC-keyword presence flags, base64 byte counts)
+  that satisfies the predicates without leaking raw text. Pick one
+  before starting the extractor work.
+
+Field map per rule: `development/TTP_TAGGING.md` §"Bus topics →
+Producer wiring" + `decnet/ttp/impl/email_lifter.py` predicates.
+
+Trigger: any of these rules generates enough signal in production
+to justify the extractor cost, OR a bad-hash feed becomes available
+and unblocks R0046's mal_hash_match lane in particular. Owner: TBD.
+Filed: 2026-05-02 alongside the DEBT #3 paydown.
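For reference, here is a minimal sketch of the cheap projections the commit message lists (domains, rcpt_count, attachment_count, x_mailer, dkim/spf, attachment hashes + extensions, URLs). It computes them with the standard-library `email` package from raw .eml bytes. `cheap_projections` and its output keys are hypothetical. The real producer builds its event from the Bounty payload that `_summarize_message` produces, and that payload's shape is not shown here. DKIM / SPF are reduced to header presence, not verification.

```python
import hashlib
import re
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from email.utils import getaddresses

# Loose URL pattern, good enough for a sketch; not RFC 3986 complete.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def cheap_projections(raw: bytes) -> dict:
    """Compute header/domain/attachment/URL projections from raw .eml bytes.

    Hypothetical helper: key names follow the commit message, not the
    actual email.received wire schema.
    """
    msg: EmailMessage = BytesParser(policy=policy.default).parsebytes(raw)

    senders = [a for _, a in getaddresses(msg.get_all("from", [])) if a]
    rcpts = [a for _, a in getaddresses(msg.get_all("to", []) + msg.get_all("cc", [])) if a]
    domains = sorted({a.rsplit("@", 1)[1].lower() for a in senders + rcpts if "@" in a})

    exts, shas, urls = [], [], set()
    for part in msg.walk():
        if part.is_attachment():
            name = part.get_filename() or ""
            exts.append(name.rsplit(".", 1)[1].lower() if "." in name else "")
            payload = part.get_payload(decode=True) or b""
            shas.append(hashlib.sha256(payload).hexdigest())
        elif part.get_content_maintype() == "text":
            # Inline text bodies only: URLs are kept, the text itself is not.
            urls.update(URL_RE.findall(part.get_content()))

    return {
        "domains": domains,
        "rcpt_count": len(rcpts),
        "attachment_count": len(exts),
        "attachment_exts": exts,
        "attachment_sha256": shas,
        "urls": sorted(urls),
        "x_mailer": str(msg["X-Mailer"]) if msg["X-Mailer"] else None,
        # Presence only: real DKIM/SPF verification needs an external library.
        "has_dkim_signature": "DKIM-Signature" in msg,
        "has_received_spf": "Received-SPF" in msg,
    }
```

None of these fields carries body text, which matches the bus's current no-raw-body stance.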
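R0042's `body_simhash` could be a 64-bit simhash over word 3-shingles. This is one way to get a near-duplicate fingerprint that fits in a single string field. The sketch below is not the decky-side extractor. The shingle size, the BLAKE2b feature hash, the hex encoding, and the function names are all assumptions.

```python
import hashlib
import re


def _shingles(text: str, k: int = 3) -> list[str]:
    """Lower-cased word k-shingles; short bodies fall back to single words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return words
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def body_simhash(text: str) -> str:
    """64-bit simhash of a message body, hex-encoded for a single string field."""
    weights = [0] * 64
    for feature in _shingles(text):
        # 64-bit feature hash; BLAKE2b is an arbitrary but stable choice.
        h = int.from_bytes(hashlib.blake2b(feature.encode(), digest_size=8).digest(), "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # Each output bit is the sign of the summed votes for that position.
    fingerprint = sum(1 << bit for bit in range(64) if weights[bit] > 0)
    return f"{fingerprint:016x}"


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two hex fingerprints."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")
```

Bodies from the same template, differing only in a greeting or a name, should land a few bits apart under `hamming`. The lifter can then bucket on a distance threshold rather than exact equality. Choosing that threshold is a separate tuning decision.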
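The `attachment_macros` and `attachment_password_protected` lanes could start with standard-library checks before pulling in oletools. The sketch below assumes the extractor has the attachment bytes. It looks for a VBA project in two places: `vbaProject.bin` inside an OOXML zip, and the UTF-16LE `_VBA_PROJECT` stream name inside a legacy OLE compound file. For archives, it tests the ZIP encryption flag (general-purpose bit 0). Both function names are hypothetical. .7z and .rar need third-party parsers and are not covered.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # compound file (.doc/.xls) signature
# OLE directory entry names are stored as UTF-16LE.
_OLE_VBA_MARKER = "_VBA_PROJECT".encode("utf-16-le")


def attachment_macros(data: bytes) -> bool:
    """Heuristic: does this Office attachment carry a VBA project?"""
    if data.startswith(OLE_MAGIC):
        # Byte search over the whole container, not a real directory walk.
        return _OLE_VBA_MARKER in data
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(n.lower().endswith("vbaproject.bin") for n in zf.namelist())
    except (zipfile.BadZipFile, ValueError):
        return False


def zip_password_protected(data: bytes) -> bool:
    """True if any member of a ZIP archive has the encryption flag set."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(info.flag_bits & 0x1 for info in zf.infolist())
    except (zipfile.BadZipFile, ValueError):
        return False
```

Password-protected OOXML documents are stored as OLE containers rather than zips, so they will not trip the zip check. Covering them would take a separate OLE-side check.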
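A possible `html_smuggling` heuristic, assuming the HTML body part is already decoded to text. A part is flagged only when three signals co-occur:

- a large base64 run;
- an in-browser decode or Blob construction (`atob(` / `new Blob(`);
- a download trigger (`download=` or `createObjectURL`).

The signal set, the 200-character threshold, and the function name are assumptions to be tuned against real samples.

```python
import re

# Hypothetical signal set for the sketch; thresholds are untuned guesses.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")       # embedded payload
_DECODE = re.compile(r"\batob\s*\(|\bnew\s+Blob\s*\(")     # in-browser reassembly
_DOWNLOAD = re.compile(r"\bdownload\s*=|createObjectURL")   # forced save / object URL


def html_smuggling(html: str) -> bool:
    """Flag an HTML part only when payload, decode, and download signals co-occur."""
    return bool(
        _B64_RUN.search(html)
        and _DECODE.search(html)
        and _DOWNLOAD.search(html)
    )
```

Requiring all three signals keeps ordinary newsletters (which say "download" freely) and inline images (long base64, no Blob) from firing on their own.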
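Option (c) for R0047 / R0048 could look like the sketch below: keyword-presence booleans plus a decoded base64 byte count, so nothing from the raw body crosses the bus. The keyword buckets, the 40-character minimum run, and `body_projection` are placeholders. They are not the EmailLifter's actual predicate inputs.

```python
import base64
import binascii
import re

# Placeholder keyword buckets; the real list should come from R0047's predicate.
BEC_KEYWORDS = {
    "urgency": ("urgent", "asap", "immediately"),
    "wire_transfer": ("wire transfer", "bank transfer"),
    "payment_change": ("new bank details", "updated account details"),
    "gift_card": ("gift card",),
}
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def body_projection(text: str) -> dict:
    """Privacy-safe stand-in for body_text: booleans and a byte count, no raw text."""
    lowered = text.lower()
    flags = {name: any(kw in lowered for kw in kws) for name, kws in BEC_KEYWORDS.items()}

    decoded = 0
    for run in _B64_RUN.findall(text):
        try:
            decoded += len(base64.b64decode(run, validate=True))
        except binascii.Error:
            continue  # looks like base64 but is not well-formed; ignore it
    return {"bec_flags": flags, "base64_decoded_bytes": decoded}
```

The trade-off versus (a) is that predicates can only test what was projected up front. Adding a new BEC cue means a producer change, not just a lifter change.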