docs(dev): post-v1 roadmap + check off shipped "Commands executed" item

- DEVELOPMENT_V2.md (new): post-v1 direction. Everything here is after the v1 box is closed — federation, advanced behavioral profiling, maze-scale topology work. - DEVELOPMENT.md: flip "Commands executed" checkbox — full per-session command log already landed in the profiler's _extract_commands_from _events path.
2026-04-23 21:52:15 -04:00
parent 6cbf8de6a8
commit a6356abe27
2 changed files with 546 additions and 1 deletions
--- a/development/DEVELOPMENT.md
+++ b/development/DEVELOPMENT.md
@@ -125,7 +125,7 @@
 - [ ] **Latency triangulation** — JA4L RTT estimates for rough geolocation

 ### Service-Level Behavioral Profiling
- [ ] **Commands executed** — Full command log per session (SSH, Telnet, FTP, Redis, DB services)
+- [x] **Commands executed** — Full command log per session (SSH, Telnet, FTP, Redis, DB services)
 - [ ] **Services actively interacted with** — Distinguish port scans from live exploitation attempts
 - [ ] **Tooling attribution** — Byte-sequence signatures from known C2 frameworks in handshakes
 - [ ] **Credential reuse patterns** — Same username/password tried across multiple deckies/services
--- a/development/DEVELOPMENT_V2.md
+++ b/development/DEVELOPMENT_V2.md
@@ -0,0 +1,545 @@
+# DECNET Development Roadmap — V2
+
+Post-v1 direction. Everything here is *after* the v1 box is closed; this
+document exists to make sure the schema and architectural decisions we take
+*before* v1 ships don't box us out of the interesting post-v1 work.
+
+---
+
+## Keystroke Dynamics & Session Profiling
+
+**Goal:** graduate the Profiler from IP-keyed attribution to
+identity-independent correlation using structured per-session feature
+vectors. Attackers rotate IPs; they don't rotate their hands.
+
+The sessrec pipeline (v1) already lands every keystroke as a `ch:"i"` event
+with a `t` timestamp in the asciinema day-shard. The raw data is sitting
+on disk. The work is *not* collection — it's feature extraction, schema,
+and correlation primitives.
+
+### Features — cheap, post-processing over existing shards
+
+All of these are derived from a single pass over a session's `"i"` events.
+No new capture infra.
+
+- **Inter-keystroke interval (IKI) distribution**
+  `kd_iki_mean`, `kd_iki_stdev`, `kd_iki_p50`, `kd_iki_p95`.
+  Humans: 80–250ms, high variance. `sshpass`/`paramiko`/`expect`: <5ms,
+  near-zero variance. Paste attacks: bimodal (one huge gap, then a burst).
+- **Burst ratio**
+  `kd_burst_ratio` = fraction of keystrokes within <30ms of the previous
+  one. High = pasted commands, low = typed. One number; cleanly separates
+  operator-at-keyboard from automation.
+- **Control-character mix** (not just backspace — the whole family)
+  `kd_ctrl_backspace`, `kd_ctrl_wkill` (`\x17`), `kd_ctrl_ukill` (`\x15`),
+  `kd_ctrl_abort` (`\x03`), `kd_ctrl_eof` (`\x04`), `kd_arrow_rate`
+  (`\x1b[A/B/C/D`), `kd_tab_rate` (`\x09`).
+  *Presence of any control char* → bot/human split. *Mix* →
+  tooling/experience fingerprint. Heavy Ctrl-W = experienced Unix user.
+  Heavy arrows = history editing. Heavy tab = exploratory recon. Bots
+  emit `\r`-terminated literals and nothing else.
+- **Prompt-to-enter latency distribution**
+  `kd_enter_latency_p50`, `kd_enter_latency_p95`, and crucially the
+  **ratio p95/p50** as a cheap tail-heaviness indicator. Shape matters
+  more than median. Readers have long right-tails; memorized playbooks
+  have tight distributions; bots have cliff-edge distributions at whatever
+  `sleep` is hardcoded. p50 alone blurs these together.
+- **Typing-to-think ratio**
+  `kd_think_ratio` = idle gap (>2s before Enter) / total session time.
+  Recon/read behavior vs. memorized execution.
+- **Digraph rhythm fingerprint** — `kd_digraph_simhash`, 64-bit.
+  **Use SimHash (or MinHash) over quantile-bucketed digraph timings**,
+  not a regular hash. Hamming-distance comparable — similar rhythms get
+  similar hashes, which is the entire point. A plain hash of quantized
+  timings loses this: one digraph off = totally different hash. SimHash
+  is ~30 lines. This is the feature that graduates "fingerprint" into
+  "identity."
+
+### Schema — the single most important decision on this page
+
+The features above *must* live in a dedicated `session_profile` table,
+UUID-keyed, foreign key to the owning `session_recorded` Log row.
+**Not** in `meta_json_b64`. **Not** as ad-hoc bounty strings.
+
+Rationale:
+- Correlation wants `find_similar_sessions(sid, ε)` — that's a SQL query
+  over indexed float columns, not a 50k-row JSON parse.
+- Retrofitting is brutal. Decide the shape now, when the table is empty.
+- Federation (see below) needs these as structured columns to be
+  gossipable without per-operator parsing quirks.
+
+Sketch:
+
+```sql
+CREATE TABLE session_profile (
+    sid              TEXT PRIMARY KEY,           -- session UUID
+    log_id           INTEGER REFERENCES logs(id), -- owning session_recorded
+    schema_version   INTEGER NOT NULL,            -- evolve features without breaking gossip
+    -- timing moments
+    kd_iki_mean              REAL,
+    kd_iki_stdev             REAL,
+    kd_iki_p50               REAL,
+    kd_iki_p95               REAL,
+    kd_enter_latency_p50     REAL,
+    kd_enter_latency_p95     REAL,
+    -- ratios
+    kd_burst_ratio           REAL,
+    kd_think_ratio           REAL,
+    -- control-char rates
+    kd_ctrl_backspace        REAL,
+    kd_ctrl_wkill            REAL,
+    kd_ctrl_ukill            REAL,
+    kd_ctrl_abort            REAL,
+    kd_ctrl_eof              REAL,
+    kd_arrow_rate            REAL,
+    kd_tab_rate              REAL,
+    -- rhythm fingerprint
+    kd_digraph_simhash       BLOB,                -- 8 bytes, Hamming-comparable
+    -- derived
+    total_keystrokes         INTEGER,
+    session_duration_s       REAL,
+    created_at               TIMESTAMP
+);
+CREATE INDEX ix_session_profile_simhash ON session_profile(kd_digraph_simhash);
+```
+
+`schema_version` is non-negotiable from day one. Federation gossip in v2
+requires cross-operator compatibility; bumping feature definitions without
+a version field will silently poison other operators' clustering.
+
+### Sequencing — build the shell before the features
+
+The natural instinct is: features first, then correlation. **Invert it.**
+
+1. **`session_profile` table + empty write path** — one row per session, all
+   nulls. Ships immediately.
+2. **Correlator `find_similar_sessions(sid, ε)` primitive — stubbed.**
+   Returns empty. Wire the API, wire the UI surface in `SessionDrawer`
+   ("Similar Sessions: none yet").
+3. **First features** — the five cheapest (IKI moments, burst ratio,
+   control-char mix). Populate the table.
+4. **Similarity function goes live** — Euclidean distance over normalized
+   float features, Hamming distance over simhash. No ML needed.
+5. **Digraph simhash** — once cheap features are validated as useful.
+6. **Correlation graph integration** — `CorrelationEngine` learns to
+   follow profile-similarity edges, not just IP edges.
+
+**Why inverted:** once operators see a session profile with no "similar
+sessions" surface, they'll ask for it, and the UX (what's shown, how
+distance is rendered, what actions the link affords) will drive which
+features matter. Build the shell, let demand signal feature priority.
+
+### Correlation — what this enables
+
+Today, `CorrelationEngine` keys on `attacker_ip`. Session profiles let it
+graduate to **identity-independent correlation**.
+
+Concrete scenario:
+> Attacker hits operator A's maze from IP X. Three weeks later, hits
+> operator B's maze from IP Y. IPs don't match. But:
+> - `kd_digraph_simhash` Hamming distance: 3
+> - HASSH fingerprint: identical
+> - JARM: identical
+> - Command-sequence 3-gram overlap: 60%
+>
+> That's a cross-operator identity claim with receipts. SQL query, not
+> research project.
+
+Without structured session profiles, that analysis is literally impossible.
+With them, it's a join.
+
+### Federation implication (v2/v3)
+
+Session profile vectors are **exactly** the thing to gossip in the
+federation layer. They are:
+- **Small** — a few floats + an 8-byte hash. Cheap on the wire.
+- **Semantically meaningful** — encode identity without encoding
+  operator-specific infrastructure or PII.
+- **Collision-rich** — similar vectors across operators = shared adversary,
+  same pattern as the fingerprint-tuple idea, but richer and noisier-signal.
+
+The `session_profile` schema is effectively the v2 federation wire format.
+Design it that way from day one:
+- `schema_version` field (mentioned above).
+- No operator-identifying fields (decky name, internal IP, host labels).
+- SimHash specifically because Hamming distance works across operators
+  without needing shared training data.
+
+### Cost estimate
+
+- Five cheap features + table + stubbed `find_similar_sessions`:
+  **½ to 1 day** of implementation once the codebase is known.
+- Digraph simhash + live similarity: **another 1–2 days**.
+- Correlation engine integration: **depends on how deep the graph walk
+  goes** — 2–5 days for a first pass.
+
+The expensive part is not implementation. It's **deciding the schema well
+enough that we don't regret it in six months.** Hence this document.
+
+### What *not* to build
+
+- **Typing biometric login.** That's the research-paper framing. Wrong
+  frame for a honeypot. We're doing *tooling attribution* and *operator
+  clustering*, not authentication.
+- **Hold time / pressure / velocity.** Not on the SSH wire. Dead-end
+  without attacker-side instrumentation they will not run.
+- **ML clustering before similarity.** Euclidean + Hamming over normalized
+  features handles the first useful year of data. Don't reach for sklearn
+  until the simple thing demonstrably fails.
+
+---
+
+## Open questions to resolve before writing code
+
+1. **Normalization strategy for Euclidean distance** — z-score per-feature
+   over rolling window? Fixed population stats? Operator-local vs.
+   gossip-aligned?
+2. **ε tuning** — start empirically. Seed the UI with "show top-N nearest"
+   rather than a distance threshold. Learn ε from operator feedback.
+3. **Retention** — session profiles are small; keep indefinitely? Or
+   co-expire with the owning log row?
+4. **Privacy boundary on gossip** — do we hash the sid on the wire, or
+   exchange it plaintext? First pass: hashed, with a challenge-response
+   if two operators want to confirm same-session.
+
+---
+
+## Federation
+
+**Goal:** cross-operator threat-intel sharing. An operator in country A
+observes an attacker, and an operator in country B benefits — without
+either operator leaking internal infrastructure, attracting legal
+exposure, or becoming part of the other's attack surface.
+
+### Framing — federation, not P2P
+
+"Federation API + P2P" is two contradictory models. Pick **federation
+(Mastodon/ActivityPub shape), not P2P.** Reasons:
+
+- Operators already run persistent, addressable infrastructure. There is
+  a DECNET master host with a stable identity. That's a server, not a
+  transient peer. The hard problem libp2p/Nostr exist to solve is already
+  solved here.
+- Threat-intel sharing is fundamentally **many-to-many gossip with audit
+  trails**, not many-to-many streaming. Federated server-to-server gossip
+  maps naturally; DHT/P2P overhead buys nothing.
+- SWARM already ships mTLS + per-host cert fingerprint pinning. Promoting
+  that to cross-operator is a small, understood step. Bolting on libp2p
+  is a ground-up rewrite.
+
+### Scale — design for thousands, not millions
+
+Realistic ceiling for a security-operator federation is **low thousands**.
+Points of reference: Mastodon ~10k servers, Tor ~7k relays, Nostr ~2k
+active relays. A niche-of-a-niche like threat-intel federation will not
+exceed these.
+
+**Design explicitly for 1k operators, with an escape hatch at 10k.**
+Million-scale assumptions force Kafka/DHT/consensus theater that
+strangles actual work.
+
+### The hard problem is trust, not protocol
+
+Every threat-intel federation that ignored trust became a spam cesspool
+(early AlienVault OTX, half the ISAC world). Answers required:
+
+- **Sybil resistance** — what stops an adversary spinning 50 fake
+  operators to poison clustering? First-pass answer: **gated enrollment
+  via a central registry signed by the project root**. Yes, centralized.
+  "Centralized root, federated leaves" is Mastodon's model and it works.
+  Decentralize only if adoption forces it. Don't premature-decentralize.
+- **Adversarial join** — what stops an attacker running a decoy operator
+  specifically to map *what other operators observe*? This is the
+  terrifying one. Gossip must be **asymmetric by design**: publish
+  simhashes and other lossy fingerprints, not raw session data. Answer
+  queries with binary matches (yes/no + count + first-seen), not full
+  session payloads. The attacker-operator learns "this simhash is
+  known to someone," nothing more.
+- **Jurisdictional blast radius** — IP addresses are PII under GDPR. An
+  operator in Germany gossiping an attacker IP to an operator in
+  Singapore may commit a crime. **Per-operator, per-field opt-out with a
+  default-deny posture for PII-adjacent data** is non-negotiable.
+  Geo-tagged operator registry entries let the federation enforce this at
+  the protocol layer rather than the honor system.
+- **Legal chill** — CFAA, NIS2, sector-specific rules. Having a clear
+  "this operator chose to share X" audit trail per record protects
+  everyone. Every gossiped fact carries the originating operator's
+  signature.
+
+### What to build first — the two-operator handshake
+
+Build **one primitive and nothing else**: two operators who've manually
+exchanged pubkeys making signed queries to each other to answer one
+question — **"have you seen this SimHash?"**
+
+Response: `{ seen: bool, count: int, first_seen: timestamp }`. Nothing
+more. No sid, no decky, no IP, no raw session data.
+
+Why: if that primitive doesn't produce value for two operators, scaling
+it to a thousand won't either. If it does, the scaling is mostly
+operational — directory service, retry/backoff, rate limits — which are
+all solved problems. **The design risk lives entirely in the primitive,
+not in the scale-out.**
+
+Explicit non-goals for first iteration:
+- No pub-sub.
+- No DHT.
+- No gossip protocol.
+- No operator discovery.
+- No multi-hop.
+
+Just two pubkeys, one question, a signed answer.
+
+### Sequencing
+
+1. **Operator identity** — Ed25519 keypair per operator, generated at
+   install. Self-signed manifest (operator name, pubkey, contact, geo).
+2. **Two-operator handshake** — mTLS over HTTPS, pubkey pinning, one
+   RPC: `QuerySimHash(hash) → {seen, count, first_seen}`. Manual peer
+   config in YAML.
+3. **Registry** — central signed directory of known operators, fetched
+   on boot. Enables discovery without mandating central routing.
+4. **Additional query types** — JA3/JA4 lookup, HASSH lookup, command-
+   n-gram match. Same shape: lossy fingerprint in, binary+metadata out.
+5. **Publish path** — operators periodically push new fingerprints to
+   peers (gossiped, not polled). Signed, deduplicated by fingerprint.
+6. **Clustering & visualization** — UI surface for "this simhash is
+   known across N operators, first seen by operator-X on date-Y."
+
+### Codebase-aware observations
+
+- **`session_profile` *is* the federation wire format.** `schema_version`
+  from day one is non-negotiable — retrofitting cross-operator
+  compatibility after the fact is a nightmare.
+- **SWARM mTLS is the starting point**, not the finishing point. The
+  per-host fingerprint-pin pattern (memory:
+  feedback_mtls_pin_per_host.md) extends naturally to per-operator pins.
+- **The bus stays local.** Federation is cross-host in a way the bus was
+  explicitly scoped away from ("cross-host federation is out of MVP
+  scope"). A separate `decnet federation` worker is the right shape, not
+  bridging the bus over TCP.
+- **Attack surface.** Federation endpoints on operator hosts *are*
+  targets. If the coordination layer is compromised, honeypots become
+  attack infra. Bind federation RPC to a separate interface, separate
+  cert chain, separate systemd unit. Assume the federation daemon will
+  eventually be breached and design blast-radius containment into the
+  architecture — it must not share credentials, sockets, or filesystem
+  trust with the local DECNET workers.
+
+### Open questions
+
+1. **Who runs the root registry?** Project root (ANTI) as v2 default;
+   path to handoff/multi-root federation in v3.
+2. **Revocation** — how is a compromised operator kicked? Registry
+   signs a revocation list, peers refuse queries from revoked pubkeys.
+   Cache TTL?
+3. **Rate-limiting adversarial joins** — a registered operator can still
+   query-flood to enumerate fingerprints. Per-peer query budgets, with
+   a reputation signal that decays silence and rewards useful
+   publishing.
+4. **Consent UX** — what does an operator opt into when they enable
+   federation? Single toggle is wrong; per-category (fingerprints /
+   profiles / commands / IPs) is right. Defaults matter more than
+   flexibility.
+
+### Trust model refinement — 2026-04-22 design review
+
+The framing above (central signed registry, gated enrollment, revocation
+lists, reputation algorithms) is **superseded by a social-trust model**
+arrived at through adversarial design review. Captured here verbatim so
+the iteration trail isn't lost.
+
+**The governing insight:** trust is not technical, it is human. Instead
+of solving cross-operator trust with crypto/PKI/reputation, **leave it
+to humans**. Two operators meet at a conference, have beers, decide to
+federate. Recurse ad infinitum. No zero-knowledge proofs, no
+decentralized governance, no CRL theater.
+
+This is a deliberate deferral of a hard problem, not a claim that the
+hard problem is solved. The rest of this subsection documents why the
+social-trust model holds up under attack and where its residual weaknesses
+live.
+
+#### Attacks considered and outcomes
+
+**1. Transitive trust collapse.** First framing ("recurse ad infinitum")
+implied A→B→C gossip flow, which is how PGP's web of trust died.
+**Resolution:** model is hub-and-spoke, not transitive. Every federation
+edge is a manual, mutually-made handshake ("beershake"). A learns that C
+exists (because B mentioned C), but A does not federate with C until
+A and C separately beershake. Topology metadata leaks (B tells A that
+C exists, which C may not have consented to share), but gossip does not.
+
+**2. Attackers go to conferences too.** Social trust filters for
+"drinks beer at BSides," not "not-an-adversary." Ransomware affiliates
+and red teams can stand up DECNET, be charming, and join. **Accepted.**
+Social consequences scale better than cryptographic ones for this class
+of problem: if operator A's sponsored peer B starts gossiping garbage,
+A's other federates see that A brought B in — reputation damage is the
+brake. Not perfect, but it's a real cost.
+
+**3. The query IS the intel.** Aggregate-only responses
+(`{seen, count, first_seen}`) don't defeat recon — a phishing operator
+querying "has anyone seen `paypa1-security.com`?" learns whether their
+cover domain has burned. **Resolution: federation is push-only, not
+pull.** Peers send what they chose to send; nobody can ask on demand.
+C still gets data, but not on-demand data. This closes the dangerous
+recon lane outright.
+
+**4. Compromised-peer inheritance.** A friend's DECNET master is a box
+on the internet. When it gets rooted, the attacker inherits every
+federation edge that admin held. **Conceded as a real risk.** No clean
+mitigation beyond the push-only constraint (limits what the compromised
+node can exfiltrate in real-time) and the hub-and-spoke constraint
+(limits blast radius to that operator's direct peers).
+
+**5. Revocation non-transitivity.** If trust is social, so is distrust.
+A kicks B; Carol (who also federates with B) still relays to B.
+**Resolution: see #2 — A's kick is visible to A's other federates,
+sponsorship accountability propagates socially in real-time via the
+topology-transparency mechanism (see below). Not a coordination
+problem DECNET solves; one it exposes so humans can solve it.**
+
+**6. Legal/compliance at enterprise scale.** GDPR, HIPAA, FFIEC,
+data-residency — informal federation has no DPA and will not survive
+first enterprise deal. **Resolution: write down the deferral.** v1
+federation is explicitly for informal peer networks only. A DPA
+framework gates any regulated-org federation; that is v3 scope.
+
+#### The full model
+
+- **Hub-and-spoke, pairwise beershakes.** No transitive trust.
+- **Push-only, never pull.** Peers push what they choose; no on-demand
+  queries, no recon surface.
+- **Sponsorship-as-reputation.** A brought B in; if B misbehaves, A's
+  reputation across A's other federates degrades. Social cost, real-time.
+- **Hash-evidence on every contribution.** To prevent fabricated pushes
+  that game contribution ratios, every shared fact must include a
+  verifiable fingerprint (message sha256, cert fingerprint, artifact
+  sha256) — not a free-text claim. "I saw domain X" without an artifact
+  hash is not a contribution.
+- **Contribution ratios, enforced per peer.** Pure consumers get starved;
+  peers must push roughly as much as they receive. Paired with
+  hash-evidence above so ratios can't be gamed with garbage.
+- **Topology transparency.** Every federate can see the full federation
+  graph from their vantage point: who sponsored whom, who kicked whom,
+  contribution volumes. Makes sponsorship accountability observable in
+  real-time rather than post-incident.
+- **Omission-based canary watermarking.** Individualizing "saw domain X"
+  across N peers is impossible (you can't watermark a string). Instead,
+  A withholds X from peer 23 specifically; if X surfaces externally, A
+  can triangulate across multiple omission canaries over time. Forensic
+  tool, not preventive.
+
+#### Accepted residuals
+
+These are structural to gossip systems and are **accepted with
+mitigation, not eliminated:**
+
+- **Audit gap.** Misbehaving federates leak silently — queries (or in
+  push-only, consumption) are supposed to happen. By the time a peer
+  notices the leak, the data has already moved. Mitigation: omission
+  canaries provide post-hoc forensic attribution; sponsorship
+  accountability provides the social pressure to catch it faster.
+- **Correlation-at-receiver.** C receives push from A, B, D, E. None
+  shared much individually; C correlates across them to build a picture
+  no sender authorized. Cannot be designed out without killing the
+  federation's entire value proposition. Priced in, documented.
+- **Push cadence as metadata.** Even push-only, timing/volume of what A
+  pushes tells receivers about A's current coverage/posture. Low
+  bandwidth, probably unfixable without batching jitter that hurts
+  timeliness. Accepted.
+- **Topology metadata leak.** B telling A that C exists (as part of the
+  "recurse ad infinitum" socialization) is itself signal C may not have
+  consented to share. In regulated sectors, even "bank X runs deception"
+  is information. Minor, but noted.
+
+#### Why the social-trust framing is correct *now*
+
+The earlier subsection (central registry, gated enrollment, Ed25519
+per-operator identities, signed revocation lists) is not wrong, it is
+**premature**. Building that machinery before there is a real
+federation with real users is the "million-scale assumptions that
+strangle actual work" trap called out in the Scale section above. The
+social-trust model ships when there are two friends with two
+deployments who want to try it. The crypto/registry model ships when
+there is a customer whose compliance team requires it.
+
+What cannot be deferred: **the wire format**. `session_profile`,
+`smtp_targets`, and future federation-adjacent tables must still carry
+`schema_version` from day one. Privacy-preserving shape
+(`{seen, count, first_seen}` aggregate-only, no attacker identity) is
+the right posture independent of trust model — minimizing leak surface
+is always correct.
+
+#### What this changes in the earlier Open Questions
+
+- **#1 Root registry** — no longer v2-blocking. Deferred to v3 or
+  whenever a federation outgrows social coordination.
+- **#2 Revocation** — answered. "A kicks B; A's other federates see it
+  in the topology-transparency view and make their own call." No CRL.
+- **#3 Rate-limiting adversarial joins** — partially answered. Push-only
+  eliminates the query-flood vector. Per-peer rate limits on push still
+  apply to prevent contribution-ratio gaming via spam.
+- **#4 Consent UX** — unchanged. Per-category opt-in with default-deny
+  on PII-adjacent data is still the right shape.
+
+#### Second round — scope-verified pull as a complementary channel
+
+Follow-up design review revisited the "query IS the intel" problem from
+a different angle. Push-only closes the recon attack but sacrifices the
+defender's most useful query shape: "has anyone seen anything new about
+MY brand today?"
+
+**Proposal:** operators verify domain scope at registration time (ACME
+dns-01 pattern — TXT record challenge), and pull queries are restricted
+to data about scope-verified domains. BigBank can query about
+`bigbank.com`; they cannot query about `competitor.com`.
+
+**This is a genuine addition, not a replacement for push-only.** Both
+channels coexist with different threat models:
+
+- **Push-only channel** — peer-volunteered gossip. Handles the case
+  where the domain being attacked is one the defender does NOT own
+  (lookalikes, typosquats, newly-registered phishing infra). This is
+  the defender's primary use case and scope-verified pull cannot serve
+  it without fuzzy matching, which attackers abuse.
+- **Scope-verified pull channel** — bounded "what's new about my
+  verified scope" queries. Narrow, auditable, recon-resistant.
+
+**Attacks considered on the pull channel:**
+
+1. **Indirect-reference / lookalike queries.** The intel defenders most
+   need is about domains they DON'T own (`bigbank-secure-login.support`
+   targeting BigBank). Allowing fuzzy/lookalike matching under "claimed
+   typo of my scope" reopens the recon lane. **Resolution: exact-match
+   only on the pull channel. Lookalike intel flows through push-only.**
+2. **Domain graveyard.** Attacker buys an expired domain, DNS-verifies,
+   pulls historical phishing intel for a brand they're about to revive.
+   **Mitigation: scope applies prospectively only — queries bounded to
+   intel indexed since the operator's verification timestamp.** Bake in
+   from day one, hard to retrofit.
+3. **Subdomain scope inference.** Wildcard matching (`*.bigbank.com`)
+   invites overreach. **Resolution: explicit list of scoped domains,
+   each individually DNS-verified. No wildcard inference.**
+4. **Self-lookup leaks coverage maps to the asker.** BigBank querying
+   their scope learns which peers have visibility into BigBank-targeting
+   campaigns. **Resolution: aggregate-only response
+   (`{seen, count, first_seen}`) with no per-peer attribution.** Already
+   implicit.
+5. **MSSP multi-tenant churn.** An MSSP claims scope over 200 client
+   brands; clients leave, the DECNET retains ex-scope.
+   **Mitigation: periodic re-verification (weekly cadence) of every
+   scoped domain's TXT record.**
+
+**Residual, accepted:** scope-verified pull cannot serve queries about
+domains the defender doesn't control. That's the structural limit —
+push-only covers it.
+
+**Net model for v2 federation:**
+- Identity: Ed25519 keypair + DNS-verified scope list (explicit, not
+  wildcard, periodic re-verify).
+- Channel 1 — push: hash-evidenced contributions, peer-volunteered, no
+  query surface.
+- Channel 2 — pull: scope-verified, exact-match, prospective-only,
+  aggregate-response, rate-limited.