DECNET/development/DEVELOPMENT_V2.md

# DECNET Development Roadmap — V2

Post-v1 direction. Everything here is *after* the v1 box is closed; this
document exists to make sure the schema and architectural decisions we take
*before* v1 ships don't box us out of the interesting post-v1 work.

---

## Keystroke Dynamics & Session Profiling

**Goal:** graduate the Profiler from IP-keyed attribution to
identity-independent correlation using structured per-session feature
vectors. Attackers rotate IPs; they don't rotate their hands.

The sessrec pipeline (v1) already lands every keystroke as a `ch:"i"` event
with a `t` timestamp in the asciinema day-shard. The raw data is sitting
on disk. The work is *not* collection — it's feature extraction, schema,
and correlation primitives.

### Features — cheap, post-processing over existing shards

All of these are derived from a single pass over a session's `"i"` events.
No new capture infra.

- **Inter-keystroke interval (IKI) distribution**
  `kd_iki_mean`, `kd_iki_stdev`, `kd_iki_p50`, `kd_iki_p95`.
  Humans: 80–250ms, high variance. `sshpass`/`paramiko`/`expect`: <5ms,
  near-zero variance. Paste attacks: bimodal (one huge gap, then a burst).
- **Burst ratio**
  `kd_burst_ratio` = fraction of keystrokes within <30ms of the previous
  one. High = pasted commands, low = typed. One number; cleanly separates
  operator-at-keyboard from automation.
- **Control-character mix** (not just backspace — the whole family)
  `kd_ctrl_backspace`, `kd_ctrl_wkill` (`\x17`), `kd_ctrl_ukill` (`\x15`),
  `kd_ctrl_abort` (`\x03`), `kd_ctrl_eof` (`\x04`), `kd_arrow_rate`
  (`\x1b[A/B/C/D`), `kd_tab_rate` (`\x09`).
  *Presence of any control char* → bot/human split. *Mix* →
  tooling/experience fingerprint. Heavy Ctrl-W = experienced Unix user.
  Heavy arrows = history editing. Heavy tab = exploratory recon. Bots
  emit `\r`-terminated literals and nothing else.
- **Prompt-to-enter latency distribution**
  `kd_enter_latency_p50`, `kd_enter_latency_p95`, and crucially the
  **ratio p95/p50** as a cheap tail-heaviness indicator. Shape matters
  more than median. Readers have long right-tails; memorized playbooks
  have tight distributions; bots have cliff-edge distributions at whatever
  `sleep` is hardcoded. p50 alone blurs these together.
- **Typing-to-think ratio**
  `kd_think_ratio` = idle gap (>2s before Enter) / total session time.
  Recon/read behavior vs. memorized execution.
- **Digraph rhythm fingerprint** — `kd_digraph_simhash`, 64-bit.
  **Use SimHash (or MinHash) over quantile-bucketed digraph timings**,
  not a regular hash. Hamming-distance comparable — similar rhythms get
  similar hashes, which is the entire point. A plain hash of quantized
  timings loses this: one digraph off = totally different hash. SimHash
  is ~30 lines. This is the feature that graduates "fingerprint" into
  "identity."

### Schema — the single most important decision on this page

The features above *must* live in a dedicated `session_profile` table,
UUID-keyed, foreign key to the owning `session_recorded` Log row.
**Not** in `meta_json_b64`. **Not** as ad-hoc bounty strings.

Rationale:
- Correlation wants `find_similar_sessions(sid, ε)` — that's a SQL query
  over indexed float columns, not a 50k-row JSON parse.
- Retrofitting is brutal. Decide the shape now, when the table is empty.
- Federation (see below) needs these as structured columns to be
  gossipable without per-operator parsing quirks.

Sketch:

```sql
CREATE TABLE session_profile (
    sid              TEXT PRIMARY KEY,           -- session UUID
    log_id           INTEGER REFERENCES logs(id), -- owning session_recorded
    schema_version   INTEGER NOT NULL,            -- evolve features without breaking gossip
    -- timing moments
    kd_iki_mean              REAL,
    kd_iki_stdev             REAL,
    kd_iki_p50               REAL,
    kd_iki_p95               REAL,
    kd_enter_latency_p50     REAL,
    kd_enter_latency_p95     REAL,
    -- ratios
    kd_burst_ratio           REAL,
    kd_think_ratio           REAL,
    -- control-char rates
    kd_ctrl_backspace        REAL,
    kd_ctrl_wkill            REAL,
    kd_ctrl_ukill            REAL,
    kd_ctrl_abort            REAL,
    kd_ctrl_eof              REAL,
    kd_arrow_rate            REAL,
    kd_tab_rate              REAL,
    -- rhythm fingerprint
    kd_digraph_simhash       BLOB,                -- 8 bytes, Hamming-comparable
    -- derived
    total_keystrokes         INTEGER,
    session_duration_s       REAL,
    created_at               TIMESTAMP
);
CREATE INDEX ix_session_profile_simhash ON session_profile(kd_digraph_simhash);
```

`schema_version` is non-negotiable from day one. Federation gossip in v2
requires cross-operator compatibility; bumping feature definitions without
a version field will silently poison other operators' clustering.

### Sequencing — build the shell before the features

The natural instinct is: features first, then correlation. **Invert it.**

1. **`session_profile` table + empty write path** — one row per session, all
   nulls. Ships immediately.
2. **Correlator `find_similar_sessions(sid, ε)` primitive — stubbed.**
   Returns empty. Wire the API, wire the UI surface in `SessionDrawer`
   ("Similar Sessions: none yet").
3. **First features** — the five cheapest (IKI moments, burst ratio,
   control-char mix). Populate the table.
4. **Similarity function goes live** — Euclidean distance over normalized
   float features, Hamming distance over simhash. No ML needed.
5. **Digraph simhash** — once cheap features are validated as useful.
6. **Correlation graph integration** — `CorrelationEngine` learns to
   follow profile-similarity edges, not just IP edges.

**Why inverted:** once operators see a session profile with no "similar
sessions" surface, they'll ask for it, and the UX (what's shown, how
distance is rendered, what actions the link affords) will drive which
features matter. Build the shell, let demand signal feature priority.

### Correlation — what this enables

Today, `CorrelationEngine` keys on `attacker_ip`. Session profiles let it
graduate to **identity-independent correlation**.

Concrete scenario:
> Attacker hits operator A's maze from IP X. Three weeks later, hits
> operator B's maze from IP Y. IPs don't match. But:
> - `kd_digraph_simhash` Hamming distance: 3
> - HASSH fingerprint: identical
> - JARM: identical
> - Command-sequence 3-gram overlap: 60%
>
> That's a cross-operator identity claim with receipts. SQL query, not
> research project.

Without structured session profiles, that analysis is literally impossible.
With them, it's a join.

### Federation implication (v2/v3)

Session profile vectors are **exactly** the thing to gossip in the
federation layer. They are:
- **Small** — a few floats + an 8-byte hash. Cheap on the wire.
- **Semantically meaningful** — encode identity without encoding
  operator-specific infrastructure or PII.
- **Collision-rich** — similar vectors across operators = shared adversary,
  same pattern as the fingerprint-tuple idea, but richer and noisier-signal.

The `session_profile` schema is effectively the v2 federation wire format.
Design it that way from day one:
- `schema_version` field (mentioned above).
- No operator-identifying fields (decky name, internal IP, host labels).
- SimHash specifically because Hamming distance works across operators
  without needing shared training data.

### Cost estimate

- Five cheap features + table + stubbed `find_similar_sessions`:
  **½ to 1 day** of implementation once the codebase is known.
- Digraph simhash + live similarity: **another 1–2 days**.
- Correlation engine integration: **depends on how deep the graph walk
  goes** — 2–5 days for a first pass.

The expensive part is not implementation. It's **deciding the schema well
enough that we don't regret it in six months.** Hence this document.

### What *not* to build

- **Typing biometric login.** That's the research-paper framing. Wrong
  frame for a honeypot. We're doing *tooling attribution* and *operator
  clustering*, not authentication.
- **Hold time / pressure / velocity.** Not on the SSH wire. Dead-end
  without attacker-side instrumentation they will not run.
- **ML clustering before similarity.** Euclidean + Hamming over normalized
  features handles the first useful year of data. Don't reach for sklearn
  until the simple thing demonstrably fails.

---

## Open questions to resolve before writing code

1. **Normalization strategy for Euclidean distance** — z-score per-feature
   over rolling window? Fixed population stats? Operator-local vs.
   gossip-aligned?
2. **ε tuning** — start empirically. Seed the UI with "show top-N nearest"
   rather than a distance threshold. Learn ε from operator feedback.
3. **Retention** — session profiles are small; keep indefinitely? Or
   co-expire with the owning log row?
4. **Privacy boundary on gossip** — do we hash the sid on the wire, or
   exchange it plaintext? First pass: hashed, with a challenge-response
   if two operators want to confirm same-session.

---

## Federation

**Goal:** cross-operator threat-intel sharing. An operator in country A
observes an attacker, and an operator in country B benefits — without
either operator leaking internal infrastructure, attracting legal
exposure, or becoming part of the other's attack surface.

### Framing — federation, not P2P

"Federation API + P2P" is two contradictory models. Pick **federation
(Mastodon/ActivityPub shape), not P2P.** Reasons:

- Operators already run persistent, addressable infrastructure. There is
  a DECNET master host with a stable identity. That's a server, not a
  transient peer. The hard problem libp2p/Nostr exist to solve is already
  solved here.
- Threat-intel sharing is fundamentally **many-to-many gossip with audit
  trails**, not many-to-many streaming. Federated server-to-server gossip
  maps naturally; DHT/P2P overhead buys nothing.
- SWARM already ships mTLS + per-host cert fingerprint pinning. Promoting
  that to cross-operator is a small, understood step. Bolting on libp2p
  is a ground-up rewrite.

### Scale — design for thousands, not millions

Realistic ceiling for a security-operator federation is **low thousands**.
Points of reference: Mastodon ~10k servers, Tor ~7k relays, Nostr ~2k
active relays. A niche-of-a-niche like threat-intel federation will not
exceed these.

**Design explicitly for 1k operators, with an escape hatch at 10k.**
Million-scale assumptions force Kafka/DHT/consensus theater that
strangles actual work.

### The hard problem is trust, not protocol

Every threat-intel federation that ignored trust became a spam cesspool
(early AlienVault OTX, half the ISAC world). Answers required:

- **Sybil resistance** — what stops an adversary spinning 50 fake
  operators to poison clustering? First-pass answer: **gated enrollment
  via a central registry signed by the project root**. Yes, centralized.
  "Centralized root, federated leaves" is Mastodon's model and it works.
  Decentralize only if adoption forces it. Don't premature-decentralize.
- **Adversarial join** — what stops an attacker running a decoy operator
  specifically to map *what other operators observe*? This is the
  terrifying one. Gossip must be **asymmetric by design**: publish
  simhashes and other lossy fingerprints, not raw session data. Answer
  queries with binary matches (yes/no + count + first-seen), not full
  session payloads. The attacker-operator learns "this simhash is
  known to someone," nothing more.
- **Jurisdictional blast radius** — IP addresses are PII under GDPR. An
  operator in Germany gossiping an attacker IP to an operator in
  Singapore may commit a crime. **Per-operator, per-field opt-out with a
  default-deny posture for PII-adjacent data** is non-negotiable.
  Geo-tagged operator registry entries let the federation enforce this at
  the protocol layer rather than the honor system.
- **Legal chill** — CFAA, NIS2, sector-specific rules. Having a clear
  "this operator chose to share X" audit trail per record protects
  everyone. Every gossiped fact carries the originating operator's
  signature.

### What to build first — the two-operator handshake

Build **one primitive and nothing else**: two operators who've manually
exchanged pubkeys making signed queries to each other to answer one
question — **"have you seen this SimHash?"**

Response: `{ seen: bool, count: int, first_seen: timestamp }`. Nothing
more. No sid, no decky, no IP, no raw session data.

Why: if that primitive doesn't produce value for two operators, scaling
it to a thousand won't either. If it does, the scaling is mostly
operational — directory service, retry/backoff, rate limits — which are
all solved problems. **The design risk lives entirely in the primitive,
not in the scale-out.**

Explicit non-goals for first iteration:
- No pub-sub.
- No DHT.
- No gossip protocol.
- No operator discovery.
- No multi-hop.

Just two pubkeys, one question, a signed answer.

### Sequencing

1. **Operator identity** — Ed25519 keypair per operator, generated at
   install. Self-signed manifest (operator name, pubkey, contact, geo).
2. **Two-operator handshake** — mTLS over HTTPS, pubkey pinning, one
   RPC: `QuerySimHash(hash) → {seen, count, first_seen}`. Manual peer
   config in YAML.
3. **Registry** — central signed directory of known operators, fetched
   on boot. Enables discovery without mandating central routing.
4. **Additional query types** — JA3/JA4 lookup, HASSH lookup, command-
   n-gram match. Same shape: lossy fingerprint in, binary+metadata out.
5. **Publish path** — operators periodically push new fingerprints to
   peers (gossiped, not polled). Signed, deduplicated by fingerprint.
6. **Clustering & visualization** — UI surface for "this simhash is
   known across N operators, first seen by operator-X on date-Y."

### Codebase-aware observations

- **`session_profile` *is* the federation wire format.** `schema_version`
  from day one is non-negotiable — retrofitting cross-operator
  compatibility after the fact is a nightmare.
- **SWARM mTLS is the starting point**, not the finishing point. The
  per-host fingerprint-pin pattern (memory:
  feedback_mtls_pin_per_host.md) extends naturally to per-operator pins.
- **The bus stays local.** Federation is cross-host in a way the bus was
  explicitly scoped away from ("cross-host federation is out of MVP
  scope"). A separate `decnet federation` worker is the right shape, not
  bridging the bus over TCP.
- **Attack surface.** Federation endpoints on operator hosts *are*
  targets. If the coordination layer is compromised, honeypots become
  attack infra. Bind federation RPC to a separate interface, separate
  cert chain, separate systemd unit. Assume the federation daemon will
  eventually be breached and design blast-radius containment into the
  architecture — it must not share credentials, sockets, or filesystem
  trust with the local DECNET workers.

### Open questions

1. **Who runs the root registry?** Project root (ANTI) as v2 default;
   path to handoff/multi-root federation in v3.
2. **Revocation** — how is a compromised operator kicked? Registry
   signs a revocation list, peers refuse queries from revoked pubkeys.
   Cache TTL?
3. **Rate-limiting adversarial joins** — a registered operator can still
   query-flood to enumerate fingerprints. Per-peer query budgets, with
   a reputation signal that decays silence and rewards useful
   publishing.
4. **Consent UX** — what does an operator opt into when they enable
   federation? Single toggle is wrong; per-category (fingerprints /
   profiles / commands / IPs) is right. Defaults matter more than
   flexibility.

### Trust model refinement — 2026-04-22 design review

The framing above (central signed registry, gated enrollment, revocation
lists, reputation algorithms) is **superseded by a social-trust model**
arrived at through adversarial design review. Captured here verbatim so
the iteration trail isn't lost.

**The governing insight:** trust is not technical, it is human. Instead
of solving cross-operator trust with crypto/PKI/reputation, **leave it
to humans**. Two operators meet at a conference, have beers, decide to
federate. Recurse ad infinitum. No zero-knowledge proofs, no
decentralized governance, no CRL theater.

This is a deliberate deferral of a hard problem, not a claim that the
hard problem is solved. The rest of this subsection documents why the
social-trust model holds up under attack and where its residual weaknesses
live.

#### Attacks considered and outcomes

**1. Transitive trust collapse.** First framing ("recurse ad infinitum")
implied A→B→C gossip flow, which is how PGP's web of trust died.
**Resolution:** model is hub-and-spoke, not transitive. Every federation
edge is a manual, mutually-made handshake ("beershake"). A learns that C
exists (because B mentioned C), but A does not federate with C until
A and C separately beershake. Topology metadata leaks (B tells A that
C exists, which C may not have consented to share), but gossip does not.

**2. Attackers go to conferences too.** Social trust filters for
"drinks beer at BSides," not "not-an-adversary." Ransomware affiliates
and red teams can stand up DECNET, be charming, and join. **Accepted.**
Social consequences scale better than cryptographic ones for this class
of problem: if operator A's sponsored peer B starts gossiping garbage,
A's other federates see that A brought B in — reputation damage is the
brake. Not perfect, but it's a real cost.

**3. The query IS the intel.** Aggregate-only responses
(`{seen, count, first_seen}`) don't defeat recon — a phishing operator
querying "has anyone seen `paypa1-security.com`?" learns whether their
cover domain has burned. **Resolution: federation is push-only, not
pull.** Peers send what they chose to send; nobody can ask on demand.
C still gets data, but not on-demand data. This closes the dangerous
recon lane outright.

**4. Compromised-peer inheritance.** A friend's DECNET master is a box
on the internet. When it gets rooted, the attacker inherits every
federation edge that admin held. **Conceded as a real risk.** No clean
mitigation beyond the push-only constraint (limits what the compromised
node can exfiltrate in real-time) and the hub-and-spoke constraint
(limits blast radius to that operator's direct peers).

**5. Revocation non-transitivity.** If trust is social, so is distrust.
A kicks B; Carol (who also federates with B) still relays to B.
**Resolution: see #2 — A's kick is visible to A's other federates,
sponsorship accountability propagates socially in real-time via the
topology-transparency mechanism (see below). Not a coordination
problem DECNET solves; one it exposes so humans can solve it.**

**6. Legal/compliance at enterprise scale.** GDPR, HIPAA, FFIEC,
data-residency — informal federation has no DPA and will not survive
first enterprise deal. **Resolution: write down the deferral.** v1
federation is explicitly for informal peer networks only. A DPA
framework gates any regulated-org federation; that is v3 scope.

#### The full model

- **Hub-and-spoke, pairwise beershakes.** No transitive trust.
- **Push-only, never pull.** Peers push what they choose; no on-demand
  queries, no recon surface.
- **Sponsorship-as-reputation.** A brought B in; if B misbehaves, A's
  reputation across A's other federates degrades. Social cost, real-time.
- **Hash-evidence on every contribution.** To prevent fabricated pushes
  that game contribution ratios, every shared fact must include a
  verifiable fingerprint (message sha256, cert fingerprint, artifact
  sha256) — not a free-text claim. "I saw domain X" without an artifact
  hash is not a contribution.
- **Contribution ratios, enforced per peer.** Pure consumers get starved;
  peers must push roughly as much as they receive. Paired with
  hash-evidence above so ratios can't be gamed with garbage.
- **Topology transparency.** Every federate can see the full federation
  graph from their vantage point: who sponsored whom, who kicked whom,
  contribution volumes. Makes sponsorship accountability observable in
  real-time rather than post-incident.
- **Omission-based canary watermarking.** Individualizing "saw domain X"
  across N peers is impossible (you can't watermark a string). Instead,
  A withholds X from peer 23 specifically; if X surfaces externally, A
  can triangulate across multiple omission canaries over time. Forensic
  tool, not preventive.

#### Accepted residuals

These are structural to gossip systems and are **accepted with
mitigation, not eliminated:**

- **Audit gap.** Misbehaving federates leak silently — queries (or in
  push-only, consumption) are supposed to happen. By the time a peer
  notices the leak, the data has already moved. Mitigation: omission
  canaries provide post-hoc forensic attribution; sponsorship
  accountability provides the social pressure to catch it faster.
- **Correlation-at-receiver.** C receives push from A, B, D, E. None
  shared much individually; C correlates across them to build a picture
  no sender authorized. Cannot be designed out without killing the
  federation's entire value proposition. Priced in, documented.
- **Push cadence as metadata.** Even push-only, timing/volume of what A
  pushes tells receivers about A's current coverage/posture. Low
  bandwidth, probably unfixable without batching jitter that hurts
  timeliness. Accepted.
- **Topology metadata leak.** B telling A that C exists (as part of the
  "recurse ad infinitum" socialization) is itself signal C may not have
  consented to share. In regulated sectors, even "bank X runs deception"
  is information. Minor, but noted.

#### Why the social-trust framing is correct *now*

The earlier subsection (central registry, gated enrollment, Ed25519
per-operator identities, signed revocation lists) is not wrong, it is
**premature**. Building that machinery before there is a real
federation with real users is the "million-scale assumptions that
strangle actual work" trap called out in the Scale section above. The
social-trust model ships when there are two friends with two
deployments who want to try it. The crypto/registry model ships when
there is a customer whose compliance team requires it.

What cannot be deferred: **the wire format**. `session_profile`,
`smtp_targets`, and future federation-adjacent tables must still carry
`schema_version` from day one. Privacy-preserving shape
(`{seen, count, first_seen}` aggregate-only, no attacker identity) is
the right posture independent of trust model — minimizing leak surface
is always correct.

#### What this changes in the earlier Open Questions

- **#1 Root registry** — no longer v2-blocking. Deferred to v3 or
  whenever a federation outgrows social coordination.
- **#2 Revocation** — answered. "A kicks B; A's other federates see it
  in the topology-transparency view and make their own call." No CRL.
- **#3 Rate-limiting adversarial joins** — partially answered. Push-only
  eliminates the query-flood vector. Per-peer rate limits on push still
  apply to prevent contribution-ratio gaming via spam.
- **#4 Consent UX** — unchanged. Per-category opt-in with default-deny
  on PII-adjacent data is still the right shape.

#### Second round — scope-verified pull as a complementary channel

Follow-up design review revisited the "query IS the intel" problem from
a different angle. Push-only closes the recon attack but sacrifices the
defender's most useful query shape: "has anyone seen anything new about
MY brand today?"

**Proposal:** operators verify domain scope at registration time (ACME
dns-01 pattern — TXT record challenge), and pull queries are restricted
to data about scope-verified domains. BigBank can query about
`bigbank.com`; they cannot query about `competitor.com`.

**This is a genuine addition, not a replacement for push-only.** Both
channels coexist with different threat models:

- **Push-only channel** — peer-volunteered gossip. Handles the case
  where the domain being attacked is one the defender does NOT own
  (lookalikes, typosquats, newly-registered phishing infra). This is
  the defender's primary use case and scope-verified pull cannot serve
  it without fuzzy matching, which attackers abuse.
- **Scope-verified pull channel** — bounded "what's new about my
  verified scope" queries. Narrow, auditable, recon-resistant.

**Attacks considered on the pull channel:**

1. **Indirect-reference / lookalike queries.** The intel defenders most
   need is about domains they DON'T own (`bigbank-secure-login.support`
   targeting BigBank). Allowing fuzzy/lookalike matching under "claimed
   typo of my scope" reopens the recon lane. **Resolution: exact-match
   only on the pull channel. Lookalike intel flows through push-only.**
2. **Domain graveyard.** Attacker buys an expired domain, DNS-verifies,
   pulls historical phishing intel for a brand they're about to revive.
   **Mitigation: scope applies prospectively only — queries bounded to
   intel indexed since the operator's verification timestamp.** Bake in
   from day one, hard to retrofit.
3. **Subdomain scope inference.** Wildcard matching (`*.bigbank.com`)
   invites overreach. **Resolution: explicit list of scoped domains,
   each individually DNS-verified. No wildcard inference.**
4. **Self-lookup leaks coverage maps to the asker.** BigBank querying
   their scope learns which peers have visibility into BigBank-targeting
   campaigns. **Resolution: aggregate-only response
   (`{seen, count, first_seen}`) with no per-peer attribution.** Already
   implicit.
5. **MSSP multi-tenant churn.** An MSSP claims scope over 200 client
   brands; clients leave, the DECNET retains ex-scope.
   **Mitigation: periodic re-verification (weekly cadence) of every
   scoped domain's TXT record.**

**Residual, accepted:** scope-verified pull cannot serve queries about
domains the defender doesn't control. That's the structural limit —
push-only covers it.

**Net model for v2 federation:**
- Identity: Ed25519 keypair + DNS-verified scope list (explicit, not
  wildcard, periodic re-verify).
- Channel 1 — push: hash-evidenced contributions, peer-volunteered, no
  query surface.
- Channel 2 — pull: scope-verified, exact-match, prospective-only,
  aggregate-response, rate-limited.

---

## Campaign Clustering — DSL Evolution

The DSL currently models campaigns as linear phase sequences with clear actor assignments. Real campaigns are messier — phases overlap, actors share responsibilities, tool signatures drift over time. The fixtures don't test for overlapping phases or ambiguous actor assignments. That's probably fine for v1 — the six fixtures cover the known failure modes — but the replay tier will reveal whether you need to add fixtures for phase overlap or role ambiguity. The DSL has a natural extension path: concurrent phases, multi-actor per phase, probabilistic phase ordering. You don't need it now, but the design doesn't block it.

---

## Threat Intel Enrichment — Provider Backlog

Long list of candidate sources for `decnet/intel/`. Open / free-tier
prioritized; Shodan is the explicit paid exception. v1 ships three
(GreyNoise Community, AbuseIPDB, abuse.ch); the rest are post-v1 fodder
slotted in as demand surfaces.

### Reputation / abuse reports
- AbuseIPDB — community abuse scores, free 1k/day **[v1]**
- CrowdSec CTI — community blocklist API, free
- Spamhaus DROP/EDROP — hijacked netblocks, free
- CINS Score (Sentinel IPS) — reputation feed, free
- FireHOL IP lists — aggregated reputation (GitHub), free
- Project Honey Pot HTTP:BL — DNSBL for HTTP attackers, free
- Emerging Threats open — free blocklist

### Scanner / noise classification
- GreyNoise Community API — purpose-built for honeypot noise filtering, free **[v1]**
- DShield / SANS ISC — scanned-IP feeds, free
- Tor Project exit-node list — free, no key

### Active C2 / malware attribution
- abuse.ch Feodo Tracker — botnet C2 IPs, free, no key **[v1]**
- abuse.ch ThreatFox — IOCs from malware analysis, free **[v1]**
- abuse.ch URLhaus — malicious URLs, free
- abuse.ch SSLBL — malicious TLS certs, free
- abuse.ch MalwareBazaar — payload hashes (pairs with payload capture)
- AlienVault OTX — pulse-based IOCs, free with key

### Host scan / infrastructure
- Shodan — paid, cheap tiers (approved exception)
- Censys — free tier, host scan data
- BinaryEdge — ~250/mo free
- CIRCL passive DNS / passive SSL — free for researchers
- VirusTotal — 4 lookups/min free

### Network ownership / geo
- Team Cymru IP-to-ASN whois — free DNS-based, no key
- IPinfo — free tier, ASN/company
- MaxMind GeoLite2 — already in use (GeoIP mapping)

### Misc
- Cloudflare Radar — aggregate intel, free
- Pulsedive — IOC enrichment, free tier
- MISP communities — federated OSINT