The threat-intel surface was IP-keyed on day one as an expedient — the
worker is woken by IP-bearing bus events. ANTI's call: don't carry that
debt. NO IPs as primary keys anywhere on the attacker-intel surface.
Schema:
- attacker_uuid is now the canonical key — UNIQUE + FK to attackers.uuid.
- attacker_ip stays as a denormalised, indexed, NON-UNIQUE value column.
Updated on every upsert; useful for SIEM payloads and audit lookups,
but explicitly NOT a key. Model docstring says so.
- Pre-v1, no Alembic migration needed. SQLModel.metadata.create_all()
builds the new shape on fresh DBs.
Repo:
- upsert_attacker_intel now keys on attacker_uuid.
- get_attacker_intel_by_ip → get_attacker_intel_by_uuid.
- get_unenriched_attacker_ips → get_unenriched_attackers, returning
[{uuid, ip}] tuples so the worker writes by UUID and dispatches
provider calls by IP without a second round-trip.
Worker:
- _enrich_one(uuid, ip, ...) — UUID lands on the row, IP rides for
provider egress.
- attacker.intel.enriched bus payload gains attacker_uuid alongside
attacker_ip — webhook → SIEM consumers benefit; no removal.
API:
- GET /api/v1/attackers/{ip}/intel deleted outright (rip-and-replace,
never deployed beyond dev).
- GET /api/v1/attackers/{uuid}/intel is the only public route, matching
every other /attackers/* route.
Frontend:
- <IntelPanel uuid={id!} /> uses the URL param directly, fetches in
parallel with the rest of AttackerDetail rather than waiting on
attacker.ip.
Tests: re-keyed in place, 39 passed (same coverage as before the
refactor). Provider-impl tests untouched.
DEBT-041: closed in DEBT.md (entry preserved as historical rationale,
summary table flipped to ✅, remaining-open list shortened by one).
CLI command mirrors the reuse-correlate shape (--poll-interval, --ttl-hours,
--daemon). Run it under systemd as a sibling worker.
The API endpoint returns the most recent cached row for an attacker IP
or 404. Auth-gated via require_viewer like every other attacker route.
Also extends the worker test with a real FakeBus so the
attacker.intel.enriched publish path is exercised end-to-end (no longer
a no-op against NullBus).
`for i in $(seq 1 100); do curl -H "X-Forwarded-For: 191.100.20.$i" ...`
was dumping 100 distinct IPs into AttackerDetail's LEAKED IPs row,
drowning the rest of the ORIGIN section. The 100-IP wall is itself a
signal (WAF-bypass-list probing) that deserves a short badge, not a
flood.
Backend:
- get_attacker_ip_leaks gains `limit: int = 10` parameter — caller
only ever needs a sample, not the full set.
- New count_attacker_ip_leaks() returns the unbounded COUNT(*) via
one cheap SQL aggregate.
- Detail endpoint returns {ip_leaks: [first 10], ip_leaks_total: N}
so the UI can render a rotation badge independent of list length.
UI:
- New LeakedIPsRow component. First 5 distinct IPs rendered inline
with hover tooltips (unchanged). When > 5, a `+ N more` expand
button reveals the rest of the sample; when total exceeds the
10-row cap, a subtle `(+M beyond sample)` note appears.
- When total ≥ 20, a red `ROTATION · N` tag renders leading the
row with a tooltip explaining the semantic: "almost certainly
XFF-rotation / WAF-bypass probing, not a real attribution leak."
DB churn is deliberately not capped — 100k rows × ~500 B is tolerable.
If it becomes a problem we can add an ingester-side count-and-skip;
for now the UX fix is the whole story.
Added test_ip_leaks_total_reported_separately_from_list asserting
the endpoint shape matches what the UI consumes.
Attackers routinely front their scanners with VPNs/proxies, so the
TCP source we log is the proxy egress, not the real host. But a
surprising number of attacker setups are misconfigured: the proxy
forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP
/ CDN-variant) header. From our side that's a free attribution leak.
New _detect_ip_leak extractor in decnet/web/ingester.py fires at
ingest time per HTTP request. Logic:
1. Require service=http, source_ip present, headers present.
2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or
CIDRs) → legitimate reverse-proxy forwarding, skip.
3. Walk proxy-family headers in priority order: Forwarded (RFC 7239)
→ X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP.
4. Extract the left-most parseable IP from the winning header.
5. If that IP differs from the TCP source → emit a bounty with
bounty_type="ip_leak" carrying {source_ip, real_ip_claim,
source_header, headers_seen, path, method}.
Storage is the existing Bounty table — no schema change; de-dup is
handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so
repeat requests with the same leaked IP don't spam.
AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN
listing distinct real_ip_claim values; hover tooltip shows the source
header + path of the most recent leak. Only shown when at least one
ip_leak bounty exists.
RFC 7239 Forwarded parser handles the full vocabulary — bare IPv4,
IPv4:port, quoted, IPv6 in brackets, IPv6 with port — returning only
IPs that actually parse.
Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For
mismatches". Phase 3 of the three-phase Attacker Intelligence series
(phases 1: scanned-vs-interacted, 2: PTR records already shipped).
DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's
"revisit when verified-proxy config lands" note — same token set
future rate-limit work will consume.
Adds a new card on AttackerDetail: SCANNED · N services | INTERACTED
WITH · M services. Distinguishes port-scanners (N high, M=0) from
actual engagement (M>0) at a glance — the analyst's first question
when triaging a new attacker row.
Classifier lives in decnet/correlation/event_kinds.py, a single
source of truth for the event-type vocabulary:
- INTERACTION_EVENT_TYPES — command-family (command/exec/query/...),
SMTP engagement (mail_from/rcpt_to/message_accepted), file/payload
activity (file_captured/upload/download_attempt/retr), pub/sub
(publish/subscribe), recorded TTY sessions.
- NOISE_EVENT_TYPES — DECNET-internal (startup/shutdown/parse_error/
unknown_*).
- Everything else defaults to scan. Conservative by design: new
template verbs show up as "scanned" until explicitly promoted.
Bucket logic: a service is "interacted" if ≥1 of its events
classifies as interaction; otherwise "scanned" if ≥1 scan event;
noise-only services drop. Disjoint by construction.
Deliberate no-schema path: compute on-the-fly in the detail endpoint
via SELECT DISTINCT service, event_type FROM logs. Small result set
(tens of pairs per attacker), cost is trivial vs. the existing
behavior/commands queries. Trade-off: one more DB round-trip per
detail view in exchange for zero ALTER TABLE migration pain and
immediate classifier-change feedback loop.
Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of
interactions that carry executable text), with a comment pointing at
the new canonical module.
Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral
Profiling — Services actively interacted with".
Adds GET /attackers/{uuid}/smtp-targets (viewer) and GET /attackers/{uuid}/mail
(admin) endpoints, plus two new sections on the attacker detail page:
VICTIM DOMAINS rollup (aggregate-only, federation-gossip-safe) and STORED MAIL
with a drawer that decodes headers, lists attachments, and downloads the raw
.eml via the existing artifact endpoint (?service=smtp).
Adds get_attacker_transcripts (mirror of artifacts for session_recorded
logs) and get_session_log for sid→shard resolution. New
/api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events
out of the shared JSONL day-shard via an mtime-keyed byte-offset index
— never scans the whole shard per request. New
/api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both
endpoints admin-gated.
Schemathesis was failing CI on routes that returned status codes not
declared in their OpenAPI responses= dicts. Adds the missing codes
across swarm_updates, swarm_mgmt, swarm, fleet and attackers routers.
Also adds 400 to every POST/PUT/PATCH that accepts a JSON body —
Starlette returns 400 on malformed/non-UTF8 bodies before FastAPI's
422 validation runs, which schemathesis fuzzing trips every time.
No handler logic changed.
Adds the server-side wiring and frontend UI to surface files captured
by the SSH honeypot for a given attacker.
- New repository method get_attacker_artifacts (abstract + SQLModel
impl) that joins the attacker's IP to `file_captured` log rows.
- New route GET /attackers/{uuid}/artifacts.
- New router /artifacts/{decky}/{service}/{stored_as} that streams a
quarantined file back to an authenticated viewer.
- AttackerDetail grows an ArtifactDrawer panel with per-file metadata
(sha256, size, orig_path) and a download action.
- ssh service fragment now sets NODE_NAME=decky_name so logs and the
host-side artifacts bind-mount share the same decky identifier.
Every /stats call ran SELECT count(*) FROM logs + SELECT count(DISTINCT
attacker_ip) FROM logs; every /logs and /attackers call ran an
unfiltered count for the paginator. At 500 concurrent users these
serialize through aiosqlite's worker threads and dominate wall time.
Cache at the router layer (repo stays dialect-agnostic):
- /stats response: 5s TTL
- /logs total (only when no filters): 2s TTL
- /attackers total (only when no filters): 2s TTL
Filtered paths bypass the cache. Pattern reused from api_get_config
and api_get_health (asyncio.Lock + time.monotonic window + lazy lock).
- Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode)
- Add 400 response to all endpoints accepting JSON bodies (malformed input)
- Add required 'title' field to schemathesis.toml for schemathesis 4.15+
- Add xdist_group markers to live tests with module-scoped fixtures to
prevent xdist from distributing them across workers (fixture isolation)
Extends tracing to every remaining module: all 23 API route handlers,
correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp),
profiler behavioral analysis, logging subsystem, engine, and mutator.
Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on
the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace
correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords.
Includes development/docs/TRACING.md with full span reference (76 spans),
pipeline propagation architecture, quick start guide, and troubleshooting.
- Add @require_role() decorators to all GET/POST/PUT endpoints
- Centralize role-based access control per memory: RBAC null-role bug required server-side gating
- Admin (manage_admins), Editor (write ops), Viewer (read ops), Public endpoints
- Removes client-side role checks as per memory: server-side UI gating is mandatory
New GET /attackers/{uuid}/commands?limit=&offset=&service= endpoint
serves commands with server-side pagination and optional service filter.
AttackerDetail frontend fetches commands from this endpoint with
page controls. Service badge filter now drives both the API query
and the local fingerprint filter.
API now accepts ?service=https to filter attackers by targeted service.
Service badges are clickable in both the attacker list and detail views,
navigating to a filtered view. Active filter shows as a dismissable tag.
Migrate Attacker model from IP-based to UUID-based primary key with
auto-migration for old schema. Add GET /attackers (paginated, search,
sort) and GET /attackers/{uuid} API routes. Rewrite Attackers.tsx as
a card grid with full threat info and create AttackerDetail.tsx as a
dedicated detail page with back navigation, stats, commands table,
and fingerprints.